Procedurally generating a galaxy's worth of names
Background
I am investigating the practical utility and limitations of a procedural-generation-based naming scheme for stars and other notable interstellar structures (e.g. nebulae, globular clusters) in a Milky Way-like galaxy, for use by an expansive interstellar civilization. Hundreds of billions of names, in other words. More specifically, a naming system based on, and communicating, key information about the objects it describes.
Preliminary assumptions and findings
Phonemic inventory
25 consonants and 15 vowels. Comparable to English, more or less.
Note: These are phones, not letters.
Syllable structure
(C)V; C for consonant (optional onset), V for vowel (mandatory nucleus).
Limitations
I am hesitant to put hard limits on how long a name can be before it becomes impractical, particularly when short forms would inevitably be adopted for the most frequently used names.
Taking a cue from the world of taxonomy, the longest taxonomic name appears to be in the realm of 18 syllables (not including "subspecies" or similar designations). The average word in English is just over six letters long, factoring in frequency of use, so that's a fair amount of wiggle room.
Permutations
On first pass it appears to be achievable, with nearly seven and a half trillion unique names from just five syllables:
\begin{array} {|l|c|c|r|} \hline Pattern &Syllables &C^x\times V^y &Combinations \\ \hline CV &1 &25^1\times15^1 &375 \\ \hline VCV &2 &25^1\times15^2 &5,625 \\ \hline CVCV &2 &25^2\times15^2 &140,625 \\ \hline VCVCV &3 &25^2\times15^3 &2,109,375 \\ \hline CVCVCV &3 &25^3\times15^3 &52,734,375 \\ \hline VCVCVCV &4 &25^3\times15^4 &791,015,625 \\ \hline CVCVCVCV &4 &25^4\times15^4 &19,775,390,625\\ \hline VCVCVCVCV &5 &25^4\times15^5 &296,630,859,375\\ \hline CVCVCVCVCV &5 &25^5\times15^5 &7,415,771,484,375 \\ \hline \end{array}
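The table is easy to reproduce in code, which also makes it simple to re-run with a different inventory. A minimal Python sketch, assuming the 25-consonant, 15-vowel inventory above; `pattern_count` is a hypothetical helper name:

```python
# Inventory assumed in the question: 25 consonant phones, 15 vowel phones.
CONSONANTS = 25
VOWELS = 15

def pattern_count(pattern: str) -> int:
    """Number of distinct names fitting a fixed pattern such as "VCVCV".

    Every "C" slot may hold any consonant and every "V" slot any vowel,
    so the count is simply the product of the slot sizes.
    """
    count = 1
    for slot in pattern:
        if slot == "C":
            count *= CONSONANTS
        elif slot == "V":
            count *= VOWELS
        else:
            raise ValueError(f"unknown slot {slot!r}")
    return count

PATTERNS = ["CV", "VCV", "CVCV", "VCVCV", "CVCVCV",
            "VCVCVCV", "CVCVCVCV", "VCVCVCVCV", "CVCVCVCVCV"]

for p in PATTERNS:
    # One syllable per vowel nucleus.
    print(f"{p:<12}{p.count('V'):>3}{pattern_count(p):>22,}")
```

Changing `CONSONANTS` or `VOWELS` shows how sensitive the totals are to the size of the inventory.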
This would seem to be enough, but I'm not sure it would allow for encoding information without producing ambiguous or duplicate names, or if such encoding would eliminate too many possible combinations. Six or more syllables and multi-word names are, of course, permitted.
Information density
As above, ideally such a system will communicate useful information about the body or object, perhaps including but not limited to:
Category of object: e.g. star, distinct from nebula, distinct from globular cluster, etc.
Class or type within category: e.g. Class K star, distinct from Class F; supernova remnant, distinct from planetary nebula; etc. The level of precision here will depend on the category of object.
Location: I'm unsure how granular this needs to be to be useful, but I suspect the general location is probably more useful in some cases, and easier to encode, than precise coordinates. Duplication of names could be permitted with a convention for distinguishing locations, depending on the referent (e.g. quadrant, arm, distance from core or home world, etc.).
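One way to carry several fields without ever producing duplicate or ambiguous names is to treat the fields as digits of a mixed-radix number and spell that number in base 375 (one digit per CV syllable). The sketch below is an illustration under stated assumptions, not a worked-out scheme: the one-letter spellings, the four field sizes, and the helper names `pack`, `unpack`, `to_name` and `from_name` are all hypothetical.

```python
# Hypothetical one-letter spellings for the 25 consonants and 15 vowels.
CONSONANTS = "bcdfghjklmnpqrstvwxzçñšžþ"
VOWELS = "aeiouäëïöüáéíóú"
BASE = len(CONSONANTS) * len(VOWELS)  # 375 possible CV syllables

def pack(values, radices):
    """Combine field values into one integer; the first field is most significant."""
    if len(values) != len(radices):
        raise ValueError("need exactly one radix per field")
    n = 0
    for value, radix in zip(values, radices):
        if not 0 <= value < radix:
            raise ValueError(f"{value} is out of range for a field of size {radix}")
        n = n * radix + value
    return n

def unpack(n, radices):
    """Inverse of pack(): recover the field values from the integer."""
    values = []
    for radix in reversed(radices):
        n, value = divmod(n, radix)
        values.append(value)
    return values[::-1]

def to_name(n, width):
    """Spell n as exactly `width` CV syllables, i.e. base-375 digits."""
    syllables = []
    for _ in range(width):
        n, digit = divmod(n, BASE)
        c, v = divmod(digit, len(VOWELS))
        syllables.append(CONSONANTS[c] + VOWELS[v])
    if n:
        raise OverflowError("value does not fit in that many syllables")
    return "".join(reversed(syllables))

def from_name(name):
    """Inverse of to_name(): each syllable is exactly one consonant plus one vowel."""
    n = 0
    for i in range(0, len(name), 2):
        digit = CONSONANTS.index(name[i]) * len(VOWELS) + VOWELS.index(name[i + 1])
        n = n * BASE + digit
    return n

# Hypothetical field sizes: category, class, region, serial number.
RADICES = (8, 64, 4096, 1_000_000_000)
fields = (2, 17, 1234, 567_890)
name = to_name(pack(fields, RADICES), width=6)
assert unpack(from_name(name), RADICES) == list(fields)
print(name)
```

Because the packing is a bijection, no two objects can share a name and every name decodes unambiguously. The cost is that the product of the field sizes (here about 2.1 × 10^15) must fit in 375^6 ≈ 2.8 × 10^15, so roughly a quarter of the six-syllable space goes unused. Since these field sizes do not line up with 375, the category cannot always be read off the first syllable alone; giving each field whole syllables would make names easier to parse by ear, at the price of more wasted space. Consecutive serial numbers also usually differ only in the last syllable, which is the neighbour problem raised under Problems and considerations.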
Problems and considerations
Frequency: by some estimates three quarters of all stars in the galaxy are red dwarfs, meaning a high degree of information granularity and/or sophisticated encoding methods are needed to avoid most stars having very similar names, but this granularity or sophistication would be unnecessary for much rarer stars. In a galaxy of 400 billion stars, the entire five-syllable VCVCVCVCV space (nearly 300 billion names, each differing from its neighbours on the list by just a single phone) would be used up by red dwarfs alone, and those names would not include any other information about the individual stars. At the other end of the spectrum, the rarer an object is, the shorter its name could be, potentially filling all the shorter name spaces with rare objects that are seldom talked about or referenced.
Proximity: dozens or hundreds of red dwarfs in close proximity would all have nearly the same name without higher granularity of location encoding. Similarly, if higher location granularity constitutes a significant portion of the name, objects of different categories or types may all have similar names due to their location.
Higher granularity of any sort may translate into impractically long names, and long names with minor differences that could escape notice or cause confusion.
Any combination of phonotactic rules could potentially be considered, so long as the rules do not contradict one another and are reasonably easy to decode. Encoding methods need not be consistent from category to category, or even within categories.
As per our existing naming scheme, multiple gravity-bound objects might be distinguished by a second (or higher) order designation; e.g. Alpha Centauri vs. Beta Centauri (two different trinary systems distinguished by brightness), and Alpha Centauri A, B and C (three stars within the Alpha Centauri system). How this intersects with the above encoding needs to be resolved.
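The frequency and proximity points both come down to consecutive catalogue entries (or physical neighbours that happen to get consecutive serial numbers) receiving near-identical names. A common remedy is to pass the serial number through a bijective scrambling step before spelling it out. The sketch below uses a simple affine permutation modulo the size of the five-syllable CV space; this is a swapped-in technique, not something from the question, and the spellings, `MULTIPLIER` and `OFFSET` are arbitrary hypothetical choices.

```python
from math import gcd

# Hypothetical one-letter spellings for 25 consonants and 15 vowels.
CONSONANTS = "bcdfghjklmnpqrstvwxzçñšžþ"
VOWELS = "aeiouäëïöüáéíóú"
BASE = len(CONSONANTS) * len(VOWELS)  # 375 CV syllables
WIDTH = 5                             # five-syllable names
SPACE = BASE ** WIDTH                 # 7,415,771,484,375 possible names

# Arbitrary (hypothetical) constants. The multiplier must share no factor with
# SPACE = 3**5 * 5**15, i.e. it must not be divisible by 3 or 5.
MULTIPLIER = 2_654_435_761
OFFSET = 123_456_789
assert gcd(MULTIPLIER, SPACE) == 1
INVERSE = pow(MULTIPLIER, -1, SPACE)  # modular inverse, Python 3.8+

def scramble(serial: int) -> int:
    """Map a catalogue serial number to a name index; a bijection on range(SPACE)."""
    return (MULTIPLIER * serial + OFFSET) % SPACE

def unscramble(index: int) -> int:
    """Recover the serial number from a name index."""
    return (INVERSE * (index - OFFSET)) % SPACE

def syllables(index: int) -> list[str]:
    """Spell a name index as WIDTH CV syllables, most significant first."""
    out = []
    for _ in range(WIDTH):
        index, digit = divmod(index, BASE)
        c, v = divmod(digit, len(VOWELS))
        out.append(CONSONANTS[c] + VOWELS[v])
    return out[::-1]

for serial in range(3):
    print(serial, "".join(syllables(scramble(serial))))
```

With these constants, consecutive serial numbers always land on names that differ in at least four of their five syllables, because none of the multiplier's four lowest base-375 digits is 0 or 374. An affine map is only a light shuffle, though; a small Feistel network over the same range would scramble more thoroughly if that mattered.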
-
Convention may allow for exceptions, including but not limited to:
Relationship to other objects. Drawing from the point above, the most massive star in a multi-star system may lend its name to all stars within that system, overwriting the encoding in their names and demoting them to an affix, e.g. (DominantStarName) (∅; class encoded in name), (DominantStarName) (ClassK), (DominantStarName) (ClassM). This is only a partial solution as two or more stars in a system may be the same class.
Significance itself. Objects deemed to be of little to no importance may be relegated to a separate naming scheme that ignores or encodes information differently. E.g. the nearly invisible wisps of a disintegrating supernova remnant on the far side of the galaxy might be named in such a way as to indicate it is a nebula-category object, then given an index number instead of a further level of detail. Whatever the convention used, the naming scheme would have to allow for any object to be promoted from or demoted to this status, so a numeric scheme is not in itself a solution.
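The dominant-star convention could look something like the sketch below, with a letter tie-breaker borrowed from the existing A/B/C scheme to cover the same-class case flagged above. This is illustrative only: the affix format, ranking by mass, and the core name `Tesuvo` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Star:
    spectral_class: str  # e.g. "G", "K", "M"
    mass: float          # solar masses; used only to rank the stars

def system_designations(core_name: str, stars: list[Star]) -> list[str]:
    """Name every star in a system after its most massive member.

    The dominant star keeps the bare core name (its class is assumed to be
    encoded in the core name itself). Each companion gets a class affix, and
    companions that share a class get an extra letter in order of mass.
    """
    companions = sorted(stars, key=lambda s: s.mass, reverse=True)[1:]
    totals = Counter(s.spectral_class for s in companions)
    seen = Counter()
    names = [core_name]
    for star in companions:
        cls = star.spectral_class
        seen[cls] += 1
        affix = f"Class{cls}"
        if totals[cls] > 1:
            affix += "-" + chr(ord("A") + seen[cls] - 1)
        names.append(f"{core_name} {affix}")
    return names

# Hypothetical system and core name, for illustration only.
print(system_designations("Tesuvo", [Star("M", 0.2), Star("G", 1.1), Star("M", 0.1)]))
# ['Tesuvo', 'Tesuvo ClassM-A', 'Tesuvo ClassM-B']
```

Because the affix is derived from the system rather than stored in the star's own encoded name, a star can be re-designated if the system's membership is revised.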
The (C)V syllable structure could be reconsidered, allowing (C)V(C), (C(CL))V, (C(CL))V(C), or even (C(CL))V((CL)C). (L for Liquid, which I will consider "r", "l", "y", and "w".) Compare the combinations per single syllable:
\begin{array} {|l|c|c|r|} \hline Structure &Syllables &C^x\times C_L^y\times V^z &Combinations \\ \hline CV &1 &25^1\times 4^0\times 15^1 &375 \\ \hline CC_LV &1 &25^1\times 4^1\times 15^1 &1,500 \\ \hline CVC &1 &25^2\times 4^0\times 15^1 &9,375 \\ \hline CC_LVC &1 &25^2\times 4^1\times 15^1 &37,500 \\ \hline CC_LVC_LC &1 &25^2\times 4^2\times 15^1 &150,000 \\ \hline \end{array}
Note: This does not exclude onsets like "rr", "yl", "wl", "ww", etc.
- As noted in the comments, vowel clusters (not to be confused with diphthongs or triphthongs; each vowel phone is pronounced) would also be valid in (C)V, so in addition to the ~300 billion of VCVCVCVCV we'd have at least another ~12 billion from the following patterns:
\begin{array} {|l|c|c|r|} \hline Pattern &Syllables &C^x\times V^y &Combinations \\ \hline VCVCVCVV &5 &25^3\times 15^5 &11,865,234,375 \\ \hline VCVCVVV &5 &25^2\times 15^5 &474,609,375 \\ \hline VCVVVV &5 &25^1\times 15^5 &18,984,375 \\ \hline VVVVV &5 &25^0\times 15^5 &759,375 \\ \hline \end{array}
The richer syllable structures offer orders of magnitude more name space per syllable, and vowel clusters add a more modest amount; either would require more finessing to ensure pronounceability.
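The tables above count fully filled syllable shapes and a handful of vowel-cluster patterns. If the bracketed slots are genuinely optional, every shorter variant counts too. The sketch below tallies those, works out how many syllables each structure needs to cover 400 billion objects, and splits five (C)V syllables by how many have an onset. Assumptions: margins are chosen independently, any of the four liquids may follow (onset) or precede (coda) any consonant, and syllables combine freely.

```python
from itertools import product

CONSONANTS, LIQUIDS, VOWELS = 25, 4, 15

# Options for an optional margin: empty, a single consonant, or (for the
# cluster versions) a consonant plus a liquid.
PLAIN = 1 + CONSONANTS                            # (C)
CLUSTER = 1 + CONSONANTS + CONSONANTS * LIQUIDS   # (C(L)) or ((L)C)

STRUCTURES = {  # structure: (onset options, coda options)
    "(C)V": (PLAIN, 1),
    "(C)V(C)": (PLAIN, PLAIN),
    "(C(L))V": (CLUSTER, 1),
    "(C(L))V(C)": (CLUSTER, PLAIN),
    "(C(L))V((L)C)": (CLUSTER, CLUSTER),
}

def syllables_needed(target: int, per_syllable: int) -> int:
    """Smallest n with per_syllable**n >= target."""
    n, capacity = 1, per_syllable
    while capacity < target:
        n += 1
        capacity *= per_syllable
    return n

for label, (onsets, codas) in STRUCTURES.items():
    per = onsets * VOWELS * codas
    print(f"{label:<15}{per:>9,} per syllable, "
          f"{syllables_needed(400_000_000_000, per)} syllables for 400 billion")

# Five (C)V syllables, grouped by how many of them have an onset.
by_onsets = {}
for has_onset in product((False, True), repeat=5):
    k = sum(has_onset)
    by_onsets[k] = by_onsets.get(k, 0) + CONSONANTS ** k * VOWELS ** 5
for k in sorted(by_onsets, reverse=True):
    print(f"{k} onsets: {by_onsets[k]:,}")
print(f"total: {sum(by_onsets.values()):,}")
```

Once codas are allowed, different syllabifications can spell the same phone sequence (CVC.V versus CV.CV), so those rows are upper bounds on distinguishable names unless syllable breaks are audible. With (C)V alone every vowel is a nucleus and every consonant an onset, so the five-syllable breakdown is exact: it totals 390^5 ≈ 9.02 trillion, and the four-onset patterns alone come to five times the VCVCVCVCV figure, because the onsetless syllable can sit in any of the five positions.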
-
Taking some of the above to an extreme, perhaps the solution is to externalize nearly everything about an object in the form of category, and retain a minimal core necessary to "name" it:
[Category of Object] [Subcategory within Category] [Type within Subcategory] [Idiosyncratic Core Name] [Location Information]... Some of these could be quite short, even single syllables.
This might sidestep some issues, but 300 billion red dwarfs may still require hundreds of millions of core names, each duplicated hundreds of times.
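Rough arithmetic for that last point: with the category fields moved outside the core name, how often must each red dwarf's core name repeat, and how long must a location qualifier be to tell the repeats apart? A sketch assuming the ~300 billion red-dwarf estimate above and, optimistically, that repeats of each core name are spread evenly so that each location value holds at most one of them; `reuse_factor` and `syllables_for` are hypothetical helpers.

```python
# Population from the question: three quarters of 400 billion stars.
RED_DWARFS = 300_000_000_000

def reuse_factor(population: int, core_space: int) -> int:
    """Minimum number of objects that must share some core name (ceiling division)."""
    return -(-population // core_space)

def syllables_for(count: int, per_syllable: int) -> int:
    """Smallest number of syllables whose combinations cover `count` values."""
    n, capacity = 1, per_syllable
    while capacity < count:
        n += 1
        capacity *= per_syllable
    return n

# Core-name spaces taken from the permutation table (both are four syllables).
CORES = {"VCVCVCV": 791_015_625, "CVCVCVCV": 19_775_390_625}

for pattern, space in CORES.items():
    reuse = reuse_factor(RED_DWARFS, space)
    # Assumes repeats are spread evenly, one per location value.
    print(f"{pattern}: each core name covers up to {reuse} red dwarfs; "
          f"a pure-CV location qualifier needs {syllables_for(reuse, 375)} syllable(s)")
```

With a VCVCVCV core, some core name has to cover at least 380 red dwarfs, just over the 375 values of a single CV syllable, so even an evenly spread location tag needs two syllables; a CVCVCVCV core, also four syllables, cuts the reuse to 16. In practice repeats would cluster unevenly, so these are lower bounds on qualifier length.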
Question
Can this idea be made to work (and if so, how), or are the numbers simply against it?
This post was sourced from https://worldbuilding.stackexchange.com/q/171639. It is licensed under CC BY-SA 4.0.