How to Create a Romanisation System for Your Conlang

I like IPA a lot, and I always highly recommend using it when creating and describing your conlang’s phonology. However, most people don’t even know what the IPA is and they may interpret that first sentence rather differently. In some situations, it is more useful to have a system for writing words in the more familiar Latin alphabet, a romanisation system.

I’d like to point out that I will be using the term romanisation here and not orthography. Orthography refers to how a language is represented in any given writing system. While this includes romanisation, it also includes other writing systems such as native writing systems. This post is specifically about how to represent words from your conlang in the Latin alphabet. All my advice here is aimed at an English-speaking audience.

First of all, you should decide what kind of romanisation system to use. There are two main kinds:

Transliteration is when each letter (or combination of letters) corresponds to a character in the language’s native script. A good example of this is Nihon-Shiki for Japanese, where each syllable strictly corresponds to a character in the Kana writing system and does not necessarily reflect Japanese pronunciation.
Transcription is when the romanisation attempts to reflect the pronunciation of a word rather than its native spelling. This can be divided into two subtypes:
- Phonemic, where only necessary distinctions are made, like broad transcription in IPA.
- Phonetic, which transcribes as much phonetic information as possible, like narrow transcription in IPA.
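
To make the difference concrete, here is a minimal sketch using four kana. The first table follows Nihon-Shiki spellings; the second is a pronunciation-based (Hepburn-style) table, which is not discussed in this post and is included only for contrast. The function name and both table names are made up for illustration.

```python
# Transliteration: one fixed spelling per kana character (Nihon-Shiki style).
NIHON_SHIKI = {"し": "si", "ち": "ti", "つ": "tu", "ふ": "hu"}

# Transcription: spellings chosen to suggest the pronunciation to an
# English reader (Hepburn-style; an addition for comparison only).
PRONUNCIATION_BASED = {"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu"}


def romanise_kana(kana: str, table: dict[str, str]) -> str:
    """Romanise a kana string one character at a time using the given table."""
    return "".join(table[ch] for ch in kana)


word = "つち"  # 'earth, soil'
print(romanise_kana(word, NIHON_SHIKI))          # tuti
print(romanise_kana(word, PRONUNCIATION_BASED))  # tsuchi
```

The first output mirrors the script, while the second is closer to what an English speaker would need in order to say the word.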

If you intend to use it in a book (for character names, place names, etc.) then I recommend you aim for a phonemic transcription, so it is as intuitive for English speakers as possible. This is what I do most of the time, and so most of my advice in the rest of this post will apply to that kind of romanisation system.

Don’t expect casual readers to pronounce your conlang’s words perfectly like a hypothetical native speaker. I don’t think that’s possible. Instead, aim for a romanisation that will help readers pronounce words as if they were reasonably anglicised. For example, with pinyin (a Chinese Mandarin romanisation system) I cringe when people pronounce the q as a sort of /k/ sound. In Mandarin, it is pronounced /t͡ɕʰ/; so I feel that a better way of anglicising it would be /t͡ʃ/. To be fair though, I think pinyin is more about efficiency and learnability and not intuition for English speakers, so I can forgive the odd choice of letters.

This does bring up an interesting issue on how to represent non-English sounds. The simplest way is to use a letter that gives the closest approximation. But sometimes there are not enough letters in the Latin alphabet, so that’s where digraphs and diacritic marks come in.

Digraphs

A digraph is when a combination of two letters is used to represent a single sound. Common examples of these in English are: sh, ch, th and ng. This is an easy and intuitive way to represent your conlang’s more exotic phonemes.

You may have noticed that a lot of digraphs contain the letter h. This has a number of uses:

It can suggest a fricative. For example, th suggests a sound similar to /t/ but fricative, i.e. a non-sibilant coronal fricative such as /θ/. Ph is good if f is taken for some reason, for instance, if your conlang contrasts /ɸ/ and /f/. Kh could be used for a velar fricative, /x/, but English speakers would end up pronouncing it /k/. Alternatively, ch could be used for /x/ as well (like how some English speakers would pronounce it in the word loch), but is more likely to be interpreted as /t͡ʃ/ (I prefer the former method). You could also logically extend this idea so that gh represents /ɣ/ and dh represents /ð/.
It can also be used to suggest aspiration. The digraphs ph, th and kh (alternatively ch) come from the transliteration of the Ancient Greek letters φ, θ and χ respectively, where they were originally pronounced as aspirated stops. I only recommend using these if your conlang has a three-way distinction between aspirated, unaspirated and voiced consonants like in Ancient Greek. Do not use these if your conlang only distinguishes aspirated and unaspirated consonants. Instead, use the voiced letters (b, d, g) for unaspirated consonants and the voiceless letters (p, t, k) for aspirated consonants, like in Icelandic and Pinyin.
When preceded by a voiced consonant, it can be used to suggest a breathy voice. For example, bh represents /bʱ/.
When preceded by a sonorant, it can be used to suggest voicelessness, like in Welsh where rh represents /r̥/, or like how some English speakers pronounce wh as /ʍ/.
It can also suggest a post-alveolar consonant, such as sh /ʃ/ or ch /t͡ʃ/.

Digraphs are also useful for affricates. For the affricates in my conlang Nìmpyèshiu, I use pf /p̪͡f/, bv /b̪͡v/, ts /t͡s/, dz /d͡z/, ch /t͡ɕ/, j /d͡ʑ/. For the first four affricates here, I have used the letters that represent the stop and fricative components of that affricate. Although the letters v and z are not used on their own, I chose to have the b and d precede them anyway so it is more intuitive. Pinyin does the opposite of this by having z represent /dz/, which is more efficient but less intuitive. For the alveolo-palatal affricates, however, I used ch and j instead of something like tsh and dzh, as the former is just as intuitive and yet more efficient.
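
A phonemic romanisation like this boils down to a lookup table from phonemes to spellings. The sketch below is one possible way to apply such a table, matching the longest phoneme first so that an affricate is never split into its stop and fricative. The six affricate spellings are the ones listed above; the plain vowels and consonants in SIMPLE, and all the names in the code, are assumptions added for illustration and are not the actual Nìmpyèshiu inventory.

```python
TIE = "\u0361"     # combining tie bar used to write affricates in IPA
DENTAL = "\u032a"  # combining bridge below (dental articulation)

# Affricate spellings taken from the paragraph above.
AFFRICATES = {
    "p" + DENTAL + TIE + "f": "pf",
    "b" + DENTAL + TIE + "v": "bv",
    "t" + TIE + "s": "ts",
    "d" + TIE + "z": "dz",
    "t" + TIE + "ɕ": "ch",
    "d" + TIE + "ʑ": "j",
}

# Hypothetical plain phonemes, only here so the example can spell whole words.
SIMPLE = {"a": "a", "i": "i", "u": "u", "m": "m", "n": "n",
          "p": "p", "t": "t", "k": "k"}

TABLE = {**AFFRICATES, **SIMPLE}
# Try longer phoneme strings first so "t͡s" wins over plain "t".
KEYS = sorted(TABLE, key=len, reverse=True)


def romanise(phonemes: str) -> str:
    """Convert a phonemic IPA string to its romanised spelling."""
    out = []
    i = 0
    while i < len(phonemes):
        for key in KEYS:
            if phonemes.startswith(key, i):
                out.append(TABLE[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no spelling for {phonemes[i]!r} at position {i}")
    return "".join(out)


print(romanise("t" + TIE + "sa"))   # tsa
print(romanise("d" + TIE + "ʑi"))   # ji
```

Raising an error on an unknown symbol, rather than passing it through, makes gaps in the table obvious while the system is still being designed.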

For vowels, digraphs only really work for diphthongs. I recommend marking other features with diacritic marks. With diphthongs that end in a semivowel, using y and w can affect the pronunciation of the preceding vowel, so I recommend you use i and u instead. For example: ai can suggest /aj/, while ay would probably be mispronounced as /eɪ/.
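
One way to turn this advice into a mechanical rule is to spell a semivowel as y or w when a vowel follows it, and as i or u otherwise. The sketch below assumes that reading, along with a simple five-vowel inventory written with the plain letters a, e, i, o, u; both are assumptions for illustration.

```python
VOWELS = set("aeiou")  # assumed vowel inventory, for illustration only


def spell_semivowels(phonemes: str) -> str:
    """Spell /j/ and /w/ as y/w before a vowel, otherwise as i/u."""
    out = []
    for i, ch in enumerate(phonemes):
        if ch in ("j", "w"):
            nxt = phonemes[i + 1] if i + 1 < len(phonemes) else ""
            before_vowel = nxt in VOWELS
            if ch == "j":
                out.append("y" if before_vowel else "i")
            else:
                out.append("w" if before_vowel else "u")
        else:
            out.append(ch)
    return "".join(out)


print(spell_semivowels("aj"))     # ai
print(spell_semivowels("ja"))     # ya
print(spell_semivowels("wajta"))  # waita
```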

For geminate consonants, double letters are the obvious choice. But if you want to geminate a consonant that is represented by a digraph, then I recommend doubling just one of the letters rather than both of them. Dothraki just doubles the first letter, e.g. ssh for /ʃː/. However, I don’t see anything wrong with doubling the second letter instead, if that would help remove any potential ambiguity.
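
As a small sketch of that rule, the function below doubles a single letter outright and doubles only one letter of a digraph. The function name and the double_first switch are made up for this example.

```python
def geminate(grapheme: str, double_first: bool = True) -> str:
    """Return the spelling of a geminate for a one- or two-letter grapheme."""
    if len(grapheme) == 1:
        return grapheme * 2
    if len(grapheme) == 2:
        first, second = grapheme
        # Double only one letter of the digraph, never both.
        return first * 2 + second if double_first else first + second * 2
    raise ValueError(f"expected a single letter or a digraph, got {grapheme!r}")


print(geminate("t"))                       # tt
print(geminate("sh"))                      # ssh
print(geminate("sh", double_first=False))  # shh
```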

However, doubling vowels for long vowels doesn’t work as well. It’s fine to use ii, uu, and aa (I’ve used ii and aa in Nìmpyèshiu); but ee and oo are going to be mispronounced as /i/ and /u/ respectively.

Diacritic marks

Diacritic marks are additional markings added to a letter to modify its pronunciation. They are usually used on vowels but can sometimes occur on consonants. Diacritic marks usually don’t mean anything to most English speakers as they only occur optionally in loan words. For example, the accent in the word café helps us read it as /ˈkæfeɪ/ and not /keɪf/. So they can be used to hint that the marked letter is pronounced slightly differently from its usual pronunciation.

á and à – Accents

Accents are the most common kind of diacritic mark. There are two kinds: the acute accent, á, and the grave accent, à. They are called ‘accents’ because they were used to mark pitch accent in Ancient Greek. In IPA the acute accent is used to represent a high tone, while the grave accent is used to represent a low tone. In Pinyin, however, they represent a rising tone and a falling tone respectively. They are also commonly used to mark stress in languages that have unpredictable or contrastive stress.

Accents can be used to extend the number of vowels. For example in Icelandic, the acute accent is used to distinguish vowels. Sometimes it indicates a more peripheral vowel: i /ɪ/ becomes í /i/, and u /ʏ/ becomes ú /u/; but with the other vowels, it indicates a diphthong: á /au/, é /jɛ/ and ó /ou/. Generally, acute accents are used to suggest more closed vowels, while grave accents suggest more open vowels. In French, the accents distinguish e /ə/ from é /e/ and è /ɛ/.

In some languages, such as Polish, the acute accent can be used to represent a palatalised consonant.

a̋ and ȁ – Double Accents

There are also double accents, a̋ and ȁ. In IPA these represent extra high and extra low tones respectively. Also, an acute double accent is used in Hungarian as a way of combining an acute accent with an umlaut, which is something that may be very useful as an alternative to ǘ. This could be extended to the grave accent as well.

â – Circumflex

The circumflex also originates as a way of marking pitch accent in Ancient Greek. It’s basically a combination of an acute accent and a grave accent, so in IPA it represents a falling tone. In other languages, it can be used for various other things including stress and length, but it lacks any specific meaning so can be applied to almost anything, including consonants.

ǎ – Caron

Carons are the opposite of circumflexes. It’s best to think of these as a grave accent followed by an acute accent. In IPA they represent a rising tone, but in pinyin they represent a falling-rising tone.

ã – Tilde

The tilde originated as a shorthand way of writing an n so that more words could be fitted on an expensive sheet of parchment. It ended up representing nasal vowels in the IPA, as nasals usually cause preceding vowels to become nasalised. In Spanish, ñ ended up representing a palatal nasal, /ɲ/.

In IPA, a tilde underneath the letter can mark a creaky voice.

ä – Diaeresis/Umlaut

This diacritic mark has two different names since it has two different functions. Diaeresis originates from Ancient Greek and is also present in French. It indicates that there is a syllable break between the marked vowel and the preceding vowel. This even exists functionally in some English words (usually borrowed from Greek or French) such as the constellation Boötes /boʊˈoʊtiːz/.

The Umlaut has a completely different origin. It comes from German and was originally a small e written above a vowel. Over time it withered away until it coincidentally resembled the diaeresis mark. It usually represents vowels that have been fronted. For instance, in Icelandic ö represents an /œ/.

In IPA, two dots underneath a letter can indicate a breathy voice. I use this in Nìmpyèshiu to represent the high-breathy tone.

ā – Macron

Macrons were originally used in Greek and Latin to mark ‘heavy’ syllables. This later was more specifically used to mark long vowels in Latin (which usually were not marked, so you just had to know if a vowel was long or short). So a common application of these is to mark long vowels.

Another application for macrons is tone: IPA uses it for mid-tones, while pinyin uses it for the high flat tone.
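
If you generate or process romanised words on a computer, it helps to know that all of the marks above exist in Unicode as combining characters that can be attached to any base letter. The sketch below uses Python's standard unicodedata module; the COMBINING table and the two function names are made up for this example.

```python
import unicodedata

# Unicode combining characters for the marks discussed in this section.
COMBINING = {
    "acute": "\u0301",
    "grave": "\u0300",
    "double_acute": "\u030b",
    "double_grave": "\u030f",
    "circumflex": "\u0302",
    "caron": "\u030c",
    "tilde": "\u0303",
    "diaeresis": "\u0308",
    "macron": "\u0304",
}


def add_mark(letter: str, mark: str) -> str:
    """Attach a diacritic to a letter, using a precomposed form if one exists."""
    return unicodedata.normalize("NFC", letter + COMBINING[mark])


def strip_marks(text: str) -> str:
    """Remove all combining marks, leaving only the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(add_mark("a", "macron"))  # ā
print(add_mark("a", "caron"))   # ǎ
print(strip_marks("café"))      # cafe
```

Stripping the marks shows what a reader who ignores diacritics will see, which is a quick way to check whether two of your words become identical without them.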

What not to do: Klingon

Let’s finish by talking about the bad ways of making a romanisation system. What I think is one of the worst romanisation systems I have heard of is the one for Klingon. Basically, it uses the capital letters Q, H, S, D and I to extend the alphabet to accommodate its very unusual phonology. This may seem like a good idea at first, but in practice, there are a lot of problems.

First of all, the only upper-case/lower-case pair used is q /q/ and Q /q͡χ/. Arguably you could say the H is also necessary to distinguish it from the h in the digraphs (yes, it uses digraphs too), although this can be fixed by separating two syllables with something (a hyphen for instance) to avoid ambiguity. S /ʂ/ and D /ɖ/ are probably capitalised to show that they are retroflex rather than alveolar, but Klingon doesn’t have alveolars so this is just unnecessary information.

But it gets worse, as there is a capital I but also a lower-case l. This is terrible since these two letters are identical to each other in sans-serif fonts (such as Arial). The Wikipedia article for Klingon has to put Klingon words in an alternative font just for this reason. Again, there is absolutely no good reason for it not to simply be a lower-case i. The only reason why the I is not lower case is that it represents /ɪ/, so the capital I is used because it resembles the IPA symbol. This is really silly because most English speakers would pronounce i as /ɪ/ most of the time anyway.

There is also the obvious issue that if you use capital letters to make distinctions, then you cannot capitalise words. If you are going to use names in that romanisation system in a book, for instance, then it might look a bit awkward. Also, I like to have THE BASIC OPTION TO CAPITALISE THINGS FOR EMPHASIS. For the same reason, you shouldn’t use bold, italic or underlined letters as part of your romanisation either.
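
A quick way to test your own system for this problem is to check whether it still works after everything is lower-cased. The sketch below is one possible check; the function names are invented, and the sample list contains only the Klingon letters mentioned above, not the full alphabet.

```python
def case_collisions(graphemes: list[str]) -> dict[str, list[str]]:
    """Group graphemes that become identical once case is ignored."""
    seen: dict[str, list[str]] = {}
    for g in graphemes:
        seen.setdefault(g.lower(), []).append(g)
    return {key: group for key, group in seen.items() if len(group) > 1}


def case_dependent(graphemes: list[str]) -> list[str]:
    """List graphemes that must contain a capital letter to be written."""
    return [g for g in graphemes if g != g.lower()]


# Only the letters named in this section; the rest of the alphabet is omitted.
sample = ["q", "Q", "H", "S", "D", "I", "l"]

print(case_collisions(sample))  # {'q': ['q', 'Q']}
print(case_dependent(sample))   # ['Q', 'H', 'S', 'D', 'I']
```

If both functions return nothing for your letters, then words can be capitalised at the start of a sentence or written in all caps without losing any distinctions.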

Afterword

Most of this post was my personal opinion about how I approach romanisation. If you disagree at all or have a different approach, I would like to hear it.

