I’ve decided to open this blog with a tutorial on how to construct an artificial language (also called a conlang). I’ve been creating conlangs for several years now, and I feel that I have a lot of advice to share. This will be the first in a series of articles about creating conlangs.

Since I’m a worldbuilder, I will be focusing on conlangs that aim to resemble a natural language. So most of my advice may not apply to other kinds of conlang.

The first step in creating a conlang is to determine the collection of sounds it will use, which is called a phonology. This article will focus on consonants.


A common mistake that beginners make is not using the IPA. The IPA (short for International Phonetic Alphabet) is a precise system for transcribing speech sounds. English spelling does not consistently represent how a word is pronounced, whereas anyone who knows the IPA can work out how to pronounce a word from its IPA transcription. To learn the IPA symbols, I recommend using this Interactive IPA Chart so you can familiarise yourself with the sounds.

Using a romanisation system for your language is fine, but each letter or letter combination should be clearly defined in terms of the IPA.

You may have noticed that most of the IPA symbols are not on your keyboard. I recommend the website TypeIt, which allows you to type IPA symbols in your browser. There is also an app version for Windows for offline use.

It is also important to know the difference between broad transcription and narrow transcription. Broad transcription, which is written between slashes, records only the phonetic information needed to tell words apart. Narrow transcription, which is written between square brackets, records more precise phonetic detail. For example, the word ‘IPA’ in broad transcription is /aɪ piː eɪ/, while in narrow transcription it’s [aɪ pʰiː eɪ]. Note that in narrow transcription the ‘p’ has a superscript ‘h’ after it. This represents aspiration, which I will explain in more detail in the VOT continuum section. Since English doesn’t distinguish between aspirated ‘p’ and unaspirated ‘p’, the aspiration is not represented in broad transcription. A group of closely related sounds that a language treats as the same sound is called a phoneme.
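The relationship between the two kinds of transcription can be sketched in Python. This is a deliberately crude illustration: the function name `narrow` and the rule it applies (aspirate /p/, /t/ and /k/ at the start of every word) are my own assumptions, and real English aspiration also depends on stress and on neighbouring sounds.

```python
# Hypothetical rule for illustration only: voiceless plosives are
# aspirated when they are the first sound of a word.
ASPIRATED = {"p": "pʰ", "t": "tʰ", "k": "kʰ"}


def narrow(broad: str) -> str:
    """Turn a broad transcription such as '/aɪ piː eɪ/' into a narrower one."""
    words = broad.strip("/").split()
    # Replace only the first symbol of each word; leave the rest untouched.
    narrowed = [ASPIRATED.get(word[0], word[0]) + word[1:] for word in words]
    return "[" + " ".join(narrowed) + "]"


print(narrow("/aɪ piː eɪ/"))  # [aɪ pʰiː eɪ]
```

The broad form stores one symbol per phoneme, and the narrow form is derived from it by rule, which is why the aspiration never needs to be written between slashes.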

Starting with English

In this section, I will go through English’s consonant inventory so I can introduce some IPA symbols and the terminology that describes how sounds are produced.

Let’s start by looking at English’s consonant inventory:

             Labial   Dental   Alveolar   Post-alveolar   Palatal   Velar   Glottal
Nasal        m                 n                                    ŋ
Plosive      p, b              t, d                                 k, g
Affricate                                 t͡ʃ, d͡ʒ
Fricative    f, v     θ, ð     s, z       ʃ, ʒ                              h
Approximant  w                            ɹ̠               j

As you can see, I’ve arranged English’s consonants into a table with the columns representing places of articulation and the rows representing manners of articulation.

Obviously, how IPA symbols are used to represent English’s sounds is not the same as English spelling. So I’ll list the differences here:

  • Most IPA consonant symbols sound the way the same letters usually do in English spelling. The big exception is /j/, which is the ‘y’ sound in ‘year’.
  • /θ/ and /ð/ are the ‘th’ sounds. /θ/ is the ‘th’ in ‘thin’, while /ð/ is the ‘th’ in ‘the’. Yes, they are two different sounds!
  • /t͡ʃ/, /d͡ʒ/, /ʃ/, and /ɹ̠/ are the ‘ch’, ‘j’, ‘sh’, and ‘r’ sounds respectively.
  • /ŋ/ is the ‘ng’ sound in ‘sing’ and the ‘n’ sound in ‘thank’.
  • /ʒ/ is the ‘s’ sound in ‘measure’ or ‘Asia’.

Place of Articulation

The columns represent where a sound is produced, which is called its place of articulation. Typically these are listed in order of their relative location along the vocal tract. Here I will explain the places of articulation mentioned above:

  • To save space in the table above, I’ve grouped several places of articulation under the term ‘labial’. This refers to any sound pronounced with the lips.
    • /m/, /p/ and /b/ are bilabial, which means they are articulated with both lips.
    • /f/ and /v/ are labiodental, which means they are articulated with the lower lip touching the upper teeth.
    • /w/ is labio-velar, which means the back of the tongue is raised towards the soft palate (see below) while the lips are rounded.
  • Dental: when the tongue is touching the upper teeth.
  • Alveolar: when the tongue is touching the alveolar ridge. The alveolar ridge is the bumpy bit of gum above the inside of the upper teeth.
  • Post-alveolar: when the front of the tongue is touching the part of the hard palate near the alveolar ridge.
  • Palatal: when the back of the tongue is touching the centre of the hard palate.
  • Velar: when the tongue is touching the velum. This is the soft part of the roof of the mouth, which is behind the hard palate. It’s also known as the soft palate.
  • Glottal: these are sounds that are articulated using the glottis in the throat.

These can be categorised into some more general terms which are useful to know:

  • Labial: Any articulation pronounced with the lips.
  • Coronal: Any articulation that uses the flexible front part of the tongue.
  • Dorsal: Any articulation that uses the back of the tongue.
  • Laryngeal: Any articulation that occurs in the larynx, also known as the voice box.
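If you keep your inventory in a script or spreadsheet, these general terms can be stored as a simple lookup. The following is a minimal Python sketch; the names `PLACE_CLASSES` and `places_in_class` are my own, and placing labio-velar in two classes is an assumption that follows from the descriptions above (it uses both the lips and the back of the tongue).

```python
# Each specific place of articulation maps to one or more general classes.
# Tuples are used so that labio-velar can belong to two classes at once.
PLACE_CLASSES = {
    "bilabial": ("labial",),
    "labiodental": ("labial",),
    "labio-velar": ("labial", "dorsal"),
    "dental": ("coronal",),
    "alveolar": ("coronal",),
    "post-alveolar": ("coronal",),
    "palatal": ("dorsal",),
    "velar": ("dorsal",),
    "glottal": ("laryngeal",),
}


def places_in_class(general_class: str) -> list[str]:
    """List every specific place that falls under a general class."""
    return [
        place
        for place, classes in PLACE_CLASSES.items()
        if general_class in classes
    ]


print(places_in_class("dorsal"))  # ['labio-velar', 'palatal', 'velar']
```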

Manner of Articulation

The rows represent how a sound is produced, which is called its manner of articulation.

  • Nasal: The articulators in the mouth are completely closed and the air is redirected through the nose.
  • Stop: The articulators close, allowing air to build up behind them, which is then released. Stops can also be called plosives.
  • Affricate: A stop that is released into a fricative. In IPA, this is represented with an arc above, tying a plosive and a fricative together (e.g. t͡ʃ).
  • Fricative: The articulators create a tight gap for the air to flow through, causing turbulence.
  • Approximant: Like a fricative, but the gap is looser so there is no turbulence.
  • Lateral: The articulators are closed in the middle, leaving gaps on the sides for air to flow through. This is only possible with articulations that involve the tongue. It can be applied to affricates, fricatives, approximants and the release of plosives.

There are also taps, flaps, and trills, which I will talk about more in the rhotic section.

You can divide the manners of articulation into two useful categories:

  • Obstruents: Consonants that obstruct the air stream. This includes stops, affricates, and fricatives.
  • Sonorants: Any sound with a constant air stream with no turbulence. This includes nasals, approximants, taps, flaps, trills and even vowels.
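The two categories above amount to a lookup on the manner of articulation. Here is a minimal Python sketch of that idea; the function name `sonority_class` and the lowercase manner labels are my own choices for illustration.

```python
# Manners of articulation grouped exactly as in the two bullet points above.
OBSTRUENT_MANNERS = {"stop", "affricate", "fricative"}
SONORANT_MANNERS = {"nasal", "approximant", "tap", "flap", "trill", "vowel"}


def sonority_class(manner: str) -> str:
    """Return 'obstruent' or 'sonorant' for a manner of articulation."""
    if manner in OBSTRUENT_MANNERS:
        return "obstruent"
    if manner in SONORANT_MANNERS:
        return "sonorant"
    raise ValueError(f"unknown manner of articulation: {manner!r}")


print(sonority_class("fricative"))  # obstruent
print(sonority_class("nasal"))      # sonorant
```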


Some phonemes in the table above are grouped in pairs. These phonemes have the same place and manner of articulation but differ in whether they are voiced or unvoiced. A sound is voiced when the vocal folds, located in the voice box, vibrate during articulation. In each pair, the phoneme on the left is unvoiced and the phoneme on the right is voiced.

Modifying English

A good approach to start creating a phonology is to tweak English’s phonology. This will help you to create a phonology that will feel natural and relatively easy for you and other English speakers to understand.

One important thing to remember is that your accent may differ from what I described above. I recommend looking up the phonology of your accent: knowing how it differs will give you a unique starting point.

A good place to start is to remove the dental fricatives: /θ/ and /ð/. These sounds are relatively rare across the world, and they’re not even found in all English accents. Only keep them if you really want them. If you do, I highly recommend that you remove similar-sounding phonemes instead, such as the labiodental fricatives /f/ and /v/.

When adding or removing sounds, it’s good to think in terms of features. For example, if you want to get rid of /ð/ but keep /θ/, it would make sense to get rid of all voiced fricatives. Having a few irregularities isn’t a problem, but if you make the inventory too asymmetrical it will look unnatural, as if you’ve picked sounds at random from an IPA chart.
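Thinking in features is easy to try out in code. The sketch below, in Python, describes the English fricatives from the table by place, manner and voicing, then removes every voiced fricative in one step. The `Consonant` type and the `remove_by_feature` helper are hypothetical names invented for this example.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    symbol: str
    place: str
    manner: str
    voiced: bool


# The English fricatives from the table earlier in the article.
ENGLISH_FRICATIVES = [
    Consonant("f", "labiodental", "fricative", False),
    Consonant("v", "labiodental", "fricative", True),
    Consonant("θ", "dental", "fricative", False),
    Consonant("ð", "dental", "fricative", True),
    Consonant("s", "alveolar", "fricative", False),
    Consonant("z", "alveolar", "fricative", True),
    Consonant("ʃ", "post-alveolar", "fricative", False),
    Consonant("ʒ", "post-alveolar", "fricative", True),
    Consonant("h", "glottal", "fricative", False),
]


def remove_by_feature(inventory, **features):
    """Drop every consonant that matches all of the given feature values."""
    if not features:
        return list(inventory)  # nothing to match, so nothing is removed
    return [
        c for c in inventory
        if any(getattr(c, name) != value for name, value in features.items())
    ]


no_voiced = remove_by_feature(ENGLISH_FRICATIVES, manner="fricative", voiced=True)
print(" ".join(c.symbol for c in no_voiced))  # f θ s ʃ h
```

Removing a whole feature combination like this leaves a tidy, symmetrical set, whereas deleting individual symbols one at a time makes it easier to end up with accidental gaps.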

Adding Exotic Sounds

Removing English sounds means that your conlang will have fewer sounds than English. This is quite limiting and you may want to add some sounds that don’t occur in English.

Some people may feel that adding non-English sounds will make the pronunciation of their conlang difficult. In practice, it only makes it difficult to speak the language like a native. As long as you don’t make distinctions between sounds that are too similar to each other, learners will approximate the exotic sounds with more familiar ones.

So here are some relatively easy to learn sounds that don’t usually occur in English:

  • The voiceless labio-velar approximant: [ʍ]. This occurs in some English accents and is represented by the ‘wh’ in the words ‘whale’, ‘white’, and ‘what’.
  • The voiceless velar fricative: [x]. This also occurs in some English accents and is the ‘ch’ sound in the word ‘loch’.
  • The labio-dental approximant: [ʋ]. This is pronounced like [w] but with the lower lip against the upper teeth.
  • The labio-dental nasal: [ɱ]. This is pronounced like [m] but with the lower lip against the upper teeth.
  • The glottal stop: [ʔ]. This is the sound made when you ‘drop your Ts’ between vowels, or as linguists like to call it, T-glottalisation. Since this involves the closure of the glottis, there is no voiced equivalent.


Rhotics

Rhotics are sounds that don’t have any particular articulatory features in common but share an R-like quality. I recommend only including one of them, as they would be difficult to distinguish from each other.

Some rhotics are approximants. Examples include the alveolar approximant [ɹ] and the retroflex approximant [ɻ]. Retroflex consonants are pronounced with the tip of the tongue pulled back so its underside is touching the roof of the mouth. A labialised retroflex approximant [ɻʷ] is found in some American, Irish and West Country accents.

Some rhotics are fricatives, such as the voiced uvular fricative [ʁ]. Uvular consonants are like velar consonants but with the tongue pulled even further back. Basically, to pronounce a voiced uvular fricative, try to produce an R-like sound with the tongue pulled back as far as possible.

Some rhotics are taps and flaps. The difference between the two is not important but they both essentially involve a momentary contact between the two articulators. Examples include the alveolar tap [ɾ], and the retroflex flap [ɽ]. Note that not all taps/flaps are rhotics, for example, the labiodental flap [ⱱ].

Finally, some rhotics are trills. A trill involves a powerful air stream that causes one of the articulators to rapidly and repeatedly hit the other articulator. Examples include the alveolar trill [r] and the uvular trill [ʀ]. Again, not all trills are rhotics, for example, the bilabial trill [ʙ]. I personally find trills very difficult to pronounce, except the bilabial trill, and yet the alveolar trill is more common.

Secondary Articulation

The form of secondary articulation that is easiest to understand is labialisation, which involves rounding the lips. This is easy to do with velar consonants. Labialised consonants are represented in IPA by adding a superscript ‘w’: [kʷ], [gʷ], and [xʷ]. Note that secondary articulation is different from co-articulation: co-articulation is two articulations in different places but of the same manner, while a consonant with a secondary articulation has a second articulation of a different manner and in a different place. So [w] and [ʍ] are co-articulated since both of their articulations are of the same manner, while [kʷ] has a primary velar plosive articulation and a secondary bilabial approximant articulation. Note that the articulations of [kʷ] are simultaneous, unlike the sequence [kw], in which the [k] is followed by a separate [w].

Other forms of secondary articulation include palatalisation and velarisation. In Russian, most consonants have a plain and palatalised version, called ‘hard’ and ‘soft’ respectively. Similarly, Irish makes a distinction between velarised consonants and palatalised consonants, called ‘broad’ and ‘slender’ respectively.

Non-Pulmonic Consonants

So far, I’ve been talking about consonants that use the lungs to generate an air stream. However, there are other ways to produce an air stream.

Ejectives utilise air trapped between a closed glottis and the articulators. By raising the voice box, the trapped air becomes pressurised. This pressure can be released like a plosive or passed through a tight gap like a fricative. Ejective affricates are also possible. Other manners of articulation do not provide enough closure for air pressure to be generated. It is also not possible to voice these consonants, as the air stream does not pass through the glottis. If you try to make a plosive, fricative or affricate without exhaling, then you will naturally produce these consonants.

Alternatively, the voice box can be moved downwards. With closed articulators, this creates lower pressure above the glottis. When the articulators are released, air will move into the mouth. These are called implosives. Unlike ejectives, implosives can be voiced since the glottis can be opened to allow air from the lungs to the lower-pressure space above.

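Clicks utilise air in the mouth. The back of the tongue makes contact with the soft palate while a second closure is made at another place of articulation, trapping a pocket of air between the two. Lowering the middle of the tongue enlarges this pocket and lowers its pressure, so when the front closure is released, air rushes in and causes a click sound. Because this air stream does not depend on the glottis, air from the lungs can still vibrate the vocal folds behind the closure, so clicks, unlike ejectives, can be voiced.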

VOT Continuum

Voice Onset Time (VOT) is the duration of time between the release of a consonant and the initiation of voicing. If voicing occurs before release, then the VOT is negative; if voicing occurs after release, then the VOT is positive. When a consonant has positive VOT, aspiration occurs. This is represented by a superscript ‘h’ in IPA. When a consonant has negative VOT, it’s voiced; and when a consonant has a VOT of zero it’s known as tenuis.
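The three labels above depend only on the sign of the VOT, which can be sketched as a small Python function. The name `classify_vot` and the `tolerance_ms` parameter are my own additions: a measured VOT is almost never exactly zero, so the sketch treats anything within a few milliseconds of zero as tenuis. The default of 5 ms is an arbitrary assumption, not a standard figure.

```python
def classify_vot(vot_ms: float, tolerance_ms: float = 5.0) -> str:
    """Label a stop by its voice onset time, given in milliseconds.

    Negative VOT means voicing starts before the release; positive VOT
    means voicing starts after it. Values within `tolerance_ms` of zero
    count as tenuis (an assumed cut-off for illustration only).
    """
    if vot_ms < -tolerance_ms:
        return "voiced"
    if vot_ms > tolerance_ms:
        return "aspirated"
    return "tenuis"


print(classify_vot(-80))  # voiced
print(classify_vot(0))    # tenuis
print(classify_vot(60))   # aspirated
```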

As VOT is a duration of time, it’s a continuum. Different languages divide this continuum up in different ways. In English, the unvoiced plosives generally have a small positive VOT, making them slightly aspirated. This contrasts with the voiced stops, which have a slightly negative VOT, meaning that voicing is initiated about halfway between the closure and the release. In Mandarin Chinese, stops and affricates can be aspirated or unaspirated but never voiced. The aspirated consonants in Mandarin have a longer VOT than the English aspirated consonants. Some languages, such as Ancient Greek, make a three-way distinction between aspirated, unaspirated and voiced consonants. However, a three-way distinction like this would make your conlang harder to learn.


I highly recommend this book: ‘A Practical Introduction to Phonetics’ by J. C. Catford. I’ve learnt a lot from it. It contains many useful exercises explaining how to properly produce various sounds.

I know that this is a very long article and it’s full of technical terms, but I hope that you have learnt something from it and didn’t get too confused!

The next part of this series will be on vowels, but the next post on this site might be something a bit different.


