Just how many languages are there?

Given that we work in an industry that is entirely about language, it’s amazing how little time we spend actually talking and thinking about language itself. Not about workflows and automation and cost-reduction and efficiencies, but just language. This fact first occurred to me some years ago after a brief exchange with a senior operations manager at a major language services provider. Somehow we had gotten onto the topic of the maximum number of languages a certain client might potentially need and she commented, “Well, there are probably only around, what, maybe 200 total languages in the whole world?” I noted that in fact there are well over 6000, which earned me a look of incredulity and a raised eyebrow that suggested I had perhaps been spiking my morning coffee.

So let’s take this seemingly straightforward question of the total number of languages as an opportunity to talk about language as a topic on its own.

To be fair to that ops manager, it is very easy to believe that the number of languages could be as low as a couple of hundred. After all, according to one study, 80% of the buying power of the Web population can be reached with just 10 languages.* Upping that to 90% requires just six more languages, and 95% just seven beyond that. Just 14 more and you have reached 99%. So with 37 of the world’s 6000-odd languages, you have reached very far down the long tail, leaving the other thousands of languages languishing in relative obscurity.

What about those other 6000+ languages? Well, first of all, it’s next to impossible to be certain about the numbers. Your count depends on a number of factors, not least of which is the very definition of what constitutes a language. Some criteria are easy enough, such as living v dead (French counts as a language in our tally, while an ancient language like Latin is omitted). But others are less straightforward. One of the thorniest issues is determining whether a given language is in fact a distinct language on its own, or merely a dialect of another language. For example, do you say that Colombian Spanish is a language or a dialect of a generic language we call Spanish? I think most linguists agree it’s the latter.

But what about, say, Norwegian v Danish v Swedish? That’s a bit trickier: they are far more distinct from each other than Colombian Spanish is from Iberian Spanish, but to a very real extent, they are mutually intelligible. When I was studying in Norway, for example, my textbooks were almost as likely to be Danish or Swedish as they were to be Norwegian. And there are some dialects of spoken Norwegian that I found harder to understand than Danish. And for that matter, even on the written level, I found Danish far less challenging than the Nynorsk form of written Norwegian (since I had only studied the Bokmål form). So are all those forms of writing and all those spoken dialects counted as languages or just to be considered variants of a common ‘Scandinavian’? And that in turn opens a whole new can of worms: the definition of language can become a very personal and even political and cultural question. Many a Norwegian, for example, might find it downright offensive to hear you say that his language is nothing more than a dialect of a somewhat artificial construct called ‘Scandinavian’.**

Another tricky example is Chinese. Calling Chinese one language is not at all practical really, despite the fact that most people still do. The temptation to talk about ‘Chinese’ as a single language stems from the fact that speakers of all its variants write in just two scripts (Simplified or Traditional). The fact that Chinese script isn’t a phonetic system employing an alphabet makes this trap hard for Westerners to appreciate, because their reasoning is that if all those languages didn’t sound more or less the same, they couldn’t be limited to just the two written variants. But when you stop thinking in terms of written language being a representation of the sounds a language contains, you suddenly realize that you can have many, many very differentiated spoken languages represented by even a single written form. When a character represents an idea without much phonetic information being imparted, I could use the same character for ‘bike’, even though in two spoken languages employing that script, the sounds being uttered for ‘bike’ are nothing alike. And indeed, that is the case: several spoken versions of ‘Chinese’ are in fact mutually unintelligible, even though they employ the same script and can thus be mutually intelligible on the written level. That is something to keep in mind the next time you’re tempted to say, ‘Let’s record some audio for that Simplified Chinese script’.

So we started out with what seemed like a straightforward question: how many languages are there? But we end without a real answer, because the counting is in the eye of the beholder. And with that we have yet another example of the beautiful complexity that is language!

*They are English, Simplified Chinese, Spanish, Japanese, German, French, Portuguese, Russian, Arabic and Korean.

**And for the record, almost everyone considers these Scandinavian ones to be separate languages… and I do not recommend disputing that assertion when sharing a beer with a Norwegian, Dane or Swede!

