Language origins

Ruhlen, Merritt

The question of language origins may be construed in two quite different ways that are, unfortunately, often confused. On the one hand we may speak of the origin of the language capacity, that is, the universal human capacity to learn and use the language of the culture in which one is raised. While it seems likely that our hominid ancestors such as Eve(1), and much earlier Lucy, had linguistic abilities intermediate between those of modern humans and chimpanzees, nothing is really known about the linguistic abilities of such human precursors.

The second sense of language origins, which I will focus on in this article, has to do with the origin of the roughly 5,000 languages currently spoken around the world. Where did these languages come from? In a few instances we actually have a historical record of the origin of certain languages. Thus we are able to trace the various Romance languages–Rumanian, Italian, French, Catalan, Spanish, Portuguese–back through the historical record to their common source, the Latin spoken in Rome some two millennia ago. But even without such historical attestation, we could still recognize that these particular languages share a common origin by the simple fact that they all share resemblant words that are not shared by other languages.

For example, the word for hand in the Romance languages generally looks something like MAN-: Rumanian mina, Italian mano, French main, Catalan ma, Spanish mano, Portuguese mao. We do not need to know that all of these forms derive from the Latin word manus (hand), to recognize that they are similar to one another and contrast sharply with the word for hand in other languages, for example, English hand, Russian ruka, or Japanese te. Furthermore, comparing additional words would reveal that the Romance languages share many such similar words that set them off as a group from other languages. Such historically related languages are called a language family, and the modern descendants of a single original word–the various Romance words for hand in our example–are called cognates.

This evolutionary explanation provides a satisfactory answer to the question of the origin of Spanish, French, or Rumanian, but, at the same time, it raises a further question: What is the origin of the Latin language that gave rise to the various Romance idioms? In other words, where did Latin come from? An English jurist serving in India, Sir William Jones, first gave the answer to this question in 1786 when he observed that the similarities among Latin, (Classical) Greek, and Sanskrit were so numerous, and so precise, that “no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists.”

Indo-European and Proto-Indo-European

During the nineteenth century this even more ancient family that Jones had identified–but not named–came to be known as Indo-European. The Romance languages constitute one branch; other branches include Germanic (English, German, Swedish), Slavic (Russian, Polish, Serbo-Croatian), Baltic (Latvian, Lithuanian), Celtic (Irish, Welsh), Albanian, Greek, Iranian (Farsi), and Indic (Hindi, Bengali). Two branches are extinct: Anatolian (Hittite), spoken 4,000 years ago in present-day Turkey, and Tocharian, spoken in western China in the first millennium A.D. That all of these languages (and many others of course) derive from a single earlier language–Proto-Indo-European–is a fact accepted by all historical linguists. Surprisingly, where this language was actually spoken, and when, remain hotly debated topics to the present day, with the Ukraine at 4,000 B.C. or Anatolia at 7,000 B.C. the favored locations.

Virtually all European languages–and many of those in South Asia–belong to the Indo-European family. In Europe only Basque, Hungarian, Finnish, Estonian, and Saame (= Lapp, a pejorative term to be avoided) do not. We now have a satisfactory answer to the origin of European languages and, interestingly, Proto-Indo-European, by general consensus, was spoken at least a thousand years before the invention of writing by the Sumerians around 3,000 B.C. By extrapolating back in time on the basis of modern linguistic evidence, we have in fact entered the arena of human prehistory.


But what about Proto-Indo-European? Where did it come from? To this question there are two answers. First, according to most historical linguists–especially Indo-Europeanists–Indo-European has no known relatives. After around 6,000 years, the shifting sands of linguistic evolution have erased whatever evidence that once may have existed of more distant relationships. Everything before Indo-European is forever lost in the mists of time.

The second answer, supported mainly by Russian historical linguists and virtually all taxonomists, is that Indo-European is obviously related to a set of other families spread across northern Eurasia and extending into northern North America. According to the eminent American linguist Joseph Greenberg, this larger, more ancient family is called Eurasiatic, and its branches are Indo-European, Uralic (Finnish, Estonian, Hungarian), Altaic (Turkish, Mongolian, Manchu), Korean, Japanese, Ainu, Gilyak, Chukchi-Kamchatkan (Chukchi), and Eskimo-Aleut (Eskimo, Aleut).

One of the most obvious pieces of evidence supporting the Eurasiatic family is a specific pronominal pattern: M (I), T (you). Not every Eurasiatic language preserves both pronouns. English, for example, has generally lost the T pronoun, with you replacing earlier thou/thee. The vast majority of Indo-European languages, however, retain both, as seen, for example, in French moi (me), toi (you), or in Russian menya (me), tebya (you). Furthermore, although the M/T pattern characterizes Eurasiatic languages, other comparable families are characterized by different pronominal patterns. In the Americas, for example, the most common pattern is N (I) and M (you). This pattern characterizes the Amerind family, which includes almost all Native American languages (Blackfoot, Mohawk, Cherokee, Nahuatl, Quechua, Guarani), excluding only the Eskimo-Aleut and the Athabaskan Indian (Navajo, Apache).

Space does not permit a discussion of why most historical linguists persist in their collective myopia, refusing to see anything but the obvious. I have discussed the multiple causes of this bizarre situation in The Origin of Language.

Origins of Eurasiatic

What then of Eurasiatic? What is its origin? Approximately a dozen large families in the world are comparable to Eurasiatic. While recent research has uncovered significant lexical and grammatical similarities among all these families, it is more difficult to tell precisely which of the twelve groups is actually closest historically to Eurasiatic. My own feeling is that Eurasiatic is probably closest to the Amerind family in the Americas and to the Afro-Asiatic family (Arabic, Hebrew, Hausa), located for the most part in north and east Africa. Whether or not this particular conjecture is true, it now seems highly probable to a growing number of linguists that not only are all the world’s languages related, but that there is evidence among extant languages for this single human family.

One of the most widespread roots supporting monogenesis of extant languages is TIK (finger, one). Examples of this root are seen in Gur (West Africa) dike (one), Dinka (East Africa) tok (one), Proto-Afro-Asiatic *tak (one) (the asterisk indicates a reconstructed, rather than historically attested, form), Latin dig-itus (finger), Turkish tek (only), Ainu tek (hand), Japanese te (hand), Proto-Yeniseian (central Siberia) *tok (finger), Archaic Chinese tyek (one), Proto-Tibeto-Burman *tik (one), Proto-Miao-Yao (Southeast Asia) *nto (finger), Proto-Karonan (New Guinea) *dik (one), Eskimo tik-iq (index finger), Eyak (Alaska) tikhi (one), Karok (California) tik (finger, hand), Mohawk (New York) tsi’er (finger), Mangue (Nicaragua) tike (one), and Kukura (Brazil) tikua (finger). This is but a fraction of the evidence that John Bengtson and I have offered in support of this one root. And this root is just one among many.

What are we to make of this? What are the implications of these data? How are they to be explained? First, the myth that Indo-European is unrelated to any other family should be recognized for what it is–a feeble, but so far successful, attempt to preserve the inviolate independence of Indo-European. Second, the origin of the world’s present linguistic diversity must be of recent date. Though the 6,000-year limit adopted by Indo-Europeanists is clearly not reasonable, it is equally implausible that the origin of current linguistic diversity goes back to Eve 200,000 years ago, much less to the speech of Lucy almost four million years ago. The striking global similarities enumerated in the previous paragraph must be of much more recent date. But how recent?

The answer to this question must come from fields other than linguistics for the simple reason that linguistics is notoriously poor at providing absolute dates. But this is where archaeology, and to a lesser extent, human genetics excel. They do have means of providing absolute dates for various cultures and populations. The general picture they present is that people like us–anatomically-modern humans–appeared around 100,000 years ago in Africa. This is the species Homo sapiens sapiens to which all of us belong. The most significant fact about these people, however, is that while they looked like us, they did not behave like us. In fact, they lived more like Neanderthals than like us.

Around 50,000 years ago (give or take 10,000 years) a fundamental change occurred that is clearly reflected in the archaeological record. Modern behavior appears for the first time. But what exactly is modern behavior? It is nothing less than a fundamental and profound change in the manner in which humans lived:

* Artifacts (blades and such) suddenly become much more refined and stylized. Where previously simple tools had persisted almost unchanged for several hundred thousand years over vast territories, all of a sudden each locality has its own style of tools, and these styles change quickly over time and space.

* Art appears for the first time.

* While burial of human remains predates fully modern behavior, burials become more ritualized with the appearance of behaviorally modern people.

* Expansions of these modern people, probably out of Africa, led to the replacement of non-modern people like the Neanderthals, as well as anatomically-modern people who were not behaviorally modern. It is tempting to speculate that the development of fully modern language at this time made modern human behavior possible.

If we identify the origin of current linguistic diversity with the appearance of behaviorally modern people–a not unreasonable inference given the linguistic and archaeological evidence–then the linguistic similarities connecting all the world’s languages may have had a common origin as recently as 50,000 years ago. If this scenario is correct–and linguistic, genetic, and archaeological evidence seem to support it–then all modern humans (and all contemporary human cultures and languages) share a recent common origin.

(1) Eve is the hypothetical female ancestor of all modern humans (based on analysis of mitochondrial DNA) who lived about 200,000 years ago.

Merritt Ruhlen is a lecturer in Human Biology at Stanford University. His recent publications include A Guide to the World’s Languages (Stanford University Press, 1991), The Origin of Language (John Wiley, 1994), and On the Origin of Languages (Stanford University Press, 1994).

