The Chemistry of Protolanguage

Yuri Tarnopolsky




Keywords: protolanguage, language origin, language evolution, speech generation, chemistry, chemical nomenclature,
linearization, Pattern Theory, Transition State Theory, complexity, Ulf Grenander, George Zipf, Noam Chomsky,
Joseph Greenberg, Manfred Eigen, Ilya Prigogine, Walter Ross Ashby, René Thom, George Hammond, Robert Rosen,
axiom of closure.



Protolanguage (Derek Bickerton) in linguistics corresponds to an evolutionary stage preceding grammaticalized language
as we know it. It may be possible to reconstruct the principles of protolanguage by turning to the most general principles of evolution
in a larger picture, of which chemistry is a relevant part. Both language and chemistry are discrete combinatorial systems.
Considering the chemical origin of life, chemical analogies might offer some insight into the origin of mind, language, and society,
all of which developed on the platform of life. The conceptual basis for discrete combinatorial systems, including chemistry and
language, can be found in Pattern Theory (Ulf Grenander), where ideas, utterances, and molecules are configurations. To draw
the parallel further, chemistry uses its own language of chemical nomenclature to represent non-linear molecular structures as linear
strings of symbols. Chemistry pays particular attention to the intimate mechanisms of structural transformations. A tentative concept
of the mechanism of protolanguage generation is suggested: kinetically controlled linearization of a typically non-linear observable
configuration through a non-observable thought. Generation of linear expressions in protolanguage is viewed as a process of
generalized chemistry, going from a typically non-linear initial state through a transition state toward the linear output, under
the constraint of maximal preservation of configuration topology.



     1.  Introduction
     2.  Preview of main ideas
     3.  Chemistry and linguistics: sister sciences
     4.  Noam Chomsky and Joseph Greenberg
     5.  Chinese and Chemicalese
     6.  René Thom and images of change
     7.  Configurations, patterns, and Nean
     8.  Some risky ideas about mathematics and life
     9.  Chemolinguistry: a chimera
    10.  Tikki Tikki Tembo: language as a form of life
    11.  Zipfing the chimera
    12.  A chemist and a chimp speak Nean
    13.  Scenes from the cave life told in Nean
    14.  Concluding remarks
    15.  APPENDIX
         15.1  Example of Chemicalese
         15.2  Examples of real-life large configurations
         15.3  The chemical view of the world
         15.4  Program nean
    References


The words and verses differ, each from each,
Compounded out of different elements...

        Lucretius (De rerum natura, II)

The order and connection of ideas is the same
as the order and connection of things...

                Spinoza (Ethica, II, VII)


Mark C. Baker in his book The Atoms of Language (Baker, 2001) drew a consistent analogy of linguistics with chemistry.
Within the Principles and Parameters framework, a language is similar to a chemical element in the sense that it is a combination
of certain parameters. Baker acknowledged that most people would associate words with atoms of language, but he simply put
this view aside as “correct—in one sense.” (Baker, 2001, p. 51). He was, of course, right, pointing to the Periodic System as
a metaphor for the combinatorial nature of language. Moreover, the book manifested the true chemical spirit: it was built around
numerous observable linguistic examples, as any typical chemical monograph is built around hundreds of structures and their
        It would be unbecoming for a linguist to say “It’s Greek to me” about chemistry, anyway, but there is a genuine feeling of kinship
between the two fields. Linguists evoke chemistry as the science epitomizing not only complexity but also its successful conquest.
The kinship was prophetically noticed long ago, even before the birth of chemistry, but after the birth of the famous Greek Democritus:
        The words and verses differ, each from each,
        Compounded out of different elements—
        Not since few only, as common letters, run
        Through all the words, or no two words are made,
        One and the other, from all like elements,
        But since they all, as general rule, are not
        The same as all. Thus, too, in other things,
        Whilst many germs common to many things
        There are, yet they, combined among themselves,
        Can form new wholes to others quite unlike.
        Thus fairly one may say that humankind,
        The grains, the gladsome trees, are all made up
        Of different atoms (Lucretius, 1958, Book II).
        Chemistry, on its part, has been using extensive linguistic parallels for nucleic acids and proteins since the discovery of their relation.
Moreover, much earlier, chemistry developed its own tongue with a lexicon heavily borrowed from Greek and a refined grammar
with codified flexions and word order.
        Mark Baker’s book was the last drop in the bucket of observations that I had accumulated over a significant time. This paper is
a chemist’s attempt to view words as atoms of a chemistry.
        I am a chemist without any linguistic credentials whatsoever, but with a lifelong interest in languages. I am, to varying degrees, familiar
with properties of such languages as Russian (native), English (current), German (studied at school), French, Hungarian, Japanese,
Hebrew, and a few others. My very limited hands-on experience with non-Indo-European languages, as well as a better (but by
no means perfect) knowledge of English and Russian, both Indo-European yet diametrically opposite, persuaded me that, for all
the striking differences in their design, all languages perform the same function with the same means. Neither the opulence of the Bantu
languages, with their classifiers and suffixes, nor the intricately woven ribbons of the Na-Dene verbs could shake my conviction.
The function is a representation of a non-linear “source,” whatever it is, and the means is an optimal linearization of the non-linear
                "The vocal-auditory channel has some desirable features as a medium of communication: it has a high bandwidth, its intensity
                can be modulated to conceal the speaker or to cover large distances, and it does not require light, proximity, a face-to-face
                orientation, or tying up the hands. However it is essentially a serial interface, lacking the full two-dimensionality needed to
                convey graph or tree structures and typographical devices such as fonts, subscripts, and brackets. The basic tools of a coding
                scheme employing it are an inventory of distinguishable symbols and their concatenation." (Pinker and Bloom, 1990).
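        The idea of squeezing a branched structure through a serial channel can be made concrete with a toy sketch. The Python fragment
below is not taken from this paper: the scene, the bond list, and the function names are all invented for illustration, and exhaustive
search merely stands in for whatever mechanism really does the job. It takes a small branched configuration, tries every linear order
of its elements, and keeps an order in which the largest number of bonds join immediate neighbours.

```python
from itertools import permutations

# Hypothetical toy configuration: nodes are "atoms" (words), pairs are bonds.
# The scene is invented for illustration only.
BONDS = [
    ("hunter", "spear"),
    ("hunter", "throw"),
    ("throw", "deer"),
    ("deer", "big"),
    ("deer", "run"),
]

def nodes_of(bonds):
    """Collect the distinct nodes in order of first appearance."""
    seen = []
    for a, b in bonds:
        for n in (a, b):
            if n not in seen:
                seen.append(n)
    return seen

def preserved(order, bonds):
    """Count bonds whose two ends are immediate neighbours in the linear order."""
    pos = {n: i for i, n in enumerate(order)}
    return sum(1 for a, b in bonds if abs(pos[a] - pos[b]) == 1)

def best_linearization(bonds):
    """Brute-force search (an assumption of this sketch, feasible only for
    tiny configurations) for an order that keeps the most bonds adjacent."""
    return max(permutations(nodes_of(bonds)),
               key=lambda order: preserved(order, bonds))

order = best_linearization(BONDS)
print(" ".join(order))
print(preserved(order, BONDS), "of", len(BONDS), "bonds kept adjacent")
```

        Because "deer" carries three bonds and a string offers each element at most two neighbours, at least one bond must be stretched:
four out of five is the best any order can achieve for this configuration.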
        For over twenty years I have been watching the development of Pattern Theory (Grenander, 1976-2003), sometimes from a close
distance, regarding it as a general approach to complex systems consisting of atom-like elements and connecting bonds. It became
clear to me that this mathematical theory of everything nicely covered not only molecules and languages but also every discrete
combinatorial system we could come in touch with, and did so with an unprecedented combination of generality and realism.
        Furthermore, I have witnessed the entire genesis and evolution of the science of complexity, starting from Prigogine (1984), who
formulated the most fundamental principles of complex natural systems such as life, mind, and society, and further toward Artificial Life
(Adami, 1998) where languages and molecules were of the same kin already at the inception (Eigen, 1971-1979). “Natural” here is
the opposite of “artificial,” such as virtual reality, where people can walk on the ceiling and turn into wolves right before your eyes.
        I am also familiar with the language of musical notation, a couple of programming languages, and, due to my profession, with the curious
language of chemical nomenclature invented by organic chemistry to verbally communicate the non-linear molecular structure.
        Finally, the language of poetry—rarely spoken in everyday life—is my bonus pass to a gym where one can exercise linking distant
meanings and close sounds.
        The enormous literature comprising computational, formal, traditional, and historical linguistics, Artificial Intelligence, Artificial Life,
mathematical structures, physics of open systems, chemistry, and details of Pattern Theory is probed here only highly selectively and
superficially. The growing but still manageable bibliography on language evolution and computation has been nicely collected and
presented at the website of University of Illinois at Urbana-Champaign (Language Origin, WWW).
        My intent is not to formulate a theory—this should be entrusted to professionals—but to offer a new (but organically grown!) spice
for the boiling cauldron of linguistic ideas. Whoever likes the aroma can use it for meditation, inspiration, and, who knows, for some
fun time after a Ph.D. thesis. I wish to share the widest and most comprehensive view (an illusive but honorable goal) of the intellectual
jungle where linguistics and chemistry are of the same blood. In short, if it all boils, then down and up to Pattern Theory.
        I believe that my outsider status, as well as the claim for a larger picture, grants me the privilege of  choosing my own far-from-academic
style—which is just being natural. I cannot walk on the ceiling.
        I further refer to the following key figures in painting a large picture with language in the landscape: Lucretius, Ulf Grenander,
George Zipf, Manfred Eigen, Ilya Prigogine, Walter Ross Ashby, René Thom, George Hammond, all of them, except Lucretius and Zipf,
natural scientists and mathematicians. Among modern linguists, David Lightfoot is, I believe, the maître of the eagle’s-eye view.
I mention others in the main text.
        I will make wide use of WWW sources. They may die out with time, but a peculiar life-like property of the Web is that new ones
will be cooked, can be searched for, and will be found, garnished with ads. For better or worse, money will never be out of the larger picture.

Full text, pdf file


©    Yuri Tarnopolsky, 2004