The materials presented on this site represent the progress to date of a project to develop a succinct description of various languages as a means of learning each one by speaking it. These materials at present take various forms, ranging from ordinary .html files to language-learning software packages; some are free, while others are chargeable. I intend to develop them over time so that each language is described in a standard format. Eventually, all the materials will be chargeable.
These materials illustrate what might be called the engineering approach to language learning. The engineering approach takes a language to pieces and shows the learner how to build it for himself. This surely is what language learning is all about. Different languages are not just different words for the same things: they construct their utterances in radically different ways. The French je n'ai rien vu (literally I not have nothing seen), for example, is structured very differently from the English I didn't see anything - it's made up of quite different elements, with different functions, strung together in a different sequence. Learning a new language means learning to fit new elements together in new patterns.
Languages fit elements together in patterns because language is a combinatory system. A language doesn't consist just of the things that people currently say in it, or even the things that anybody has said in it hitherto: it consists of all the things that can ever be said in it, a more or less limitless set of utterances. Creating such a set of utterances requires a system that can combine elements to make new meanings, rather than just heaping word upon word; in that way a slender set of resources - a couple of dozen speech-sounds, a short grammar - can not only create anything that anybody wants to say, but also enable a speaker to understand without effort an utterance never heard before, such as Auntie Lizzie was trying to toast a cabbage, which is probably a new combination to most readers of this page. The slender resources can even enable us to understand part of an utterance when the words don't mean anything. For example, in 'Twas brillig, and the slithy toves / Did gyre and gimble in the wabe..., we know that things called 'toves' pursued the activities of gyring and gimbling. (And they did it in the wabe!)
The engineering approach to language learning unpicks each of the three components of a language - speech-sounds, grammar, and vocabulary - into its basic elements, then shows how to recombine those elements to create any utterance. This is not as difficult as it might seem. Most languages have an inventory of twenty to forty speech-sounds, and a core grammar which can be covered in about thirty pages. Vocabulary takes longer to describe and learn, but even here there is good news. Some vocabulary - notably prepositions and conjunctions - belongs to what linguists call a 'closed class', which means that there are not many words and that new ones are not being created, so learning those is not difficult; and with the 'open classes' of vocabulary - principally verbs and nouns - learners can get a grip of the language with a few hundred words, and then develop the vocabulary for their particular field.
The first of the three components of a language is its speech-sounds, and in most of the materials on this website I deal with the language on a phonetic basis. That is, I describe the behaviour of the groups of sounds that form the spoken language, rather than the groups of letters that constitute the written language. This of course means representing the language by means of a phonetic alphabet - one that shows its specific sounds - instead of by its traditional orthography. This method is not new: it follows a tradition developed in the 1930s by the Phonetics Department of University College, London (though not much pursued latterly). It reflects my conviction that speech is more natural, more deep-seated, than writing: children learn automatically to speak, but not to write, and whereas it is common to find societies and individuals that can speak but not write, there are none that can write but not speak.
But there are also more practical reasons why a coursebook should use phonetic symbols. One is that orthography - the ordinary written language - is a second best. Learners want to speak the language, but in most languages orthography is a complicated and unreliable guide to the pronunciation. The result is that learners never know whether the pronunciation they have worked out from the traditional spelling is correct or not. With a phonetic transcription, however, the language represented on the page is exactly the language that is spoken, and though it is sometimes argued that learning a phonetic transcription is an additional burden, its reliability makes learning it worthwhile. A second reason for using phonetic symbols is that orthography places an additional stage between understanding the structure and speaking the utterance: the learner must select the appropriate elements, modify them and pat them into order in their orthographical form, then convert that to a stream of speech-sounds. It's more economical to start with the elements in their speech-sound form, and work with those to formulate the utterance; and the more closely a textbook can match that process, the better textbook it will be. In short, the traditional orthographic textbook may be a good way of teaching the written language, but it is not a good way of teaching the spoken language - as witness the many learners who have used such textbooks and find themselves with a reasonably good reading knowledge, but no ability to speak or to understand what is said. Of course a textbook has to cover the writing system at some stage, but this is best done (in my view) when the spoken language has been fully described. It's easier to learn to write when you know the inventory of speech-sounds to be represented, and are working with familiar words.
A phonetic description of a language means in practice a traditional phonological account - one which identifies those speech-sounds in the language that make a difference between one word and another (technically, its phonemes). Such an account distinguishes, for example, between beaucoup /boku/ 'a lot' and beau cul /boky/ 'nice bum' in French (the first with what phoneticians call a close rounded back vowel and the second with a close rounded front vowel). This distinction is lost if you simply use the nearest English equivalent, as a cousin of mine found to her cost. So these materials tell you in detail what the phonemes are, how to produce each one, and how to modify these sounds in connected speech so that you sound more like a native speaker; each sound is described in both non-technical and technical language, with a glossary of technical terms, and for some languages a recording is provided. As far as the actual phonetic notation is concerned, the only sensible choice is the International Phonetic Alphabet (IPA), which has been in existence for 120 years and is widely used in language-learning texts (though less widely in Britain and America than in continental Europe). It is true that other phonetic systems exist, especially for languages that are not written with the western alphabet, but the great merit of the IPA is that it uses the same set of symbols for all languages. Learning it is therefore an intellectual investment: given that you have to learn a phonetic system anyway, you may as well learn one that can be partly reused if you learn another language.
With the foreign text all presented in a phonetic transcription, one of the most expensive and unwieldy features of a typical coursebook - namely the audio recordings - can be dispensed with, because the learner can take the pronunciation from the printed symbols. This has two further advantages. One is that you can find out the pronunciation of a word simply by finding it in the text - a much more practical proposition than listening through endless recordings. The second is that the pronunciation is more reliable, because the texts show the essential sounds, without the variability and lack of clarity that particular speakers inevitably introduce. Indeed it could be argued that the ready availability of recordings has set back the cause of language learning, because recordings appear to provide an ideal model for imitation, whereas in fact, because they don't identify what is essential in the speech-sounds and separate it from what is incidental, they tend to mislead.
Grammar generally - and especially in a traditional school context - is thought of as a set of rules that tell you when to use its and it's and not to say it were uz if you come from the north of England; its role in this context is to promote a national standard by commenting on usages that deviate from it. This is not the kind of grammar used in language learning. The kind of grammar used in language learning is a more deeply-grounded component of the language, the one which provides the blueprint for constructing any well-formed utterance. Grammar, in other words, is the entire combinatory system or procedure that joins words together to make phrases and sentences. It has been called a device for generating all and only the correct utterances in a language.
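The idea of a device that generates all and only the correct utterances can be illustrated with a toy example. The following is a minimal sketch in Python, not taken from the materials themselves: the word lists, the slot layout and the names NOUN_PHRASE, VERB_PHRASE, expand and sentences are all assumptions made for illustration, and a real grammar is of course far richer.

```python
from itertools import product

# Hypothetical toy rules: a phrase is a sequence of slots,
# and each slot lists the words that may fill it.
NOUN_PHRASE = [["the", "my"], ["old", "new"], ["house", "cabbage"]]
VERB_PHRASE = [["fell", "left"]]

def expand(slots):
    """Yield every word sequence made by picking one option per slot."""
    for choice in product(*slots):
        yield " ".join(choice)

def sentences():
    """Yield every sentence of the form: noun phrase + verb phrase."""
    for np, vp in product(expand(NOUN_PHRASE), expand(VERB_PHRASE)):
        yield f"{np} {vp}"

all_sentences = list(sentences())
print(len(all_sentences))                       # 2 * 2 * 2 * 2 = 16
print(all_sentences[0])                         # the old house fell
print("house the old fell" in all_sentences)    # False: not well-formed
```

Eight words and two patterns already yield sixteen well-formed sentences and exclude every other ordering, which is the combinatory point on a very small scale.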
The materials presented here take this latter view of grammar. They follow a scheme based on phrase structure, one that seems likely to prove suitable for almost any language, and describe how the language forms noun phrases and noun clauses (your house, my old house, what I used to live in...), verb phrases (they left, did they leave?, didn't they ever leave?...), and qualifying phrases and clauses (after Henry, when Mary was born...). Since everything in a language - whether word, phrase or clause - is either a noun, a verb or a qualifier, this scheme must in principle cover all possible utterances. It does however oblige the materials writer to ensure that all significant patterns of noun phrase, verb phrase and so on are represented. It also requires him to deal with all the morphology, which is much more extensive in some languages than others. But such a grammar does not need to deal with - and doesn't deal with - such elusive features as for example the difference between English we intend to go and we're committed to going (one with infinitive go and one with gerund going). For features of this sort advanced students will eventually need a full reference grammar, or they can simply take on trust what they deduce from reading.
The main difference between the normative schoolroom grammar and a language-learning grammar is that the language-learning grammar is descriptive: it simply describes the language as it is found. This descriptive stance may require a sceptical look at the traditional analysis and terminology of the language in question. A well-known Serbo-Croat handbook, for example, lists separate 'dative' and 'locative' forms of all types of noun and pronoun; but they are always identical, and the only distinction seems to be that the case is called 'dative' if the word refers to a living thing and 'locative' if the word refers to a non-living thing. So the analysis is more elaborate than the facts, and merging these two cases would make the description simpler and clearer. Similarly in French there's a case for dropping the terms 'masculine' and 'feminine' in favour of 'Class 1' and 'Class 2': grammatical gender correlates with sex only in living things (and not in all of those), and the allocation of nouns denoting non-living things to gender is arbitrary, with the result that the terminology is misleading.
A grammar on a phonetic basis also throws up features that are often overlooked. Sometimes the description of the way sounds behave is more complicated than the way the orthography behaves: the English past-tense ending ed, for example, must be specified as /t/, /d/, or /id/, depending on what sound precedes it (passed, posed, parted). Spanish verbs become simpler, since much of their complication arises from orthographic changes designed to represent the unchanging pronunciation of the stem; French verbs become harder, because the question (for example) of whether the stem-vowel in nous aimons is pronounced like e-grave or e-acute can no longer be sidestepped. Similarly the Portuguese verb dever, which is irregular in that it changes its stem-vowel for different persons, does not normally appear in lists of irregular verbs, because orthographically speaking it's regular. Overall, phonetic treatment of the grammar brings lots of hidden features out into the open, and results in a more accurate and useful account.
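The rule for the English past-tense ending mentioned above can be written out as a short procedure. This is a minimal sketch in Python under stated assumptions: the function name past_tense_ending and the set VOICELESS are hypothetical, the verb is represented only by the final sound of its stem, and the list of voiceless consonants is the usual textbook one rather than anything specified in the materials.

```python
# Assumed inventory: voiceless consonants that can end an English stem
# (/t/ is left out because it is handled by the first test below).
VOICELESS = {"p", "k", "f", "θ", "s", "ʃ", "tʃ"}

def past_tense_ending(final_sound: str) -> str:
    """Return the spoken form of the past-tense ending for a stem
    whose last speech-sound is final_sound."""
    if final_sound in {"t", "d"}:
        return "id"   # parted, needed
    if final_sound in VOICELESS:
        return "t"    # passed, looked
    return "d"        # posed, played (voiced consonants and vowels)

# The three examples from the text, keyed by the stem's final sound.
for verb, final in {"pass": "s", "pose": "z", "part": "t"}.items():
    print(verb, "-> /" + past_tense_ending(final) + "/")
```

The spelling gives the learner one ending, ed, while the spoken rule needs three branches; the sketch makes that extra complexity explicit.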
Grammatical coursebooks normally contain sets of exercises, one on each form or structure. These are provided, in the materials presented here, in The Language Engine, a software package that provides the learner with virtually unlimited practice in morphology (changes to the forms of individual words) and syntax (fitting words together to make phrases). The exercises in The Language Engine have a number of advantages over conventional exercises in books. Firstly, they practise the whole vocabulary equally, and cover all the forms and syntax-patterns in the language, whereas conventional textbook exercises don't necessarily reckon to do this. Secondly, they are context-free, so the point of the exercises - which is to develop fluency in all the resources of the language - is not obscured by situational considerations. In French, for example, the exercise will specify whether tu or vous is to be used for you, so focussing attention on how to form that part of the verb, and removing the separate and social decision as to which part of the verb it should be. And thirdly, the exercises can be repeated indefinitely.
A further difference from conventional grammatical coursebooks is that the materials presented here are not progressive: learners are not forced to follow a set sequence through the material, but can dip and skip as they wish, maintaining the motivation that comes from curiosity. The result of the rigour and economy described above is that a workable grammar can usually be kept down to about 30 printed pages, and this means that items can be readily found, which removes the fear of forgetting something and the frustration of not being able to find it again.
The third component of a language is its vocabulary. Vocabulary is the least systematic part of a language, and the least productive when learnt. Learning the speech-sound system enables the learner to pronounce any word or phrase, and learning the grammar enables the learner to construct any utterance; but learning vocabulary just gives the learner more vocabulary. This lack of productivity is one reason that vocabulary takes so long to learn, because lack of productivity means lack of motivation. The other reason is that there is so much of it!
One way of mitigating this difficulty is to concentrate on what linguists call 'function words'. These are words such as articles, prepositions and conjunctions, whose role in language is to link things together. Then if you hear the Swedish phrase ett hus (for example), and you know that ett is the indefinite article a, you can be reasonably sure that hus is a noun. Function words are few in number and don't change over time, so once learned they stay learned. They are also highly useful. In 'Twas brillig... it's the function words and, the, etc. that enable us to make sense of the lines.
A second way of reducing the burden of learning vocabulary is to focus on the words that are actually needed. There's a core vocabulary - greetings, numbers, food and drink - that is likely to be useful in almost any situation, and the materials presented here extend that notion by adopting a limited vocabulary - about 600 words - of high everyday usefulness. Beyond that, it makes sense to focus on the words that are needed when using the language in real life, and this will depend on what the learner intends to use the language for. A practical way of developing this vocabulary is to look up, before any linguistic encounter, all the words that are likely to be needed, and then to check afterwards for any new words that turned up. It's often remarked that each learner's vocabulary is personal to them, and this necessity to choose a learnable number of words out of the tens of thousands in the language is the reason.
In the materials on this website, vocabulary is grouped by meaning. This makes it easier to remember - there's evidence that we store words by meaning in our memories. This grouping doesn't prevent the learner from looking up a word quickly - there isn't so much vocabulary that an alphabetic list is required. So the learner doesn't need to worry so much about memorization, and can take a more relaxed approach.
Two things remain to be said. One is that it is impossible to describe all varieties of a language. Languages vary from locality to locality, from social class to social class, and from generation to generation (which is why teenagers' speech-habits horrify their parents). So the materials writer has to decide which variety he is going to describe; and to keep the description economical and the learner's task manageable, this means in practice describing one variety only. So the materials writer has to arrive at a view (for example) about the pronunciation teng cups versus ten cups, the use of between you and me versus between you and I, and the choice of kids versus children. He has to do this not because one alternative is inherently more correct than the other - the point of a descriptive account is to describe, not to pass judgement - but because one alternative is a more accurate record of what people actually say, and so makes a better learning tool. In the materials shown here, I aim to present a version of each language that has no unusual features and will pass without comment as ordinary, everyday speech.
The other point worth mentioning is that these materials are obviously incomplete. Firstly, no one language is as yet fully described in these materials, even within the limitations on variety given above. And secondly, out of the 4,500 languages spoken in the world today, only half-a-dozen are shown here. The current selection has a European bias, and Mandarin, Japanese, Cantonese, Thai, Korean and more Arabic ought surely to feature. The languages that appear are those that interested me, or that someone asked me to teach, or whose textbooks I found irritating. No doubt more languages will fall into those categories before long!