Friday, 30 October 2009

Wholly holy

Have a look at the second of these “word picture” puzzles in yesterday’s London Lite. It consists of the words RELIGIOUS TOME perforated with a number of holes. The answer is “The holy bible”.
It depends, then, on the homophony of hole-y (full of holes) and holy (sacred). But they are not homophones for the many speakers in England who use a special allophone [ɒʊ] for /əʊ/ before morpheme-final /l/. (These are the people for whom a goalie, where the /l/ is morpheme-final, doesn’t really rhyme with slowly, where it is morpheme-initial.) I think everyone probably pronounces hole-y identically with wholly. In both the /l/ is treated as morpheme-final. But in holy it isn’t.
I discussed this in my blog of 31 July 2006, and will repeat here what I wrote then.
I recounted how, when I was a small boy and couldn’t sleep one night, my father told me the Bible story of Moses and the burning bush (Exodus 3).
In the words of the Authorized Version, God spake unto Moses from out of the midst of the bush and said, “Draw not nigh hither: put off thy shoes from off thy feet, for the place whereon thou standest is holy ground”.
But I heard this as hole-y ground, ground with holes in it. (If Moses kept his shoes on, I thought, perhaps he would get them caught in the holes.)
This implies that my late father pronounced [ˈhəʊli] holy ‘sacred’ and [ˈhəʊli] hole-y ‘containing holes’ identically — like me, and unlike the speakers mentioned above.

Thursday, 29 October 2009


I don’t know when or why one lost its weak form /ən/ in standard accents. It remains, with the spelling ’un, as a dialectal or jocular form. When I was a boy there was an evening sports paper called The Pink ’Un.

I imagine it was used only for dummy one after an adjective, as in a green ’un, a new ’un, a big ’un. That seems to be the position in those forms of English that retain it. I don’t think anyone would use it in contexts such as *I’ve got ’un, even though the dummy pronoun one in I’ve got one is normally unaccented.

Today’s Guardian has a picture of a small lemur with the headline Hallo wee’un, one of the worst puns for months. Since it doesn’t seem to be on their website, I have scanned it for your delectation.

Wednesday, 28 October 2009

corn beef and fry rice

Sili (24 October) mentioned the rivalry between “boxed set” and “box set” or “boxset”. A favourite example of this phenomenon that I used to use in my teaching days is “corned beef” (which is what it says on the tin) and “corn beef” (which corresponds to what we mostly say).
The principle this illustrates is that final /d/ in a consonant cluster is susceptible to elision when the next word begins with a consonant sound. In the case of a lexicalized phrase such as corned beef, people learn the pronunciation in its reduced form and may be unaware of the full form underlying it. They then spell it in accordance with the reduced pronunciation, which is for them the only pronunciation.
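The principle can be sketched as a toy check in Python. This is only an illustration working on broad phonemic transcriptions: the vowel inventory, the function name `is_elision_candidate`, and the decision to treat final /t/ like final /d/ are assumptions made for the example, and the check says nothing about how likely elision actually is in a given phrase.

```python
# Toy model: a word-final /t/ or /d/ is a candidate for elision when it
# closes a consonant cluster and the next word begins with a consonant.
# The symbol set below is a simplified assumption, not a full inventory.
VOWEL_SYMBOLS = set("aeiouæɑɒɔəɛɜɪʊʌː")


def is_consonant(symbol: str) -> bool:
    """Treat every symbol outside the vowel set as a consonant."""
    return symbol not in VOWEL_SYMBOLS


def is_elision_candidate(word: str, next_word: str) -> bool:
    """Return True if the final /t/ or /d/ of `word` could be elided."""
    if len(word) < 2 or not next_word:
        return False
    ends_in_alveolar_plosive = word[-1] in "td"
    in_cluster = is_consonant(word[-2])
    before_consonant = is_consonant(next_word[0])
    return ends_in_alveolar_plosive and in_cluster and before_consonant


print(is_elision_candidate("kɔːnd", "biːf"))  # corned beef
print(is_elision_candidate("fraɪd", "raɪs"))  # fried rice: /d/ follows a vowel
```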
In Google, corn beef gets 180,000 hits, as compared with 1,450,000 for corned beef, a ratio of 1:8.
The books don’t tell you this, but I think this elision is less usual before /r/. I don’t think I can omit the /d/ from boiled rice.
I certainly can’t omit it in fried rice, where the final /d/ is not in a cluster and therefore not a candidate for elision. In Google fry rice gets 37,000 hits as compared with the 2,190,000 for fried rice, a ratio of 1:59.
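For anyone who wants to check the arithmetic, the two ratios can be recomputed from the hit counts quoted above. The helper name `hits_ratio` is just a label chosen for this sketch.

```python
def hits_ratio(reduced_hits: int, full_hits: int) -> str:
    """Express reduced-spelling hits against full-spelling hits as 1:N."""
    return f"1:{round(full_hits / reduced_hits)}"


print(hits_ratio(180_000, 1_450_000))  # corn beef vs corned beef -> 1:8
print(hits_ratio(37_000, 2_190_000))   # fry rice vs fried rice -> 1:59
```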
In Chinese English, however, fry rice seems to be quite frequent. Here’s a picture of some “fry mushroom”.
In cases such as stir-fry rice noodles I think we have to analyse stir-fry as a nominalization of the verb (“noodles for stir-frying”).

Tuesday, 27 October 2009

How do you spell that address again?

Until now, a url has had to consist only of ASCII characters (and not even all of them are allowed). According to reports in the press, this is about to change. As from the middle of November “domain names written in Asian, Arabic or other scripts” will be allowed.
This is only fair. Everyone ought to be allowed a domain name written in their own usual writing system.
I haven’t seen the details yet, but this presumably means that we will be able to start using domain names and urls written in IPA symbols, too.
I look forward to encountering email addresses like dʒɒn_smɪθ@lʌfbrə. (More seriously, email addresses like корженков@москва.ru and νικολάϊδης@αθήνας.gr will presumably become available, as well as the equivalents in Chinese, Japanese and Korean.)
That means that a whole new group of users will need to be taught how to enter Unicode characters into their email programs and web browsers. Not only will the Japanese need to be able to enter Latin letters with their keyboards, as now, but the English will need to be able to enter kana characters. And Chinese. And IPA.
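Behind the scenes, a non-ASCII domain name is typically converted to an ASCII-compatible form beginning with xn-- before it is looked up. Here is a minimal Python sketch of that conversion, using the standard library's built-in idna codec (which follows the older IDNA 2003 rules). The function names are hypothetical, and the sketch covers only the domain part: the mailbox name before the @ sign is a separate matter and is not handled here.

```python
def to_ascii_domain(domain: str) -> str:
    """Return the ASCII-compatible form of an internationalized domain."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Convert an ASCII-compatible domain back to its Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


encoded = to_ascii_domain("москва.ru")
print(encoded)                     # an ASCII string starting with "xn--"
print(to_unicode_domain(encoded))  # back to the Cyrillic form
```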

Monday, 26 October 2009


As we know, English [ʃ] can be spelt not only sh but also in a number of other ways, as seen in the examples ocean, machine, precious, sugar, conscience, compulsion, pressure, mission, creation. However, sh is clearly felt as the basic way to spell this sound in English. Why? Why did we choose this particular digraph?
Historically speaking, the basic problem is that classical Latin had no palatoalveolars. In consequence, languages which use the Latin alphabet and which do have these sounds have not inherited any single way of representing them.
Greek had and has no palatoalveolars, either. So the Greek alphabet, too, lacks a letter for the sound [ʃ].
In Cyrillic, on the other hand, there is a letter used for just this purpose: Шш, presumably modelled on the Hebrew letter shin ש. This is also the origin of the Arabic ش.
The Armenian and Georgian alphabets also have special [ʃ] letters, upper- and lower-case: Շշ and Ⴘშ respectively.
Getting back to the Roman alphabet, I do not know why the predominant English way of writing [ʃ] is the digraph sh. French expresses this sound as ch. Words that in standard French now have [ʃ], and are so spelt, are (or were) pronounced in Norman French with [tʃ], and that is supposed to be the reason we use ch in English for the affricate.
For our fricative German writes sch and Polish sz. Does anyone know the historical reasons for these choices?
Hungarian writes it with the simple s, reserving the spelling sz for the sound [s]. Czech, Slovak, Lithuanian, Latvian, Slovene and Croatian all use the háček-bearing š. Romanian and Turkish use a subscript cedilla or comma, ş.
We’d better not go into Swedish too deeply: rs, sj, kj.

Friday, 23 October 2009


An SMS, a short message that you send or receive using your mobile phone (AmE cellphone), is generally known as a text. This word is also used as a verb and verbal noun: texting. (I see that David Crystal is giving a talk today at Berkeley entitled “From texting to tweeting: the brave new world of internet linguistics”.)

But what is the past tense of this verb?
Leo Holroyd asked
What is your opinion on the past form of the verb "text"? Many people use "text" rather than "texted" in the past tense (in both speech and writing), which is clearly irregular, and to me surprising. I can only guess that it has something to do with the unusual ending "-ext". Is it common for verbs to become irregular for this sort of reason?
He’s right. Many people, in Britain at least, use [tekst] as the past tense. I suppose we could spell it texed.
You can see how this has arisen. The final cluster [kst] is highly susceptible to losing its final consonant, particularly when followed by a consonant sound.
ðə neks(t) θɪŋ
ə bɒks(t) set
ə mɪks(t) ɡrɪl
— in all of these it’s usual for the final [t] to be elided (lost) except in very careful (over-enunciated) speech. Likewise we say
teks mesɪdʒɪz
aɪl sen ju ə teks wen aɪm redi
ðə wər ə həʊl lɒt əv tekss weɪtɪŋ fə mi
— so that [teks] can come to seem to be the basic form.
Then, just as the plural of box [bɒks] is boxes [bɒksɪz], so [teks] seems to need the plural [teksɪz]. And just as the past tense of box is boxed [bɒkst], so the past tense of tex(t) comes to be [tekst].
Indeed, [ə teks(t) mesɪdʒ] could then be interpreted as a texed message, one that you can tex to someone.
It may seem shocking to us highly literate people. But many users of text messaging are not highly literate (though I agree with David Crystal that text messaging, by encouraging people to read and write more frequently, helps literacy rather than hindering it).
I should think that the contex(t)-free pronunciation [teks] for the verb will persist.
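The reanalysis described in this post can be sketched in a few lines of Python: once the stem is taken to be [teks], the ordinary regular plural and past-tense rules produce [teksɪz] and [tekst]. The symbol sets and function names below are simplifying assumptions made for the illustration, not a full account of English morphophonology.

```python
# Regular English plural and past-tense allomorphy, applied to
# broad phonemic transcriptions. Simplified symbol sets (assumption).
SIBILANTS = set("szʃʒ")        # also covers tʃ and dʒ via their last symbol
VOICELESS = set("ptkfθsʃ")


def plural(stem: str) -> str:
    """Add the regular plural ending: ɪz after sibilants, else s or z."""
    last = stem[-1]
    if last in SIBILANTS:
        return stem + "ɪz"
    if last in VOICELESS:
        return stem + "s"
    return stem + "z"


def past(stem: str) -> str:
    """Add the regular past ending: ɪd after t/d, else t or d."""
    last = stem[-1]
    if last in "td":
        return stem + "ɪd"
    if last in VOICELESS:
        return stem + "t"
    return stem + "d"


for stem in ("bɒks", "teks", "tekst"):
    print(stem, plural(stem), past(stem))
```

With the stem [tekst] the same rules give the standard texted, [tekstɪd]; with the reanalysed stem [teks] they give a past tense that sounds exactly like the present-tense form of the standard verb.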

Thursday, 22 October 2009

Che cosa? ¿Qué?

In the male-voice choir I belong to we warm up thoroughly at the start of every rehearsal. This involves body mobilization like the exercises that actors do, humming, and singing scales to syllables such as ba, mɛ.
After that we sing scales to words. One exercise we often do involves going up and down doh-mi-so-dohʹ…doh…, usually to the words I am a zebra (with /e/, of course, not /iː/, because we’re mostly British) or I am a walrus (where some sing /ɔː/ and some /ɒ/, as you might expect).
But there’s another set of words we sing to this that I don’t know how to spell. Phonetically they go ˈbe la se ˈnjɔ ra. Presumably they mean “beautiful lady”: but in what language? Probably most people in the choir think they’re Spanish, though the sprinkling of native speakers of Spanish that we have could quickly disabuse them: señora is fine, but not *bela. They’re not Italian, either: here bella is fine, but not *segniora.
No, this phrase is a hybrid. The first word is Italian (sort of), the second one Spanish (sort of — we can’t manage a proper ɲ, of course). I wonder where it came from.
And before you ask, it’s not Esperanto either.

Wednesday, 21 October 2009

A Korean IPA?

This is one last posting about Korean.
A claimed advantage of hangul over the Roman and other alphabets is that it is feature-based. The symbols for the aspirated plosives, ᄑᄐᄏ [pʰ tʰ kʰ], are derived from those for the unaspirated/voiced ᄇᄃᄀ [p~b t~d k~g] by the addition of a single horizontal line (sort of). Those for the tense plosives, ᄈᄄᄁ [p* t* k*], involve doubling the basic jamo. Given that [e] is written ㅔ and [je] as ㅖ, it is logical that from ㅏ [a] we get ㅑ [ja]. And so on. Further, by exercising a modicum of imagination we can see in ᄀ [k~g] the outline of the tongue dorsum raised against the velum, and in ᄂ [n] the tongue tip raised to contact the alveolar ridge.

Hyun Bok Lee, the Professor Emeritus of Phonetics at SNU, wants to go beyond this. He is keen that the Korean writing system should also be used internationally in place of the IPA. You can read about his “IKPA” proposal here. It involves extending the hangul by adding various additional and modified letters so as to cover everything included in the IPA. And — an important point which he fails to discuss — it is to be written linearly, not arranged in syllable-sized character spaces as is done in Korean orthography.
It is exemplified, as applied to Korean, in the IPA Handbook, p. 123, and he has also produced a booklet about it. Unfortunately the web page supposedly devoted to it doesn’t seem to be available.
(As you can perhaps see, the English sentence transcribed actually ends “evening”, not “morning”. And there doesn’t seem to be any indication of vowel nasalization in the French [bõ-].)

Tuesday, 20 October 2009


At the Seoul conference (blog, yesterday) the Koreans were very proud to be able to draw to our attention the first use of the Korean writing system for a language other than Korean itself: namely for the language Cia-cia.
This is an Austronesian language spoken in and around the town of Bau-Bau on Buton island off the coast of Sulawesi in Indonesia.

My former PhD student Lee Ho-young (pictured), now professor of phonetics at Seoul National University, tells me that he was one of the main activists behind this achievement. He helped produce literacy materials including a textbook. Read more here.

With one exception, the Cia-Cia phonemes can be mapped onto a subset of those of Korean and are therefore written the same way. The exception is the fricative /v/, which is not found in contemporary Korean, but for which Lee resurrected the obsolete hangul jamo (or Korean letter) ᄫ (U+112B). (ᄫ was used as a symbol for the voiced bilabial fricative.)
The Cia-Cia implosives /ɓ/ and /ɗ/ are written with standard hangul jamo, as ㅍ and ㅌ. So the series /t, d, ɗ/ are written with the jamo that in Korean stand for /t*, t~d, tʰ/ respectively, namely ㄸ, ㄷ, ㅌ.

The Cia-Cia word for ‘television’ is borrowed from Indonesian televisi, and is now written 뗄레ᄫㅣ시. (Actually that’s not exactly how it’s written: the β jamo ᄫ and its following vowel ㅣ ought to be written within a single character space, but I’m not sure how to achieve that on my computer.)
Note the Korean-style treatment of /l/, written as double ᄙ and straddling two character spaces.
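The difficulty with the single character space has a likely technical background, which the following Python sketch illustrates. Unicode provides precomposed syllable characters only for combinations of modern jamo, so normalization to NFC merges a modern sequence into one code point but leaves a sequence beginning with the obsolete ᄫ (U+112B) as separate conjoining jamo. Whether such a sequence is then displayed as one block depends on the font and rendering software, which is outside what this sketch can show. The function name `compose` is hypothetical.

```python
import unicodedata


def compose(jamo_sequence: str) -> str:
    """Apply Unicode NFC normalization to a sequence of conjoining jamo."""
    return unicodedata.normalize("NFC", jamo_sequence)


# Modern jamo: initial ssang-tikeut U+1104, vowel e U+1166, final rieul U+11AF.
modern = compose("\u1104\u1166\u11af")
# Obsolete initial U+112B followed by the conjoining vowel i, U+1175.
obsolete = compose("\u112b\u1175")

print(len(modern), f"U+{ord(modern):04X}")  # one precomposed syllable
print(len(obsolete))                        # still two separate code points
```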

Monday, 19 October 2009

the Seoul olympics

The event that took me to Korea earlier this month was the ambitiously named First World Alphabets Olympics, held in Seoul.
The idea was that representatives of various different writing systems from around the world would each give a lecture-length presentation of their system. A panel of judges would then award gold, silver and bronze medals to the best systems. I thought it had the potential to be a very interesting event.
Accordingly, there were presentations on the Latin alphabet (as applied to Italian), on the Greek alphabet, on the Hebrew, Arabic and Armenian writing systems, on Devanagari and a number of related Asian scripts, on the Chinese and Japanese writing systems and of course on Korean hangeul.
The subtext was that the Korean writing system is the best in the world, a sentiment with which I can actually agree (subject to various provisos).
Unfortunately the moving force behind the Alphabet Olympics, Soon-Jick Bae, despite his enthusiasm and hard work, had no specialist knowledge of phonetics or linguistics and did not think to consult phoneticians until very late in the day. Had we been consulted earlier, we would have explained to him that in order to discuss writing systems (not just “alphabets”) it is not sufficient to be a literate native speaker of the language whose writing system you wish to describe. You can’t discuss how well letters (etc.) correspond to sounds unless you can describe the sounds. You can’t describe the sounds properly without some knowledge of phonetics.
A few of the speakers had this necessary phonetic background (notably those dealing with Thai, Lao, Japanese and Korean), but most did not. Most of the speakers (and for that matter several of the judges) appeared to have been chosen on the grounds of their being specialists in Korean or acquaintances of Mr Bae.
Sometimes I happened to have picked up enough knowledge elsewhere to be able to fill in the missing facts. Sometimes I didn’t, and the other judges didn’t either. For example, I am aware that the Greek alphabet as applied to modern Greek, admirable though it is, has the serious shortcoming that the five vowel sounds of the language can be written in twelve different ways (the worst is /i/, which can be spelt η, ι, υ, ει, οι, or υι). But I don’t actually know how well, say, the Myanmar “alphabet” reflects the sound system of Burmese (though I know a man who does).
We duly awarded the gold medal to the Korean alphabet.

Friday, 16 October 2009


On hoardings in the tube and elsewhere Londoners are being treated to a poster that includes the two IPA symbols ɛ and ɔ (aka cardinals 3 and 6).

As you see, the poster includes the lines Biɛ Wɔshwɛɔɔ and . You can read more about it here.

The characters ɛ (U+025B) and ɔ (U+0254) are used in the orthography of various Ghanaian languages. The one we have here is presumably Akan (a cover name for Twi and the mutually intelligible Fante), spoken by about 19 million people, half in Ghana and half elsewhere. As you might expect, these symbols stand for the lax mid vowels, front unrounded and back rounded respectively. (There is a system of vowel harmony. Phonologists classify these two vowels as [-ATR], as opposed to their [+ATR] counterparts e and o.)

The IPA does not distinguish upper and lower case, although the Latin alphabet used for orthographies does. So Twi also makes use of the special upper-case characters Ɛ (U+0190) and Ɔ (U+0186).
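The pairing of these lower-case and upper-case characters is recorded in the Unicode character database, and can be inspected with a short Python sketch. The function name `case_pairs` is simply a label for this example.

```python
import unicodedata


def case_pairs(lower_chars: str) -> list[tuple[str, str]]:
    """Pair each lower-case character with its Unicode upper-case form."""
    return [(ch, ch.upper()) for ch in lower_chars]


for lower, upper in case_pairs("\u025b\u0254"):
    print(
        f"U+{ord(lower):04X} {unicodedata.name(lower)} -> "
        f"U+{ord(upper):04X} {unicodedata.name(upper)}"
    )
```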

Probably most of the Londoners who see this poster imagine that these special letters, if they notice them at all, are just fancy ways of writing ordinary e and o. Phoneticians and Africanists know better.

Thursday, 15 October 2009

nuclear-free zones

The recorded voice of the tour guide that comes over the headphones provided in the Seoul City Tour Bus is available in five languages. The English one is spoken by a Korean lady with an excellent American English accent.
But every now and again she says something in a way no native speaker of English would. She uses a non-English intonation pattern. It’s not a matter of the choice between rises, falls and fall-rises; it’s not the division into chunks. It’s the location of the accents.
The National Theater | is divided into a large theater | and a small theater.

If a native speaker distributed accents in that way you’d think he must be suffering from some kind of pragmatic defect (the kind that SLTs used to call “bizarre use of language”).
The problem is the speaker’s placing of the nuclear accent on repeated occurrences of the same word.
What I would say, reading these same words from a script, is
The National Theater | is divided into a large theater | and a small theater.

An Indian speaker at the conference I attended in the same city spoke of
the world’s alphabets, | including the Indian alphabet.

A Thai lady told us about the two types of syllable in Thai:
ˈlive /syllables | and ˈdead \syllables.

These are all instances of the same failure by users of English as an International Language to place the nucleus correctly (which even Jennifer Jenkins thinks they should).

As well as making them sound bizarre to us native speakers, it means, too, that EIL users are missing out receptively on part of the richness of linguistic and pragmatic information that native speakers of English convey through intonation.

Thursday, 1 October 2009


Some twenty or more years ago I invented the term “buttressing” to refer to the use of the strong form for an unaccented preposition with a pronoun complement, after the nucleus in sentences like
(1) I had a letter from him.
aɪ ˈhæd ə ˈletə frɒm ɪm.

(2) I had a note from him.
aɪ ˈhæd ə ˈnəʊt frəm ɪm.

The use of strong or weak form can go either way in both cases, but on the whole we tend to use a strong form (frɒm) in (1), a weak form (frəm) in (2). The more weak syllables intervene after the nucleus, the more likely a strong form.
I used this terminology when I gave some lectures on English phonetics in Buenos Aires in 1992.
Later I decided (or perhaps was persuaded by my colleagues) that it is not necessary to have a new term for this phenomenon: we can just call it “rhythmic strong form”. It complements “stranding” (What are you looking at?): the two principles together account for strong forms in unaccented syllables.

When I returned to Buenos Aires last month several people approached me to say how much they liked the term “buttressing”. Why had I abandoned it? Would I reinstate it?

I have a slight feeling of guilt at having invented, or at least popularized, rather a large number of new technical terms in English phonetics, and try to keep them in check. (For instance, I abandoned the Latinate “correption” in favour of the English “smoothing”.)

But perhaps I needn’t feel guilty after all.
_ _ _ _

From tomorrow I shall be travelling again, and this blog will be suspended. The next entry will be on Thursday 15 October.