Friday, 30 July 2010

gut, foot, hoot

Warren Maguire’s website has a nice map of the British Isles showing preliminary results of his survey of answers to the question
Which of the words gut, foot and hoot rhyme for you?

The coloured dots on his map nicely display the distribution in the country of the three typical setups. In Scotland and Northern Ireland foot and hoot rhyme (the ‘Scottish’ system, blue dots). Everywhere else they don’t. In the north of England foot and gut rhyme (the ‘Northern’ system, yellow dots). In the south of England, and in non-Scottish, non-Northern English in general, none of the three rhyme (the ‘Southern’ system, red dots).
            gut  foot  hoot  dots
‘Scottish’  ʌ    u     u     blue
‘Northern’  ʊ    ʊ           yellow
‘Southern’  ʌ    ʊ           red

Because people were allowed to give more than one answer, there are also mixed possibilities, as in Warren’s own Northern Irish speech, in which (green dots) foot can rhyme either with gut (fʌt-ɡʌt) or with hoot (fʉt-hʉt), but presumably not both at once.

These keywords represent my own lexical sets STRUT, FOOT, and GOOSE. I didn’t choose gut as a keyword, even though it’s a commoner word than strut, because I judged that one speaker’s gut could well be confused with another speaker’s got.

Thursday, 29 July 2010

two placenames

Two placenames today.

One is Duisburg in Germany, recently in the news because of the tragedy at the Love Parade. In the British media, beside the ‘established anglicization’ (OBGP) ˈdjuːzbɜːɡ, I also heard newsreaders say ˈdjuːɪzbɜːɡ, an obvious spelling pronunciation. In German this place is ˈdyːsbʊʁk, which does not exactly follow the spelling. Personally, given that I learnt German in Kiel in the far north of the country, I tend to pronounce it ˈdyːsbʊɐç (like ˈhambʊɐç Hamburg) unless I remind myself not to.

The other placename is Slaugham, a village just off the main A23 road from London to Brighton, near the intriguingly named Pease Pottage. Driving past, I’ve sometimes idly wondered how this written form is to be interpreted: how do the locals pronounce this name? Does it rhyme with Maugham mɔːm? The answer is no.

Yesterday I was watching a traffic police video programme on television, when the action moved to this area. As the officers in the pursuit car reported their position over the radio I noted with interest that they called it ˈslɑːfəm. So it’s like laughter, not like slaughter. The old BBC Pronouncing Dictionary of British Names says it can be either ˈslɑːfəm or ˈslæfəm, prioritizing the latter.

Wednesday, 28 July 2010

sound comparisons

Warren Maguire’s comments on yesterday’s blog remind me that I have not previously written about the interesting Sound Comparisons website.

This is the showcase for a research project conducted at the University of Edinburgh in 2005–2007. The website offers you ‘sound comparisons’ for about a hundred English words pronounced in fifty or so different native-speaker accents, mainly but not exclusively British. They are presented in narrowish IPA transcription, and for many (but not all) there are sound clips. There’s no connected speech.

Unusually, there are also a dozen or so ‘historical’ accents/varieties covered, ranging from Proto-Germanic to Shakespearean. Strangely, no native speakers of these varieties seem to have been available to offer recordings.

There are also a dozen or so ‘other Germanic’ varieties, giving the cognates in the relevant languages of the items in the English word list. If you’ve always wanted to listen to the Frisian word for ear, this is where to find it. (It’s iˑər, which you could easily take as some kind of post-Gaelic Scottish.)

With my browser at least the sound clips are rather flaky: you tend to get two plays (perhaps overlapping) of the word you ask for, followed shortly or indeed after quite a time by other words seemingly chosen at random.

Presumably because the research money was not renewed, the website now gives the impression of having been abandoned. I hope this is not the case: it would be nice to have the missing sound files, e.g. for London or Norwich. It would be nice to have some connected speech. It would be nice to have more accents represented.

Perhaps Warren Maguire’s ongoing research will fill some of these gaps.

Tuesday, 27 July 2010


More than two years ago (blog, 25 Mar 2008) I reported on the work being done by Jenny Cheshire, Sue Fox, Paul Kerswill and Eivind Torgersen on the speech of young Londoners living in the inner city. Traditional Cockney has given way to what they call “Multicultural London English”.

Eivind and Paul (pictured) have now kindly made available to me some sound clips of this new variety. I am not at liberty to let you hear any extended samples, but what I can do is let you listen to one or two words or phrases.

One of the innovations they identify is ‘k-backing’.

We’re used to the idea that velars tend to accommodate to the place of the following vowel, being somewhat fronter before front vowels and backer before back vowels. We routinely compare the initial plosive of keep with that of cool and perhaps use this to illustrate the notion of allophones of a phoneme.

The k-backing innovation is a kind of exaggeration of the backing of velars before back vowels. Rather than a mildly retracted k in words such as car, come, caught, many younger inner-London speakers have a very retracted plosive, perhaps even a uvular q.

Listen to clips of a young Anglo (= ethnically white) speaker pronouncing the phrases he's comi..., coming into it, yeah? and (a) parked car. This speaker has a multicultural friendship network. Note the backed ks, which are typical of such multiculturally-oriented Anglos and of non-Anglos. Anglos whose friendship network is Anglo-only do it just slightly less. Older people don’t do it at all.

No one knows where this innovation comes from.

= = =

In other news, the next International Congress of Phonetic Sciences will be held in Hong Kong from the 17th to the 21st of August 2011. The website has recently gone live here.

Monday, 26 July 2010

disunification (2)

Michael Everson correctly identifies a number of reasons to advocate the disunification of the Latin letters beta, theta, and chi from their Greek versions. If this happened, as IPA symbols we would use the Latin versions rather than the Greek ones.
He quotes briefly, without identifying the source, from the IPA 1949 Principles booklet. Here, more fully, is what it says there (The Principles of the International Phonetic Association, pages 1-2). Although unattributed, these are clearly Daniel Jones’s words.
Note the very clear intention to treat IPA θ (vertical) as distinct from Greek theta (typically oblique). Greek letters are to be incorporated into the IPA only as roman [sic] adaptations.
As Jones says, Greek theta has an alternative form, ϑ. This is encoded at U+03D1, whereas ordinary θ is at U+03B8.

In English printed texts that mix the Latin and Greek scripts, the Greek letters are typically oblique, the Latin ones upright. The purpose is to distinguish clearly between the two scripts (whereas the IPA wants everything in the same script). Here is an example, from Abbott and Mansfield’s Primer of Greek Grammar (my copy printed in 1949).
I think disunification of Latin and Greek beta, theta, chi would be a good thing.

An existing disunification that might be thought surprising is that of the IPA symbol for a voiced velar plosive, ɡ, U+0261, from ordinary lower-case g, U+0067. In many fonts there is no difference in the appearance of these two; in other fonts there is, e.g. in Times New Roman ɡ g (which I hope shows up properly in your browser). The IPA is on record as declaring that the two symbol shapes are equivalent and interchangeable. Nevertheless many phoneticians persist in treating them as distinct, which justifies Unicode’s disunification.
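One practical consequence of having both U+0261 and U+0067 in circulation is that two transcriptions can look the same on screen and still fail a string comparison. Here is a minimal Python sketch, assuming all we want is to treat the two shapes as interchangeable when comparing. The names `G_MAP` and `same_transcription` are my own illustrative inventions, not part of any standard tool.

```python
# Map IPA ɡ (U+0261) onto ordinary g (U+0067) before comparing.
# G_MAP and same_transcription are hypothetical names for this sketch.
G_MAP = str.maketrans({"\u0261": "g"})

def same_transcription(a: str, b: str) -> bool:
    """Compare two transcriptions, ignoring the g / ɡ encoding difference."""
    return a.translate(G_MAP) == b.translate(G_MAP)

ipa_g = "\u0261\u028Ct"   # ɡʌt, with U+0261
plain_g = "g\u028Ct"      # gʌt, with U+0067

print(ipa_g == plain_g)                    # raw comparison sees two different strings
print(same_transcription(ipa_g, plain_g))  # folded comparison treats them as equal
```

Folding towards U+0067 rather than U+0261 is an arbitrary choice here; the mapping could just as well run the other way.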

It is worth noting that a number of obsolete, derecognized former IPA symbols are located in the Unicode block Latin Extended-B. They include ƍ ƞ ƪ ƫ ƺ ƾ ƻ. This is also where we find upper-case versions of certain IPA symbols. These might be used in orthographies, though not in phonetic texts as such: Ɔ Ə Ɛ Ɣ Ɯ Ɵ Ʊ Ʌ. I have sometimes had to correct careless authors who used them in place of the lower-case phonetic symbols.
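Stray capitals of this kind are easy to miss by eye, so a mechanical check can help. The following is a rough Python sketch, assuming the text to be checked is purely phonetic and that the eight capitals listed above are the only ones of interest. The name `find_stray_capitals` is hypothetical; the sketch relies on Python's built-in `str.lower()`, which follows the Unicode case mappings.

```python
# The upper-case letters listed above: Ɔ Ə Ɛ Ɣ Ɯ Ɵ Ʊ Ʌ
STRAY_CAPITALS = "\u0186\u018F\u0190\u0194\u019C\u019F\u01B1\u0245"

def find_stray_capitals(text: str):
    """Return (index, found, suggested_lowercase) for each listed capital in text."""
    return [(i, ch, ch.lower()) for i, ch in enumerate(text) if ch in STRAY_CAPITALS]

# A transcription in which Ɔ (U+0186) has been typed where ɔ (U+0254) was presumably meant.
for index, found, suggestion in find_stray_capitals("k\u0186t"):
    print(index, f"U+{ord(found):04X}", "->", f"U+{ord(suggestion):04X}")
```

This only reports candidates. In an orthographic text the capitals may be perfectly correct, so the decision to replace them is left to the editor.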

Another difficult area is that of letters with diacritics. It is possible to encode any such letter by using the base form plus one (or more) of the Combining Diacritical Marks provided in Unicode 0300–036F. However, doing so puts you at the mercy of the designers of fonts, browsers and word processing software, who may or may not have done the necessary work to make diacritics line up correctly above, below, or through the base letter. For “accented” letters used in orthographies Unicode provides separate encoding, as for example in the case of precomposed á ê ï õ ù ă ē į ő ů ç đ ġ ķ ň. However no precomposed combinations are provided for explicitly phonetic use. Obviously, since the range of possible combinations is potentially enormous we cannot expect to have many of these; but it would certainly be convenient to have precomposed versions of the symbols for the French nasalized vowels (blog, 15 July), ɑ̃ ɛ̃ ɔ̃ œ̃, which are abundantly attested in printed texts.
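The difference between letters that have a precomposed form and those that do not can be seen by asking Unicode normalization to compose them. This is a small Python sketch using the standard `unicodedata` module; the helper name `nfc_length` is my own. It assumes only that NFC composition replaces a base-plus-diacritic sequence with a single code point when, and only when, a precomposed character exists.

```python
import unicodedata

ACUTE = "\u0301"   # COMBINING ACUTE ACCENT
TILDE = "\u0303"   # COMBINING TILDE

def nfc_length(base: str, combining: str) -> int:
    """Number of code points left after NFC composition of base + combining mark."""
    return len(unicodedata.normalize("NFC", base + combining))

# Orthographic letters: a precomposed form exists, so the pair collapses to one code point.
print("a + acute:", nfc_length("a", ACUTE))
print("o + tilde:", nfc_length("o", TILDE))

# The French nasalized vowels ɑ̃ ɛ̃ ɔ̃ œ̃: no precomposed form, so two code points remain.
for vowel in "\u0251\u025B\u0254\u0153":
    print(vowel + TILDE, nfc_length(vowel, TILDE))
```

Whether the two-code-point sequences then display properly is still up to the font and the rendering software, which is exactly the difficulty described above.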

Friday, 23 July 2010

disunification (1)

Consider the following pairs of symbols: a а ä ӓ æ ӕ c с e е è ѐ ë ё i і j ј o о p р s ѕ x х y у. Can you see any difference between the members of each pair? Nor can I. Nor can anyone.

However in each pair the first symbol is a letter of the Latin alphabet, while the second is Cyrillic.

Correspondingly, the two members of each pair have different Unicode encodings. While Latin a is U+0061, Cyrillic а is U+0430. While Latin j is U+006A, Cyrillic ј (used in writing Serbian) is U+0458. And so on.
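If you want to confirm this for yourself, the code points and official character names are easy to inspect. A minimal Python sketch, using the standard `unicodedata` module (the helper name `describe` is just for illustration):

```python
import unicodedata

def describe(ch: str) -> str:
    """Code point in the usual U+XXXX form, plus the official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# (Latin, Cyrillic) pairs; the Cyrillic members are written as escapes
# so that there is no doubt which character is which.
pairs = [("a", "\u0430"), ("j", "\u0458"), ("o", "\u043E")]

for latin, cyrillic in pairs:
    print(describe(latin), "|", describe(cyrillic), "| equal:", latin == cyrillic)
```

The characters in each pair compare as unequal, however alike they look on the screen.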

This situation is convenient in that it keeps all the basic Latin letters together in the block 0021–00FF (I give Unicode numbers in the usual hexadecimal form) and all the Cyrillic letters together in the block 0400–04FF. But it is also highly inconvenient, because it opens up potential breaches in security. Now that non-ASCII letters are allowed in URLs, the fact that two differently coded letters look identical could be exploited for malicious purposes, for phishing or scamming. While is a website you know and love (or not), “www.fасеbоо” would be somewhere quite different. (In the latter case, the Latin a,c,e,o have been replaced by the identical-looking Cyrillic equivalents.)

That is why they tell you not to click on links in emails, but to type them into the browser yourself.

It’s not quite as bad as that, because the domain name authorities will (we hope) refuse to register such deceptive domain names. On the other hand there is nothing to stop someone using this sort of thing as their Facebook name.
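I do not know what checks the domain name authorities actually apply, but the general idea of spotting a name that mixes scripts can be sketched quite simply. The Python below is only an illustration under a loud assumption: as far as I know the standard library offers no direct "script" property, so the sketch uses the first word of each letter's Unicode name as a crude stand-in. The function names `scripts_in` and `looks_mixed` are hypothetical.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Crude script guess: the first word of each letter's Unicode name.

    This is an assumption-laden shortcut, not a real script lookup.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def looks_mixed(label: str) -> bool:
    """True if the letters of the label appear to come from more than one script."""
    return len(scripts_in(label)) > 1

all_latin = "example"
with_cyrillic_a = "ex\u0430mple"   # the third letter is Cyrillic а (U+0430)

print(all_latin, scripts_in(all_latin), looks_mixed(all_latin))
print(with_cyrillic_a, scripts_in(with_cyrillic_a), looks_mixed(with_cyrillic_a))
```

A name written entirely in Cyrillic lookalikes would pass this test, so a check of this sort can only be one part of any real defence.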

By the time it came to encoding IPA symbols, the Unicode consortium had become aware of this danger and resolved to take a much more conservative line. The new policy was that if two characters (“glyphs”) look the same, then normally they should have the same encoding. That’s why although most phonetic symbols are located in the IPA Extensions block (0250–02AF) some aren’t. We use the basic Latin a b c… rather than having special IPA ones. We also use the “Latin-1 Supplement” coding for the characters æ ç ð ø (U+00E6, U+00E7, U+00F0, U+00F8) since they occur in the ordinary spelling of Danish, French, Icelandic, and Norwegian. We also use the “Latin Extended-A” coding for the ħ (U+0127) used in Maltese orthography, for the œ (U+0153) used in French, and even for the ŋ (U+014B) used in spelling Sami and Mende. None of these is repeated in the IPA Extensions block, though ћ is separately coded for Cyrillic (Serbian, U+045B).

Worse, the phonetic symbols β θ χ (U+03B2, U+03B8, U+03C7) are to be found only in the “Greek and Coptic” block, since they are treated as identical with the Greek letters beta, theta and chi.
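To see how scattered the phonetic symbols are, one can look up the block each belongs to. Python's standard library does not, to my knowledge, expose block names, so this sketch hard-codes the handful of block ranges relevant here. The table `BLOCKS` and the function `block_of` are my own illustrative names, and the table is deliberately incomplete.

```python
# A partial, hand-written table of Unicode blocks (inclusive ranges).
# Only the blocks relevant to this discussion are listed.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]

def block_of(ch: str) -> str:
    """Name of the block containing ch, or 'other' if it is not in the table."""
    cp = ord(ch)
    for low, high, name in BLOCKS:
        if low <= cp <= high:
            return name
    return "other"

# a, æ, ð, ħ, œ, ŋ, ə, β, θ, χ, ћ
for ch in "a\u00E6\u00F0\u0127\u0153\u014B\u0259\u03B2\u03B8\u03C7\u045B":
    print(f"U+{ord(ch):04X}", ch, block_of(ch))
```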

Fortunately, our IPA ɫ is not lumped in with Polish ł, nor ɪ (lax front unrounded vowel, small cap i) with Turkish dotless ı or Greek iota ι.

Meanwhile — rather incredibly, and going to the other extreme — our phonetic schwa ə is among the IPA symbols at U+0259, while the identical-appearing ǝ and ә are respectively LATIN SMALL LETTER TURNED E (U+01DD) of the Pan-Nigerian alphabet and CYRILLIC SMALL LETTER SCHWA (U+04D9) as used in Azerbaijani orthography.

The problem we face in all such cases is that of the “unification” versus “disunification” of identical-looking symbols.

More on this next week. Meanwhile, you might like to read Michael Everson’s discussion here.

Thursday, 22 July 2010


A few days ago the Guardian crossword included a word clued in such a way as to require crux to be a homophone of crooks. I remember noticing it at the time I solved the crossword and thinking that I have a distinction between krʌks and krʊks, whereas the compiler, Rufus, presumably did not. But it did not occur to me to write a letter to the editor about it.

Others did not hesitate. Two days ago there was a letter saying that Rufus must be a northerner (and by implication not properly educated) because he pronounces the two words the same. Today someone writes from an address in Greater London as follows.
…whereas “crux” as pronounced oop north rhymes with crooks as pronounced in t’ south … it does not rhyme with crooks in t’ north, where it approximates to “crewks”.

Clear? Let me explain. Popular northern speech merges the STRUT and FOOT sets, making dull rhyme with full and cut with put. (Hence the eye-dialect joke spelling 'oop' for up in the comment.) However among the FOOT words (i.e. words that have ʊ in RP and in ‘General American’) there is a variable subset in which some northerners (and also some Irish people) use a long vowel, uː. This subset includes several words spelt -ook, such as book, cook, look… and crook.

This is why you have to be careful when selecting minimal pairs to test for the STRUT-FOOT merger. Cut vs. put is fine; but luck vs. look is not. Nor is crux vs. crooks, the pair at issue here.

                          crux    crooks
most speakers of English  krʌks   krʊks
some northerners          krʊks   krʊks
other northerners         krʊks   kruːks

(As usual the notation ʊ can cover a multitude of qualities for northern speech, ranging from close to mid and from back to central. Some people use a kind of ə for both STRUT and FOOT. The point is not the exact phonetic quality involved but the sameness or differentness of the vowel qualities in particular lexical sets or subsets.)

It’s difficult even for those who understand this situation to explain it in simple terms in a line or two of a letter to the editor.

Wednesday, 21 July 2010


The pronunciation of Portuguese — particularly European Portuguese rather than Brazilian — has a number of unusual and interesting features. The vowel system contains not only the widely found i e ɛ a ɔ o u but also two central vowels, ɨ and ɐ, and furthermore as many as six nasalized vowels, ĩ ẽ ɐ̃ ã õ ũ. (French and Polish, the other two European languages known for nasalized vowels, pale by comparison.) There are also plenty of diphthongs, including nasalized ones such as ɐ̃ĩ̯. The word têm has the superficially improbable pronunciation ˈtɐ̃ĩ̯ɐ̃ĩ̯. Vowels in unstressed syllables are subject to weakening, manifested as raising and (for front vowels) centralization.

The consonant inventory includes the four liquids ɾ ʀ ɫ ʎ.

All of this is admirably set out and illustrated in a new book that has just come into my hands, Fonética do português europeu: descrição e transcrição by António Emiliano (Lisboa: Guimarães).

I am glad to say that the author uses IPA notation throughout. He supplements it by the equivalent SAMPA notation, given alongside. This is perhaps rather wasteful of space, given that a simple table of symbol equivalents would have sufficed. Nevertheless, it is good to see this ASCIIization of the IPA (whose development I oversaw) treated so seriously and in such detail.

As well as discussion and illustration of each sound of Portuguese and its representation in orthography, there is also a seventy-page Vocabulário fonético de geónimos portugueses [Phonetic vocabulary of Portuguese place names], which I shall find very useful. For LPD I shall have to correct the Portuguese phonetics of Coimbra to read ˈkwĩ bɾɐ and will need to adjust the Portuguese name of Lisbon, Lisboa, to read ɫiʒˈboɐ.

Typographically, the book is notable for being set in Gentium, a font devised by Victor Gaultney (blog, 9 May 2006) both for phonetics and for ordinary text. (Distribution and further development of the font have now been taken over by SIL.) As well as being eminently readable, it looks very distinguished. Well done everyone.

Tuesday, 20 July 2010

Oh, Tracy!

There’s a YouTube clip that has been going the rounds in Britain recently. It is of the American voice teacher Tracy Goodwin purporting to teach the British LOT vowel, ɒ.

Scroll down here to read some reactions to it from the British general public. The consensus is that it is utterly hilarious. On Facebook I have seen reactions ranging from LOL to OMG, WTF, and ROTFL.

Here is a reply from a young lady in London, demonstrating how we really say coffee and dog.

Several commentators are bemused at Tracy’s American use of the term dialect (where we would say ‘accent’). Americans really do need to be aware of this difference in usage when communicating with the British: it’s only us linguists and phoneticians who are likely to have come across dialect in this sense before, because we read American textbooks and interact with American professionals.

I don’t like to criticize a fellow professional, but we do need to ask why Tracy’s demo is such a disaster. Here are some of the reasons, as I see it.
1. Her “British” LOT vowel is not open enough. It is in the mid ɔ area rather than the open ɒ area. So to us it sounds working-class Scottish rather than English.
2. She doesn’t realize that all of us English (though not all Scots) distinguish the LOT and THOUGHT sets. Her first two examples, hot and coffee, are LOT words, but her third example, fought, is a THOUGHT word and ought therefore to have ɔː, not ɒ. Her attempts at dog and fog sound particularly ludicrous. (They are both LOT words.)
3. Her happY vowel (at the end of coffee) is much too open. It approaches ɛ or perhaps more precisely [ɛ̝̈], which in England is highly marked both socially and regionally. Socially, it belongs in a variety of U-RP which is probably now entirely obsolete, a subvariety of what Cruttenden calls “Refined RP”. Alternatively, geographically it is associated with (the working-class accent of) central Northern places such as Leeds. No actor should use this kind of happY vowel for “British” unless playing an upper-class character in a play set a hundred years ago or more.
4. Putting these points together, we can say that Tracy’s version of BrE represents an impossible mixture of different social classes and different geographical locations. Bits [= conscious Briticism] of it are Scottish, bits of it are northern English, bits are RP/southern. Some of it is caricature-upper-class, some of it is working-class. Nobody, but nobody, talks like that in real life.

I expect Tracy thinks that we call policemen ‘bobbies’, too. But then that’s what most Americans believe.

Tracy’s own website says she has a Master’s and ten years’ experience. She is the author of Be Delicious: The Art of Voice & Movement Integration.

I do hope my own attempts at AmE sound better than this.

Monday, 19 July 2010

un dizionario enorme

The delivery company DHL turned up on my doorstep a few days ago with an unsolicited and unexpected package addressed to me. A massive 5kg in weight, it proved to contain two volumes of a pronunciation dictionary from RAI, Radiotelevisione Italiana, entitled Dizionario italiano multimediale e multilingue d’Ortografia e di Pronunzia, or DOP for short.
I imagine I owe this honour to the fact that in the Foreword to LPD I acknowledged the help I had gained from an earlier (and much smaller) edition of this work.

With 133 pages of introduction and 1253 pages of dictionary proper — large pages, almost as big as A4 — it’s an enormous work. And this is only the part devoted to Italian: there’s also a promised third volume, not yet available, to be devoted to words belonging to other languages.

The two volumes already published are claimed to cover 92,000 voci di lessico e nomi propri della lingua italiana [Italian lexical words and proper names]; the third will cover 37,000 nomi propri e altre voci d’una sessantina di lingue diverse [proper names and other words from some sixty different languages].

All this despite the fact that the pronunciation of Italian words is mostly pretty predictable from the spelling. The chief uncertainties are the placement of the stress in longer words and one or two other things that I discuss below.

In the face of the publisher’s generosity in sending me a complimentary copy, it may seem churlish to look a gift horse in the mouth: but nevertheless here goes.

Why oh why don’t they use IPA? Instead, they use an idiosyncratic mishmash of a transcription system, of which I reproduce a small part here. At the very least, this makes the dictionary difficult for non-Italian phoneticians to use. Instead of seeing the familiar IPA symbols that we use for the tens or hundreds of other languages in whose pronunciation we may be interested, we have to contend with this set of strange symbols found nowhere else. Not only does DOP not use IPA, it applies familiar IPA symbols in unusual meanings. Thus for example ʃ is used for the voiced alveolar fricative (IPA z). The sound represented in IPA by ɕ, the voiceless alveolopalatal fricative, is shown in DOP as s’, which in IPA would be an ejective alveolar fricative. And so on.

You can sort of see the logic behind this. In Italian orthography the letter z stands for an affricate, either ts or dz. So Italians without phonetic training might be confused if a word such as caso ‘case’ were transcribed as IPA ˈkazo. Writing it as [kàʃo] draws the attention of the naïve reader to the fact that there’s something special about this s (compare casa ‘house’, where the fricative is voiceless). However, the millions of Italians who study or have studied English will be familiar with the use of IPA transcription for English, with ʃɪp standing for ship, busy transcribed ˈbɪzi and so on. And precisely the same arguments would apply in Germany, which hasn’t inhibited the Duden Aussprachewörterbuch and other German pronunciation dictionaries from using IPA. Then there’s Wikipedia, which insists that its authors use IPA rather than other possible phonetic alphabets or ad-hoc solutions.

The caso example also reveals the strongly normative nature of DOP. Although I am no expert on Italian, I am pretty sure that there are millions of Italians who don’t consistently make the distinction between s and z in the prescribed way. The same applies to the mid vowels, as venti ‘twenty’ ˈventi (DOP “vénti”) vs. venti ‘winds’ ˈvɛnti (DOP “vènti”), not to mention the complexities of the doubling of consonants. (DOP quite sensibly marks those words which are supposed to trigger gemination of the first consonant of a following word by a final raised + sign, thus appiè “appi̯è+”, making appiè della croce “[appi̯è ddella króče]”, or in IPA apˈpjɛddellaˈkroːtʃe.)

The phonetic terminology is flaky, too. You can see from the above sample that ‘voiceless’ is rendered as aspro [harsh, sharp], while ‘voiced’ is rendered dolce [sweet]. The standard Italian terms are added in brackets.

I wonder what it means to say that English/Spanish θ, symbolized by DOP’s special barred th symbol, is found in varie sfumature [in various shades].

At the end of the symbol list (not shown above), Arabic ‘ayn, IPA ʕ, is defined as faringale, ossia profondamente gutturale [pharyngeal, or rather deeply guttural]. And that’s it. Would it be voiceless or voiced? a plosive, a fricative, or an approximant? Or perhaps it’s a nasal or a lateral? OK, we can leave it to the specialists to argue whether it’s best described as epiglottal rather than pharyngeal, a pharyngealized plosive rather than a fricative or an approximant; but I wouldn’t let even an undergraduate get away with no attempt at all at a full VPM label, let alone the author of a work for publication.

Friday, 16 July 2010


Shane White asks
Could you tell me why the <th> in cloth is voiceless, but the <th> in clothes is voiced? (And indeed, why the vowel in cloth is /ɒ/, but in clothes is /əʊ/?)

The reason cloth has θ but clothes ð lies in the fact that in Old English the th in clothes was between vowel sounds. In words inherited from Old English you regularly get voiced th in this position (compare father, mother) but voiceless th elsewhere (thing, mouth, bath).

Compare north, south with θ but northern, southern with ð.

In the case of cloth-clothes it’s also part of a wider pattern in which a singular noun with a final voiceless fricative is matched with a plural with a voiced fricative: leaf – leaves, half – halves, truth truːθ – truths truːðz, house haʊs – houses ˈhaʊzɪz. You also get alternation between nouns with θ and verbs with ð: mouth maʊθ but to mouth maʊð, and sheath ʃiːθ but to sheathe ʃiːð, parallel with cloth klɒθ but to clothe kləʊð. At other places of articulation we similarly have shelf ʃelf – to shelve ʃelv, use (noun) juːs – to use juːz. (But we keep the voiceless f in to knife.)

In the OED there’s a long note at clothes explaining the history and also discussing the pronunciation with no dental fricative.

Thursday, 15 July 2010

French nasalized vowels

My formal learning of French ended at O Level (= today’s GCSE), though I later did a course in French phonetics as a postgraduate at UCL. So I can sort of get by in the language, but am far from being an expert on it. Nevertheless, I try to be helpful when people ask questions about it.

Rohan Dharwadkar writes
When I began learning French, I was initially mystified as to why a word like bon was transcribed as /bɔ̃/, and 'bien' was rendered /bjɛ̃/, since I heard words containing these nasal vowels being pronounced noticeably differently. Quite quickly, however, I realised that the postulation of these underlying representations facilitated the explication of alternations like bon/bonne or méxicain/méxicaine, by means of a denasalisation rule. ...
I think I must interrupt at that point. Rohan doesn’t expand on his claim that he “heard words containing these nasal vowels being pronounced noticeably differently”. The words are transcribed that way because they are pronounced that way, give or take. Admittedly, the vowel of bon is typically rather closer than cardinal 6 ɔ, and one could certainly justify the choice of an alternative symbol õ. The final vowel of mexicain, on the other hand, is typically slightly opener than cardinal 3 ɛ, and one could justify the choice of an alternative symbol æ̃ (which is what I write in LPD in such words). I’m referring here to the standard French of France — in Canadian French things are rather different.

And whatever the rule is that governs the alternations mentioned, it is surely not one of “denasalisation”. Back in the days of generative phonology, people analysed bon bɔ̃ as underlyingly #bɔn# and bonne bɔn as underlyingly #bɔn+ə#. The masculine form then underwent a rule changing Vn into a nasalized vowel: Vn → Ṽ / _{#,C}. How this is handled in these days of Optimality Theory someone else will have to tell us.
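For those who like to see a rule run, here is a toy Python rendering of the nasalization rule Vn → Ṽ / _{#,C} as a regular expression. It is a sketch under several assumptions of my own: underlying forms are written without the outer # boundaries, so that the end of the string stands for #; the morpheme boundary is written +; and the vowel list in `VOWELS` is a small illustrative set, not a full inventory. The function name `nasalize` is hypothetical. The sketch covers this one rule only and does not attempt the later step that removes the final schwa of bonne.

```python
import re

TILDE = "\u0303"           # COMBINING TILDE
VOWELS = "aeiouyɔɛœɑə"     # illustrative vowel set, an assumption of this sketch

# A vowel followed by n, where the n stands before the end of the word (#)
# or before a consonant (anything that is neither a vowel nor the + boundary).
RULE = re.compile(rf"([{VOWELS}])n(?=\Z|[^{VOWELS}+])")

def nasalize(underlying: str) -> str:
    """Apply Vn -> nasalized V in the environment _{#,C}."""
    return RULE.sub(lambda m: m.group(1) + TILDE, underlying)

print(nasalize("bɔn"))     # masculine: n is word-final, so the rule applies
print(nasalize("bɔn+ə"))   # feminine: n stands before + and a vowel, so it does not
```

The feminine form escapes the rule because its n is protected by the following boundary and vowel, which is the whole point of positing the underlying schwa.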

Rohan continues
However, this symmetry is broken when it comes to pairs like 'un/une' (œ̃/yn), 'commun/commune', etc.. Whence my first question — why is it that 'un', for example, is not analysed as being, phonemically, /ỹ/? Apart from possible historical reasons, what considerations have led to the retention of the /œ̃/ phoneme in French?

I am sure that some phonologists at least would argue exactly for /ỹ/. Likewise, they would posit /ĩ/ in a word such as fin (cf. finir). These [+hi +nas] vowels do not surface as such, because a late context-free lowering rule converts all high nasalized vowels to mid: [V +nas] → [-hi].

There are no reasons other than “possible historical” ones for the retention of the /œ̃/ phoneme (insofar as it has been retained — see next question). Everything in pronunciation that is, is so because of historical reasons.

Rohan’s not finished yet.
Another question is: I've read that many French speakers today pronounce the un of words ending in -un (commun, importun, etc.) as though the underlying phoneme were /ɛ̃/ instead of /œ̃/ . For those who retain /œ̃/, how is the vowel of the final syllable in such words phonetically realised?

I’m not sure what he means by this question. I would say that those who retain /œ̃/ pronounce œ̃, while those who use the now more general /ɛ̃/ in its place pronounce ɛ̃ (not just finally, but in all positions: lundi lɛ̃di). Am I missing something?

Wednesday, 14 July 2010

an enemy anemone

There seems to be something particularly difficult about VCVCV strings involving nasals at different places of articulation. We can manage enemy ˈenəmi, but an enemy ən ˈenəmi can start to feel slightly like a tongue-twister. By the time we get to the flower anemone əˈneməni and give it an indefinite article, an anemone ən əˈneməni, we may have to monitor ourselves carefully.

I remember as a child being shown some wood anemones by my mother and thinking they were wooden enemies.

What started me on this line of thought was an email in which someone was discussing an employee’s “renumeration”. This should of course be remuneration. But the m-n pronunciation problem in riˌmjuːnəˈreɪʃn̩ is reinforced by an evident etymological/semantic confusion involving words such as numeral (count the salary!).

The prevalence of the spoken form (mispronunciation) with -ˈnjuːm- instead of -ˈmjuːn- leads often enough to the written form (misspelling) with -num- instead of -mun-.

Etymologically, remuneration has nothing to do with numbers. The -mun- part is the same as in munificent ‘generous’, and goes back to the Latin mūnus, mūneris, a word with several meanings, one of which is ‘gift’. Cicero used the term remūnerātio, -ōnis in the sense of ‘recompense, repayment’, and the word has been in use in English since around 1400.

= = = =

The BBC is pleased to announce a vacancy for a Pronunciation Linguist in the BBC Pronunciation Unit. The vacancy is for a full-time fixed term one-year contract to cover maternity leave. The closing date for applications is Wednesday 28th July 2010. We expect to conduct interviews on Monday 16th August, and would want the successful candidate to begin work no later than w/c 13th September.
The BBC can only accept applications from candidates who are eligible to work in the UK. For full details of the job, including the specification, competencies and application form, please visit this url (job ref. no. 385939).

Tuesday, 13 July 2010

STRUT and commA

(As a further experiment, I’ve coded this posting in such a way that it should appear in the font Segoe UI for those who have it. Those who don’t, but have Lucida Grande, will see that. Those who have neither will see their default font.)

There’s an old limerick that goes like this.

ðə ˈwɒz ə jʌŋ ˈmæn əv kælˈkʌtə
əˈflɪktɪd ət ˈtaɪmz wɪð ə ˈstʌtə —
 hi ˈsed pəpəˈpliːz
 wʊdʒu ˈpɑːs mi ðə ˈtʃiːz
ən ðə ˈbəbəbəˈbəbəbəˈbʌtə.

I wouldn’t normally use this even for exemplification nowadays, not only because Calcutta has been renamed Kolkata (Bengali কলকাতা ˈkolkat̪a) but also because we know that we should not mock the afflicted.

However it does illustrate an interesting phonetic point. The issue is whether the STRUT vowel and the schwa are allophones of the same phoneme (realizations of the same underlying phonological unit) or not.

Some would claim that this is a non-issue, because STRUT is always stressed and schwa is never stressed. This argument might work if we define stress lexically, but it will not hold if by stress we mean a rhythmic beat in running speech.

The advantage of using a strongly rhythmic verse form such as the limerick is that we hold the rhythm constant as we play around with vowel qualities. And in my speech I feel, and believe I make, a clear difference between the last line as transcribed above and two other possibilities,
 ðə ˈbʌbʌbʌˈbʌbʌbʌˈbʌtə and
 ðə ˈbʌbəbəˈbʌbəbəˈbʌtə.
I can, if I choose, produce both ə and ʌ either stressed or unstressed.

There are though, I know, many other speakers who would not feel able to make any such distinction. They are the people who think of above as having the same vowel sound in each syllable.

This is what I referred to in Accents of English as ‘the STRUT-Schwa Merger’, adducing such variably distinct pairs as an orthodoxy vs unorthodoxy and a large and tidy room vs a large untidy room. It is presumably responsible, via restressing, for the AmE strong forms of of and from (compare BrE ɒv, frɒm).

Monday, 12 July 2010

Segoe UI

In my recent discussion of fonts that include the IPA symbols I overlooked one that I now think may be the best of all of those currently available. What is more, it is the current Windows system font.

It is called Segoe UI, and comes bundled with Windows Vista, Windows 7, and Office 2007. As far as I can see, it contains the full complement of IPA symbols (excepting only the labiodental flap, U+2C71). Here’s what it looks like, with a screengrab of how Word 2007 displays some phonetic text in 10-pt size.
You will see that its small-cap i (the lax close front vowel) has the serifs I pleaded for last week (blog, 6 July); so does the ordinary upper-case I. The diacritics sit nicely in their proper places. The dental click symbol extends below the line, making it easily distinguishable from lower-case L. (The fourth line in my screenshot is a transcription of the Zulu phrase isicathulo nesigqoko ‘a shoe and a hat’.) I’m not really satisfied by the proportions of the implosive-g symbol — hook too big, bowl too small — but it’ll do.

Microsoft tells us
Segoe UI is an approachable, open, and friendly typeface, and as a result has better readability than Tahoma, Microsoft Sans Serif, and Arial. It has the characteristics of a humanist sans serif: the varying widths of its capitals (narrow E and S, for instance, compared with Helvetica, where the widths are more alike, fairly wide); the stress and letterforms of its lowercase; and its true italic (rather than an "oblique" or slanted roman, like many industrial-looking sans serifs). The typeface is meant to give the same visual effect on screen and in print. It was designed to be a humanist sans serif with no strong character or distracting quirkiness.
Segoe UI is optimized for ClearType, which is on by default in Windows. With ClearType enabled, Segoe UI is an elegant, readable font. Without ClearType enabled, Segoe UI is only marginally acceptable. This factor determines when you should use Segoe UI.
Wikipedia says
It is distinguishable from its predecessor Tahoma and the Mac OS user interface font Lucida Grande by its rounder letters.
Segoe was designed by Steve Matteson during his employment at Agfa Monotype. Licensed to Microsoft for use as a branding typeface and user interface font, it was designed to be friendly and legible.

So it’s goodbye Tahoma, and hello and welcome Segoe UI.
Do Mac users have this font?

If all is well, and you have Segoe UI on your system, this paragraph should be in the font. aɪ ˈduː ˈhəʊp ju kŋ ˈriːd ɪʔ.

UPDATE: If all is well, and you have Segoe UI on your system, this paragraph should be in the font. Mac users should see Lucida Grande. aɪ ˈduː ˈhəʊp ju kŋ ˈriːd ɪʔ.

Friday, 9 July 2010


The latest quarterly update of the online OED covers the alphabetical range Rh to rococoesque. John Simpson provides his usual insightful commentary exploring some of the new entries. Ben Zimmer, too, has seized the moment to write about what he calls the “fascinatingly complex entry for a seemingly simple word: rock, used as a verb”.

You will forgive me if I draw attention instead to a coinage of my own.

In 1968, inspired by Labov’s innovative approach to data collection in New York’s department stores, I spent an afternoon in Southampton stopping people in the street and asking them about preferred flavours of ice cream. My covert aim was to elicit the word vanilla and see whether people on the streets of this town pronounced it with a final plain schwa ə, as in most kinds of English, or with an r-coloured ‘schwar’ ɚ. As I expected, I encountered a number of cases of the latter. (I was of course careful to exclude instances where the word was followed by a vowel at the beginning of the next word, the position where it would be normal for English people to use intrusive r.)

The reason for this sudden burst of activity was that I was being pressed for a contribution, any contribution, to the annual Progress Report put out by the Phonetics Laboratory of my Department. I wrote up my findings over the next two days: four-page paperette, job done.

At the time there was no satisfactory word available to describe the kind of pronunciation used in places such as Southampton, in which r is preserved in all positions, as against the more general type of English English in which historical r is lost except immediately before a vowel (as in RP, where farmer is ˈfɑːmə). Some American writers used the term r-ful, but this not only lacked the dignity appropriate for technical terminology but was also susceptible (in England, at least) to very inconvenient confusion with the word awful.

So I invented the word rhotic, with derivatives such as rhoticity, non-rhotic and, for accents like Southampton that additionally impose r-colouring in cases where it is historically unjustified, hyperrhotic.
Since then my coinage has not only been widely taken up but has also had a second meaning added, ‘exhibiting r-colouring’. This has also been made into a noun ‘rhotic’, meaning any kind of r-sound. Here’s Wikipedia.
Rhotics are … generally found to carry out similar phonological functions and have similar phonological features across different languages.

Ladefoged and Maddieson’s book The Sounds of the World’s Languages (Blackwell, 1996) has a chapter entitled Rhotics.
The new OED formulation does not recognize the noun.

Some people did not like my coinage. But it’s too late now. You’re stuck with it.

Thursday, 8 July 2010

elision (not!)

Every now and again I notice the words elide, elision being used in a way that is quite different from how I would use them. And it’s always in the same place: in an editorial in the Guardian newspaper.

We had an example two days ago. There had been some discussion whether the people who invaded England in 1066 are better described as ‘French’ or as ‘Normans’. Correspondents had pointed out that the two terms cannot be regarded as synonymous.
A correspondence on the letters page wrestles with the question of whether the French or the Normans invaded England in 1066, and whether there is any difference. As some contributors have pointed out, the elision of the French and the Normans is too crude.

The writer is clearly using elision here to mean something like ‘confusion, conflation, confounding’ of the two categories.

Another example of the same thing, this time involving the verb, is dated 12 April 2010 and bears the byline of Beatrix Campbell.
… towards the end of the 20th century within a single generation the numbers marrying halved, the numbers divorcing trebled, the proportion of children born outside marriage quadrupled.
Intimacy, however, did not diminish and parenting – as commitment, care and companionship – has flourished.
Yet, Tories subliminally elide these changes with the collapse of civilisation as we know it.

As far as I can see, no dictionary includes this meaning. Every dictionary I can lay hands on defines elision as ‘omission’ or words to that effect, particularly the omission of a vowel or syllable, or sometimes of a passage in a text. Correspondingly, to elide is to omit a vowel or syllable by elision.
1. The action of dropping out or suppressing: a. a letter or syllable in pronunciation; b. a passage in a book or connecting links in discourse. Also, an instance of either of these. Also fig.

That is how we use the term in phonetics, as when we refer to the possible elision of the t in next when we say the next day ðə ˈneks ˈdeɪ, or of the h in him when we say I’ve seen him aɪv ˈsiːn ɪm. (Contrary to my borrowed illustration above, you can’t elide the final t in night naɪt, though you can make it glottal, naɪʔ.)

See also the blog entry for 7 May 2010.

I consulted the Guardian in the person of my former student David Marsh, author of the Guardian’s Style Guide. He replied
In answer to your question, it is the journalist (and editors/subeditors) who don't know the meaning of the word, which still means (or should mean) what you understand it to mean.

The OED reveals a different misuse of the term four centuries ago. In 1626 Bacon, in an early debate about the mechanism of speech production, criticized another writer (Boyle?) for attributing sound to ‘elision of the air’.
The Cause given of Sound, that it should be an Elision of the Air (whereby, if they mean anything, they mean Cutting or Dividing, or else an Attenuating of the Air) is but a Terme of Ignorance.

Let’s have no more Termes of Ignorance in the Guardian or anywhere else.

Wednesday, 7 July 2010

hej, sokoły!

The choir I sing in makes an annual trip abroad. This year we are going to Warsaw (though on this occasion without me). Accordingly, we are learning a song in Polish to supplement our repertoire.
Żal, żal za dziewczyną,
Za zieloną Ukrainą,
Żal, żal serce płacze,
Już jej więcej nie zobaczę.

Hej, hej, hej sokoły
Omijajcie góry, lasy, doły.
Dzwoń, dzwoń, dzwoń dzwoneczku,
Mój stepowy skowroneczku.

There are two slight problems.
First, the music score we have been given omits all the diacritics in the Polish text. I don’t know if this is because the Sibelius software we use is not Unicode-compliant, or (more likely) because the person preparing the score didn’t know how to input Polish characters.

The reading rules for Polish, though unfamiliar to English eyes, are straightforward. Given the spelling, you can predict the pronunciation with some confidence — provided the diacritics are there. If they’re not there, you can’t.

The other problem is that the chorus member teaching us the Polish pronunciation, although he’s doing his best, is not a native speaker and not a phonetician.

He’s given us a Pronunciation Guide, consisting of the words of the song (with diacritics) supplemented by a respelling using English spelling conventions. It looks like this:
Żal, żal, za dziewczyną, za zieloną Ukrainą, żal, żal serce płacze,
Zhal Zhal zah jehv-che-nahw, zah jeh-lo-nahw Oo-crah-e-nahw, zhal zhal sertseh pwacheh

Not a disaster — but if I had been consulted about the respelling, that’s not exactly what I would have come up with. For example, the word dziewczyną is pronounced authentically, to the best of my knowledge, as dʑɛfˈtʃɨnɔɯ̃ (or you could write the second affricate as ʈʂ). The nearest I would be able to get with English spelling would be ‘jeff-chin-ong’. I think that would be closer than 'jehv-che-nahw'.

I know that Polish has obligatory obstruent voicing assimilation, which is why I respell the first syllable of dziewczyną as ‘jeff’. However our teacher is convinced that the fricative should be voiced (‘jehv’), which shows that he is dazzled by the Polish spelling (w) rather than listening to how Poles really say the word. It’s the same with ą: the respelling with ‘ah’ reflects the Polish letter but not the Polish sound.

Then in the next line there’s the word już ‘already’. It’s been respelled for us as you-zh. But I know that Polish has obligatory final obstruent devoicing. I would have written yoosh.
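
The two rules I have just invoked can be put into a rough sketch. What follows is my own illustration in Python, not anything our teacher or a Polish textbook supplies. It assumes the word has already been split into segments, it uses hypothetical names (DEVOICE, VOICE, surface), and it deliberately ignores complications such as progressive devoicing and anything that happens across word boundaries.

```python
# Voiced obstruents and their voiceless counterparts (a simplified inventory,
# assumed here for illustration only).
DEVOICE = {
    "b": "p", "d": "t", "ɡ": "k", "v": "f", "z": "s", "ʒ": "ʃ", "ʑ": "ɕ",
    "dz": "ts", "dʒ": "tʃ", "dʑ": "tɕ",
}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}
VOICED = set(DEVOICE)
VOICELESS = set(VOICE) | {"x"}


def surface(segments):
    """Apply final devoicing, then regressive voicing assimilation."""
    out = list(segments)

    # Rule 1: a word-final voiced obstruent is devoiced.
    if out and out[-1] in VOICED:
        out[-1] = DEVOICE[out[-1]]

    # Rule 2: working right to left, an obstruent takes on the voicing
    # of an immediately following obstruent.
    for i in range(len(out) - 2, -1, -1):
        following = out[i + 1]
        if out[i] in VOICED and following in VOICELESS:
            out[i] = DEVOICE[out[i]]
        elif out[i] in VOICE and following in VOICED:
            out[i] = VOICE[out[i]]
    return out


# dziewczyną: the spelling has w (v), but it stands before a voiceless affricate.
print(surface(["dʑ", "ɛ", "v", "tʃ", "ɨ", "n", "ɔɯ̃"]))
# już: the spelling has ż (ʒ), but it is word-final.
print(surface(["j", "u", "ʒ"]))
```

On these assumptions the v of dziewczyną comes out as f and the final ʒ of już as ʃ, which is what lies behind my ‘jeff’ and ‘yoosh’.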

These details clearly don’t matter in the great scheme of things. Please don’t take this as a disparagement of the valiant efforts of our teacher. I’m sure that the Polish audience will be delighted that we are attempting a Polish song they will know, and they’ll probably all join in and drown us out anyway.

Tuesday, 6 July 2010

Calibri and Cambria

With each new version of Windows, more and more of the fonts supplied include phonetic symbols. And the handling of diacritics is gradually improving, too.
Michael Ashby wrote to me about updating the advice we give to students of phonetics about Unicode and fonts.
Students may no longer need to download any fonts at all, since
as you know Vista and Windows 7 come with a number of Unicode phonetic fonts. Have you evaluated them at all? I don't even know for certain how many there are. The Arial, Tahoma, Cambria, and Times New Roman fonts don't look bad, and (at least with Office 2007) even the diacritics seem to behave intelligently. But in Calibri, small cap i is without its top and bottom, and a general thing about all these Microsoft fonts is that the stress marks are a bit small, and spaced too far to the left. What do you think?

With Vista (in the UK release that I have) we got IPA symbols included in five bundled fonts: Times New Roman, Arial, Tahoma, and Courier New, as well as the old Lucida Sans Unicode (which in XP days was the only bundled font with IPA). Now, with Windows 7, Calibri and Cambria have been added.

Here is what they look like, as rendered on-screen by Microsoft Office 2007. (I haven’t got Office 2010.) I have also included Doulos SIL (downloadable from SIL), for comparison.
At first glance the new phonetic fonts, Calibri and Cambria, seem quite good. Look, however, at these further samples. You can see that, as Michael says, Calibri (like the font most of you will see in this blog) has a small cap i without the serifs it really needs for good legibility. It also has too much space before the stress mark, after the ɡ and after the length mark. In Cambria the serifs and stress mark are satisfactory, but the character spacing in the word ɡɑːdn̩ still leaves a lot to be desired.

David Bond wrote to me complaining that with nasalized vowel symbols the tilde appears to the right of the base character on his screen and printout, instead of centred over it. I told him that this is sometimes a matter of the browser and word processor software he uses, and that to see the diacritics properly displayed he should install newer versions. But in the case of Courier New (see sample), it is the font that is at fault.

Getting back to the fonts, perhaps for the moment it’s safer to stick to the SIL fonts. Unlike Microsoft, SIL understands how phoneticians use symbols and what they want of them.

Monday, 5 July 2010

glottal t in AmE

Jarek Weckwerth, commenting on Friday’s blog, said
There's a rather widespread misconception … that there's little (or in fact no) glottalisation in American English. This even extends to my fellow (NNS) pronunciation teachers. In my experience, this is blatantly untrue... But very little has been published on this -- probably fewer than five articles. The amount of attention in the literature that glottalisation in the UK has received is larger by several orders of magnitude. One reason is, of course, the social importance that is attached to it in the UK. Glottalisation seems to be far less sociophonetically marked in the US.
and added the kind invitation
maybe our host would be so kind as to write a whole separate post about it?

I don’t think I can really add much to what I said in the ‘language panel’ on Glottal stop in LPD, where I wrote
ʔ is found as an allophone of t only
• at the end of a syllable, and
• if the preceding sound is a vowel or sonorant

Provided these conditions are satisfied, it is widely used in both BrE and AmE where the following sound is an obstruent

football ˈfʊt bɔːl → ˈfʊʔ bɔːl
outside ˌaʊt ˈsaɪd → ˌaʊʔ ˈsaɪd
that faint buzz ˌðæt ˌfeɪnt ˈbʌz → ˌðæʔ ˌfeɪnʔ ˈbʌz

or a nasal

atmospheric ˌæt məs ˈfer ɪk → ˌæʔ məs ˈfer ɪk
button ˈbʌt ən → ˈbʌʔ n
that name ˌðæt ˈneɪm → ˌðæʔ ˈneɪm

or a semivowel or non-syllabic l

Gatwick ˈɡæt wɪk → ˈɡæʔ wɪk
quite well ˌkwaɪt ˈwel → ˌkwaɪʔ ˈwel
brightly ˈbraɪt li → ˈbraɪʔ li
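
For anyone who likes to see such a rule spelt out mechanically, here is a minimal sketch in Python. It is my own illustration, not anything taken from LPD. It assumes LPD-style transcriptions in which spaces mark syllable boundaries, and the segment sets and the function name glottalize are hypothetical simplifications. It does not attempt the schwa loss and syllabic nasal seen in button, and it leaves a t before a pause alone, since the environments listed above do not cover that case.

```python
STRESS = "ˈˌ"
VOWELS = set("iɪeɛæaɑɒɔʊuʌəɜ")
SONORANTS = set("mnŋlrwj")
OBSTRUENTS = set("pbtdkɡgfvθðszʃʒh")
NASALS = set("mnŋ")
SEMIVOWELS = set("wj")


def glottalize(transcription: str) -> str:
    """Replace syllable-final t with ʔ in the environments listed above."""
    syllables = transcription.split()
    result = []
    for i, syl in enumerate(syllables):
        # The next syllable, with any stress mark removed ("" if there is none).
        nxt = syllables[i + 1].lstrip(STRESS) if i + 1 < len(syllables) else ""
        if syl.endswith("t") and len(syl) >= 2 and nxt:
            prev = syl[-2]
            # Condition: the preceding sound is a vowel (or its length mark)
            # or a sonorant.
            prev_ok = prev in VOWELS or prev == "ː" or prev in SONORANTS
            first = nxt[0]
            # Condition: the following sound is an obstruent, a nasal,
            # a semivowel, or an l that has a vowel after it (non-syllabic).
            next_ok = (
                first in OBSTRUENTS
                or first in NASALS
                or first in SEMIVOWELS
                or (first == "l" and len(nxt) > 1)
            )
            if prev_ok and next_ok:
                syl = syl[:-1] + "ʔ"
        result.append(syl)
    return " ".join(result)


for example in ["ˈfʊt bɔːl", "ˌðæt ˌfeɪnt ˈbʌz", "ˈɡæt wɪk", "ˈbraɪt li"]:
    print(example, "→", glottalize(example))
```

Run on the examples above, this reproduces the right-hand forms for football, that faint buzz, Gatwick and brightly.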

Things are more complicated than that in real life, of course. In particular, you often get an alveolar gesture accompanying the glottal closure, so that in football the speaker can correctly report a contact between the tongue tip and the alveolar ridge, while the hearer can correctly report hearing a glottal stop.

Am I right in thinking this is the case in most AmE just as in most BrE?

An exploded alveolar plosive in football sounds “overarticulated” for my kind of English. Complete absence of glottalization here tends to sound South African or Welsh.

Friday, 2 July 2010

cockney then and now

Thanks to Amy Stoller for pointing us to a video clip of Julie Andrews being coached for her Cockney accent for the stage production of My Fair Lady, 1956. (The dialect coach is American…). As Amy says, “it was clearly staged for the cameras to publicize the show, but is a find nonetheless”.
That kind of stage Cockney was, I think, always a little too good to be true. But in any case that sort of Cockney, Cockney of the kind I describe in my Accents of English book, Cockney as researched by the likes of Eva Sivertsen, is disappearing from the traditional area within earshot of Bow Bells in Cheapside as a result of demographic changes. Its speakers have largely moved out to outer boroughs and to new towns in Essex and Hertfordshire.

Yesterday’s London Evening Standard (now a freesheet and much improved on its earlier, paid, version) carried an article on the subject.
As its traditional speakers emigrate to Essex and Hertfordshire, the 650-year-old accent is dying off in London, to be replaced by multicultural London English, heavily influenced by West Indian patois, Bangladeshi and remnants of old cockney. The dialect won't die off altogether. It will survive in the descendants of those Home Counties émigrés. You can hear it happening today: teenagers in Essex speak like Henry Cooper and Barbara Windsor; in Lambeth, they are more likely to sound like Ali G.

The article is written in a popular style but based on serious academic research by Paul Kerswill and his team working on “Multicultural London English”. (Blog, 16 Nov 2006 and 25 Mar 2008)
Cockney in the East End is now transforming itself into multicultural London English, a new, melting-pot mixture of all those people living here who learned English as a second language. Ever since the 1960s, these areas of London have become home to immigrants from the West Indies, the Indian subcontinent and many other places, from South America and Africa to Central Asia and the Far East. Some of these people spoke the kind of English typical of their original countries. Others couldn't speak English, so children were speaking their native language at home but were learning English at school.
“This means that children were no longer learning their English dialect from local cockney speakers but from older teenagers, who themselves had developed their English in the linguistic melting pot. Out of all this, the new English which we call multicultural London English emerged, and this is the sound of inner-city London we hear today.”
This hybrid, known in slang terms as “Jafaican”, is a mixture of cockney, Bangladeshi and West Indian. Its leading fictional exponent is Ali G; a genuine user is Dizzee Rascal, the 24-year-old rapper, born in Bow, and a supporter of the cockney football team, West Ham.
The newspaper article chooses, perhaps wisely, not to attempt any description of the phonetic changes involved in the switch from the old Cockney to the new MLE, but entertains us instead with a Jafaican glossary.

Creps — trainers
Endz — area, estate, neighbourhood
Low batties — trousers that hang low on the waist
Skets — derogatory term for loose girls ...

I have to say, though, that young Londoners I encounter don’t seem to use most of these expressions, at least when speaking to me: only ɑːks for ask and ˈɪnɪʔ innit as the universal question tag.

Thursday, 1 July 2010


Thanks to Giridhar Rao for alerting me to this: a news website called DNA, based in Mumbai, India, reports that
94 per cent of the students in primary schools across the country cannot recognise the English alphabets.

Er, come again?
Which English alphabets would that be? When I last checked, we only use one alphabet, the one comprising the letters A to Z, the one also known as Latin script.
The writer can hardly be talking about Indian children being ignorant of the International Phonetic Alphabet or of alphabetic shorthand systems.
No, by an “alphabet” the writer of this report means what in core standard English is called a “letter”. The children can’t recognize the letters of the English (Latin) alphabet.

Isn’t it interesting that a reporter on an English-medium news website, writing a shock-horror article about falling standards of literacy, should confuse ‘alphabet’ and ‘letter’ in this way? This is a mistake I associate more with Japanese, Korean, or Chinese learners of EFL.

Or perhaps it’s not considered an error in Indian English.

Given the role of English in India, we conventionally call Indian English a variety of English as a Second Language, not English as a Foreign Language. But it’s worth remembering that the number of Indians speaking English as their first or only language is under a quarter of a million (1991 census), or, putting it another way, less than the population of Barbados. For the remaining ninety million or so Indians who reportedly speak it, it’s not their primary language. So it’s not surprising that in it we encounter some typical EFL errors.