How to pronounce English like a native
speaker of English
We’ve already dealt with the topic Flow
Production Techniques in Lesson 2. There we saw several
techniques that would help you speak English by making the
end of one word flow into the beginning of the
next word. While dealing with that topic, I told you that
the sounds made by five of the letters in English (a,
e, i, o and u) are called vowels and that the sounds
made by the remaining twenty-one letters are called consonants.
Now, when you speak, words come together, and when words come
together, four different types of junctions are formed. In
Lesson 2, we saw that these junctions are:
• Consonant-consonant junctions.
• Consonant-vowel junctions.
• Vowel-vowel junctions.
• Vowel-consonant junctions.
And in Lesson 2, you learnt certain important
techniques that would help you utter one word after another
smoothly — without the junctions between every two of them
causing problems and forcing you to falter. Now this is what
I am going to do through the present Supplement: I’m going
to deal with the Flow Production Techniques at an advanced level.
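If it helps you to see the four junction types laid out mechanically, here is a rough Python sketch that names the junction between every two neighbouring words in a word group. Please treat it only as an illustration: it looks at spelling (the letters a, e, i, o and u), which is a simplifying assumption, because real junctions depend on sounds and not on letters (a word like "have" ends in a consonant sound although its last letter is 'e'). The function names are made up for this example.

```python
# Simplified sketch: classify word junctions by LETTERS only.
# Assumption: a, e, i, o, u count as vowels; every other letter is a consonant.
VOWEL_LETTERS = set("aeiou")


def letter_kind(ch: str) -> str:
    """Return 'vowel' for a, e, i, o, u and 'consonant' for any other letter."""
    return "vowel" if ch.lower() in VOWEL_LETTERS else "consonant"


def junction_type(first_word: str, second_word: str) -> str:
    """Name the junction formed where first_word meets second_word."""
    # Keep only letters, so apostrophes and other punctuation are ignored.
    left = [c for c in first_word if c.isalpha()]
    right = [c for c in second_word if c.isalpha()]
    if not left or not right:
        raise ValueError("both words must contain at least one letter")
    return f"{letter_kind(left[-1])}-{letter_kind(right[0])}"


def junctions(word_group: str) -> list[tuple[str, str, str]]:
    """List (word, next word, junction type) for each neighbouring pair."""
    words = word_group.split()
    return [(a, b, junction_type(a, b)) for a, b in zip(words, words[1:])]


if __name__ == "__main__":
    for a, b, kind in junctions("It's all right"):
        print(f"{a} + {b}: {kind}")
```

For "It's all right", the sketch reports a consonant-vowel junction between "It's" and "all", and a consonant-consonant junction between "all" and "right".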
Connected speech and pronunciation
Let me explain. When you watch
an English film, are you able to understand what the people
in that film are saying? When you listen to native speakers
of English having a conversation, are you able to understand
what they are saying? Well, many people aren’t able to. And
in this Supplement, I’m going to tell you what one of the
chief reasons is.
Well, simply put, this is what happens: When
you listen to them, you hear several clusters of sounds that
are unintelligible to you. That is, you’re not able
to make out what words these sound clusters represent.
Although they actually stand for everyday words that you know
very well, these sound clusters don’t sound to you like
anything you know. For example, suppose that you hear a native
speaker of English say something like this:
(Note: As I’ve already told you in Lesson
3, ‘ə’ stands for the ‘schwa’.
This is a vowel sound — but not a distinct one. It
occurs in the unstressed syllables in words. This
is the sound of ‘a’ in “above”, “about”, etc., that of ‘e’
in “water”, that of ‘i’ in “possible”, that of ‘o’ in “actor”,
and that of ‘u’ in “suppose”. For all practical purposes,
these sounds are one and the same.)
What do you think he was saying? Well, if
he had written the same thing down (rather than uttered
it aloud), this is how it would’ve looked:
He isn’t your sort of man.
Or suppose that you hear him say things like these:
• ’snochos.
• ’so’right.
• ’sipmatter?
• ’kyou.
• Praps.
If he had written these things down, they
would’ve looked as follows:
• It’s not yours.
• It’s all right.
• What does it matter?
• Thank you.
• Perhaps.
A foreign learner finds spoken word groups
like these difficult to understand (when a native speaker
of English says them aloud). This is mainly because of two reasons:
1). He (the foreign learner) has had his
training mainly in written English, and his eyes
are used to seeing spaces between every two written
words. And he gets confused and somewhat disoriented when
he hears a group of words uttered as a single unit
— without even the briefest possible pause corresponding
to those spaces.
2). He has learnt to pronounce every word
individually, and he expects that a particular word will
sound the same whether it’s pronounced individually (in
isolation) or as part of a word group (in connected speech).
As far as the first point is concerned,
understand this: Blank spaces between the words in a written
word group have no importance when you utter that word group
in connected speech. In connected speech, there are no
pauses corresponding to the spaces between written words. No.
In connected speech, there are normally no pauses between
two neighbouring words in a word group (except when you make
use of a pause as a device in overcoming hesitation or as
a device that helps you compose and speak at the same time).
In general, there are only pauses between word groups,
and not between words. And the words in a word group
are spoken as a single, tight, well-knit unit, having no gaps
between them. You can even say that, in speech, a group of words
is treated as equivalent to a single word — and so the spaces
you see between the words (when you write that word group
down) have no relevance at all when you utter them in connected speech.
Now let’s take up the second point.
In a way, this entire lesson is going to be a detailed study
of this (second) point.
At the outset, there’s something you should
understand firmly: Words in English don’t sound the same when
they’re pronounced individually (in isolation) as when they’re
pronounced as part of a word group in connected speech. No.
A word is pronounced in one way when it’s uttered in isolation
— that’s its ideal pronunciation. And it’s often pronounced
in a different way when it’s uttered in combination
with other words — that’s its pronunciation in practice.
Tongue movement and phonetic simplification
You see, when you utter a consonant
or a vowel individually, your tongue gets into the
ideal position that’s required to produce that sound. When
you utter another consonant or vowel after that, the tongue
will have to get back from that ideal position, and
then get into the ideal position required to produce the new
sound. This is only possible when you utter words individually
in isolation, because then you’ll be uttering the sounds
slowly, and your tongue will have enough time to move from
ideal position to ideal position. But when words are combined
(and uttered aloud) in speech, a cluster of consonants or
a cluster of vowels comes together. And your tongue will have
to move from one position to another in quick succession.
And in that process, the positions to which the tongue moves
will often not be the ideal positions required to produce
the various sounds. So the consonant and vowel sounds
the tongue produces in connected speech will be different
from the ideal sounds. (The quality of the sounds the
tongue actually produces thus depends on the nature of
the neighbouring sounds.)
In English, stressed syllables are
normally uttered slowly and clearly, and unstressed
syllables are always uttered quickly and far less clearly.
So when you utter stressed syllables in speech, there’ll
be time enough for your tongue to get into the ideal positions
required to produce the ideal consonant sounds and vowel sounds.
But when you utter unstressed syllables, your tongue
won’t have enough time to get into the ideal positions required
to produce those syllables, because they’re uttered quickly.
So when you utter a cluster of unstressed syllables, your
tongue gets into such positions as it finds easier to get
into from the preceding positions, and not into the ideal
positions. As a result, a cluster of unstressed syllables
often sounds different in speech from how it would sound
if those syllables were pronounced slowly one after another.
As it’s difficult (and sometimes impossible)
for the tongue to move from ideal position to ideal position
in connected speech, it only moves from possible position
to possible position, and each consonant and each vowel
in a cluster will have to adjust to the sounds of the neighbouring
consonants and vowels. In this process of mutual adjustment,
this is what happens: The sounds of various consonant clusters,
vowel clusters and consonant-vowel clusters become different
from their ideal sounds — because the sounds that the tongue
produces are those that it finds easier to produce rather
than the ideal sounds. And that’s not all. Many consonants
and vowels even get left out, and are not pronounced. In other
words, in the process of mutual adjustment among neighbouring
consonants and vowels, a lot of phonetic simplification (of
consonant and vowel clusters) takes place.
Remember this: The tongue sometimes finds that it’s easier
to utter a cluster of consonants or vowels if it modifies
the sounds of some of them or leaves them out altogether (without
pronouncing them), and that’s when all these phonetic
changes happen. So if you want to understand a native speaker
of English, you must never expect him to pronounce words with
the same precision as he would if he were asked to pronounce
them individually. Expect that the shapes of most of the words
will change in speech. And you should have a clear idea of
the sort of changes that can be expected. And this Supplement
will help you here.
Phonetic simplification and fluency
Now as far as fluency development is
concerned, how are these phonetic changes important? In Lesson
3, we noted the following points:
• English is a semi-musical language.
• You should speak English by uttering
stressed syllables very clearly, and unstressed syllables
far less clearly.
• This contrast between stressed
syllables and unstressed syllables is the key to
the rhythm of English speech.
• You should speak English in stress-units called “feet”.
• Each “foot” is made up of a stressed
syllable which may (or may not) be followed by one
or more unstressed syllables.
• The number of syllables a foot has varies
from foot to foot within an idea unit. But you should only
take approximately the same amount of time to utter each
foot — no matter how many unstressed syllables a foot has.
• You should utter stressed syllables at
fairly equal intervals of time.
Now, for example, in an idea unit that you
utter, one foot may only have a single syllable (a stressed
syllable), another may have two syllables (a stressed syllable
and an unstressed syllable) and another may have four syllables
(a stressed syllable and three unstressed syllables). How
can you utter each of these feet by giving each the same amount
of time? We’ve already seen in Lesson 3 that you can do this
by doing two things:
1). You should utter the stressed
syllables alone clearly, and you should play down the unstressed
syllables by not uttering them clearly.
2). And you should utter the unstressed
syllables (that follow a stressed syllable) as fast as is
necessary to allow the next stressed syllable to come up
at the next rhythmic beat. (See Lesson 3 for details and
Now when you try to utter a foot containing,
say, as many as four syllables within the same length of time
as a foot containing, say, a single syllable, you can imagine
what’s going to happen to the three unstressed syllables in
that foot. Obviously, they’ll have to be pronounced so quickly
that they run into one another. And then, it’s only
natural that these two things happen:
1). Some of the consonants in those
unstressed syllables undergo a change in sound (to suit
the neighbouring consonants) or get dropped altogether from the utterance.
2). And some of the vowels in them
get weakened or dropped from the utterance.
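To get a feel for how hard the unstressed syllables are squeezed, here is a small arithmetic sketch in Python. It is only a rough illustration under two assumptions of mine: that every foot lasts 600 milliseconds (a made-up figure, not a measurement), and that the syllables in a foot share that time evenly. In real speech the stressed syllable takes the larger share, so the unstressed syllables get even less time than the sketch suggests.

```python
# Idealised model of equal-length feet.
# Assumptions: each foot lasts the same time (600 ms is an illustrative
# figure only), and the syllables in a foot share that time evenly.
def time_per_syllable(syllables_in_foot: int, foot_duration_ms: float = 600.0) -> float:
    """Average time available to each syllable when every foot lasts the same."""
    if syllables_in_foot < 1:
        raise ValueError("a foot has at least one (stressed) syllable")
    return foot_duration_ms / syllables_in_foot


if __name__ == "__main__":
    # One-syllable, two-syllable and four-syllable feet, as in the example.
    for n in (1, 2, 4):
        print(f"{n} syllable(s): about {time_per_syllable(n):.0f} ms each")
```

Under these assumptions, a single-syllable foot has the whole 600 ms to itself, while each syllable in a four-syllable foot gets only about 150 ms. That is why the unstressed syllables run into one another.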
Phonetic changes like these are quite normal
in all styles of speech in English — formal, informal
(= casual) and neutral styles. You can notice them whenever
a native speaker of English speaks. Yes, whenever —
because all styles of speech in English are subject to the
pressures of rhythm and stress, and it’s these pressures that
make it difficult for the tongue to move into ideal positions
during a long utterance and thus bring about the phonetic
changes. These phonetic changes happen even when non-native
speakers speak English, but many non-native speakers
(wrongly) think that these changes are abnormal — and they
try hard to deliberately avoid these changes. And this is
what happens then:
1). The (unnecessary) effort they make
to avoid the phonetic changes interrupts the natural flow
of speech when they speak.
2). This effort shifts their concentration
from what they are saying to how they are
saying it, and their attention gets diverted away from the
meaning of their message to the details of pronunciation.
This stops them from concentrating on composing the content
of their message, and they falter.
So if you want to be fluent in spoken English,
remember this: You should never make a conscious effort
to resist the natural tendency of unstressed syllables to
undergo phonetic simplification. Instead, you should give
in or yield to this phenomenon.