Back in the day, when I was doing my technology baccalaureate (okay, I am showing off; I mean BTech), a team among my batchmates was working on a text-to-speech generator. This was in 1998. The concept certainly sounded both good and novel, until I graduated and was introduced to BonziBUDDY in 2000. At that point my friends’ idea only felt good, not novel. Also fascinating were concepts like voice recognition software and speech processors, which essentially worked in the opposite direction.
The trouble with speech processors is that they have a fairly complex problem to solve. Even for a single language like English there are so many different pronunciations and accents that they can drive you crazy. I have already documented “schedule” being pronounced as “shedyule” and “skejyule”. In addition, I learnt the hard way that the Brit / Indian pronunciation of geyser is actually offensive in the US. Firstly, you are supposed to say “hot water spring”, and if you do say “geyser” you are not supposed to pronounce it as “geezer”, but “guyzer”. Forget the difference between British and American pronunciations: within each country where English is spoken you have multiple pronunciations and accents. Followers of cricket will chuckle at the recollection of Geoffrey Boycott, a staunch Brit, saying rubbish (“roobish”), cricket (“crickit”) and wicket (“wickit”). And when you go to a country like India, the pronunciations and accents can have you in splits. Just imagine a simple line, “Will you have some food?”, being said by a stereotypical Bengali, South Indian and North Indian:
Bengali: Ooill you hab some phood?
South Indian: Vill you haoow some foodda?
North Indian: (Forget it – they would bypass English and ask you the same question in Hindi)
(Apologies if the above offends you, but I know of several people in each of the buckets above who speak in this manner)
Which is why I find it really amusing to see the ubiquity of automated voice-based response systems provided by different companies. There are so many reasons why it is not a good idea to let loose half-baked speech processing software on an unsuspecting audience that it makes you wonder about the people in charge. One of the funniest commercials highlighting this was an ad for a Kyocera phone:
Another is the more recent Mac vs. PC ad from Apple:
Some speech recognition software is heuristic in nature, however, and that helps improve the process. But whatever the case, my luck with automated speech processing has been disastrous, to say the least. Most of the time I end up hating an automated speech-processing response system within 30 seconds of encountering it. Here are some snapshots from different telephone calls. A lot of these come down to the fact that I have a tough time doing the American accent subconsciously, and I avoid doing it consciously.
System: Please say what you would like to do?
Me: Talk to a representative
System: Okay, so you want to go to a mail order pharmacy. Is that correct?
~~~
I have a subscription to Vonage Visual Voicemail. It tries to transcribe a voice message to text and sends me an email. In some cases it butchers a message left by an American, but that is nothing compared to the hideous mockery that takes place with some messages left by Tanuka’s Indian friends. The messages are in English, with sporadic Indian interjections:
- A devastating massacre:
Date : May 19 2009 03:41:12 PM
From : 1408xxxxxxx
To : Sayontan Sinha (1408yyyyyyy)
"Hey, hi Tanuka. This is (??) here. We are leaving in another 10 minutes because (Mojin Paki Pasbi?) and Toni Pasbi will go to library and return it. I just wanted to... that mushroom dried scar is simply rich of that side(??) get the door speaker I wanted badly. Okay when you return just give a call. Bye. "
--- Brought to you by Vonage ---
- Or an abject surrender:
Date : May 07 2009 10:49:32 AM
From : 1408xxxxxxx
To : Sayontan Sinha (1408yyyyyyy)
We're sorry. We were unable to transcribe this message. You will not be charged for this message. Please listen to your voicemail.
--- Brought to you by Vonage ---
~~~
And the pièce de résistance:
System: Before we start, can you please say your first name?
Me: Sayontan
System: You said Frying Pan. Is that correct?
Ha ha ha! They make a good mockery of voice recognition in the movie: “Burn After Reading” too…!
Koke,
I haven’t seen “Burn After Reading” yet, but I will try to catch it.