Threads on Speech/Text Recognition and Conversations With Computers
Bob Jensen at Trinity University
This is a page that has not been updated since 2006
Many of the links are now broken.
Many, albeit not all, of my comments are out of date.
Speech/text recognition has greatly improved in the past few years!
I decided to leave the page on the Web since it may be useful from a historical perspective.
Introductory Remarks by Bob Jensen
Sending and Receiving Email Messages Via the Telephone
Telephone Conversations With Audio Portals
Speech Recognition Software
Text Reading Software
Free Long Distance Telephoning via Computers or Computers/Telephones in Combination
Introductory Remarks by Bob Jensen
Skype --- https://en.wikipedia.org/wiki/Skype
Audio to Audio Language Translation
MIT: For decades, machine-learning experts have tried to perfect language translation. Now Microsoft’s making strides with Skype ---
Dragon Speech Recognition --- https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
Software That Translates Audio to Written Text (and vice versa)
MIT: Update in Dragon Speaking ---
I used Dragon on two previous computers and found it to be about 90% accurate. The key is the recommended voice training that should precede the use of this speech recognition product. Dragon will train within reason to accents and dialects. For example, David Raggay has a British/Trinidad accent. Dragon should do quite well when trained to his accent. I suspect it has a bit harder time with some other accents and dialects, but with sufficient training Dragon should catch on to those variations.
I used Dragon in failed efforts to translate video tape audio into text. Those of you who’ve known me over the years know that I carry a video camera to almost every session I attend at a conference. In most instances speakers allow me to video tape their presentations. In some cases I also got permission to post their remarks at my Web site. There are many examples at my Web site. For example, you can read Paul Pacter’s presentation about the early development of IAS 39 when he was still on the staff of the International Accounting Standards Board --- http://faculty.trinity.edu/rjensen/acct5341/speakers/pacter.htm
The problem with using speech recognition transcription software for obtaining text of conference presentations is that you generally cannot have conference speakers do a speech recognition training session before they make a presentation. Without such training, Dragon or any other speech recognition system will have trouble with accents and dialects.
In general, whenever I wanted to transcribe my video tapes of presentations into text, Dragon failed badly. My poor human secretaries had to pore over the video tapes themselves in very tedious efforts to transcribe the speakers’ comments into text. And that was only about 60% accurate because my secretaries were good at their jobs but were not familiar with the technical terms of most presentations I wanted them to transcribe, including technical terms that are not in the dictionary. My secretaries also had trouble with accents, dialects, and voice fade outs on the video tapes.
I do not yet have Vista on any of my computers and am usually not in a great hurry to catch the brass ring on any Microsoft upgrade. Microsoft always releases products before their time. I’d rather wait until you folks iron out many of the bugs.
My problem with speech recognition software is that I found dictation to be a slow process when combined with editing needed after the text files were generated. The problem is more me than the software. I tend to type and think more efficiently than I speak and think --- Dahh!
A real problem can be Internet links (URLs). Most everything I deal with these days is accompanied by a slew of URLs. It’s highly impractical to read links and other scholarly references into speech recognition software. When writing directly, I simply cut and paste URLs and references.
When you get down to it, a huge problem with speech recognition is the lack of a “Cut and Paste” clipboard. Sure, you can cut and paste after you have a dictation draft on your screen, but what a pain this becomes. It’s like having to eat stale leftovers.
In the final analysis I found that speech recognition software installed on my system ended up being rarely used. My secretaries were better but far from perfect when transcribing my video tapes into text. And they really, really hated those job assignments!
The speech recognition world is and probably always will be a very limited world. It works best when it is fully trained to your voice before being used for transcription. It works poorly without this tedious training beforehand.
By the way, I donated hundreds of video tapes I recorded over the years to the Accounting History Library at the University of Mississippi. You can go to this library and play any of these tapes. If you’re interested in doing so, contact Dale Flesher at the University of Mississippi --- http://www.olemiss.edu/depts/accountancy/facstaff.html
Here's how speech recognition works and the problems it faces ---
I don’t think it’s possible to get decent speech digitization without at least some voice training of the machine. The software grows better with time, but it most likely will never improve to the point where minimal training can be eliminated. Of course, you can try to tolerate what you capture without training and then tediously try to fix it up later on.
The problem is that even humans need help in speech transcription. I finally resorted to having my secretaries transcribe portions of my video tapes, but they had great difficulties with some speakers. You’re going to get errors no matter what. Of course, if you have lots of money the pros can fix up your initial transcription errors. One of the best examples of speech transcription is the Northwestern University Oyez captures of Supreme Court session analog recordings, but Northwestern had millions of dollars for this great project ---
They also had the transcribers’ records, but they had to fix up the transcribers’ errors.
One of the huge problems with speaking in English is that this difficult language has many words that sound and/or look the same but are totally different in context. It really helps if the training session includes some of these words to put into context in advance.
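The role of context can be illustrated with a toy sketch. It is not how Dragon or any other product works internally; it only shows how counts of neighboring words from a training text could be used to pick among words that sound alike. The corpus, the candidate set, and the function names are all made up for illustration.

```python
from collections import Counter

# Tiny illustrative corpus (invented for this sketch, not real training data).
CORPUS = [
    "i want to go to the store",
    "that is too much to ask",
    "it is too late",
    "i have two dogs and two cats",
    "we need two more chairs",
    "she went to the desert",
]

# Words that sound the same and must be told apart by context.
HOMOPHONES = {"to", "too", "two"}


def bigram_counts(sentences):
    """Count how often each pair of adjacent words occurs in the corpus."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_homophone(prev_word, next_word, counts, candidates=HOMOPHONES):
    """Choose the candidate seen most often between prev_word and next_word."""
    def score(candidate):
        return counts[(prev_word, candidate)] + counts[(candidate, next_word)]

    # sorted() makes ties resolve alphabetically instead of arbitrarily.
    return max(sorted(candidates), key=score)


counts = bigram_counts(CORPUS)
print(pick_homophone("is", "much", counts))    # context: "is ___ much"
print(pick_homophone("have", "dogs", counts))  # context: "have ___ dogs"
```

With so little training text the sketch fails on any context it has never seen, which is the same reason more training material helps.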
English Can Be QUITE Confusing. TO write it correctly English can be TOO much with TWO or more meanings of the same spelled letters
Forwarded (mostly) by James Don Edwards
Subject: English Can Be QUITE Confusing.
Can you read these correctly the first time?
01) The bandage was wound around the wound.
02) The farm was used to produce produce.
03) The dump was so full that it had to refuse more refuse.
04) We must polish the Polish furniture.
05) He could lead if he would get the lead out.
06) The soldier decided to desert his dessert in the desert.
There is no desert on Mount Desert Island in Maine
07) Since there is no time like the present, he thought it was time to present the present.
08) A bass was painted on the head of the bass drum.
09) When shot at, the dove dove into the bushes.
10) I did not object to the object.
11) The insurance was invalid for the invalid.
12) There was a row among the oarsmen about how to row.
13) They were too close to the door to close it.
14) The buck does funny things when the does are present.
15) A seamstress and a sewer fell down into a sewer line.
16) To help with planting, the farmer taught his sow to sow.
17) The wind was too strong to wind the sail.
18) Upon seeing the tear in the painting, I shed a tear.
19) I had to subject the subject to a series of tests.
20) How can I intimate this to my most intimate friend?
Let's face it, English is a crazy language. There is no egg in eggplant, nor ham in hamburger; neither apple nor pine in pineapple. English muffins weren't invented in England nor French fries in France.
Sweetmeats are candies while sweetbreads, which aren't sweet, are meat.
We take English for granted, but if we explore its paradoxes, we find that quicksand can work slowly, boxing rings are square and a guinea pig is neither from Guinea nor is it a pig.
And why is it that writers write but fingers don't fing, grocers don't groce and hammers don't ham? If the plural of tooth is teeth, why isn't the plural of booth, beeth? One goose, 2 geese. So one moose, 2 meese? One index, 2 indices? If teachers taught, why didn't preachers praught? Doesn't it seem crazy that you can make amends but not one amend? If you have a bunch of odds and ends and get rid of all but one of them, what do you call it?
Sometimes I think all the English speakers should be committed to an asylum for the verbally insane. In what language do people recite at a play and play at a recital, ship by truck and send cargo by ship, have noses that run and feet that smell?
How can a slim chance and a fat chance be the same, while a wise man and a wise guy are opposites? You have to marvel at the unique lunacy of a language in which your house can burn up as it burns down, in which you fill in a form by filling it out, and in which an alarm goes off by going on.
English was invented by people, not computers, and it reflects the creativity of the human race, which, of course, is not a race at all. That is why, when the stars are out, they are visible, but when the lights are out, they are invisible.
PS. - Why doesn't Buick rhyme with quick?
You lovers of the English language might enjoy this:
There is a two-letter word that perhaps has more meanings than any other two-letter word, and that is UP.
It's easy to understand UP, meaning toward the sky or toward the top of the list, but when we awaken in the morning, why do we wake UP? At a meeting, why does a topic come UP? Why do we speak UP and why are the officers UP for election and why is it UP to the secretary to write UP a report?
We call UP our friends. We use something to brighten UP a room, polish UP the silver, warm UP the leftovers, and clean UP the kitchen. We lock UP the house and some guys fix UP the old car. At other times the little word has real special meaning. People stir UP trouble, line UP for tickets, work UP an appetite, and think UP excuses. To be dressed is one thing but to be dressed UP is special.
And this UP is confusing: A drain must be opened UP because it is stopped UP. We open UP a store in the morning but we close it UP at night.
We seem to be pretty mixed UP about UP! To be knowledgeable about the proper uses of UP, look the word UP in the dictionary. In a desk-sized dictionary, it takes UP almost 1/4th of the page and can add UP to about thirty definitions. If you are UP to it, you might try building UP a list of the many ways UP is used. It will take UP a lot of your time, but if you don't give UP, you may wind UP. When it threatens to rain, we say it is clouding UP. When the sun comes out we say it is clearing UP.
When it rains, it wets the earth and often messes things UP.
When it doesn't rain for awhile, things dry UP.
We could go on, but I'll wrap it UP, for now my time is UP, so time to zip UP my lips!
How can I zip UP my lips UP instead of across?
Using Speech Recognition in a Search Engine
Boston-based startup EveryZing has launched a search engine that it hopes will change the way that people search for audio and video online. Formerly known as PodZinger, a podcast search engine, EveryZing is leveraging speech systems developed by technology company BBN that can convert spoken words into searchable text with about 80 percent accuracy. This bests other commercially available systems, says EveryZing CEO Tom Wilde.
Kate Greene, "More-Accurate Video Search: Speech-recognition software could improve video search," MIT's Technology Review, June 12, 2007 --- http://www.technologyreview.com/Infotech/18847/
Bob Jensen's threads on video searching are at http://faculty.trinity.edu/rjensen/searchh.htm
Knowledge Portals ---
Using Speech Recognition to Search Video
Despite recent advances in visual-search engines, accurate video search still remains a challenge, particularly when dealing with sports footage, says Michael Fleischman, a computer scientist at MIT. "The difference between a home run and a foul ball is often hard for a human novice to notice, and nearly impossible for a machine to recognize." To cope with growing video repositories, cutting-edge systems are now emerging that use automatic speech recognition (ASR) to try to improve the search accuracy by generating text transcripts. (See "More-Accurate Video Search.")
Duncan Graham-Rowe, "Searching Sportscasts A new way to search video could help fans find footage," MIT's Technology Review, June 21, 2007 --- http://www.technologyreview.com/Infotech/18957/
Sending and Receiving Email Messages Via the Telephone
You can send or receive audio email messages via CoolMail.net --- http://www.planetarymotion.com/
You can send or receive audio email messages via Sonic Mail ---
- No more typing. Just talk and send
- Include pictures of friends and family
- No large file attachments
- Return receipts let you know when your message has been heard
- Works with address books from AOL, Netscape, Outlook Express, PalmPilot, Yahoo Mail, and Eudora
- Available in English, Spanish, French, Italian, and German
Yahoo also offers this service. At this point I would probably recommend Yahoo since Yahoo claims to offer a "lifetime" of free email service. My wife's sister Nancy and her husband love the new feature in Yahoo mail that lets you listen to your email messages over the phone. They especially liked this service when traveling across country by car. Dial up a free 800 number from your cell phone and listen to your email. Nancy indicates that this works best with text messages that are not too garbled up with pictures, animations, and attachments.
Telephone Conversations With Audio Portals
In the August 22 Edition of New Bookmarks, I featured the BeVocal website where you can have a conversation with a computer regarding driving directions, stock quotes, weather, etc. That website is at http://www.bevocal.com/index.html. You can hold a conversation by phone with a woman and not even know that she is only a virtual woman and not someone you can invite for cocktails and dinner (she only gulps on electricity).
The PBS show called Computer Chronicles recently demonstrated Quack
Quack is owned by AOL. You can read the following at http://www.quack.com/company_press_4.html
The Quack service is the first voice portal to include nationwide access to web-based information from any phone including personalized weather, traffic, sports scores, stock prices and movie information. By dialing 800-73-QUACK (800-737-8225), anyone can reach Web information from any phone, anytime, anywhere, for free.
SpeechWorks International, Inc. is the market leader in the telephony-based speech technology industry. Award-winning speech recognition solutions from SpeechWorks enable the development of services that let consumers direct their calls, obtain information and complete transactions automatically, simply by speaking naturally over any phone.
“Quack.com’s ability to work closely with SpeechWorks, and extend SpeechWorks’ technology and speech design services has been instrumental to Quack’s quick-to-market delivery,” said Alex Quilici, CEO and co-founder of Quack.com. “The relationship with SpeechWorks means Quack.com will continually develop and introduce new, state-of-the art speech-based services much more quickly than has previously been possible.”
TellMe lets you have a phone conversation with its various databases at http://www.tellme.com/
After you sign up for free at the above website, you can phone to have a conversation about the following:
Call 1-800-555-TELL and say:
Sorry --- no answers to Bob Jensen's accounting theory questions (yet)!
Tellme My Favorites, Sports, Soap Operas, Restaurants, News, Lottery, Movies, Election, Blackjack, Taxi, Traffic, Time, Driving Directions, Weather, Phone Booth, Travel, Horoscopes, Extensions, Stock Quotes
Knowledge Portals --- See http://faculty.trinity.edu/rjensen/290wp/290wp.htm#Predictions
I think the advantage of the computer is that you can both have the audio and transcribe the audio into text. Hopefully, knowledge portals will do both.
However, present audio portals such as BeVocal can only be accessed by telephone.
One day, we hope that telephones will have the ability to convert your typed messages into audio for the phone and translate the incoming audio into instant text. That day is almost here!
Technology will be fantastic in aiding the deaf. It will be equally fantastic for the blind with the ability to translate text into audio. For Helen Keller-type handicaps, however, technology will be less exciting. There are experiments taking place that link computers directly to the brain and bypass audio and visual sensory receptors. However, this technology is a long way off.
Deaf people should actively encourage accompaniment of audio with text transcriptions, especially in knowledge portals.
America Online on Wednesday launched AOL 6.0 and its AOL by Phone initiative. Not to be outdone, rival Microsoft announced ambitious plans for MSN --- http://www.eweek.com/a/pcwt0010265/2645004/
AOL and Microsoft add fuel to online fire
By Dennis Fisher and Carmen Nobel, eWEEK October 25, 2000 4:44 PM ET
With AOL 6.0, the Dulles, Va., ISP has served up a long list of enhancements, most notably the addition of support for HTML e-mail and the creation of Groups@AOL, a way for members to establish private sites that only small groups of friends or family members can access.
The new version also includes an updated AOL Instant Messenger client and an integrated media player, which supports all of the major online media formats.
The AOL by Phone initiative gives members the ability to check their e-mail and get news, weather and stock headlines from any landline or Web-enabled wireless phone. The plan is part of the company's AOL Anywhere strategy, which aims to make AOL content available to members on any device at any time.
Speech Recognition Software
To date, vocabulary limitations and other problems make this a less than perfect option for authoring. However, the technology seems to be adequate for major companies like American Express, UPS, Schwab & Co., and other companies to move from "curious novelty to strategic technology" according to Mary Thyfault in "Voice Recognition Enters the Mainstream" in Information Week, July 14, 1997, p. 20. These companies intend to have computers respond to customer voices. For example, using technology developed by Nuance, Schwab & Co. introduced the "Voice Broker" that responds to telephone requests for market price quotations and other investment information. American Express uses voice recognition for travel services. The ability to talk directly with a computer was anticipated years ago in Star Trek television shows and with the supercomputer named HAL in the popular film "2001: A Space Odyssey". Eventually speech recognition will be commonplace when using both large and small computers. Apple Corporation led the way in speech recognition, but the gap has been closed between Mac and PC users. The latest excitement in software that will recognize normal (continuous) speaking speeds is Dragon's Naturally Speaking from http://www.dragonsys.com/. Other options such as Voice Assist from Creative Labs (800-998-1000) are available for PCs. However, the leading and most reliable PC software packages at the time of this writing are Naturally Speaking from Dragon and VoicePlus ViaVoice Simply Speaking Software from IBM Corporation. VoiceType sells for less than $100 and had a 94% accuracy rate in tests reported in Consumer Reports, July 1997, p. 6. Another competitor (Kurzweil VoiceCommands) only had 72% accuracy in the same tests, although VoicePad did receive the Software Publishers Association's Award for the "Best New Software Program of the Year" in 1997. Older links for discrete (non-continuous) speaking recognition include IBM's VoiceType and AVRI's SpeechCommander.
Microsoft has Speech Dictation software. Siemens Business Communication also has products on speech recognition. One product from Siemens is ComManager telephony and call accounting software. Microsoft Agent can be downloaded free from http://www.microsoft.com/workshop/imedia/agent/agentdl.asp (See also Text reading and Disabilities products)
For applications of speech recognition see TRACI Talk: The Mystery and Let's go Read! An Island Adventure. Islip Media Inc. in Pittsburgh offers a speech recognition search engine for video libraries. It is costly, however, at $50,000 for a 50 user license. The Islip web site is at http://www.islip.com/
Probably the most exciting thing this week is the featured speech recognition software on the PBS television show called Computer Chronicles. This show was a summer re-run of the Computers Without Keyboards show summarized at http://www.cmptv.com/computerchronicles/shows/99-00/1721keyboards/1721-summary.html
There were various demonstrations, including almost flawless letter dictation using Dragon's Naturally Speaking. You simply say "new paragraph," "comma," or other accepted commands, including correction comments such as a command to change "two" to "too." The Dragon Naturally Speaking software and other leading speech recognition websites are given at http://www.trinity.edu/~rjensen/245glosf.htm#Speech1
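The idea of spoken commands mixed into dictated text can be sketched in a few lines. This is only an illustration of the concept, not Dragon's actual command set or implementation: "new paragraph" and "comma" come from the demonstration described above, while "period" and the function name are assumptions added for the example.

```python
# Spoken phrases that are treated as commands rather than as words to type.
# "new paragraph" and "comma" are mentioned in the text; "period" is assumed.
SPOKEN_COMMANDS = {
    "new paragraph": "\n\n",
    "comma": ",",
    "period": ".",
}


def render_dictation(spoken):
    """Turn a stream of recognized words into text, applying spoken commands."""
    words = spoken.lower().split()
    out = ""
    i = 0
    while i < len(words):
        two_words = " ".join(words[i:i + 2])
        if two_words in SPOKEN_COMMANDS:
            # Two-word command such as "new paragraph": start a fresh block.
            out = out.rstrip() + SPOKEN_COMMANDS[two_words]
            i += 2
        elif words[i] in SPOKEN_COMMANDS:
            # One-word command: attach punctuation directly to the last word.
            out += SPOKEN_COMMANDS[words[i]]
            i += 1
        else:
            # Ordinary word: add a space unless we are at the start of a block.
            if out and not out.endswith("\n"):
                out += " "
            out += words[i]
            i += 1
    return out


print(render_dictation("dear bob comma new paragraph thanks for the tapes period"))
```

A sketch this simple cannot type the literal word "comma", which hints at why real products need correction commands as well.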
But everything else on the show paled in comparison to the BeVocal demonstration of how you can call a free long distance number and interact by phone with a virtual woman at http://www.bevocal.com/index.html
It's the only way to get FREE driving directions, traffic reports, weather forecasts, business locations, flight information, stock quotes, and more by phone. Just call 1-800-4-BVOCAL, speak up, and get what you need.
What is impressive is the fact that you can interrupt the virtual woman and ask her to repeat herself or spell words like names of city streets. You can also ask for current delays due to construction or traffic at the moment.
- You can "barge in" by saying commands anytime; you don't have to wait until the end to speak.
- Some BeVocal commands can be said anytime. That is, they can be used in any BeVocal service. Voice commands you can say anytime are: BeVocal Home, BeVocal Tips, BeVocal Driving Directions, BeVocal Traffic, BeVocal Flight Information, BeVocal Weather, BeVocal Stock Quotes, Pause, Repeat, What Are My Choices?, and Goodbye.
- Other commands are specific to individual BeVocal services.
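The split between "anytime" commands and service-specific commands listed above can be sketched as a simple lookup. The anytime commands are taken from the list; the service-specific commands and the function name are hypothetical placeholders, since BeVocal's actual per-service commands are not given here.

```python
# Commands the text says can be spoken in any BeVocal service.
GLOBAL_COMMANDS = {
    "bevocal home", "bevocal tips", "bevocal driving directions",
    "bevocal traffic", "bevocal flight information", "bevocal weather",
    "bevocal stock quotes", "pause", "repeat", "what are my choices",
    "goodbye",
}

# Hypothetical service-specific commands, invented for illustration only.
SERVICE_COMMANDS = {
    "driving directions": {"spell that", "next step"},
    "stock quotes": {"add to my list"},
}


def classify_command(utterance, active_service):
    """Return 'global', 'service', or 'unknown' for a recognized phrase."""
    phrase = utterance.strip().lower().rstrip("?!.")
    if phrase in GLOBAL_COMMANDS:
        return "global"
    if phrase in SERVICE_COMMANDS.get(active_service, set()):
        return "service"
    return "unknown"


print(classify_command("What Are My Choices?", "stock quotes"))
print(classify_command("next step", "driving directions"))
print(classify_command("next step", "stock quotes"))
```

Checking the global set first is what lets a caller "barge in" with Repeat or Goodbye no matter which service is active.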
What is important to educators and librarians is not this particular virtual woman and this particular application with a knowledge base on the above topics. What is important is that this demonstrates the future of education and training of the 21st Century. Suppose you really do not know how to account for a cross-currency swap using a EURIBOR index. Someday it will be possible to dial up (from a hand-held phone which will also be a wireless computer) and listen to a detailed interactive tutorial that walks you through your particular problem (where you feed in your own particular parameters). You will be able to "barge in" when you don't understand something, ask for definitions, ask for diagrams, ask for history, ask for examples, ask for current index levels, etc. One day in the future you will also be able to do the same thing when trying to understand passages from Hamlet or Bob Jensen's muddled up theory paper at http://faculty.trinity.edu/rjensen/315wp/315wp.htm
As educators, we have a responsibility to begin to organize the academy to design speech-recognition knowledge bases for BeVocal types of education and training.
The flip side of "speech recognition" is "text reading" conversion of written text into audio. The pioneer in this technology is Bell Labs at http://www.islip.com/. That Bell Labs web site has some wonderful demonstrations of this technology. (See Text reading.)
Information Week (May 10, 1999, page 26) notes that SpeechWorks International has speech recognition modules for ERP systems. For example, these modules can now be deployed in SAP. See http://www.speechworks.com/.
Added June 27, 1999 --- The June 27 broadcast of the Dynamic Duo had some helpful information to pass on to the world. I like the way the Duo is willing to tell it like it is from the standpoint of user friendliness and reliability. The web site for the Duo is at http://www.digitalduo.com/ .
The lead segment was on the state of speech recognition. Speech recognition has come a long way in a short time. It is especially wonderful for persons who cannot use keyboards for one reason or another. Dragon Systems Naturally Speaking Mobile is an award winning pocket-size recorder --- see http://www.dragonsys.com/products/naturallyspeaking/mobile/index.html .
A major advantage of speech recognition is that audio files are recorded on the fly. This would be a great product for me since I usually videotape conference presentations and student presentations. My beleaguered secretary spends over half her time transcribing the audio into text. It would be wonderful if I could bypass her by recording directly into my Dragon Mobile. The Dynamic Duo, however, reports that this will probably not be possible until speech recognition gets much better. Although the time it takes to "train the system" on a particular voice such as my own voice has been reduced from two hours to 30 minutes, it is not likely that each speaker at a conference will want to speak into my recorder for 30 minutes prior to his or her presentation. Even when the Dragon Mobile is properly trained, the Dynamic Duo found an average of one error in 20 words --- and that is an average number. When there is ambient noise the error rate explodes. Recording from a distance such as 15 feet greatly increases error rates. I think I will wait for a while before going Dragon Mobile. You can find links to other speech recognition vendors at http://faculty.trinity.edu/rjensen/245glosf.htm#Speech1
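An error count such as "one error in 20 words" is commonly expressed as a word error rate: the number of word substitutions, insertions, and deletions needed to turn the machine's transcript into the correct one, divided by the number of words in the correct one. The sketch below is one standard way to compute it, using word-level edit distance. It is not claimed to be the method used in the broadcast's test, and the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Twenty words with a single wrong word ("two" heard instead of "too").
correct = " ".join(["word"] * 19 + ["too"])
heard = " ".join(["word"] * 19 + ["two"])
print(word_error_rate(correct, heard))
```

One wrong word in twenty works out to a rate of 0.05, or 5%, which still means several corrections per dictated page.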
L&H Voice Xpress Professional has some key advantages over leading voice recognition software according to Jeff Angus in "Balanced Skills Make Voice Xpress a Winner," in Information Week, August 23, 1999, pp. 56-59. The online version is at http://www.informationweek.com/749/voice.htm. One of the advantages is that voice training takes only about a third as much time as the training required for Dragon Systems. Another advantage is integration with Office 2000 products, especially Internet Explorer 5.0. You can dictate Office 2000 instructions by voice. Jeff Angus states the following:
With about eight hours of use, Voice Xpress worked well enough for me to prefer it to typing. With 12 hours of use (work and training) it's a hands-down winner.
Voice Xpress still requires more help from me than I'd like recognizing Windows and application commands. Even going to the Voice Xpress toolbar and clicking the button that tells the utility to expect a command doesn't guarantee it will recognize my command every time.
In terms of desktop applications, Voice Xpress works best with Microsoft Word and PowerPoint, both text-intensive processes. I struggled a little bit to have it work with my spreadsheet, and while it occasionally pulled the correct set of format and numbers ($1,287, for example) out of a string of spoken input, this complex task requires more training. Users who work extensively with spreadsheets may find the payback time quick enough.
The web site for Voice Xpress is at http://wemark.com/oivl.html. The base price is $149. Beware that you should not even think about this product without 96 MB of RAM with Windows 98 and 128 MB of RAM with Windows NT. I think I will wait for this product to be a bit more user friendly. When there's a Voice Xpress for Dummies I will be the first in line.
December 1999 Update on speech technologies --- http://www.zdnet.com/pcweek/stories/news/0,4153,2409293,00.html
Dragon Systems Inc. has begun previewing its new AudioMining speech technology, which will enable users to search and retrieve audio and streaming media content on the Web.
The AudioMining technology converts audio data into text, which can then be accessed by keyword searches, company officials said. That saves time and helps users be more productive because they don't need to listen to entire recordings to find information, they added.
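The general idea of searching audio through its transcript can be sketched as follows. This is not Dragon's AudioMining implementation. It assumes a recognizer has already produced words with time offsets, and the sample transcript, its timings, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical recognizer output: (seconds from start, recognized word).
TIMED_TRANSCRIPT = [
    (0.0, "Welcome"), (0.6, "to"), (1.0, "the"), (1.4, "hedge"),
    (1.9, "accounting"), (2.6, "session."),
    (30.2, "Hedge"), (30.7, "effectiveness"), (31.5, "is"), (31.8, "tested"),
    (32.4, "quarterly."),
]


def build_index(timed_words):
    """Map each normalized word to the list of times at which it was spoken."""
    index = defaultdict(list)
    for seconds, word in timed_words:
        index[word.lower().strip(".,!?")].append(seconds)
    return index


def find_keyword(index, keyword):
    """Return the time offsets (in seconds) where the keyword occurs."""
    return index.get(keyword.lower(), [])


index = build_index(TIMED_TRANSCRIPT)
print(find_keyword(index, "hedge"))
print(find_keyword(index, "swap"))
```

A player could jump straight to the returned offsets, which is how a keyword search spares the listener from playing the entire recording. Any word the recognizer got wrong simply will not be found.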
Dragon demonstrated the technology for the first time at the Giga Showcase for Innovative IT Solutions earlier this month (December 1999) in Palm Desert, Calif., and conference participants voted it Best Overall Winner, Most Innovative Product, Best Business Application Potential and Highest-Quality Demonstration.
Text-to-Speech (Audio) is Quite Good Unless There Are Words Not in a Standard Dictionary
Try it out at
The free software trips on such terms as "homoscedasticity" and "heteroscedasticity."
This software is useful for blind persons --- http://faculty.trinity.edu/rjensen/000aaa/thetools.htm#Handicapped
The pioneer in this technology is Bell Labs.
Also see http://atto.buffalo.edu/registered/ATBasics/Curriculum/Reading/textScreen.php
Free Long Distance Telephoning via Computers or Computers/Telephones in Combination
HotTelephone.com --- http://www.hottelephone.com/
(Users behind firewalls probably will not have any luck with free PC Phone service.)
The hottest free PC to PHONE service on the Web! Make unlimited free calls to more than 30 countries right from your PC. If you're paying anything for long distance, you're paying too much! The quality is the best and the service is unbeatable. Visit Our Global Community to find what our members are saying.
CTDepot.com --- http://www.ctdepot.com/
If you've heard that you can make free or very inexpensive long distance calls over the Internet, then you've come to the right place. On our web site you'll find all kinds of information on how to make calls over the Internet. We have tutorials that will show you how to use some of the most popular Telephony programs, and our Getting Started section will show you all the different ways you can use the Internet to make long distance calls.
We also have an online store where you can purchase products such as headsets that can improve the quality of your Internet phone calls, and PC cameras that let you see the person you're speaking with.
IPStarPhone.com - Internet telephone for free long distance using dial-up, cable, or DSL. --- http://www.ipstarphone.com/
Computer Telephony Depot - information on computer and Internet telephony. Learn how to make long distance calls over the Internet for free --- http://www.ipstarphone.com/ (including free calls to Hong Kong).
DialPad.com claims to have 10,000,000 registered users of their free long distance service --- http://www.dialpad.com/
Advances in Speech to Text
August 10, 2006 message from Scott Bonacker [AECM@BONACKER.US]
And for when size matters:
Talk to the Machine
One of the frontiers of computing that remains stubbornly outside the reach of most mainstream applications despite 20 years of effort is speech recognition. On the face of it, replacing the somewhat cumbersome graphical user interface and keyboard with voice commands feels like something that should have already happened.
But it's only in the last couple of years that we've seen some major advances in recognizing grammar and patterns of words in ways that allow people to think about building more robust speech recognition applications.
In fact, voice-driven telephony applications such as those powered by tools from companies such as Convergys <http://www.convergys.com/toc_customer.html> are experiencing rapid growth. But while that's certainly an improvement, the real question is how soon speech recognition will become a standard element of just about every application.
In separate podcasts on the <http://www.acmqueue.com/modules.php?name=Queuecasts> ACMQueue Web site, Roberto <http://acmqueue.com/modules.php?name=Queuecasts&id=9> Sicconi from IBM and Mike <http://acmqueue.com/modules.php?name=Queuecasts&id=10> Cohen from Google said that day is coming a lot sooner than we think. IBM's Sicconi said that within two years we'll see a major explosion in speech recognition applications starting with gaming and then working its way through a whole host of applications.
And although Google isn't talking about any specific plans, it wouldn't take much imagination to see the possibilities of hosted voice recognition services. To that end, Google has enlisted Cohen, a co-founder of speech recognition pioneer <http://www.nuance.com/> Nuance Communications, to serve as the head of its research in this space.
As Steve Chirokas, the senior director for products and channels for the Customer Management Group at Convergys, puts it, throwing a lot of hosted hardware at speech recognition applications not only makes financial sense, it also creates a more secure environment.
Either way, thanks to the advent of <http://www.voicexml.org/> VoiceXML and better natural language pattern recognition, we have not seen or heard anything yet.
Nuance has been promoting Dragon NaturallySpeaking at 50% off for a month or so to customers who own its other products. Maybe this is the incentive.
Scott Bonacker, CPA
Advances in Text to Speech
Type in some text and hear it read back to you ---
Hint: Try some words that are not in the dictionary.
The Oddcast homepage is at http://vhost.oddcast.com/vhost_minisite/
This may be very useful as an aid to teaching sight-impaired students in your courses.
May 3, 2006 reply from Stephen Field (Professor of Chinese at Trinity University)
Bob, for your information it also works when I type Chinese characters into the window.
Even the tones are correct when spoken!
In the Groove
Blackboard users should especially note Amy Dunbar's comments near the end of this message.
Comment on Groove from Bob Jensen:
It seems highly unlikely that the audio in Groove will penetrate firewalls. My guess is that this is the same problem that arises with free long distance telephone audio, which will not penetrate our campus firewall computers. For my threads on free long distance telephone, see <http://faculty.trinity.edu/rjensen/speech.htm#LongDistance>
One time, our computer center director lowered the firewall guard to experiment with incoming long distance audio. The audio quality was disappointing. My guess is that the quality will also be questionable for off-campus audio from Groove. However, the on-campus quality is excellent according to Richard Campbell.
Original Groovy message from Richard Campbell
Late next week, I'll be starting some virtual office hours for my students. Anyone who wants to audit these randomly scheduled mini-tutorials on managerial accounting should email me at <mailto:campbell@VirtualPublishing.NET> with Groove.Net in the subject line. You also would need to download the free beta at www.groove.net. Groove.Net was founded by Ray Ozzie, the developer of Lotus Notes while he was at Lotus.
Richard J. Campbell www.VirtualPublishing.NET <http://www.VirtualPublishing.NET> <mailto:campbell@VirtualPublishing.NET>
Reply from Amy Dunbar
I went to www.groove.net <http://www.groove.net> and found the following description of Groove:
Groove is Internet software for making direct connections with the people who are important to you. With Groove you can talk, chat, instant message, draw pictures, swap photos and files, play games and browse the Web together with friends, family and co-workers -- at the same time or whenever one of you has a moment. In Groove, having conversations with context is as easy as sending an email or accessing the Web. Groove runs on Windows PCs and uses the Internet for transporting communication among PCs.
What do "talk" and "chat" mean: audio and text, or only text? Can you have audio communication (not pre-recorded) with Groove? If so, how many users?
Reply from Richard Campbell
The chat is both audio (voice over IP) and text chat. The performance of audio chat is very good. I'm not sure of performance through a firewall though. I'm not sure if there are limitations on number of users during the beta testing period. When they start charging real money, I'm sure there will be charging on the basis of file storage and number of users.
Richard J. Campbell www.VirtualPublishing.NET <http://www.VirtualPublishing.NET> <mailto:campbell@VirtualPublishing.NET>
Reply from Amy Dunbar
Groove is worth checking out. Three faculty members here just "chatted" in a conversation space in Groove. Now I'm wondering how the audio works over modems. Even with text chat, however, the notepad space works nicely as a "blackboard" where an instructor could go through a solution while carrying on a text chat in the space below the notepad. If you check the button "Navigate together" you can move through web pages together, so if you had developed a Flash file, you could go through it with the students. Richard, thank you so much for bringing this product to our attention.
"Just talk to me," The Economist, December 6, 2001 --- http://www.economist.com/science/tq/displayStory.cfm?Story_ID=885022
Speech recognition: At long last, speech is becoming an important interface between man and machine. In the process, it is helping to slash costs in business, create new services on the Internet, and make cars a lot safer and easier to drive.
In the early days of computing, information was put into computers by flipping switches. After this came the relative sophistication of loading programs and data by means of punched cards or punched paper-tape. These were followed in their turn by such devices as the keyboard, the mouse, the trackball, the joystick, the touchpad and the touch-sensitive screen. Throughout all this, speech—the most natural, and perhaps the most effective, interface between people and computers—has remained largely neglected. Apart from some modest developments in software for desktop dictation in the 1990s, the only time most people have talked to their computers has been when cursing them.
All this is changing. Already, speech recognition is a not-uncommon feature at the call-centres of telephone companies, financial-service providers and airlines in the United States. In Japan and Europe, meanwhile, speech recognition is being adapted for use as a hands-free input device for motor cars.
Technologies such as automatic speech recognition (ASR), speaker verification and text-to-speech generators (see article) are catching on fast. They promise to deliver access to information and services anytime and anywhere that there is a telephone. With more than 1 billion phones in the world and new subscribers being added to the global networks at double-digit rates, the enthusiasm is understandable. What is really driving the enthusiasm for the technology is not just that people are used to talking over telephones and so need little encouragement or training. They have also proved themselves willing to pay a premium for such services.
Continued at http://www.economist.com/science/tq/displayStory.cfm?Story_ID=885022
Bob Jensen's threads on speech recognition are at http://www.trinity.edu/~rjensen/245glosf.htm#Speech1
Bob Jensen's Homepage is at http://faculty.trinity.edu/rjensen/