Speech-Recognition Software

Like Having a Secretary in Your PC

by David Pogue

TESTING, testing, one two three. Is this thing on?

Well, I’ll be darned. It’s really on and it’s really working. I’m wearing a headset, talking, and my PC is writing down everything I say in Microsoft Word. I’m speaking at full speed, perfectly normally except that I’m pronouncing the punctuation (comma), like this (period).

Let’s try something a little tougher. Pyridoxine hydrochloride. Antagonistic Lilliputians. Infinitesimal zithers.

Hm! Not bad.

Oh, hi, honey. Did you get to the bank before it closed? Oh, hold on, let me turn off the mike. Wouldn’t want our conversation to wind up in my column!

O.K., back again. The software I’m using is Dragon NaturallySpeaking 9.0, the latest version of the best-selling speech-recognition software for Windows. This software, which made its debut Tuesday, is remarkable for two reasons.

Reason 1: You don’t have to train this software. Training is when you have to read aloud a canned piece of prose that the software displays on the screen, a standard ritual that has begun the speech-recognition adventure for thousands of people.

I can remember, in the early days, having to read 45 minutes’ worth of these scripts for the software’s benefit. But each successive version of NaturallySpeaking has required less training time; in Version 8, five minutes was all it took.

And now they’ve topped that: NatSpeak 9 requires no training at all.

I gave it a test. After a fresh installation of the software, I opened a random page in a book and read a 1,000-word passage, without doing any training.

The software got 11 words wrong, which means it got 98.9 percent of the passage correct. Some of those errors were forgivable, like when it heard “typology” instead of “topology.”

But Nuance says that you’ll get even better accuracy if you do read one of the training scripts, so I tried that, too. I trained the software by reading its “Alice in Wonderland” excerpt. This time, when I read the same 1,000 words from my book, only six errors popped up. That’s 99.4 percent correct.

The best part is that these are the lowest accuracy rates you’ll get, because the software gets smarter the more you use it or, rather, the more you correct its errors.

You do this entirely by voice. You say, “correct ‘typology,’ ” for example; beneath that word on the screen, a numbered menu of alternate transcriptions pops up. You see that alternate 1 is “topology,” for example, so you say “choose 1.” The software instantly corrects the word, learns from its mistake and deposits your blinking insertion point back at the point where you stopped dictating, ready for more.

Over time, therefore, the accuracy improves. When I tried the same 1,000-word excerpt after importing my time-polished voice files from Version 8, I got 99.6 percent accuracy. That’s four words wrong out of a thousand, including, of course, “topology.”

For this reason, it doesn’t much matter whether or not you skip the initial training; the accuracy of the two approaches will eventually converge toward 100 percent.

NatSpeak 9 is remarkable for a second reason, too: it’s a new version containing very little new.

Yes, they’ve eliminated the training requirement. And yes, the new NatSpeak is 20 percent more accurate than before if you do the initial training. Then again, what’s a 20 percent improvement in a program that’s already 99.4 percent accurate? 99.5? That’s maybe one less error every 1,000 words.
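For anyone who wants to check the arithmetic, here is a minimal Python sketch. The error counts are the ones from my 1,000-word test passage; the function name accuracy_percent is made up for illustration, and reading “20 percent more accurate” as “20 percent fewer errors” is my assumption, not something Nuance spells out.

```python
def accuracy_percent(errors: float, total_words: int = 1000) -> float:
    """Share of words transcribed correctly, as a percentage."""
    return 100.0 * (total_words - errors) / total_words


# Error counts reported for the same 1,000-word passage.
runs = {
    "no training": 11,
    "after the training script": 6,
    "imported Version 8 voice files": 4,
}

for label, errors in runs.items():
    print(f"{label}: {accuracy_percent(errors):.1f} percent")

# Assumption: a 20 percent improvement means 20 percent fewer errors.
errors_before = 6
errors_after = errors_before * (1 - 0.20)  # about 4.8 errors per 1,000 words
print(f"after a 20 percent cut in errors: {accuracy_percent(errors_after):.2f} percent")
```

Under that reading, six errors per thousand words become roughly five, which is where “maybe one less error every 1,000 words” comes from.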

(Nuance has done some clever engineering to wring these additional drops of accuracy out of the program. For example, the program has always used context to determine a word’s identity, taking into account the two or three words on either side of it to distinguish, say, “bear” from “bare.” The company says that Version 9 scans an even greater swath of the surrounding words.)

But the rest of the changes are minor. The top-of-the-screen toolbar has shed the squared-off Windows 3.1 look in favor of a more rounded Windows Vista look. You can now use certain Bluetooth wireless headsets for dictation, although Nuance has found only two so far that put the microphone close enough to your mouth to get clear sound. A new toolbar indicator lets you know when you’re in a “select and say” program like Word (that is, a program where you can highlight, manipulate and format any text you see on the screen using voice commands).

At least Nuance hasn’t gone the way of so many software companies, piling on features and complexity in hopes of winning your upgrade dollars. For the second straight revision, the company has preferred to nip and tuck, making careful and selective improvements.

Now, Nuance isn’t the only game in speech-recognition town. Microsoft says that Windows Vista, when it makes its debut next year, will come with built-in dictation software.

Nuance claims not to be worried, pointing out that Vista will understand only English. NatSpeak, on the other hand, is available in French, Italian, German, Spanish, Dutch, Japanese, British English and “World English,” which can handle South African, Southeast Asian and Australian accents.

NatSpeak is also available in a range of versions for the American market, including medical and legal incarnations. Mere mortals will probably want to consider either the Standard version ($100) or the Preferred version ($200), each of which comes with a headset. Both offer the same accuracy.

The Preferred edition, however, offers several shiny bells and whistles. One of them is transcription from a digital pocket voice recorder. This approach doesn’t provide the same accuracy as a headset, and it requires what today is considered an excruciating amount of training reading: at least 15 minutes. But it does free you from dictating at the computer.

Another Preferred perk is voice macros, where you teach the software to type one thing when you say another. For example, you can say “forget it” and have the software spit out, “Thank you so much for your inquiry. Unfortunately, after much consideration, we regret that we must decline your application at this time.”
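The idea behind a voice macro is simple enough to sketch in a few lines of Python. This is only an illustration of the concept, not how NatSpeak works internally; the names MACROS and expand are hypothetical, and the sketch assumes the spoken phrase has already been recognized as text.

```python
# Hypothetical macro table: recognized trigger phrase -> text to type.
MACROS = {
    "forget it": (
        "Thank you so much for your inquiry. Unfortunately, after much "
        "consideration, we regret that we must decline your application "
        "at this time."
    ),
}


def expand(utterance: str) -> str:
    """Return the macro text for a known trigger, else the words as spoken."""
    key = utterance.strip().lower()
    return MACROS.get(key, utterance)


print(expand("Forget it"))        # types the full rejection letter
print(expand("See you Tuesday"))  # no macro, so the words pass through as spoken
```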

There’s also a $900 version called Professional, which offers, among other advanced features, complete control over your PC by voice; it can even set in motion elaborate multi-step automated tasks.

NatSpeak also runs beautifully on the Macintosh. The setup is a bit involved: you need a recent Intel-based Mac, Apple’s free Boot Camp utility, a copy of Windows XP, and a U.S.B. adapter on your headset. And you have to restart the Mac in Windows each time you want to use NatSpeak. But if you can look past all that fine print, NatSpeak on Macintosh is extremely fast and accurate.

If that sounds like too much effort, there is a Macintosh-only alternative: iListen ($130 with headset). Version 1.7, newly adapted for Intel Macs, offers better accuracy and a shorter training time than previous versions, though nothing like the sophistication or accuracy of NatSpeak. After 30 minutes of training, the program made 42 mistakes in my 1,000-word book excerpt, which the company says is better than average.

As for NaturallySpeaking: if you’re already using Version 8, it’s probably not worth upgrading to Version 9. Most people will find the changes to be too few and too subtle.

But if you’re among the thousands who have abandoned dictation software in the past, it’s a different story. Version 9 is a stronger argument than ever that for anyone who can’t or doesn’t like to type, dictation software is ready for prime time; the state of this art has attained nearly “Star Trek” polish.

Excuse me. What, honey?

O.K., I’m just finishing up here; I’ll be right down. Let me just turn my mike off.


