Voice Recognition - This article was originally published in the Campbell Law Observer, a monthly legal newsletter published by the Campbell University School of Law in Buies Creek, N.C. To subscribe, contact Shannon Vandiver at (910) 893-1798.

Voice Recognition

What it can, and still can’t, do for you

Hello to all. I trust everyone made it to the year 2000 safe and sound. Y2K has done whatever damage it will do and civilization has survived. Now that we are sure technology will continue to dominate our near future, I want to caution you to scrutinize your documents and programs closely over the next few weeks. Y2K can affect computers in insidious ways as well as shut them down completely. Make sure calculations are verified in spreadsheets, accounting and real estate programs. Any program that prorates or calculates based on a time period could be affected. IT IS NOT SUFFICIENT THAT THE PROGRAM CONTINUES TO OPERATE! The question is whether it is operating properly. Hopefully, the answer is yes for all of you.

Voice recognition programs are very resource intensive programs, that is, they need a LOT of computer power to operate properly. Development has been, and still is, hampered by the relative lack of power available in the personal computer. We have come a long way but computers still do not begin to approach the power of an infant child’s brain. With that said, there is a place in today’s law office for voice recognition software, as long as you understand its limits and costs.

Remember our mantra:

Why do I need it?
What does it do?
How does it work?
When should I implement it?
Who should be using it?
Where do I obtain it?

Why do I need it?

You may not need it, yet. Voice recognition will continue to evolve as computers become more powerful. It will be gradually integrated into every computer program available. For this reason alone, it is important you do not casually dismiss it as a gimmick or luxury.

Windows arrived and many resisted. If you are not in Windows by now you are losing money. Why? Because there are software programs available to Windows users which are not available to DOS or LINUX users. Software is the path to ROI (Return On Investment).

Hardware is analogous to the roads, bridges, and highways across the nation. Software is analogous to the vehicle that takes advantage of it and makes it valuable. Without the vehicle, the roads become mere decoration.

Like the vehicles in which we have traveled, software has evolved from walking to the equivalent of the early automobile. Software has not yet achieved the equivalent of air travel or space travel. Voice recognition will be the software equivalent of man’s leap into air travel. First, it will provide access to the power of computers to those who cannot type. Eventually, it will do the same for everyone. The science fiction of talking with computers is only a decade or so away.

Still, since the first effect will be to give access to those who want to use computers efficiently but cannot type efficiently, those who can type will not benefit to a large degree over the next few years. Of course, there is the specter of carpal tunnel syndrome to worry about and this may be reason enough to make it available to staff who otherwise may not need it.

Another aspect of the transition to voice can be seen by looking at the mouse. As Windows software developed, the integration of the mouse into software operation rose to an entirely new level. Whereas the mouse was an afterthought to the keyboard in DOS development, the mouse has now become the primary method to control programs in the Windows world. The keyboard is used only for data entry by a majority of computer users today.
The same thing will happen as voice gradually moves into mainstream usage. The mouse (and keyboard) will eventually take a backseat to the power of voice command.

Now you may be wondering, “Gee, if I can use my voice to run the computer then everything will be easy to operate. I can just wait for the day when the computer talks to me.” WRONG. Unfortunately, it will be some time before computers can truly carry on a conversation with humans. Until then, the power of voice will be an advantage but will require a little bit of work on your part.

What does it do?

Voice recognition (VR) software takes sounds and attempts to match them up with a standardized version of a language. When you say “well”, the computer takes the raw sound, compares it to a dictionary of recorded American English words and tries to match it up, all in about 1/100th of a second. As you can imagine, as the dictionary becomes larger, the computer needs more power.
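
For readers who like to see the idea in miniature, here is a rough sketch of the matching step in Python. It is only an illustration: the three-number "sounds" and the names SOUND_DICTIONARY and best_match are made up for this example, and real VR products use far more sophisticated models. It does show why a bigger dictionary means more work, since every entry has to be compared.

```python
import math

# Hypothetical "sound dictionary": each word maps to a made-up set of numbers
# standing in for a recorded sound. These values are illustrative only.
SOUND_DICTIONARY = {
    "well": (0.9, 0.2, 0.4),
    "will": (0.8, 0.3, 0.5),
    "wall": (0.7, 0.1, 0.9),
}


def distance(a, b):
    """Straight-line distance between two sounds (smaller means more alike)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(raw_sound, dictionary):
    """Compare the raw sound to every entry; return (closest word, comparisons made)."""
    comparisons = 0
    best_word, best_distance = None, float("inf")
    for word, reference in dictionary.items():
        comparisons += 1
        d = distance(raw_sound, reference)
        if d < best_distance:
            best_word, best_distance = word, d
    return best_word, comparisons


# A slightly "off" pronunciation of "well" still lands on the right word,
# but only after checking all three entries.
word, comparisons = best_match((0.88, 0.22, 0.41), SOUND_DICTIONARY)
print(word, comparisons)
```

Notice that the number of comparisons equals the number of words in the dictionary. Scale that up to tens of thousands of words and the need for computer power becomes obvious.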

There are also a couple of other wrinkles involved. Slang, pronunciation, enunciation, homonyms, and background noise are just a few of them. All of the VR products on the market allow you to add a limited number of words to a custom dictionary. This generally takes care of the few slang words necessary.

Pronunciation and enunciation, however, are much bigger problems. Since the computer is not listening to a word and trying to make sense of its meaning (not yet anyway), regional accents change the raw sound from the “standard” on which the VR product is based. To help overcome this, the newest products incorporate various regional accents into the product and try to match you up with one during the training phase of the installation.

Unfortunately, there are many variations in regional accents, and to hone the computer’s ability to match up your voice with its dictionary, training is a necessary evil. This is the stage where most of us fail to make the effort required for our voice software to function properly.

Training is performed in two stages. First, you read a pre-determined selection to the computer. It compares what you say with the sound dictionary that came from the programmer. It then makes adjustments for your pronunciation and enunciation. Obviously, the more you read the more refinement of the sound dictionary. This reading stage can take up to several hours.
The second stage is the correction stage. This is where we usually fail to keep up. We read something of our choice to the computer, only this time we monitor the result on the screen. If the program misinterprets a word or phrase, we immediately correct it using the tools provided. Depending on how closely your diction fits the “standard”, there can be a LOT of correction necessary before the program begins to reach an acceptable accuracy.

Performing the correction is aggravating. It interrupts the flow and causes you to lose your train of thought. For this reason, I advise you not to attempt to use the product for real work until you have taken the time to do correction for at least an hour or two. Read from your current novel or textbook. Read from cases you need to read anyway. This makes the correction stage a lot easier to tolerate.

Last of all, don’t be discouraged by previous versions’ inability to handle correction properly. All of the major vendors have made tremendous strides with their respective products. In other words, they actually work now.

How does it work?

Once you have gotten the program trained for your voice you are now ready to learn how to use the program. Yes, you read correctly, learn to use the program. Unfortunately, current VR software is not ‘smart’ enough to figure out context.
This means that homonyms will cause some problems although newer software is getting more intelligent all the time. For example, suppose you dictate, “…and the Guarantor hereby affirms he has read the foregoing …”. The VR program must decide whether you mean ‘red’ or ‘read’. Remember it is only listening for the sound. Newer programs have rules which help the computer decide but homonym errors are still common.
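
To make the homonym problem concrete, here is a small Python sketch of the kind of rule a program might apply. The rule itself (look at the word just before the sound) and the names HOMONYMS, VERB_CUES and choose_spelling are my own assumptions for illustration, not how any particular product works.

```python
# Sounds that can be spelled more than one way (hypothetical, tiny list).
HOMONYMS = {"red": ("red", "read")}

# Hypothetical rule: if one of these words comes right before the sound,
# the speaker probably means the verb "read".
VERB_CUES = {"has", "have", "had", "was", "be"}


def choose_spelling(previous_word, sound):
    """Pick a spelling for a sound using only the word immediately before it."""
    if sound not in HOMONYMS:
        return sound  # not a homonym, nothing to decide
    if previous_word.lower() in VERB_CUES:
        return "read"
    return "red"


print(choose_spelling("has", "red"))  # the Guarantor "has read" the foregoing
print(choose_spelling("the", "red"))  # "the red" folder
print(choose_spelling("I", "red"))    # "I read the contract" comes out wrong
```

The last line shows why errors are still common: a simple rule handles the Guarantor example but guesses ‘red’ for “I read the contract”, because it has no real understanding of the sentence.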

Another requirement in using the program is telling the program when to put in punctuation. Since the software isn’t yet able to understand context, it can’t tell when a sentence ends. Fortunately, most VR programs have adopted standard dictation practices so it is not much different from standard dictation. Unfortunately, many of us don’t use standard dictation methods. We rely on our secretary’s computer (i.e. brain) to figure out punctuation, capitalization, line spacing and other formatting. Consequently, there is often a learning curve involved here as well.
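
Here is a simplified Python sketch of what dictating punctuation amounts to. The spoken words chosen ("period", "comma", "colon") and the function name transcribe are assumptions for the example; check your own product's manual for the words it actually expects.

```python
# Hypothetical spoken words that stand for punctuation marks.
PUNCTUATION_WORDS = {"period": ".", "comma": ",", "colon": ":"}


def transcribe(spoken_words):
    """Turn a list of spoken words into text, converting punctuation words to marks."""
    text = ""
    capitalize_next = True  # the first word starts a sentence
    for word in spoken_words:
        mark = PUNCTUATION_WORDS.get(word)
        if mark is not None:
            text += mark
            capitalize_next = (mark == ".")  # only a period starts a new sentence
            continue
        if capitalize_next:
            word = word.capitalize()
            capitalize_next = False
        text += (" " if text else "") + word
    return text


dictated = ["the", "guarantor", "agrees", "period",
            "he", "signed", "comma", "too", "period"]
print(transcribe(dictated))  # The guarantor agrees. He signed, too.
```

If you leave out the word "period", the software has no way of knowing the sentence ended. That is the habit most of us have to learn.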

Once we have trained the software and learned how to use it, we are ready to go to work. (Finally!) We have two modes for operating the program: Voice Control and Voice Dictation. In other words, we can choose between controlling the program we are using (e.g. File Open, File Save, Select All, Delete) and dictating into the program. There is usually both a voice and a keyboard command that switches between the two modes of operation.
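
The two modes can be pictured with a short Python sketch. Everything here is hypothetical, including the class name VoiceSession and the toggle phrase "switch mode"; each product has its own switching command.

```python
class VoiceSession:
    """Toy model of a VR program that is either dictating or taking commands."""

    def __init__(self):
        self.mode = "dictation"   # start out typing what is heard
        self.document = []        # phrases dictated into the document
        self.commands_run = []    # commands carried out in control mode

    def hear(self, phrase):
        if phrase == "switch mode":  # hypothetical toggle phrase
            self.mode = "control" if self.mode == "dictation" else "dictation"
        elif self.mode == "control":
            self.commands_run.append(phrase)
        else:
            self.document.append(phrase)


session = VoiceSession()
for phrase in ["dear client", "switch mode", "file save", "switch mode", "thank you"]:
    session.hear(phrase)

print(session.document)      # ['dear client', 'thank you']
print(session.commands_run)  # ['file save']
```

The point to take away is that the same words mean different things depending on the mode. Say “file save” in the wrong mode and those words end up in your letter instead of saving it.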

When should I implement it?

The implementation needs to be driven by how beneficial the software will be to the user, balanced against how much time the user has to devote to voice training and user training. If the lawyer is in the middle of a big case or a flurry of estate plans then it is unlikely they will devote the necessary time to training. The best method is to actually schedule the training as if it were a client appointment. This clearly establishes the cost (lost billable hours) and clears out the necessary time. Unlike the setup of case management and document assembly software, it cannot be farmed out to anyone else.

Who should be using it?

This is easy. Anyone who needs to use the computer software but is inefficient because they cannot type well. This could include certain paralegals and would include most lawyers. As discussed above, it could also include certain injured or disabled individuals.

As with all software purchases, the cost and benefit must be weighed on a case-by-case basis. Virtually all lawyers will derive substantial benefit once they learn how to use the software.

Where do I obtain it?

This is a trickier question than it first appears. There are so many different programs and versions available. First, let’s establish something we all know but generally refuse to acknowledge. You get what you pay for. Of course, this means we can get less value than we deserve for what we spend but we almost never get more. This is especially true with computer software. If you buy junk, you get junk.

The major VR software providers are Dragon, L & H, IBM and Kurzweil. All of them use the same underlying programming code and then make changes in the user interface. All of them have a cheap version of their software that is intended for recreational home use. Please don’t buy it expecting it to work the same as the business level version. Most have a ‘legal’ version, which means it has an additional vocabulary for the legal field. In other words, a legal specific sound dictionary is added. Generally, the additional cost outweighs the usefulness of the additional dictionary. Most lawyers are moving away from ‘legalese’, so buy the best non-legal specific dictionary and you should be fine. If you practice in an area that still requires legal jargon then you will need to ante up for the ‘legal’ edition of the software.

One last consideration is the other software in use on your system. Some software packages have chosen a specific product and will not work with any others. Find out about this interaction before making your choice.

Conclusion

Voice recognition will be one of the hottest technologies for the next few years. It will eventually become the ‘preferred’ method for manipulating your computer and its software. By keeping up as VR software evolves, your firm will reap the benefits of the efficiency it provides while avoiding large learning curves down the road. Don’t be an ostrich. Take your head out of the sand and look around. Your computer may be trying to talk to you.

Lee D. Cumbie is the founder of Cumbie Law Office Automation Consulting, one of the Technology Assistance Program consultants for the N.C. Bar Assn. He is also an Adjunct Professor of Law at Campbell University where he teaches the Law Firm Computer Lab course. Lee is a member of Tyson & Cumbie, PLLC in Fayetteville, N.C.
Lee earned his B.S. degree from Regents College after military service in the U.S. Navy. He earned his J.D. from Campbell University, cum laude, in 1997.

Resources to get you started:

Information:

NetsaversCenter Y2K Section www.suttondesigns.com/NetsaversCenter/Y2K/Y2K-Links.html
Year 2000 Software Windowing Solutions www.suttondesigns.com/NetsaversCenter/index.htm
PC Magazine Online www.pcmag.com/y2k
PC Magazine October 6, 1998 "What To Do About the Year 2000", Jim Seymour, p. 100
Ziff-Davis ZDNet www.zdnet.com/y2k
Year 2000 Information Center www.year2000.com
Legal/Management Issues www.y2k.com

Programs:

Netsavers Y2K TSR Scanner Kit, V. 4.0.1 www.suttondesigns.com/NetsaversCenter/Y2K/NetY2K
The RighTime Clock Company www.rightime.com
DMX II www.dmx2.com
Computer Experts Ltd www.computerexperts.co.uk
UniComp Products www.unicomp-products.com
Network Associates www.nai.com