Practical speech recognition options for non-Windows operating systems are few. Yet after years of overuse, my hands needed a break from the keyboard and mouse. After a few months with SphinxKeys on Linux, some experiments with Simon Listens, and reading about the limitations of Dragon Dictate for Mac, the only viable option was to return to Microsoft’s OS.
As a child in the early 1990s I grew accustomed to Microsoft’s DOS and consumer editions of Windows. College and a job at a very Apple-friendly company led to spending a lot more time with Linux, enterprise Windows, and OS X. As newer versions offered speech recognition and text-to-speech, I toyed with these features like everything else. Sadly, those brief trials left me with the impression that they were not ready for everyday use. Years later, typing and mousing around had caught up with me. In late 2013 there was no denying it was time to revisit speech recognition, and much more seriously.
By this point I was years into Linux and loving it: powerful shells, federated package management, light resource usage, lots of software choices … except for voice input. While Linux has several speech tools, they all seemed impractical:
- IBM’s ViaVoice was sold and died out
- Palaver sends voice data through Google, which is incompatible with my job requirements
- Platypus didn’t work with my version of Dragon
- Simon Listens was cumbersome and never worked for me
- SphinxKeys only simulated keystroke input
- Vedics didn’t compile and seems out of date
There are more options on Linux, but after trying so many I had already found more success with Windows.
Around this time an old copy of Dragon NaturallySpeaking (circa 2007) turned up at a local thrift shop. Spending some time with it revealed how useful the different modes were, showed the promise of the software development kit, and piqued my curiosity about the tools others had built on it. Sadly, it didn’t support 64-bit Windows, and integration with existing software was very limited. Apart from Microsoft Office, it didn’t have a lot to offer out of the box. Reviews of later versions seemed to reaffirm that the software wasn’t going to work for my needs.
Microsoft began offering Windows Speech Recognition (WSR) with Windows Vista. After using it for a few months on Windows 7, I can say it does a passable job with a good, properly configured microphone. Integration with built-in software like Internet Explorer and Windows Live Mail is solid. Other applications like Miranda IM work reasonably well too. Unfortunately, most others fall back to the annoying, if usable, dictation pad. Patience and persistence help in the hunt for the most practical solutions.
WSR can be resource intensive. My computer’s memory usage climbs a bit, and things get slower as I keep many programs open. Using a lot of tabs in IE or Firefox caused the most slowdown, making scrolling a chore. Underpowered computers like netbooks, Celeron-equipped laptops, or older desktops only disappointed. Your mileage may vary.
While WSR works all right as is, it really needs customization options to fit a wider variety of workflows. There are a few tools out there:
The first two also offer versions that work with Nuance’s Dragon products, which helps avoid lock-in. At this point I’ve settled on WSR Macros with some AutoIt tweaks to add voice clicking without the mouse grid, among other things.
Today my voice does about 15% of the work. It helps most with e-mail, instant messaging, blogging, clicking, and window management. After seeing Tavis Rudd’s presentation on programming by voice, I hope to achieve a similar proficiency. Until then, the experiments will continue as time permits.
Have you ever tried speech recognition? What did you think? If you’d like to share, please leave a comment.