You can download the archive containing the needed files at http://www.r-kraft.com/french.tgz
I'm working on French voice recognition; I have some results, but they are not accurate enough at the moment.
The key to making Sphinx work is in the Makefile.in and Makefile.am, which have to be modified to integrate the targeted foreign language. Changing the language on the fly is not possible, at least with the code I have.
The documentation is very rich but scattered in every direction, and it sometimes has big holes, so it is not easy to get a global overview of Sphinx.
Archives for the French language can be found at the University of Le Mans under the LIUM name.
It seems that the training part is the other key to accurate recognition, but with a bad microphone that is not so easy.
I'm also looking inside Perlbox, which is a good starting point.
This sounds interesting! I've been planning to work on voice recognition too, but I really started with the hardware part, with some noise cancellation.
I have a BlackBerry Pearl phone that recognizes voice pretty well, but it fails completely in a noisy environment. I want to avoid such problems by developing a hardware solution with at least two microphones.
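To make the two-microphone idea concrete, here is a minimal sketch of one classic approach: a single-tap LMS adaptive filter that uses a second "reference" microphone (picking up mostly noise) to cancel the noise in the primary microphone. All names and parameters here are illustrative, not taken from any specific product; a real system would need a multi-tap filter, since the reference mic hears a delayed, filtered copy of the noise.

```python
import math

def lms_cancel(primary, reference, mu=0.05):
    """Single-tap LMS adaptive noise canceller (toy sketch).

    primary   -- samples from the mic pointed at the speaker (voice + noise)
    reference -- samples from a second mic picking up mostly noise
    mu        -- adaptation step size (must be small enough to stay stable)
    Returns the error signal, which converges toward the clean voice.
    """
    w = 0.0                      # adaptive weight, learned on the fly
    cleaned = []
    for d, x in zip(primary, reference):
        y = w * x                # current estimate of the noise in `primary`
        e = d - y                # residual = voice + uncancelled noise
        w += mu * e * x          # LMS update: nudge w to shrink the residual
        cleaned.append(e)
    return cleaned

# Toy demo: a slow "voice" signal buried in a sinusoidal "engine" noise.
n = 2000
voice = [0.5 * math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
noise = [math.sin(2 * math.pi * 50 * t / n) for t in range(n)]
primary = [v + 0.8 * x for v, x in zip(voice, noise)]
cleaned = lms_cancel(primary, noise)
```

After the filter converges, the tail of `cleaned` is much closer to `voice` than `primary` ever was, which is exactly the effect a two-mic hardware front end is after.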
EPIA TC 1G 256MB 60GB Linux,WindowMaker, Roadnav, Xine, XMMS, iGuidance3
Lilliput 8", Pharos i360, WUSB11v2.6 WiFi
A hardware solution is interesting indeed: it offers straight targeting, but it loses scalability, while a software solution is more flexible.
Sphinx2 is really able to perform voice recognition without you training it first; in other words, it works out of the box, at least for English and Spanish. This means it does not just match the voice waveform as many competitors do; it performs a real sentence-content analysis.
You can find more information about the Sphinx project at http://cmusphinx.sourceforge.net/
Now there is another axis that can work in conjunction with voice recognition: lip reading, known as multimodal recognition. I know there were some experiments with Sphinx, but I saw those one or two years ago; I don't know whether progress has been made since, but probably yes.
The advantage of this technique is that it allows far better recognition, because it compares what has been recognized from the sound with what has been recognized from lip reading.
It works with models of mouth shapes expressed as vectors: a camera watches you and performs its own recognition.
One last thing: I suggest beginners try Sphinx2 and perlbox-voice; together they give a good first approach.
CMU also works on PocketSphinx, which is not nearly as archaic as Sphinx2 was. Their current codebase is focused around Sphinx3/PocketSphinx/SphinxBase, and any new features would be added there. I've built language models for PocketSphinx based on the telephone-bandwidth models that do a good job picking out parts of voice apps for an IVR.
Thank you for the information. I saw PocketSphinx on the CMU website, but as the target was not a pocket computer I left it aside. I'll give PocketSphinx a try.
Sphinx2, even if more archaic, is far faster than Sphinx3; this is what appears in the documents as well.
Out of curiosity I tried Sphinx3 as well, but the response time was not fast enough to react to events.
PocketSphinx testing results:
I built a basic language model for PocketSphinx today, with phrases that would likely be used for car navigation.
With pocketsphinx_continuous the results weren't very good with background noise; I only had default settings enabled, though, so it could probably be tweaked to work better.
pocketsphinx_ptt worked very well, though: while I was listening to music it understood everything I said, and a friend who tried it out had no problems with it either. pocketsphinx_ptt is a setup where you push a button, say something, then push the button again. That is a good approach in a vehicle, where you could be carrying on conversations, listening to music, or just not want it to pick up random things. Consecutive number dialing worked well out of one model, but I think PocketSphinx supports changing models at run time; if so, we could have a number-entry mode, or even separate models for each type of action we want, if recognition levels are acceptable for the average setup.
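The push-button flow described above is easy to model as a tiny state machine. This is only a sketch of the control logic around the recognizer; the names `PushToTalk`, `press`, and `hear` are invented here and are not the PocketSphinx API:

```python
class PushToTalk:
    """Sketch of push-to-talk gating: press once to open the mic,
    press again to close it; anything heard while closed is ignored."""

    def __init__(self):
        self.listening = False

    def press(self):
        # The single button toggles between listening and ignoring.
        self.listening = not self.listening

    def hear(self, utterance):
        # Only pass utterances to the recognizer while the mic is open.
        return utterance if self.listening else None

ptt = PushToTalk()
ptt.hear("random chatter")       # ignored: button not pressed yet
ptt.press()                      # open the mic
cmd = ptt.hear("navigate home")  # accepted as a command
ptt.press()                      # close the mic again
```

Model switching would slot in naturally here: on `press()`, a number-entry mode could load a digits-only model before listening, assuming run-time model changes are indeed supported.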
I'm currently using a junky $10 RadioShack mic sitting in front of me; I plan to switch to a Bluetooth cellphone headset in a week or two.
If people start posting word lists they'd find useful, I can post some slimmed-down models that recognize only the words we want; those should work pretty well.
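To show what a slimmed-down statistical language model is made of, here is a rough sketch that turns a small phrase list into bigram probabilities. The navigation phrases are hypothetical examples, and real tools produce an ARPA-format file with smoothing and back-off weights, which this toy deliberately skips:

```python
from collections import Counter

# Hypothetical in-car phrases -- the kind of word list a poster might share.
corpus = [
    "navigate to home",
    "navigate to work",
    "zoom in",
    "zoom out",
    "next track",
]

def bigram_model(sentences):
    """Count word bigrams (with <s>/</s> sentence markers) and convert
    the counts into conditional probabilities P(word | previous word)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])          # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))
    return {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

model = bigram_model(corpus)
```

Because the vocabulary is tiny and the phrases are constrained, the recognizer has far fewer hypotheses to weigh, which is why restricted models tend to work so well for command-and-control.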
This is mainly due to the hysteresis needed before Sphinx starts to decode. For example, if you make some noise to "open" the active listening, Sphinx will recognize fine after it has failed to analyse that noise.
Perlbox, which you can find at http://www.perlbox.org, adds the advantage of making Sphinx behave as if you were using PTT without having to bother with buttons and PTT mode: you only declare a "magic" keyword to make Sphinx start really listening, and you should get the same results as with PTT mode.
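The "magic keyword" behaviour can be sketched the same way: ignore everything until the keyword is recognized, then treat the next utterance as a command. The keyword and function names below are invented for illustration and are not Perlbox's actual interface:

```python
def keyword_gate(utterances, keyword="computer"):
    """Yield only the utterances that immediately follow the magic keyword.
    Hearing the keyword arms the gate for the next utterance."""
    armed = False
    for text in utterances:
        if armed:
            yield text          # this one is treated as a command
            armed = False       # ...then go back to passive listening
        elif keyword in text.lower().split():
            armed = True        # keyword heard: really listen now

heard = ["nice weather today", "computer", "play some music", "thanks"]
commands = list(keyword_gate(heard))
# Only "play some music" gets through; the chit-chat is dropped.
```

The gate gives the same noise immunity as PTT, since stray speech is discarded unless the magic word was just spoken.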
On the other hand, I don't know whether Perlbox would run with PocketSphinx.
I don't have enough time on my hands to play with all this, but if you have a bit more time than me, why not give Perlbox a try?