
voice recognition


  • voice recognition

    I'm working on French voice recognition. I have some results, but they are not accurate enough at the moment.
    The key to making Sphinx work is in the and that have to be modified to integrate the target foreign language. Changing language on the fly is not possible, at least with the code I have.
    The documentation is very rich, but it sprawls in every direction and sometimes has big holes, so it is not easy to get a global overview of Sphinx.
    Archives for the French language can be found at Le Mans University under the LIUM name.
    It seems that training is the other key to accurate recognition, but that is not easy with a bad microphone.
    I'm also looking inside perlbox, which is a good starting point.

  • #2
    You can download the archive containing the needed files at


    • #3
      This sounds interesting! I've been planning on voice recognition, but I actually started with the hardware part, with some noise cancellation.
      I have a BlackBerry Pearl phone that recognizes voice pretty well but completely fails in a noisy environment. I want to avoid such problems by developing a hardware solution with at least 2 microphones.
      EPIA TC 1G 256MB 60GB Linux,WindowMaker, Roadnav, Xine, XMMS, iGuidance3
      Lilliput 8", Pharos i360, WUSB11v2.6 WiFi


      • #4
        A hardware solution is interesting indeed: it targets the problem directly, but it loses scalability, while a software solution is more flexible.

        Sphinx2 really can do voice recognition without prior training; in other words, it works out of the box, at least for English and Spanish. This means that it does not match the voice waveform as many competitors do; it performs a real analysis of the sentence content.
        You can find more information about the Sphinx project at

        Now there is another axis that can be combined with voice recognition: lip reading. It is known as multimodal recognition. I know there were some experiments with Sphinx, but I saw this one or two years ago; I don't know if progress has been made since, probably yes.
        The advantage of this technique is that it allows far better recognition because it compares what has been recognized from sound with what has been recognized from lip reading.
        It works with vector models of the mouth's shape. A camera watches you and makes its own recognition.
        Last thing: I suggest beginners try Sphinx2 and perlbox-voice; it can be a good first approach.
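
To make the idea of comparing the audio result with the lip-reading result concrete, here is a minimal sketch of one common way to combine two recognizers (weighted late fusion of per-word probabilities). It is not taken from Sphinx or from the experiments mentioned above; the function name, the weight, and the example scores are all hypothetical.

```python
import math


def fuse_hypotheses(audio_scores, visual_scores, audio_weight=0.7):
    """Pick the word with the best weighted log-probability across both recognizers.

    audio_scores / visual_scores: dicts mapping a candidate word to a
    probability in [0, 1]. All names here are illustrative assumptions.
    """
    floor = 1e-6  # avoids log(0) when one recognizer never proposed a word
    candidates = set(audio_scores) | set(visual_scores)
    fused = {}
    for word in candidates:
        a = max(audio_scores.get(word, 0.0), floor)
        v = max(visual_scores.get(word, 0.0), floor)
        fused[word] = audio_weight * math.log(a) + (1.0 - audio_weight) * math.log(v)
    return max(fused, key=fused.get)


# The audio alone slightly prefers "route"; the mouth shape strongly says "mute".
audio = {"mute": 0.45, "route": 0.55}
visual = {"mute": 0.9, "route": 0.1}
best = fuse_hypotheses(audio, visual)
```

With `audio_weight=1.0` the visual channel is ignored, which is a quick way to see how much the second modality changes the decision.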


        • #5
          some more interesting information


          • #6
            now that you have probably read the documentation, here is some more stuff


            and finally sphinx + this library = robust voice recognition


            • #7
              pocketsphinx may interest you

              cmu also works on pocketsphinx, which is not nearly as archaic as sphinx2 was. their current codebase is focused around sphinx3/pocketsphinx/sphinxbase, and any new features would be added to that. i've built language models for pocketsphinx based on the telephone-bandwidth models that do a good job of picking out parts of voice apps for an IVR.



              • #8
                Thank you for the information. I saw pocketsphinx on the CMU website, but as the target was not a pocket computer I left it aside. I'll give pocketsphinx a try.

                Sphinx 2, even if more archaic, is far faster than Sphinx 3.

                In the documents it appears that :

                Out of curiosity I tried Sphinx3 as well, but its response time was not fast enough to react to events.


                • #9
                  pocketsphinx testing results:

                  i built a basic language model for pocketsphinx today with phrases that would likely be used for car navigation.

                  with pocketsphinx_continuous the results weren't very good with background noise. i only had default settings enabled though, so it could be tweaked to work better.

                  pocketsphinx_ptt worked very well though. while i was listening to music it understood everything i said, and i had a friend try it out who had no problems with it. pocketsphinx_ptt is a setup where you push a button, say something, then push the button again, which is a good approach in a vehicle where you could be carrying on conversations, listening to music, or just not want it to pick up random things. consecutive number dialing worked well out of one model, but i think pocketsphinx supports changing models at run time. if so, we could have a number entry mode, or even separate models for each type of action we want, if recognition levels are acceptable on the average setup.
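
The push, speak, push again flow described above can be sketched as a tiny state machine. This is only an illustration of the control flow, not pocketsphinx_ptt's actual code: the class name and methods are hypothetical, and the audio chunks are assumed to be raw bytes that would be handed to a decoder afterwards.

```python
class PushToTalk:
    """Buffers audio only between two button presses (hypothetical helper)."""

    def __init__(self):
        self.recording = False
        self._chunks = []

    def press(self):
        """First press starts recording and returns None.
        Second press stops and returns everything captured in between."""
        if not self.recording:
            self.recording = True
            self._chunks = []
            return None
        self.recording = False
        return b"".join(self._chunks)

    def feed(self, chunk):
        """Called for every audio chunk from the mic; ignored unless recording."""
        if self.recording:
            self._chunks.append(chunk)


ptt = PushToTalk()
ptt.feed(b"music ")          # dropped: button not pressed yet
ptt.press()                  # start
ptt.feed(b"call ")
ptt.feed(b"home")
utterance = ptt.press()      # stop; this is what would go to the decoder
```

Because nothing is buffered outside the two presses, music and conversation never reach the recognizer, which matches the behaviour reported above.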

                  i'm currently using a junk radioshack $10 mic sitting in front of me, i plan to use a bluetooth cellphone headset though in a week or 2.

                  if people start posting word lists they'd find useful, i can post some slimmed-down models for recognizing only the words we want, which should work pretty well
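
For anyone who wants to experiment with a restricted word list before such models are posted, here is a rough sketch that turns a list of phrases into a JSGF grammar file body. Note that this swaps in a fixed grammar rather than the statistical language models described above, and whether a given decoder build loads JSGF is an assumption to check. The function name, grammar name, and sample phrases are hypothetical, and phrases are assumed to contain plain words only.

```python
def phrases_to_jsgf(phrases, grammar_name="carnav"):
    """Build a one-rule JSGF grammar accepting exactly the given phrases."""
    cleaned = []
    for phrase in phrases:
        words = phrase.lower().split()
        if words:
            normalized = " ".join(words)   # collapse extra whitespace
            if normalized not in cleaned:  # drop duplicates, keep order
                cleaned.append(normalized)
    if not cleaned:
        raise ValueError("need at least one phrase")
    alternatives = " | ".join(cleaned)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


grammar_text = phrases_to_jsgf(["Navigate home", "play  music", "navigate home"])
```

Keeping one small grammar per mode (navigation, number entry, media) would fit the idea of separate models for each type of action.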


                  • #10
                    This is mainly due to the hysteresis needed before Sphinx starts to decode. For example, if you make some noise to "open" the active listening, Sphinx will recognize fine after it has failed to analyse the noise.
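
The hysteresis effect can be illustrated with a simple energy gate that needs a few loud frames before it opens and only closes again below a lower threshold. This is a generic sketch, not Sphinx's actual endpointing code; the thresholds, the frame count, and the function name are assumptions chosen for the example.

```python
def gate_frames(energies, open_threshold=0.5, close_threshold=0.2, onset_frames=2):
    """Return one boolean per frame: True while the listening gate is open.

    The gate opens after `onset_frames` consecutive frames at or above
    `open_threshold`, and closes once a frame drops below `close_threshold`.
    """
    is_open = False
    loud_run = 0
    states = []
    for energy in energies:
        if not is_open:
            loud_run = loud_run + 1 if energy >= open_threshold else 0
            if loud_run >= onset_frames:
                is_open = True
        elif energy < close_threshold:
            is_open = False
            loud_run = 0
        states.append(is_open)
    return states


# A single loud frame (index 1) is ignored; the gate opens on the second
# loud frame in a row (index 4) and stays open until energy falls under 0.2.
states = gate_frames([0.1, 0.6, 0.1, 0.6, 0.7, 0.3, 0.3, 0.1, 0.6])
```

The first loud frame of an utterance is always spent opening the gate, which is one way to picture why an initial noise helps the words that follow.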

                    Perlbox, which you can find at, has the advantage of making Sphinx behave as if you were using ptt without having to bother with buttons and ptt mode: just declare a "magic" keyword to make Sphinx really start listening and you should get the same results as with ptt mode.
                    On the other hand, I don't know if perlbox would run with pocketsphinx.
                    I don't have enough time on my hands to play with all this, but if you have a bit more time than me, why not give perlbox a try?
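
A "magic" keyword gate of the kind described above can be sketched independently of the decoder: it looks at each recognized text hypothesis and only lets a command through once the keyword has been heard. This is not perlbox's implementation, which I have not reproduced here; the class name, the default keyword, and the sample phrases are hypothetical.

```python
class KeywordGate:
    """Ignores recognized text until a wake keyword is heard (hypothetical helper)."""

    def __init__(self, keyword="computer"):
        self.keyword = keyword.lower()
        self.armed = False

    def handle(self, hypothesis):
        """Return the command text to act on, or None if nothing should happen."""
        words = hypothesis.lower().split()
        if not self.armed:
            if self.keyword not in words:
                return None  # background chatter: ignore
            # Keep only what follows the keyword in the same utterance.
            words = words[words.index(self.keyword) + 1:]
            self.armed = True
        if not words:
            return None  # keyword heard, still waiting for the command
        self.armed = False
        return " ".join(words)


gate = KeywordGate()
results = [
    gate.handle("turn it up"),          # ignored, no keyword
    gate.handle("computer"),            # arms the gate
    gate.handle("play music"),          # passed through
    gate.handle("computer call home"),  # keyword and command in one utterance
]
```

The gate disarms after each command, so ordinary conversation is ignored again until the keyword is repeated.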


                    • #11
                      perlbox voice is based on sphinx2.x, which is inferior to pocketsphinx and sphinx3. once a system is in place, changing from button push to constant recognition is just a few lines of code; it's not like pocketsphinx is unable to do a keyword system like perlbox did. i'll take a look at their code again soon and see how they did it.


                      • #12
                        Hmmm, there is one little thing: what do we do if we have music playing and we need to "start up" the recognition? I think that's why car manufacturers opted for a button.