
Thread: voice recognition

  1. #1
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591

    voice recognition

    I'm working on French voice recognition. I have some results, but they are not accurate enough at the moment.
    The key to making Sphinx work is in makefile.in and makefile.am, which have to be modified to integrate the target foreign language. Changing language on the fly is not possible, at least with the code I have.
    The documentation is very rich, but it also sprawls in every direction and sometimes has big holes, so it is not easy to get a global overview of Sphinx.
    French language archives can be found at Le Mans University, under the LIUM name.
    It seems that the training part is the other key to accurate recognition, but that is not so easy with a bad microphone.
    I'm also looking inside perlbox, which is a good starting point.

  2. #2
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591
    You can download the archive containing the needed files at http://www.r-kraft.com/french.tgz

  3. #3
    Constant Bitrate
    Join Date
    Jun 2006
    Location
    Chicago, IL
    Posts
    143
    This sounds interesting! I've been planning on voice recognition, but I actually started with the hardware part, with some noise cancellation.
    I have a BlackBerry Pearl phone that recognizes voice pretty well but completely fails in a noisy environment. I want to avoid such problems by developing a hardware solution with at least 2 microphones.
    EPIA TC 1G 256MB 60GB Linux,WindowMaker, Roadnav, Xine, XMMS, iGuidance3
    Lilliput 8", Pharos i360, WUSB11v2.6 WiFi

  4. #4
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591
    A hardware solution is interesting indeed. It offers straight targeting but loses scalability, while a software solution is more flexible.

    Sphinx2 really can do voice recognition without being trained beforehand. In other words, it works out of the box, at least for English and Spanish. This means that it does not just match the voice waveform, as many competitors do; it does a real analysis of the sentence content.
    You can find more information about the Sphinx project at http://cmusphinx.sourceforge.net/

    Now there is another axis that can be combined with voice recognition: lip reading. This is known as multimodal recognition. I know that there were some experiments with Sphinx, but I saw this one or two years ago and I don't know if any progress has been made since. Probably yes.
    The advantage of this technique is that it allows far better recognition, because it compares what has been recognized from sound with what has been recognized from lip reading.
    It works with models of the mouth's shape, expressed as vectors. A camera watches you and does its own recognition.
    Last thing: I suggest that beginners try Sphinx2 and perlbox-voice, which make a good introduction.
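To make the "compare sound with lip reading" idea concrete, here is a minimal sketch of one way two recognizers could be combined. It is not taken from Sphinx or from the experiments mentioned above. It assumes each recognizer returns a score between 0 and 1 per candidate word, and the function name `fuse_hypotheses` and the `audio_weight` parameter are made up for illustration.

```python
from typing import Dict


def fuse_hypotheses(
    audio: Dict[str, float],
    visual: Dict[str, float],
    audio_weight: float = 0.7,
) -> str:
    """Pick the word with the best weighted sum of audio and visual scores.

    Both inputs map candidate words to scores in [0, 1]. This scoring
    scheme is an assumption made for the sketch, not Sphinx's behaviour.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    candidates = set(audio) | set(visual)
    if not candidates:
        raise ValueError("no candidate words to choose from")

    def combined(word: str) -> float:
        # A word missing from one recognizer simply scores 0 there.
        return (audio_weight * audio.get(word, 0.0)
                + (1.0 - audio_weight) * visual.get(word, 0.0))

    # Sorting first makes ties resolve the same way on every run.
    return max(sorted(candidates), key=combined)


if __name__ == "__main__":
    # The audio alone slightly prefers "lift"; the mouth shape says "left".
    audio_scores = {"left": 0.48, "lift": 0.52}
    visual_scores = {"left": 0.90, "lift": 0.10}
    print(fuse_hypotheses(audio_scores, visual_scores, audio_weight=0.5))
```

In a noisy car the weight could be shifted toward the camera, since the audio scores become less trustworthy there.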

  5. #5
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591
    Some more interesting information:
    http://www.cs.umu.se/education/exami...zRatajczak.pdf

  6. #6
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591
    Now that you have probably read the documentation, here is some more stuff:

    http://sourceforge.net/projects/opencvlibrary/

    And finally, Sphinx + this library = robust voice recognition.

  7. #7
    Low Bitrate
    Join Date
    Sep 2006
    Posts
    80

    pocketsphinx may interest you

    CMU also works on pocketsphinx, which is not nearly as archaic as sphinx2 was. Their current codebase is focused around sphinx3/pocketsphinx/sphinxbase, and any new features would be added to that. I've built language models for pocketsphinx, based on the telephone bandwidth models, that do a good job of picking out parts of voice apps for an IVR.

    www.pocketsphinx.org

  8. #8
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591
    Thank you for the information. I saw pocketsphinx on the CMU website, but as my target was not a pocket computer I left it aside. I'll give pocketsphinx a try.

    Sphinx 2, even if more archaic, is far faster than Sphinx 3.

    In the documents it appears that, for speed:
    Sphinx4>Sphinx2>Sphinx3

    Out of curiosity I tried Sphinx3 as well, but its response time was not good enough to react quickly to events.

  9. #9
    Low Bitrate
    Join Date
    Sep 2006
    Posts
    80
    pocketsphinx testing results:

    I built a basic language model for pocketsphinx today with phrases that would likely be used for car navigation.

    With pocketsphinx_continuous the results weren't very good with background noise. I only had the default settings enabled though, so it could probably be tweaked to work better.

    pocketsphinx_ptt worked very well though. While I was listening to music it understood everything I said, and I had a friend try it out who had no problems with it either. pocketsphinx_ptt is a setup where you push a button, say something, then push the button again. That is a good approach for a vehicle, where you could be carrying on conversations, listening to music, or just not want it to pick up random things. Consecutive number dialing worked well out of one model, but I think pocketsphinx supports changing models at run time. If so, we could have a number entry mode, or even separate models for each type of action we want, if recognition levels are acceptable on the average setup.

    I'm currently using a junk $10 RadioShack mic sitting in front of me. I plan to use a Bluetooth cellphone headset in a week or 2 though.

    If people start posting word lists they'd find useful, I can post some slimmed-down models that recognize only the words we want, which should work pretty well.
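The push-to-talk flow described above (press, speak, press again), together with the idea of one small model per type of action, can be sketched as a tiny state machine. This is an illustration only: it does not call pocketsphinx, the class `PushToTalkSession` and its methods are hypothetical, and each "model" is reduced to a plain set of allowed words standing in for a real language model.

```python
from typing import Dict, List, Set


class PushToTalkSession:
    """Collect words only between two button presses, using the active mode."""

    def __init__(self, vocabularies: Dict[str, Set[str]], mode: str) -> None:
        if mode not in vocabularies:
            raise KeyError(mode)
        self.vocabularies = vocabularies
        self.mode = mode
        self.listening = False
        self._buffer: List[str] = []

    def set_mode(self, mode: str) -> None:
        """Switch the active word list, e.g. from commands to digits."""
        if mode not in self.vocabularies:
            raise KeyError(mode)
        self.mode = mode

    def press(self) -> List[str]:
        """First press starts listening; second press returns the utterance."""
        if not self.listening:
            self.listening = True
            self._buffer = []
            return []
        self.listening = False
        result, self._buffer = self._buffer, []
        return result

    def hear(self, word: str) -> None:
        """Keep a word only while listening and if the active mode allows it."""
        if self.listening and word in self.vocabularies[self.mode]:
            self._buffer.append(word)


if __name__ == "__main__":
    session = PushToTalkSession(
        {"command": {"navigate", "home", "dial"},
         "digits": {"one", "two", "three"}},
        mode="command",
    )
    session.hear("home")  # button not pressed yet, so this is ignored
    session.press()
    for word in ["navigate", "banana", "home"]:
        session.hear(word)
    print(session.press())
```

Keeping each word list small is the point of the design: conversation or music outside the button window never reaches the recognizer, and inside the window only words from the current mode are candidates.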

  10. #10
    Maximum Bitrate
    Join Date
    Aug 2004
    Location
    at home
    Posts
    591
    This is mainly due to the hysteresis needed before Sphinx starts to decode. For example, if you make some noise to "open" the active listening, Sphinx will fail to analyse that noise and then recognize fine afterwards.

    Perlbox, which you can find at http://www.perlbox.org, has the advantage of making Sphinx behave as if you were using ptt, without having to bother with buttons and ptt mode. You only declare a "magic" keyword that makes Sphinx start to really listen, and you should get the same results as with ptt mode.
    On the other hand, I don't know if perlbox would run with pocketsphinx.
    I don't have enough time on my hands to play with all this, but if you have a bit more time than me, why not give perlbox a try?
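The "magic" keyword idea can be sketched as a filter over the stream of words a recognizer produces. This is not perlbox's code, which is not shown in this thread. The function `gate_on_keyword`, the `window` parameter, and the keyword "computer" are assumptions chosen for the example.

```python
from typing import Iterable, List


def gate_on_keyword(words: Iterable[str], keyword: str, window: int = 3) -> List[str]:
    """Return only the words spoken right after the keyword.

    Each time the keyword is heard, the next `window` words are accepted;
    everything else is treated as background chatter and dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    accepted: List[str] = []
    remaining = 0
    for word in words:
        if word == keyword:
            remaining = window  # (re)open the listening window
        elif remaining > 0:
            accepted.append(word)
            remaining -= 1
    return accepted


if __name__ == "__main__":
    heard = ["music", "loud", "computer", "navigate", "home",
             "blah", "blah", "computer", "stop"]
    print(gate_on_keyword(heard, keyword="computer", window=2))
```

Unlike push-to-talk, the recognizer still has to decode everything in order to spot the keyword, so background noise can still cause false triggers. The gate only limits what gets acted on.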
