Old 07-20-2007, 01:03 AM   #1
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
voice recognition

I'm working on French voice recognition. I have some results, but they are not accurate enough at the moment.
The key to making Sphinx work is in makefile.in and makefile.am, which have to be modified to integrate the target language. Changing language on the fly is not possible, at least with the code I have.
The documentation is very rich, but it sprawls in every direction and sometimes has big holes, so it is not easy to get a global overview of Sphinx.
French language archives can be found at Le Mans University under the name LIUM.
It seems that the training part is the other key to accurate recognition, but that is not easy with a bad microphone.
I'm also looking inside perlbox, which is a good starting point.
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 07-20-2007, 01:17 AM   #2
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
You can download the archive containing the needed files at http://www.r-kraft.com/french.tgz
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 07-20-2007, 07:11 AM   #3
Constant Bitrate
 
Join Date: Jun 2006
Location: Chicago, IL
Vehicle: 2000 VW Jetta VR6
Posts: 144
My Photos: (3)
This sounds interesting! I've been planning on voice recognition, but I actually started with the hardware part, with some noise cancellation.
I have a BlackBerry Pearl phone that recognizes voice pretty well but completely fails in a noisy environment. I want to avoid such problems by developing a hardware solution with at least 2 microphones.
__________________
EPIA TC 1G 256MB 60GB Linux,WindowMaker, Roadnav, Xine, XMMS, iGuidance3
Lilliput 8", Pharos i360, WUSB11v2.6 WiFi
dupa2 is offline   Reply With Quote
Old 07-20-2007, 10:59 AM   #4
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
A hardware solution is interesting indeed. It targets the problem directly, but it loses scalability, while a software solution is more flexible.

Sphinx2 really is able to do voice recognition without being trained first. In other words, it works out of the box, at least for English and Spanish. This means that it does not match the voice waveform, as many competitors do; it does a real analysis of the sentence content.
You can find more information about the Sphinx project at http://cmusphinx.sourceforge.net/

There is another axis that can be combined with voice recognition: lip reading. This is known as multimodal recognition. I know there were some experiments with Sphinx, but I saw this one or two years ago and I don't know whether any progress has been made since; probably yes.
The advantage of this technique is that it allows far better recognition, because it compares what has been recognized from sound with what has been recognized from lip reading.
It works with vector models of the mouth's shape. A camera watches you and makes its own recognition.
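
To make the "compare both results" idea concrete, here is a minimal Python sketch of one possible way to combine the two recognizers (a weighted sum of scores, often called late fusion). It is not taken from Sphinx or from any multimodal system mentioned here: the function name `fuse_hypotheses`, the weights and the score values are all assumptions for illustration. Each recognizer is assumed to give a score between 0 and 1 for each candidate word.

```python
def fuse_hypotheses(audio_scores, visual_scores, audio_weight=0.7):
    """Combine per-word scores from two recognizers (hypothetical late fusion).

    audio_scores / visual_scores: dict mapping candidate word -> score in [0, 1].
    A word missing from one recognizer is treated as score 0 for that recognizer.
    Returns (best_word, fused_scores).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    candidates = set(audio_scores) | set(visual_scores)
    if not candidates:
        raise ValueError("no candidate words to fuse")
    visual_weight = 1.0 - audio_weight
    fused = {
        word: audio_weight * audio_scores.get(word, 0.0)
        + visual_weight * visual_scores.get(word, 0.0)
        for word in candidates
    }
    best_word = max(fused, key=fused.get)
    return best_word, fused


# Made-up scores: road noise makes the audio side confuse two similar words,
# while the (assumed) lip-reading side is more certain.
audio = {"mute": 0.48, "route": 0.52}
visual = {"mute": 0.80, "route": 0.20}
best, fused = fuse_hypotheses(audio, visual, audio_weight=0.5)
print(best)
```

With equal weights the lip-reading score tips the decision; with `audio_weight=1.0` the result is the same as audio alone. How to choose the weight (for example, lowering it when the cabin is noisy) is left open here.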
One last thing: I suggest beginners try Sphinx2 and perlbox-voice, which make a good starting point.
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 07-22-2007, 04:38 AM   #5
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
Some more interesting information:
http://www.cs.umu.se/education/exami...zRatajczak.pdf
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 07-22-2007, 04:53 AM   #6
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
Now that you have probably read the documentation, here is some more stuff:

http://sourceforge.net/projects/opencvlibrary/

And finally, Sphinx + this library = robust voice recognition.
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 07-28-2007, 06:22 PM   #7
Low Bitrate
 
Join Date: Sep 2006
Posts: 63
My Photos: (2)
pocketsphinx may interest you

CMU also works on pocketsphinx, which is not nearly as archaic as Sphinx2 was. Their current codebase is focused on sphinx3/pocketsphinx/sphinxbase, and any new features would be added there. I've built language models for pocketsphinx, based on the telephone-bandwidth models, that do a good job of picking out parts of voice apps for an IVR.

www.pocketsphinx.org
wirelessdreamer is offline   Reply With Quote
Old 07-29-2007, 03:03 AM   #8
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
Thank you for the information. I saw pocketsphinx on the CMU website, but as my target was not a pocket computer I left it aside. I'll give pocketsphinx a try.

Sphinx 2, even if more archaic, is far faster than Sphinx 3.

From the documents, the ranking by speed (fastest first) appears to be:
Sphinx4>Sphinx2>Sphinx3

Out of curiosity I tried Sphinx3 as well, but its response time was too slow to react to events quickly enough.
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 08-31-2007, 05:40 PM   #9
Low Bitrate
 
Join Date: Sep 2006
Posts: 63
My Photos: (2)
pocketsphinx testing results:

I built a basic language model for pocketsphinx today with phrases that would likely be used for car navigation.

With pocketsphinx_continuous the results weren't very good with background noise. I only had the default settings enabled, though, so it could be tweaked to work better.

pocketsphinx_ptt worked very well, though. While I was listening to music it understood everything I said, and I had a friend try it out and they had no problems with it. pocketsphinx_ptt is a setup where you push a button, say something, then push the button again. That is a good approach in a vehicle, where you could be carrying on conversations, listening to music, or just not want it to pick up random things. Consecutive number dialing worked well out of one model, but I think pocketsphinx supports changing models at run time. If so, we could have a number-entry mode, or even separate models for each type of action we want, if recognition levels are acceptable on the average setup.
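
For anyone who wants to see the push/talk/push flow spelled out, here is a rough Python sketch. It does not call pocketsphinx at all: `PushToTalkSession`, the `decode` callback and the mode names are hypothetical, and the decoder is faked so the example runs on its own. The `mode` field is only a placeholder for the "separate model per type of action" idea, which I have not verified against pocketsphinx.

```python
class PushToTalkSession:
    """Toggle-style push-to-talk: first press starts capture, second press ends it."""

    def __init__(self, decode):
        # decode: callable taking (audio_bytes, mode) and returning text.
        # It stands in for whatever recognizer is used; it is not a pocketsphinx API.
        self._decode = decode
        self._recording = False
        self._chunks = []
        self.mode = "command"  # hypothetical name selecting which model to use

    def feed(self, chunk):
        """Called for every audio chunk coming from the microphone."""
        if self._recording:
            self._chunks.append(chunk)
        # Chunks that arrive while idle (music, conversation) are dropped.

    def press(self):
        """Handle a button press; return recognized text on the second press."""
        if not self._recording:
            self._recording = True
            self._chunks = []
            return None
        self._recording = False
        return self._decode(b"".join(self._chunks), self.mode)


def fake_decode(audio, mode):
    # Stand-in decoder: just reports what it was given.
    return f"{mode}:{len(audio)} bytes"


session = PushToTalkSession(fake_decode)
session.feed(b"music")   # ignored, button not pressed yet
session.press()          # start capturing
session.feed(b"navi")
session.feed(b"gate")
result = session.press() # stop capturing and decode
print(result)
```

Switching to a number-entry mode would then just be `session.mode = "digits"` before the next press, assuming the real decoder can load a different model per mode.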

I'm currently using a junk $10 RadioShack mic sitting in front of me. I plan to use a Bluetooth cellphone headset in a week or 2, though.

If people start posting word lists they'd find useful, I can post some slimmed-down models for recognizing only the words we want, which should work pretty well.
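
As a starting point for sharing word lists, here is a small Python sketch that turns a list of phrases into a normalized text corpus plus a vocabulary list. The `<s> ... </s>` sentence markers follow the convention I believe the CMU language modeling tools expect, but treat that as an assumption and check the toolkit you use. The function name `build_corpus` and the sample phrases are made up.

```python
def build_corpus(phrases):
    """Normalize phrases and return (corpus_lines, vocabulary).

    Each non-empty phrase is lowercased and wrapped in <s> ... </s> markers.
    The vocabulary is the sorted set of distinct words.
    """
    lines = []
    vocabulary = set()
    for phrase in phrases:
        words = phrase.lower().split()
        if not words:
            continue  # skip blank entries
        vocabulary.update(words)
        lines.append("<s> " + " ".join(words) + " </s>")
    return lines, sorted(vocabulary)


# Example phrases someone might post for in-car use (made up).
phrases = ["Navigate home", "Next track", "Volume up", "volume down", ""]
corpus, vocab = build_corpus(phrases)
print("\n".join(corpus))
print(vocab)
```

Keeping the vocabulary this small is the whole point: the fewer words the model has to choose between, the less room there is for confusion.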
wirelessdreamer is offline   Reply With Quote
Old 09-02-2007, 03:13 PM   #10
Maximum Bitrate
 
Join Date: Aug 2004
Location: at home
Posts: 554
My Photos: (0)
This is mainly due to the hysteresis needed before Sphinx starts to decode. For example, if you make some noise to "open" the active listening, Sphinx will recognize fine after it has failed to analyse that noise.
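
One way to read "hysteresis" here is a gate with two thresholds: it opens only when the input gets loud, then stays open until the input gets clearly quiet again. The Python sketch below shows that behaviour on made-up energy values. It is only an illustration of the general idea, not Sphinx's actual endpointing code, and the function name and threshold values are assumptions.

```python
def hysteresis_gate(energies, open_threshold, close_threshold):
    """Return a list of booleans, True where the gate is open.

    The gate opens when energy reaches open_threshold and closes only
    when energy drops below close_threshold (which must not be higher).
    """
    if close_threshold > open_threshold:
        raise ValueError("close_threshold must not exceed open_threshold")
    is_open = False
    states = []
    for energy in energies:
        if not is_open and energy >= open_threshold:
            is_open = True
        elif is_open and energy < close_threshold:
            is_open = False
        states.append(is_open)
    return states


# Made-up per-frame energy values: quiet, a loud burst, then fading out.
energies = [0.1, 0.4, 0.9, 0.5, 0.4, 0.2, 0.1]
states = hysteresis_gate(energies, open_threshold=0.8, close_threshold=0.3)
print(states)
```

Note that the frames before the loud burst are never passed on, which matches the observation that whatever "opens" the listening is itself poorly recognized.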

Perlbox, which you can find at http://www.perlbox.org, has the advantage of making Sphinx behave as if you were using ptt, without having to bother with buttons and ptt mode. You only declare a "magic" keyword to make Sphinx really start listening, and you should get the same results as with ptt mode.
On the other hand, I don't know if perlbox would run with pocketsphinx.
I don't have enough time on my hands to play with all this, but if you have a bit more time than me, why not give perlbox a try?
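
The "magic" keyword idea can be sketched in a few lines of Python. This is not perlbox's code (I have not checked how they implement it); `KeywordGate` and the sample utterances are hypothetical, and the input is assumed to be plain text already produced by a continuously running recognizer.

```python
class KeywordGate:
    """Ignore recognized text until a wake keyword has been heard."""

    def __init__(self, keyword):
        self.keyword = keyword.lower()  # a single word chosen by the user
        self._armed = False             # True right after the keyword was heard alone

    def handle(self, utterance):
        """Return the command to act on, or None if the utterance is ignored."""
        words = utterance.lower().split()
        if self.keyword in words:
            # Anything spoken after the keyword in the same utterance is the command.
            rest = words[words.index(self.keyword) + 1:]
            if rest:
                self._armed = False
                return " ".join(rest)
            self._armed = True  # keyword alone: accept the next utterance
            return None
        if self._armed and words:
            self._armed = False
            return " ".join(words)
        return None


# Recognized text as it might come out of a continuous decoder (made-up data).
gate = KeywordGate("computer")
heard = ["turn it up", "computer", "next track", "hello there", "computer volume down"]
commands = []
for utterance in heard:
    command = gate.handle(utterance)
    if command is not None:
        commands.append(command)
print(commands)
```

Only the utterances that follow the keyword are acted on; ordinary conversation in between is dropped. The recognizer still has to pick out the keyword itself, so this does not remove the noise problem, it only limits what gets executed.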
__________________
R-Kraft
kraft is offline   Reply With Quote
Old 09-03-2007, 10:50 AM   #11
Low Bitrate
 
Join Date: Sep 2006
Posts: 63
My Photos: (2)
perlbox-voice is based on Sphinx 2.x, which is inferior to pocketsphinx and Sphinx3. Once a system is in place, changing from button push to constant recognition is just a few lines of code. It's not as if pocketsphinx is unable to do a keyword system like perlbox did. I'll take a look at their code again soon and see how they did it.
wirelessdreamer is offline   Reply With Quote
Old 09-04-2007, 07:56 PM   #12
Constant Bitrate
 
Join Date: Jun 2006
Location: Chicago, IL
Vehicle: 2000 VW Jetta VR6
Posts: 144
My Photos: (3)
Hmmm, there is one little thing: what do we do if we have music playing and we need to "start up" the recognition? I think that's why car manufacturers opted for a button.
__________________
EPIA TC 1G 256MB 60GB Linux,WindowMaker, Roadnav, Xine, XMMS, iGuidance3
Lilliput 8", Pharos i360, WUSB11v2.6 WiFi
dupa2 is offline   Reply With Quote


All times are GMT -5. The time now is 02:20 AM.

