Thread: I have speech recognition working with 1500 CDs

  1. #1
    Newbie kykeon
    Join Date: Jan 2005
    Location: kansas city area, missouri
    Posts: 20

    I have speech recognition working with 1500 CDs

    Hello, I'm Jeff. I have voice recognition working with 1500 CDs. I'm using MS SAPI 5.1, C++, Perl, and my own XML data structure. I can call up all but 3-4 CDs out of 1500. I wrote a Perl program to make my artist and album names pronounceable. It corrected the pronunciation of 90% of the remaining names automatically. I had to edit less than 1% of the pronunciations by hand, e.g. I changed "ac-dc" to "a c d c". After the artist is recognized, you can speak the album name or "open album number <nn>", e.g. "open album number two". I also figured out a very easy solution to make the remaining 1% pronounceable.

    I can call up any CD with better than 99% accuracy in a quiet room with a hands-free Crown microphone. I can also command my jukebox software with better than 90% accuracy while standing 20 feet away with the Crown microphone. My software also loads fast. When it starts up, it loads data for about 15,000 songs from a 3.5 MB data file in less than 2 seconds. My software should easily be able to work with 100,000 songs or 10,000 CDs.

    I'm using a professional omni-directional, electret condenser, pressure zone microphone by Crown connected to a mini-mixer with 70dB of gain, balanced inputs, and 48V phantom power.

    In a noisy car I can get better than 90% accuracy using a good headset mic. With the Crown, accuracy is probably 70% in a noisy car.

    I'm interested in trading my knowledge with some people here. Is anyone here as far as I am in their speech recognition software development? I've spent about a year on and off developing my jukebox system.
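
    The hand-edited pronunciations mentioned above (such as "ac-dc") could be kept in a small override table that is consulted before any automatic cleanup. A minimal Perl sketch, assuming a hash keyed by the lowercased name; %spoken_override and spoken_name are hypothetical names for illustration and are not taken from the jukebox software:

```perl
use strict;
use warnings;

# Hand-maintained exceptions (hypothetical table):
# lowercased tag text => form the recognizer should listen for.
my %spoken_override = (
    'ac-dc' => 'a c d c',
);

# Return the spoken form of $name. An override entry wins; otherwise the
# optional $auto code ref (the automatic cleanup) is applied; failing that,
# the name is returned unchanged.
sub spoken_name {
    my ($name, $auto) = @_;
    my $key = lc $name;
    return $spoken_override{$key} if exists $spoken_override{$key};
    return $auto ? $auto->($name) : $name;
}

print "spoken: ", spoken_name('AC-DC'), "\n";
print "spoken: ", spoken_name('Zero 7'), "\n";
```

    Checking the table first means a hand-made fix is never undone by a later change to the automatic rules.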

  2. #2
    Rub One Out Motoko
    Join Date: Apr 2004
    Location: Hertfordshire, England, Earth, Solar System
    Posts: 919
    You want to talk to 01331; he's coded a voice program that can be used with many front ends.
    CarPC status: iPod, 3,456,217 songs so **** you

  3. #3
    Clover Grayscale
    Join Date: May 2004
    Location: Arkansas
    Posts: 1,549
    0l33l
    CarPC install is starting to come along again...

  4. #4
    Raw Wave god_of_cpu
    Join Date: Jan 2004
    Location: SilverSpring Maryland
    Posts: 2,960
    I also wrote a voice interface, built with the MS voice SDK, to call up songs by artist and song name a couple of years ago, but I have yet to run it in the car with a recognition level I can live with. It's incredible for use in a media center PC, and I've had a similar experience to yours, getting well over 90% recognition in the house when it's quiet using a cheap *** webcam mic from across the room, but in the car it just sucks.

    I used an array microphone in the car, but when driving with road and engine noise, I can maybe get to 70% recognition, and only if I yell real loud. It would be great if you could actually get good recognition without a headset, but the other problem I have is that I can't remember the exact name of most of my songs half the time. I added a feature to play a random song by a specified artist if you just say the artist's name, plus a visual confirmation of what is being recognized, but even with that I still find I rarely want to use the voice interface for any reason other than to impress my friends. I find it easier to pick a song from a list after gesturing in the first few characters than to try to remember the exact pronunciation of everything, particularly when recognition is going to fail often. That, and I look like a tool when I'm at a stop light yelling song names at my car.

    I would still like to hear more about how you programmatically corrected pronunciation. The only thing I did was convert numbers and remove some punctuation, but I found this left a few artists and songs that were pronounced differently than I thought they should be. However, the real issue doesn't seem to be one that can be solved by software; it seems like microphone and noise cancellation technology just isn't at a level good enough to get clean sounds into the system.
    StreetDeck.com Developer (I am Chuck)
    Get StreetDeck at http://www.streetdeck.com
    The Official StreetDeck Forums have moved, please visit us at http://www.streetdeck.com/forum for official support for Streetdeck.

  5. #5
    FLAC IntellaWorks
    Join Date: Jun 2004
    Location: NH
    Posts: 1,173

    I'm working on voice recognition too. I'd be interested in how you were able to grab and edit all the artist and song names.

    I have a system designed for use in vehicles that compares what the system heard with the actual data (it can also compare fragments of sounds). Comparing fragments is interesting and effective because, in a car, the microphone will typically never get the full sound of what you said. This is due to the noisy environment. Noise can occur at unknown times, and it is almost impossible to rule out completely.

    For example, say you are trying to play the song "Incubus - Megalomaniac", and when you are 3/4 of the way through, a semi honks its horn and your microphone picks up that noise. Typically SAPI 5.1 would hear that noise and bounce back a false recognition, because the speech pattern of "Incubus - Megalon*HONK*" doesn't match the actual pattern of "Incubus - Megalomaniac".

    My software hooks into the SAPI 5.1 Hypothesis event and monitors what SAPI is hearing. It remembers what it heard and compares song names with song names and artists with artists; it looks for wildcards as well. If something it heard matches by 75% or more, it keeps looking for a better match, and if a better match is found, it continues to look for an even better one. If no better match is found, the 75% match is chosen.
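
    The comparison step described above can be sketched in Perl. This is only a sketch under stated assumptions, not the actual comparison code: it scores each candidate with a normalized Levenshtein edit distance (the post does not say which measure is used), treats 0.75 as the cutoff, and works on text that has already been captured from the recognizer. Hooking the SAPI 5.1 Hypothesis event is not shown, and all function names are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Levenshtein edit distance between two strings (two-row dynamic programming).
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Similarity in the range 0..1 (1 means identical), ignoring case.
sub similarity {
    my ($heard, $name) = map { lc $_ } @_;
    my $longest = max(length($heard), length($name));
    return 1 if $longest == 0;
    return 1 - edit_distance($heard, $name) / $longest;
}

# Return the entry of @$names most similar to $heard, or undef when nothing
# reaches the threshold (0.75 is an assumption mirroring the 75% figure).
sub best_match {
    my ($heard, $names, $threshold) = @_;
    $threshold = 0.75 unless defined $threshold;
    my ($best, $best_score);
    for my $name (@$names) {
        my $score = similarity($heard, $name);
        next if $score < $threshold;
        ($best, $best_score) = ($name, $score)
            if !defined $best_score || $score > $best_score;
    }
    return $best;
}

# A hypothesis that was cut short by noise before the title was finished.
my @songs = ('Incubus Megalomaniac', 'Incubus Drive', 'Metallica One');
my $hit = best_match('incubus megaloman', \@songs);
print defined $hit ? "matched: $hit\n" : "no match\n";
```

    Scanning the whole list and keeping the highest score above the cutoff is one simple way to get the "keep looking for a better match" behaviour; matching artists and song names separately would just mean calling best_match once per list.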

    This system has dramatically improved voice recognition. I believe that, combined with noise-reducing microphones, it could be a great start.

    However, my problem is with the song names and artists. Every music file has a different format, and it's a daunting task to write code to format each .mp3, but from this thread it looks like this has been done for us.

    This intrigues me, and I'm hoping you can put that code up here so we can use it with some of our systems.

    I would be willing to throw my comparison code up, I will try and do this later tonight when I'm on the machine that actually has the code.
    Progress [I will seriously never be done!]
    Via EPIA MII
    512MB RAM
    OEM GPS (embedded)
    nLite WinXP pro on
    1GB Extreme III CF card
    Carnetix 1260 startup/ DC-DC regulator
    Software: Still, re-Writing my existing front end in .Net

  6. #6
    Newbie kykeon
    Join Date: Jan 2005
    Location: kansas city area, missouri
    Posts: 20

    Perl code for making pronounceable names

    Quote Originally Posted by god_of_cpu
    I would still like to hear more about how you programmatically corrected pronunciation. The only thing I did was convert numbers and remove some punctuation, but I found this left a few artists and songs that were pronounced differently than I thought they should be. However, the real issue doesn't seem to be one that can be solved by software; it seems like microphone and noise cancellation technology just isn't at a level good enough to get clean sounds into the system.
    Here's my Perl code. You'll need to know Perl to use it. All the code below has been tested.
    use strict;
    use warnings;

    #------------------------------------------------------------------------------
    # try to make the string $s pronounceable by the computer
    # - usage $s = makePronounceableName($s);
    sub makePronounceableName
    {
    my $key;
    my $val;

    my $s = $_[0];

    #------------------------------------------------------------------------------
    # delete matching parentheses, brackets and braces, and everything inside them

    $s =~ s/\(.*?\)//g;
    $s =~ s/\[.*?\]//g;
    $s =~ s/\{.*?\}//g;

    #------------------------------------------------------------------------------
    # replace ampersands

    $s =~ s/\&/and/g;

    #------------------------------------------------------------------------------
    # replace abbreviations

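    # (case-insensitive, every occurrence; \b keeps e.g. "Piano." and "Pop." intact)
    $s =~ s/\bvol\.\s*/volume /gi;
    $s =~ s/\bvol\b/volume/gi;
    $s =~ s/\bno\.\s*/number /gi;
    $s =~ s/\bop\.\s*/opus /gi;
    $s =~ s/\bMr\.\s*/Mister /gi;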

    #------------------------------------------------------------------------------
    # replace some roman numerals

    # (case-insensitive; the numeral must be followed by a space or the end of the string)
    $s =~ s/ iv(?= |$)/ four/gi;
    $s =~ s/ iii(?= |$)/ three/gi;
    $s =~ s/ ii(?= |$)/ two/gi;
    $s =~ s/\b(volume) i(?= |$)/$1 one/gi;

    #------------------------------------------------------------------------------
    # replace non-alphanumerics with few exceptions

    $s =~ s/[^0-9a-zA-Z',]/ /g;

    #------------------------------------------------------------------------------
    # replace two or more adjacent whitespace chars with a single space

    $s =~ s/\s{2,}/ /g;

    #------------------------------------------------------------------------------
    # spell numbers with whitespace on either side or ^ or $
    # - first check for whitespace

    my %number_list = ( "0", "zero", "1", "one", "2", "two", "3", "three",
    "4", "four", "5", "five", "6", "six", "7", "seven",
    "8", "eight", "9", "nine", "10", "ten"
    );

    foreach $key (keys (%number_list))
    {
    $val = $number_list{$key};
    # replace every standalone occurrence (whitespace or string boundary on both sides)
    $s =~ s/(?<!\S)$key(?!\S)/$val/g;
    }
    #------------------------------------------------------------------------------

    return $s;
    }


    ########################################################################
    # TEST CODE
    ########################################################################

    #------------------------------------------------------------------------------
    # - "3 Doors Down" becomes "three Doors Down" (spell out numbers)
    # - single-digit replacement is fine for now
    # - "Beethoven (1770-1827)" becomes "Beethoven" (remove all parentheses and their contents)
    # - "Absolutely_ Best Of ABC" becomes "Absolutely Best Of ABC" (replace non-alphanumeric characters with spaces)
    # - "Private Times...And The Whole 9!" becomes "Private Times And The Whole nine"
    # - spell out abbreviations: "vol." -> "volume"
    # - replace "&" with "and"

    sub test_makePronounceableName
    {
    my @testlist = ("Zero 7",
    "3 Doors Down",
    "50 cent & two pennies",
    "_compilations",
    "Absolutely_ Best Of ABC",
    "Beethoven (1770-1827)",
    "Private Times...And The Whole 9!",
    "hello 5 hello",
    "van halen ii",
    "van halen ii ",
    "hello (uu) [256] )}] tt{x} 4 44 5x x5 x5x (xx) (yy) ({["
    );

    # try to use "./testnames.txt"
    #my $testlist = `cat ./testnames.txt`;
    #@testlist = split /\n/, $testlist;

    my $name;
    my $pronounceable_name;

    foreach $name (@testlist)
    {
    $pronounceable_name = makePronounceableName($name);
    print "'$name' => '$pronounceable_name'\n";
    }

    }

    # run the tests when this file is executed directly
    test_makePronounceableName();

  7. #7
    Newbie kykeon
    Join Date: Jan 2005
    Location: kansas city area, missouri
    Posts: 20

    Reading song data

    Quote Originally Posted by IntellaWorks
    I'm working on voice recognition too. I'd be interested in how you were able to grab and edit all the artist and song names.

    I have a system designed for use in vehicles that compares what the system heard with the actual data (it can also compare fragments of sounds). Comparing fragments is interesting and effective because, in a car, the microphone will typically never get the full sound of what you said. This is due to the noisy environment. Noise can occur at unknown times, and it is almost impossible to rule out completely.

    For example, say you are trying to play the song "Incubus - Megalomaniac", and when you are 3/4 of the way through, a semi honks its horn and your microphone picks up that noise. Typically SAPI 5.1 would hear that noise and bounce back a false recognition, because the speech pattern of "Incubus - Megalon*HONK*" doesn't match the actual pattern of "Incubus - Megalomaniac".

    My software hooks into the SAPI 5.1 Hypothesis event and monitors what SAPI is hearing. It remembers what it heard and compares song names with song names and artists with artists; it looks for wildcards as well. If something it heard matches by 75% or more, it keeps looking for a better match, and if a better match is found, it continues to look for an even better one. If no better match is found, the 75% match is chosen.
    I like this solution. Thanks. I'm going to use this solution in my system.

    Quote Originally Posted by IntellaWorks
    This system has dramatically improved voice recognition. I believe that, combined with noise-reducing microphones, it could be a great start.

    However, my problem is with the song names and artists. Every music file has a different format, and it's a daunting task to write code to format each .mp3, but from this thread it looks like this has been done for us.

    This intrigues me, and I'm hoping you can put that code up here so we can use it with some of our systems.

    I would be willing to throw my comparison code up, I will try and do this later tonight when I'm on the machine that actually has the code.
    Here is my solution:

    1. Make sure your ID3 tags are correct. If they are not, re-rip the CD to get the proper CDDB information.
    2. Use Perl to read all the ID3 tag info.
    3. Write all your ID3 info to a data file.
    4. Have your voice software read this data file at startup.
       - My C++ software reads in data for 15,000 songs in less than 2 seconds.
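
    Steps 2 and 3 above can be sketched in Perl with core modules only. This sketch assumes the files carry ID3v1 tags (the fixed 128-byte block at the end of the file that starts with "TAG"); ID3v2 tags, which sit at the start of the file, are not handled. The tab-separated output and the function names are assumptions for illustration, not the XML format mentioned in the first post.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Parse a 128-byte ID3v1 block. Returns a hash ref, or nothing if the block
# is not an ID3v1 tag. Layout: "TAG", title[30], artist[30], album[30], year[4], ...
sub parse_id3v1 {
    my ($block) = @_;
    return unless defined $block && length($block) == 128;
    my ($magic, $title, $artist, $album, $year) = unpack 'A3 Z30 Z30 Z30 A4', $block;
    return unless $magic eq 'TAG';
    s/\s+$// for $title, $artist, $album;    # fields may be space-padded
    return { title => $title, artist => $artist, album => $album, year => $year };
}

# Read the last 128 bytes of a file and parse them as an ID3v1 tag.
sub read_id3v1 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return;
    my $block = '';
    if (((-s $fh) || 0) >= 128 && seek($fh, -128, SEEK_END)) {
        read($fh, $block, 128);
    }
    close $fh;
    return parse_id3v1($block);
}

# Write one "artist<TAB>album<TAB>title<TAB>path" line per tagged file and
# return the number of lines written. Untagged files are skipped.
sub write_song_data {
    my ($out_path, @mp3_paths) = @_;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $count = 0;
    for my $path (@mp3_paths) {
        my $tag = read_id3v1($path) or next;
        print {$out} join("\t", @$tag{qw(artist album title)}, $path), "\n";
        $count++;
    }
    close $out or die "Cannot close $out_path: $!";
    return $count;
}

# Example (script name is arbitrary): perl dump_tags.pl songs.txt *.mp3
if (@ARGV >= 2) {
    my ($out_path, @files) = @ARGV;
    my $n = write_song_data($out_path, @files);
    print "wrote $n songs to $out_path\n";
}
```

    A flat one-line-per-song file like this is cheap to parse at startup, which fits step 4; the pronounceable form of each name could be added as an extra column.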

  8. #8
    My Village Called 0l33l
    Join Date: Jul 2004
    Location: Berkeley, CA
    Posts: 10,516
    Hi. This is one of the features that I was going to implement in NaviVoice. I just need some support from Frodo to make a program that dumps all the artist/album names into a text file, and to make a feature for sending search parameters to Frodoplayer.

  9. #9
    Raw Wave rando
    Join Date: Mar 2004
    Location: Redondo Beach, CA
    Posts: 1,973
    Could you read the album/artist info directly from FP's JET DB? Also, could you fake the search interface by sending keystrokes?

  10. #10
    Clover Grayscale
    Join Date: May 2004
    Location: Arkansas
    Posts: 1,549
    Quote Originally Posted by rando
    Could you read the album/artist info directly from FP's JET DB? Also, could you fake the search interface by sending keystrokes?
    He could definitely access the DB, provided Frodoplayer doesn't lock it or something like that.

    Doesn't fpwebserver allow you to send searches to it?
