Arduino: Recognizing pre-recorded sound clips from an audio stream


My project is to control a Nintendo 3DS handheld video game system from my Arduino Mega.
I have all the basic controls wired up and working well.

I need to be able to recognize the state of a video game being played on it by its known audio cues.

Specifically, the game is the Super Smash Bros. fighting game. At the end of each match there is a winner, and the winner is announced with a known sound clip.

For example, when Mario wins, a known sound clip plays with the announcer saying "The winner is Mario!".

To my advantage, the "match winner" sound clips are all known and will be EXACTLY the same each time they are played.

I need to recognize this specific sound clip and then have the Arduino press the "Start" button on the Nintendo 3DS to move the game to the next screen.

I have done numerous searches on FFT, FHT, matched filters, speech recognition, etc., and have not come up with a solid way to do this.
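Because the clips are identical every time (apart from volume), the matched-filter idea reduces to sliding a stored copy of the clip along the incoming audio and scoring how well each window lines up. Below is a minimal sketch of normalized cross-correlation, one common form of that idea, written as plain C++ so the matching logic can be tried on a PC first. The names `normalizedCorrelation` and `findClip` and the threshold value are hypothetical, not from any library. On the Mega itself, the raw samples of a multi-second clip will not fit in its 8 KB of SRAM, so in practice this would likely have to run on a heavily reduced representation of the audio rather than raw samples.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation between one window of the stream and the
// stored template (both of length n). Returns a score in [-1, 1]; a value
// near 1 means the window has the same shape as the template, regardless of
// its volume (gain) or DC offset.
double normalizedCorrelation(const double* window, const double* tmpl, std::size_t n) {
    double meanW = 0.0, meanT = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanW += window[i]; meanT += tmpl[i]; }
    meanW /= n;
    meanT /= n;

    double num = 0.0, energyW = 0.0, energyT = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double w = window[i] - meanW;
        const double t = tmpl[i] - meanT;
        num += w * t;
        energyW += w * w;
        energyT += t * t;
    }
    if (energyW == 0.0 || energyT == 0.0) return 0.0;  // silent/flat window
    return num / std::sqrt(energyW * energyT);
}

// Slide the template over the stream; return the offset of the best-scoring
// window, or -1 if even the best score is below `threshold`.
long findClip(const std::vector<double>& stream,
              const std::vector<double>& tmpl, double threshold) {
    const std::size_t n = tmpl.size();
    if (n == 0 || stream.size() < n) return -1;
    long bestOffset = -1;
    double bestScore = -2.0;  // below the minimum possible score of -1
    for (std::size_t off = 0; off + n <= stream.size(); ++off) {
        const double score = normalizedCorrelation(stream.data() + off, tmpl.data(), n);
        if (score > bestScore) {
            bestScore = score;
            bestOffset = static_cast<long>(off);
        }
    }
    return bestScore >= threshold ? bestOffset : -1;
}
```

Dividing by the energy of both the window and the template is what makes the score independent of the 3DS volume setting; a plain dot product would grow with loudness and make a fixed threshold unreliable.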

I don't think speech recognition techniques will work well, since the audio is not a regular human voice but a video game announcer's "cartoon" voice with lots of background sound effects.

One theoretical approach I have is to somehow "train" the Arduino with pre-recorded sound clips and have it react when it hears them, similar to what the EasyVR Shield 3.0 does.
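One way to approximate that "training" step without a dedicated recognition chip is to record each winner clip once, reduce it to a coarse loudness envelope, and keep that envelope as the template. This is a swapped-in technique (simple envelope matching), not how the EasyVR works internally. The sketch assumes 16-bit signed samples and a hypothetical `frameSize`; `makeEnvelope` and `envelopeMatches` are illustrative names. At one byte per frame, an envelope is small enough that several templates could plausibly be kept on a Mega (for example in flash via `PROGMEM`). An envelope discards pitch, so it is only a rough fingerprint, but that may be enough here because any winner announcement should lead to the same Start press.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// "Training": reduce a clip to one byte per frame of `frameSize` samples.
// Each byte is the frame's summed absolute amplitude, scaled so the loudest
// frame maps to 255 (this removes overall volume differences).
std::vector<uint8_t> makeEnvelope(const std::vector<int16_t>& samples,
                                  std::size_t frameSize) {
    if (frameSize == 0) return {};
    std::vector<int64_t> sums;
    int64_t peak = 0;
    for (std::size_t start = 0; start + frameSize <= samples.size(); start += frameSize) {
        int64_t sum = 0;
        for (std::size_t i = start; i < start + frameSize; ++i)
            sum += std::abs(static_cast<int32_t>(samples[i]));
        sums.push_back(sum);
        if (sum > peak) peak = sum;
    }
    std::vector<uint8_t> env;
    env.reserve(sums.size());
    for (int64_t s : sums)
        env.push_back(peak == 0 ? 0 : static_cast<uint8_t>(s * 255 / peak));
    return env;
}

// Matching: average absolute per-frame difference between the live envelope
// (built the same way from the most recent audio) and the trained one.
bool envelopeMatches(const std::vector<uint8_t>& live,
                     const std::vector<uint8_t>& trained, int tolerance) {
    if (trained.empty() || live.size() != trained.size()) return false;
    long total = 0;
    for (std::size_t i = 0; i < trained.size(); ++i)
        total += std::abs(static_cast<int>(live[i]) - static_cast<int>(trained[i]));
    return total / static_cast<long>(trained.size()) <= tolerance;
}
```

On the board, the live envelope would be rebuilt from the most recent `trained.size() * frameSize` samples (for instance `analogRead` values re-centred around the ADC midpoint); that glue code, and the choice of `tolerance`, are assumptions left to experimentation.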

Any thoughts on how to do this?

Video of the sound clips: https://www.youtube.com/watch?v=VzC6psWEGM8

I am a software engineer and I am very comfortable around coding! Any help is appreciated.

Best Answer

The device in question appears to be based on a COB (chip-on-board) version of Sensory Inc.'s neural-network voice recognition chips.

That's mounted on a module and "shield" (barf) made by VeeaR, apparently an Italian company.

There are open-source voice recognition programs (maybe this one, for example), so this task might be better suited to a reasonably capable processor running Linux, perhaps a BeagleBone Black or Raspberry Pi.