I'm planning on writing a program for Linux that uses text to speech and speech recognition. What are the best tools/libraries for this? Should I use Windows instead to be able to use better tools? The tools need to be easily callable from a console or C program.
Linux – Need text to speech and speech recognition tools for Linux
linuxspeech-recognitiontext-to-speech
Related Solutions
The simplest and most widely available method to get user input at a shell prompt is the read
command. The best way to illustrate its use is a simple demonstration:
while true; do
read -p "Do you wish to install this program?" yn
case $yn in
[Yy]* ) make install; break;;
[Nn]* ) exit;;
* ) echo "Please answer yes or no.";;
esac
done
Another method, pointed out by Steven Huwig, is Bash's select
command. Here is the same example using select
:
echo "Do you wish to install this program?"
select yn in "Yes" "No"; do
case $yn in
Yes ) make install; break;;
No ) exit;;
esac
done
With select
you don't need to sanitize the input – it displays the available choices, and you type a number corresponding to your choice. It also loops automatically, so there's no need for a while true
loop to retry if they give invalid input.
Also, Léa Gris demonstrated a way to make the request language agnostic in her answer. Adapting my first example to better serve multiple languages might look like this:
set -- $(locale LC_MESSAGES)
yesptrn="$1"; noptrn="$2"; yesword="$3"; noword="$4"
while true; do
read -p "Install (${yesword} / ${noword})? " yn
if [[ "$yn" =~ $yesexpr ]]; then make install; exit; fi
if [[ "$yn" =~ $noexpr ]]; then exit; fi
echo "Answer ${yesword} / ${noword}."
done
Obviously other communication strings remain untranslated here (Install, Answer) which would need to be addressed in a more fully completed translation, but even a partial translation would be helpful in many cases.
Finally, please check out the excellent answer by F. Hauri.
A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:
public partial class Form1 : Form
{
SpeechRecognizer rec = new SpeechRecognizer();
public Form1()
{
InitializeComponent();
rec.SpeechRecognized += rec_SpeechRecognized;
}
void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
lblLetter.Text = e.Result.Text;
}
void Form1_Load(object sender, EventArgs e)
{
var c = new Choices();
for (var i = 0; i <= 100; i++)
c.Add(i.ToString());
var gb = new GrammarBuilder(c);
var g = new Grammar(gb);
rec.LoadGrammar(g);
rec.Enabled = true;
}
This recognizes the numbers from 1 to 100, and displays the resulting number on the form. You'll need a form with a label called lblLetter on it.
System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)
It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).
As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(
As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
Best Answer
For speech recognition there are the various Sphinxes. The different variants have different pros and cons, there is a comparison here Comparison of Sphinx versions. Sphinx 4 is Java, but the others are C, I believe.