maemo.org - Talk - [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

Page 1 of 3



mc_teo

2011-04-27 15:11

[devel] PocketSphinx for Fremantle (Speech Recognition Engine)

Hello there, just a quick post on how I got PocketSphinx working on my N900, along with a basic Python application to test your setup. I take no credit for anything in this thread except the time spent putting it all together.

I downloaded all the .debs from http://repository.maemo.org/extras-d.../pocketsphinx/ into a new directory, removing any i386-specific .debs.
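Debian package files are normally named "name_version_architecture.deb", so the i386-specific ones can be spotted from the last underscore-separated field. Here is a minimal sketch of that filtering step. It assumes the repository follows this convention. "select_debs" is a hypothetical helper, and the filenames are made up for illustration:

```python
import os

def select_debs(filenames, exclude_arch="i386"):
    """Return the .deb files whose architecture field is not exclude_arch.

    Assumes Debian's name_version_architecture.deb naming convention.
    """
    keep = []
    for name in filenames:
        base = os.path.basename(name)
        if not base.endswith(".deb"):
            continue  # ignore anything that is not a package file
        stem = base[:-len(".deb")]
        arch = stem.rsplit("_", 1)[-1]  # last underscore-separated field
        if arch != exclude_arch:
            keep.append(name)
    return keep

if __name__ == "__main__":
    # Illustrative filenames only, not the real repository contents.
    files = ["libexample_1.0-1_armel.deb",
             "libexample_1.0-1_i386.deb",
             "libexample-doc_1.0-1_all.deb"]
    print(select_debs(files))
```

The remaining files are the ones you would then pass to "dpkg -i" as root.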

As root, I ran "dpkg -i *". The packages tried to install but were stopped by unmet dependencies (for me it was just python2.5-dbg).

Once that dependency was sorted out, the install ran successfully and PocketSphinx was installed.

To try it out and make sure everything installed correctly, run "pocketsphinx_continuous" and wait for everything to load. When prompted with "Ready...", say something clearly in the phone's direction (I used "Hello"). After another load of text there should be a line like "000000001: hello (-12345676)".
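If you want to pick the recognized text out of the pocketsphinx_continuous output, for instance from a script, a sketch like the one below could work. It assumes every result line has the shape shown above, "<utterance id>: <words> (<score>)". Other PocketSphinx versions may print differently, and "parse_result" is a hypothetical name.

```python
import re

# Assumed format, taken from the sample line: "<utterance id>: <words> (<score>)"
RESULT_LINE = re.compile(r"^(\d+):\s*(.*?)\s*\((-?\d+)\)\s*$")

def parse_result(line):
    """Return (utterance_id, hypothesis, score), or None if the line doesn't match."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    utt_id, hyp, score = match.groups()
    return utt_id, hyp, int(score)

if __name__ == "__main__":
    print(parse_result("000000001: hello (-12345676)"))
    print(parse_result("Ready...."))  # not a result line
```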

To get the gstreamer hooks working, I had to install the package "gstreamer-tools".

After this I ran the script from the CMUSphinx example, tweaked to work with PulseAudio: http://pastebin.com/zCYzX65Z
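For anyone curious what the GStreamer side looks like, the CMUSphinx GStreamer example chains an audio source through audioconvert and audioresample into the "vader" (voice activity detection) and "pocketsphinx" elements. The sketch below only builds that pipeline description as a string. It assumes the PulseAudio tweak amounts to using "pulsesrc" as the source, which may not match the pastebin script exactly.

```python
def build_pipeline(source="pulsesrc"):
    """Return a GStreamer pipeline description for speech input.

    The default source assumes PulseAudio capture (an assumption, see above).
    """
    elements = [
        source,                                # audio capture element
        "audioconvert",                        # convert the sample format
        "audioresample",                       # resample to what the decoder expects
        "vader name=vad auto-threshold=true",  # voice activity detection
        "pocketsphinx name=asr",               # the recognizer element
        "fakesink",                            # discard the audio afterwards
    ]
    return " ! ".join(elements)

if __name__ == "__main__":
    print(build_pipeline())
```

In the old pygst 0.10 bindings, a string like this is what gets handed to gst.parse_launch(), as the CMUSphinx example does.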

Press the "Speak" button, say a few words, and the text box will update to show what you said.

N.B. It uses the en_US acoustic model by default, so I had a good few mistakes at first, which I attribute to my Irish accent.

This is another little sample that uses the JSGF grammar specification and tries to interpret speech from a locally saved .wav file. (The recording needs to be 8 kHz mono.)

==File grammar.jsgf==

```
#JSGF V1.0;

grammar goforward;

public <move> = go <direction> <distance> [meter | meters];

<direction> = forward | backward;

<distance> = (one | two | three | four | five | six | seven | eight | nine | ten | twenty)+;
```
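To see which phrases this grammar accepts, here is a hand translation of <move> into a Python regular expression: "( ... )+" means one or more, and "[ ... ]" means optional. This is only an illustration for checking test phrases offline. PocketSphinx reads the .jsgf file itself and does not use anything like it.

```python
import re

# Hand-translated equivalent of the goforward grammar's public <move> rule.
DISTANCE_WORDS = "one|two|three|four|five|six|seven|eight|nine|ten|twenty"
MOVE = re.compile(
    r"^go (forward|backward)((?: (?:%s))+)(?: (meter|meters))?$" % DISTANCE_WORDS
)

def accepts(sentence):
    """Return True if the sentence (case and spacing normalized) matches <move>."""
    normalized = " ".join(sentence.lower().split())
    return MOVE.match(normalized) is not None

if __name__ == "__main__":
    for phrase in ("go forward ten meters", "go backward twenty one", "go forward"):
        print(phrase, accepts(phrase))
```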

==File speechtest.py== (with myrecording.wav as the recording to interpret)

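```python
#!/usr/bin/python
# Decode a locally saved recording against the JSGF grammar above
# (old pocketsphinx Python bindings, Python 2 syntax).
import pocketsphinx as ps

# Point the decoder at the grammar and tell it the audio is 8 kHz.
decoder = ps.Decoder(jsgf='/path/to/your/jsgf/grammar.jsgf', samprate='8000')

fh = open("myrecording.wav", "rb")
# decode_raw() reads the file as raw 16-bit PCM and returns the sample count.
nsamp = decoder.decode_raw(fh)
fh.close()

# Best hypothesis, utterance ID and score.
hyp, uttid, score = decoder.get_hyp()

print "Got result %s %d" % (hyp, score)
```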
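Since the sample above expects an 8 kHz mono recording, it can save some head-scratching to check the file before decoding. Here is a minimal sketch using Python's standard wave module. The 16-bit sample width is an assumption here, since it is what PocketSphinx normally expects, and check_wav is a hypothetical helper:

```python
import wave

def check_wav(path, rate=8000, channels=1, sample_width=2):
    """Return (rate, channels, bytes per sample), or raise ValueError on a mismatch.

    The defaults assume 8 kHz, mono, 16-bit audio.
    """
    w = wave.open(path, "rb")
    try:
        actual = (w.getframerate(), w.getnchannels(), w.getsampwidth())
    finally:
        w.close()
    expected = (rate, channels, sample_width)
    if actual != expected:
        raise ValueError("%s is %r (rate, channels, bytes/sample), expected %r"
                         % (path, actual, expected))
    return actual
```

For example, check_wav("myrecording.wav") returns (8000, 1, 2) for a suitable file and raises ValueError otherwise.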

Boemien

2011-04-27 15:26

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

Yeah it seems interesting but noobs, like me of course, need some screenshots. Thanks in advance!!! :D

joerg_rw

2011-04-27 15:44

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

many thanks for this kickoff. I think this can be the start of a nice project to bring a missing feature to N900.

/j

skykooler

2011-04-27 15:48

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

So...would it be possible to use this for voice dialing via a dbus call?

niloy

2011-04-27 16:28

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

great, now if only someone could integrate it with the text editor of the phone

leojab

2011-04-27 16:50

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

This is just great news and thanks mc_teo...
Now that joerg_rw is interested in this project, there will be even greater news soon :-)

cfh11

2011-04-27 18:20

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

Awesome! Now if this becomes feature complete and incorporated into the CSSU that would be a dream come true...

joerg_rw

2011-04-28 17:36

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

voice call via dbus: should be rather simple, as long as you start the speech input engine from the headset pushbutton and use a small, pretrained vocabulary of contact names.
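One way to get that small vocabulary is to generate a JSGF grammar from the contact names, in the same format as the goforward example earlier in the thread. This is a sketch under several assumptions: the "call <name>" phrasing, the grammar name and contacts_grammar are all hypothetical. Every word still needs an entry in the pronunciation dictionary, and the actual dialing over D-Bus is not shown.

```python
def contacts_grammar(names, grammar_name="dial"):
    """Return JSGF text whose public rule is 'call <contact>' (hypothetical phrasing)."""
    # Lower-case, collapse whitespace and drop duplicates/empties.
    cleaned = sorted(set(" ".join(n.lower().split()) for n in names if n.strip()))
    if not cleaned:
        raise ValueError("need at least one contact name")
    lines = [
        "#JSGF V1.0;",
        "grammar %s;" % grammar_name,
        "public <call> = call <contact>;",
        "<contact> = %s;" % " | ".join(cleaned),
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(contacts_grammar(["Alice Smith", "Bob"]))
```

The resulting text could be saved to a file and passed to the decoder with jsgf=..., as in speechtest.py.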

integration with text editor: an ambitious project, as the vocabulary is virtually unlimited

@leojab: I'm planning to come up with a system architecture RFC eventually, so this could integrate into hildon/maemo seamlessly. NB you want it both to a) work with unpatched, possibly even closed-source apps, and b) serve several concurrent apps without multiple instances of pocketsphinx fighting each other.
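Point b) boils down to running one recognizer and routing its results, rather than letting every app start its own PocketSphinx. Below is a purely hypothetical in-process sketch of that routing. SpeechHub and its methods are made-up names, and a real system would presumably use D-Bus and take focus information from hildon-desktop. Point a) is not addressed here.

```python
class SpeechHub(object):
    """Hypothetical single front end that routes recognized text to one app.

    Only the app that currently has focus receives results, so several apps
    can share one recognizer instance.
    """

    def __init__(self):
        self._handlers = {}   # app name -> callable taking the hypothesis text
        self._focused = None  # name of the app that should receive results

    def register(self, app, handler):
        self._handlers[app] = handler

    def unregister(self, app):
        self._handlers.pop(app, None)
        if self._focused == app:
            self._focused = None

    def focus(self, app):
        if app not in self._handlers:
            raise KeyError(app)
        self._focused = app

    def deliver(self, hypothesis):
        """Called once per utterance; returns the app that received it, or None."""
        if self._focused is None:
            return None  # nobody is listening, so the result is dropped
        self._handlers[self._focused](hypothesis)
        return self._focused
```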
@cfh11: regarding my comment one line above, I think we could integrate this in a way that lets us deploy it via Extras, with no need for CSSU. Well, maybe hildon-desktop needs some hooks to cooperate with speech-controlled task switching etc.

/j

mc_teo

2011-05-01 00:15

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)


So, I haven't been working too hard on this, due to school and all, but I have put together this demo of what can be done.

I have attached player.zip. Within this archive you will find three files: "player.py", the main script; "dict.lm", the language model; and "dict.dic", the pronunciation dictionary.
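Before running the demo, it can be useful to confirm that every command word has a pronunciation in dict.dic. The sketch assumes the usual Sphinx dictionary layout, with a word followed by its phones and alternatives written as WORD(2). load_dictionary and missing_words are hypothetical helpers, and the actual contents of the attached files are not reproduced here.

```python
def load_dictionary(path):
    """Return {word: [phone list, ...]} from a Sphinx-style .dic file."""
    entries = {}
    f = open(path)
    try:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            word = parts[0].lower()
            if word.endswith(")") and "(" in word:
                word = word[:word.index("(")]  # fold WORD(2) into WORD
            entries.setdefault(word, []).append(parts[1:])
    finally:
        f.close()
    return entries

def missing_words(entries, words):
    """Return the words that have no pronunciation in the dictionary."""
    return [w for w in words if w.lower() not in entries]
```

In the old Python bindings, the two files would be handed to the decoder along the lines of ps.Decoder(lm='dict.lm', dict='dict.dic'), as in the CMUSphinx Python example.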

So, making sure PocketSphinx is installed as outlined in my first post, run this script.

If the default media player is not open, the script will attempt to open it via a D-Bus command (and complain about a file not found), so opening it beforehand is probably the best solution.

Then start the script and you will be presented with a simple form. Press "Enable", then say play, stop, pause, resume, next or previous to run that command.
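The voice-command part amounts to mapping a recognized word onto an action. This is a hypothetical sketch, not the code in player.py. The actions here are placeholder callables, whereas the real script presumably sends D-Bus commands to the media player.

```python
COMMANDS = ("play", "stop", "pause", "resume", "next", "previous")

def match_command(hypothesis):
    """Return the first known command word in the hypothesis, or None."""
    for word in hypothesis.lower().split():
        if word in COMMANDS:
            return word
    return None

def dispatch(hypothesis, actions):
    """Call actions[command]() for a recognized command; return it, or None."""
    command = match_command(hypothesis)
    if command is not None and command in actions:
        actions[command]()
        return command
    return None
```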

Only English is supported at the moment.

happy speaking

~mc_teo

Flandry

2011-06-16 18:40

Re: [devel] PocketSphinx for Fremantle (Speech Recognition Engine)

Good to see this getting some attention after it was passed over for the GSoC last year *.

A possibly less cumbersome way for the curious to install is fapman: choose "All packages (ADVANCED)" under Category Filters, then search for sphinx. You don't need any of the debug packages or the two Chinese model packages; install all the others. I did notice that the packages aren't optified, which means that with the available acoustic and language models you could use up over 13 MB of root filesystem space. Consider yourself warned. I don't have access to my Linux box to re-upload the packages with optification.
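To see how much root filesystem space the installed packages actually take, you could add up the files dpkg recorded for each package. dpkg keeps one list per package under /var/lib/dpkg/info/<package>.list. The sketch below assumes that layout, and installed_size is a hypothetical helper. Files under /opt are skipped by default, because on the N900 they live on the larger eMMC partition rather than the root filesystem.

```python
import os

def installed_size(list_file, skip_prefixes=("/opt/",)):
    """Sum the sizes in bytes of regular files named in a dpkg .list file.

    Directories, symlinks, missing paths and paths under skip_prefixes
    are not counted.
    """
    total = 0
    f = open(list_file)
    try:
        for line in f:
            path = line.strip()
            if not path or path.startswith(skip_prefixes):
                continue
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    finally:
        f.close()
    return total
```

For example, call installed_size("/var/lib/dpkg/info/<package>.list"), where the package name is whatever fapman lists.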

Worth a giggle if nothing else. With the provided large dictionary and language model the result of talking to your N900 is rather comical.

Edit: Removed command -- the default works fine.

All times are GMT.

