Posts: 7 | Thanked: 17 times | Joined on Jun 2012
#251
Hi taixzo

Originally posted by taixzo:
I hope there is some way to extract the score, if not I may have to dive into the plugin code and try to find a way to extract it.
I am not sure whether I made myself clear: the patched plugin already delivers the score. To stay compatible, it sends an additional message (called 'result_score') right before it sends the 'result' message, so once you have received 'result_score' you can ignore the 'result' message. If you want, I can send you that plugin.

One thing that I hope for is runtime model/dictionary switching,
My app always aims to keep the current dictionary/language model as small as possible (for better recognition accuracy) and switches the dict/lm according to its internal context. For example, imagine a small command set that allows launching a new app. After the voice-control app has recognized the command 'launch', it switches its context to 'I have to launch something' and loads an appropriate new dictionary that contains only the names of the programs that might be launched, and so on.

Perhaps this may also be an approach for your Saera.
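
A minimal sketch of the context switching described above, not the actual code of either app. The context names, the file names and the `apply_model` callback are hypothetical; with the GStreamer plugin the callback might set the element's dictionary and language-model properties, assuming the plugin accepts new values at runtime.

```python
# Sketch only: context names and file paths below are hypothetical.
class ContextSwitcher(object):
    """Pick a small dictionary/language model based on the current context."""

    def __init__(self, models, apply_model, start='commands'):
        # models: context name -> (dict_path, lm_path)
        # apply_model: callback that loads the pair into the recognizer
        self.models = models
        self.apply_model = apply_model
        self.context = None
        self.switch(start)

    def switch(self, context):
        """Load the dict/lm pair for `context` if it is not already active."""
        if context == self.context:
            return
        dict_path, lm_path = self.models[context]
        self.apply_model(dict_path, lm_path)
        self.context = context

    def on_result(self, hyp):
        """Move to a narrower context after a command, else back to the root."""
        words = hyp.lower().split()
        if self.context == 'commands' and words and words[0] in self.models:
            self.switch(words[0])
        else:
            self.switch('commands')


MODELS = {
    'commands': ('commands.dic', 'commands.lm'),  # small root command set
    'launch': ('apps.dic', 'apps.lm'),            # only program names
}

loaded = []  # records which pairs were loaded, in order
switcher = ContextSwitcher(MODELS, lambda d, lm: loaded.append((d, lm)))
switcher.on_result('launch')    # now only program names are expected
switcher.on_result('browser')   # handled in 'launch', then back to the root
```

Each recognition step only ever sees one small model, which is the point of the approach: the recognizer has fewer candidates to confuse.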

Last edited by myra; 2012-06-21 at 17:13.
 

The Following 4 Users Say Thank You to myra For This Useful Post:
Posts: 3,328 | Thanked: 4,476 times | Joined on May 2011 @ Poland
#252
Originally Posted by taixzo View Post
One thing that I hope for is runtime model/dictionary switching, to allow full dictation when sending e.g. a text. Hopefully I can use some gstreamer trickery (like using a filesink at the end of the pipeline), and re-try to understand what was said (with a larger dictionary/model) if the score is low...I hope there is some way to extract the score, if not I may have to dive into the plugin code and try to find a way to extract it.
The only way out for me is some kind of grammar engine. But it'd be hard to code.
 
Posts: 1,523 | Thanked: 1,997 times | Joined on Jul 2011 @ not your mom's FOSS basement
#253
Originally Posted by myra View Post
that german translation is absolutely correct except two minor spelling errors. It should be "Wusstest Du, dass das N900 Mac OSX verwenden kann?" instead of "Wußtest Du, das das N900 Mac OSX verwenden kann?" but this does not influence espeak's pronunciation.
Thanks. And you are somewhat right on the second part ('das das' is wrong, there should be a sharp 's' as the first one, i.e. 'daß das').

But i'm afraid i have to clarify i'm an evangelist of the true (literally old school) German language rules, and therefore i strongly detest the silly, unwanted (by most of the population) spelling reforms forced upon us by the Government in 1996 - so, f.e. no double 's' etc. for me, but 'ß'.

Last edited by don_falcone; 2012-06-21 at 17:17.
 

The Following 3 Users Say Thank You to don_falcone For This Useful Post:
Posts: 958 | Thanked: 3,426 times | Joined on Apr 2012
#254
Originally Posted by myra View Post
Hi taixzo

Originally posted by taixzo:

I am not sure whether I made myself clear: the patched plugin already delivers the score. To stay compatible, it sends an additional message (called 'result_score') right before it sends the 'result' message, so once you have received 'result_score' you can ignore the 'result' message. If you want, I can send you that plugin.


My app always aims to keep the current dictionary/language model as small as possible (for better recognition accuracy) and switches the dict/lm according to its internal context. For example, imagine a small command set that allows launching a new app. After the voice-control app has recognized the command 'launch', it switches its context to 'I have to launch something' and loads an appropriate new dictionary that contains only the names of the programs that might be launched, and so on.

Perhaps this may also be an approach for your Saera.
I would greatly appreciate it if you would send me that code. Regarding what you said: that is more or less what I had in mind. The base dictionary would have a few words (actions like "call", "text", "launch" etc. and question words like "what", "when" etc.), each of which would switch to the appropriate dictionary, and Saera would only retry with the larger list if the score wasn't high enough.

This should also allow Saera to start up faster: we don't necessarily need to load the big model until Saera needs to re-try something.
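
A rough sketch of that retry idea, assuming the audio of an utterance can be kept and fed to a second recognizer. The two recognizers are hypothetical stand-ins (plain callables returning a hypothesis and a score), and the scores are toy values, not real Pocketsphinx output.

```python
# Sketch only: the recognizer callables are hypothetical stand-ins.
class TwoPassRecognizer(object):
    """Try a small model first; load and use a big one only on a low score."""

    def __init__(self, small, load_big, min_score):
        self.small = small          # callable: audio -> (hyp, score)
        self.load_big = load_big    # callable: () -> recognizer like `small`
        self.min_score = min_score
        self._big = None            # created lazily, so startup stays fast

    def recognize(self, audio):
        hyp, score = self.small(audio)
        if score >= self.min_score:
            return hyp, score
        if self._big is None:
            self._big = self.load_big()   # pay the loading cost only once
        return self._big(audio)


# Toy stand-ins so the sketch runs without any speech library.
load_calls = []

def small(audio):
    known = {'call': -100, 'launch': -120}
    return (audio, known[audio]) if audio in known else ('', -9999)

def load_big():
    load_calls.append(1)
    return lambda audio: (audio, -500)

rec = TwoPassRecognizer(small, load_big, min_score=-1000)
first = rec.recognize('launch')              # small model is good enough
second = rec.recognize('send a long text')   # falls back to the big model
```

The big model is only created on the first low-scoring utterance, which is what would let startup stay fast.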
 

The Following User Says Thank You to taixzo For This Useful Post:
Posts: 958 | Thanked: 3,426 times | Joined on Apr 2012
#255
Originally Posted by don_falcone View Post
Thanks. And you are somewhat right on the second part ('das das' is wrong, there should be a sharp 's' as the first one, i.e. 'daß das').

But i'm afraid i have to clarify i'm an evangelist of the true (literally old school) German language rules, and therefore i strongly detest the silly, unwanted (by most of the population) spelling reforms forced upon us by the Government in 1996 - so, f.e. no double 's' etc. for me, but 'ß'.
Let's use 'ß' in the sentences list, and we can add an option to convert them all to 'ss' if necessary.
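
Such an option could be as small as the sketch below. The function name and the `convert` flag are made up for illustration and are not part of Saera.

```python
# -*- coding: utf-8 -*-
def sharp_s_to_ss(sentences, convert=False):
    """Return the sentences, optionally with every 'ß' written as 'ss'."""
    if not convert:
        return list(sentences)
    return [s.replace(u'ß', u'ss') for s in sentences]


sentences = [u'Wußtest Du, daß das N900 Mac OSX verwenden kann?']
converted = sharp_s_to_ss(sentences, convert=True)
unchanged = sharp_s_to_ss(sentences)
```

The sentences list itself stays in the 'ß' spelling; the conversion is applied only when the option is switched on.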
 

The Following User Says Thank You to taixzo For This Useful Post:
Posts: 7 | Thanked: 17 times | Joined on Jun 2012
#256
i strongly detest the silly, unwanted (by most of the population) spelling reforms
Same here, don_falcone, but in the long run we have to accept it. I hope you don't mind my asking: are you from Bavaria?
 
Posts: 1,523 | Thanked: 1,997 times | Joined on Jul 2011 @ not your mom's FOSS basement
#257
(H)o(b|pe)viously not in my lifetime. And nope too - but check my location, it's obvious.
 
Posts: 7 | Thanked: 17 times | Joined on Jun 2012
#258
taixzo,
you can download the patched gstpocketsphinx-plugin here.
md5sum: 84fd12b19df535177870920c29615789

Here are the appropriate saera-code snippets to make use of the scored result:

Code:
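class Saera:
	def __init__(self):
		# Becomes True once a 'result_score' message arrives, i.e. the
		# patched plugin is in use and plain 'result' messages can be ignored.
		self.result_score = False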
Code:
	def init_gst(self):
		"""Initialize the speech components"""
		self.pipeline = gst.parse_launch('pulsesrc ! audioconvert ! audioresample '
										 + '! vader name=vad  auto-threshold=true '
										 + '! pocketsphinx name=asr ! fakesink')
		asr = self.pipeline.get_by_name('asr')
		asr.connect('partial_result', self.asr_partial_result)
		asr.connect('result', self.asr_result)
		# 'result_score' is only emitted by the patched plugin
		asr.connect('result_score', self.asr_result_score)
		asr.set_property('configured', True)
Code:
	def asr_result_score(self, asr, text, score):
		"""Forward result signals on the bus to the main thread."""
		struct = gst.Structure('result_score')
		struct.set_value('hyp', text)
		struct.set_value('score', score)
		asr.post_message(gst.message_new_application(asr, struct))
Code:
	def application_message(self, bus, msg):
		"""Receive application messages from the bus."""
		msgtype = msg.structure.get_name()
		if msgtype == 'partial_result':
			self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
		elif msgtype == 'result_score':
			# Patched plugin detected: from now on ignore plain 'result'
			self.result_score = True
			self.final_result_score(msg.structure['hyp'], msg.structure['score'])
			# self.pipeline.set_state(gst.STATE_PAUSED)
		elif msgtype == 'result' and not self.result_score:
			# Unpatched plugin: fall back to the unscored result
			self.final_result(msg.structure['hyp'], msg.structure['uttid'])
Code:
	def final_result_score(self, hyp, score):
		"""Handle the final result together with its score."""
		print("Final Result: %s  score: %s" % (hyp, score))
		# Minimum score tuned on the N900; other devices need another value
		if int(score) > -18500000:
			self.run_saera(None, "speech-event", hyp)
The minimum score depends on the device (on my laptop the scores are much higher than on the N900) and on the pocketsphinx settings (min frequency, max frequency, number of channels etc.), so I suggest staying with the default settings. (I tried varying these settings to get better accuracy, but had no luck.)
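
Since the threshold is device dependent, one option is to keep it out of the method body. The sketch below is only an illustration: the environment variable `SAERA_MIN_SCORE` and both function names are hypothetical, and only the default value comes from the snippet above.

```python
import os

N900_MIN_SCORE = -18500000  # value from the snippet above, tuned on an N900


def min_score(environ=None):
    """Return the acceptance threshold, overridable per device."""
    environ = os.environ if environ is None else environ
    # SAERA_MIN_SCORE is a hypothetical override, not an existing setting
    return int(environ.get('SAERA_MIN_SCORE', N900_MIN_SCORE))


def accept(score, threshold=None):
    """Return True if a hypothesis score is good enough to act on."""
    if threshold is None:
        threshold = min_score()
    return int(score) > threshold
```

On a laptop, where the scores are higher, the override could then be raised without touching the code.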

BTW: how can I upload a file to talk.maemo.org?
 

The Following 8 Users Say Thank You to myra For This Useful Post:
Posts: 5,028 | Thanked: 8,613 times | Joined on Mar 2011
#259
taixzo, it's absolutely wonderful and amazing what this project has evolved into in just a few days' time. I'm absolutely sure that You're one of the favorites for the Coding Competition with Saera, so please don't forget to apply there.

As for the issues with upload permissions, we're working hard with our technical contact to fix them. Of course, I could also lend You my garage account with upload permissions, but I think that's quite pointless - we need the whole procedure working smoothly.
---

As for the program itself - again, what amazes me most is that it's not simply a "speech recognition" program, but a first attempt to bring AI to our N900. Sure, I don't expect it to pass the Turing test soon (not that the Turing test is a good measure of intelligence, anyway). I really hope that being a basic - and developing - AI won't disappear from the scope of Saera for the sake of functionality.

BTW, what's the current status of the possibility to change her name? For reasons too long to explain here, it would be very useful in my use case.

Also, I'm quite new to this whole digitized speech thing - even if I write, let's say, a Polish corpus.txt, how do I ensure that text written in correct Polish will be pronounced correctly?

I know You're a busy guy here, so I'm not asking You to (re)write a tutorial - maybe there's some link to documentation? Or is what's already in the package everything I need?

/Estel
 

The Following 2 Users Say Thank You to Estel For This Useful Post:
Posts: 958 | Thanked: 3,426 times | Joined on Apr 2012
#260
Originally Posted by Estel View Post
taixzo, it's absolutely wonderful and amazing what this project has evolved into in just a few days' time. I'm absolutely sure that You're one of the favorites for the Coding Competition with Saera, so please don't forget to apply there.
According to the website that the banner links to, the submission site isn't live yet, so I added it to the table on that site.

As for the issues with upload permissions, we're working hard with our technical contact to fix them. Of course, I could also lend You my garage account with upload permissions, but I think that's quite pointless - we need the whole procedure working smoothly.
Thank you!

As for the program itself - again, what amazes me most is that it's not simply a "speech recognition" program, but a first attempt to bring AI to our N900. Sure, I don't expect it to pass the Turing test soon (not that the Turing test is a good measure of intelligence, anyway). I really hope that being a basic - and developing - AI won't disappear from the scope of Saera for the sake of functionality.
Saera will continue to be an AI project, and I am hopeful that this will allow for increased functionality (maybe you can eventually 'teach' Saera how to do things she doesn't know how to do).

BTW, what's the current status of the possibility to change her name? For reasons too long to explain here, it would be very useful in my use case.
Changing the official name or changing the name after installation? In the latter case it would be fairly simple, just search and replace "Saera" in a few files.

Also, I'm quite new to this whole digitized speech thing - even if I write, let's say, a Polish corpus.txt, how do I ensure that text written in correct Polish will be pronounced correctly?

I know You're a busy guy here, so I'm not asking You to (re)write a tutorial - maybe there's some link to documentation? Or is what's already in the package everything I need?

/Estel
The beginning of each sentences_<language>.py file contains a line with the espeak command line, of the form:
Code:
espeak_cmdline = "espeak -vCC+f2"
where CC is the two-letter language code. You can also change the 'f' to an 'm' to give Saera a male voice.
As for documentation - I haven't written any. The basic idea is that the corpus defines the words that Pocketsphinx will recognize and gives it a model to build grammar from. It should recognize any sentence in the corpus nearly 100% of the time, but if you say something else it will try to build it out of words and grammar constructs found in the corpus.
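
As an illustration of that command-line pattern, here is a small helper that builds the string for a given language and voice. The helper and its parameters are hypothetical and not part of Saera; only the "espeak -vCC+f2" form comes from the post above.

```python
def espeak_cmdline(lang, gender='f', variant=2):
    """Build a command line of the form used in sentences_<language>.py."""
    if gender not in ('f', 'm'):
        raise ValueError("gender must be 'f' or 'm'")
    return "espeak -v%s+%s%d" % (lang, gender, variant)


polish_female = espeak_cmdline('pl')
german_male = espeak_cmdline('de', gender='m')
```

To check how a sentence is pronounced, the resulting command can be run in a terminal with the sentence as a quoted argument, assuming espeak is installed and has a voice for that language code.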
 

The Following 4 Users Say Thank You to taixzo For This Useful Post:

Tags
saera, speech-to-text


 