Poll: What advanced text entry method(s) would you like to see on Sailfish?
Posts: 424 | Thanked: 1,157 times | Joined on Feb 2014 @ Germany
#351
Originally Posted by meegouser:
@mautz Did you find some time in the last 2 months? I would be very happy if you would update the German language package some time soon.
I'll release a new version in the next few days.

EDIT: OK, I ran into some problems: my old corpus files seem to include some incompatible expressions, and I cannot get the clean_corpus.py script working. Any ideas on that problem?

EDIT2: Solved by lbzip2 -d < corpus.txt.bz2 | ./clean_corpus.py | lbzip2 > new_corpus.txt.bz2

EDIT3: Update published on openrepos.
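
The EDIT2 pipeline streams the compressed corpus through a filter without unpacking it to disk. Below is a rough Python sketch of the same pattern using the standard bz2 module. The actual rules in clean_corpus.py are not shown in this thread, so clean_line is a hypothetical placeholder (it only collapses whitespace and drops empty lines), and clean_corpus_file is an illustrative name as well.

```python
import bz2


def clean_line(line):
    """Hypothetical cleaning rule: collapse whitespace, drop empty lines."""
    stripped = " ".join(line.split())
    return stripped or None


def clean_corpus_file(src_path, dst_path):
    """Stream a bz2 corpus through clean_line and write the result as bz2."""
    kept = 0
    # errors="replace" keeps undecodable bytes from aborting the whole run.
    with bz2.open(src_path, "rt", encoding="utf-8", errors="replace") as src, \
         bz2.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            cleaned = clean_line(line)
            if cleaned is not None:
                dst.write(cleaned + "\n")
                kept += 1
    return kept
```

Calling clean_corpus_file("corpus.txt.bz2", "new_corpus.txt.bz2") would process the file line by line, so memory use stays flat even for a large corpus.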

Last edited by mautz; 2017-03-08 at 11:24.
 

The Following 5 Users Say Thank You to mautz For This Useful Post:
Posts: 805 | Thanked: 2,860 times | Joined on Apr 2012
#352
So I just updated OKboard from 0.5.12 to 0.6.9, and now if I enable OKboard, no keyboard comes up in any application. Even if I disable it again, the keyboard stays missing (and all applications that had a textbox hang) until reboot. I don't see any log files, either.
 

The Following 3 Users Say Thank You to taixzo For This Useful Post:
Posts: 640 | Thanked: 2,304 times | Joined on Jul 2014 @ UK
#353
I had this problem too, and there has been some discussion on the openrepos page. Try uninstalling and reinstalling; that worked for me.
 

The Following 3 Users Say Thank You to Feathers McGraw For This Useful Post:
Posts: 805 | Thanked: 2,860 times | Joined on Apr 2012
#354
Thanks, that fixed it. And indeed, 0.6.9 is much more functional than 0.5.12 on non-Jolla devices.
 

The Following 3 Users Say Thank You to taixzo For This Useful Post:
Posts: 805 | Thanked: 2,860 times | Joined on Apr 2012
#355
I have a bug report: I cannot swipe the word "on". All other words work fine, including "in", but "on" never comes up, even in the suggestions, and it never learns it from manually typing either. Any suggestions?
 

The Following 2 Users Say Thank You to taixzo For This Useful Post:
Posts: 148 | Thanked: 182 times | Joined on Apr 2012 @ Austria
#356
How many words should the input dictionary have? And has any of you tried doing this for Hungarian?
 

The Following User Says Thank You to smatkovi For This Useful Post:
Posts: 642 | Thanked: 3,367 times | Joined on Aug 2016 @ Estonia
#357
For training, what counts is not the number of words as such, but the size of the text corpus from which the database is built (it is not a dictionary, but words and their combinations with the corresponding probabilities).

The trick is to find such a corpus. I have done some training for the presage predictor and used the following corpora (sizes in MB of the text file):

* English: 156MB
* Estonian: 1900MB

Russian was downloaded directly as an n-gram database from a site providing it.

For Hungarian, I would suggest contacting @martonmiklos. He trained presage for Hungarian and should have the corpus somewhere.
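
To illustrate what "words and their combinations with the corresponding probabilities" means, here is a minimal Python sketch that counts n-grams in a toy corpus and estimates a conditional probability from the counts. This is not presage's actual training code or storage format; the function names and the sample sentence are made up for the example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bigram_probability(unigrams, bigrams, first, second):
    """Estimate P(second | first) as count(first, second) / count(first)."""
    if unigrams[(first,)] == 0:
        return 0.0
    return bigrams[(first, second)] / unigrams[(first,)]


# Toy corpus; a real one is hundreds of MB of text.
corpus = "the cat sat on the mat and the cat slept".split()
unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)

# "the" occurs 3 times and is followed by "cat" twice.
p_cat_after_the = bigram_probability(unigrams, bigrams, "the", "cat")
```

The more text goes in, the more reliable these counts become, which is why the corpus size matters rather than the length of a word list.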
 

The Following 3 Users Say Thank You to rinigus For This Useful Post:
Posts: 37 | Thanked: 27 times | Joined on Oct 2009 @ Finland
#358
First, huge thanks to @eber42 for making OKboard!

I'm working on a Finnish dictionary, but it's really hard to find quality corpora. I'm currently experimenting with Wikipedia-based and news-based ones, but it seems I need bigger and better corpora... Does anyone know any good sources?

I did manage to get the thing to build (by cruelly skipping the very last step, which fails the build and suggests using a bigger corpus; I wanted a proof of concept and won't be skipping the test in a release version), but there are problems. I cut the original word list in half, but I'm still getting a kinda huge (12MB...30MB) fi.tre file; predict-fi.db is 26kB and predict-fi.ng is 813kB. In comparison, the English en.tre is below two megabytes... As a result, the delay between the gesture and the word appearing is... very noticeable, to put it mildly. What would be a good size to aim at?

Thanks all!
 

The Following 2 Users Say Thank You to mattiviljanen For This Useful Post:
Posts: 642 | Thanked: 3,367 times | Joined on Aug 2016 @ Estonia
#359
For Estonian, I contacted an academic lab and got the corpus from them. I would expect that you could do the same for Finnish. Find a Finnish language institute and they may help you.

The formats are different, but for presage in its new Marisa-based format (trie and counts stored separately), I have a 12MB database for Estonian. For English, it's 6MB. So it's expected that languages such as Finnish have a larger database.

To regulate the size of the database, you would have to increase or decrease the n-gram count cut-off. In this respect, keep the corpus at full size and just change that parameter.
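
As a rough illustration of the cut-off idea, the Python sketch below drops every n-gram seen fewer than min_count times and shows how the number of stored entries shrinks as the cut-off rises. The name prune_ngrams and the toy token list are assumptions for the example; the real tools expose this as their own parameter.

```python
from collections import Counter


def prune_ngrams(counts, min_count):
    """Keep only n-grams that were seen at least min_count times."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})


# Toy token stream standing in for the full corpus.
tokens = "a b a b a c a b d".split()
bigrams = Counter(zip(tokens, tokens[1:]))

# Number of stored bigrams for each cut-off value.
sizes = {cutoff: len(prune_ngrams(bigrams, cutoff)) for cutoff in (1, 2, 3)}
```

Rare n-grams make up most of the entries, so raising the cut-off trims the database sharply while the frequent combinations are still counted from the whole corpus.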

An additional note: Finnish is missing from the languages supported by the presage-based predictor. Would you mind generating an n-gram database for that too?
 

The Following 5 Users Say Thank You to rinigus For This Useful Post:

Tags
bettertxtentry, huntnpeck sucks, okboard, sailfish, swype
