Poll: What advanced text entry method(s) would you like to see on Sailfish?
Posts: 105 | Thanked: 205 times | Joined on Dec 2015 @ Spain
#161
Originally Posted by ljo View Post
With a modest download count of 0 after several days on openrepos (https://openrepos.net/content/ellefj...ources-okboard, or just search for OKboard in the Warehouse client), I remembered to update this status. Please remember that the curves must pass over a for å and ä, and over o for ö, for the predictions to be the ones you expect; otherwise they will be totally and insanely wrong.
Hi ljo. I'm very interested in the process you followed to generate the Swedish OKBoard, so that I can do the same for Spanish. I have already generated the Spanish dictionaries, but I don't know how to continue the process. Can you help me, please?

Thanks in advance!
 
Posts: 281 | Thanked: 679 times | Joined on Feb 2010
#162
https://together.jolla.com/question/...post-id-125196 It is the same with German. How and from where did you get the corpus stuff?

Oops, this seems to be a loop post now. Sorry...
 
Posts: 105 | Thanked: 205 times | Joined on Dec 2015 @ Spain
#163
 

The Following User Says Thank You to ferlanero For This Useful Post:
Posts: 281 | Thanked: 679 times | Joined on Feb 2010
#164
Originally Posted by ferlanero View Post
I got the dicts from here:

https://github.com/titoBouzout/Dicti...s/blob/master/
Hm, according to @eber42 on TJC, you need corpus files:

"Corpus files are compressed plain text files (.txt.bz2) and not tar files. So you can just join different sources before compression. They should contain only sentences separated with dots and/or blank lines, and with proper capitalization."
 

The Following User Says Thank You to cy8aer For This Useful Post:
Posts: 102 | Thanked: 187 times | Joined on Jan 2010
#165
@ferlanero, it is not clear to me how far you have got with processing the corpus file, since you say you got the hunspell dicts from GitHub. The hunspell dicts cannot be used with OKboard as anything more than word lists.
So please state clearly whether "I have already generated the Spanish dictionaries" means you have only copied the hunspell ones, or have actually processed a corpus of Spanish texts according to the README.md.
 

The Following User Says Thank You to ljo For This Useful Post:
Posts: 16 | Thanked: 11 times | Joined on Jun 2015
#166
@ljo. Swedish works very nicely, even though the åäö keys can't be used. Thanks a lot!
 
Posts: 738 | Thanked: 819 times | Joined on Jan 2012 @ Berlin
#167
Originally Posted by cy8aer View Post
Hm, according to @eber42 on TJC, you need corpus files:

"Corpus files are compressed plain text files (.txt.bz2) and not tar files. So you can just join different sources before compression. They should contain only sentences separated with dots and/or blank lines, and with proper capitalization."
Can you please share the DE language files? Or upload them to openrepos?
__________________
www.sailfishmods.de
 

The Following 3 Users Say Thank You to cvp For This Useful Post:
Posts: 105 | Thanked: 205 times | Joined on Dec 2015 @ Spain
#168
Originally Posted by ljo View Post
@ferlanero, it is not clear to me how far you have got with processing the corpus file, since you say you got the hunspell dicts from GitHub. The hunspell dicts cannot be used with OKboard as anything more than word lists.
So please state clearly whether "I have already generated the Spanish dictionaries" means you have only copied the hunspell ones, or have actually processed a corpus of Spanish texts according to the README.md.
I have only processed a Spanish corpus built from the hunspell dicts on GitHub... I don't know where to find more words or sentences, whether they are even necessary, or how to make them work with OKBoard. I have enough processing power to do it, but I can't figure out how. And I think a predictive keyboard is a must for Sailfish.

If someone here could write a step-by-step guide for adding support for more languages to this keyboard, I could do it without any problem.

Thank you very much!!
 
Posts: 738 | Thanked: 819 times | Joined on Jan 2012 @ Berlin
#169
Can someone explain how I set up the WORK_DIR and CORPUS_DIR environment variables?
__________________
www.sailfishmods.de
 
Posts: 105 | Thanked: 205 times | Joined on Dec 2015 @ Spain
#170
Originally Posted by cvp View Post
Can someone explain how I set up the WORK_DIR and CORPUS_DIR environment variables?
The steps to do that are these:

- You need a Linux environment (I'm using Arch Linux, but Ubuntu or any other distro works too)

- You need to download the tarball first: http://git.tuxfamily.org/okboard/okb...master.tar.bz2 and extract it in your home directory

- You need the dictionaries. I took them from https://github.com/titoBouzout/Dictionaries but they need to be adjusted, so I attach the already processed file (see Spanish.dic.txt.zip on this post)

- You need the corpora files for your language (e.g. Spanish):
http://corpora2.informatik.uni-leipzig.de/download.html
http://www.cs.upc.edu/~nlp/wikicorpus/
http://opus.lingfil.uu.se/OpenSubtitles2016.php
http://www.lllf.uam.es/ESP/Corlec.html
https://tatoeba.org/spa/downloads

Keep this tip in mind when building your corpora files:
Corpora file <= 4 GB for computers with 16 GB of RAM
Corpora file <= 1.5 GB for computers with 8 GB of RAM

- You need the "aspell-es" package (in the case of Spanish) installed from your distro's repos.

- You need the "lbzip2" package installed on your system too.

- You need "rsync" installed on your system.

- You need "Qt5" installed on your system.

- You need "python3-dev" installed on your system.
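
A quick way to check the requirements above before going further. This is only a sketch: the command names passed to check_tools are my assumption of what those packages install (package and command names differ between distros), and it does not check Qt5 or the python3-dev headers.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command; returns 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names are assumptions; adjust them to what your distro provides.
check_tools lbzip2 bzip2 gzip rsync python3 aspell \
    || echo "Install the tools listed above before continuing."
```

If nothing is printed, all of the listed commands were found.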

- Now you need to create a folder somewhere and put the dictionary inside (e.g. /home/USERNAME/okboard/langs)

- If you have several corpora files, concatenate them into one:

Code:
cat file1 file2 file3 file4 file5 > corpus-es.txt
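
A slightly more careful variant of the cat command above, as a sketch. It puts a blank line between sources (blank lines are valid sentence separators according to the corpus description quoted earlier in the thread), so the last sentence of one file never runs into the first sentence of the next. The function name join_corpora is made up for this example.

```shell
# Join several plain-text corpus sources into one file, with a blank line
# between sources so sentences from different files never run together.
# $1 = output file, remaining arguments = source files.
join_corpora() {
    out=$1
    shift
    : > "$out"                      # create or truncate the output file
    for src in "$@"; do
        cat "$src" >> "$out"
        printf '\n\n' >> "$out"     # separator, even if src lacks a final newline
    done
    wc -c < "$out"                  # print the resulting size in bytes
}

# Example (file names are placeholders, as in the post):
# join_corpora corpus-es.txt file1 file2 file3 file4 file5
```

The printed size is handy for checking the result against the RAM tip above.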
- Open a terminal window

- And set the two environment variables:

*NOTE: Replace USERNAME with your own user name
Code:
export CORPUS_DIR=/home/USERNAME/okboard/langs
Code:
export WORK_DIR=/home/USERNAME/okboard/langs
- If you're curious, you can print either variable (replace VARIABLE_NAME with CORPUS_DIR or WORK_DIR):

Code:
echo $VARIABLE_NAME
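
The two exports can also be wrapped in a small helper that creates the folder first, so a typo in the path shows up immediately. A sketch only; setup_okb_env is a name I made up. Exported variables last only for the current terminal session, so the function has to be defined and called in the same shell where you later run the build.

```shell
# Point both variables at one working folder, creating it if needed.
# $1 = folder that holds the dictionary and the corpus.
setup_okb_env() {
    dir=$1
    mkdir -p "$dir" || return 1
    export CORPUS_DIR="$dir"
    export WORK_DIR="$dir"
    echo "CORPUS_DIR=$CORPUS_DIR"
    echo "WORK_DIR=$WORK_DIR"
}

# Example (path taken from the post; adjust to your own layout):
# setup_okb_env "$HOME/okboard/langs"
```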

- You need to compress the file (corpus-es.txt) you put earlier in /home/USERNAME/okboard/langs:

Code:
bzip2 /home/USERNAME/okboard/langs/corpus-es.txt
- The file should now be named corpus-$LANG.txt.bz2. In our case: corpus-es.txt.bz2, because of Spanish

- There should be a single file inside the archive.

- Next, in the same terminal window, change into the okboard source directory, in our case "/home/USERNAME/okb-engine-master/". This is where okboard's source code lives.

Code:
cd /home/USERNAME/okb-engine-master/
- In 'db' folder you must create a lang-es.cf file first. You can copy it from another .cf file in the same folder (e.g. copy lang-en.cf and rename it into lang-es.cf)

- Then clean the corpus so that only ASCII characters are left in the file:

Code:
lbzip2 -d < /home/USERNAME/okboard/langs/corpus-es.txt.bz2 | ./tools/clean_corpus.py | lbzip2 > /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2
Then move the original corpus-es.txt.bz2 file to a safe place:

Code:
mv /home/USERNAME/okboard/langs/corpus-es.txt.bz2 /home/USERNAME/okboard
And rename the cleaned file (stripped of non-valid ASCII characters) to the proper name, so the build can start:

Code:
mv /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2 /home/USERNAME/okboard/langs/corpus-es.txt.bz2
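
Since the build takes a long time, it may be worth checking the inputs first. This sketch only tests for the two files the steps above prepare (the corpus in CORPUS_DIR and the lang-XX.cf file in the db folder); I have not checked what else build.sh expects, and check_build_inputs is a made-up name.

```shell
# Verify the inputs prepared in the steps above before starting a long build.
# $1 = language code (e.g. es), $2 = okb-engine source directory.
check_build_inputs() {
    lang=$1
    src=$2
    ok=0
    if [ -z "${CORPUS_DIR:-}" ]; then
        echo "CORPUS_DIR is not set"
        return 1
    fi
    for f in "$CORPUS_DIR/corpus-$lang.txt.bz2" "$src/db/lang-$lang.cf"; do
        if [ ! -s "$f" ]; then      # -s: file exists and is not empty
            echo "missing or empty: $f"
            ok=1
        fi
    done
    return "$ok"
}

# Example (paths as used in the post):
# check_build_inputs es /home/USERNAME/okb-engine-master && echo "ready to build"
```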
- Run the build script with your language code ("es" in the case of Spanish):
Code:
db/build.sh es

- When it finishes, the script has created the OKBoard dictionaries in /home/USERNAME/okboard/langs/:

add-words-es.txt
affixes-es.txt
clusters-es.log
clusters-es.txt
corpus-es.txt.bz2
db.version
es-full.dict
es-full.tre
es-learn.txt.bz2
es-predict.dict
es-test.txt.bz2
es.tre
grams-es-full.csv.bz2
grams-es-learn.csv.bz2
grams-es-test.csv.bz2
lang-es.cf
ngrams-es.rpt
predict-es.db
predict-es.id
predict-es.ng
predict-es.rpt.bz2
predict-es.txt.bz2
tmp-words-es.txt

- So now we have the Spanish dictionary created.

Now we have to gzip the files OKBoard will use for swiping text:

Code:
gzip -9 /home/USERNAME/okboard/langs/es.tre
Code:
gzip -9 /home/USERNAME/okboard/langs/predict-es.db
Code:
gzip -9 /home/USERNAME/okboard/langs/predict-es.ng
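
The same three gzip commands as a loop, in case you build more than one language. A sketch; compress_outputs is a made-up name. Like the commands above, gzip replaces each file with its .gz version, and it will refuse to overwrite a .gz that already exists.

```shell
# gzip the three generated files for one language, as in the commands above.
# $1 = language code (e.g. es), $2 = directory holding the build output.
compress_outputs() {
    lang=$1
    dir=$2
    for f in "$lang.tre" "predict-$lang.db" "predict-$lang.ng"; do
        if [ -f "$dir/$f" ]; then
            gzip -9 "$dir/$f"       # replaces FILE with FILE.gz
        else
            echo "skipping, not found: $dir/$f"
        fi
    done
}

# Example (path as used in the post):
# compress_outputs es /home/USERNAME/okboard/langs
```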
Now, connect the phone via ssh to the computer as I explain here: http://www.linuxleon.org/2015/09/how...jolla-con.html

And follow these instructions to create the RPM package directly on your Sailfish phone: http://talk.maemo.org/showthread.php?t=92963

After that, copy the files onto the phone. The destination has to be a remote path of the form user@host:path; otherwise scp only copies locally on the computer:

*NOTE: PHONE_IP is a placeholder for your phone's IP address
Code:
scp /home/USERNAME/okboard/langs/es.tre.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.db.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.id nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.ng.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
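
To avoid typing the four copies by hand, here is a dry-run sketch that only prints the scp commands so you can review them before running anything. print_scp_commands is a made-up name, and the destination in the example is an assumption: PHONE_IP is a placeholder for the phone's address and x.x is left as in the post.

```shell
# Print (not run) one scp command per file the phone needs.
# $1 = language code, $2 = local build directory, $3 = scp destination.
print_scp_commands() {
    lang=$1
    src=$2
    dest=$3
    for f in "$lang.tre.gz" "predict-$lang.db.gz" "predict-$lang.id" "predict-$lang.ng.gz"; do
        echo "scp $src/$f $dest"
    done
}

# PHONE_IP is hypothetical; replace it (and USERNAME, x.x) with your own values.
print_scp_commands es /home/USERNAME/okboard/langs \
    "nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/"
```

Once the printed lines look right, paste them into the terminal to do the actual copy.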

Last edited by ferlanero; 2017-01-10 at 10:05.
 

The Following 6 Users Say Thank You to ferlanero For This Useful Post:
Tags
bettertxtentry, huntnpeck sucks, okboard, sailfish, swype


 