Advanced text entry on Sailfish (Swype or similar) - Page 17 - maemo.org

Active Topics

How many of you still use N900 as main phone? (595)
to Community by deutch1976 - 2 days, 3 hrs ago
Big news (7)
to OS2008 / Maemo 4 / Chinook - Diablo by endsormeans - 3 days, 12 hrs ago
Firefox with Leste (8)
to Maemo 7 / Leste by endsormeans - 3 days, 12 hrs ago
Which is the best N95? What software modifications could be made to it? (3)
to General by Kalatti - 6 days, 12 hrs ago
Installing CSSU Stable in year 2024 (2)
to Maemo 5 / Fremantle by teroyk - 6 days, 20 hrs ago
more...

Page 17 of 38

ferlanero	2016-01-07 , 10:18
Posts: 105 \| Thanked: 205 times \| Joined on Dec 2015 @ Spain	#161

Originally Posted by ljo

With a modest download count of 0 after some days of being available on openrepos (https://openrepos.net/content/ellefj...ources-okboard, or just search for OKboard in the Warehouse client), I remembered to update this status. Please remember that the curves must go over a for å and ä, and over o for ö, for the predictions to be the ones you expect; otherwise they will be totally and insanely wrong.

Hi ljo. I'm very interested in the process you followed to generate the Swedish OKBoard, so that I can generate the same for Spanish. I have already generated the Spanish dictionaries, but now I don't know how to continue the process. Can you help me, please?

Thanks in advance!

cy8aer	2016-01-07 , 10:53
Posts: 281 \| Thanked: 679 times \| Joined on Feb 2010	#162

https://together.jolla.com/question/...post-id-125196 is the same for German. How and from where did you get the corpus stuff?

Oops, this seems to be a loop post now. Sorry...

ferlanero	2016-01-07 , 11:12
Posts: 105 \| Thanked: 205 times \| Joined on Dec 2015 @ Spain	#163

I got the dicts from here:

https://github.com/titoBouzout/Dicti...s/blob/master/

The Following User Says Thank You to ferlanero For This Useful Post:
juiceme

cy8aer	2016-01-07 , 11:19
Posts: 281 \| Thanked: 679 times \| Joined on Feb 2010	#164

Originally Posted by ferlanero

I got the dicts from here:

https://github.com/titoBouzout/Dicti...s/blob/master/

Hm, according to @eber42 on TJC you need corpus files:

"Corpus files are compressed plain text files (.txt.bz2) and not tar files. So you can just join different sources before compression. They should contain only sentences separated with dots and/or blank lines, and with proper capitalization."

The Following User Says Thank You to cy8aer For This Useful Post:
juiceme

ljo	2016-01-07 , 11:37
Posts: 102 \| Thanked: 187 times \| Joined on Jan 2010	#165

@ferlanero, it is not clear to me where you are in the processing of the corpus file, since you say you got the hunspell dicts from GitHub. The hunspell dicts can only be used with OKboard as word lists.
So please clarify whether by "I have already generated the Spanish dictionaries" you mean that you have only copied the hunspell ones, or that you have actually processed a corpus of Spanish texts according to the README.md.

The Following User Says Thank You to ljo For This Useful Post:
juiceme

Oxxyria	2016-01-07 , 12:36
Posts: 16 \| Thanked: 11 times \| Joined on Jun 2015	#166

@ljo. Swedish works very nicely, even though the åäö keys can't be used. Thanks a lot!

cvp	2016-01-07 , 13:52
Posts: 738 \| Thanked: 819 times \| Joined on Jan 2012 @ Berlin	#167

Originally Posted by cy8aer

Hm, according to @eber42 on TJC you need corpus files:

"Corpus files are compressed plain text files (.txt.bz2) and not tar files. So you can just join different sources before compression. They should contain only sentences separated with dots and/or blank lines, and with proper capitalization."

Can you please share the DE lang? Or upload it to openrepos?

__________________

www.sailfishmods.de

The Following 3 Users Say Thank You to cvp For This Useful Post:
JoOppen, velox, willi6868

ferlanero	2016-01-07 , 16:04
Posts: 105 \| Thanked: 205 times \| Joined on Dec 2015 @ Spain	#168

Originally Posted by ljo

@ferlanero, it is not clear to me where you are in the processing of the corpus file, since you say you got the hunspell dicts from GitHub. The hunspell dicts can only be used with OKboard as word lists.
So please clarify whether by "I have already generated the Spanish dictionaries" you mean that you have only copied the hunspell ones, or that you have actually processed a corpus of Spanish texts according to the README.md.

I have only processed the Spanish hunspell dicts from GitHub... I don't know where to find more words or sentences, whether they are even necessary, or how to make them work with OKBoard. I have enough processing power to do it, but I can't work out how. And I think a predictive keyboard is a must for Sailfish.

If someone here could write a step-by-step guide on how to add support for more languages to this keyboard, I could do it without any problem.

Thank you very much!!

cvp	2016-01-07 , 16:30
Posts: 738 \| Thanked: 819 times \| Joined on Jan 2012 @ Berlin	#169

Can someone explain how I set up the WORK_DIR and CORPUS_DIR environment variables?

__________________

www.sailfishmods.de

ferlanero	2016-01-07 , 17:16
Posts: 105 \| Thanked: 205 times \| Joined on Dec 2015 @ Spain	#170

Originally Posted by cvp

Can someone explain how I set up the WORK_DIR and CORPUS_DIR environment variables?

The steps to do that are these:

- You need a Linux environment (I'm using Arch Linux, but Ubuntu or another distro works too).

- You need to download the tarball first: http://git.tuxfamily.org/okboard/okb...master.tar.bz2 and uncompress it in your home directory (/home/USERNAME).

- You need the dictionaries. I took them from https://github.com/titoBouzout/Dictionaries, but they need to be adjusted, so I attach the already processed file (see Spanish.dic.txt.zip on this post).

- You need the corpora files for your language (e.g. Spanish):
http://corpora2.informatik.uni-leipzig.de/download.html
http://www.cs.upc.edu/~nlp/wikicorpus/
http://opus.lingfil.uu.se/OpenSubtitles2016.php
http://www.lllf.uam.es/ESP/Corlec.html
https://tatoeba.org/spa/downloads

Keep this tip in mind when building your corpus file:
Corpus file <= 4 GB for computers with 16 GB RAM
Corpus file <= 1.5 GB for computers with 8 GB RAM

- You need the "aspell-es" package (in the case of Spanish) installed from your distro's repos.

- You need the "lbzip2" package installed on your system too.

- You need "rsync" installed on your system.

- You need "Qt 5" installed on your system.

- You need "python3-dev" installed on your system.
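The tool prerequisites above can be checked in one go before starting. This is only a minimal sketch: `missing_tools` is a hypothetical helper (not part of okb-engine), and the command names in the usage line are my assumption of what the packages install on a typical distro.

```shell
#!/bin/sh
# Print every command from the argument list that is not found in PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}
```

Possible usage: `missing_tools lbzip2 rsync python3 aspell` prints the names that still need to be installed. It only checks commands, so library packages such as Qt 5 or python3-dev still have to be verified with your package manager.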

- Now you need to create a folder somewhere and put the dictionary inside (e.g. /home/USERNAME/okboard/langs).

- If you have several corpora files, join them into one:

Code:

cat file1 file2 file3 file4 file5 > corpus-es.txt
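A slightly more defensive variant of the join step, as a sketch only. It puts a blank line between sources (corpus files are expected to have sentences separated by dots and/or blank lines) and lets you compare the result with the size tip given earlier. `join_corpora` and `corpus_within_limit` are hypothetical helper names, not part of okb-engine.

```shell
#!/bin/sh
# Join several plain-text corpus files into OUT, adding a blank line
# after each source so the last sentence of one file is not glued to
# the first sentence of the next. OUT must not be one of the inputs.
join_corpora() {
    out=$1
    shift
    : > "$out"
    for src in "$@"; do
        cat "$src" >> "$out"
        printf '\n\n' >> "$out"
    done
}

# Succeed if FILE is no larger than LIMIT_BYTES.
corpus_within_limit() {
    size=$(($(wc -c < "$1")))
    [ "$size" -le "$2" ]
}
```

For example, `join_corpora corpus-es.txt file1 file2 file3` followed by `corpus_within_limit corpus-es.txt 4294967296 || echo "too big for 16 GB RAM"` (4294967296 bytes is 4 GB).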

- Open a terminal window.

- Set the two environment variables:

*NOTE: Replace USERNAME with your own user name.

Code:

export CORPUS_DIR=/home/USERNAME/okboard/langs

Code:

export WORK_DIR=/home/USERNAME/okboard/langs

- You can see those variables with

Code:

echo $VARIABLE_NAME

if you're curious (replace VARIABLE_NAME with CORPUS_DIR or WORK_DIR).
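The two exports can also be wrapped in a small function that creates the folder at the same time. A minimal sketch, assuming both variables point at the same directory as in the steps above; `setup_okb_env` is a hypothetical name, and the default path simply mirrors /home/USERNAME/okboard/langs via `$HOME`.

```shell
#!/bin/sh
# Point CORPUS_DIR and WORK_DIR at the same directory and create it
# if it does not exist yet. The optional argument overrides the
# default location.
setup_okb_env() {
    base=${1:-$HOME/okboard/langs}
    mkdir -p "$base" || return 1
    CORPUS_DIR=$base
    WORK_DIR=$base
    export CORPUS_DIR WORK_DIR
}
```

The function has to be defined and called in the same terminal window where db/build.sh will run later, because exported variables are only visible to that shell and the programs started from it.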

- You need to compress the file (corpus-es.txt) you put earlier in /home/USERNAME/okboard/langs:

Code:

bzip2 /home/USERNAME/okboard/langs/corpus-es.txt

- The file should now be named corpus-$LANG.txt.bz2; in our case corpus-es.txt.bz2, because of Spanish.

- It should contain a single file (it is a compressed plain text file, not a tar archive).
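Before moving on, the compressed corpus can be sanity-checked. This is just a sketch: `check_corpus_archive` is a hypothetical helper that looks for the expected file name and uses `bzip2 -t`, which tests the integrity of a bzip2 file without unpacking it to disk.

```shell
#!/bin/sh
# Check that DIR contains corpus-LANG.txt.bz2 and that it is a valid
# bzip2 file. Arguments: DIR LANG
check_corpus_archive() {
    file="$1/corpus-$2.txt.bz2"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    if ! bzip2 -t "$file" 2>/dev/null; then
        echo "not a valid bzip2 file: $file" >&2
        return 1
    fi
}
```

For Spanish this would be called as `check_corpus_archive /home/USERNAME/okboard/langs es`.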

- Next, in the same terminal window, change to the okboard source directory, in our case "/home/USERNAME/okb-engine-master/". This is where okboard's source code is.

Code:

cd /home/USERNAME/okb-engine-master/

- In the 'db' folder you must first create a lang-es.cf file. You can copy it from another .cf file in the same folder (e.g. copy lang-en.cf and rename the copy to lang-es.cf).

- Then clean the corpus file so that only ASCII characters are left in it:

Code:

lbzip2 -d < /home/USERNAME/okboard/langs/corpus-es.txt.bz2 | ./tools/clean_corpus.py | lbzip2 > /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2

Then move the original corpus-es.txt.bz2 file to a safe place:

Code:

mv /home/USERNAME/okboard/langs/corpus-es.txt.bz2 /home/USERNAME/okboard

And rename the cleaned file (the one free of non-valid ASCII characters) to the proper name to start the process:

Code:

mv /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2 /home/USERNAME/okboard/langs/corpus-es.txt.bz2

- Execute

Code:

db/build.sh es

("es" in case of Spanish)

- After this, the script will have created the dictionaries for OKBoard in /home/USERNAME/okboard/langs/:

add-words-es.txt
affixes-es.txt
clusters-es.log
clusters-es.txt
corpus-es.txt.bz2
db.version
es-full.dict
es-full.tre
es-learn.txt.bz2
es-predict.dict
es-test.txt.bz2
es.tre
grams-es-full.csv.bz2
grams-es-learn.csv.bz2
grams-es-test.csv.bz2
lang-es.cf
ngrams-es.rpt
predict-es.db
predict-es.id
predict-es.ng
predict-es.rpt.bz2
predict-es.txt.bz2
tmp-words-es.txt

- So now we have the Spanish dictionary created.

Now we have to gzip the files OKBoard will use for swiping our text:

Code:

gzip -9 /home/USERNAME/okboard/langs/es.tre

Code:

gzip -9 /home/USERNAME/okboard/langs/predict-es.db

Code:

gzip -9 /home/USERNAME/okboard/langs/predict-es.ng
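As an alternative to running the three commands one by one, the step can be done in a loop that first checks that the build really produced the files the phone needs (the three files gzipped here plus predict-es.id). A sketch only; `pack_outputs` is a hypothetical helper name.

```shell
#!/bin/sh
# Verify the build outputs for LANG in DIR, then gzip the three files
# that are shipped compressed. Arguments: DIR LANG
pack_outputs() {
    dir=$1
    lang=$2
    for name in "$lang.tre" "predict-$lang.db" "predict-$lang.ng" "predict-$lang.id"; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            return 1
        fi
    done
    # gzip replaces each file with FILE.gz; predict-LANG.id stays as is.
    for name in "$lang.tre" "predict-$lang.db" "predict-$lang.ng"; do
        gzip -9 "$dir/$name" || return 1
    done
}
```

For Spanish: `pack_outputs /home/USERNAME/okboard/langs es`. Run it instead of the individual gzip commands, not after them, since the uncompressed files are gone once gzip has finished.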

Now connect the computer to the phone via SSH, as I explain here: http://www.linuxleon.org/2015/09/how...jolla-con.html

And follow these instructions to create RPM package directly on your Sailfish phone: http://talk.maemo.org/showthread.php?t=92963

After that, copy the files to the phone (replace PHONE_IP with your phone's IP address; /home/nemo/... is the path on the phone):

Code:

scp /home/USERNAME/okboard/langs/es.tre.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/

Code:

scp /home/USERNAME/okboard/langs/predict-es.db.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/

Code:

scp /home/USERNAME/okboard/langs/predict-es.id nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/

Code:

scp /home/USERNAME/okboard/langs/predict-es.ng.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
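Since scp accepts several source files followed by one destination, the four copies can also be done in a single call. A minimal sketch: `push_to_phone` and the `OKB_SCP` override are hypothetical names, and the destination is assumed to have the usual `user@host:/path/` form for the phone.

```shell
#!/bin/sh
# Copy the four language files to the phone in one scp call.
# Arguments: LANGS_DIR LANG TARGET (e.g. nemo@PHONE_IP:/some/dir/)
# OKB_SCP (hypothetical) replaces the scp command, e.g. for a dry run.
push_to_phone() {
    dir=$1
    lang=$2
    target=$3
    ${OKB_SCP:-scp} \
        "$dir/$lang.tre.gz" \
        "$dir/predict-$lang.db.gz" \
        "$dir/predict-$lang.id" \
        "$dir/predict-$lang.ng.gz" \
        "$target"
}
```

Setting `OKB_SCP=echo` before the call prints the file list and destination without copying anything, which is a cheap way to double-check the paths first.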

Last edited by ferlanero; 2017-01-10 at 10:05.

The Following 6 Users Say Thank You to ferlanero For This Useful Post:
Feathers McGraw, Jordi, juiceme, LameDuck, nodevel, nthn
