Re: Advanced text entry on Sailfish (Swype or similar)
Quote:
Thanks in advance! |
Re: Advanced text entry on Sailfish (Swype or similar)
https://together.jolla.com/question/...post-id-125196 is the same with German. How and from where did you get the corpus material?
Oops, this seems to be a loop post now. Sorry... |
Re: Advanced text entry on Sailfish (Swype or similar)
Quote:
"Corpus files are compressed plain text files (.txt.bz2) and not tar files. So you can just join different sources before compression. They should contain only sentences separated with dots and/or blank lines, and with proper capitalization." |
Re: Advanced text entry on Sailfish (Swype or similar)
@ferlanero, I can't tell where you are in processing the corpus file, since you say you got the hunspell dicts from GitHub. With OKboard, the hunspell dicts can only be used as word lists.
So please clarify: by "I have already generated the Spanish dictionaries", do you mean that you have only copied the hunspell ones, or that you have actually processed a corpus of Spanish texts according to the README.md? |
Re: Advanced text entry on Sailfish (Swype or similar)
@ljo. Swedish works very nicely, even though the åäö keys can't be used. Thanks a lot :)
|
Re: Advanced text entry on Sailfish (Swype or similar)
Quote:
If someone here could write a step-by-step guide on adding support for more languages to this keyboard, I could do it without any problem :) Thank you very much!! |
Re: Advanced text entry on Sailfish (Swype or similar)
Can someone explain how I set up the WORK_DIR and CORPUS_DIR environment variables?
|
Re: Advanced text entry on Sailfish (Swype or similar)
Quote:
- You need a Linux environment (I'm using Arch Linux, but Ubuntu or another distribution works too).
- Download the tarball first: http://git.tuxfamily.org/okboard/okb...master.tar.bz2 and uncompress it in your home directory (/home/USERNAME).
- You need the dictionaries. I took them from https://github.com/titoBouzout/Dictionaries, but they need to be adjusted, so I attach the already processed file (see Spanish.dic.txt.zip on this post).
- You need the corpora files for your language (e.g. Spanish):
http://corpora2.informatik.uni-leipzig.de/download.html
http://www.cs.upc.edu/~nlp/wikicorpus/
http://opus.lingfil.uu.se/OpenSubtitles2016.php
http://www.lllf.uam.es/ESP/Corlec.html
https://tatoeba.org/spa/downloads
Keep this tip in mind when making your corpora files:
Corpora file <= 4 GB for computers with 16 GB RAM
Corpora file <= 1.5 GB for computers with 8 GB RAM
- You need the "aspell-es" package (in the case of Spanish) installed from your distro's repos.
- You need the "lbzip2" package installed on your system too.
- You need "rsync" installed on your system.
- You need "Qt5" installed on your system.
- You need "python3-dev" installed on your system.
- Now create a folder somewhere and put the dictionary inside (e.g. /home/USERNAME/okboard/langs).
- If you have several corpora files, then:
Code:
cat file1 file2 file3 file4 file5 > corpus-es.txt
- And set the two environment variables:
Code:
export CORPUS_DIR=/home/USERNAME/okboard/langs
Code:
export WORK_DIR=/home/USERNAME/okboard/langs
You can check that each variable is set with:
Code:
echo $VARIABLE_NAME
- Compress the file (corpus-es.txt) that you put in /home/USERNAME/okboard/langs earlier:
Code:
bzip2 /home/USERNAME/okboard/langs/corpus-es.txt
- There should be a single file inside.
- Next, in the same terminal window, move into the okboard files, in our case "/home/USERNAME/okb-engine-master/". This is where okboard's source code is.
Code:
cd /home/USERNAME/okb-engine-master/
- And leave only ASCII characters in those files:
Code:
lbzip2 -d < /home/USERNAME/okboard/langs/corpus-es.txt.bz2 | ./tools/clean_corpus.py | lbzip2 > /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2
- Move the original corpus out of the langs folder and put the cleaned one in its place:
Code:
mv /home/USERNAME/okboard/langs/corpus-es.txt.bz2 /home/USERNAME/okboard
Code:
mv /home/USERNAME/okboard/langs/clean_corpus-es.txt.bz2 /home/USERNAME/okboard/langs/corpus-es.txt.bz2
- Then run the build script for Spanish:
Code:
db/build.sh es
- After this, the script creates the dictionaries for OKBoard in /home/USERNAME/okboard/langs/:
add-words-es.txt
affixes-es.txt
clusters-es.log
clusters-es.txt
corpus-es.txt.bz2
db.version
es-full.dict
es-full.tre
es-learn.txt.bz2
es-predict.dict
es-test.txt.bz2
es.tre
grams-es-full.csv.bz2
grams-es-learn.csv.bz2
grams-es-test.csv.bz2
lang-es.cf
ngrams-es.rpt
predict-es.db
predict-es.id
predict-es.ng
predict-es.rpt.bz2
predict-es.txt.bz2
tmp-words-es.txt
- So now the Spanish dictionary is created. Next, compress with gzip the files OKBoard will use to swype our texts:
Code:
gzip -9 /home/USERNAME/okboard/langs/es.tre
Code:
gzip -9 /home/USERNAME/okboard/langs/predict-es.db
Code:
gzip -9 /home/USERNAME/okboard/langs/predict-es.ng
- Then follow these instructions to create the RPM package directly on your Sailfish phone: http://talk.maemo.org/showthread.php?t=92963
- After that, place the .gz files onto the phone:
Code:
scp /home/USERNAME/okboard/langs/es.tre.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.db.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.id nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Code:
scp /home/USERNAME/okboard/langs/predict-es.ng.gz nemo@PHONE_IP:/home/nemo/rpmbuild/BUILD/okboard-spanish-x.x-1.arm/
Replace PHONE_IP with the address of your phone. |
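Before starting a long build, it may help to check the prerequisites from the guide in one go. The following is only a sketch: the helper names max_corpus_mb and check_tools are hypothetical, and treating the two RAM figures from the tip (16 GB and 8 GB) as thresholds is an assumption, since the guide gives only those two data points. Qt5 and python3-dev are packages, not commands, so they are not covered by the PATH check.

```shell
#!/bin/sh
# Pre-flight sketch. max_corpus_mb and check_tools are hypothetical helper names.

# Rule of thumb from the guide: <= 4 GB corpus for 16 GB RAM, <= 1.5 GB for 8 GB RAM.
# Prints a suggested maximum corpus size in MB, or 0 when the guide gives no figure.
# Assumption: the two figures are used as thresholds for "at least this much RAM".
max_corpus_mb() {
    ram_gb=$1
    if [ "$ram_gb" -ge 16 ]; then
        echo 4096
    elif [ "$ram_gb" -ge 8 ]; then
        echo 1536
    else
        echo 0
    fi
}

# Prints the names of the given command-line tools that are not found in PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "${missing# }"
}

# Both variables must be exported in the same shell that later runs db/build.sh.
for var in CORPUS_DIR WORK_DIR; do
    eval "value=\${$var:-}"
    [ -n "$value" ] || echo "warning: $var is not set"
done

echo "missing tools: $(check_tools lbzip2 rsync python3 aspell)"
echo "suggested max corpus for 16 GB RAM: $(max_corpus_mb 16) MB"
```

Exported variables only exist in the current shell and its child processes, which is why the guide insists on staying in the same terminal window.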