|
|
2016-01-07
, 20:58
|
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#172
|
| The Following 3 Users Say Thank You to ljo For This Useful Post: | ||
|
|
2016-01-07
, 21:16
|
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#173
|
~/okb-engine/db $ sh build.sh de
build.sh: 5: set: Illegal option -o pipefail
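The `Illegal option -o pipefail` message suggests that `sh` on this machine is a shell without `pipefail` support (dash, the default `sh` on Debian-like systems, has historically behaved this way), while `build.sh` relies on a bash feature. A minimal sketch to check this, assuming `bash` is installed; `supports_pipefail` is a hypothetical helper name, not part of okb-engine:

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed if the given shell accepts `set -o pipefail`.
supports_pipefail() {
    "$1" -c 'set -o pipefail' >/dev/null 2>&1
}

# Report the result for the two shells involved in the error above.
for shell in sh bash; do
    if supports_pipefail "$shell"; then
        echo "$shell: pipefail supported"
    else
        echo "$shell: pipefail NOT supported"
    fi
done
```

If `sh` turns out not to support it, running the script with `bash build.sh de`, or directly as `db/build.sh de`, should avoid the error.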
|
|
2016-01-07
, 22:41
|
|
Posts: 105 |
Thanked: 205 times |
Joined on Dec 2015
@ Spain
|
#174
|
@ferlanero, it is better to point people to http://git.tuxfamily.org/okboard/okb...tree/README.md directly. As I said before, you cannot use a dictionary as a corpus. It does not work: the corpus has to be running text made of actual sentences, and a lot of them, as explained in the README.md.
|
|
2016-01-07
, 22:52
|
|
Posts: 89 |
Thanked: 243 times |
Joined on Jun 2014
|
#175
|
For example, I don't know where to look for the sentences you speak about, so maybe someone could give me some clue about it...
To train the model you need to feed it with a huge volume of text. The text should be representative of the kind of text you will type.
For example, if you use a Wikipedia corpus, the keyboard will be very uncooperative when you try to type informal text that would look unnatural in a Wikipedia article.
Building language files is not just a matter of pouring random text into the build tool: if you do that, you will end up with a high error rate.
I recommend using a lot of text (my French corpus is over 40 million words, and in some cases this is not enough), and using different kinds of documents: articles (news / Wikipedia), e-mail, IRC and chat logs ...
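To get a feeling for whether a corpus is anywhere near that size, you can count its words before building. This is only a sketch: the file name `corpus-es.txt.bz2` is taken from the build log in this thread, while the helper names, the `CORPUS` and `TARGET_WORDS` variables and the use of the 40-million figure as a target are assumptions for illustration, not a rule from the README.

```bash
#!/usr/bin/env bash

# Count whitespace-separated words read from standard input.
count_words() {
    wc -w | tr -d '[:space:]'
}

# Succeed if a word count ($1) reaches the target ($2); both are integers.
corpus_is_large_enough() {
    [ "$1" -ge "$2" ]
}

# Assumed defaults: corpus path and target size are illustrative only.
corpus=${CORPUS:-corpus-es.txt.bz2}
target=${TARGET_WORDS:-40000000}

if [ -f "$corpus" ]; then
    # bzip2 is used here; the okb-engine build itself uses lbzip2.
    words=$(bzip2 -dc "$corpus" | count_words)
    if corpus_is_large_enough "$words" "$target"; then
        echo "$corpus: $words words (at or above $target)"
    else
        echo "$corpus: $words words (below $target)"
    fi
fi
```

A word count says nothing about variety, so it does not replace mixing articles, e-mail and chat logs as recommended above.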
| The Following 2 Users Say Thank You to ssahla For This Useful Post: | ||
|
|
2016-01-07
, 23:04
|
|
Posts: 105 |
Thanked: 205 times |
Joined on Dec 2015
@ Spain
|
#176
|
|
|
2016-01-07
, 23:12
|
|
Posts: 105 |
Thanked: 205 times |
Joined on Dec 2015
@ Spain
|
#177
|
### How to distribute language files
You have several options for distributing language files (`$LANG.tre`, `predict-$LANG.ng`, `predict-$LANG.db`):
* Just copy them to any Jolla device in `~/.local/share/okboard/`. When you switch language on the keyboard, the new files will be available. No need to restart the keyboard.
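A quick way to confirm that all three files for a language are really present in that directory is sketched below. The file names and the directory come from the README text above; the function names and the `OKBOARD_DIR` variable are hypothetical.

```bash
#!/usr/bin/env bash

# Print the three file names expected for a language code such as "es".
lang_files() {
    printf '%s\n' "$1.tre" "predict-$1.ng" "predict-$1.db"
}

# Print each expected file that is missing from directory $1 for language $2.
# Returns 0 when nothing is missing, 1 otherwise.
missing_lang_files() {
    local dir=$1 lang=$2 status=0 name
    for name in $(lang_files "$lang"); do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Assumed default location, as given in the README excerpt.
okboard_dir=${OKBOARD_DIR:-$HOME/.local/share/okboard}
if [ -d "$okboard_dir" ]; then
    missing_lang_files "$okboard_dir" es || echo "some es files are missing"
fi
```

Running it once right after copying the files and once after switching language on the keyboard would show whether any file has disappeared in between.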
|
|
2016-01-08
, 00:36
|
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#178
|
And another error.
When I activate OKBoard, the program automatically deletes the language files (`$LANG.tre`, `predict-$LANG.ng`, `predict-$LANG.db`), in my case `es.tre`, `predict-es.ng` and `predict-es.db`, from `~/.local/share/okboard/`.
So maybe the README.md needs more support...
| The Following 2 Users Say Thank You to ljo For This Useful Post: | ||
|
|
2016-01-08
, 02:20
|
|
Posts: 105 |
Thanked: 205 times |
Joined on Dec 2015
@ Spain
|
#179
|
[ferlanero@ferlanero-imac okb-engine-master]$ db/build.sh es
Building for languages: es
~/okb-engine-master/ngrams ~/okb-engine-master/db
running build
running build_ext
running build
running build_ext
~/okb-engine-master/db
~/okb-engine-master/cluster ~/okb-engine-master/db
make: Nothing to be done for 'first'.
~/okb-engine-master/db
«/home/ferlanero/okb-engine-master/db/lang-en.cf» -> «/home/ferlanero/okboard/langs/lang-en.cf»
«/home/ferlanero/okb-engine-master/db/lang-es.cf» -> «/home/ferlanero/okboard/langs/lang-es.cf»
«/home/ferlanero/okb-engine-master/db/lang-fr.cf» -> «/home/ferlanero/okboard/langs/lang-fr.cf»
«/home/ferlanero/okb-engine-master/db/lang-nl.cf» -> «/home/ferlanero/okboard/langs/lang-nl.cf»
«/home/ferlanero/okb-engine-master/db/add-words-fr.txt» -> «/home/ferlanero/okboard/langs/add-words-fr.txt»
«/home/ferlanero/okb-engine-master/db/db.version» -> «/home/ferlanero/okboard/langs/db.version»
make: '.depend-es' is up to date.
( [ -f "add-words-es.txt" ] && cat "add-words-es.txt" ; aspell -l es dump master ) | sort | uniq > es-full.dict
lbzip2 -d < /home/ferlanero/okboard/langs/corpus-es.txt.bz2 | /home/ferlanero/okb-engine-master/db/../tools/corpus-splitter.pl 200 50 es-learn.tmp.bz2 es-test.tmp.bz2
mv -vf es-learn.tmp.bz2 es-learn.txt.bz2
«es-learn.tmp.bz2» -> «es-learn.txt.bz2»
mv -vf es-test.tmp.bz2 es-test.txt.bz2
«es-test.tmp.bz2» -> «es-test.txt.bz2»
set -o pipefail ; lbzip2 -d < es-learn.txt.bz2 | /home/ferlanero/okb-engine-master/db/../tools/import_corpus.py es-full.dict | sort -rn | lbzip2 -9 > grams-es-full.csv.bz2.tmp
mv -f grams-es-full.csv.bz2.tmp grams-es-full.csv.bz2
set -o pipefail ; lbzip2 -d < grams-es-full.csv.bz2 | grep ';#NA;#NA;' | cut -f '1,4' -d';' \
| grep -v '#TOTAL' | sort -rn | cut -d';' -f 2 | egrep -v '^(i)$' | tee words-es.txt \
| sed -n "1,30000 p" > es-predict.dict.tmp # ok i've re-implemented "head" with sed to avoid ugly sigpipes (which hurt with -o pipefail)
mv -f es-predict.dict.tmp es-predict.dict
set -o pipefail ; lbzip2 -d < es-learn.txt.bz2 | /home/ferlanero/okb-engine-master/db/../tools/import_corpus.py es-predict.dict | lbzip2 -9 > grams-es-learn.csv.bz2.tmp
/home/ferlanero/okb-engine-master/db/../tools/loadkb.py es-full.tre < es-full.dict
set -o pipefail ; lbzip2 -d < es-test.txt.bz2 | /home/ferlanero/okb-engine-master/db/../tools/import_corpus.py es-predict.dict | lbzip2 -9 > grams-es-test.csv.bz2.tmp
mv -f grams-es-learn.csv.bz2.tmp grams-es-learn.csv.bz2
Computing clusters for language es. Please make some coffee ...
(logs can be found in clusters-es.log)
set -o pipefail ; lbzip2 -d < grams-es-learn.csv.bz2 | sort -rn | sed -n "1,13500000 p" \
| /home/ferlanero/okb-engine-master/db/../tools/cluster -n 10 -o clusters-es.tmp > clusters-es.log 2>&1
mv -f clusters-es.tmp clusters-es.txt
mv -f grams-es-test.csv.bz2.tmp grams-es-test.csv.bz2
1000
set -o pipefail ; lbzip2 -d < grams-es-learn.csv.bz2 \
| /home/ferlanero/okb-engine-master/db/../tools/clusterize.py -l 8 -w 200000 -c 500000 clusters-es.txt \
| tee predict-es.txt \
| /home/ferlanero/okb-engine-master/db/../tools/load_cdb_fslm.py predict-es-tmp.db
Import CSV corpus data ...
Dumping compressed ngram file ...
Dumping words to database ...
2000
lbzip2 -9fv predict-es.txt
lbzip2: compressing "predict-es.txt" to "predict-es.txt.bz2"
lbzip2: "predict-es.txt": compression ratio is 1:2.274, space savings is 56.02%
/home/ferlanero/okb-engine-master/db/../tools/db_param.py predict-es-tmp.db version 11
lbzip2 -9f predict-es-tmp.rpt
mv -f predict-es-tmp.db predict-es.db
mv -f predict-es-tmp.ng predict-es.ng
mv -f predict-es-tmp.rpt.bz2 predict-es.rpt.bz2
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
21000
22000
23000
24000
25000
26000
27000
28000
29000
30000
31000
32000
33000
34000
35000
36000
37000
38000
39000
40000
41000
42000
43000
44000
45000
46000
47000
48000
49000
50000
51000
52000
53000
54000
55000
56000
/home/ferlanero/okb-engine-master/db/../tools/loadkb.py es.tre < words-es.txt # all word seens in learn corpus (smaller than full directory, but bigger than prediction learning dictionary)
OK es
sending incremental file list
es-full.tre
es.tre
predict-es.db
predict-es.ng
predict-es.rpt.bz2
sent 2,423,995 bytes received 111 bytes 4,848,212.00 bytes/sec
total size is 2,423,052 speedup is 1.00
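One detail in the log above is the comment about re-implementing `head` with `sed -n "1,30000 p"` to avoid SIGPIPE under `-o pipefail`. The sketch below illustrates that effect with `seq` standing in for the real pipeline; it is a demonstration under that assumption, not code from okb-engine, and it needs bash because of `pipefail`.

```bash
#!/usr/bin/env bash

# head exits after 3 lines, so the producer is cut off while still writing
# (typically by SIGPIPE) and pipefail reports the pipeline as failed.
with_head() (
    set -o pipefail
    seq 1 1000000 | head -n 3 >/dev/null
)

# sed -n '1,3 p' prints the same lines but keeps reading to the end of its
# input, so the producer finishes normally and the pipeline succeeds.
with_sed() (
    set -o pipefail
    seq 1 1000000 | sed -n '1,3 p' >/dev/null
)

with_head && echo "head pipeline: ok" || echo "head pipeline: failed with status $?"
with_sed && echo "sed pipeline: ok" || echo "sed pipeline: failed with status $?"
```

The cost of the `sed` form is that it reads the whole input even though only the first lines are kept, which seems to be the trade-off accepted in the build rules.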
|
|
2016-01-08
, 07:17
|
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#180
|
OK, thanks for the info. Now I can see how OKBoard works with the corpus files. But my question remains: why does OKBoard delete the `es.tre`, `predict-es.ng` and `predict-es.db` files that I copy into `~/.local/share/okboard/` on Sailfish OS to test whether they work, when the build itself (`db/build.sh es`) does not report any errors?
Here is my log:
| The Following User Says Thank You to ljo For This Useful Post: | ||
~/okb-engine/db $ sh build.sh de
build.sh: 5: set: Illegal option -o pipefail
PS: If I try to check this with:
echo $VARIABLE_NAME
I get no output.
Last edited by cvp; 2016-01-07 at 20:19.