maemo.org - Talk (https://talk.maemo.org/index.php)
-   SailfishOS (https://talk.maemo.org/forumdisplay.php?f=52)
-   -   Advanced text entry on Sailfish (Swype or similar) (https://talk.maemo.org/showthread.php?t=92764)

ferlanero 2016-01-18 00:15

Re: Advanced text entry on Sailfish (Swype or similar)
 
Good news. OKBoard now has Spanish to work with :)

All the problems were related to the amount of RAM: 4 GB isn't enough to process a 3 GB corpus.txt, so I had to reduce it drastically. I have already ordered 16 GB for my PC for next week, so I can improve the accuracy of the Spanish prediction in OKBoard.

Thank you for all your support!! I'll keep you informed about creating an RPM to install the Spanish language for OKBoard :)
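
One simple way to shrink a corpus without loading it into RAM is to stream it and keep only every n-th line. The Python sketch below is only an illustration of that idea; the function name `thin_lines` is made up, and this is not necessarily how the Spanish corpus was actually reduced.

```python
def thin_lines(lines, keep_every):
    """Yield every keep_every-th line so the corpus is never held in memory."""
    for index, line in enumerate(lines):
        if index % keep_every == 0:
            yield line


# Works on any iterable of lines, including an open file object.
sample = ["uno\n", "dos\n", "tres\n", "cuatro\n", "cinco\n"]
reduced = list(thin_lines(sample, 2))
```

Because the function is a generator, it can be fed an open file and written straight to another file, so memory use stays constant regardless of the corpus size.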

mautz 2016-01-19 19:46

Re: Advanced text entry on Sailfish (Swype or similar)
 
I think that a good dictionary is far more important than an extremely large corpus file. Mine is only 200 MB compressed, and with my dictionary the prediction accuracy is nearly perfect.

ssahla 2016-01-19 23:13

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495528)
I think that a good dictionary is far more important than an extremely large corpus file. Mine is only 200 MB compressed, and with my dictionary the prediction accuracy is nearly perfect.

This probably depends on the language? With Finnish, for example, you need to have all the inflected forms of a word (kirjoittaa-kirjoitan-kirjoitat-kirjoitamme-kirjoitatte-kirjoittavat and a few dozen more for the verb "to write", just to have the common ones) and you need to know which one is most probably used in a context...

(I would very much like to have OKBoard working with Finnish, but just haven't had the time to look into building it myself yet... Hoping someone will beat me to it :)

ljo 2016-01-20 01:40

Re: Advanced text entry on Sailfish (Swype or similar)
 
@ssahla, @mautz, yes, a good mix of genres and a basic understanding of the lexicon and the collected corpus data can probably save some time and effort. 100 MB compressed is probably the threshold to aim to get over for the corpus data. Zipf's law will make it hard to find more than a fraction of the inflected forms of morphologically rich languages, so a good frequency dictionary could be beneficial if it is hard to find available corpus data.
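
To illustrate the Zipf's law point: in most text a few words are very frequent while a large share of word forms occurs only once, so rare inflected forms are easily missed. The following Python sketch measures that on a toy sentence. The function name `frequency_profile` and the sample text are invented for illustration and are not part of okb-engine.

```python
from collections import Counter
import re


def frequency_profile(text):
    """Return (rank-ordered word counts, share of word forms seen only once)."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    ranked = counts.most_common()
    singletons = sum(1 for _, n in ranked if n == 1)
    share = singletons / len(ranked) if ranked else 0.0
    return ranked, share


sample = "the cat sat on the mat and the dog sat on the rug"
ranked, share = frequency_profile(sample)
```

On a real corpus the share of single-occurrence forms gives a rough feeling for how much of the vocabulary a frequency dictionary would have to supply.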

mautz 2016-01-20 06:14

Re: Advanced text entry on Sailfish (Swype or similar)
 
I used all the news files from the Leipzig website to build my corpus file. I think the wiki articles use language that is too formal and sometimes too scientific for everyday use. For the dictionary I found a file which contains the 10000 most used words of my language (German in my case), and I dumped my SMS conversations from my Jolla and added them to the dictionary file as well. A good thing to add to the corpus file is some ebooks which use everyday language.

I'm fine-tuning my dictionary and corpus file and will look into building an RPM file to release the German language on openrepos in a few days.

mautz 2016-01-20 12:56

Re: Advanced text entry on Sailfish (Swype or similar)
 
German Language for OKBoard- first try. Feedback welcome.

https://www.dropbox.com/s/qwruts98l9...mv7hl.rpm?dl=0

ferlanero 2016-01-20 13:02

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495625)
German Language for OKBoard- first try. Feedback welcome.

https://www.dropbox.com/s/qwruts98l9...mv7hl.rpm?dl=0

Hi, Mautz! I already have the Spanish files working with OKBoard... so can you please explain here the steps for building the .rpm for distribution?

mautz 2016-01-20 13:40

Re: Advanced text entry on Sailfish (Swype or similar)
 
I used Schturman's guide for NOOBS.
It is not necessary to build the RPMs as root or use the root home directory.

Just modify the spec file for example like this:


Code:

Name:          okboard-spanish
Version:       0.1
Release:       1
Summary:       Spanish language for OKBoard
Group:         System/Tools
Vendor:        YourName/Alias
Distribution:  SailfishOS
Packager:      YourName/Alias <your@mail.address>
License:       GPL

%description
Spanish language files for OKBoard

%files
%defattr(-,root,root,-)
/usr/share/okboard/es.tre.gz
/usr/share/okboard/predict-es.db.gz
/usr/share/okboard/predict-es.ng.gz

%post

%postun
if [ "$1" -eq 0 ]; then
    # $1 is 0 on uninstall: make sure the language files are gone
    rm -f /usr/share/okboard/es.tre.gz
    rm -f /usr/share/okboard/predict-es.db.gz
    rm -f /usr/share/okboard/predict-es.ng.gz
elif [ "$1" -eq 1 ]; then
    # $1 is 1 on upgrade: nothing to clean up
    echo "It's just an upgrade"
fi

%changelog
* Wed Jan 20 2016 YourName <your@mail.address> 0.1
- First build

Gzip the three OKBoard files as described in the Readme.md.

Create the folder
/home/nemo/rpmbuild/BUILD/okboard-spanish-0.1-1.arm/usr/share/okboard

The folder name in the BUILD directory depends on the spec file entries.
<Name>-<Version>-<Release>.arm

Copy the three OKBoard language files into the newly created folder.

Copy your created .spec file to /home/nemo/rpmbuild/SPECS

Build the RPM on your Jolla.
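
The folder naming rule above can be sketched in Python, in case you want to script the preparation. This is only an illustration: the function name `build_dir` is made up, and the default `topdir` is simply the /home/nemo/rpmbuild path from this post.

```python
from pathlib import PurePosixPath


def build_dir(name, version, release, topdir="/home/nemo/rpmbuild"):
    """Folder under BUILD that mirrors where the language files get installed."""
    # <Name>-<Version>-<Release>.arm, taken from the spec file entries
    root = PurePosixPath(topdir) / "BUILD" / f"{name}-{version}-{release}.arm"
    return root / "usr/share/okboard"


target = build_dir("okboard-spanish", "0.1", "1")
print(target)
```

If Name, Version or Release change in the spec file, the folder has to be renamed accordingly.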

LameDuck 2016-01-20 15:27

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495625)
German Language for OKBoard- first try. Feedback welcome.

https://www.dropbox.com/s/qwruts98l9...mv7hl.rpm?dl=0

(Translated from German:) Well, it already works quite well for me. The dictionary doesn't (yet) know some words, a few of them important, but I'm typing this text with OKBoard.
Thanks for uploading!

EDIT:
Sorry for writing in German, I had to try the German dictionary for OKBoard :)

I have only written some SMS and no longer texts yet, so this is my first impression. But I can say it looks very promising, with only two issues that make usage a bit difficult:
  • Some common words, which I would expect in the list of most used words, seem to be missing. E.g. "Danke" (=thank you) isn't recognized (nor is it in the suggestion list).
  • Words with Umlauts are rarely recognised. I guess that this feature is not implemented in OKBoard yet.

Thanks for your work @Eber, @mautz and everybody else!

EDIT2:
To enter words with umlauts you have to swipe over the non-umlauted letter (e.g. Ol for Öl and uber for über),
so there are no problems with umlauts...
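
The rule can be written down as a small Python sketch that maps a word to the plain letters to swipe over. It only illustrates the rule from the user's side and says nothing about how OKBoard handles this internally; the function name `swipe_letters` is made up.

```python
import unicodedata


def swipe_letters(word):
    """Map a word to the plain keys to swipe over (ä -> a, ö -> o, ü -> u)."""
    # NFD splits "ä" into "a" plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


examples = [swipe_letters(w) for w in ("Öl", "über", "Änderung")]
```

Note that this also strips other accents (é becomes e) and leaves ß unchanged, since ß has no decomposition.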

cvp 2016-01-20 17:17

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495625)
German Language for OKBoard- first try. Feedback welcome.

https://www.dropbox.com/s/qwruts98l9...mv7hl.rpm?dl=0

Some basic stuff like "deiner" is not working perfectly, but thank you very much!!!! Now I can use a Swype-like keyboard ^^

now with color smileys from dolphin keyboard and all is good enough :D

velox 2016-01-20 18:37

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495625)
German Language for OKBoard- first try. Feedback welcome.

https://www.dropbox.com/s/qwruts98l9...mv7hl.rpm?dl=0

Great news, thanks!

I tried some words starting with umlauts – they don't seem to be in there or are buggy somehow: "Änderung", "Überraschung", "östlich" and some others I've forgotten since. When I swipe "ärger", it thinks the word starts with an L and suggests words like "lehrt".

The words "bösen", "böse" and "plötzlich" work, though. All in all it's still pretty useful.

For the non-germans:
I present to you a post with some funny words! Enjoy and try to say some of them out loud!

mautz 2016-01-20 18:41

Re: Advanced text entry on Sailfish (Swype or similar)
 
Sorry, I forgot to mention that you have to swipe over a for ä, u for ü and o for ö. I'm going to add some more common words to the dictionary. Thanks for the feedback.

velox 2016-01-20 18:49

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495681)
Sorry, I forgot to mention that you have to swipe over a for ä, u for ü and o for ö. I'm going to add some more common words to the dictionary. Thanks for the feedback.

Awesome, thanks! I guess with a few more words, this should be ready for openrepos.

My take on the corpus selection: I'd really like at least a small Wiki portion in there, because sometimes it's nice to use correct terminology e.g. when writing to colleagues. Wikipedia should be unbeatable for those words, regardless of profession.

My dream would actually be a "hybrid" with de and en, because I frequently have to change between those. But now, at least, I can change and still swipe. Thanks again!

mautz 2016-01-20 21:37

Re: Advanced text entry on Sailfish (Swype or similar)
 
New German language version with a much bigger dictionary:

https://www.dropbox.com/s/7us6j0lg7u...mv7hl.rpm?dl=0

cvp 2016-01-20 22:44

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by mautz (Post 1495717)
New German language version with a much bigger dictionary:

https://www.dropbox.com/s/7us6j0lg7u...mv7hl.rpm?dl=0

Works much better! Thank you !!!!

MoritzJT 2016-01-21 00:46

Re: Advanced text entry on Sailfish (Swype or similar)
 
Is it actually possible to get the colour emoji from Dolphin to be available with OKBoard?

Then I could finally ditch all other keyboards!

mautz 2016-01-21 06:23

Re: Advanced text entry on Sailfish (Swype or similar)
 
The German language files are now downloadable from Openrepos.

https://openrepos.net/content/mautz/...-files-okboard

@velox

I will include some wiki articles in the corpora for the next build.

itdoesntmatt 2016-01-21 17:33

Re: Advanced text entry on Sailfish (Swype or similar)
 
For Eber42: I may have found another bug, apart from the known transparency one with the default Sailfish browser.

http://talk.maemo.org/showthread.php...14#post1495814

GuSec 2016-01-21 17:36

Re: Advanced text entry on Sailfish (Swype or similar)
 
Great work with OKBoard. It's a really appreciated effort to finally get a Swype-like textual entry method on Jolla!

I have some questions though. First, why doesn't swyping over the special characters work? Like å, ä and ö in Swedish and the umlauts in German? Is this a fundamental restriction of maliit? Secondly, both me and my partner are having problems with the keyboard becoming half invisible in the browser fields. It's still usable, but a bit tricky. We both have the newest Sailfish OS.

Thirdly, is there some plan on the dictionary being intelligent by adding words typed? It's a bit troublesome when some words are lacking and don't seem to show up even with frequent manual usage.

And lastly, are there plans on expanding with the N9 swype gestures (or others) for formatting? For example, I remember there being a swype from comma to space (for space after comma) and a method of not adding space after a word (I think it was swype away from keyboard at the end). The last action, not adding a space, is really useful for compound words. As of now it doesn't seem possible to even use backspace and then swype a new one.

These are not complaints, merely feedback. I'm very grateful for your work. As a great fan of the N9, thank you!

mautz 2016-01-21 17:56

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by GuSec (Post 1495816)

Thirdly, is there some plan on the dictionary being intelligent by adding words typed? It's a bit troublesome when some words are lacking and don't seem to show up even with frequent manual usage.

The keyboard has a learning feature. You can enable or disable it in the OKBoard app. New words will be added after using them a few times...

ljo 2016-01-21 23:42

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by GuSec (Post 1495816)
I have some questions though.
1) First, why doesn't swyping over the special characters work? Like å, ä and ö in Swedish and the umlauts in German? Is this a fundamental restriction of maliit?

2) is there some plan on the dictionary being intelligent by adding words typed? It's a bit troublesome when some words are lacking and don't seem to show up even with frequent manual usage.

1) This is due to a modeling restriction in OKBoard: only a-z are supported on the keyboard. They are not special characters, just letters of one or more alphabets, but please remember this is a young project based primarily on Eric's own itches, and he had in mind specific alphabets without these characters as separate letters, as well as common keyboard letter coordinates. We are, though, thinking hard about how to extend the model. Meanwhile, see it as mental training in making curves with fewer keys. ;)
2) It is learning, unless you turned learning off. My experience is that if you type the actual letters for something you curved without a prediction being inserted, the next time you make the curves it gives you the alternative.
Please PM any words you think are not showing up even with frequent manual use, along with your procedure, e.g. only typing and not curving. It is all about curving.

ferlanero 2016-01-22 19:30

Re: Advanced text entry on Sailfish (Swype or similar)
 
Hi boys! At last, OKBoard has full support for the Spanish language :)

https://openrepos.net/content/ferlan...nguage-okboard

And it's very, very responsive ;)

Thank you very much for all your help!

ferlanero 2016-01-22 19:38

Re: Advanced text entry on Sailfish (Swype or similar)
 
Now, due to requests from the Italian people, I want to start the project of Italian support for OKBoard. I already have all the corpus sources, but once I had everything ready for processing, the script showed me this error:

Code:

[ferlanero@ferlanero-XPS okb-engine-master]$ db/build.sh it
Building for languages:  it
~/okb-engine-master/ngrams ~/okb-engine-master/db
running build
running build_ext
running build
running build_ext
~/okb-engine-master/db
~/okb-engine-master/cluster ~/okb-engine-master/db
make: No se hace nada para 'first'.
~/okb-engine-master/db
«/home/ferlanero/okb-engine-master/db/lang-en.cf» -> «/home/ferlanero/okboard/langs/lang-en.cf»
«/home/ferlanero/okb-engine-master/db/lang-fr.cf» -> «/home/ferlanero/okboard/langs/lang-fr.cf»
«/home/ferlanero/okb-engine-master/db/lang-it.cf» -> «/home/ferlanero/okboard/langs/lang-it.cf»
«/home/ferlanero/okb-engine-master/db/lang-nl.cf» -> «/home/ferlanero/okboard/langs/lang-nl.cf»
«/home/ferlanero/okb-engine-master/db/add-words-fr.txt» -> «/home/ferlanero/okboard/langs/add-words-fr.txt»
«/home/ferlanero/okb-engine-master/db/db.version» -> «/home/ferlanero/okboard/langs/db.version»
make: '.depend-it' está actualizado.
( [ -f "add-words-it.txt" ] && cat "add-words-it.txt" ; aspell -l it dump master | aspell -l it expand | tr ' ' '\n') | sort | uniq > it-full.dict
lbzip2 -d < /home/ferlanero/okboard/langs/corpus-it.txt.bz2 | /home/ferlanero/okb-engine-master/db/../tools/corpus-splitter.pl 200 50 it-learn.tmp.bz2 it-test.tmp.bz2
mv -vf it-learn.tmp.bz2 it-learn.txt.bz2
«it-learn.tmp.bz2» -> «it-learn.txt.bz2»
mv -vf it-test.tmp.bz2 it-test.txt.bz2
«it-test.tmp.bz2» -> «it-test.txt.bz2»
set -o pipefail ; lbzip2 -d < it-learn.txt.bz2 | /home/ferlanero/okb-engine-master/db/../tools/import_corpus.py it-full.dict | sort -rn | lbzip2 -9 > grams-it-full.csv.bz2.tmp
/bin/sh: línea 1:  4592 Tubería rota          lbzip2 -d < it-learn.txt.bz2
      4593 Terminado (killed)      | /home/ferlanero/okb-engine-master/db/../tools/import_corpus.py it-full.dict
      4594 Hecho                  | sort -rn
      4595 Hecho                  | lbzip2 -9 > grams-it-full.csv.bz2.tmp
/home/ferlanero/okb-engine-master/db/makefile:43: fallo en las instrucciones para el objetivo 'grams-it-full.csv.bz2'
make: *** [grams-it-full.csv.bz2] Error 137

I think it is a similar problem to the one we had with the "aspell-es" package, but this time with "aspell-it"... because the same process with the same corpus.txt.bz2, but running "db/build.sh es" for Spanish, gives no errors :/

If the problem goes away, I'll be very pleased to provide Italian support for OKBoard.

Thank you very much again for all your support guys ;)
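
A note on the "Error 137" in the log above: shells report a process killed by signal N as exit status 128 + N, so 137 means signal 9 (SIGKILL), which matches the "Terminado (killed)" line for import_corpus.py. On Linux this is often the kernel's out-of-memory killer, although the log alone does not prove it. The Python sketch below just decodes such a status; the function name `describe_exit_status` is made up for illustration.

```python
import signal


def describe_exit_status(status):
    """Explain a shell exit status such as the 137 reported by make."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

If memory is the cause, the kernel log (for example via dmesg) usually mentions the killed process.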

itdoesntmatt 2016-01-22 21:26

Re: Advanced text entry on Sailfish (Swype or similar)
 
ferlanero, thanks a lot!! I hope you will manage to implement the Italian language. If I can help, please tell me. Unfortunately I am not able to solve that problem, since I am not a developer.

spidernik84 2016-01-24 14:01

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by ferlanero (Post 1495937)
Now, due to requests from the Italian people, I want to start the project of Italian support for OKBoard. I already have all the corpus sources, but once I had everything ready for processing, the script showed me this error:


Hello Ferlanero, thanks for all the efforts. I was working as well on the Italian dictionary since I was unaware of this thread. Eric, the author of okboard, pointed me here! Good to know so we don't duplicate the efforts.

I had other issues that prevented me from generating the dict, possibly related to memory errors.

Is it possible for you to get more verbosity by running bash in debug mode? I run the script like this:
bash -x db/build.sh it

Thanks.

BTW: what corpus are you using? I found a bunch of them, the most complete being PAISA http://www.corpusitaliano.it/it/cont...scription.html

But also this http://www.corpora.heliohost.org/download.html

spidernik84 2016-01-24 20:00

Re: Advanced text entry on Sailfish (Swype or similar)
 
Update: I contacted Eric, he's checking out his script. Turns out it crashes on the aspell dump, for me.
He found out this might be caused by the huge size of the Italian dictionary. For comparison, the Italian dictionary has 36 million words, compared to 600k for French.

I hardly know the average ten thousand words; how did we manage to invent 36 million of them...

Anyway, so you know ferlanero :)

ljo 2016-01-24 22:00

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by spidernik84 (Post 1496169)
For comparison, the Italian dictionary has 36 million words, compared to 600k for French.

I hardly know the average ten thousand words; how did we manage to invent 36 million of them...

@spidernik84 et al, this should rather be between 0.7-1.8 million wordforms but not much more based on the 92034 stems (roughly what we count as words) which is about the size of a standard working vocabulary of other latin script languages like french (0.63 million aspell wordforms). So there is something wrong with the assumptions in the expansion processing.

ferlanero 2016-01-25 00:24

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by spidernik84 (Post 1496136)
Hello Ferlanero, thanks for all the efforts. I was working as well on the Italian dictionary since I was unaware of this thread. Eric, the author of okboard, pointed me here! Good to know so we don't duplicate the efforts.

I had other issues that prevented me from generating the dict, possibly related to memory errors.

Is it possible for you to get more verbosity by running bash in debug mode? I run the script like this:
bash -x db/build.sh it

Thanks.

BTW: what corpus are you using? I found a bunch of them, the most complete being PAISA http://www.corpusitaliano.it/it/cont...scription.html

But also this http://www.corpora.heliohost.org/download.html

Hi spidernik84. First of all, thank you very much for adding support for OKBoard :)

For the Italian language I'm using these corpus files:
http://corpora2.informatik.uni-leipzig.de/download.html
http://opus.lingfil.uu.se/OpenSubtitles2016.php

Which are perfect for colloquial language :)

I already have the whole file ready to process, so if you are interested in using it to generate the predictive keyboard for Italian, I can send you the download link. So, if you are working on Italian, I can focus on other languages. Do you agree?

spidernik84 2016-01-25 08:04

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by ferlanero (Post 1496211)
Hi spidernik84. First of all thank you very much for adding support for OKBoard :)

You are welcome, thank you for starting the work.

Quote:

I can send you the download link. So, if you are working on Italian, I can focus on other languages. Do you agree?
Please, yes, so I can compare them and proceed from where you started while you take care of other languages. It's good to distribute the work.
I'm still in contact with Eric, let's see what he finds.

ferlanero 2016-01-25 09:46

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by spidernik84 (Post 1496225)
You are welcome, thank you for starting the work.

Please, yes, so I can compare them and proceed from where you started

Perfect! Then provide me, please, an email or enable private messages to send there the link with my Italian corpus.

Now currently working on Swedish language ;)

ljo 2016-01-25 11:33

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by ferlanero (Post 1496239)
Now currently working on Swedish language ;)

Err, why? I already maintain the Swedish language resources and published them during New Year's weekend.

ferlanero 2016-01-25 11:49

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by ljo (Post 1496251)
Err, why? I already maintain the Swedish language resources and published them during New Year's weekend.

:D Ha, ha! It's true, I didn't realize that! Sorry. Focusing now on Portuguese :)

spidernik84 2016-01-25 20:02

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by ljo (Post 1496196)
@spidernik84 et al, this should rather be between 0.7-1.8 million wordforms but not much more based on the 92034 stems (roughly what we count as words) which is about the size of a standard working vocabulary of other latin script languages like french (0.63 million aspell wordforms). So there is something wrong with the assumptions in the expansion processing.

I think you are right. I just failed another generation attempt (ran out of 20GB of RAM plus 5GB of swap... ).
I did a comparison with the English language, this is what I see:

Code:

nico@hendrix:~/aspell/aspell6-it-2.4-20070901-0$ aspell -l en dump master | aspell -l en expand | wc
 119789  119789 1153336
nico@hendrix:~/aspell/aspell6-it-2.4-20070901-0$ aspell -l it dump master | aspell -l it expand | wc
  95193 36636439 655315062

The number of words generated for the Italian language is INSANE.
You seem to know a lot of this. Have you got any idea of what can be done to keep the dictionary smaller? I've been searching for aspell alternative dictionaries with no luck...

Thanks. I surely hope we don't need to rent a Cray cluster to generate this dict... :)

eber42 2016-01-25 20:06

Re: Advanced text entry on Sailfish (Swype or similar)
 
As discussed with spidernik84, the Italian aspell dictionary contains 34M words (with the affix expansion support that was added for Spanish).
Try this:

Code:

aspell -l it dump master | aspell -l it expand | wc -w

In the current process, aspell is used to filter out misspelled words (because the available texts sometimes contain errors).

Even if we fix the corpus reader script, the keyboard has not been built to work with this volume: my largest language (French) contains ~100k words (of which only 45k are used by the word prediction engine; the others are in "best effort" mode).

From a quick look I see the following causes for the large size:
  • lots of words are repeated with prefixes such as dall', sull'. At the moment my model handles words separated by quotation marks or hyphens as single words, so words with different prefixes are treated as different words. OKBoard roadmap contains an item for managing prefixes and suffixes (explicit ones with punctuation signs, or linked together as in German) as distinct words, but I don't know when (and if) I will work on it.
  • some words are repeated multiple times with weird capitalization: are these different words: Sull'Acclimatatele, sull'Acclimatatele, sull'acclimatatele ? At the moment words with different capitalizations are treated as different words (unless they are at the beginning of a sentence). But the case of words with two different capitalizations is not handled very well.


Spidernik84's text corpus only contains 315k words (only counting those which are also known by aspell), so my short term suggestion is to add an option to provide a (smaller) dictionary instead of using aspell's one or to trust the input corpus to be flawless.

What do you think ?

Edit: ouch, spidernik84 was faster with wc:)
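
The count mentioned above (corpus words that are also known by the dictionary) can be sketched in Python as a set intersection. This is only an illustration and not the actual okb-engine code; the function name `known_corpus_words` and the tiny sample data are invented.

```python
import re

# A word is a run of letters, optionally joined by apostrophes (sull'amaca).
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def known_corpus_words(corpus_lines, dictionary_words):
    """Distinct corpus words that also appear in the dictionary."""
    known = set(dictionary_words)
    found = set()
    for line in corpus_lines:
        for word in WORD.findall(line):
            if word in known:
                found.add(word)
    return found


corpus = ["il gatto è sull'amaca", "il gato dorme"]
dictionary = ["il", "gatto", "dorme", "sull'amaca"]
usable = known_corpus_words(corpus, dictionary)
```

The resulting set is a much smaller word list than the fully expanded dictionary, since it contains only words that actually occur in the corpus.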

ljo 2016-01-25 20:17

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by spidernik84 (Post 1496332)
The number of words generated for the Italian language is INSANE.
You seem to know a lot of this. Have you got any idea of what can be done to keep the dictionary smaller? I've been searching for aspell alternative dictionaries with no luck...

I reduced the size by 3/4 by removing different capitalisations of the same words in the Italian dictionary. It is true some small fraction might actually be different words, but the majority is just lowercase initial letter vs uppercase initial letter differences. Comment out the %-full.dict target in the db/makefile and put the filtered word list content directly in your it-full.dict file (reduce it by axing further parts of it off if needed still).
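
A minimal Python sketch of this kind of filtering, assuming the rule "keep one spelling per word and prefer the all-lowercase one". The exact rule used for the Italian list may have been different, and the function name `drop_capitalised_duplicates` is made up.

```python
def drop_capitalised_duplicates(words):
    """Keep one spelling per case-insensitive word, preferring lowercase."""
    chosen = {}
    for word in words:
        key = word.lower()
        current = chosen.get(key)
        # Prefer the all-lowercase variant; otherwise keep the first one seen.
        if current is None or (word == key and current != key):
            chosen[key] = word
    return sorted(chosen.values())


words = ["Sull'Acclimatatele", "sull'Acclimatatele", "sull'acclimatatele", "Roma"]
filtered = drop_capitalised_duplicates(words)
```

Words that only exist capitalised, such as proper nouns, are kept as they are.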

spidernik84 2016-01-25 20:24

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by eber42 (Post 1496334)
From a quick look I see the following causes for the large size:
  • lots of words are repeated with prefixes such as dall', sull'. At the moment my model handles words separated by quotation marks or hyphens as single words, so words with different prefixes are treated as different words. OKBoard roadmap contains an item for managing prefixes and suffixes (explicit ones with punctuation signs, or linked together as in German) as distinct words, but I don't know when (and if) I will work on it.
  • some words are repeated multiple times with weird capitalization: are these different words: Sull'Acclimatatele, sull'Acclimatatele, sull'acclimatatele ? At the moment words with different capitalizations are treated as different words (unless they are at the beginning of a sentence). But the case of words with two different capitalizations is not handled very well.

Hello Eber!
I never heard those words before :)
I can tell you for sure that the form dall' sull' is surely correct, but a bit too formulaic. Also, those are "articulated prepositions" in front of nouns, hence should be considered on their own. Example:

dall'anima
dall'oceano

The nouns are "anima" and "oceano", while "dall'" is the preposition. That does not justify creating a word for each preposition+word combination!
There are additional rules, naturally: for instance, that form is only used with words starting with vowels...
Good that you are thinking of handling this situation.
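
Treating the preposition and the noun separately could look like the following Python sketch, which splits a token at the apostrophe of an elided prefix. It is a simplification and not part of OKBoard; the names `ELISION` and `split_elision` are invented for illustration.

```python
import re

# Elided prefix ending in an apostrophe, followed by the rest of the word.
ELISION = re.compile(r"^([A-Za-zÀ-ÿ]+')(.+)$")


def split_elision(token):
    """Split "dall'anima" into ["dall'", "anima"]; leave other tokens alone."""
    match = ELISION.match(token)
    return [match.group(1), match.group(2)] if match else [token]


parts = [split_elision(t) for t in ("dall'anima", "dall'oceano", "oceano")]
```

With such a split, the dictionary needs one entry for "dall'" and one per noun instead of one entry per combination.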

As for the capitalization: I would not consider common to have capitalised variants of words. Most words are either capitalised or not, so I'd prioritise lower case words when multiple variants are found.

Quote:


Spidernik84's text corpus only contains 315k words (only counting those which are also known by aspell), so my short term suggestion is to add an option to provide a (smaller) dictionary instead of using aspell's one or to trust the input corpus to be flawless.

What do you think ?

Edit: ouch, spidernik84 was faster with wc:)
We can try to skip aspell just for my language, for sure... I'm afraid of the results though: spelling mistakes are definitely common :D
It's worth a shot, I'll see what happens. Thanks for your help.

ljo 2016-01-25 20:31

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by eber42 (Post 1496334)
1) But the case of words with two different capitalizations is not handled very well.
...
2) so my short term suggestion is to add an option to provide a (smaller) dictionary instead of using aspell's one or to trust the input corpus to be flawless.

What do you think ?

1) It is definitely true. I saw this with the Spanish dictionary too when I did the full corpus.

2) Yes, providing an alternative dictionary is good. Maybe just keep the dict if it's there, and instruct to build clean otherwise? Assuming the corpus is flawless is not too bad either, since people still write a lot of stuff which is not covered by the aspell dictionary.

itdoesntmatt 2016-01-25 20:50

Re: Advanced text entry on Sailfish (Swype or similar)
 
Sorry guys, I don't know how many of you are Italian, but I am.
Dall', sull' and similar forms could just be inserted as single words.
When you write sentences you actually leave a space between the preposition and the following word,
so I think it would be better to have two separate words:
dall/dall' (showing both options when swiping d-a-l-l) and anima, for example.

However, sull'Acclimatatele, for example, doesn't make sense.
Sull' is a preposition that precedes a noun and means over/on/regarding. For example, sull'Oceano means literally "over the ocean".
The ' is inserted just because Oceano starts with a vowel!

Besides, acclimatatele is a very unusual word in common speech. "Acclimatare" means "to get used to some climate condition" (for example, when you are out in the cold winter and come into your home, you spend your first minutes just "getting used" to the warmer condition).

"Acclimatate" is one of the possible conjugations (participio passato) of this verb, referring to feminine plural nouns (a group of women, for example).

"AcclimatateLE" literally means "make them acclimatized".

So I mean, those are words not very frequently used in speech.
Sorry for my bad teaching skills.

spidernik84 2016-01-25 21:01

Re: Advanced text entry on Sailfish (Swype or similar)
 
Quote:

Originally Posted by itdoesntmatt (Post 1496345)
Sorry guys, I don't know how many of you are Italian, but I am.
Dall', sull' and similar forms could just be inserted as single words.
When you write sentences you actually leave a space between the preposition and the following word.

Ciao!
I am pretty confident there should be no space between article and nouns and articulated prepositions and nouns. This is the only input I can give :)

itdoesntmatt 2016-01-25 21:08

Re: Advanced text entry on Sailfish (Swype or similar)
 
Hello to you all, guys :) and many thanks for your commitment!


I know, but I explained myself badly.
When you write a sentence,
for example: il gatto e' sull'Amaca
I swipe it this way: .. I-L..G-A-T-T-O... E'.. S-U-L-L(') ..A-M-A-C-A

It is not comfortable to swipe S-U-L-L-'-A-M-A-C-A,
because we think of them as separate words. Sull'Amaca is treated just like Sul Letto, as two separate words, even if formally you shouldn't leave the space.
Moreover, in common written language (including SMS, chat and so on) it really makes no difference to leave a space between the preposition with ' and the following word.
I don't know how to explain it better; I hope it is understandable.


All times are GMT.