Posts: 1,414 | Thanked: 7,547 times | Joined on Aug 2016 @ Estonia
Originally Posted by suicidal_orange
First, good work on this. I'm using it in preference to the paid prediction because I can.

I've made a Colemak keyboard. It doesn't make much sense, given that Colemak is designed for physical keyboards where the edge keys (A and ; on QWERTY) are easy to hit, which isn't the case on a touch screen. But it works, so I may as well put it out there.

Since there is no international version of Colemak and it is set to use English predictions, I'm not sure whether it should be called en-colemak-presage or colemak-presage. Any preference?

The next step will be to research/find a thumb-optimised layout and an en_GB-ised dictionary. Will the script handle language codes that aren't in the usual four-letter form, or should it just be called en_GB?

Great to hear that you are working on it. I am a bit surprised by the rather low adoption of this work and the absence of contributions for other languages/keyboards. But let's see how it progresses in the future.

Three packages are needed for full support.

Package 1: keyboard

It is named keyboard-presage-<YOUR_CODE>-0.1.0-1.noarch.rpm, where <YOUR_CODE> is your own language code, usually of the form en_US (the version info may vary).
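The sketch below makes the naming pattern concrete. It assembles the file name from a language code and checks that the code looks like the usual xx_YY form. The helper name, the regex and the default version/release values are illustrative assumptions. They are not part of the packaging scripts.

```python
import re

# Assumed shape of a locale-style code: two lowercase letters, an underscore,
# two uppercase letters (e.g. en_US, en_GB). Illustration only.
LANG_CODE_RE = re.compile(r"[a-z]{2}_[A-Z]{2}")


def keyboard_rpm_name(lang_code: str, version: str = "0.1.0", release: str = "1") -> str:
    """Hypothetical helper: keyboard-presage-<CODE>-<version>-<release>.noarch.rpm."""
    if not LANG_CODE_RE.fullmatch(lang_code):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return f"keyboard-presage-{lang_code}-{version}-{release}.noarch.rpm"


print(keyboard_rpm_name("en_GB"))  # keyboard-presage-en_GB-0.1.0-1.noarch.rpm
```

A code such as "colemak" is rejected here, which matches the advice to keep the code in the same form as the dictionaries.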

For example, download one of the keyboard RPMs and examine its contents. As you will see, the .conf file of the keyboard definition contains a languageCode. This code has to match the presage and hunspell dictionaries. If you want to go for UK English, use en_GB.
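After unpacking the RPM, you can check this with a sketch like the one below. It reads the layout's .conf file and compares its languageCode values with the code used for the dictionaries. It assumes the .conf file is INI-style, with one section per layout and a languageCode key. The sample content, section name and helper names are hypothetical, so compare against a real RPM before relying on this.

```python
import configparser


def language_codes(conf_text: str) -> dict[str, str]:
    """Return {section: languageCode} for every section that defines one."""
    # interpolation=None leaves any '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key case as written; ConfigParser lowercases option names by default.
    parser.optionxform = str
    parser.read_string(conf_text)
    return {
        section: parser[section]["languageCode"]
        for section in parser.sections()
        if "languageCode" in parser[section]
    }


def mismatched_sections(conf_text: str, expected: str) -> list[str]:
    """List sections whose languageCode differs from the dictionary code."""
    return [s for s, code in language_codes(conf_text).items() if code != expected]


# Hypothetical sample; a real layout .conf may contain other keys as well.
SAMPLE = """
[colemak_presage.qml]
name=Colemak
languageCode=en_GB
"""

print(mismatched_sections(SAMPLE, "en_GB"))  # []
```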

The script probably won't be happy if you try to use other dictionary names. So I would suggest hacking the script and changing the keyboard RPM contents by hand. Please see the README at https://github.com/sailfish-keyboard...utils/keyboard


Package 2: hunspell dict

You will have to provide a hunspell dictionary. One can be downloaded from various sources, but it has to be converted to UTF-8. See the last section of the README at https://github.com/sailfish-keyboard/presage
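A hunspell dictionary is a pair of files: .aff (affix rules) and .dic (word list). The .aff file declares its character encoding with a SET line, for example "SET ISO8859-1". The Python sketch below shows one way to convert the pair: it re-encodes both files and rewrites that line to "SET UTF-8". It is an illustration only, and the README's own procedure takes precedence. Two details are assumptions: the ISO8859-1 fallback when no SET line exists, and that the declared encoding name is one Python's codecs recognise (ISO8859-x and KOI8-R are; other names may not be).

```python
def convert_hunspell_to_utf8(aff_bytes: bytes, dic_bytes: bytes) -> tuple[bytes, bytes]:
    """Re-encode a hunspell .aff/.dic pair as UTF-8 and update the SET line."""
    # The SET line itself is ASCII, so decoding with latin-1 (which accepts
    # any byte) is enough to locate it before the real encoding is known.
    source_enc = "ISO8859-1"  # assumed fallback when no SET line is present
    for line in aff_bytes.decode("latin-1").split("\n"):
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "SET":
            source_enc = fields[1]
            break

    aff_text = aff_bytes.decode(source_enc)
    dic_text = dic_bytes.decode(source_enc)

    # Rewrite (or add) the SET directive so hunspell reads the files as UTF-8.
    lines = aff_text.split("\n")
    found_set = False
    for i, line in enumerate(lines):
        if line.split()[:1] == ["SET"]:
            # Keep a Windows-style "\r" line ending if the file had one.
            lines[i] = "SET UTF-8" + ("\r" if line.endswith("\r") else "")
            found_set = True
    if not found_set:
        lines.insert(0, "SET UTF-8")

    return "\n".join(lines).encode("utf-8"), dic_text.encode("utf-8")
```

In practice you would read both files in binary mode, pass the bytes in, and write the results back out.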


Package 3: presage n-gram

This requires a text corpus to train presage. Note that we are looking for something large: the more text the merrier, ideally with content similar to how people write on a mobile device. You may wish to filter profanity, but that is a rather non-trivial problem.
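A naive starting point for profanity filtering is to drop corpus lines containing any word from a blocklist. The sketch below matches whole tokens rather than substrings, so that innocent words which merely contain a blocked string survive. The blocklist uses a mild placeholder word, and both the tokenizer and the helper names are assumptions. The sketch does not handle misspellings, context or multi-word phrases, which is where the problem stops being trivial.

```python
import re
from typing import Iterable, Iterator

# Simplistic tokenizer: runs of letters, optionally joined by apostrophes.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokens(line: str) -> list[str]:
    """Lowercased word tokens of one corpus line."""
    return [t.lower() for t in TOKEN_RE.findall(line)]


def filter_corpus(lines: Iterable[str], blocklist: set[str]) -> Iterator[str]:
    """Yield only the lines that contain no blocklisted token."""
    blocked = {w.lower() for w in blocklist}
    for line in lines:
        if blocked.isdisjoint(tokens(line)):
            yield line


corpus = ["See you in Hecksburg tonight", "Oh heck, I missed the bus", "Running late, sorry"]
print(list(filter_corpus(corpus, {"heck"})))
# ['See you in Hecksburg tonight', 'Running late, sorry']
```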

For help with generating the n-gram database, see https://github.com/sailfish-keyboard...ased-predictor .
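At its core, the n-gram database records how often each sequence of up to n consecutive words occurs in the corpus. The sketch below only illustrates that counting step, using collections.Counter. It is not presage's own tooling, and the tokenizer, the choice of n = 3 and the function name are assumptions. To build a database that presage can actually load, follow the linked README.

```python
import re
from collections import Counter
from typing import Iterable

TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def count_ngrams(lines: Iterable[str], max_n: int = 3) -> dict[int, Counter]:
    """Return {n: Counter of n-word tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        words = [w.lower() for w in TOKEN_RE.findall(line)]
        # n-grams are counted within each line, never across line breaks.
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[n][tuple(words[i:i + n])] += 1
    return counts


grams = count_ngrams(["see you soon", "see you later"])
print(grams[2][("see", "you")])  # 2
```

Counting per line keeps n-grams from spanning unrelated sentences, which matters for chat-like corpora where each line is a separate message.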

I don't remember whether there is a freely available en_GB corpus, though.

Good luck!
 

The Following 4 Users Say Thank You to rinigus For This Useful Post: