[Announcement]Open source text prediction input plugin
Hello all,
We have started a new sailfishos_keyboard development team at https://github.com/sailfish-keyboard. The team is happy to announce a new version of the presage-based alternative text prediction solution. Currently we offer the following keyboards for download at OpenRepos:
https://openrepos.net/content/sailfi...ext-prediction
https://openrepos.net/content/sailfi...nput-predictor
https://openrepos.net/content/sailfi...ext-prediction
https://openrepos.net/content/sailfi...nput-predictor
If you are using a community-supported language pack or a ported device, you should definitely try it out. If you cannot find your language in the supported list, please read this documentation on how to add support for your language:
* https://github.com/sailfish-keyboard...ased-predictor
* https://github.com/sailfish-keyboard...utils/keyboard
If you have problems with packaging or distribution at OpenRepos, let us know (preferably in the form of a GitHub issue); we are more than happy to help.
In this release we have mainly focused on performance improvements. We are satisfied with the results, which come from moving the internal database format from SQLite to MarisaDb and from handling predictions asynchronously. Besides the performance improvements, we have implemented the following:
* Added a Hunspell-based prediction engine: if you mistype a word, the predictor will offer the corrected version among the predictions. (The plugin does not do any auto-correction; it just predicts the corrected word.)
* Added the ability to forget learned words: presage learns what you type in order to make more accurate predictions. Now you can delete mistyped words that have been learned from your text input by long-tapping them in the predictions. (The forget feature does not work on words coming from the preinstalled database, but that is on our TODO list.)
* We have renamed the plugin to make the naming more consistent, got rid of the external library dependencies by linking them statically, and removed some unnecessary configuration packages.
**Important message to users of the former revision**
Because of the structural changes, please follow the instructions below before installing the new plugin:
* Deselect the presage keyboards in Settings/Text input/Keyboards
* Uninstall all presage packages on the command line (pkcon or zypper). Use zypper se presage to see which packages are installed
* Refresh the repositories (pkcon refresh or zypper ref)
* Reboot
* Enable the sailfish_keyboard repository at OpenRepos
* Install the keyboard(s) of interest
Please report back if you have any issues or problems!
Wishing you happy Sailing, @ljo, @martonmiklos, and @rinigus |
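The command-line part of the upgrade instructions above can be sketched as follows. This is only an illustration, not an official script: the helper name `print_upgrade_steps` is made up, it prints the commands instead of running them, and the package names passed to it are placeholders for whatever `zypper se presage` reports as installed on your device.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does not execute) the command-line steps.
# Arguments: the installed presage package names, as listed by `zypper se presage`.
print_upgrade_steps() {
    for pkg in "$@"; do
        # pkcon alternative: pkcon remove "$pkg"
        echo "zypper rm $pkg"
    done
    # pkcon alternative: pkcon refresh
    echo "zypper ref"
    echo "# now reboot the device"
}

# Placeholder package name, used only to show the output format.
print_upgrade_steps example-presage-package
```

Deselecting the keyboards beforehand and enabling the repository afterwards are done in the Settings and OpenRepos user interfaces, so they are not part of the sketch.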
Re: [Announcement]Open source text prediction input plugin
Working great. This now completes the SFOS port I am using.
A suggestion/request/expected behaviour: if I place the cursor on a word that is already typed and decide to replace it with one of the words suggested by presage, the new word gets inserted where the cursor is rather than replacing the existing word. I recall that the behaviour was the same on the Jolla C with xt9. Could that be fixed? The expected/desired behaviour is that when the cursor is placed on an already typed word and one of the suggested words is selected, the existing word gets replaced completely with the suggested word. |
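The requested behaviour can be illustrated with a small sketch. It is not taken from the plugin: `replace_word_at_cursor` is a hypothetical helper, words are assumed to be delimited by spaces or tabs, and the cursor is assumed to be a 0-based character index into the text.

```shell
# Hypothetical helper: replace the whole word under the cursor with the suggestion.
# Usage: replace_word_at_cursor TEXT CURSOR SUGGESTION
replace_word_at_cursor() {
    TEXT="$1" CURSOR="$2" SUGGESTION="$3" awk 'BEGIN {
        text = ENVIRON["TEXT"]
        cur  = ENVIRON["CURSOR"] + 0
        sug  = ENVIRON["SUGGESTION"]
        n = length(text)
        start = cur
        stop  = cur
        # awk strings are 1-based: the character before index i is substr(text, i, 1).
        while (start > 0 && substr(text, start, 1) !~ /[ \t]/) start--
        while (stop < n && substr(text, stop + 1, 1) !~ /[ \t]/) stop++
        # Keep everything before and after the word, swap the word itself.
        print substr(text, 1, start) sug substr(text, stop + 1)
    }'
}

replace_word_at_cursor "I like tset words" 8 "test"   # prints: I like test words
```

If the cursor sits on whitespace, the two loops do not move and the suggestion is simply inserted at the cursor, which matches the current behaviour described in the post.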
Re: [Announcement]Open source text prediction input plugin
I have packaged a Russian keyboard using the n-grams distributed at http://ruscorpora.ru/corpora-freq.html . In addition, the Hunspell dictionary was converted to UTF-8, as needed. Not sure whether these are the best n-grams available, but give it a try. If someone finds a better frequency distribution, please feel free to suggest improvements.
I think Russian would heavily benefit from proper Unicode support in presage. Right now, Presage doesn't know how to normalize Russian letters (lowercase or otherwise). So, we would love to get an enthusiastic ICU specialist who would have time to work on Unicode support of Presage. I'll be happy to help with the database parts and normalization, but have rather limited time to do it all. In addition, my primary languages work quite well already :) |
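The UTF-8 conversion of a Hunspell dictionary mentioned above can be sketched like this. It is an assumption-laden example, not the procedure actually used for the Russian package: the function name and file names are hypothetical, and the source encoding has to be supplied by you. The one fixed point is that a Hunspell .aff file declares its encoding in a `SET` line, which needs to be updated after the conversion.

```shell
# Sketch: convert a Hunspell .aff/.dic pair to UTF-8 with iconv.
# Usage: hunspell_to_utf8 SRC_ENCODING SRC_BASENAME DST_BASENAME
#   e.g. hunspell_to_utf8 KOI8-R ru_RU ru_RU.utf8   (file names are placeholders)
hunspell_to_utf8() {
    src_enc=$1
    src=$2
    dst=$3
    # The word list only needs re-encoding.
    iconv -f "$src_enc" -t UTF-8 "$src.dic" > "$dst.dic" || return 1
    # The affix file also names its encoding in a "SET ..." line; rewrite that line.
    iconv -f "$src_enc" -t UTF-8 "$src.aff" | sed 's/^SET .*/SET UTF-8/' > "$dst.aff"
}
```

Writing to a separate output basename keeps the original files untouched, so a wrong guess at the source encoding can be retried.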
Re: [Announcement]Open source text prediction input plugin
First, good work on this - I'm using it in preference to the paid prediction because I can.
I've made a keyboard for Colemak, which doesn't make much sense given that it's designed for keyboards where the edge keys (A and ; on QWERTY) are easy to hit, which isn't the case on a screen, but it works, so I may as well put it out there. As there is no international version of Colemak and it's set to use English predictions, I'm not sure whether it should be called en-colemak-presage or colemak-presage - any preference? The next step will be to research/find a thumb-optimised layout and en_GB-ise the dictionary - will the script handle language codes that are not four characters, or should it just be called en_GB? |
Re: [Announcement]Open source text prediction input plugin
There are 3 packages that are needed for full support.
Package 1: keyboard
This is named keyboard-presage-<YOUR_OWN_CODE, usually in the form en_US>-0.1.0-1.noarch.rpm (version info may vary). For an example, download and examine the contents of one of the keyboard RPMs. As you will see, it has a languageCode in the .conf file of the keyboard definition. This code has to match the presage and hunspell dictionaries. If you want to go for UK, use en_GB. The script will probably not be too happy about you trying to use other dictionary names, so I would suggest hacking the script and changing the keyboard RPM content by hand. Please see the README at https://github.com/sailfish-keyboard...utils/keyboard
Package 2: hunspell dict
You will have to provide a hunspell dictionary. This can be downloaded from somewhere and will have to be converted to UTF-8. See the last section of the README at https://github.com/sailfish-keyboard/presage
Package 3: presage n-gram
This requires a text corpus to teach presage. Note that we are looking for something large: the more text the merrier, with context similar to mobile use. You may wish to filter profanity, but that is a rather non-trivial problem. For help on generating the n-gram database, see https://github.com/sailfish-keyboard...ased-predictor . I don't remember whether there was a freely available en_GB corpus, though. Good luck! |
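As a small illustration of the naming scheme for package 1, the sketch below builds the RPM file name from a language code. The helper `keyboard_rpm_name` is hypothetical, and the check that the code looks like ll_CC is an assumption based on the "usually in form en_US" remark, not a rule enforced by the real packaging script.

```shell
# Hypothetical helper: build the keyboard RPM file name for a language code.
# Usage: keyboard_rpm_name CODE [VERSION-RELEASE]
keyboard_rpm_name() {
    code=$1
    version=${2:-0.1.0-1}   # version quoted in the post; it may vary
    # Assumed shape of the code: two lowercase letters, underscore, two uppercase letters.
    case $code in
        [a-z][a-z]_[A-Z][A-Z]) ;;
        *) echo "unexpected language code: $code" >&2; return 1 ;;
    esac
    echo "keyboard-presage-$code-$version.noarch.rpm"
}

keyboard_rpm_name en_GB   # prints: keyboard-presage-en_GB-0.1.0-1.noarch.rpm
```

The same code string would then have to appear as the languageCode in the keyboard .conf file and in the names of the presage and hunspell dictionaries.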
Re: [Announcement]Open source text prediction input plugin
Do we get an option to read all the First Names and Details (Business names) from the People app, and include them in the predicted words?
|
Re: [Announcement]Open source text prediction input plugin
As rinigus suggested the script for packaging keyboards wasn't very happy with what I'm trying to do, so I had to modify it. In doing so I found the .spec needed changing too, but that made it inflexible...
My solution (which I'm sure is far from best practice) is to write the .spec file from the script, with a modified name and description based on an optional 6th argument, resulting in a file called "keyboard-presage-colemak-en_US-1.0.0-1.noarch.rpm". Not sure anyone's interested in alternate layouts, but if you are, here it is. Criticism welcome, I want to learn :) Code:
#!/bin/bash |
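The posted script is cut off in this copy of the thread, so the following is only a guess at the general idea: generating the .spec header from the script, with an optional layout argument changing the package name and summary. The function `write_spec` and its three arguments are hypothetical (the real script's first five arguments are not shown), and the output is a fragment of header fields, not a complete spec file.

```shell
# Sketch: print .spec header fields, with an optional layout name.
# Usage: write_spec LANG_CODE VERSION [LAYOUT]
write_spec() {
    lang=$1
    version=$2
    layout=${3:-}                                  # optional, e.g. "colemak"
    name="keyboard-presage-${layout:+$layout-}$lang"
    summary="Presage keyboard for $lang"
    if [ -n "$layout" ]; then
        summary="$summary ($layout layout)"
    fi
    cat <<EOF
Name: $name
Version: $version
Release: 1
Summary: $summary
BuildArch: noarch
EOF
}

write_spec en_US 1.0.0 colemak
```

With these values the name, version, release, and architecture fields combine into the file name quoted in the post; leaving out the third argument gives the plain keyboard-presage-en_US name.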
Re: [Announcement]Open source text prediction input plugin
I installed https://openrepos.net/content/sailfi...nput-predictor on my X Compact (using the official patched image from the Xperia X) and it is working like a charm. However, Swedish is my second language. Can anyone help make a layout for Finnish?
I have found a database of Finnish words in UTF-8 format on GitHub: Also, the layout is the same for Finnish and Swedish. Is it possible to just change the database? |
Re: [Announcement]Open source text prediction input plugin
You may need to contact some language institute to get such a text body. For Estonian, I managed to get a large text corpus, about 1900GB of text. But a smaller text would probably give a decent result too. Please look into it; it would be great to extend the support to Finnish. |
Re: [Announcement]Open source text prediction input plugin
EDIT: Here is more information about the data. It is in VRT file format:
|
Re: [Announcement]Open source text prediction input plugin
No problem, happy to help. |
Re: [Announcement]Open source text prediction input plugin
From a brief reading, it looks like VRT is not the text itself but a processed list of tokens. For training, we need either plain text or data already processed into n-grams (the latter was used for Russian). But there should be a text corpus behind these processed files as well.
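If only the token lists are available, plain text can be roughly rebuilt from them. The sketch below assumes the common VRT layout: one token per line, the word form in the first tab-separated column, markup on lines starting with `<`, and sentences closed by `</sentence>`. Those details should be checked against the actual files, and `vrt_to_text` is a made-up name.

```shell
# Sketch: rebuild one line of plain text per sentence from VRT on stdin.
# Usage: vrt_to_text < corpus.vrt > corpus.txt   (file names are placeholders)
vrt_to_text() {
    awk -F '\t' '
        /^<\/sentence>/ { if (line != "") print line; line = ""; next }
        /^</            { next }   # skip other structural tags
        NF              { line = (line == "" ? $1 : line " " $1) }
        END             { if (line != "") print line }
    '
}
```

Tokens are joined with single spaces, so punctuation ends up separated from the preceding word; whether that matters depends on how the training tool tokenizes its input.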
|
Re: [Announcement]Open source text prediction input plugin
@ljo: sounds great! Good luck with it!
|
Re: [Announcement]Open source text prediction input plugin
I took a closer look at the data that is available from Kielipankki:
In addition to that, there is a corpus data set of several Finnish magazines and newspapers from the 1990s and 2000s (around 300 magazines). However, I downloaded three of them that deal with tech, and the size of one magazine was only ~1 MB. Also, you have to download each of them separately. EDIT: Some of them are in VRT format and others in TXT format. |
Re: [Announcement]Open source text prediction input plugin
FNC1 may allow you to cut corners and get it running without computing any stats, since that's already done for you. Although the language may have changed in this time window... Otherwise, the first one is probably of the biggest interest.
|
Re: [Announcement]Open source text prediction input plugin
I am pretty sure we'd get a Tay.ai type prediction engine out of that corpus! :p |
Re: [Announcement]Open source text prediction input plugin
With such a huge file, we may have to split it into smaller parts. Otherwise RAM will probably become an issue.
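Splitting by line count is one simple way to do this. The sketch below wraps the standard `split` utility; the function name, file names, and chunk size are placeholders, and how the per-part results are combined afterwards depends on the n-gram tooling and is not covered here.

```shell
# Sketch: cut a large corpus into pieces of at most LINES lines each.
# Usage: split_corpus FILE LINES OUT_PREFIX
#   e.g. split_corpus corpus.txt 1000000 part_   -> part_aa, part_ab, ...
split_corpus() {
    split -l "$2" "$1" "$3"
}
```

Splitting on line boundaries keeps sentences intact as long as the corpus has one sentence or paragraph per line.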
|
Re: [Announcement]Open source text prediction input plugin
I might need to adjust the dictionary size a bit, but as a non-native speaker I await your opinions before doing something more for Finnish. I will try to find some time to continue to work on the hyphenation problems that are really annoying in Swedish at least. |
Re: [Announcement]Open source text prediction input plugin
I had time to test it this morning and it seems to work pretty well after quick testing :). I can confirm that there is a hyphenation problem with some words. However, it is not a big problem in normal use, since the issue seems to be linked to compound words. Here are a few examples:
English: Finnish: my input: text-prediction
I compared the text prediction with an Android phone, and both were working quite similarly with the most common words. Sometimes the most obvious conjugation is among the last words in the list, but I believe that will improve with use (in Sailfish). Also, the prediction knows all the bad words in Finnish and some name-calling slang words. I believe that is not a surprise, since the corpus was from a forum. EDIT: And I almost forgot: huge thanks to you, tusen tack! |
Re: [Announcement]Open source text prediction input plugin
Profanity is an issue and it would be great to get rid of it. I had the same problem when composing the database for English; a large fraction of the time was spent on that. I would suggest filtering the database and removing all n-grams that include any of the words classified as "bad". For that, we need a list of the words (possibly as substrings). That would have to be provided by native speakers, though. Maybe such a list has already been compiled somewhere...
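The filtering idea can be sketched as below. Everything about the data layout is an assumption: n-grams are taken to be plain text lines of whitespace-separated words followed by a count, and the bad-word list is one word per line. The helper name `filter_ngrams` is made up, and the sketch matches whole words only, not substrings.

```shell
# Sketch: drop every n-gram line that contains a listed word.
# Usage: filter_ngrams BADWORDS_FILE < ngrams.txt > filtered.txt
filter_ngrams() {
    awk -v badfile="$1" '
        BEGIN {
            # Load the bad-word list, one word per line.
            while ((getline w < badfile) > 0)
                if (w != "") bad[tolower(w)] = 1
        }
        {
            # The last field is assumed to be the count, so it is not checked.
            for (i = 1; i < NF; i++) {
                word = tolower($i)
                if (word in bad) next
            }
            print
        }
    '
}
```

Note that `tolower` in awk depends on the locale and may not lowercase non-ASCII letters, so for languages such as Finnish or Russian the list may need to contain each case variant explicitly.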
|
Re: [Announcement]Open source text prediction input plugin
dozens of conjugation forms. Here are a few examples: Word: run = juosta
|
Re: [Announcement]Open source text prediction input plugin
I made a list of profane words. It is just a simple CSV file, and every conjugation form is a separate word in the list. I also uploaded the National Library's Finnish journal n-grams (1820-2000) to the cloud. If someone is willing to help, you can find the link below.
|
Re: [Announcement]Open source text prediction input plugin
It seems that libpresage has been removed from OpenRepos. Was this intentional? If so, how can the plugin be installed?
|
Re: [Announcement]Open source text prediction input plugin
I have added the first version of the German Presage predicting keyboard. It's made together with @matzgewinn, who found the corpus and processed it. While the corpus is relatively small (300MB), let's hope it works. A Hunspell dictionary was added as well.
As usual, just install https://openrepos.net/content/sailfi...ext-prediction and all the rest should be pulled in. The easiest way is to install, enable in settings, and reboot. Corresponding issue: https://github.com/sailfish-keyboard/presage/issues/26 My German is non-existent, so it's up to the users to test and improve it. |
Re: [Announcement]Open source text prediction input plugin
I've noticed that it seems to have issues predicting words with apostrophes (e.g. it predicts "aren" instead of "aren't"). Is there a way to fix this?
|