-   -   [Announcement]Open source text prediction input plugin (https://talk.maemo.org/showthread.php?t=100266)

martonmiklos 2018-03-22 21:52

[Announcement]Open source text prediction input plugin
 
Hello all,

We have started a new sailfishos_keyboard development team at https://github.com/sailfish-keyboard. The team is happy to announce a new version of the presage-based alternative text prediction solution.

Currently we offer the following keyboards for download at OpenRepos:
https://openrepos.net/content/sailfi...ext-prediction
https://openrepos.net/content/sailfi...nput-predictor
https://openrepos.net/content/sailfi...ext-prediction
https://openrepos.net/content/sailfi...nput-predictor

If you are using a community supported language pack or a ported device you should definitely try it out.

If you cannot find your language in the supported list, please read through this documentation on how to add support for your language:
* https://github.com/sailfish-keyboard...ased-predictor
* https://github.com/sailfish-keyboard...utils/keyboard

If you are having problems with the packaging or distribution at OpenRepos, let us know (preferably in the form of a GitHub issue); we are more than happy to help with it.

In this release we have mainly focused on performance improvements. We are satisfied with the results, which come from moving the internal database format from SQLite to MarisaDb and from handling predictions asynchronously.

Other than the performance improvements we have implemented the following:

* Added a Hunspell-based prediction engine: if you mistype a word, the predictor will offer the corrected version in the predictions. (The plugin does not do any auto-correction; it just predicts the corrected word.)
* Added the ability to forget learned words: presage learns what you type in order to make more accurate predictions. Now you can delete mistyped words that have been learned from your text input by long-tapping on the predictions. (The forget feature does not work on words coming from the preinstalled database, but that is on our TODO list.)
* We have renamed the plugin to make the naming more consistent, got rid of the external library dependencies by linking them statically, and removed some unnecessary configuration packages.

**Important message to the users of the former revision**
Because of the structural changes please follow the instructions below before installing the new plugin:

* Deselect presage keyboards in Settings/Text input/Keyboards
* Uninstall all presage packages from the command line (pkcon or zypper). Use zypper se presage to see which packages are installed
* Refresh repositories (pkcon refresh or zypper ref)
* Reboot
* Enable sailfish_keyboard repository at OpenRepos
* Install keyboard(s) of interest
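
The command-line part of the steps above can be sketched roughly as follows. This is only an illustration, not part of the original announcement: `run` and `upgrade_prepare` are hypothetical helper names, the old package names are not listed in this thread so they are passed in as arguments, and the commands need root rights on the device. With the default DRY_RUN=1 nothing is executed; the commands are only printed.

```shell
# Hypothetical helper: print the command instead of running it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: names of the old presage packages to remove
# (take them from the output of "zypper se presage").
upgrade_prepare() {
    run zypper se presage        # list presage packages
    for pkg in "$@"; do
        run pkcon remove "$pkg"  # uninstall each old package
    done
    run pkcon refresh            # refresh repositories
}
```

Deselecting the keyboards, rebooting, and enabling the repository at OpenRepos still have to be done by hand as described in the list.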

Please report back if you have any issues or problems!

Wishing you happy Sailing,
@ljo, @martonmiklos, and @rinigus

lal 2018-03-23 06:02

Re: [Announcement]Open source text prediction input plugin
 
Working great. This now completes the SFOS port I am using.

A suggestion/request/expected behaviour
If I place the cursor on a word that is already typed and decide to replace it with one of the words suggested by presage, the new word gets inserted where the cursor is rather than replacing the existing word. I recall that this was the same behaviour on the Jolla C with xt9 as well. Could that be fixed?

Expected/desired behaviour is, when the cursor is placed on an already typed word and one of the suggested words is selected, the existing word gets replaced completely with the suggested word.

ljo 2018-03-23 06:14

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by lal (Post 1542639)
Working great. This now completes the SFOS port I am using.

Great to hear!

Quote:

Originally Posted by lal (Post 1542639)
A suggestion/request/expected behaviour
If I place the cursor on a word that is already typed and decide to replace it with one of the words suggested by presage, the new word gets inserted where the cursor is rather than replacing the existing word. I recall that this was the same behaviour on the Jolla C with xt9 as well. Could that be fixed?

Expected/desired behaviour is, when the cursor is placed on an already typed word and one of the suggested words is selected, the existing word gets replaced completely with the suggested word.

Thanks, yes, this is on my list of fixes to do.

rinigus 2018-03-31 13:34

Re: [Announcement]Open source text prediction input plugin
 
I have packaged a Russian keyboard using n-grams as distributed at http://ruscorpora.ru/corpora-freq.html . In addition, the hunspell dictionary was converted to UTF-8, as needed. Not sure whether these are the best n-grams available, but give it a try. If someone gets a better frequency distribution, please feel free to suggest improvements.

I think Russian would heavily benefit from proper Unicode support in presage. Right now, Presage doesn't know how to normalize Russian letters (lowercase or otherwise). So, we would love to get an enthusiastic ICU specialist who would have time to work on Unicode support of Presage. I'll be happy to help with the database parts and normalization, but have rather limited time to do it all. In addition, my primary languages work quite well already :)

suicidal_orange 2018-06-25 10:55

Re: [Announcement]Open source text prediction input plugin
 
First, good work on this - I'm using it in preference to the paid prediction because I can.

I've made a keyboard for Colemak which doesn't make much sense given it's designed for keyboards where edge keys (A and ; on QWERTY) are easy to hit which isn't the case on a screen, but it works so may as well put it out there.

As there is no international version of Colemak and it's set to use English predictions, I'm not sure if it should be called en-colemak-presage or colemak-presage - any preference?

Next step will be to research/find a thumb-optimised layout and en_GB-ise the dictionary - will the script handle a language code that is not four characters, or should it just be called en_GB?

rinigus 2018-06-25 17:53

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by suicidal_orange (Post 1545801)
First, good work on this - I'm using it in preference to the paid prediction because I can.

I've made a keyboard for Colemak which doesn't make much sense given it's designed for keyboards where edge keys (A and ; on QWERTY) are easy to hit which isn't the case on a screen, but it works so may as well put it out there.

As there is no international version of Colemak and it's set to use English predictions, I'm not sure if it should be called en-colemak-presage or colemak-presage - any preference?

Next step will be to research/find a thumb-optimised layout and en_GB-ise the dictionary - will the script handle a language code that is not four characters, or should it just be called en_GB?

Great to hear that you are working on it. I am a bit surprised by the rather low adoption of this work and the absence of contributions for other languages/keyboards. But let's see how it progresses in the future.

There are 3 packages that are needed for full support.

Package 1: keyboard

This is named as keyboard-presage-<YOUR_OWN_CODE, usually in form en_US>-0.1.0-1.noarch.rpm (version info may vary)

For example, download and examine the contents of one of the keyboard RPMs. As you will see, it has languageCode in the .conf file of the keyboard definition. This code has to match the presage and hunspell dictionaries. If you want to go for UK, use en_GB.

The script will probably not be too happy if you try to use other dictionary names. So I would suggest hacking the script and changing the keyboard RPM content by hand. Please see the README at https://github.com/sailfish-keyboard...utils/keyboard
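
To check that a keyboard .conf file really carries the code you expect, something along these lines may help. It is a sketch only: it assumes the .conf file uses plain "key=value" lines with a languageCode entry, and the function names are made up for this example.

```shell
# Print the languageCode value from a keyboard .conf file
# (assumes simple "languageCode=..." lines; first match wins).
conf_language_code() {
    sed -n 's/^languageCode[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Succeeds only if the .conf file declares the expected code, e.g. en_GB.
check_language_code() {
    [ "$(conf_language_code "$1")" = "$2" ]
}
```

For example, `check_language_code en_GB.conf en_GB || echo "code mismatch"` would flag a keyboard whose code does not match the dictionaries you packaged (the file name here is just an example).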


Package 2: hunspell dict

You will have to provide a hunspell dictionary. This can be downloaded from somewhere and will have to be converted to UTF-8. See the last section of the README at https://github.com/sailfish-keyboard/presage
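
A conversion of this kind could look roughly like the sketch below. It is not taken from the README, whose own procedure may differ: it assumes the source encoding is known (it is normally declared on the SET line of the .aff file) and that iconv is available; `to_utf8` is a hypothetical name.

```shell
# Convert a Hunspell .dic/.aff pair to UTF-8.
# Usage: to_utf8 SOURCE_ENCODING in.dic in.aff out.dic out.aff
to_utf8() {
    enc=$1
    # Re-encode the word list.
    iconv -f "$enc" -t UTF-8 "$2" > "$4"
    # Re-encode the affix file and update its declared encoding to match.
    iconv -f "$enc" -t UTF-8 "$3" | sed 's/^SET .*/SET UTF-8/' > "$5"
}
```

Example call, with made-up file names: `to_utf8 ISO-8859-1 en_GB.dic en_GB.aff out/en_GB.dic out/en_GB.aff`.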


Package 3: presage n-gram

That will require a text corpus to teach presage. Note that we are looking for something large: the more text the merrier, with context similar to typical use on a mobile device. You may wish to filter profanity, but that is a rather non-trivial problem.

For help on generating the n-gram database, see https://github.com/sailfish-keyboard...ased-predictor .

I don't remember whether there was a freely available en_GB corpus, though.

Good luck!

rob_kouw 2018-06-26 15:23

Re: [Announcement]Open source text prediction input plugin
 
Do we get an option to read all the First Names and Details (Business names) from the People app, and include them in the predicted words?

rinigus 2018-06-26 20:21

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rob_kouw (Post 1545840)
Do we get an option to read all the First Names and Details (Business names) from the People app, and include them in the predicted words?

From presage's point of view, it's possible to add more predictors. That's the way the current learning is implemented. However, I don't know whether anyone has been working on this or other features. So, if you wish to see it and know how to program, I can only encourage you to work on it. We will all be happy to help and reply to your queries.

suicidal_orange 2018-06-27 11:50

Re: [Announcement]Open source text prediction input plugin
 
As rinigus suggested the script for packaging keyboards wasn't very happy with what I'm trying to do, so I had to modify it. In doing so I found the .spec needed changing too, but that made it inflexible...

My solution (which I'm sure is far from best practice) is to write the .spec file from the script with modified name and descriptions based on an optional 6th argument, resulting in a file called "keyboard-presage-colemak-en_US-1.0.0-1.noarch.rpm"

Not sure anyone's interested in alternate layouts but if you are here it is. Criticism welcome, I want to learn :)

Code:

#!/bin/bash

set -e

PROGPATH=$(dirname "$0")

if [ "$#" -lt 5 ]; then
    echo "Usage: $0 Language langcode version keyboard.qml keyboard.conf [layout name]"
    echo
    echo "Language: Specify language in English starting with the capital letter, ex 'Estonian'"
    echo "langcode: Specify language code, ex 'en_US'. Use the same notation as Hunspell dictionaries."
    echo "version: Version of the language package, ex '1.0.0'"
    echo "keyboard.qml: Keyboard QML file"
    echo "keyboard.conf: Keyboard Configuration file referencing the QML file"
    echo "layout name: Optional, for alternate layouts (BÉPO, Colemak, Dvorak...)"
    echo
    echo "When finished, the keyboard support will be packaged into RPM in the current directory"
    echo
    echo "The script requires rpmbuild to be installed. Note that rpmbuild can be installed on distributions that don't use RPM for packaging"
    echo
    exit 1
fi

L=$1
CODE=$2
VERSION=$3
KQML=$4
KCONF=$5
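# Optional 6th argument: alternate layout name (e.g. colemak).
# Quoted test so a missing argument does not cause a syntax error in [ ].
LAYOUT=""
if [ -n "${6:-}" ]; then
    LAYOUT="-$6"
fi

# Package name, e.g. keyboard-presage-colemak-en_US
NAME=keyboard-presage$LAYOUT-$CODE
# Drop the leading dash and prefix a space for use in the spec text below
LAYOUT=" ${LAYOUT:1:50}"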

TMPDIR=$(mktemp -d)

mkdir -p $TMPDIR/$NAME-$VERSION/keyboard
mkdir -p $TMPDIR/$NAME-$VERSION/rpm
cp "$KQML" $TMPDIR/$NAME-$VERSION/keyboard
cp "$KCONF" $TMPDIR/$NAME-$VERSION/keyboard

echo "# Template for generation of keyboard RPMs
# for Presage on Sailfish. This template is used
# by package-keyboard.sh script

# Prevent brp-python-bytecompile from running.
%define __os_install_post %{___build_post}

# \"Harbour RPM packages should not provide anything.\"
%define __provides_exclude_from ^%{_datadir}/.*$

Name: "$NAME"
Version: __version__
Release: 1
Summary: Keyboard layout for"$LAYOUT" __Language__ with Presage support
License: MIT
URL: https://github.com/martonmiklos/sailfishos-presage-predictor
Source: %{name}-%{version}.tar.xz
BuildArch: noarch
Requires: presage-lang-__langcode__
Requires: hunspell-lang-__langcode__
Requires: maliit-plugin-presage

%description
Keyboard layout for"$LAYOUT" __Language__ language with Presage text predictions

%prep
%setup -q

%install
mkdir -p %{buildroot}/usr/share/maliit/plugins/com/jolla/layouts
cp -r keyboard/* %{buildroot}/usr/share/maliit/plugins/com/jolla/layouts

%files
%defattr(-,root,root,-)
%{_datadir}/maliit/plugins/com/jolla/layouts" > $TMPDIR/$NAME-$VERSION/rpm/$NAME.spec

sed -i "s/__langcode__/$CODE/"  $TMPDIR/$NAME-$VERSION/rpm/$NAME.spec
sed -i "s/__Language__/$L/"  $TMPDIR/$NAME-$VERSION/rpm/$NAME.spec
sed -i "s/__version__/$VERSION/"  $TMPDIR/$NAME-$VERSION/rpm/$NAME.spec

tar -C $TMPDIR -cJf $TMPDIR/$NAME-$VERSION.tar.xz $NAME-$VERSION

mkdir -p $HOME/rpmbuild/SOURCES
mkdir -p $HOME/rpmbuild/SPECS

cp $TMPDIR/$NAME-$VERSION.tar.xz $HOME/rpmbuild/SOURCES
cp $TMPDIR/$NAME-$VERSION/rpm/$NAME.spec $HOME/rpmbuild/SPECS

rm -rf $TMPDIR

rm -rf $HOME/rpmbuild/BUILD/$NAME-$VERSION
rpmbuild -ba --nodeps $HOME/rpmbuild/SPECS/$NAME.spec

mkdir -p RPMS
cp $HOME/rpmbuild/RPMS/noarch/$NAME-$VERSION-*.rpm .


FlyingAntero 2018-11-28 14:01

Re: [Announcement]Open source text prediction input plugin
 
I installed https://openrepos.net/content/sailfi...nput-predictor on my X Compact (using the official patched image from Xperia X) and it is working like a charm. However, Swedish is my second language. Can anyone help to make a layout for Finnish?

I have found a database of Finnish words in UTF-8 format on GitHub:
Also, the layout is the same for Finnish and Swedish. Is it possible to just change the database?

rinigus 2018-11-28 16:31

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by FlyingAntero (Post 1551207)
I installed https://openrepos.net/content/sailfi...nput-predictor on my X Compact (using the official patched image from Xperia X) and it is working like a charm. However, Swedish is my second language. Can anyone help to make a layout for Finnish?

I have found a database of Finnish words in UTF-8 format on GitHub:
Also, the layout is the same for Finnish and Swedish. Is it possible to just change the database?

Would be great to get Finnish on board. You will need not a list of words but a large body of Finnish text, called a text corpus (https://en.wikipedia.org/wiki/Text_corpus). This is because we want to teach presage how to "predict", which can be done if you know the common word sequences in the language. It works for Estonian, so it should work for Finnish too.

You may need to contact a language institute to get such a body of text. For Estonian, I managed to get a large text corpus - about 1900GB of text. But a smaller corpus would probably give a decent result.

Please look into it - would be great to extend the support to Finnish.
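
To illustrate why running text is needed rather than a word list: the "common sequences" are counts of neighbouring words, which a word list cannot provide. The sketch below counts word pairs (bigrams) in plain text. It is only an illustration and not the tooling used for the presage databases; `count_bigrams` is a made-up name, case is left untouched, and pairs are counted across sentence boundaries.

```shell
# Count bigrams (pairs of adjacent words) in plain text read from stdin.
# Output: "count word1 word2", most frequent pair first.
count_bigrams() {
    # One whitespace-separated token per line.
    tr -s '[:space:]' '\n' |
    # Strip leading/trailing ASCII punctuation, drop empty tokens.
    sed -e 's/^[[:punct:]]*//' -e 's/[[:punct:]]*$//' -e '/^$/d' |
    # Pair every token with its predecessor.
    awk 'NR > 1 { print prev, $0 } { prev = $0 }' |
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 |
    # Normalise the spacing produced by uniq -c.
    awk '{ print $1, $2, $3 }'
}
```

Running `count_bigrams < some_text.txt` on a large enough text shows which word most often follows another, which is the kind of statistic a predictor can use.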

ljo 2018-11-28 19:35

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rinigus (Post 1551224)
Would be great to get Finnish on board. You will need not a list of words but a large body of Finnish text, called a text corpus ...
Please look into it - would be great to extend the support to Finnish.

Also, the layout is the same for finnish and swedish. Is it possible to just change the data base?

I could probably help @FlyingAntero to achieve this for Finnish like I created the Swedish resources. For the last question - yes, basically you could switch out the database, but in the long run it will be easier to do the full package now to get the language specific support and switching correct.

FlyingAntero 2018-11-29 06:06

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rinigus (Post 1551224)
Would be great to get Finnish on board. You will need not a list of words but a large body of Finnish text, called a text corpus (https://en.wikipedia.org/wiki/Text_corpus). This is because we want to teach presage how to "predict", which can be done if you know the common word sequences in the language. It works for Estonian, so it should work for Finnish too.

You may need to contact a language institute to get such a body of text. For Estonian, I managed to get a large text corpus - about 1900GB of text. But a smaller corpus would probably give a decent result.

Please look into it - would be great to extend the support to Finnish.

OK, now I got it. A text corpus makes sense for prediction. I have access to text corpus data which is about 60 GB. Is that enough or should I try to find a bigger one? The data that I have found is in different zip files. There are two 25 GB zip files and a few smaller zip files. Is that a problem?

EDIT: Here is more information about the data. It is in VRT file format:
Quote:

Originally Posted by ljo (Post 1551231)
I could probably help @FlyingAntero to achieve this for Finnish like I created the Swedish resources. For the last question - yes, basically you could switch out the database, but in the long run it will be easier to do the full package now to get the language specific support and switching correct.

I would be really grateful for help since my programming skills are very limited.

ljo 2018-11-29 08:12

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by FlyingAntero (Post 1551239)
OK, now I got it. A text corpus makes sense for prediction. I have access to text corpus data which is about 60 GB. Is that enough or should I try to find a bigger one? The data that I have found is in different zip files.

EDIT: Here is more information about the data. It is in VRT file format:
I would be really grateful for help since my programming skills are very limited.

That sounds like a good start. Then we can see if we need more data from our partner Kielipankki. Make sure to include some social media resources too in the first batch. Multiple source files are no problem. Concatenate them if a single source feels easier for you to handle.
No problem, happy to help.

rinigus 2018-11-29 08:21

Re: [Announcement]Open source text prediction input plugin
 
From a brief reading, it looks like VRT is not the text itself but a processed list of tokens. For training, either plain text or text already processed into n-grams (the latter was used for Russian) is needed. But there should be a text corpus behind these processed files as well.

ljo 2018-11-29 09:42

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rinigus (Post 1551245)
From a brief reading, it looks like VRT is not the text itself but a processed list of tokens. For training, either plain text or text already processed into n-grams (the latter was used for Russian) is needed. But there should be a text corpus behind these processed files as well.

There is nothing wrong with using the VRT files. They just need to be processed to extract the running text tokens. So if we do this first for what you have, we will see if further data is needed (probably, since the many annotations in our annotated VRT files add to the file sizes).
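
Extracting the running text could be sketched like this. The sketch assumes the usual VRT layout, which is not spelled out in this thread: one token per line with tab-separated annotation columns, the word form in the first column, and XML-like structural lines such as <sentence> and </sentence>. `vrt_to_text` is a made-up name, and this is not necessarily how the Finnish data was actually processed.

```shell
# Turn VRT input on stdin into plain text, one sentence per line.
vrt_to_text() {
    awk -F '\t' '
        # End of a sentence: emit the collected tokens.
        /^<\/sentence>/ { if (line != "") print line; line = ""; next }
        # Any other structural tag or comment line: skip.
        /^</            { next }
        # Token line: keep only the first column (the word form).
        NF              { line = (line == "" ? $1 : line " " $1) }
        END             { if (line != "") print line }
    '
}
```

Punctuation tokens end up separated by spaces and character entities such as &amp; are left as they are, so some further cleaning may be needed before training.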

rinigus 2018-11-29 10:35

Re: [Announcement]Open source text prediction input plugin
 
@ljo: sounds great! Good luck with it!

FlyingAntero 2018-11-29 10:48

Re: [Announcement]Open source text prediction input plugin
 
I took a closer look at the data that is available from Kielipankki:
  • The Suomi 24 Corpus, ~60 GB: the largest discussion forum in Finland
  • FNC1, ~30 GB: The National Library's journal's Finnish n-grams (1820-2000)
  • DSPCON, ~4 GB: Aalto University DSP Course Conversation Corpus
  • AMPH, 600 MB: Think, ponder, consider -corpus
  • The SFNET corpus, 400 MB: a quite small discussion forum
  • Ylilauta Corpus, 300 MB: a Finnish version of 4chan
  • Opusparcus, 265 MB: Open Subtitles Paraphrase
  • Psycholinguistic corpus, 65 MB: Psycholinguistic Descriptives
  • Morphologies, 50 MB: Morphologies

In addition to that, there is a corpus data set of several Finnish magazines and newspapers from the 1990s and 2000s (around 300 magazines). However, I downloaded three of them which dealt with tech, and the size of one magazine was only ~1 MB. Also, you have to download each of them separately.

EDIT: Some of them are in VRT format and others in TXT format.

rinigus 2018-11-29 15:58

Re: [Announcement]Open source text prediction input plugin
 
FNC1 may allow you to cut corners and get it running without computing any statistics, since that is already done for you. Although the language may have changed over this time window ... Otherwise, the first one is probably of the biggest interest.

ljo 2018-11-30 10:09

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rinigus (Post 1551260)
FNC1 may allow you to cut corners and get it running without computing any statistics, since that is already done for you. Although the language may have changed over this time window ... Otherwise, the first one is probably of the biggest interest.

Yes, I agree the Suomi-24 corpus is the best to start with.

juiceme 2018-11-30 10:21

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by ljo (Post 1551307)
Yes, I agree the Suomi-24 corpus is the best to start with.

Wouldn't that be a bit biased... taken from a forum which is full of halfwits banging their heads on marginal topics?
I am pretty sure we'd get a Tay.ai type prediction engine out of that corpus! :p

FlyingAntero 2018-12-06 06:42

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by juiceme (Post 1551308)
Wouldn't that be a bit biased... taken from a forum which is full of halfwits banging their heads on marginal topics?
I am pretty sure we'd get a Tay.ai type prediction engine out of that corpus! :p

I don't know how it will work out. I have downloaded the files and uploaded them to the drive (65 GB). I can share a link if someone wants to try it out. If not, then I might try with The National Library's journal's Finnish n-grams myself because it is easier that way.

ljo 2018-12-06 15:49

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by FlyingAntero (Post 1551476)
I don't know how it will work out. I have downloaded the files and uploaded them to the drive (65 GB). I can share a link if someone wants to try it out. If not, then I might try with The National Library's journal's Finnish n-grams myself because it is easier that way.

OK. I bought a larger hard drive today since I have been hitting the storage limit over and over for a few weeks. So I could give it a try in a few days when I have migrated to the new drive.

rinigus 2018-12-06 19:55

Re: [Announcement]Open source text prediction input plugin
 
With such a huge file, we may have to split it into smaller parts. Otherwise RAM will probably become an issue.
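
One way to keep memory use down is to cut the text into chunks, count n-grams per chunk, and then add up the per-chunk counts. The sketch below shows the splitting and the merging step; it assumes count files with lines of the form "count word1 word2 ...", which is a format chosen for this example and not something presage prescribes. The function names are hypothetical.

```shell
# Split a big text file into pieces of a given number of lines.
# $1 = input file, $2 = lines per chunk, $3 = output prefix
split_corpus() {
    split -l "$2" "$1" "$3"
}

# Merge "count n-gram" files by summing the counts of identical n-grams.
# sort does the heavy lifting on disk; awk only sums adjacent equal lines.
merge_counts() {
    cat "$@" | LC_ALL=C sort -k2 | awk '
        { c = $1; $1 = ""; sub(/^ /, "") }   # $0 is now just the n-gram
        $0 != key { if (NR > 1) print total, key; key = $0; total = 0 }
        { total += c }
        END { if (NR > 0) print total, key }'
}
```

For example, `split_corpus corpus.txt 1000000 chunk_` followed by counting each chunk_* file separately and a final `merge_counts counts_*` never holds more than one chunk's worth of counts in memory (the file names are examples).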

FlyingAntero 2018-12-07 03:19

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by ljo (Post 1551494)
OK. I bought a larger hard drive today since I have been hitting the storage limit over and over for a few weeks. So I could give it a try in a few days when I have migrated to the new drive.

Nice! Here are the files:

ljo 2018-12-07 09:07

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by FlyingAntero (Post 1551513)
Nice! Here are the files:

Thanks, I will get on it as soon as my harddrive is replaced.

ljo 2018-12-11 11:48

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by ljo (Post 1551521)
Thanks, I will get on it as soon as my harddrive is replaced.

So, now there is something to test. I noticed some hyphenation here and there that felt a bit strange, but most of the words I typed were predicted. And it learns fast, so I can't make the same tests twice ...
I might need to adjust the dictionary size a bit, but as a non-native speaker I await your opinions before doing something more for Finnish.
I will try to find some time to continue to work on the hyphenation problems that are really annoying in Swedish at least.

FlyingAntero 2018-12-12 09:29

Re: [Announcement]Open source text prediction input plugin
 
I had time to test it this morning and it seems to work pretty well after quick testing :). I can confirm that there is a hyphenation problem with some words. However, it is not a big problem in normal use since the issue seems to be linked to compound words. Here are a few examples:
English: Finnish: my input: text prediction
  • text input: tekstinsyöttö: tekstinsyö: tekstin-syö
  • shoe rack: kenkäteline: kenkäte: kenkä-te
  • (space) alien: avaruusolio: avaruusoli: avaruus-oli
I think that most Finns write compound words separately (tekstin and syöttö) and remove the space later (if they aren't too lazy). If you do that the prediction knows those separate words.

I compared the text prediction with an Android phone, and both predictions worked quite similarly with the most common words. Sometimes the most obvious conjugation is among the last words in the list, but I believe that will improve with use (in Sailfish).

Also, the prediction knows every bad word in Finnish and some name-calling slang words. I believe that is not a surprise since the corpus was from a forum.

EDIT: And I almost forgot: huge thanks to you, tusen tack!

rinigus 2018-12-12 10:03

Re: [Announcement]Open source text prediction input plugin
 
Profanity is an issue and it would be great to get rid of it. I had the same problem when composing the database for English; a large fraction of the time was spent on that. I would suggest filtering the database and removing all n-grams that include any of the words that are classified as "bad". For that, we need a list of the words (possibly as substrings). That would have to be provided by native speakers though. Maybe such a list already exists somewhere...
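
As a rough sketch of such a filter, assuming the n-grams are available as text lines ("count word1 word2 ...") and the blocked words are in a file with one word per line: both formats are assumptions made for this example, as is the name `filter_ngrams`.

```shell
# Drop every n-gram line that contains any blocked word.
# $1 = file with one blocked word per line; n-grams are read from stdin.
# -F: fixed strings, -i: ignore case, -w: whole words only, -v: invert match.
filter_ngrams() {
    grep -v -i -w -F -f "$1"
}
```

Dropping `-w` turns this into substring matching, which catches inflected forms but also removes harmless words that merely contain a blocked string. Note that a blank line in the list would match every line, and that case-insensitive matching of non-ASCII letters depends on the locale.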

FlyingAntero 2018-12-12 10:39

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rinigus (Post 1551688)
Profanity is an issue and it would be great to get rid of it. I had the same problem when composing the database for English; a large fraction of the time was spent on that. I would suggest filtering the database and removing all n-grams that include any of the words that are classified as "bad". For that, we need a list of the words (possibly as substrings). That would have to be provided by native speakers though. Maybe such a list already exists somewhere...

I can try to find that kind of list or make one myself. Should the list also include every conjugation of a specific word? Finnish words have dozens of conjugation forms. Here are a few examples:
Word: run = juosta
  • I run = Minä juoksen
  • You run = Sinä juokset
  • He/she runs = Hän juoksee
Word: box = laatikko
  • The color of a box = Laatikon väri
  • Look at that box = Katso tuota laatikkoa
  • The cat went inside the box = Kissa meni laatikkoon

rinigus 2018-12-12 10:55

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by FlyingAntero (Post 1551690)
I can try to find that kind of list or make one myself. Should the list also include every conjugation of a specific word? Finnish words have dozens of conjugation forms. Here are a few examples:
Word: run = juosta
  • I run = Minä juoksen
  • You run = Sinä juokset
  • He/she runs = Hän juoksee
Word: box = laatikko
  • The color of a box = Laatikon väri
  • Look at that box = Katso tuota laatikkoa
  • The cat went inside the box = Kissa meni laatikkoon

Assuming that ljo will do the filtering, I don't know what the preference is. Most probably something like LIKE statement filters should be OK (http://www.sqlitetutorial.net/sqlite-like/). But please wait for confirmation...

FlyingAntero 2018-12-17 08:24

Re: [Announcement]Open source text prediction input plugin
 
I made a list of profanity words. It is just a simple CSV file, and every conjugation form is a separate word in the list. I also uploaded The National Library's journal's Finnish n-grams (1820-2000) to the cloud. If someone is willing to help, you can find the link below.

rinigus 2018-12-18 21:10

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by FlyingAntero (Post 1551807)
I made a list of profanity words. It is just a simple CSV file, and every conjugation form is a separate word in the list. I also uploaded The National Library's journal's Finnish n-grams (1820-2000) to the cloud. If someone is willing to help, you can find the link below.

I can look into it, probably next week, if ljo does not beat me to it.

ljo 2018-12-19 18:06

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by rinigus (Post 1551862)
I can look into it, probably next week, if ljo does not beat me to it.

Been super busy the last week, but I will see after tomorrow.

taixzo 2019-01-15 21:00

Re: [Announcement]Open source text prediction input plugin
 
It seems that libpresage has been removed from OpenRepos. Was this intentional? If so, how do I install the plugin?

ljo 2019-01-15 21:44

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by taixzo (Post 1552797)
... how to install the plugin?

You install one of the localized keyboards from sailfish_keyboard. All dependencies should be brought in.

rinigus 2020-03-15 13:04

Re: [Announcement]Open source text prediction input plugin
 
I have added the first version of the German Presage predicting keyboard. It's made together with @matzgewinn, who found the corpus and processed it. While the corpus is relatively small (300 MB), let's hope it works. A Hunspell dictionary was added as well.

As usual, just install https://openrepos.net/content/sailfi...ext-prediction and all the rest should be pulled in. The easiest way is to install, enable in settings, and reboot.

Corresponding issue: https://github.com/sailfish-keyboard/presage/issues/26

My German is non-existent, so it's up to the users to test and improve it.

taixzo 2020-03-16 15:07

Re: [Announcement]Open source text prediction input plugin
 
I've noticed that it seems to have issues predicting words with apostrophes (e.g. it predicts "aren" instead of "aren't"). Is there a way to fix this?

rinigus 2020-03-16 15:57

Re: [Announcement]Open source text prediction input plugin
 
Quote:

Originally Posted by taixzo (Post 1566091)
I've noticed that it seems to have issues predicting words with apostrophes (e.g. it predicts "aren" instead of "aren't"). Is there a way to fix this?

It's probably an issue with the tokenizer. Not sure where exactly, as the prediction databases seem to have "aren't" in them. So it would require some investigation. I don't have time for it, unfortunately, so someone would have to take a look if it is going to be fixed.
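
Purely as an illustration of how a tokenizer can produce this symptom (this is not presage's tokenizer, and where the real problem sits is still uninvestigated): if the apostrophe is treated as a word separator, "aren't" falls apart into "aren" and "t". Both function names below are made up for the example, and only ASCII letters are handled.

```shell
# Treats everything except ASCII letters as a separator: splits "aren't".
tokenize_letters_only() {
    tr -cs 'A-Za-z' '\n'
}

# Same, but keeps the apostrophe as part of a word.
tokenize_keep_apostrophe() {
    tr -cs "A-Za-z'" '\n'
}
```

Feeding "they aren't here" to the first function gives the tokens they, aren, t, here; the second gives they, aren't, here.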


All times are GMT.