maemo.org - Talk - View Single Post - [Announcement]Open source text prediction input plugin

FlyingAntero	2018-12-12 , 10:39
Posts: 36 \| Thanked: 118 times \| Joined on Nov 2018	#30

Originally Posted by rinigus

Profanity is an issue, and it would be great to get rid of it. I had the same problem when composing the database for English; a large fraction of the time was spent on that. I would suggest filtering the database and removing all n-grams that include any of the words classified as "bad". For that, we need a list of the words (possibly as substrings). That would have to be provided by native speakers, though. Maybe such a list has already been compiled somewhere...

I can try to find that kind of list or make it myself. Should the list also include every inflected form of each word? Finnish words have dozens of inflected forms. Here are a few examples:
Word: run = juosta

I run = Minä juoksen
You run = Sinä juokset
He/she runs = Hän juoksee

Word: box = laatikko

The color of a box = Laatikon väri
Look at that box = Katso tuota laatikkoa
The cat went inside the box = Kissa meni laatikkoon
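
If the list were given as substrings (stems) rather than full words, as the quote suggests, most inflected forms would be caught without listing them one by one. Below is a minimal sketch of that idea, not the plugin's actual code. It assumes the n-grams are available as tuples of words. The names `contains_blocked` and `filter_ngrams` are hypothetical, and the harmless stem "laatik" stands in for a real entry from a bad-word list.

```python
def contains_blocked(word, blocked_stems):
    """Return True if any blocked stem occurs as a substring of the word."""
    lowered = word.lower()
    return any(stem in lowered for stem in blocked_stems)


def filter_ngrams(ngrams, blocked_stems):
    """Keep only the n-grams in which no word contains a blocked stem."""
    return [
        ngram
        for ngram in ngrams
        if not any(contains_blocked(word, blocked_stems) for word in ngram)
    ]


# Hypothetical data: "laatik" is a placeholder stem, not a real list entry.
blocked_stems = {"laatik"}
ngrams = [
    ("kissa", "meni", "laatikkoon"),
    ("laatikon", "väri"),
    ("minä", "juoksen"),
]

# Only the n-gram without any form of "laatikko" is kept.
kept = filter_ngrams(ngrams, blocked_stems)
```

Because of consonant gradation, the stem has to be short enough to cover every form: "laatik" matches laatikko, laatikon, laatikkoa and laatikkoon, while "laatikk" would miss laatikon. Some words need more than one stem ("juoks" matches juoksen but not juosta), and short stems can also match unrelated words, so a native speaker would still have to review the list.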

