The Following 3 Users Say Thank You to rinigus For This Useful Post: | ||
|
2018-11-28
, 19:35
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#12
|
Would be great to get Finnish on board. You will need not a list of words, but a large body of Finnish text, called a text corpus ...
Please look into it - would be great to extend the support to Finnish.
Also, the layout is the same for Finnish and Swedish. Is it possible to just change the database?
The Following 3 Users Say Thank You to ljo For This Useful Post: | ||
|
2018-11-29
, 06:06
|
Posts: 36 |
Thanked: 118 times |
Joined on Nov 2018
|
#13
|
Would be great to get Finnish on board. You will need not a list of words, but a large body of Finnish text, called a text corpus (https://en.wikipedia.org/wiki/Text_corpus). This is because we want to teach the keyboard how to "predict", and that can be done if you know the common word sequences in the language. It works for Estonian, so it should work for Finnish too.
You may need to contact a language institute to get such a body of text. For Estonian, I managed to get a large text corpus, about 1900GB of text. But a smaller corpus would probably give a decent result as well.
Please look into it - would be great to extend the support to Finnish.
I could probably help @FlyingAntero achieve this for Finnish, much as I created the Swedish resources. As for the last question: yes, basically you could swap out the database, but in the long run it will be easier to build the full package now, so that the language-specific support and the language switching work correctly.
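To illustrate why a corpus of running text is needed rather than a word list, here is a toy sketch of the "common sequences" idea from the quoted post. It only counts which word follows which (bigrams) and suggests the most frequent followers. The function names `train_bigrams` and `predict` are made up for this example, and this is not the actual Presage tooling or database format, just the underlying principle.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    followers = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    return followers


def predict(followers, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    counts = followers.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


# Tiny made-up "corpus"; a real one would be many gigabytes of text.
corpus = "hyvää huomenta hyvää iltaa hyvää huomenta"
model = train_bigrams(corpus)
suggestions = predict(model, "hyvää")
```

A plain word list cannot provide these counts, since it carries no information about which words tend to follow each other.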
|
2018-11-29
, 08:12
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#14
|
OK, now I get it. A text corpus makes sense for prediction. I have access to text corpus data of about 60 GB. Is that enough, or should I try to find a bigger one? The data I have found is split across several zip files.
EDIT: Here is more information about the data. It is in the VRT file format. I would be really grateful for help, since my programming skills are very limited.
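For anyone in the same situation, here is a rough sketch of turning VRT (verticalized text) data into plain sentences that could feed a corpus-based tool. It assumes the usual VRT layout: one token per line with tab-separated annotations, the word form in the first column, and structural tags such as `<sentence>` on lines of their own. The tag name `sentence`, the column order, and the function name `vrt_to_sentences` are assumptions for illustration; check them against the actual files before relying on this.

```python
import html


def vrt_to_sentences(lines, word_column=0, sentence_tag="sentence"):
    """Yield one plain-text sentence per closing sentence tag.

    Assumes token lines are tab-separated and structural lines start
    with "<" (literal "<" in tokens is expected to be escaped as &lt;).
    """
    closing = "</" + sentence_tag
    sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("<"):
            # Structural tag or comment: only a closing sentence tag matters.
            if line.startswith(closing) and sentence:
                yield " ".join(sentence)
                sentence = []
            continue
        fields = line.split("\t")
        sentence.append(html.unescape(fields[word_column]))
    if sentence:
        # Flush tokens left over if the last sentence was never closed.
        yield " ".join(sentence)


# Small hand-written sample in the assumed layout (word, lemma, part of speech).
sample = [
    '<text title="x">\n',
    '<sentence id="1">\n',
    "Hyvää\thyvä\tA\n",
    "huomenta\thuomen\tN\n",
    ".\t.\tPunct\n",
    "</sentence>\n",
    "</text>\n",
]
sentences = list(vrt_to_sentences(sample))
```

Because the function takes any iterable of lines, it can be fed an open file object directly, so a large file does not have to be loaded into memory at once.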
The Following 3 Users Say Thank You to ljo For This Useful Post: | ||
|
2018-11-29
, 08:21
|
Posts: 1,414 |
Thanked: 7,547 times |
Joined on Aug 2016
@ Estonia
|
#15
|
The Following 2 Users Say Thank You to rinigus For This Useful Post: | ||
|
2018-11-29
, 09:42
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#16
|
The Following 3 Users Say Thank You to ljo For This Useful Post: | ||
|
2018-11-29
, 10:35
|
Posts: 1,414 |
Thanked: 7,547 times |
Joined on Aug 2016
@ Estonia
|
#17
|
The Following 2 Users Say Thank You to rinigus For This Useful Post: | ||
|
2018-11-29
, 10:48
|
Posts: 36 |
Thanked: 118 times |
Joined on Nov 2018
|
#18
|
|
2018-11-29
, 15:58
|
Posts: 1,414 |
Thanked: 7,547 times |
Joined on Aug 2016
@ Estonia
|
#19
|
The Following 2 Users Say Thank You to rinigus For This Useful Post: | ||
|
2018-11-30
, 10:09
|
Posts: 102 |
Thanked: 187 times |
Joined on Jan 2010
|
#20
|
The Following 2 Users Say Thank You to ljo For This Useful Post: | ||
Tags |
predictive text, presage, text-prediction |
|