Posts: 1,414 | Thanked: 7,547 times | Joined on Aug 2016 @ Estonia
#357
Originally Posted by smatkovi
How many words should the input dictionary have? And has any of you tried doing this for Hungarian?
For training, what matters is not the number of words as such but the size of the text corpus from which the database is built (the database is not a dictionary, but words and their combinations with the corresponding probabilities).
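
To make "words and their combinations with the corresponding probabilities" a bit more concrete, here is a rough sketch of how n-gram counts turn into next-word probabilities. This is not presage's own tooling or database format; the function names (`tokenize`, `count_ngrams`, `next_word_probabilities`) and the toy corpus are made up for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters (Unicode-aware, so accented
    # characters such as those in Hungarian or Estonian are kept).
    return re.findall(r"[^\W\d_]+", text.lower())


def count_ngrams(tokens, n):
    # Count every run of n consecutive words in the corpus.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def next_word_probabilities(tokens, context):
    # Estimate P(next word | context) as a simple relative frequency:
    # count(context + word) / count(all n-grams starting with context).
    context = tuple(context)
    ngrams = count_ngrams(tokens, len(context) + 1)
    matches = {g[-1]: c for g, c in ngrams.items() if g[:-1] == context}
    total = sum(matches.values())
    if total == 0:
        return {}
    return {word: c / total for word, c in matches.items()}


# Toy corpus (hypothetical); a real one is hundreds of MB of text.
corpus = "the cat sat on the mat and the cat slept"
tokens = tokenize(corpus)
probs = next_word_probabilities(tokens, ["the"])
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"the -> {word}: {p:.2f}")
```

With a tiny corpus most word combinations are never seen at all, which is why the amount of text matters far more than the number of distinct words.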

The trick is to find such a corpus. I have done some training for the presage predictor and used the following corpora (sizes in MB of the text file):

* English: 156MB
* Estonian: 1900MB

For Russian, I downloaded a ready-made n-gram database directly from a site providing it.

For Hungarian, I would suggest contacting @martonmiklos. He trained presage for Hungarian and should have the corpus somewhere.
 

The Following 3 Users Say Thank You to rinigus For This Useful Post: