Posts: 102 | Thanked: 187 times | Joined on Jan 2010
#267
Originally Posted by ferlanero
Hi guys! I was checking the performance of the Spanish language for OKBoard and I have noticed just one issue: words containing the letter "ñ".

In Spanish there are a lot of words containing the letter "ñ". After applying the "clean_corpus.py" script, all the words with that letter remain intact, but after running "db/build.sh es", which finishes without errors, the keyboard doesn't recognise words containing "ñ". I want to ask: is there any solution for this? What can I do so that OKBoard recognises words like "España"?

Thanks in advance!
@ferlanero, it must be something on your side, since I have no problem curving words like "España" with the build I compiled from your source materials when I was guiding you through the process. Could it be that some of the resources are Latin-1 encoded instead of UTF-8, so that there is a mismatch between corpus and dictionary and the words are filtered out?
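
If you want to test that theory, here is a minimal Python sketch for checking the encoding of your input files. It is not part of OKBoard: the function names (guess_encoding, latin1_to_utf8, check_file) are hypothetical, and which files to check depends on your own setup. It relies on the fact that a Latin-1 "ñ" is the single byte 0xF1, which is not valid UTF-8 unless it is followed by continuation bytes.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8' or 'not-utf-8'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Typical for Latin-1 text with accented letters such as "ñ" (0xF1).
        return "not-utf-8"
    return "ascii" if text.isascii() else "utf-8"


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are assumed (not verified) to be Latin-1."""
    return data.decode("latin-1").encode("utf-8")


def check_file(path: str) -> str:
    """Read a file as raw bytes and classify its encoding."""
    with open(path, "rb") as handle:
        return guess_encoding(handle.read())


if __name__ == "__main__":
    print(guess_encoding("España".encode("utf-8")))    # utf-8
    print(guess_encoding("España".encode("latin-1")))  # not-utf-8
```

Run check_file on the corpus and on the dictionary file. If one reports "utf-8" and the other "not-utf-8", convert the odd one (for example with latin1_to_utf8, if it really is Latin-1) and rebuild.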