Posts: 3,617 | Thanked: 2,412 times | Joined on Nov 2009 @ Cambridge, UK
#1
I think I've now managed to decode enough of the auto-complete dictionary format to implement a basic editor. It works for me, anyway.

So, I'd like to announce the snappily-named "AutoComplete Editor", coming shortly to an extras-devel repository near you. It allows deletion and addition of terms from the custom auto-complete dictionary. I'm not sure how well this'll handle international character sets, so I'd appreciate some feedback from anyone using those.

For future reference, the custom auto-complete dictionary (/home/user/.osso/dictionaries/.personal.dictionary) format is:

An 8-byte header, consisting of the 3-byte hex sequence 80 00 01 (though I've also seen 01 00 01), followed by a single byte giving the number of different dictionaries, a single byte giving the file size (in 256-byte multiples, so 0x04 indicates a 1 KB file), a 0x00 byte, and a 2-byte value giving the position at which the padding starts.

This is followed by one or more 8-byte dictionary records, each consisting of a 2-byte sequence identifying the dictionary language (or 0x00 followed by a single byte), a 2-byte value giving the start position of that dictionary's entries, a 2-byte value giving the number of entries in the dictionary, and the 2-byte hex sequence 00 00.

Each word in the dictionary data is stored as a single byte giving the string length, followed by the string itself. Entries follow straight on from one another, with no other delimiters.
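For anyone who wants to poke at the file themselves, here is a minimal sketch of a reader for the layout described above. It is not the code from AutoComplete Editor. The byte order (big-endian), the idea that positions are offsets from the start of the file, the UTF-8 encoding, and the zero-filled padding are all assumptions on my part, and `parse_dictionary` and `build_sample` are hypothetical names used only for illustration.

```python
import struct

# Assumptions (not confirmed anywhere in the thread): multi-byte fields are
# big-endian, positions are offsets from the start of the file, strings are
# UTF-8 and the padding is zero-filled.
HEADER = struct.Struct(">3sBBBH")  # magic, dictionary count, size/256, 0x00, padding start
RECORD = struct.Struct(">HHHH")    # language, data start, entry count, 00 00


def parse_dictionary(data, encoding="utf-8"):
    """Return (magic, size_in_256_byte_blocks, padding_start, {language: [words]})."""
    magic, n_dicts, size_blocks, _zero, pad_start = HEADER.unpack_from(data, 0)
    result = {}
    for i in range(n_dicts):
        offset = HEADER.size + i * RECORD.size
        lang, start, count, _trailer = RECORD.unpack_from(data, offset)
        words, pos = [], start
        for _ in range(count):
            length = data[pos]  # single length byte, then the string itself
            words.append(data[pos + 1:pos + 1 + length].decode(encoding))
            pos += 1 + length
        result[lang] = words
    return magic, size_blocks, pad_start, result


def build_sample(words, lang=0x0001, encoding="utf-8"):
    """Build a one-dictionary file in the same layout, for trying the parser out."""
    encoded = [w.encode(encoding) for w in words]
    body = b"".join(bytes([len(e)]) + e for e in encoded)
    start = HEADER.size + RECORD.size
    pad_start = start + len(body)
    total = -(-pad_start // 256) * 256  # round up to a multiple of 256 bytes
    header = HEADER.pack(b"\x80\x00\x01", 1, total // 256, 0, pad_start)
    record = RECORD.pack(lang, start, len(words), 0)
    return (header + record + body).ljust(total, b"\x00")
```

Feeding the output of `build_sample(["hello", "world"])` into `parse_dictionary` should hand the same two words back. A real file from the device may well disagree with the assumptions above, so treat any mismatch as a hint about the format rather than a corrupt file.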

Here's a screenshot for 0.0.4 (there have been no major changes to the default layout since then):


Current releases are:

0.0.12 in extras-devel. This allows:
  • Editing auto-complete dictionary
  • Deleting entries with/without warnings, or instantly by selection
  • Moving entries to a blacklist
  • Editing of entry blacklist
  • Auto-application of blacklist without launching GUI (suitable for scripting or scheduling)
  • Overriding of character set used to encode/decode dictionary entries

For those having issues, please see this post for how to generate a clean dictionary file. If this doesn't show the same issue then please consider sending me the original dictionary - all received files will be kept completely confidential and will be deleted as soon as I've fixed the relevant issue. Any dictionaries can be emailed to me at: maemo at robinhill.me.uk

Last edited by Rob1n; 2010-07-08 at 10:57. Reason: Cleaned up and removed old edits. Added details for 0.0.11.
 

Posts: 729 | Thanked: 155 times | Joined on Dec 2009
#2
Sounds good, I will try it as soon as it is available with umlauts
 
pelago
Posts: 2,121 | Thanked: 1,540 times | Joined on Mar 2008 @ Oxford, UK
#3
Sounds interesting and useful, thanks.

If you haven't done so already, I think an interesting option to implement would be for your app to check the "apparently fixed" things, and if they don't match your expected values, asking the user to contact you, just in case these bytes aren't fixed after all.
 

Posts: 1,397 | Thanked: 2,126 times | Joined on Nov 2009 @ Dublin, Ireland
#4
Thanks, thanks, thanks, thanks!

For me, the auto-complete feature has become completely useless due to misspelled words being added to the dictionary, Spanish and English words getting mixed together, etc.
 
Posts: 3,617 | Thanked: 2,412 times | Joined on Nov 2009 @ Cambridge, UK
#5
Originally Posted by pelago:
If you haven't done so already, I think an interesting option to implement would be for your app to check the "apparently fixed" things, and if they don't match your expected values, asking the user to contact you, just in case these bytes aren't fixed after all.
A very good suggestion - I'll look at adding that in.
 
Posts: 3,617 | Thanked: 2,412 times | Joined on Nov 2009 @ Cambridge, UK
#6
Originally Posted by Rob1n:
A very good suggestion - I'll look at adding that in.
Version 0.0.2 is now available, with this check.
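For the curious, a check along these lines could look roughly like the sketch below. This is not the code that shipped in 0.0.2, just an illustration of the idea. It assumes big-endian fields, and `check_fixed_bytes` is a hypothetical name.

```python
import struct

# The two 3-byte sequences reported in the first post.
KNOWN_MAGICS = (b"\x80\x00\x01", b"\x01\x00\x01")


def check_fixed_bytes(data):
    """Return a list of warnings; an empty list means nothing unexpected was seen."""
    if len(data) < 8:
        return ["file is shorter than the 8-byte header"]
    warnings = []
    # Assumption: big-endian layout of magic, count, size/256, 0x00, padding start.
    magic, n_dicts, size_blocks, zero, _pad_start = struct.unpack_from(">3sBBBH", data, 0)
    if magic not in KNOWN_MAGICS:
        warnings.append("unexpected leading bytes: " + magic.hex())
    if zero != 0:
        warnings.append("header byte 5 is 0x%02x, expected 0x00" % zero)
    if size_blocks * 256 != len(data):
        warnings.append("size byte implies %d bytes, file has %d"
                        % (size_blocks * 256, len(data)))
    for i in range(n_dicts):
        offset = 8 + i * 8
        if len(data) < offset + 8:
            warnings.append("dictionary record %d is truncated" % i)
            break
        trailer = data[offset + 6:offset + 8]
        if trailer != b"\x00\x00":
            warnings.append("dictionary record %d ends in %s, expected 0000"
                            % (i, trailer.hex()))
    return warnings
```

Any non-empty result would be the cue to ask the user to send in the file, since it means one of the "apparently fixed" values is not fixed after all.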
 

Posts: 68 | Thanked: 16 times | Joined on Feb 2007
#7
Just tried it but got error:
File "/opt/AutoCompleteEditor/AutoCompleteEditor.py", line 17, in <module>
w.loadData()
File "/opt/AutoCompleteEditor/ACE_gui.py", line 157, in loadData
self._dict = ACEFile()
File "/opt/AutoCompleteEditor/ACE_file.py", line 34, in __init__
self.read()
File "/opt/AutoCompleteEditor/ACE_file.py", line 119, in read
Wasn't able to catch the whole stuff.
Any clue?
 
pelago
Posts: 2,121 | Thanked: 1,540 times | Joined on Mar 2008 @ Oxford, UK
#8
Out of interest, how did you determine the format anyway? It's a shame it isn't documented openly by Nokia.
 
Posts: 3,617 | Thanked: 2,412 times | Joined on Nov 2009 @ Cambridge, UK
#9
Originally Posted by maddler:
Just tried it but got error:
File "/opt/AutoCompleteEditor/AutoCompleteEditor.py", line 17, in <module>
w.loadData()
File "/opt/AutoCompleteEditor/ACE_gui.py", line 157, in loadData
self._dict = ACEFile()
File "/opt/AutoCompleteEditor/ACE_file.py", line 34, in __init__
self.read()
File "/opt/AutoCompleteEditor/ACE_file.py", line 119, in read
Wasn't able to catch the whole stuff.
Any clue?
That's an entry count mismatch error - it's failed to read the correct number of entries before hitting the padding start point. Would it be possible for you to email/PM me your ~/.osso/dictionaries/.personal.dictionary file?
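To illustrate what that error means, here is a rough sketch of the kind of read loop that would raise it. It is not the actual code from ACE_file.py; `read_entries` and its parameters are hypothetical, and UTF-8 is an assumed default encoding.

```python
def read_entries(data, start, expected_count, padding_start, encoding="utf-8"):
    """Read length-prefixed strings from start, stopping at padding_start.

    Raises ValueError if fewer than expected_count entries fit before the
    padding begins, which is the mismatch described above.
    """
    words, pos = [], start
    while pos < padding_start and len(words) < expected_count:
        length = data[pos]          # single byte holding the string length
        end = pos + 1 + length
        if end > padding_start:
            break                   # this entry would run into the padding
        words.append(data[pos + 1:end].decode(encoding))
        pos = end
    if len(words) != expected_count:
        raise ValueError("entry count mismatch: header says %d, read %d"
                         % (expected_count, len(words)))
    return words
```

A wrong character set is one plausible way to end up here only if decoding is involved in the length calculation; a count or padding position that does not follow the layout I worked out is another, which is why seeing the actual file helps.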
 
Posts: 3,617 | Thanked: 2,412 times | Joined on Nov 2009 @ Cambridge, UK
#10
Originally Posted by pelago:
Out of interest, how did you determine the format anyway? It's a shame it isn't documented openly by Nokia.
Basically, I just removed the dictionary and rebooted (making sure I was starting with an empty file). I then used Notes to add a single new word at a time, copying the dictionary file out each time. I then looked through the different files and noted the differences.

From there it's just a matter of trying to figure out what each means - the format used for the dictionary entries was fairly straightforward (once I realised that the delimiter between the words was the length indicator), so it was just the header changes that I needed to look at. Some of those are pretty trivial (the number of entries increments by one each time, so it's clear what that is) but some of it is pretty much guesswork - the file size (in 256-byte multiples) for example - it's a pretty random thing to store and I don't really see why it's necessary, but that's the only thing I could figure it as being.
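The comparison step is easy to automate. A small sketch, assuming you have two snapshots of the file loaded as bytes (`diff_bytes` is a hypothetical helper, not part of the editor):

```python
def diff_bytes(old, new):
    """Return (offset, old_byte, new_byte) for every position that differs.

    A byte missing from the shorter snapshot is reported as None.
    """
    changes = []
    for offset in range(max(len(old), len(new))):
        a = old[offset] if offset < len(old) else None
        b = new[offset] if offset < len(new) else None
        if a != b:
            changes.append((offset, a, b))
    return changes
```

Running this on the snapshots taken before and after adding one word shows which header bytes moved (the counters) and where the new string landed, which is most of what is needed to guess at the meaning of each field.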
 
