|
2009-01-14, 18:41
|
|
Posts: 3,397 |
Thanked: 1,212 times |
Joined on Jul 2008
@ Netherlands
|
#22
|
I have visited http://download.wikimedia.org/enwiki/latest/ and am currently downloading enwiki-latest-pages-articles.xml.bz2. It's 4.1 GB, so hopefully I can open the bz2 file and delete a few sections that I'm not interested in before copying it onto my 4 GB SD card.
The Following 2 Users Say Thank You to allnameswereout For This Useful Post: | ||
|
2009-01-14, 18:49
|
|
Posts: 4,930 |
Thanked: 2,272 times |
Joined on Oct 2007
|
#23
|
I just use Wikipedia Offline Reader to read the .xml.bz2. The first time you run it, it creates an index (*.idx.gz and *.blocks.idx), which takes a while. I haven't looked at the code, but if it was programmed smartly, it takes advantage of the fact that bzip2 data is compressed in independent blocks, so only the block holding a given article has to be decompressed.
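To illustrate the indexing idea, here is a minimal Python sketch. It is not the reader's actual code or index format. As a simplifying assumption, it writes each chunk as its own bz2 stream and records the byte offsets, whereas a real reader would have to locate block boundaries inside an existing .xml.bz2. The names `build_archive` and `read_chunk` are made up for this example.

```python
import bz2
import io


def build_archive(chunks):
    """Compress each chunk as its own bz2 stream and record where it starts.

    Returns (archive_bytes, index), where index is a list of
    (offset, compressed_length) pairs, one per chunk.
    """
    out = io.BytesIO()
    index = []
    for chunk in chunks:
        compressed = bz2.compress(chunk)
        index.append((out.tell(), len(compressed)))
        out.write(compressed)
    return out.getvalue(), index


def read_chunk(archive, index, n):
    """Decompress only chunk n, without touching the rest of the archive."""
    offset, length = index[n]
    f = io.BytesIO(archive)  # stands in for an open file on the SD card
    f.seek(offset)
    return bz2.decompress(f.read(length))


articles = [
    b"<page>Amsterdam</page>",
    b"<page>Nokia N810</page>",
    b"<page>Vancouver</page>",
]
archive, index = build_archive(articles)
second = read_chunk(archive, index, 1)  # only the second chunk is decompressed
```

The index is tiny compared with the archive, which fits with the small *.blocks.idx file: a lookup becomes one seek plus one small decompression instead of a scan through the whole file.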
A newer version of this encyclopedia is 402 MB as *.xml.bz2. It extracts to a 1.95 GB *.xml, which is more than 4 times as much, and it's all UTF-8/Unicode text.
|
2009-01-14, 19:22
|
Posts: 1,208 |
Thanked: 1,028 times |
Joined on Oct 2007
|
#24
|
The Following User Says Thank You to mikkov For This Useful Post: | ||
|
2009-01-15, 21:45
|
Posts: 110 |
Thanked: 52 times |
Joined on Sep 2007
|
#25
|
|
2009-01-17, 17:05
|
|
Posts: 3,397 |
Thanked: 1,212 times |
Joined on Jul 2008
@ Netherlands
|
#26
|
I installed the Wikipedia dump reader on my Linux box and opened the dump file there to create the index quickly, as suggested earlier in the thread. Two new files were created: enwiki-latest-pages-articles.blocks.idx (385 KB) and enwiki-latest-pages-articles.idx.gz (125 MB). Does this mean that I can copy these three files to an 8 GB microSD card (with adapter) and read Wikipedia offline on the N810? Has anyone else tried it with a dump this large? How slow is it to look up a page? (BTW, on my big laptop, creating the index files only took about 2 hours for all 4.1 GB. Based on someone else's comment above, it seems that creating an index for this file on the N810 would take about 4 days. Does this seem about right?)
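As a quick sanity check on the "will it fit" question, here is a small Python sketch using the sizes quoted above. It makes two assumptions that are not stated in the post: the sizes are decimal units (1 GB = 10^9 bytes), and the card is formatted as FAT32, which cannot hold a single file of 4 GiB (2^32 bytes) or more. The helper name `fits_on_card` is made up for this example, and a real card offers somewhat less usable space than its nominal capacity.

```python
FAT32_MAX_FILE = 2**32 - 1  # largest single file FAT32 can hold, in bytes


def fits_on_card(file_sizes, card_bytes, max_file_bytes=FAT32_MAX_FILE):
    """Check total size against the card and each file against the per-file limit.

    Returns (ok, total_bytes, names_of_files_over_the_limit).
    """
    total = sum(file_sizes.values())
    too_big = [name for name, size in file_sizes.items() if size > max_file_bytes]
    return (total <= card_bytes and not too_big), total, too_big


# Sizes as quoted in the post, assumed to be decimal units.
files = {
    "enwiki-latest-pages-articles.xml.bz2": 4_100 * 10**6,
    "enwiki-latest-pages-articles.idx.gz": 125 * 10**6,
    "enwiki-latest-pages-articles.blocks.idx": 385 * 10**3,
}

ok, total, too_big = fits_on_card(files, card_bytes=8 * 10**9)
```

If the 4.1 figure is actually in GiB, the dump would be over the FAT32 per-file limit, so it is worth checking the exact byte count with `ls -l` before copying.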
The Following User Says Thank You to allnameswereout For This Useful Post: | ||
|
2009-01-18, 02:37
|
Posts: 110 |
Thanked: 52 times |
Joined on Sep 2007
|
#27
|
|
2009-01-21, 17:25
|
|
Posts: 3,397 |
Thanked: 1,212 times |
Joined on Jul 2008
@ Netherlands
|
#28
|
|
2009-01-21, 23:13
|
|
Moderator |
Posts: 7,109 |
Thanked: 8,820 times |
Joined on Oct 2007
@ Vancouver, BC, Canada
|
#29
|
It would be nice, BTW, for anyone with any of these multi-GB bundles to post back with the uncompressed size, so others know before they download. (I find it odd that that info is not being posted with the downloads; you'd think it would be a key number for actual use... but it's omitted on both download.wikimedia.org and www.soschildrensvillages.org)
It has about 5500 articles (as much as can be fitted on a DVD with good size images) and is about the size of a twenty volume encyclopaedia (34,000 images and 20 million words).
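For anyone who wants to report the uncompressed size without filling a disk, here is a minimal Python sketch. The bzip2 format does not record the original size in its header, so the sketch streams through the file and counts bytes. The function name `uncompressed_size` and the sample file are made up for this example; for a multi-GB dump it will take as long as a full decompression.

```python
import bz2
import os
import tempfile


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Return the decompressed size of a .bz2 file in bytes.

    The data is read in chunks and discarded, so nothing is written to disk.
    """
    total = 0
    with bz2.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Small self-check with a throwaway file; for a real dump, pass its path instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.xml.bz2")
    with open(sample, "wb") as f:
        f.write(bz2.compress(b"<page>text</page>\n" * 100_000))
    size = uncompressed_size(sample)
```

On a Linux box, `bzcat file.xml.bz2 | wc -c` gives the same number.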
The Following User Says Thank You to qole For This Useful Post: | ||
|
2009-01-22, 03:15
|
Posts: 110 |
Thanked: 52 times |
Joined on Sep 2007
|
#30
|