    Evopedia dump at home

    crei | # 1 | 2010-03-10, 19:32

    Hi!

    Version 3 of Evopedia, the offline Wikipedia reader for (not only) maemo has now been available in Extras-testing for quite a while. Unfortunately there is no recent dump for the English edition of Wikipedia. Evopedia uses a compressed database of pre-rendered article pages (called dump) to deliver articles even faster than when reading them online. The drawback of this strategy is the amount of time needed to pre-render every single article when creating such a dump. This is the reason why there is no current English dump yet.
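
    To illustrate the idea (just a sketch, not evopedia's actual .dat/.idx format): each article is rendered to HTML once, compressed and appended to a data file, and an index maps the title to its position, so a lookup on the device is only an index read plus a decompression.

    Code:
    import bz2
    import json

    # Toy example of a "dump": pre-rendered articles stored compressed in a
    # data file, with a separate index mapping titles to (offset, length).
    def build_dump(articles, dat_path, idx_path):
        index = {}
        with open(dat_path, "wb") as dat:
            for title, html in articles.items():
                blob = bz2.compress(html.encode("utf-8"))
                index[title] = (dat.tell(), len(blob))
                dat.write(blob)
        with open(idx_path, "w", encoding="utf-8") as idx:
            json.dump(index, idx)

    def read_article(title, dat_path, idx_path):
        with open(idx_path, encoding="utf-8") as idx:
            offset, length = json.load(idx)[title]
        with open(dat_path, "rb") as dat:
            dat.seek(offset)
            return bz2.decompress(dat.read(length)).decode("utf-8")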

    To remedy this situation, dump at home was created: A system for distributed rendering of Wikipedia pages. If you have a Linux computer with some spare CPU cycles and want to have a recent English (or any other edition, but at the moment, English has priority) Wikipedia dump, please consider joining the project. More information is available on the dump at home project site: http://dumpathome.evopedia.info/contribute
    Note that the platform is still in some kind of beta state, so please forgive me if there are still some bugs (and please also report them).
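
    For anyone curious what a contributing client roughly does: it asks the server for a slice of articles, renders them with the local MediaWiki installation and uploads the compressed result. A very rough sketch of that loop follows; the URL, endpoints and the render command are only placeholders, not the real dump at home protocol.

    Code:
    import subprocess
    import time
    import urllib.request

    SERVER = "http://dumpathome.example.org"   # placeholder, not the real server

    def run_node():
        while True:
            # 1. Ask the coordinator for the next slice of articles to render.
            job = urllib.request.urlopen(SERVER + "/next-job").read().decode().strip()
            if not job:
                time.sleep(60)                  # no work available, try again later
                continue
            # 2. Render the slice with the local MediaWiki dump tools
            #    ("render-slice.sh" is a stand-in for the real render step).
            subprocess.run(["./render-slice.sh", job], check=True)
            # 3. Upload the compressed result back to the coordinator.
            with open("slice-" + job + ".tar.bz2", "rb") as result:
                urllib.request.urlopen(SERVER + "/upload?job=" + job, data=result.read())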

    Thanks for your time and interest in the project.


    Last edited by crei; 2010-03-10 at 19:36.

     
    jebba | # 2 | 2010-03-10, 22:19

    Cool! Here's a quick and dirty script that works with Ubuntu Karmic (and likely any Debian-based system):

    http://gitorious.org/freemoe/freemoe...-evopedia-node

    I just started it on an Amazon EC2 node. It appears to be working. When it starts generating results I'll launch more nodes.


     
    nidO | # 3 | 2010-03-12, 14:35

    Am I right in thinking this isn't terribly stable at the moment? I left this running on two VMs overnight and each processed a few chunks fine, then both downloaded new copies of the 11.3GB dump, and since then they're throwing out various mediawiki/database errors every time they are given a new job to process.


     
    crei | # 4 | 2010-03-12, 15:38

    Originally Posted by nidO
    Am I right in thinking this isn't terribly stable at the moment? I left this running on two VMs overnight and each processed a few chunks fine, then both downloaded new copies of the 11.3GB dump, and since then they're throwing out various mediawiki/database errors every time they are given a new job to process.
    Thank you for setting up the VMs. I can see that your clients upload zero-length archives, and I also received your error logs. If only the user database is damaged, the issue should be fixed by the next automatic update. If important databases (i.e. databases with content) are damaged, you should remove the files "wikilang", "wikidate" and "commonsdate" in the state subdirectory to force the client to fetch new database files.
    If you want to check whether everything is OK, you can point your Apache at the MediaWiki directory and open it in the browser.
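
    If it helps, here is a small snippet that does exactly that reset; it only assumes the state files live in a subdirectory called "state" next to the client (adjust STATE_DIR if yours is elsewhere):

    Code:
    import os

    STATE_DIR = "state"   # path of the client's state subdirectory

    # Remove the dump-version markers so the client fetches fresh database
    # files on its next run.
    for name in ("wikilang", "wikidate", "commonsdate"):
        path = os.path.join(STATE_DIR, name)
        if os.path.exists(path):
            os.remove(path)
            print("removed", path)
        else:
            print("not present:", path)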


     
    nidO | # 5 | 2010-03-12, 16:16

    Thanks for the info. It looks like my copies are broken: both clients did update a short while ago, but I'm still getting one of two faults, and which one appears varies each time I kick off the client:
    - "Incorrect information" errors for ./wikidb/user.frm (the user table seems to be empty, even after the client update)
    - Spurious "MediaWiki internal error. Exception caught inside exception handler" faults after the client is assigned a job and the static HTML dump folder is created.

    So I'm now re-downloading both the commons and en data from scratch and will see how it gets on.


     
    jebba | # 6 | 2010-03-13, 17:17

    Originally Posted by nidO
    Thanks for the info. It looks like my copies are broken: both clients did update a short while ago, but I'm still getting one of two faults, and which one appears varies each time I kick off the client:
    - "Incorrect information" errors for ./wikidb/user.frm (the user table seems to be empty, even after the client update)
    - Spurious "MediaWiki internal error. Exception caught inside exception handler" faults after the client is assigned a job and the static HTML dump folder is created.

    So I'm now re-downloading both the commons and en data from scratch and will see how it gets on.
    Are you on a 64-bit system? You may need to install libc6-i686 (Debian) or glibc.i686 (Fedora).


     
    nidO | # 7 | 2010-03-15, 15:56

    Originally Posted by jebba
    Are you on a 64-bit system? You may need to install libc6-i686 (Debian) or glibc.i686 (Fedora).
    The VMs are 32-bit. Dump generation is technically functioning: over the space of two days I managed to get roughly 150MB of work done on each of the two VMs. But for some reason, every few hours each VM would submit a completed slice back to the system and then get a new job telling it to reload one or both of the commons and en wikidumps, despite the local commonsdate and wikidate files already storing the same dump version (i.e. the dumpathome client was deciding to re-download and overwrite the existing local dumps with exactly the same freshly downloaded dump every few hours).
    After doing so, the client seems to have about a 50% chance of either carrying on processing fine or starting to throw out the errors listed above. Presumably something is amiss with having to re-download the wiki dumps so frequently in the first place; I can't see any real reason they would need re-downloading at all, and re-downloading the roughly 15GB of dumps twice a day takes up more time than the VMs were able to spend actually processing (while both were working, they averaged a slice roughly every 10 minutes between them).
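
    What I would have expected the client to do before re-downloading is something like the check below. This is pure guesswork about the client's internals; the only thing taken from the above is that the wikidate/commonsdate files store the locally installed dump version:

    Code:
    import os

    def needs_download(state_dir, marker, server_version):
        # Re-fetch only when the locally recorded dump version differs from
        # the one the server asks for; "marker" is "wikidate" or "commonsdate".
        path = os.path.join(state_dir, marker)
        if not os.path.exists(path):
            return True                         # nothing downloaded yet
        with open(path, encoding="utf-8") as f:
            local_version = f.read().strip()
        return local_version != server_version

    # e.g. needs_download("state", "wikidate", "20100301") would be False if
    # the local en dump already is the 2010-03-01 one.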


     
    crei | # 8 | 2010-03-18, 18:09

    We created the first dump! Thanks to all who helped! The dump can be downloaded from http://wiki.maemo.org/Evopedia (I hope the archive does not break again...).

    I'm currently uploading the commons image database and I think we will start the Dutch Wikipedia on the weekend.


     
    alessandra | # 9 | 2010-08-25, 14:30

    I installed evopedia with the Italian dump, but when I open the application to search, it says it cannot find anything. What should I do? I can see the dump files in the file manager, with .idx and .dat extensions. Please help, thanks.


     