Posts: 1,414 | Thanked: 7,547 times | Joined on Aug 2016 @ Estonia
#110
Originally Posted by MartinK View Post
Sure! I've talked a bit more with the NLP lab people and they are indeed fine with both hosting and generating data for OSM Scout Server, as long as the data is hosted as static files that are batch-updated once in a while (every few days, etc.). As long as that condition is satisfied, it should be possible to host at least ~100 GB without transfer bandwidth limitations.

So now I need to know how I can "join" the OSM Scout Server data CDN with these resources. Are there some mirroring scripts I could add to cron, or something similar?

I can also help generate the data sets if needed on one of the NLP compute machines: a fairly beefy system (220 GB RAM & 48 CPU cores) running Fedora 24, with some 200-300 GB of local storage that could be used for the data-generation run.
Now a proper reply: thank you very much for the offer! It would be of great help, and I hope this service would not be abused. We now have some time to figure it out while my current CDN is up.

As it is, I can process the planet reasonably fast on the system available to me. It's not as beefy, but it is made for serious calculations as well. Maybe later we could move data processing over too, but there is no need for that right now. Unless they are interested in working on address parsing and the NLP approach behind libpostal; then we can discuss it separately.

At present, I use ~40 GB for all datasets. I presume that during dataset version changes (version incompatibilities) we would need double that space for a week or two. If other backends are added, the data requirement may increase, but let's see about that.
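As a quick sanity check of the numbers above (a sketch only; the figures are taken straight from this thread):

```python
# Does the worst-case storage need fit within the ~100 GB offered?
current_total_gb = 40   # all datasets today
transition_factor = 2   # old + new copies coexist during a version change
offered_gb = 100        # hosting capacity mentioned in the quoted offer

worst_case_gb = current_total_gb * transition_factor
print(f"worst case: {worst_case_gb} GB, "
      f"headroom: {offered_gb - worst_case_gb} GB")
# → worst case: 80 GB, headroom: 20 GB
```

So even during a version transition we would stay comfortably under the offered capacity.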

Datasets are served statically. In essence, we have a tree with different backends (osmscout, postal, geocoder-nlp) and a JSON file that describes them. By changing a line at https://github.com/rinigus/osmscout-...ded.json#L7112 , I could move all download requests to a new location (when you hit "Update list" in Map Manager, OSM Scout Server downloads that file from GitHub).
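To illustrate the redirect, here is a minimal Python sketch that rewrites dataset URLs in a list of entries. The field name ("url"), the hosts, and the entry layout are assumptions for illustration, not the actual schema of that JSON file:

```python
import json

# Hypothetical base URLs: the old CDN and the new static-file host.
OLD_BASE = "https://old.cdn.example.org/"
NEW_BASE = "https://new.cdn.example.org/"

def redirect(entries):
    """Point every dataset URL at the new host, keeping relative paths."""
    for entry in entries:
        if entry.get("url", "").startswith(OLD_BASE):
            entry["url"] = NEW_BASE + entry["url"][len(OLD_BASE):]
    return entries

# Illustrative entry only; real dataset names differ.
datasets = [{"name": "estonia-osmscout",
             "url": OLD_BASE + "osmscout/estonia.tar"}]
print(json.dumps(redirect(datasets), indent=2))
```

In practice the same effect is achieved by editing a single line in the served JSON, so clients pick up the new location the next time they refresh the list.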

I presume we would keep the conditions humane and update once a month?

So it boils down to how we move the data between the servers. Or, if that's too complicated, we would just need cron jobs running. But we can discuss that via PM/email. I'll get in touch with you tomorrow, though maybe rather late.
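If we go the cron-job route, the core loop could be as simple as the following Python sketch. The primary host, mirror path, and file list are placeholder assumptions, not the real CDN layout:

```python
import hashlib
import pathlib
import urllib.request

# Placeholder locations; the real primary server and mirror root differ.
PRIMARY = "https://primary.example.org/osmscout"
MIRROR_ROOT = pathlib.Path("/srv/osmscout-mirror")
FILES = ["provided.json", "osmscout/estonia.tar"]

def store(data: bytes, target: pathlib.Path) -> bool:
    """Write data to target; return True only if the file actually changed."""
    if (target.exists()
            and hashlib.sha256(target.read_bytes()).digest()
                == hashlib.sha256(data).digest()):
        return False  # identical content: keep the old file and mtime
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return True

def mirror_all() -> None:
    """One cron run: pull every listed file from the primary server."""
    for name in FILES:
        data = urllib.request.urlopen(f"{PRIMARY}/{name}").read()
        store(data, MIRROR_ROOT / name)
```

A crontab entry along the lines of `17 3 * * * python3 /usr/local/bin/mirror_osmscout.py` would then refresh the mirror nightly, which stays well within the "batch-updated every few days" condition from the offer. (In practice rsync over ssh would do the same job with less code, if both sides allow it.)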
 
