
    N900, ohmd, syspart, VM & swap tweaks

    Page 2 of 3
    jurop88 | # 11 | 2011-03-17, 12:22 | Report

    Originally Posted by hawaii View Post
    Moving hildon-sv-notification-daemon out of [mediasrc] closes the socket and doesn't allow any sound?
    Why should it? As far as I understood, changing syspart.conf just changes resource utilization on a per-process basis, and that's the whole reason for ohmd's presence. So it should simply lower their priority.
    What I can affirm is that on my machine, in its current state, the notification balloons are now delayed (by as much as 5 or 10 seconds) while chatting. I don't know about emails, but I hear both the vibration and the notification sounds.

    The Following 2 Users Say Thank You to jurop88 For This Useful Post:
    Estel, vi_

     
    Dark_Angel85 | # 12 | 2011-03-17, 14:32 | Report

    If your configurations really make the N900 snappier and more responsive without sacrificing anything else, they should really be in a wiki or included via Swappolube or something...

    This is just great work that you've done. Marvellous!


     
    shadowjk | # 13 | 2011-03-20, 05:33 | Report

    As for the kernel reporting the mmcblk blocksize as "512k": it's not. It's saying the logical blocksize is 512 bytes. This is meaningless for your purposes, though; it only tells you the smallest request size the MMC will accept. Internally it then translates a 512-byte write into a read-modify-erase-write cycle of 128k or 256k, whatever its true block size is.

    This brings us to the "noop" scheduler issue. You are correct that there are no moving parts, but the huge blocksize calls for scheduling writes close to each other anyway, to minimize the number of read-modify-erase-write cycles the MMC/uSD has to do.

    Imagine the kernel sends a request to write 4k at position 2M, then 4k at position 8M, then 4k at 2M+4k, 4k at 8M+4k, and so on. Each request makes the uSD/eMMC internally read 128k (assuming that's the true erase-block size), change 4k of that 128k, erase another 128k block, and write the 128k to that block. That's a write amplification factor of 32. Divide the nominal raw write rate of 6 MB/s for Class 6 by 32 and you get an estimated 192 kilobytes/sec...
    So ideally we'd want an elevator that knows about the special properties of flash, but we don't have one, so we use CFQ, which at least has some heuristics for distributing IO "fairly" between processes.
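    The write-amplification arithmetic above can be sketched as a quick back-of-the-envelope estimate (the 128k erase block and the nominal 6 MB/s Class 6 rate are the figures assumed in this thread, not measured values):

```python
# Back-of-the-envelope write-amplification estimate for scattered
# 4k writes on flash with a 128k erase block (figures assumed above).
ERASE_BLOCK_KB = 128
REQUEST_KB = 4
amplification = ERASE_BLOCK_KB // REQUEST_KB      # each 4k write costs a 128k cycle

raw_rate_kb_s = 6 * 1024                          # nominal Class 6: 6 MB/s
effective_kb_s = raw_rate_kb_s // amplification   # what actually reaches the card

print(amplification, effective_kb_s)              # 32 192
```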


    Incidentally, this is also where the explanation for why moving swap to uSD seems to improve performance begins.

    The heaviest loads for the eMMC are swap and anything that uses databases like SQLite. That includes the dialer and Conversations, the calendar, and many third-party apps. Why is this a heavy load? Because these things typically write tiny amounts of data and then request fsync() to ensure the data is on the disk. This triggers the write-out of all unwritten data in memory and the updating of all the filesystem structures. Remember that a tiny amount of data spread out randomly triggers a massive amount of writing internally to the eMMC. Worse, while this goes on, all other requests are blocked.
    And what else besides /home and swap is on the eMMC? /opt, containing, these days, both apps and vital parts of the OS. The CPU is starved for data, waiting for requests to be written out so that the requests for the demand-paged executable code of apps can complete.
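    The tiny-write-plus-fsync pattern described above can be illustrated with a minimal sketch (the temp file and 64-byte payload are hypothetical stand-ins; a real SQLite client issues the fsync internally on every commit):

```python
import os
import tempfile

# A tiny payload followed by fsync(): the pattern that forces the device
# to commit data immediately, which on flash triggers a full
# read-modify-erase-write cycle for just a few dozen bytes.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"x" * 64)        # a few dozen bytes, like a small DB update
    f.flush()                 # push from userspace buffers to the kernel
    os.fsync(f.fileno())      # force the kernel to push it to the device
print(os.path.getsize(path))  # 64
os.remove(path)
```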

    Btw, for Harmattan I'm told SQLite will be using a more optimized DB that essentially works like one gigantic journal. Sequential writing is fast and good on flash; random in-place updates are bad.

    Moving swap to uSD gives swap a path that is almost always free (unless you make heavy accesses to the uSD by other means), and offloading swap from the eMMC means less random IO load on the eMMC.

    The Following 7 Users Say Thank You to shadowjk For This Useful Post:
    bobbydoedoe, Estel, joerg_rw, jurop88, pkz, reinob, Xagoln

     
    bipinbn | # 14 | 2011-03-20, 06:00 | Report

    @jurop88

    Lots of respect and thanks. That's fantastic; you have put in a lot of mind-blowing effort.

    It took me three reads just to understand the things you have tried out.

    Very impressive. Hope you do some more R&D so we can make the N900 even better

    Thanks


     
    jurop88 | # 15 | 2011-03-20, 14:32 | Report

    Hi Shadowjk,

    thank you for your participation.

    Originally Posted by shadowjk View Post
    As for the kernel reporting the mmcblk blocksize as "512k": it's not. It's saying the logical blocksize is 512 bytes. This is meaningless for your purposes, though; it only tells you the smallest request size the MMC will accept. Internally it then translates a 512-byte write into a read-modify-erase-write cycle of 128k or 256k, whatever its true block size is.
    Fair enough, and rather consistent with what I found on the internet. Two questions:
    1) Why does "512k" mean 512 bytes? Can you point me somewhere, perhaps in the kernel source? I just started digging into the matter and found relevant code in the MMC driver (I hope I am on the right path to understanding something), but I must admit my C knowledge is rather rusty
    2) Where do I find the true HW block size? Is there a place where it is reported, or must I get it directly from the uSD manufacturer?
    The 128k size, though, explains why the Nokians chose to set page-cluster to 5: 2^5 = 32 pages × 4 KB = 128 KB, and that's it

    Originally Posted by shadowjk View Post
    This brings us to the "noop" scheduler issue. You are correct that there are no moving parts, but the huge blocksize calls for scheduling writes close to each other anyway, to minimize the number of read-modify-erase-write cycles the MMC/uSD has to do. Imagine... (CUT)
    From Wikipedia,
    Originally Posted by
    The NOOP scheduler inserts all incoming I/O requests into a simple, unordered FIFO queue and implements request merging
    This means, AFAIU, that when a block is ready to be written (request merging), it is written and the memory is freed.
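    The NOOP behaviour described in the quote above (a simple FIFO plus request merging) can be sketched in a few lines. This is a simplified model, not kernel code: requests are hypothetical (start, length) byte ranges, and only back-merging with the tail of the queue is shown.

```python
# Simplified NOOP-style queue: FIFO order, merging a request into the
# previous one when it is contiguous with it (back-merge only).
def noop_queue(requests):
    queue = []
    for start, length in requests:
        if queue and queue[-1][0] + queue[-1][1] == start:
            prev_start, prev_len = queue[-1]
            queue[-1] = (prev_start, prev_len + length)  # merge contiguous request
        else:
            queue.append((start, length))                # plain FIFO append
    return queue

# Two contiguous 4k writes merge into one 8k write; the distant one stays separate.
print(noop_queue([(0, 4096), (4096, 4096), (16384, 4096)]))
# [(0, 8192), (16384, 4096)]
```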
    Wikipedia again,
    Originally Posted by
    CFQ works by placing synchronous requests submitted by processes into a number of per-process queues and then allocating timeslices for each of the queues to access the disk. The length of the time slice and the number of requests a queue is allowed to submit depends on the IO priority of the given process (...) It can be considered a natural extension of granting IO time slices to a process
    So it doesn't work at a 'write as few blocks as possible to the uSD' level; the goal is to give all processes a time slice, 'hoping' that most writing and reading will be done in the same area. I gave the code a quick read, and it looks like the 'elevator' part carries a huge weight, allowing some backtracking (I am not an expert in this area, so take everything with a grain of salt). The overhead is considerable, and at first sight brings almost no advantage on an IO device where no mechanical parts are moving.
    After using the settings on the first page for some days, I have to say that with NOOP the fragmentation is probably bigger, but the feeling is that it works faster UNTIL IT WORKS. Another member of the forum (I don't remember precisely who) set up a nightly swap rotation to avoid this fragmentation, and I can confirm that after two days my N900 started 'choking', and a swapoff/swapon cycle let it fly again, consistent with attributing the issue to swap fragmentation.

    Originally Posted by shadowjk View Post
    So ideally we'd want an elevator that knows about the special properties of flash, but we don't have one, so we use CFQ, which at least has some heuristics for distributing IO "fairly" between processes.
    The argument is that we don't care about 'per-process' I/O, but about exactly-128 KB writes, in order to speed them up as much as possible.
    What we ideally need is a scheduler saying:
    Code:
    - kernel: we need some free room.
    - scheduler: OK, let's have a look at the discardable pages. Here they are. Just a second, please.
    - scheduler picks exactly 128 KB ready for writing (and that's the page-cluster tunable at the kernel level, right?)
    - scheduler frees the requested memory with a single page write
    - scheduler: here I am again, the memory you requested is free
    - kernel: thank you
    The fact that lots of pages then end up fragmented does not matter, since the reading penalty is very low compared to - for example - an HD.
    I have already found an example of a NOOP scheduler written in C on the internet, and it does not look too hard to implement. Here we are speaking of brute force, not high math. A simple modified NOOP algorithm, good for flash, could look like:
    Code:
    - check if the page to be unloaded is already cached and not dirty, or already in the current queue
        if yes -> load the requested page and discard the unloaded one
        if no -> put the page to be unloaded in the queue and serve the page to be loaded
    - is the queue 128 KB?
        if yes -> write it out and update the table of swapped pages
        if no -> job done
    I know that the real writing will be performed by the uSD HW controller, but why on earth would the HW controller split a perfectly aligned 128 KB write? Your thoughts? Any kernel gurus in the neighbourhood? Am I missing something? It looks too simple for nobody to have thought about it...
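    The modified NOOP idea sketched above can be modelled in a few lines (a toy model only; the 128 KB erase block and 4 KB page size are the figures assumed in this thread, and the kernel's real swap write-out path is far more involved):

```python
ERASE_BLOCK = 128 * 1024
PAGE = 4 * 1024
PAGES_PER_BLOCK = ERASE_BLOCK // PAGE   # 32 pages per aligned write

class FlashSwapQueue:
    """Toy model: queue page write-outs until a full erase block is ready."""

    def __init__(self):
        self.pending = []   # pages waiting to be written out
        self.writes = 0     # completed, aligned 128 KB writes

    def unload_page(self, page):
        self.pending.append(page)
        if len(self.pending) >= PAGES_PER_BLOCK:
            # One perfectly aligned 128 KB write instead of 32 small ones.
            self.writes += 1
            self.pending.clear()

q = FlashSwapQueue()
for n in range(33):
    q.unload_page(n)
print(q.writes, len(q.pending))   # one full write done, one page still queued
```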

    Originally Posted by shadowjk View Post
    Moving swap to uSD gives a path for swap that is always free (well almost always unless you do heavy acesses to uSD by other means), and offloading swap from emmc means less random IO load on the emmc.
    This sounds very reasonable and is consistent with other findings.

    On a side note, I am digging into the ohmd & cgroups realm and I am happy to have learnt a lot of things - the parameters on the first page will probably be tuned again after some days of usage, once I've looked at the patterns that arise in terms of load and memory used.
    EDIT - Oh, and I forgot to report this: https://bugs.maemo.org/show_bug.cgi?id=6203 where many hints on ohmd & syspart are given!


    Last edited by jurop88; 2011-03-20 at 15:23. Reason: final addition
    The Following 2 Users Say Thank You to jurop88 For This Useful Post:
    Estel, vi_

     
    jurop88 | # 16 | 2011-03-20, 21:21 | Report

    Hehe, it looks like I created some confusion between kswapd and the IO scheduler - still learning a lot in this illness period

    The Following 3 Users Say Thank You to jurop88 For This Useful Post:
    bobbydoedoe, Estel, vi_

     
    ivgalvez | # 17 | 2011-03-24, 12:20 | Report

    Hi jurop88,

    I've spent at least 20 minutes trying to find this thread again, as I'm doing some experiments with information that is split across multiple threads:
    1. Swappluble Wiki
    2. Massive interactivity improvement under high I/O load!
    3. Striping swap to increase performance under memory contention
    4. Nokia N900 Smartphone Performance Optimization Tune-up Utilities
    5. Swappolube to lubricate your gui

    And this one

    Have you made any more progress?

    The Following User Says Thank You to ivgalvez For This Useful Post:
    Sourav.dubey

     
    jurop88 | # 18 | 2011-03-24, 19:21 | Report

    Originally Posted by ivgalvez View Post
    Hi jurop88,

    I've spent at least 20 minutes trying to find this thread again, as I'm doing some experiments with information that is split across multiple threads:
    (...)
    Have you made any more progress?
    It was almost the same goal on my side. Now I'm back at my job, so the pace has slowed, but what I can say is that Nokia's engineers already did a lot of work on the subject, and the phone was probably already well optimized for the general use case.
    Since writing the original post I have made some slight modifications, but I have not updated them here yet. Perhaps I will do it over the weekend

    The Following 2 Users Say Thank You to jurop88 For This Useful Post:
    Estel, ivgalvez

     
    epitaph | # 19 | 2011-03-25, 07:56 | Report

    Originally Posted by ivgalvez View Post
    Hi jurop88,

    I've spent at least 20 minutes trying to find this thread again, as I'm doing some experiments with information that is split across multiple threads:
    (...)
    Have you made any more progress?
    You'll want to look at the BFS kernel thread, the mlocker thread (in my signature) and the 4-Line-Cgroup-Patch, too!


    Last edited by epitaph; 2011-03-25 at 09:24.

     
    epitaph | # 20 | 2011-05-29, 13:48 | Report

    > partition desktop memory-limit 70M

    With cgroups mounted, I noticed that the desktop group only needs 25M.

    So it's better to write partition desktop memory-limit 25M

    or echo "25M" > /dev/cgroup/cpu/desktop/memory.limit_in_bytes.


     
vBulletin® Version 3.8.8