Posts: 257 | Thanked: 1,182 times | Joined on Aug 2016 @ Estonia
#1
I would like to announce a system monitoring solution developed to be lightweight and to provide information relevant to mobile devices. It's based on the Sailfish port of collectd, on rrdtool, and on the GUI SystemDataScope.

Examples of the questions this solution can answer: Does my phone enter sleep? For how long? Which CPU frequencies are used? When did my RAM run out? What is the battery current? How much cellular traffic did I use? All these questions are relevant for monitoring different aspects of your device's performance, for example when determining the reasons behind battery drain.

The solution is meant to run 24x7 without any noticeable impact on battery life, CPU, RAM, or storage usage.

collectd

Homepage: https://collectd.org/
Sailfish port: https://github.com/rinigus/collectd
Packages: https://openrepos.net/content/rinigus/collectd

rrdtool

Sailfish packaging scripts: https://github.com/rinigus/pkg-rrdtool
Packages: https://openrepos.net/content/rinigus/rrdtool

SystemDataScope

Homepage: https://github.com/rinigus/systemdatascope
Packages: https://openrepos.net/content/rinigus/systemdatascope

Screenshots: see the OpenRepos package descriptions of SystemDataScope and collectd.

When in use, the data is recorded by collectd and stored in RRD datasets. collectd runs as a daemon that should be enabled on installation. The data can be visualized with SystemDataScope, which uses rrdtool to generate graphs. SystemDataScope also shows a selected graph on its cover, allowing you to follow recent data (whether your device is entering CPU sleep, for example).

The goal is to keep records that cover several time windows, from hours and days up to a year in the default configuration. The data can be viewed on the device, and the GUI also lets you generate reports that you could send as feedback on relevant forums.

The collected data covers CPU usage and sleep, battery, RAM and storage, network, radios, system load and processes.

In addition to overall system data, you can also follow specific apps and see their CPU, RAM, and I/O usage. This is useful if you develop an app and want to profile it.

Used resources

collectd has a very small impact on CPU (~0.1% of wall time), RAM (~15MB RSS for collectd and the datasets kept in RAM), and storage (~10MB for all default datasets). collectd wakes up the device once every 2.5 minutes to perform the readout. SystemDataScope's main impact is its RAM usage, ~55MB RSS, which is used by the GUI and by rrdtool running in the background. Its CPU usage is negligible when minimized (the cover is redrawn once every 2 minutes) and moderate while you scroll through the graphs. As such, I expect it to have no noticeable impact.


Current state

In general, everything is expected to work. There are several plugins that I developed for collectd which are not yet merged upstream. Judging by earlier experience with other plugins, I expect that the data recorded by these plugins may change, in which case users would have to remove the old datasets recorded by them. Once everything is merged upstream, this inconvenience will disappear.

If something does not work, please report bugs by opening an issue on GitHub or by posting here.

Licenses: Open Source, see corresponding package or module.

Last edited by rinigus; 2016-10-08 at 17:16.
 

Posts: 74 | Thanked: 153 times | Joined on Sep 2016 @ Yekaterinbourg, Russia
#2
Hello, rinigus.
Thanks for the useful application, but sometimes the charts show Battery_current=0 A and Battery_power_consumption=hundreds of mW over the same time period (a few hours). It's strange.
 
Posts: 257 | Thanked: 1,182 times | Joined on Aug 2016 @ Estonia
#3
Sounds strange indeed. Since several layers interact, the error could be in any of them (GUI, collectd, or statefs). When you see zero current, would you mind checking whether statefs reports it correctly:

cat /run/state/namespaces/Battery/Current

If that shows a correct non-zero value and you still get 0 in the graphs, then we should look further. For that, maybe you could also post some example graphs (you can generate them with Gui/Report), your current status (Gui/Status), and /etc/collectd.conf? The easiest way to do that is to open an issue at https://github.com/rinigus/systemdatascope
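For anyone who would rather script this check than type it in a terminal, here is a minimal Python sketch. It only reads the statefs path quoted above; the function names are made up for illustration, the unit of the stored value is not assumed, and off-device the file simply will not exist.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the post above; it only exists on a device running statefs.
STATEFS_BATTERY_CURRENT = Path("/run/state/namespaces/Battery/Current")


def read_statefs_value(path: Path = STATEFS_BATTERY_CURRENT) -> Optional[float]:
    """Return the numeric content of a statefs property file, or None."""
    try:
        return float(path.read_text().strip())
    except (OSError, ValueError):
        # Missing/unreadable file, or empty/non-numeric content.
        return None


def describe(value: Optional[float]) -> str:
    """Classify a reading along the lines discussed in the post above."""
    if value is None:
        return "no reading"
    if value == 0:
        return "statefs itself reports zero"
    return "statefs reports a non-zero value"


# On a machine without statefs this prints "no reading".
print(describe(read_statefs_value()))
```

If this prints a non-zero result while the graph shows 0 for the same period, the problem is more likely in collectd or the GUI than in statefs.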

cheers,

rinigus
 
Posts: 74 | Thanked: 153 times | Joined on Sep 2016 @ Yekaterinbourg, Russia
#4
Originally Posted by rinigus View Post
If that shows correct non-zero value and you get 0 in the graph... You could easily do that by opening an issue on https://github.com/rinigus/systemdatascope
Yes, I'm sure the current has a correct non-zero value; it's 0 only in the graph. This happens in standby mode during the night hours. The big value for power consumption is strange. IMHO this value is carried over from the past and does not change to XX mA.
Tonight it was better for current/power_consumption, i.e. 0 mA/98 mW in the graphs. That gives 98 mW/4.15 V≈23.6 mA. That's excessive for standby, but this is another problem... Maybe the value of the current stored in the history is different?..

I'll try GitHub, though it's not very simple :-) I don't know how to add a screenshot here from a PC.
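The estimate above is just I = P / V. As a small sketch in Python (the function name is made up for illustration, and the 4.15 V battery voltage is the figure used in the post, not a measured constant):

```python
def current_ma(power_mw: float, voltage_v: float) -> float:
    """Current in mA from power in mW and voltage in V (I = P / V)."""
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    return power_mw / voltage_v


# Values from the post above: 98 mW at 4.15 V.
print(f"{current_ma(98, 4.15):.1f} mA")  # prints 23.6 mA
```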
 

Posts: 257 | Thanked: 1,182 times | Joined on Aug 2016 @ Estonia
#5
Morning! You may have solved this problem then.

With screenshots, I usually mail them to myself and then upload them from a PC. But with the latest version of SystemDataScope you can generate a report (Pulley menu, Report). When you call this function, all defined graphs are saved as PNG files under /home/nemo/Documents/SystemDataScope/[DateTime]. You can see them in the Gallery app among your photos. Select and send the ones that are needed and, after sending them, delete the folder /home/nemo/Documents/SystemDataScope. That also cleans all the graphs out of your Gallery. The main advantage of the report graphs is that they are drawn on a white background and each file contains exactly one graph. If it's easier to paste the pictures on talk.maemo.org, please do so.

Coming back to your problem: it could be caused by your device being in deep sleep. In general, that is very good news, and the small annoyance with the graphs should not distract us from the fact that your device is behaving as intended.

If your explanation is right (which it probably is), you should see very long times under CPU details/CPU sleep details/Duration of a single suspend. By long times I mean anything significantly longer than 150 seconds. If you are going to post the graphs, please also post "CPU sleep" and "Duration of a single suspend".

The problem that you see could be a more general issue that is hard to solve at present. In general, data is acquired by collectd in several threads and written to RRDs. On a PC, where the CPU is always on, that works very well. On Sailfish, I have to wake up the device using the keepalive library, read out the data, and let the device go back to sleep. I suspect that in your case, and it has happened on my phone as well, the phone sometimes either does not wake up (keepalive event is not fired?) or manages to fall asleep before the data is recorded. Since there is inevitable variability in such a wakeup/sleep cycle, I had to increase the allowed time window for RRD writing, which could lead to the effect that you see. Namely, old data gets interpolated over a longer period of time.
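To illustrate the effect of a wider write window, here is a deliberately simplified Python model. It is not rrdtool's or collectd's code: it only assumes that a reading is taken to represent the whole gap since the previous update as long as that gap fits inside the allowed window (called heartbeat here), and that the gap is marked unknown otherwise. The readings, the 3 h gap, and both heartbeat values are hypothetical.

```python
from typing import List, Optional, Tuple

Reading = Tuple[int, float]  # (timestamp in seconds, gauge value)


def spread_gauge(updates: List[Reading], heartbeat_s: int,
                 step_s: int) -> List[Tuple[int, Optional[float]]]:
    """Spread each reading back over the interval since the previous update.

    If the gap between two updates is within the heartbeat, the new reading
    fills the whole gap; otherwise the gap is marked unknown (None).
    Timestamps are assumed to be multiples of step_s.
    """
    series: List[Tuple[int, Optional[float]]] = []
    if not updates:
        return series
    prev_t = updates[0][0]
    for t, value in updates[1:]:
        gap = t - prev_t
        filled = value if gap <= heartbeat_s else None
        for i in range(1, gap // step_s + 1):
            series.append((prev_t + i * step_s, filled))
        prev_t = t
    return series


# Hypothetical power readings in mW: two normal wake-ups, then a 3 h gap
# without any stored reading, then one high reading on wake-up.
updates = [(0, 20.0), (150, 21.0), (10950, 98.0)]

strict = spread_gauge(updates, heartbeat_s=300, step_s=150)
relaxed = spread_gauge(updates, heartbeat_s=14400, step_s=150)

print(sum(v is None for _, v in strict))   # 72 steps marked unknown
print(sum(v == 98.0 for _, v in relaxed))  # 72 steps showing the one reading
```

With the narrow window the 3 h gap shows up as missing data; with the wide window a single reading covers the whole gap, which is the kind of flat line discussed in this thread.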

If the problem is that collectd is not able to record all the data during the awake window, this may get fixed when the upstream developers help me with the port. I submitted the pull request in the summer (https://github.com/collectd/collectd/pull/1736) but, since it's a rather complex problem, I haven't yet had a chance to work on it with the developers who know how to tackle collectd's multi-threaded internals.

The interpolation is a problem for values that are recorded as just "current values". Namely, you record a datapoint and hope that it's an adequate representation of the variable during that time window. Values in statefs represent "current values" and, as a result, can fluctuate a bit too much. I presume that's why you sometimes see such a high power consumption, which is then interpolated over the whole deep-sleep time window.

Fortunately, many values reported by the kernel use an additive approach. For example, the kernel keeps counters for network connections that are only incremented. For these values, collectd takes the derivative, which is a more accurate way to represent the network traffic irrespective of whether the device was sleeping in between or not. Such values should be represented better in your case as well (CPU sleep, for example).
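A small Python sketch of why counters survive missed readouts. This is not collectd's implementation, only the basic derivative idea; the byte counts and timestamps are hypothetical.

```python
from typing import List, Tuple


def counter_rates(samples: List[Tuple[int, int]]) -> List[Tuple[int, int, float]]:
    """Turn (timestamp_s, counter) samples into (t_start, t_end, rate_per_s).

    Pairs where time does not advance or the counter went backwards
    (for example after a reboot) are skipped.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0 or c1 < c0:
            continue
        rates.append((t0, t1, (c1 - c0) / (t1 - t0)))
    return rates


# Hypothetical byte counter: one normal 150 s interval, then a 3 h gap.
samples = [(0, 1000), (150, 16000), (10950, 124000)]
rates = counter_rates(samples)
print(rates)  # [(0, 150, 100.0), (150, 10950, 10.0)]

# The average rate over the gap still accounts for every byte transferred.
total = sum(rate * (t1 - t0) for t0, t1, rate in rates)
print(total == samples[-1][1] - samples[0][1])  # True
```

However long the device slept, the total recovered from the rates equals the counter difference, which is not true for a value sampled as a momentary reading.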

I hope that this explanation is helpful. If your explanation is right, there is nothing wrong with your device, and you have hit either an issue with collectd data recording or a case where Sailfish ignored the keepalive request and did not wake up in between. If it is a collectd problem, we'll get a chance to work on it as a part of https://github.com/collectd/collectd/pull/1736.

cheers,

rinigus
 

Posts: 74 | Thanked: 153 times | Joined on Sep 2016 @ Yekaterinbourg, Russia
#6
Hi, rinigus.
Many thanks for this detailed explanation. I'll see and try.
This battery story goes back to the N9. There is a good application for the N9 (and Symbian devices), EnergyProfiler. I know that the current in standby mode (only 2G on) is 7...9 mA for the N9 (2 mA for the N52), and for a 1450 mAh battery that gives 1450/9=161 h=6.7 days. The Aqua Fish battery has 2500 mAh, and for a current of 23 mA I get only 108.7 h=4.5 days. That is close to what I see in practice at the moment. It's bad.
I don't know why the consumption is three times higher, but I would like to reduce it. I know the difference between the N9 and the Jolla C, but... :-)
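The standby estimates above are capacity divided by average current. A minimal Python sketch with the numbers from this post (the function name is made up, and the model is idealized: it assumes the full rated capacity is usable and the current is constant):

```python
def standby_hours(capacity_mah: float, current_ma: float) -> float:
    """Idealized standby time in hours: capacity (mAh) / average current (mA)."""
    if current_ma <= 0:
        raise ValueError("current must be positive")
    return capacity_mah / current_ma


# Figures quoted in the post above.
for name, capacity, current in [("N9", 1450, 9), ("Aqua Fish", 2500, 23)]:
    hours = standby_hours(capacity, current)
    print(f"{name}: {hours:.1f} h = {hours / 24:.1f} days")
# N9: 161.1 h = 6.7 days
# Aqua Fish: 108.7 h = 4.5 days
```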

Last edited by XOleg; 2016-10-12 at 10:04.
 

Posts: 257 | Thanked: 1,182 times | Joined on Aug 2016 @ Estonia
#7
... and I started this project because the port of SFOS to the Nexus 4 did not last a full day on one charge. I don't know what it would be on full standby, since I have to use the phone.

I guess the obvious places to look at are:

* CPU sleep %
* how long a single sleep lasts
* whether sleep is interrupted frequently; check the number of forks per second
* whether suspend attempts have a high success rate; if there are many failures, you can try to debug why
* the distribution of CPU frequencies; usually, the lowest frequency dominates since your phone is mainly waiting for some network packet to arrive
* cellular / wifi radio signal strength

Good luck!

rinigus
 

Posts: 74 | Thanked: 153 times | Joined on Sep 2016 @ Yekaterinbourg, Russia
#8
@rinigus Battery_current and Battery_power_consumption look strange. The current is very low for this consumption. I'm not a programmer, and I don't understand how this is possible.
https://ptpb.pw/_Fyj.png
https://ptpb.pw/qSnv.png
https://ptpb.pw/fhZh.png
https://ptpb.pw/RD1P.png
https://ptpb.pw/5HVM.png
https://ptpb.pw/UMD2.png
https://ptpb.pw/G_T-.png
https://ptpb.pw/2W1S.png
https://ptpb.pw/IG8j.png
https://ptpb.pw/wCmQ.png
 

Posts: 257 | Thanked: 1,182 times | Joined on Aug 2016 @ Estonia
#9
@XOleg, thank you for posting the graphs. And thank you for using the Report feature, as this is exactly what it is intended for!

From the graphs I can see that the device wakes up once every ~150 s and probably spends about 4.5 s awake. I think that the reporting on the collectd side should be fine here.
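To put those two numbers in perspective, here is a rough Python sketch of the resulting wake budget. The 150 s period and 4.5 s awake time are the approximate values read from the graphs; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wake_budget(period_s: float, awake_s: float) -> dict:
    """Summarize a periodic wake-up cycle: duty cycle and daily totals."""
    if period_s <= 0 or not 0 <= awake_s <= period_s:
        raise ValueError("need 0 <= awake_s <= period_s and period_s > 0")
    wakeups = SECONDS_PER_DAY / period_s
    return {
        "awake_fraction": awake_s / period_s,
        "wakeups_per_day": wakeups,
        "awake_minutes_per_day": wakeups * awake_s / 60,
    }


budget = wake_budget(period_s=150, awake_s=4.5)
print(f"{budget['awake_fraction']:.1%} awake, "
      f"{budget['wakeups_per_day']:.0f} wake-ups/day, "
      f"{budget['awake_minutes_per_day']:.1f} min/day awake")
# 3.0% awake, 576 wake-ups/day, 43.2 min/day awake
```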

Maybe on your device the StateFS values are not updated during this period, and that's why you see a bogus 0 for the current. While you stated earlier that the value is non-zero during sleep, we would have to check it by calling cat on /run/state/namespaces/Battery/Current . The tricky part is that you have to do it while the device is in its deep-sleep phase. One way is to start a terminal and enter

sleep 15m; date; cat /run/state/namespaces/Battery/Current

and hope that it fires during the deep-sleep part (sleep would usually count only the time the CPU is awake). If it's zero, then there is nothing I can do: it's just reported wrongly by the OS. If it's some other value and collectd reports it wrongly, then we will look into it deeper.

Now, looking at your graphs, all seems to be fine with the exception of the CPU frequencies used. For some reason, your phone does not use frequencies below 800MHz. On my OnePlus X and Nexus 4, the lowest frequency was used the most. I think that's where you should get some improvement.
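For a quick look at the frequency distribution outside the GUI, here is a hedged Python sketch. It assumes the kernel exposes the standard Linux cpufreq statistics file /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state (lines of "frequency_in_kHz time_in_ticks"); that file is not mentioned in this thread and may be absent on a given kernel. The sample text below is made up so the example runs anywhere.

```python
from typing import Dict


def frequency_shares(time_in_state_text: str) -> Dict[int, float]:
    """Parse 'freq_khz ticks' lines and return the share of time per frequency."""
    ticks_by_freq: Dict[int, int] = {}
    for line in time_in_state_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        try:
            freq_khz, ticks = int(parts[0]), int(parts[1])
        except ValueError:
            continue
        ticks_by_freq[freq_khz] = ticks_by_freq.get(freq_khz, 0) + ticks
    total = sum(ticks_by_freq.values())
    if total == 0:
        return {}
    return {f: t / total for f, t in sorted(ticks_by_freq.items())}


# Hypothetical content, shaped like a device that mostly sits at 800 MHz.
sample = """200000 100
400000 300
800000 9000
1100000 600
"""

shares = frequency_shares(sample)
for freq_khz, share in shares.items():
    print(f"{freq_khz // 1000:>5} MHz: {share:.1%}")
print("dominant:", max(shares, key=shares.get) // 1000, "MHz")  # dominant: 800 MHz
```

On a device one would pass the real file content, for example `frequency_shares(open(path).read())`, instead of the sample string.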

I think I saw something about optimizing kernel settings on TJC. You may want to check there or on some AquaFish-related forums over here.

Please let us know whether changing the governor helps. It would be great to see the frequency distribution and whether the change helped your battery. Even if we cannot always get the current graph perfectly right, the reduction in battery % would already tell us something.
 
Posts: 96 | Thanked: 113 times | Joined on Apr 2012
#10
I cannot download SystemDataScope. It complains about libkeepalive-glib.rpm in the repos.
 