maemo.org - Talk (https://talk.maemo.org/index.php)
-   Development (https://talk.maemo.org/forumdisplay.php?f=13)
-   -   how no-lifeguard-reset works? (https://talk.maemo.org/showthread.php?t=83352)

AapoRantalainen 2012-03-30 23:21

how no-lifeguard-reset works?
 
When some 'essential' files (e.g. camera-ui) are removed, the phone goes into a reboot loop, which is then cured with:
Code:

flasher-3.5 --set-rd-flags=no-lifeguard-reset
What is really happening?
What causes the reboot loop (dsme?)? How are essential files defined? How can I add something to this list or remove it?

Why is it not recommended to set that rd-flag permanently?

I know that I can check the current flags with
Code:

cal-tool -f

AapoRantalainen 2012-04-04 09:14

Re: how no-lifeguard-reset works?
 
I have been testing my own build of hildon-desktop, and I found it very handy to kill the system's hildon-desktop and then start a non-installed version of hildon-desktop.
i.e.

as root
Code:

killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop

#And then start own version
(as user)
Code:

maemo-invoker ~/hildon-desktop.launch
I realized that this works if the phone has --set-rd-flags=no-lifeguard-reset set. But if it is not in RD-mode, the phone reboots when hildon-desktop dies.

My questions
* How does the system decide that hildon-desktop is a critical application?
* How can I see which applications are critical?
* How can I change the list of critical applications?

Consider my questions as if I were making my own operating system based on Maemo 5.

vi_ 2012-04-04 09:36

Re: how no-lifeguard-reset works?
 
Quote:

Originally Posted by AapoRantalainen (Post 1187844)
I have been testing my own build of hildon-desktop, and I found it very handy to kill the system's hildon-desktop and then start a non-installed version of hildon-desktop.
i.e.

as root
Code:

killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop
killall hildon-desktop ; /etc/osso-af-init/launch-wrapper.sh stop hildon-desktop /usr/bin/hildon-desktop

#And then start own version
(as user)
Code:

maemo-invoker ~/hildon-desktop.launch
I realized that this works if the phone has --set-rd-flags=no-lifeguard-reset set. But if it is not in RD-mode, the phone reboots when hildon-desktop dies.

My questions
* How does the system decide that hildon-desktop is a critical application?
* How can I see which applications are critical?
* How can I change the list of critical applications?

Consider my questions as if I were making my own operating system based on Maemo 5.

How does the system decide that hildon-desktop is a critical application?


An application is 'critical' when it is required for the N900 to run. Imagine hildon-desktop keeps crashing due to some weird bug with a widget or something. Either you have a mostly broken, semi-working device, or you just turn it off and back on again to sort the issue. Did you ever see that UK comedy 'The IT Crowd'? It is a comedy show about the IT support staff in a big company. Their running joke is 'have you turned it off and back on again?' The N900 automatically turns itself off and back on again when something goes massively wrong.

Hildon-desktop crashing/killed 5 times in a row? HOLY SHíT, SYSTEM IN MELTDOWN!! RESTART NOW!!

No camera program?? HOW?? RESTART NOW!!!!!11!


There are 2 items used to detect when the shít has hit the fan and trigger a reboot.

1. The watchdog timer (WD). This is a hardware counter that counts down to 0 (it may actually count up; this is just an example). Software keeps it from expiring by resetting it back to a starting value (like 255 or something). If that software crashes, it stops resetting the counter, the counter reaches 0 and the CPU triggers a reboot. The countdown takes around 15 seconds (hence the crash a short while after the BME stop/start trick used with the BQ27XXX module).

2. DSME, the Device State Management Entity. This is the program responsible for 'tickling' (resetting) the WD timer. When DSME stops tickling, the CPU stops lolling and a reboot is triggered.

So a process is 'critical' when it is started through DSME. When such a process fails a particular number of times, won't start, or crashes (all these rules are definable), DSME will force a reboot. If DSME itself fails or stops tickling, the WD will force a reboot.
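
The watchdog idea above can be sketched as a toy model in shell. This is not DSME's or the kernel's real code: the function name simulate_watchdog, the tick granularity and the "k"/"x" event notation are made up for illustration, and the real timer is hardware that this sketch never touches.

```shell
# Toy model of a watchdog counter (illustrative only, no hardware involved).
# $1 is the timeout in ticks; every further argument is one tick:
#   "k" means the supervising software kicked the watchdog in that tick,
#   anything else (e.g. "x") means the kick was missed.
# Prints "reset at tick N" when the counter reaches 0, otherwise "alive".
simulate_watchdog() {
    timeout=$1
    shift
    counter=$timeout
    tick=0
    for event in "$@"; do
        tick=$((tick + 1))
        if [ "$event" = "k" ]; then
            # A kick puts the counter back to its starting value.
            counter=$timeout
        else
            # A missed kick lets the counter run down by one.
            counter=$((counter - 1))
        fi
        if [ "$counter" -le 0 ]; then
            echo "reset at tick $tick"
            return 0
        fi
    done
    echo "alive"
}
```

With a timeout of 3 ticks, `simulate_watchdog 3 k x x k x x` prints `alive` because a kick always arrives before the counter runs out, while `simulate_watchdog 3 k x x x k` prints `reset at tick 4` because three kicks in a row were missed.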


How can I see which applications are critical?

Unfortunately DSME does not have a 'list' function. However, if you read ALL the scripts in:

Code:

/etc/init.d/
/etc/event.d/
/etc/X11/Xsession.post
/etc/X11/Xsession.d

You will find various programs being started by DSME including hildon desktop and camera-ui etc.

Start by looking in the xsession folders.
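
Reading every script by hand is tedious, so a small helper can narrow the search down. The following is only a sketch: the function name list_dsmetool_scripts is invented here, and it assumes that a script which mentions the string dsmetool is one that starts something through DSME, which should still be confirmed by reading the script.

```shell
# Print every regular file directly inside the given directories that
# mentions "dsmetool". Directories that do not exist are skipped quietly.
list_dsmetool_scripts() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for script in "$dir"/*; do
            [ -f "$script" ] || continue
            if grep -q dsmetool "$script"; then
                echo "$script"
            fi
        done
    done
}
```

On the device it could be called as `list_dsmetool_scripts /etc/init.d /etc/event.d /etc/X11/Xsession.d /etc/X11/Xsession.post`. The exact directory names may differ on your image, so adjust the arguments to whatever exists there.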


How can I change the list of critical applications?

By editing the way DSME launches the critical processes in the various start up scripts found in the various locations above.

***BIG FAT WARNING!!***


This is 'going deep', as deep as the Maemo 5 Mariana trench. These hackjob bootup scripts are what hold this whole rust bucket together. Make a backup and be ready to re-flash before you bugger up the whole thing.

Consider my questions as if I were making my own operating system based on Maemo 5.

lolwut?

AapoRantalainen 2012-04-04 10:11

Re: how no-lifeguard-reset works?
 
Quote:

Originally Posted by vi_ (Post 1187850)
2. DSME, the Device State Management Entity. This is the program responsible for 'tickling' (resetting) the WD timer.

So a process is 'critical' when it is started through DSME.

How can I see which applications are critical?

Start by looking in the xsession folders.

This is 'going deep', as deep as the Maemo 5 Mariana trench. These hackjob bootup scripts are what hold this whole rust bucket together. Make a backup and be ready to re-flash before you bugger up the whole thing.

Thank you very much. I have one N900 solely for experimental/stupid/weird hacks. My current record (true case, not a joke) is 12 reflashes in one evening.

-
I'm not sure whether this is better handled by patching dsme:
dsme-0.60.48+0m5/util/kicker.c:117: if (strstr(p, "no-omap-wd")) {
dsme-0.60.48+0m5/modules/lifeguard.c:1161: if (strstr(p, "no-lifeguard-reset")) {

Or by starting (e.g.) hildon-desktop some other way than through dsme.

(To be honest, I'm not sure what my goal is.)

vi_ 2012-04-04 10:17

Re: how no-lifeguard-reset works?
 
Quote:

Originally Posted by AapoRantalainen (Post 1187865)
Thank you very much. I have one N900 solely for experimental/stupid/weird hacks. My current record (true case, not a joke) is 12 reflashes in one evening.

-
I'm not sure whether this is better handled by patching dsme:
dsme-0.60.48+0m5/util/kicker.c:117: if (strstr(p, "no-omap-wd")) {
dsme-0.60.48+0m5/modules/lifeguard.c:1161: if (strstr(p, "no-lifeguard-reset")) {

Or by starting (e.g.) hildon-desktop some other way than through dsme.

(To be honest, I'm not sure what my goal is.)

I think patching DSME would be silly when you can just change the way hildon desktop is launched with vim in about 10 seconds.

javispedro 2012-04-04 10:27

Re: how no-lifeguard-reset works?
 
touch /etc/no_lg_reboots

vi_ 2012-04-04 11:16

Re: how no-lifeguard-reset works?
 
Quote:

Originally Posted by javispedro (Post 1187869)
touch /etc/no_lg_reboots

Can you explain this one? Does DSME look for this file?

AapoRantalainen 2012-04-04 12:39

Re: how no-lifeguard-reset works?
 
Code:

./dsme-0.60.48+0m5/modules/lifeguard.c:84:#define FILE_REBOOT_OVERRIDE  "/etc/no_lg_reboots"
Code:

./dsme-0.60.48+0m5/modules/lifeguard.c:825:    if (access(FILE_REBOOT_OVERRIDE, F_OK) != 0) {
Code:

./dsme-0.60.48+0m5/debian/changelog:967:  * Lifeguard reboots are disabled if /etc/no_lg_reboots exists
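
Going by the three source lines above, lifeguard only looks at whether the file exists, not at its contents. A minimal sketch of the same check from the shell side follows. The function name lg_reboots_disabled is made up here, and reading "file exists" as "lifeguard reboots are off" rests on the changelog line rather than on a full reading of lifeguard.c.

```shell
# Succeeds (exit status 0) when the override file exists.
# The path defaults to the FILE_REBOOT_OVERRIDE value quoted from
# lifeguard.c; an alternative path can be passed for testing.
lg_reboots_disabled() {
    override=${1:-/etc/no_lg_reboots}
    [ -e "$override" ]
}
```

A typical use would be `if lg_reboots_disabled; then echo "lifeguard reboots are off"; fi`. Removing the file again as root should, on the same assumption, restore the default behaviour.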

reinob 2012-04-04 13:54

Re: how no-lifeguard-reset works?
 
Hi all,

I checked the startup folders mentioned by @vi_ and made this little list of programs started by dsmetool:

Note that -t restarts the process when it exits and reboots the device after N (default 10) restarts, while -r reboots immediately on exit, and -f restarts up to N times and then stops trying without rebooting.

Using -r (most critical)
/etc/init.d/dbus (-r)
/etc/event.d/xomap (-n -8 -r)

Using -t (critical)
/etc/init.d/dnsmasq (-n -1 -t)
/etc/init.d/hulda (-n -1 -t)
/etc/init.d/ke-recv (-n 1 -t)
/etc/init.d/mce (-n 1 -t)
/etc/init.d/wlancond (-n 1 -t)
/etc/X11/Xsession.d/03gtk2-engines-sapwood (-t)
/etc/X11/Xsession.d/03osso-systemui (-n -1 -t)
/etc/X11/Xsession.d/65hildon-sv-notification (-t)
/etc/event.d/mce (-n -1 -t)
/etc/X11/Xsession.post/15hildon-status-menu (-t)
/etc/X11/Xsession.post/17camera-ui (-t)
/etc/X11/Xsession.post/22camera-ui (-t) (why twice???)
/etc/X11/Xsession.post/18hildon-home (-t)
/etc/X11/Xsession.post/20hildon-desktop (-c 3 -T 180 -m -17 -t)
/etc/X11/Xsession.post/22clipboard-manager (-t)
/etc/X11/Xsession.post/24connui-conndlgs (-t)
/etc/X11/Xsession.post/25hildon-input-method-configurator (-t)
/etc/X11/Xsession.post/30tablet-browser-daemon (-c 3 -T 180 -m -17 -t)

Using -f (can be stopped)
/etc/init.d/alarmd (-f)
/etc/init.d/clockd (-f)
/etc/init.d/icd2 (-m -17 -f)
/etc/init.d/iphbd (-f)
/etc/init.d/wappushd (-m -17 -f)
/etc/X11/Xsession.post/66maesync-controller (-n 10 -f)
/etc/X11/Xsession.post/68syncd (-n 10 -f)
/etc/X11/Xsession.post/32mafw*
(calls /usr/bin/mafw.sh, which uses dsmetool -f)

Also, in /etc/event.d we have:
* bme: will call dsmetool --reboot on post-stop
* dsme-dbus: calls dsmetool --start-dbus
and
* dsme-thermal: when "wide thermal limits" calls dsmetool -a
*no idea* what that means. -a is not shown in the help.

Lotsa things to investigate. Too little time..

Addendum: -a is an option when dsmetool is compiled for "TA" (type approval). I still don't know what that is, but it is apparently irrelevant for us.
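
The grouping used in the list above can be sketched as a small shell function that looks for the first of -r, -t or -f among the dsmetool options. The function name classify_policy and the three labels it prints are invented for this example, and the sketch assumes the options are passed as separate words, as they appear in the list.

```shell
# Print a label for the restart policy implied by a dsmetool option list.
# Only the standalone words -r, -t and -f are recognised; option values
# such as the "-17" after -m or the "-1" after -n are ignored.
classify_policy() {
    for word in "$@"; do
        case "$word" in
            -r) echo "reset-on-exit"; return 0 ;;
            -t) echo "restart-then-reset"; return 0 ;;
            -f) echo "restart-then-stop"; return 0 ;;
        esac
    done
    echo "unknown"
    return 1
}
```

For the hildon-desktop entry, `classify_policy -c 3 -T 180 -m -17 -t` prints `restart-then-reset`, and for icd2, `classify_policy -m -17 -f` prints `restart-then-stop`.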

retsaw 2012-04-04 14:37

Re: how no-lifeguard-reset works?
 
Quote:

Originally Posted by reinob (Post 1187939)
/etc/X11/Xsession.post/17camera-ui (-t)
/etc/X11/Xsession.post/22camera-ui (-t) (why twice???)

On my 3 N900s, one well used and customised PR1.3, one fairly clean with stable CSSU, and the other vanilla PR1.3 (mostly used for NITDroid), none of them have that second entry, although one has "/etc/X11/Xsession.post/22cl-launcher".


All times are GMT.