maemo.org - Talk

Infrastructure maintenance on 19.11. (https://talk.maemo.org/showthread.php?t=98329)

fstern 2016-11-18 12:33

Infrastructure maintenance on 19.11.
 
Hi everybody,

Sorry for the short notice, but we will be doing some heavy maintenance on the maemo.org infrastructure tomorrow, starting at 10:00 CET (09:00 UTC).

All systems will be affected.

We expect to be down for at least 6 hours as we do upgrades on the underlying hypervisors.
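
For readers in other time zones, the announced window can be worked out with a few lines of Python. This is only a sketch: the date 2016-11-19 is inferred from the post date and the thread title, and CET is modelled as a fixed UTC+1 offset (an assumption that holds in November, when no daylight saving applies).

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET modelled as a fixed UTC+1 offset (valid in November).
CET = timezone(timedelta(hours=1), "CET")

# Announced start: 10:00 CET on 19.11. (year inferred from the post date).
start_cet = datetime(2016, 11, 19, 10, 0, tzinfo=CET)
start_utc = start_cet.astimezone(timezone.utc)

# "At least 6 hours" of downtime gives the earliest possible end.
earliest_end_utc = start_utc + timedelta(hours=6)

if __name__ == "__main__":
    print("start:", start_utc.strftime("%Y-%m-%d %H:%M %Z"))
    print("earliest end:", earliest_end_utc.strftime("%Y-%m-%d %H:%M %Z"))
```

The start comes out as 09:00 UTC, matching the announcement, and the earliest end of the window as 15:00 UTC.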

What we will do:
  • Do an image backup of all machines
  • Upgrade the underlying hypervisors
  • Upgrade individual machines

Sorry for any inconvenience this might cause.

Best,

Falk

peterleinchen 2016-11-18 14:14

Re: Infrastructure maintenance on 19.11.
 
Thanks for the notification.

@tmo admin
Could this possibly be made sticky forum-wide?

fstern 2016-11-19 22:57

Re: Infrastructure maintenance on 19.11.
 
Hi everyone,

tl;dr: half of the infrastructure is broken, fix expected early next week, film at eleven.

This maintenance didn't go according to plan; here's a short post-mortem:

Timeline:

10:00 - start updates and backups on blade-a
14:30 - backups and updates complete on blade-a, reboot confirmed successful
14:31 - uptime-induced filesystem check on blade-a after 1347 days
15:00 - start of backups on blade-b
17:12 - filesystem check complete, blade-a up and running
17:30 - first systems on blade-a confirmed up and working
18:30 - software upgrade on stage and mail complete
20:15 - backups of blade-b finished and copied onto blade-a backup space
20:16 - start of updates on blade-b
21:00 - updates on blade-b complete, reboot
21:01 - blade-b stuck in boot with corrupt BIOS image in flash
23:30 - all available remote recovery options tried, none working
23:40 - decision to go for Plan B: boot talk.maemo.org on blade-a, redirect everything else to talk.m.o
23:45 - blade-b turned off through IPMI
23:53 - talk.m.o available again
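
To put numbers on the timeline above, here is a minimal sketch that computes how long some of the phases took. The event labels are made up for illustration; the times are copied from the timeline and are presumably CET, like the announcement.

```python
from datetime import datetime, timedelta

# Selected events copied from the timeline above (all on the same day).
# The labels are illustrative names, not anything used by the admins.
EVENTS = {
    "maintenance start": "10:00",
    "blade-a fsck start": "14:31",
    "blade-b backups start": "15:00",
    "blade-a fsck complete": "17:12",
    "blade-b backups finished": "20:15",
    "blade-b stuck in boot": "21:01",
    "talk.m.o available again": "23:53",
}


def elapsed(start_label: str, end_label: str) -> timedelta:
    """Return the time between two same-day events from EVENTS."""
    fmt = "%H:%M"
    start = datetime.strptime(EVENTS[start_label], fmt)
    end = datetime.strptime(EVENTS[end_label], fmt)
    return end - start


if __name__ == "__main__":
    print("filesystem check:", elapsed("blade-a fsck start", "blade-a fsck complete"))
    print("blade-b backups:", elapsed("blade-b backups start", "blade-b backups finished"))
    print("start to talk.m.o back:", elapsed("maintenance start", "talk.m.o available again"))
```

By this reckoning the filesystem check alone took 2 h 41 min, the blade-b backups 5 h 15 min, and talk.m.o came back 13 h 53 min after the maintenance started.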

Fallbacks in place:

www.maemo.org, wiki.maemo.org, garage.maemo.org are redirected to talk.maemo.org

Next Action Items:

I'll visit the datacenter on Monday after work (around 18:00 CET) to try to recover the BIOS of the broken machine with a physical USB stick.

If this is successful, we'll migrate talk.m.o back to its original host and re-enable www.m.o, wiki.m.o and garage.m.o through DNS once the VMs and the blade are confirmed working.


Best,

xes & falk

pichlo 2016-11-20 07:14

Re: Infrastructure maintenance on 19.11.
 
My browser complains about a wrong certificate; is this a side effect of the update? Is it temporary?
(Details: the name on the cert does not match the URL.)

peterleinchen 2016-11-20 07:56

Re: Infrastructure maintenance on 19.11.
 
1 Attachment(s)
Quote:

Originally Posted by joerg_rw (Post 1519058)
many thanks for this massive effort ...

+1

A hint for all remaining N9 users: once again we have no automatic network (WLAN auto/manual) detection. A nice screenshot is attached (maybe later, my N9 does not let me select it :))

--edit
Quote:

Originally Posted by pichlo (Post 1519060)
My browser complains about a wrong certificate; is this a side effect of the update? ...

I guess so, as the same corrections/redirections were also made earlier this year.
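
To illustrate why redirections could trigger that warning, here is a minimal sketch. It assumes (this is not confirmed anywhere in the thread) that the redirected hosts are answered with a certificate issued only for talk.maemo.org. The matching rule is a simplified stand-in for what browsers do, and `hostname_matches`, `cert_covers` and `ASSUMED_CERT_NAMES` are hypothetical names used only for this example.

```python
from typing import Iterable


def hostname_matches(cert_name: str, hostname: str) -> bool:
    """Simplified name check: exact match, or a leading '*.' wildcard that
    covers exactly one leftmost label. Real browsers apply more rules."""
    cert_name = cert_name.lower()
    hostname = hostname.lower()
    if cert_name.startswith("*."):
        suffix = cert_name[1:]  # e.g. ".maemo.org"
        head, sep, tail = hostname.partition(".")
        # The wildcard stands for one non-empty label only.
        return bool(head) and sep == "." and "." + tail == suffix
    return cert_name == hostname


def cert_covers(cert_names: Iterable[str], hostname: str) -> bool:
    """True if any name on the certificate matches the requested host."""
    return any(hostname_matches(name, hostname) for name in cert_names)


# Assumption for illustration: the served certificate names only this host.
ASSUMED_CERT_NAMES = ["talk.maemo.org"]

if __name__ == "__main__":
    hosts = ("talk.maemo.org", "www.maemo.org", "wiki.maemo.org", "garage.maemo.org")
    for host in hosts:
        status = "ok" if cert_covers(ASSUMED_CERT_NAMES, host) else "name mismatch"
        print(f"{host}: {status}")
```

Under that assumption, only talk.maemo.org passes the check; requests for the other three names would show the kind of "name does not match the URL" error described above.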

xes 2016-11-20 16:16

Re: Infrastructure maintenance on 19.11.
 
2 Attachment(s)
Let me share the screen that our Supermicro server showed to reward us for a day of work...
http://www.supermicro.nl/products/sy...cfm?parts=SHOW

We also discovered that Supermicro charges for a license to flash the BIOS remotely via IPMI.
(Anyway, we are not sure this would work to recover the BIOS.)

Supermicro: really, thanks.

Win7Mac 2016-11-20 20:58

Re: Infrastructure maintenance on 19.11.
 
Possible to replace the chip?

xes 2016-11-20 23:40

Re: Infrastructure maintenance on 19.11.
 
@win7mac
At the moment I can't say how serious the problem we are facing is; tomorrow Falk will run some tests while trying to restore the blade.

With your personal PC, board or laptop you can try whatever you want, any hack or trick, because you have nothing to lose. With servers you have to take a different perspective and consider the risks, the best options, the time to fix, the quality of the result and the possibility of causing more damage.

So, my reply is: I don't think anyone would try to remove a chip from a server mainboard without a spare board or a guaranteed result.

Win7Mac 2016-11-21 00:28

Re: Infrastructure maintenance on 19.11.
 
I wasn't suggesting any tricks or hacks. Some BIOS chips are replaceable, but since it's not listed on that parts list, that's probably not an option. :(

fstern 2016-11-21 06:28

Re: Infrastructure maintenance on 19.11.
 
Quote:

Originally Posted by joerg_rw (Post 1519131)
plus we have two spare blades, incl. BIOS chips (if the flash of the now-down blade is actually defective)
edit: I think it would actually be a great opportunity to swap the blades for wear leveling

No, we don't. All we have is two empty slots in the chassis.

Best,

Falk


All times are GMT.
