That was NOT FUN

Early Sunday afternoon, there was apparently a power and/or UPS failure in the datacenter at my office. By 5pm, most of the systems were back up except for one that I’m responsible for.

Fire up the VPN, hit the iLO remote console (one of the few things that makes using x86 systems as servers bearable) and see “Cannot find boot.bin”. Uh-oh. I head to the office at 6pm.

It seems that a certain Sun patch turns non-GRUB systems into non-bootable systems after application. There’s a specific set of steps to follow when this patch is installed and a system is “upgraded” to the GRUB bootloader, but apparently Sun’s “smpatch” utility does not follow these steps. The patch had been applied months ago, but the system didn’t get rebooted until the power outage.

I figured “Okay, the system was running Solaris 10 FCS, so it’s time to do an upgrade install of S10u4 anyway”. After some other problems and workarounds, four hours later, I watch in resignation as the install hangs and locks up (not accepting keyboard input at a Y/N prompt) while trying to install the CPQary package. This package contains the drivers that Solaris x86 needs in order to use the hardware RAID built into the Compaq DL360.

I bite the bullet and do a “nuke from orbit” fresh install of S10u4, planning to restore from backups. Our backup system had also been affected by the power outage and wasn’t available until Monday morning, so I ended up rebuilding most of the services on the box by hand (which was better in the long run, as things needed cleaning up).

To make a long story short, I went to the office at 6pm Sunday, and finally walked out of my office to go home and get some sleep at 10:30am Monday. At over 16 hours, that’s the longest single shift I’ve ever pulled anywhere, and certainly the longest after-hours session.

I’ve got one more service to restore onto the box on Tuesday, but it’s non-critical and can wait until I’ve had some rest.

I can’t complain – I might have incidents like these once or twice a year, and it’s a lot better than getting called or paged every other day like I was used to at my last job. I really like my job and my managers and coworkers.
