Just a minor speedbump on this weekend’s project…

I recently bought a Via PC2500E motherboard in order to build a low-power replacement for my current backup / rsnapshot system, a 2.5-year-old Dell SC420.

Got all the parts in, assembled the system tonight, and went to boot Solaris Express Community Edition, the latest release of which is Nevada B83a. I downloaded the ISO image, burned a DVD, and booted the system.

For some reason, it boots straight to a GRUB prompt, and not into the installer or even a boot menu.

Oh well; now I have to spend another three hours downloading the latest Solaris Express Developer Edition, which is Nevada B79b. Hopefully it will work properly on the Via board.

I really don’t want to have to run FreeBSD 7; its ZFS support is still a little unreliable.

That was NOT FUN

Early Sunday afternoon, there was apparently a power and/or UPS failure in the datacenter at my office. By 5pm, most of the systems were back up except for one that I’m responsible for.

Fire up the VPN, hit the iLO remote console (one of the few things that makes using x86 systems as servers bearable) and see “Cannot find boot.bin”. Uh-oh. I head to the office at 6pm.

It seems that a certain Sun patch turns non-GRUB systems into non-bootable systems after application. There’s a specific set of steps to follow when this patch is installed and a system is “upgraded” to the GRUB bootloader, but apparently Sun’s “smpatch” utility does not follow these steps. The patch had been applied months ago, but the system didn’t get rebooted until the power outage.

I figured “Okay, the system was running Solaris 10 FCS, so it’s time to do an upgrade install of S10u4 anyway”. After some other problems and workarounds, four hours later, I watch in resignation as the install hangs and locks up (not accepting keyboard input at a Y/N prompt) while trying to install the CPQary package. This package contains the drivers that Solaris x86 needs in order to use the hardware RAID built into the Compaq DL360.

I bite the bullet and do a “nuke from orbit” fresh install of S10u4, planning to restore from backups. I ended up having to rebuild most of the services on the box by hand (which was better in the long run, as things needed cleaning up) as our backup system had also been affected by the power outage and it wasn’t available until Monday morning.

To make a long story short, I went to the office at 6pm Sunday, and finally walked out of my office to go home and get some sleep at 10:30am Monday. 16 hours is the longest single shift I’ve ever pulled anywhere, and certainly the longest after-hours session.

I’ve got one more service to restore onto the box on Tuesday, but it’s non-critical and can wait until I get some rest.

I can’t complain – I might have incidents like these once or twice a year, and it’s a lot better than getting called or paged every other day like I was used to at my last job. I really like my job and my managers and coworkers.

Fired up the T1000 tonight

and boy, is it LOUD. Not just loud, but uncomfortably loud.

My wife is still asleep in bed, so I shut it down before the extended POST finished – so I don’t have any idea yet if it quiets down after boot (like a SB1K does).

I’m going to have to set up a stand for it in my closet and run power and ethernet under the door, it looks like. Leaving it on my desk in the “lab” is right out at this point. They should include earplugs in the country kit (which they forgot with my system – but that just means a power cord, and I’ve got plenty of those around).

The need for a universal file system format

After the past 24 hours, I’ve come to the conclusion that there needs to be a universal file system format that has the same support on all operating systems.

My main “server” system here at the house is a Dell PowerEdge SC420 with a 2.5GHz Celeron-D CPU, 1G RAM, 160G SATA HD, and GigE. Since I got the machine (for $250 during one of Dell’s CRAAAZY DEALS last year), I’ve been running Fedora Core 4 on it with no problems. On top of FC4, I use rsnapshot to do nightly backups of my colocated server and some client machines.

I decided a few days ago that it was time to ditch FC4 and put Solaris 10 on the machine now that all the hardware is fully supported. First, however, I needed to get my rsnapshot repository off the machine. That was accomplished with a 250G SATA hard drive and a SATA to USB2 adapter with power supply. Now I had my critical data on an ext2-formatted hard drive.

I proceeded to reinstall the Dell with the latest Solaris Express release. I then installed the ext2fs drivers for Solaris 10, and attempted to rsync the data back off the hard drive. Five minutes in, the system wedges hard and requires a reboot.

Okay, so that’s not going to work. I carry the HD and adapter back into the other room, plug it into the Mac, and install ext2fsx. When I try to mount the drive, it complains about a bad superblock. So, a couple hours of forced-fsck_ext2 later, I can mount the drive.

When I try to rsync from the Mac over the network to the Dell, the Mac gripes about filenames on the ext2 partition. Crap. That’s not going to work either, and I don’t have another Linux box to mount the HD on.

It was then that I realized I didn’t *need* another permanent Linux installation. I downloaded Knoppix, booted it on my AMD64 Windows gaming box, then plugged the USB/SATA HD in. It was detected and mounted right up, and has been happily rsync-ing everything back to the Dell/Solaris system for the past couple of hours.

I know that in my situation, having a couple of big disks sitting on an NFS server would have been the easiest way to do things. Others might have suggested FAT32; however, my rsnapshot backup repository makes heavy use of UNIX hard links, which FAT32 has no way to represent, so it would not be “portable” to FAT32.
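For anyone wondering why hard links matter so much here, this is a rough Python sketch of the idea, not rsnapshot's own code. The helper names (`hardlink_snapshot`, `shares_storage`) and the `daily.0` / `daily.1` directory names are made up for illustration. It builds a second "snapshot" directory in which every file is a hard link to the first, so both names point at the same data on disk. A filesystem without hard links would have to store a full copy for every snapshot.

```python
import os
import tempfile


def hardlink_snapshot(src_dir, dst_dir):
    """Mirror src_dir into dst_dir, hard-linking each file instead of copying it.

    Hypothetical helper for illustration only; rsnapshot does this its own way.
    """
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # os.link creates a second directory entry for the same inode.
            os.link(os.path.join(root, name), os.path.join(target_root, name))


def shares_storage(path_a, path_b):
    """Return True if both paths refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        newest = os.path.join(base, "daily.0")
        older = os.path.join(base, "daily.1")
        os.makedirs(os.path.join(newest, "etc"))
        with open(os.path.join(newest, "etc", "hosts"), "w") as fh:
            fh.write("127.0.0.1 localhost\n")

        hardlink_snapshot(newest, older)

        a = os.path.join(newest, "etc", "hosts")
        b = os.path.join(older, "etc", "hosts")
        print("same inode:", shares_storage(a, b))
        print("link count:", os.stat(a).st_nlink)
```

On a filesystem with no hard-link support, the `os.link` call would be expected to fail, which is the whole problem with moving a repository like this onto FAT32.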

This all demonstrates the need for a truly portable filesystem that can be easily transported between operating systems without having to use ugly hacks. I’m hoping that ZFS might eventually be the solution, if Sun ports it to Linux as rumored and maybe even OS X.

I’m wondering if it would be usable on single disks, since everything I’ve seen seems to emphasize its mirroring/redundancy and handling of multi-disk pools over its non-dependency on byte endianness and portability between CPU architectures.

More OpenSolaris goodness

I’ve gotten the last two OpenSolaris source releases (20050701 and 20050720) to build without a problem on my Ultra 60. Build time went from 4:00 (initial OS release, building on top of Solaris Express) to almost 6:30 (building OS on top of itself, in debug mode) but I had no problems. Looks like they’ve updated the BFU install process to automatically run acr to resolve conflicts.

Sun Microsystems Inc.   SunOS 5.11      mrbill  Jul. 27, 2005
SunOS Internal Development:  root 2005-07-27 [mrbill]
bfu'ed from /opensolaris/mrbill/archives/sparc/nightly on 2005-07-28
Sun Microsystems Inc.   SunOS 5.11      snv_16  October 2007