beaglebros.com v2.0

RAID-1


Quite a while ago, while unemployed, I agreed to help develop some RAID support on the empeg. I got a reasonable way towards this goal, but was unable to continue when it got to some final points.

The major remaining technical problem is that the kernel thinks the RAID has been corrupted because it was shut down hard. I think this is probably solvable by marking the RAID itself read-only, not just the filesystem. This, of course, creates a new problem when emplode-ing: not only do the filesystems need to be changed to rw, but the RAID sets do as well.

Then there's the problem of actually getting this installed. I don't know how to automate it, and it's going to take a loooooong time, anyway, as it takes a loooooong time to sync.

Anyway, I'm going to have to call it quits. Here are my notes so far. My work should be reproducible. I didn't work with a Hijack kernel, only the stock empeg kernel, but I don't think Mark's changes should have a huge impact on the patches that I installed; I just didn't feel like working against a moving target, at least not one moving that fast.

  • empeg 2.00b13 kernel is 2.2.14
  • The builtin RAID support was updated to 0.90 RAID via the use of the patch at ftp://kernel.org/pub/linux/daemons/raid/alpha/raid0145-19990824-2.2.11.gz (no longer available at kernel.org — local copy)
  • This patch did not apply cleanly. There were partition-related hunks whose target code had moved entirely. Failed hunks for asm-ppc and arch/sparc64 were ignored. The only significant failure, in genhd, had to do with autodetect_raid(). The others were largely whitespace related.
  • I also needed to apply a patch to fix asm/unaligned.h, which referenced a nonexistent function. It was fixed using the patch at http://www.linuxhq.com/kernel/v2.2/patch/pre-patch-2.2.18-10/linux.18p10_include_asm-arm_unaligned.h.html (no longer available; local copy as a diff file). There should be a better source, as I was forced to copy and paste from a browser.
  • Also installed were the 0.90 raidtools, available at ftp://kernel.org/pub/linux/daemons/raid/alpha/raidtools-19990824-0.90.tar.gz (no longer available at kernel.org — local copy).
  • There is a bug in the 0.90 raidtools tweaked by the empeg install. In raid_io.c:sanity_checks(), it will fail to open /etc/mtab, but then tries to close the returned NULL. This can be fixed by moving the close() within the preceding if-statement.
  • At this point RAID-1 appears to be working. Creating the main music repository seems to take about 2 hours for a system w/2 10GB drives.
  • Creating a filesystem using mke2fs while the RAID was being built seemed to work fine. However, mounting said filesystem induced a kernel panic: B_FREE inserted into queues. I'll try again, but wait for the RAID to be fully created first.
  • Waiting until the RAID was fully built allowed me to create and mount it with no problems.
  • I went back and rebuilt the kernel to support autodetection of the RAID set. This change, coupled with changing some of the device nodes on the empeg's /dev filesystem, allowed the empeg to boot properly and mount the RAIDed filesystem with no interaction and no new software.
  • On-empeg steps so far: move /dev/hd{a,c}4 to /dev/hd{a,c}4.orig. create /dev/md0 (b 9,0). create bogus /dev/hdc4 (b 22,68) (see below). link /dev/hda4 to /dev/md0. change partition tags of /dev/hd{a,c}4 to 0xFD for RAID autodetect. create /etc/raidtab. Plus kernel and raidtools.
  • The bogus /dev/hdc4 must point to an ext2 filesystem. This is because the player software requires that it be mountable if /proc/ide/hdb exists. I'm now just using /dev/hdc5, since it's otherwise unused. This will have to change before full RAIDing of all filesystems will work. I'll probably have to use a loop device w/ losetup.
  • First failure test was a reasonable success. While the empeg didn't recover gracefully at runtime (this might partially be due to the fact that my bogus hdc4 was on the “failed” disk), it did recover after a boot, which is certainly acceptable for this purpose. Failure was simulated by simply pulling the cable off of hdc.
  • losetup has been built, along with a new kernel, and it all seems to work just fine. /dev/loop0 is (b 7,0).
  • What partitions should be mirrored? Partition 4, definitely. This is the music partition. Partition 5, too. This is the root, and if it fails, all of this is pointless. Partition 6, swap? No. Swap is not usually turned on and swapping on a RAID-1 won't work anyway. Partition 3, raw? If it will work. Partition 1 is just the extended partition container, so there's no need for that. The only other partition is partition 2, which is empty by default. However, it will be trivial to set it up, so we might as well.
  • I managed to get partitions 5, 4, & 3 RAIDed. Partition 5 (the boot partition) had to be created by using a different partition as the root partition (I chose hda2 and copied all the files over via tar). The root partition can be changed by setting ROOT_DEV in arch/arm/kernel/setup.c.
  • Having the kernel autodetect the RAID arrays and then using one of them as the root partition seems to work fine just by setting the kernel to use the RAID device as the root partition. See the note about setup.c above for more info.
  • However, since the power is simply removed from the empeg, the RAID arrays are not turned off properly. Look into the powerfail_interrupt() function in arch/arm/special/empeg_state.c for the most appropriate place to fix this. Also look into how the kernel automatically stops the RAID arrays on a more normal shutdown. Supposedly, no user-space program need be run.
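
The raid_io.c fix mentioned above (moving the close inside the preceding if-statement) has roughly the following shape. This is only a sketch, not the raidtools source: check_mtab() and its return values are hypothetical names made up for illustration, and it assumes the file is opened with stdio and closed with fclose(). The only part taken from the notes is the idea that the close happens solely on the path where the open succeeded.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the mtab check inside sanity_checks().
 * Returns 1 if the file was opened and scanned, 0 if it could not be opened.
 */
int check_mtab(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp != NULL) {
        char line[256];

        /* Walk the entries; the real code would inspect each one. */
        while (fgets(line, sizeof line, fp) != NULL) {
            /* nothing to do in this sketch */
        }

        /* The close lives inside the branch, so it never sees NULL. */
        fclose(fp);
        return 1;
    }

    /* Open failed (e.g. no mtab present): nothing to close. */
    return 0;
}
```

Calling check_mtab() with a path that does not exist simply returns 0 instead of handing a NULL pointer to fclose().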
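
For the ROOT_DEV change, it may help to see how a (major, minor) pair such as the (b 9,0) of /dev/md0 packs into a single device number. The sketch below assumes the 2.2-era layout of 8 minor bits with the major number above them; sketch_mkdev() and SKETCH_MINORBITS are hypothetical names for this userspace illustration, not kernel identifiers, and the actual assignment in arch/arm/kernel/setup.c is not reproduced here.

```c
#include <stdio.h>

/* Assumption: 2.2-era device numbers use 8 bits for the minor number. */
#define SKETCH_MINORBITS 8

/* Pack a major/minor pair into one device number (illustrative helper). */
unsigned int sketch_mkdev(unsigned int major, unsigned int minor)
{
    return (major << SKETCH_MINORBITS) | minor;
}

/* Print one device in the "(b major,minor)" style used in the notes. */
void sketch_print_dev(const char *name, unsigned int major, unsigned int minor)
{
    printf("%s (b %u,%u) -> 0x%04x\n", name, major, minor,
           sketch_mkdev(major, minor));
}
```

Under that assumption, /dev/md0 at (9,0) works out to 0x0900 and /dev/loop0 at (7,0) to 0x0700; sketch_print_dev("/dev/md0", 9, 0) prints the same value.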

last updated $Date: 2005/03/02 22:41:24 $