Troubleshoot RAID Partition Failure due to a Missing Disk
Limited Availability: Upgrade to SWG 13.0 for enhanced stability and security. Contact Skyhigh Support for assistance.
This section provides steps to boot a system successfully after a RAID partition failure caused by a missing disk. It requires a new raw disk and either a live disk or the SWG 13.0 installer ISO.
Issue:
The system fails to boot after power-on and displays the message: failed to mount /dev/md0 to /sysroot.
Troubleshooting Steps:
- Prepare the System:
  - Attach a new raw disk (for example, sdb) of the same size as the missing disk (sda).
  - Attach the live disk or the installer ISO and reboot the system.
- Access Grub Command Line:
  - When the Grub menu appears, press c to enter the Grub command line (grub>).
  - Enter ls to list all attached drives and CD-ROMs.
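    Sample Output (illustrative only; the devices listed depend on your hardware, and cd0 is assumed here to be the attached ISO):
    (hd0) (hd0,gpt1) (hd0,gpt2) (hd0,gpt3) (hd1) (cd0)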
- Load Live OS/Installer ISO:
  - Execute the following commands to load the kernel and initial ramdisk, then boot into the rescue shell:
    - set root=(cd0) (assuming cd0 is your ISO/live CD)
    - linux /isolinux/vmlinuz rd.break
    - initrd /isolinux/initrd.gz
    - boot
- Access Rescue Prompt:
  - As the ISO image boots, you will see the prompt do you want to continue to installation.... Press Ctrl+C to access the rescue prompt.
- Recover RAID Partition:
  At the rescue prompt, perform the following to recover the RAID partition and mark its status as degraded, allowing a successful boot:
  - Verify RAID Status:
    - Run cat /proc/mdstat.
      The output should show md0 and md1 as inactive.
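      Sample Output (illustrative only; block counts and member devices will differ on your system):
      Personalities :
      md0 : inactive sda2[0]
            10999808 blocks super 1.2
      md1 : inactive sda3[0]
            40859648 blocks super 1.2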
  - Assemble RAID Arrays: To recover RAID 1, assemble md0 and md1:
    - mdadm --assemble --run /dev/md0 /dev/sda2
    - mdadm --assemble --run /dev/md1 /dev/sda3
  - Reverify RAID Arrays: Check cat /proc/mdstat again to confirm they are active.
    Sample Output:
    Personalities : [raid1]
    md0 : active raid1 sda2[0]
          10999808 blocks super 1.2 [2/1] [U_]
    md1 : active raid1 sda3[0]
          40859648 blocks super 1.2 [2/1] [U_]
  - Mount md0 to sysroot:
    - mkdir -p /sysroot
    - mount /dev/md0 /sysroot
- Activate LVM Volumes:
  Activate the LVM volumes so that initramfs can be rebuilt later:
  - Run lvm vgchange -an
  - Run lvm vgchange -ay
    Expected output: 5 logical volume(s) in volume group "vg00" now active
  - Run lvscan
    This should show all 5 volumes as active (see the illustrative sample after this step).
  - Run ls -l /dev/mapper/
    Expected output: vg00-opt, vg00-var, vg00-cache, vg00-tmp, vg00-swap.
  - If these files are not listed, the volumes exist but their device nodes have not been created yet:
    - Run the vgmknodes command. This creates the device nodes for all volumes.
    - Run ls -l /dev/mapper/ again, as in the previous step.
      You should now see vg00-opt, vg00-var, vg00-cache, vg00-tmp, vg00-swap.
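  Sample lvscan Output (illustrative only; the volume sizes shown are examples and will differ on your system):
  ACTIVE   '/dev/vg00/opt'   [20.00 GiB] inherit
  ACTIVE   '/dev/vg00/var'   [10.00 GiB] inherit
  ACTIVE   '/dev/vg00/cache' [40.00 GiB] inherit
  ACTIVE   '/dev/vg00/tmp'   [4.00 GiB] inherit
  ACTIVE   '/dev/vg00/swap'  [8.00 GiB] inherit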
- Update grub.cfg & Rebuild initramfs:
  - Mount the ESP (sda1) and validate it:
    - mount /dev/sda1 /sysroot/boot/efi/
  - Mount LVM volumes:
    - mount /dev/mapper/vg00-tmp /sysroot/tmp
    - mount /dev/mapper/vg00-var /sysroot/var
    - mount /dev/mapper/vg00-opt /sysroot/opt
  - Mount other required directories:
    - mount --bind /dev/ /sysroot/dev/
    - mount --bind /proc/ /sysroot/proc/
    - mount --bind /sys/ /sysroot/sys/
  - Switch to chroot:
    - chroot /sysroot/
  - Mark sda as degraded:
    - Get the UUID of md0:
      - Run the mdadm --detail --scan command and note the UUID of md0.
        For example, UUID=2cd677de:435c12b6:0a7caef0:5f29cb88
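        Sample Output (illustrative only; the exact fields vary by system, and the UUID shown is the example UUID from this article):
        ARRAY /dev/md0 metadata=1.2 UUID=2cd677de:435c12b6:0a7caef0:5f29cb88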
    - Edit the grub file:
      - vim /etc/default/grub
      - Find the line GRUB_CMDLINE_LINUX and append rd.md.uuid=*uuidofMD0*:degraded
        (for example, rd.md.uuid=2cd677de:435c12b6:0a7caef0:5f29cb88:degraded)
      - Save the file and exit.
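      For illustration, after the edit the line might look like the following, where <existing parameters> stands for whatever kernel parameters are already present on your system:
      GRUB_CMDLINE_LINUX="<existing parameters> rd.md.uuid=2cd677de:435c12b6:0a7caef0:5f29cb88:degraded"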
  - Run the command for the 2nd disk (sdb):
    - mwg-raid add sdb (ignore any errors)
  - Update the grub.cfg:
    - grub2-mkconfig -o /boot/grub2/grub.cfg (ignore the warnings)
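    Sample Output (illustrative only; additional messages and warnings about the degraded array may appear and can be ignored, as noted above):
    Generating grub configuration file ...
    done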
  - Regenerate initramfs:
    - dracut --force
  - Exit chroot & reboot:
    - exit
    - reboot
Post-Reboot Steps:
- Run mwg-raid add sdb once the system reboots successfully. This sets up the RAID partition, and syncing takes time.
- Check the status periodically with mwg-raid info until md0 and md1 show as clean.
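  You can also watch the sync progress with cat /proc/mdstat. The output below is illustrative only; member names, block counts, percentages, and speeds will differ on your system:
  md0 : active raid1 sdb2[2] sda2[0]
        10999808 blocks super 1.2 [2/1] [U_]
        [==>..................]  recovery = 12.0% (1320000/10999808) finish=8.1min speed=20000K/sec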
- Edit the /etc/default/grub file, remove the rd.md.uuid=*uuidofMD0*:degraded entry, and save and exit.
- Run the grub2-mkconfig -o /boot/grub2/grub.cfg command to refresh the grub configuration with the latest change.
- Run the dracut --force command to regenerate the initramfs so the change takes effect.
- Verify the disk status by running lsblk and cat /proc/mdstat to confirm the system is functional.
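  Once the sync is complete, cat /proc/mdstat should report both members in each array. The output below is illustrative only and reuses the block counts from the earlier sample; device order and member indexes may differ on your system:
  Personalities : [raid1]
  md0 : active raid1 sdb2[2] sda2[0]
        10999808 blocks super 1.2 [2/2] [UU]
  md1 : active raid1 sdb3[2] sda3[0]
        40859648 blocks super 1.2 [2/2] [UU]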
The system boot is successful, and SWG 13.0 is recovered.
