
Troubleshoot RAID Partition Failure due to a Missing Disk


This section provides steps to boot a system successfully after a RAID partition failure caused by a missing disk. The procedure requires a new raw disk and a live disk or an SWG 13.0 installer ISO.

Issue:

The system fails to boot after power-on and displays the message: failed to mount /dev/md0 to /sysroot.

Troubleshooting Steps:

  1. Prepare the System:

    1. Attach a new raw disk of the same size as the missing disk. In this procedure, the remaining original disk is sda and the new disk appears as sdb.
    2. Attach the live disk or ISO disk and reboot the system.
  2. Access Grub Command Line:

    1. When the Grub menu appears, press c to enter the Grub command line (grub>).
    2. Enter ls to list all attached drives and CD-ROMs.
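      Sample output (device names depend on your hardware and firmware; cd0 is the assumed name of the ISO/live CD, as in the next step):
      (hd0) (hd0,gpt1) (hd0,gpt2) (hd0,gpt3) (cd0)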
  3. Load Live OS/Installer ISO:

    1. Execute the following commands to load the kernel and initial ramdisk, then boot into the rescue shell: 
      1. set root=(cd0) (assuming cd0 is your ISO/live CD)
      2. linux /isolinux/vmlinuz rd.break
      3. initrd /isolinux/initrd.gz
      4. boot
  4. Access Rescue Prompt:

    1. As the ISO image boots, a prompt similar to do you want to continue to installation... appears. Press Ctrl+C to access the rescue prompt.

  5. Recover RAID Partition:

    1. At the rescue prompt, perform the following to recover the RAID partition and mark its status as degraded, allowing a successful boot:

      1. Verify RAID Status: 

        1. Run cat /proc/mdstat.
          The output should show md0 and md1 as inactive.
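          Sample output for inactive arrays (the exact layout may vary slightly; block counts match the sample shown later in this step):
          Personalities :
          md0 : inactive sda2[0](S)
                10999808 blocks super 1.2
          md1 : inactive sda3[0](S)
                40859648 blocks super 1.2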

      2. Assemble RAID Arrays: To recover RAID 1, assemble md0 and md1:

        1. mdadm --assemble --run /dev/md0 /dev/sda2
        2. mdadm --assemble --run /dev/md1 /dev/sda3
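        With one disk missing, mdadm typically reports output similar to the following:
        mdadm: /dev/md0 has been started with 1 drive (out of 2).
        mdadm: /dev/md1 has been started with 1 drive (out of 2).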
           
      3. Reverify RAID Arrays: Run cat /proc/mdstat again to confirm both arrays are active.

        Sample Output:

        Personalities : [raid1]
        md0 : active raid1 sda2[0]
              10999808 blocks super 1.2 [2/1] [U_]
        md1 : active raid1 sda3[0]
              40859648 blocks super 1.2 [2/1] [U_]
      4. Mount md0 to sysroot:

        1. mkdir -p /sysroot

        2. mount /dev/md0 /sysroot
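          You can optionally confirm the mount before continuing, for example with mount | grep sysroot.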

  6. Activate LVM Volumes:

    1. Activate LVM volumes to rebuild initramfs:

      1. Run lvm vgchange -an

      2. Run lvm vgchange -ay
        Expected output: 5 logical volume(s) in volume group “vg00” now active

      3. Run lvscan
        This should show all 5 volumes as active.
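        Sample lvscan output (volume sizes are illustrative):
          ACTIVE            '/dev/vg00/opt' [20.00 GiB] inherit
          ACTIVE            '/dev/vg00/var' [10.00 GiB] inherit
          ACTIVE            '/dev/vg00/cache' [40.00 GiB] inherit
          ACTIVE            '/dev/vg00/tmp' [4.00 GiB] inherit
          ACTIVE            '/dev/vg00/swap' [4.00 GiB] inherit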

      4. Run ls -l /dev/mapper/
        Expected output: vg00-opt, vg00-var, vg00-cache, vg00-tmp, vg00-swap. 

    2. If you are unable to find the above files, the logical volumes exist but their device nodes have not been created yet.

      1. Run the vgmknodes command. This creates the missing device nodes for all volumes.

      2. Verify again by running the ls -l /dev/mapper/ command.
        You should now see vg00-opt, vg00-var, vg00-cache, vg00-tmp, and vg00-swap.

  7. Update grub.cfg & Rebuild initramfs:

    1. Mount the ESP (sda1) and validate it:

      1. mount /dev/sda1 /sysroot/boot/efi/
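        To validate the mount, list the directory with ls /sysroot/boot/efi/; it should contain an EFI directory.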

    2. Mount LVM volumes:

      1. mount /dev/mapper/vg00-tmp /sysroot/tmp

      2. mount /dev/mapper/vg00-var /sysroot/var

      3. mount /dev/mapper/vg00-opt /sysroot/opt

    3. Mount other required directories:

      1. mount --bind /dev/ /sysroot/dev/

      2. mount --bind /proc/ /sysroot/proc/

      3. mount --bind /sys/ /sysroot/sys/

    4. Switch to chroot:

      1. chroot /sysroot/

    5. Mark the RAID array (md0) as degraded:

      1. Get the UUID of md0:

        1. Run the mdadm --detail --scan command and note the UUID of md0.
          For example, UUID=2cd677de:435c12b6:0a7caef0:5f29cb88

      2. Edit the grub file:

        1. vim /etc/default/grub

        2. Find the line GRUB_CMDLINE_LINUX and append rd.md.uuid=*uuidofMD0*:degraded
          (for example, rd.md.uuid=2cd677de:435c12b6:0a7caef0:5f29cb88:degraded)
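          The resulting line should look similar to the following (any other parameters already on the line, such as crashkernel=auto in this illustration, remain unchanged):
          GRUB_CMDLINE_LINUX="crashkernel=auto rd.md.uuid=2cd677de:435c12b6:0a7caef0:5f29cb88:degraded"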

        3. Save the file and exit.

    6. Run the command for the second disk (sdb):

      1. mwg-raid add sdb (ignore any errors)

    7. Update the grub.cfg:

      1. grub2-mkconfig -o /boot/grub2/grub.cfg (ignore the warnings)
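        The command typically prints Generating grub configuration file ..., the detected kernel entries, and done; as noted, warnings can be ignored.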

    8. Regenerate initramfs:

      1. dracut --force

    9. Exit chroot & reboot:

      1. exit

      2. reboot

  8. Post-Reboot Steps:

    1. Run mwg-raid add sdb once the system reboots successfully. This sets up the RAID partition; syncing takes some time.

    2. Check the status periodically with mwg-raid info until md0 and md1 show as clean.
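      While the resync is in progress, cat /proc/mdstat shows a recovery progress line similar to the following (percentages, speeds, and block counts are illustrative):
      md1 : active raid1 sdb3[2] sda3[0]
            40859648 blocks super 1.2 [2/1] [U_]
            [===>.................]  recovery = 18.3% (7480704/40859648) finish=12.4min speed=44640K/sec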

    3. Edit the /etc/default/grub file, remove the rd.md.uuid=*uuidofMD0*:degraded entry, and save and exit.

    4. Run the grub2-mkconfig -o /boot/grub2/grub.cfg command to refresh the grub configuration with the latest change.

    5. Run the dracut --force command to rebuild initramfs with the change.

    6. Verify the disk status by running lsblk and cat /proc/mdstat to confirm the system is functional.
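      Sample cat /proc/mdstat output once both disks are fully in sync (device names and block counts are illustrative):
      md0 : active raid1 sdb2[2] sda2[0]
            10999808 blocks super 1.2 [2/2] [UU]
      md1 : active raid1 sdb3[2] sda3[0]
            40859648 blocks super 1.2 [2/2] [UU]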

The system now boots successfully, and SWG 13.0 is recovered.

 
