Jerry Jelinek's blog


Solaris Volume Manager root mirror problems on S10

April 1, 2005

There are a couple of bugs that we found in S10 that make it look like
Solaris Volume Manager root mirroring does not work at all. Unfortunately,
we found these bugs after the release went out. These bugs will be patched,
but I wanted to describe the problems a bit and offer some workarounds.

On a lot of systems that use SVM to do root mirroring
there are only two disks. When you set up the configuration
you put one or more metadbs on each disk to hold the
SVM configuration information.
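
For example, on a typical two-disk setup (the disk names and the slice used
for the replicas here are just placeholders; substitute whatever matches
your layout), the replicas might be created with something like:

    metadb -a -f -c 2 c0t0d0s7 c0t1d0s7

This puts two copies of the SVM configuration database on a small dedicated
slice of each disk.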

SVM implements a metadb quorum rule: during boot, if half or more of the
metadbs are unavailable (in other words, if there is no strict majority of
good replicas), the system should come up in single-user mode so that you
can fix things up. You can read more about this here.

On a two disk system there is no way to set things up so that more than 50%
of the metadbs will be available if one of the disks dies. For example, with
two metadbs on each disk, losing either disk leaves only two of the four
replicas, which is exactly 50% and not a majority.

When SVM does not have metadb quorum during boot, it is supposed to leave
all of the metadevices read-only and boot into single-user mode. This gives
you a chance to confirm that you are using the right SVM configuration, and
it ensures that you don’t corrupt any of your data before you have had a
chance to clean up the dead metadbs.

What a lot of people do when they set up a root mirror is pull one of the
disks to check whether the system will still boot and run OK. If you do this
experiment on a two disk configuration running S10, the system will panic
very early in the boot process and then go into an infinite panic/reboot
cycle.

What is happening here is that we found a bug related to UFS logging, which
is on by default in S10. Because there is no metadb quorum, the root mirror
stays read-only, and we hit a bug in the UFS log rolling code. This in turn
leaves UFS in a bad state, which causes the system to panic.

We’re testing the fix for this bug right now, but in the meantime it is easy
to work around it by disabling logging on the root filesystem. You can do
that by specifying the “nologging” option in the last field of the vfstab
entry for root. You should reboot once before doing any SVM experiments
(like pulling a disk) to ensure that UFS has rolled the log and is no longer
using logging on root.
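
For example, the root line in /etc/vfstab might end up looking something
like this (the metadevice name d0 is only an example; use whatever your root
mirror is actually called), with “nologging” in the mount-options field in
place of the usual “-”:

    /dev/md/dsk/d0   /dev/md/rdsk/d0   /   ufs   1   no   nologging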

Once a patch for this bug is out you will definitely want to remove this
workaround from the vfstab entry, since UFS logging offers significant
performance and availability benefits.

By the way, UFS logging is also on by default in the S9 9/04 release, but
that code does not suffer from this bug.

The second problem we found is not as serious as the UFS bug. This one has
to do with an interaction with the Service Management Facility (SMF), which
is new in S10, and again it is related to not having metadb quorum during
boot. What should happen is that the system enters single-user mode so you
can clean up the dead metadbs. Instead, it boots all the way to multi-user,
but since the root device is still read-only, things don’t work very well.
This turned out to be a missing dependency which we didn’t catch when we
integrated SVM with SMF. We’ll have a patch for this too, but this problem
is much less serious: you can still log in as root and clean up the dead
metadbs so that you can then reboot with a good metadb quorum.
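
The cleanup itself is just a matter of deleting the replicas on the dead
disk and rebooting. A rough sketch, assuming the failed disk kept its
replicas on slice 7 (the device name is only a placeholder):

    metadb -i              # list the replicas and their status flags
    metadb -d c0t1d0s7     # remove the replicas on the dead disk

After that you can reboot with a good quorum and repair the mirror once the
disk has been replaced.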

Both of these problems occur because there is no metadb quorum, so the root
metadevice remains read-only after a boot with a dead disk. If you have a
third disk on which you can put a metadb, you can reduce the likelihood of
hitting this problem, since losing one disk won’t cause you to lose quorum
during boot.
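
Adding a replica on a third disk is a one-line operation. For example, with
a small spare disk at c2t0d0 and a slice set aside for the replica (again,
an illustrative device name):

    metadb -a c2t0d0s0

With two replicas on each mirrored disk plus one on the spare, losing either
mirrored disk still leaves three of the five replicas available, which is a
majority.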

Given these kinds of problems, you might wonder why SVM bothers to implement
the metadb quorum rule at all. Why not just trust the metadbs that are
alive? SVM is conservative and always chooses the path that ensures you
won’t lose data or use stale data. There are various corner cases to worry
about in which SVM cannot be sure it is using the most current data. For
example, in a two disk mirror configuration, you might run for a while on
the first disk with the second disk powered down. Later you might reboot off
the second disk (because that disk is now powered up) while the first disk
is powered down. At this point you would be using the stale data on the
mirror, possibly without even realizing it. The metadb quorum rule gives you
a chance to intervene and fix up the configuration when SVM cannot do it
automatically.

17 Responses

  1. This quorum problem is precisely the reason that I have suggested (no formal RFE, yet) that Sun hardware come with a small solid state disk (think USB thumb drive) to hold a copy of the metadb’s.
    I haven’t filed this RFE because it seems to make more sense to me to simply implement hardware mirroring in all of the hardware. Not like the V440 did it with only 2 of the 4 disks, however.

  2. That’s an interesting idea, Mike. What could be cool would be to make use of the NVRAM and store one metadb copy there. Then if you have a two disk system like a V210, and say 3 metadb copies per disk, the NVRAM metadb could help provide quorum in the case of a total disk failure or metadb corruption (or a disk removed for testing).

  3. I believe I was bit by this bug. Boy, did it confuse me. I didn’t open a case because I had no idea how to explain what happened. I also assumed it was a UFS bug, so I removed the logging value from /etc/vfstab. I didn’t realize I needed to add nologging to force it off. I was so close! Then I got into what I believe is the loop you describe. If you could post the bug ID or patch number to your blog when it’s created, I would be very interested in following it.

    Rebooting with command: boot
    Boot device: /pci@1c,600000/scsi@2/disk@0,0:a File and args:
    SunOS Release 5.10 Version Generic 64-bit
    Copyright 1983-2005 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    WARNING: Error writing ufs log state
    WARNING: ufs log for / changed state to Error
    WARNING: Please umount(1M) / and run fsck(1M)
    panic[cpu1]/thread=180e000: Could not install the isa driver
    000000000180b970 unix:post_startup+48 (0, 117ce0c, 0, 10, 180ba14, 18ab000)
    %l0-3: 000000000181a000 00000300003abd08 0000000000000000 0000000000000001
    %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000030001a8bd40
    000000000180ba20 genunix:main+b8 (1813c98, 1011c00, 1834340, 18a7c00, 0, 1813800)
    %l0-3: 000000000180e000 0000000000000001 000000000180c000 0000000001835200
    %l4-7: 0000000070002000 0000000000000001 000000000181ba54 0000000000000000
    syncing file systems... done
    skipping system dump - no dump device configured
    rebooting…

  4. Sounds like a two disk mirror is not all that useful, at least not in my case, as my goal was to shorten the recovery time after a disk failure.
    But the “fix” seems easy and cheap: I’ll just have to hunt down a third drive, just big enough to hold a third metadb. So tomorrow I’ll walk the halls looking for a SCSI disk maybe still inside some old unused SPARC-5. Finally a good use for all those old drives!
    Question: Do I need to worry about performance? I assume not, because I assume the metadb is not accessed frequently. Those old drives are very slow.

  5. A two disk mirror should be useful. Unfortunately we have these various bugs that I have described which make it appear not to work at all once one of your disks dies. Using a 3rd, small disk is a great solution to this problem. I blogged elsewhere about even using a USB memory stick as a 3rd disk for maintaining mddb quorum. There are still some subtle issues with that approach that I haven’t had time to track down yet, but a regular, small, old, cheap and slow disk is just perfect for sticking a 3rd mddb onto.
    Aside from configuration changes, the only I/O to that disk would be for mirror resync regions, and with the disk holding just the 3rd mddb even that is very unlikely. Even if we were writing the resync data out there, that would be the only I/O and I don’t think it would impact you.
    Thanks,
    Jerry

  6. I think a fix for these problems on a 2 disk system is critical. Systems like the 280R, V480, V490, … only have 2 internal disks. In my world, we have systems like this that only need the amount of storage on the 2 internals to do their jobs. Others may be attached to a SAN for application data. I’m guessing you don’t want to have metadb replicas on SAN attached storage. Thanks to all above for the info.

  7. I usually use the non-symmetric metadb approach in this case: I add three metadbs to disk1, and four to disk2.
    If disk1 fails, I still have more than 50% of the replicas. If disk2 fails, I have less than 50% and have to fix my metadbs. So, I have a 50% chance of not running into problems, whereas the 3/3 metadb configuration has the disadvantage that I have a 100% chance of getting into trouble.
    By the way, what happened to the
    set md:mirrored_root_flag=1
    option in /etc/system? It seems it no longer works…

  8. > By the way, what happened to the
    >
    > set md:mirrored_root_flag=1
    >
    > option in /etc/system? It seems it
    > no longer works
    Sorry about the delayed response to this question. I found this problem a while ago and I filed this bug:
    6272573 mirrored_root_flag no longer works
    This is fixed in the current Solaris Express releases and in the OpenSolaris code. Also, somebody is working on backporting the fix to an S10 update release.
    Jerry

  9. Is this not fixed in 120537-03? It seems not. That patch is supposed to fix:
    6236382 boot should enter single-user when there is no mddb quorum
    but it does not appear to fix it on my V120 test system running Solaris 10 GA plus the latest recommended patches, which include 120537-03.

  10. There are two issues. Going into single-user is fixed, but the underlying problem with the UFS logging bug was just fixed in Nevada and won’t be in a patch yet. This is bug 6215065 and that will get in the way of the other fixes.

  11. OK, thanks Jerry – interestingly it seems the md:mirrored_root_flag problem is fixed in the latest builds. I’m running snv_27 sx:cw on another box and have been testing these issues on this also. Alas, the panic/reboot cycle still happens, but if you have md:mirrored_root_flag=1 the system successfully boots into multi-user mode.
    I look forward to the other bug being fixed. For something so fundamental I’m amazed it’s taken 11 months to come up with a fix.

  12. I just found this bug myself and got around it by booting from a DVD and editing the /etc/vfstab and /etc/system files so that the root device wasn’t a mirror.
    I have just installed the latest patches (as of
    9 Feb 2006). Has this problem been fixed yet?

  13. It seems this is fixed, but there is still no patch for Solaris 10 available.
    Bug ID 6215065 now reports “fix delivered” and integrated into build s10u2_03, but there is no “fixed by patch” entry.

  14. It appears this is now fixed in patch 120254-02 for Solaris 10 SPARC. I will be investigating to see if it is fixed and report back if it is.

  15. Hi Jerry,
    unfortunately I have set up two X2100 production servers with an S10 two disk RAID-1 configuration which includes mirroring the root filesystem. I have already set up “nologging” for /, but I still do not know how to carefully handle the case of a disk failure.
    Wouldn’t it make sense to just unmirror at least the root filesystem before a disk failure occurs? What about regularly rsyncing / on both disks with each other to keep the root filesystem in sync, if unmirroring is recommendable? If one disk fails you should be able to boot at least from the other one, and after disk replacement user data should still be resynced automatically.
    Let me ask a last question regarding the metadb quorum stuff. What’s the problem with storing just additional database state replicas on a USB stick? If a disk fails in a two disk RAID-1 configuration you should still have more than 50% of the metadbs available. By the way: is it feasible to add metadbs to a running setup?
    I am using S10 03/05 on one machine and S10 01/06 on the other one. Because I have gotten the recommendation not to put three or more metadbs on each disk on an x86 system, I have configured RAID-1 with only two metadbs each.

  16. Unfortunately it would seem that this issue is *still* not fixed as of x86 Sol10u3 (11/06) with recent patches.
    Host: x4100 with two internal SAS disks. One of the disks has gone screwy, and the system wedged. When I reset and try to boot (even with -s tacked onto multiboot for single-user) I get:
    "WARNING: Error writing ufs log state
    WARNING: ufs log for / changed state to Error
    WARNING: Please umount(1M) / and run fsck(1M) "
    I’m trying to netboot off a jumpstart server to get a prompt so I can mount without UFS logging and deal with the metadb replicas but not having much luck so far.
    I’ve submitted a Sunsolve case regarding this bug but haven’t yet gotten anyone to acknowledge it. The closest is a claim that I need to have an altbootpath defined in bootenv.rc, but it doesn’t seem as though that’s relevant here.
    It does seem, though, that we should disable the default UFS logging on /, which I really don’t want to have to do (and shouldn’t have to, especially for a bug that’s gone unfixed for 2.5 years).
