Jerry Jelinek's blog


Solaris Volume Manager root mirror problems on S10

April 1, 2005

There are a couple of bugs that we found in S10 that make it look like
Solaris Volume Manager root mirroring does not work at all. Unfortunately,
we found these bugs after the release went out. These bugs will be patched,
but I wanted to describe the problems a bit and offer some workarounds.

On a lot of systems that use SVM to do root mirroring
there are only two disks. When you set up the configuration
you put one or more metadbs on each disk to hold the
SVM configuration information.
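
For example, on a typical two-disk setup (the disk names and the slice used
for the replicas here are just placeholders; substitute whatever matches
your layout), the replicas might be created with something like:

    metadb -a -f -c 2 c0t0d0s7 c0t1d0s7

This puts two copies of the SVM configuration database on a small dedicated
slice of each disk.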

SVM implements a metadb quorum rule: during boot, if half or more of the
metadbs are unavailable (in other words, if there is no strict majority of
good replicas), the system should come up in single-user mode so that you
can fix things up. You can read more about this here.

On a two disk system there is no way to set things up so that more than 50%
of the metadbs will be available if one of the disks dies. For example, with
two metadbs on each disk, losing either disk leaves only two of the four
replicas, which is exactly 50% and not a majority.

When SVM does not have metadb quorum during boot, it is supposed to leave
all of the metadevices read-only and boot into single-user mode. This gives
you a chance to confirm that you are using the right SVM configuration, and
it ensures that you don’t corrupt any of your data before you have had a
chance to clean up the dead metadbs.

What a lot of people do when they set up a root mirror is pull one of the
disks to check whether the system will still boot and run OK. If you do this
experiment on a two disk configuration running S10, the system will panic
very early in the boot process and then go into an infinite panic/reboot
cycle.

What is happening here is that we found a bug related to UFS logging, which
is on by default in S10. Because there is no metadb quorum, the root mirror
stays read-only, and we hit a bug in the UFS log rolling code. This in turn
leaves UFS in a bad state, which causes the system to panic.

We’re testing the fix for this bug right now, but in the meantime it is easy
to work around it by disabling logging on the root filesystem. You can do
that by specifying the “nologging” option in the last field of the vfstab
entry for root. You should reboot once before doing any SVM experiments
(like pulling a disk) to ensure that UFS has rolled the log and is no longer
using logging on root.
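
For example, the root line in /etc/vfstab might end up looking something
like this (the metadevice name d0 is only an example; use whatever your root
mirror is actually called), with “nologging” in the mount-options field in
place of the usual “-”:

    /dev/md/dsk/d0   /dev/md/rdsk/d0   /   ufs   1   no   nologging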

Once a patch for this bug is out you will definitely want to remove this
workaround from the vfstab entry, since UFS logging offers significant
performance and availability benefits.

By the way, UFS logging is also on by default in the S9 9/04 release, but
that code does not suffer from this bug.

The second problem we found is not as serious as the UFS bug. This one has
to do with an interaction with the Service Management Facility (SMF), which
is new in S10, and again it is related to not having metadb quorum during
boot. What should happen is that the system enters single-user mode so you
can clean up the dead metadbs. Instead, it boots all the way to multi-user,
but since the root device is still read-only, things don’t work very well.
This turned out to be a missing dependency which we didn’t catch when we
integrated SVM with SMF. We’ll have a patch for this too, but this problem
is much less serious: you can still log in as root and clean up the dead
metadbs so that you can then reboot with a good metadb quorum.
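
The cleanup itself is just a matter of deleting the replicas on the dead
disk and rebooting. A rough sketch, assuming the failed disk kept its
replicas on slice 7 (the device name is only a placeholder):

    metadb -i              # list the replicas and their status flags
    metadb -d c0t1d0s7     # remove the replicas on the dead disk

After that you can reboot with a good quorum and repair the mirror once the
disk has been replaced.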

Both of these problems occur because there is no metadb quorum, so the root
metadevice remains read-only after a boot with a dead disk. If you have a
third disk on which you can put a metadb, you can reduce the likelihood of
hitting this problem, since losing one disk won’t cause you to lose quorum
during boot.
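
Adding a replica on a third disk is a one-line operation. For example, with
a small spare disk at c2t0d0 and a slice set aside for the replica (again,
an illustrative device name):

    metadb -a c2t0d0s0

With two replicas on each mirrored disk plus one on the spare, losing either
mirrored disk still leaves three of the five replicas available, which is a
majority.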

Given these kinds of problems, you might wonder why SVM bothers to implement
the metadb quorum rule at all. Why not just trust the metadbs that are
alive? SVM is conservative and always chooses the path that ensures you
won’t lose data or use stale data. There are various corner cases to worry
about in which SVM cannot be sure it is using the most current data. For
example, in a two disk mirror configuration, you might run for a while on
the first disk with the second disk powered down. Later you might reboot off
the second disk (because that disk is now powered up) while the first disk
is powered down. At this point you would be using the stale data on the
mirror, possibly without even realizing it. The metadb quorum rule gives you
a chance to intervene and fix up the configuration when SVM cannot do it
automatically.

17 Responses

  1. This quorum problem is precisely the reason that I have suggested (no formal RFE, yet) that Sun hardware come with a small solid state disk (think USB thumb drive) to hold a copy of the metadb’s.
    I haven’t filed this RFE because it seems to make more sense to me to simply implement hardware mirroring in all of the hardware. Not like the V440 did it with only 2 of the 4 disks, however.

  2. That’s an interesting idea, Mike. What could be cool would be to make use of the NVRAM and store one metadb copy there. Then if you have a two disk system like a V210, and say 3 metadb copies per disk, the NVRAM metadb could help provide quorum in the case of a total disk failure or metadb corruption (or a disk removed for testing).

  3. I believe I was bit by this bug. Boy, did it confuse me. I didn’t open a case because I had no idea how to explain what happened. I also assumed it was a UFS bug, so I removed the logging value from /etc/vfstab. I didn’t realize I needed to add nologging to force it off. I was so close! Then I got into what I believe is the loop you describe. If you could post the bug ID or patch number to your blog when it’s created, I would be very interested in following it.

    Rebooting with command: boot
    Boot device: /pci@1c,600000/scsi@2/disk@0,0:a File and args:
    SunOS Release 5.10 Version Generic 64-bit
    Copyright 1983-2005 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    WARNING: Error writing ufs log state
    WARNING: ufs log for / changed state to Error
    WARNING: Please umount(1M) / and run fsck(1M)
    panic[cpu1]/thread=180e000: Could not install the isa driver
    000000000180b970 unix:post_startup+48 (0, 117ce0c, 0, 10, 180ba14, 18ab000)
    %l0-3: 000000000181a000 00000300003abd08 0000000000000000 0000000000000001
    %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000030001a8bd40
    000000000180ba20 genunix:main+b8 (1813c98, 1011c00, 1834340, 18a7c00, 0, 1813800)
    %l0-3: 000000000180e000 0000000000000001 000000000180c000 0000000001835200
    %l4-7: 0000000070002000 0000000000000001 000000000181ba54 0000000000000000
    syncing file systems... done
    skipping system dump - no dump device configured
    rebooting…

  4. Sounds like a two disk mirror is not all that useful, at least not in my case, as my goal was to shorten the recovery time after a disk failure.
    But the “fix” seems easy and cheap: I’ll just have to hunt down a third drive, just big enough to hold a third metadb. So tomorrow I’ll walk the halls looking for a SCSI disk maybe still inside some old unused SPARC-5. Finally a good use for all those old drives!
    Question: Do I need to worry about performance? I assume not, because I assume the metadb is not accessed frequently. Those old drives are very slow.

  5. A two disk mirror should be useful. Unfortunately we have these various bugs that I have described which make it appear not to work at all once one of your disks dies. Using a 3rd, small disk is a great solution to this problem. I blogged elsewhere about even using a USB memory stick as a 3rd disk for maintaining mddb quorum. There are still some subtle issues with that approach that I haven’t had time to track down yet, but a regular, small, old, cheap and slow disk is just perfect for sticking a 3rd mddb onto.
    Aside from configuration changes, the only I/O to that disk would be for mirror resync regions, and with the disk holding just the 3rd mddb even that is very unlikely. Even if we were writing the resync data out there, that would be the only I/O and I don’t think it would impact you.
    Thanks,
    Jerry

  6. I think a fix for these problems on a 2 disk system is critical. Systems like the 280R, V480, V490, … only have 2 internal disks. In my world, we have systems like this that only need the amount of storage on the 2 internals to do their jobs. Others may be attached to a SAN for application data. I’m guessing you don’t want to have metadb replicas on SAN attached storage. Thanks to all above for the info.

  7. I usually use the non-symmetric metadb approach in this case: I add three metadbs to disk1, and four to disk2.
    If disk1 fails, I still have more than 50% of the replicas. If disk2 fails, I have less than 50% and have to fix my metadbs. So, I have a 50% chance of not running into problems, whereas the 3/3 metadb configuration has the disadvantage that I have a 100% chance of getting into trouble.
    By the way, what happened to the
    set md:mirrored_root_flag=1
    option in /etc/system? It seems it no longer works…

  8. > By the way, what happened to the
    >
    > set md:mirrored_root_flag=1
    >
    > option in /etc/system? It seems it
    > no longer works
    Sorry about the delayed response to this question. I found this problem a while ago and I filed this bug:
    6272573 mirrored_root_flag no longer works
    This is fixed in the current Solaris Express releases and in the OpenSolaris code. Also, somebody is working on backporting the fix to an S10 update release.
    Jerry

  9. Is this not fixed in 120537-03? It seems not. That patch is supposed to fix:
    6236382 boot should enter single-user when there is no mddb quorum
    but it does not appear to fix it on my V120 test system running Solaris 10 GA plus the latest recommended patches, which include 120537-03.

  10. There are two issues. Going into single-user is fixed, but the underlying problem with the UFS logging bug was just fixed in Nevada and won’t be in a patch yet. This is bug 6215065 and that will get in the way of the other fixes.

  11. OK, thanks Jerry – interestingly it seems the md:mirrored_root_flag problem is fixed in the latest builds. I’m running snv_27 sx:cw on another box and have been testing these issues on this also. Alas, the panic/reboot cycle still happens, but if you have md:mirrored_root_flag=1 the system successfully boots into multi-user mode.
    I look forward to the other bug being fixed. For something so fundamental I’m amazed it’s taken 11 months to come up with a fix.

  12. I just found this bug myself and got around it by booting from a DVD and editing the /etc/vfstab and /etc/system files so that the root device wasn’t a mirror.
    I have just installed the latest patches (as of
    9 Feb 2006). Has this problem been fixed yet?

  13. It seems this is fixed, but there is still no patch for Solaris 10 available.
    Bug ID 6215065 now reports “fix delivered” and integrated into build s10u2_03, but there is no “fixed by patch” entry.

  14. It appears this is now fixed in patch 120254-02 for Solaris 10 SPARC. I will be investigating to see if it is fixed and report back if it is.

  15. Hi Jerry,
    unfortunately I have set up two X2100 production servers with an S10 two disk RAID-1 configuration which includes mirroring the root filesystem. I have already set up “nologging” for /, but I still do not know how to carefully handle the case of a disk failure.
    Wouldn’t it make sense to just unmirror at least the root filesystem before a disk failure occurs? What about regularly rsyncing / on both disks with each other to keep the root filesystem in sync, if unmirroring is recommendable? If one disk fails you should be able to boot at least from the other one, and after disk replacement user data should still be resynced automatically.
    Let me ask a last question regarding the metadb quorum stuff. What’s the problem with storing just additional database state replicas on a USB stick? If a disk fails in a two disk RAID-1 configuration you should still have more than 50% of the metadbs available. By the way: is it feasible to add metadbs to a running setup?
    I am using S10 03/05 on one machine and S10 01/06 on the other one. Because I have gotten the recommendation not to put three or more metadbs on each disk on an x86 system, I have configured RAID-1 with only two metadbs each.

  16. Unfortunately it would seem that this issue is *still* not fixed as of x86 Sol10u3 (11/06) with recent patches.
    Host: x4100 with two internal SAS disks. One of the disks has gone screwy, and the system wedged. When I reset and try to boot (even with -s tacked onto multiboot for single-user) I get:
    "WARNING: Error writing ufs log state
    WARNING: ufs log for / changed state to Error
    WARNING: Please umount(1M) / and run fsck(1M) "
    I’m trying to netboot off a jumpstart server to get a prompt so I can mount without UFS logging and deal with the metadb replicas but not having much luck so far.
    I’ve submitted a Sunsolve case regarding this bug but haven’t yet gotten anyone to acknowledge it. The closest is a claim that I need to have an altbootpath defined in bootenv.rc, but it doesn’t seem as though that’s relevant here.
    It does seem, though, that we should disable the default UFS logging on /, which I really don’t want to have to do (and shouldn’t have to, especially for a bug that’s gone unfixed for 2.5 years).
