Discussion:
Upgrading raidz2 set, one replacement causes kernel panic
AZ Nomad
2010-06-01 14:50:14 UTC
I'm in the process of upgrading a raidz2 pool comprised of seven 500G drives.
I'm replacing them one at a time with 1TB drives and resilvering.
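Why the pool won't grow until the last disk is swapped: a raidz vdev treats every member as if it were the size of its smallest disk, so usable space stays at the 500G level until all seven drives are 1TB. The sketch below is only the raw arithmetic (parity disks subtracted, metadata and padding overhead ignored). The helper name `raidz_usable_bytes` is hypothetical, and whether the extra space shows up automatically after the last resilver depends on the build.

```python
GB = 10**9
TB = 10**12


def raidz_usable_bytes(disk_sizes, parity=2):
    """Approximate usable bytes of one raidz vdev (parity=2 for raidz2).

    Every member counts as the size of the smallest disk, and `parity`
    disks' worth of space goes to parity. Overhead is ignored.
    """
    if len(disk_sizes) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes) - parity) * min(disk_sizes)


if __name__ == "__main__":
    disks = [500 * GB] * 7
    for i in range(len(disks)):
        disks[i] = 1 * TB  # replace one 500G disk with a 1TB disk, then resilver
        # prints 2.5 TB for disks 1-6, then 5.0 TB once the last one is swapped
        print(f"after disk {i + 1}: {raidz_usable_bytes(disks) / TB:.1f} TB usable")
```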

So far so good. #5 was a WD Green w/ "adv formatting" that was so slow that
I've sent it back, to be replaced with a drive that doesn't have such a
"feature". Grabbed the next disk from the replacement set and it installed and
resilvered without problems.

The bigger problem is with my latest replacement. This is #6 (but the last
drive I have on hand until the WD replacement arrives.)

This drive is a "Samsung EcoGreen F2 HD103SI 1TB 5400 RPM 32MB Cache"
I've already installed one of them without any problem.

When I installed this one, the system crashed when I did a "zpool replace..."
It would then crash on reboot.
Unplugging the drive permitted me to boot.
I did a "zpool detach" and then was able to reboot with the drive connected,
but it crashed again when I attempted a "zpool replace".

The solaris version is opensolaris sol-nv-b123 running on an amd64.

Any ideas, or should I simply send it back for replacement?
Andrew Gabriel
2010-06-01 16:57:29 UTC
Post by AZ Nomad
I'm in the process of upgrading a raidz2 pool comprised of seven 500G drives.
I'm replacing them one at a time with 1TB drives and resilvering.
So far so good. #5 was a WD Green w/ "adv formatting" that was so slow that
I've sent it back, to be replaced with a drive that doesn't have such a
"feature". Grabbed the next disk from the replacement set and it installed and
resilvered without problems.
The bigger problem is with my latest replacement. This is #6 (but the last
drive I have on hand until the WD replacement arrives.)
This drive is a "Samsung EcoGreen F2 HD103SI 1TB 5400 RPM 32MB Cache"
I've already installed one of them without any problem.
When I installed this one, the system crashed when I did a "zpool replace..."
What was the crash?
Post by AZ Nomad
It would then crash on reboot.
What was the crash?
Post by AZ Nomad
Unplugging the drive permitted me to boot.
I did a "zpool detach" and then was able to reboot with the drive connected,
but it crashed again when I attempted a "zpool replace".
The solaris version is opensolaris sol-nv-b123 running on an amd64.
Any ideas, or should I simply send it back for replacement?
Have you tried accessing it on the system some other way, such as
labeling it and formatting for ufs and seeing if the drive actually
works?
--
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]
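A non-destructive alternative to the UFS test suggested above is a plain sequential read of the raw device: a misbehaving drive or controller tends to show errors or hang quickly. This is a minimal Python sketch, not a Solaris tool. `read_test` is a hypothetical helper, and the device path passed to it (something under /dev/rdsk) has to come from your own `format` listing.

```python
import os
import sys


def read_test(path, block_size=1024 * 1024, max_blocks=None, max_errors=16):
    """Sequentially read `path` and return (bytes_read, errors).

    Each error is recorded as (offset, message) and the scan skips one block,
    so a single bad region does not end the pass. The scan stops at end of
    data, after `max_blocks` blocks, or once `max_errors` errors are seen.
    """
    errors = []
    bytes_read = 0
    offset = 0
    blocks = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while max_blocks is None or blocks < max_blocks:
            blocks += 1
            try:
                data = os.pread(fd, block_size, offset)
            except OSError as exc:
                errors.append((offset, exc.strerror))
                if len(errors) >= max_errors:
                    break
                offset += block_size  # skip past the unreadable block
                continue
            if not data:
                break  # reached the end of the device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_test.py /dev/rdsk/<disk>   (fill in your own device)
    print(read_test(sys.argv[1]))
```

It only reads, so it won't touch data on the disk, but if the controller/driver combination is the real problem it may provoke the same crash.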
AZ Nomad
2010-06-01 18:21:12 UTC
Post by Andrew Gabriel
Post by AZ Nomad
I'm in the process of upgrading a raidz2 pool comprised of seven 500G drives.
I'm replacing them one at a time with 1TB drives and resilvering.
So far so good. #5 was a WD Green w/ "adv formatting" that was so slow that
I've sent it back, to be replaced with a drive that doesn't have such a
"feature". Grabbed the next disk from the replacement set and it installed and
resilvered without problems.
The bigger problem is with my latest replacement. This is #6 (but the last
drive I have on hand until the WD replacement arrives.)
This drive is a "Samsung EcoGreen F2 HD103SI 1TB 5400 RPM 32MB Cache"
I've already installed one of them without any problem.
When I installed this one, the system crashed when I did a "zpool replace..."
What was the crash?
Unfortunately, I'm a real Solaris newbie. Where should I look to find
a log? I saw console messages that it was writing the thing out, but
most of the information scrolled by so fast I didn't catch any of it.
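On Solaris, syslog output normally lands in /var/adm/messages, and after a panic the crash dump is saved by savecore into the directory configured with dumpadm (by default /var/crash/ followed by the host name). A quick way to pull the relevant lines out of the messages file is a keyword filter like the one below. The keyword list and the helper `interesting_lines` are assumptions to adjust, not anything Solaris provides.

```python
import re
import sys

# Keywords worth a second look after a crash: an assumed starting set, not exhaustive.
KEYWORDS = re.compile(r"\b(panic|savecore|dump|WARNING)\b", re.IGNORECASE)


def interesting_lines(lines, device=None):
    """Yield (line_number, line) for lines matching KEYWORDS or naming `device`."""
    for number, line in enumerate(lines, start=1):
        if KEYWORDS.search(line) or (device and device in line):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan.py /var/adm/messages [device-name]
    path = sys.argv[1]
    device = sys.argv[2] if len(sys.argv) > 2 else None
    with open(path, errors="replace") as log:
        for number, line in interesting_lines(log, device):
            print(f"{number}: {line}")
```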

...
Post by Andrew Gabriel
Have you tried accessing it on the system some other way, such as
labeling it and formatting for ufs and seeing if the drive actually
works?
I'll try again tonight. Right now the drive is unplugged.
AZ Nomad
2010-06-02 00:18:15 UTC
Post by Andrew Gabriel
Have you tried accessing it on the system some other way, such as
labeling it and formatting for ufs and seeing if the drive actually
works?
Maybe it's an incompatibility between the drive and the SiI3132 SATA controller
I'm using. The other Samsung is on a motherboard (nForce4) SATA controller.

format> format
Ready to format. Formatting cannot be interrupted.
Continue? yes
Beginning format. The current time is Tue Jun 1 14:39:45 2010

Formatting...
Format failed

Retry of formatting operation without any of the standard
mode selects and ignoring disk's Grown Defects list. The
disk may be able to be reformatted this way if an earlier
formatting operation was interrupted by a power failure or
SCSI bus reset. The Grown Defects list will be recreated
by format verification and surface analysis.

Retry format without mode selects and Grown Defects list? yes
Formatting...
Illegal request during format: block 0 (0x0) (0)
ASC: 0x20 ASCQ: 0x0
Illegal request during format: block 0 (0x0) (0)
ASC: 0x20 ASCQ: 0x0
failed
format> quit
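The two numbers in the format output are SCSI additional sense codes, and "Illegal request" is the sense key (5h). ASC 0x20 / ASCQ 0x00 is INVALID COMMAND OPERATION CODE: the target rejected the command itself rather than reporting a media error. On a SATA disk reached through a SCSI translation layer, that may only mean the command format sent isn't supported there, so this output alone doesn't prove the disk bad. A tiny decoder, with a hand-picked table rather than the full list (`decode_asc_line` is a hypothetical helper):

```python
import re

# A few standard SCSI additional sense codes, keyed by (ASC, ASCQ).
# Deliberately small; extend from the SCSI sense code tables as needed.
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x21, 0x00): "LOGICAL BLOCK ADDRESS OUT OF RANGE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
}

ASC_LINE = re.compile(r"ASC:\s*0x([0-9A-Fa-f]+)\s+ASCQ:\s*0x([0-9A-Fa-f]+)")


def decode_asc_line(line):
    """Translate a line like 'ASC: 0x20 ASCQ: 0x0' into its sense-code description."""
    match = ASC_LINE.search(line)
    if match is None:
        raise ValueError(f"no ASC/ASCQ pair in {line!r}")
    asc, ascq = int(match.group(1), 16), int(match.group(2), 16)
    return ASC_ASCQ.get((asc, ascq), f"unknown (ASC 0x{asc:02X}, ASCQ 0x{ascq:02X})")
```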
AZ Nomad
2010-06-02 02:04:08 UTC
Post by AZ Nomad
Post by Andrew Gabriel
Have you tried accessing it on the system some other way, such as
labeling it and formatting for ufs and seeing if the drive actually
works?
Maybe it's an incompatibility between the drive and the SiI3132 SATA controller
I'm using. The other Samsung is on a motherboard (nForce4) SATA controller.
That's confirmed. I switched the Samsung to a spare motherboard SATA port and
am able to use the drive now. Resilvering in progress without crashing.

It's a little worrying, but if it works without errors, then I'm ok for now.
Sometime between now and when the new array is in need of larger drives,
I'll spring for a better SATA controller and a hotswap chassis. Right now I'm
too damn cheap (and broke).

In the meantime, everything that's important gets backed up to removable hard
drives and optical media, some offsite, and the bulk are video files that I can
re-rip from DVD.
h***@bofh.ca
2010-06-02 13:25:27 UTC
Post by AZ Nomad
It's a little worrying, but if it works without errors, then I'm ok for now.
Sometime between now and when the new array is in need of larger drives,
I'll spring for a better SATA controller and a hotswap chassis. Right now I'm
too damn cheap (and broke)
You said you're running b123... is it possible to give b134 a try?
--
Brandon Hume - hume -> BOFH.Ca, http://WWW.BOFH.Ca/
AZ Nomad
2010-06-02 15:08:58 UTC
Post by h***@bofh.ca
Post by AZ Nomad
It's a little worrying, but if it works without errors, then I'm ok for now.
Sometime between now and when the new array is in need of larger drives,
I'll spring for a better SATA controller and a hotswap chassis. Right now I'm
too damn cheap (and broke)
You said you're running b123... is it possible to give b134 a try?
All I can find lately when I visit opensolaris.org / download is
the live CD and installer (LiveCD 2009.06). b123 is from October and it didn't
make a whole lot of sense to revert to something released 4 months prior.

I rummaged around the new OpenSolaris site and decided I had better things
to do with my time.

Where is a link to b134 and the group releasing them? Those were the "community
edition", right?
Chris Ridd
2010-06-02 16:12:10 UTC
Post by AZ Nomad
Post by h***@bofh.ca
Post by AZ Nomad
It's a little worrying, but if it works without errors, then I'm ok for now.
Sometime between now and when the new array is in need of larger drives,
I'll spring for a better SATA controller and a hotswap chassis. Right now I'm
too damn cheap (and broke)
You said you're running b123... is it possible to give b134 a try?
all I can find lately when I visit opensolaris.org / download is
the live cd and installer (LiveCD 2009.06). b123 is from october and it didn't
make a whole lot of sense to revert to something released 4 months prior.
I rummaged around the new open solaris site and decided I had better things
to do with my time.
Where is a link to b134 and the group releasing them? Those were the "community
edition", right?
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.

The community editions are quite different things. And deader.
--
Chris
AZ Nomad
2010-06-02 16:58:41 UTC
Post by Chris Ridd
Post by AZ Nomad
Post by h***@bofh.ca
Post by AZ Nomad
It's a little worrying, but if it works without errors, then I'm ok for now.
Sometime between now and when the new array is in need of larger drives,
I'll spring for a better SATA controller and a hotswap chassis. Right now I'm
too damn cheap (and broke)
You said you're running b123... is it possible to give b134 a try?
all I can find lately when I visit opensolaris.org / download is
the live cd and installer (LiveCD 2009.06). b123 is from october and it didn't
make a whole lot of sense to revert to something released 4 months prior.
I rummaged around the new open solaris site and decided I had better things
to do with my time.
Where is a link to b134 and the group releasing them? Those were the "community
edition", right?
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
The community editions are quite different things. And deader.
Thanks. FYI: moving the Samsung to a motherboard controller didn't help much.
It worked for about 4% of the resilver, then vanished. #$%^&* "green" drives.
In the future I won't buy drives with "green" anywhere in the name.
h***@bofh.ca
2010-06-02 19:01:19 UTC
Post by Chris Ridd
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
What this means is that you'll probably have to use 'pkg' to add the
dev repository before doing the upgrade (there are guides online for how
to do that).

Also, there was a major renaming of all the packages circa b130 or so.
Going from 123 to 134 will likely fail; I ran into the problem myself.
Unfortunately, I can't find the post describing how to avoid it.

But, there's a guide to how to update to a specific version here:

http://opensolaris.org/jive/thread.jspa?messageID=452103&#452103

(Last post)

As always, keep your current boot environment available so you can
back out if it goes badly, and certainly don't do a "zpool upgrade"
until you're sure the new environment is stable.
--
Brandon Hume - hume -> BOFH.Ca, http://WWW.BOFH.Ca/
Chris Ridd
2010-06-02 19:19:11 UTC
Post by h***@bofh.ca
Post by Chris Ridd
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
What this means is that you'll probably have to use 'pkg' to add the
dev repository before doing the upgrade (there's guides online for how
to do that).
Yep. I posted in a bit of a rush :-(
Post by h***@bofh.ca
Also, there was a major renaming of all the packages circa b130 or so.
Going from 123 to 134 will likely fail, as I ran into the problem myself.
Unfortunately, I can't find the post describing how to avoid it.
The first problem that I recall (128a to 129) was that the
repository/publisher names had to be corrected, which caused some quite
alarming-looking errors. Advice given on pkg-discuss was to use
image-update -f (actually, to try -nvf first), which avoids the version
check on the SUNWipkg package.
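The dry-run-first sequence described above (-nvf, then the real -f update) can be wrapped so the real update only runs when the dry run exits cleanly. This is a hedged Python sketch: `image_update` is a hypothetical helper, it assumes `pkg` is on the PATH, and it assumes the dev publisher is already configured (at the time that was done with something like `pkg set-publisher -O http://pkg.opensolaris.org/dev/ opensolaris.org`, but verify the URL and publisher name before relying on it).

```python
import subprocess


def image_update(force=False, run=subprocess.run):
    """Dry-run `pkg image-update`, then run it for real only if the dry run succeeded.

    force -- add -f, which per the pkg-discuss advice above skips the version
             check on the SUNWipkg package.
    run   -- injectable runner (defaults to subprocess.run) so the sequence can
             be exercised without pkg installed.
    Returns the list of commands that were executed, in order.
    """
    extra = ["-f"] if force else []
    dry_run = ["pkg", "image-update", "-n", "-v"] + extra
    real = ["pkg", "image-update"] + extra
    executed = [dry_run]
    if run(dry_run).returncode != 0:
        return executed  # stop and read the dry-run output before going further
    run(real)
    executed.append(real)
    return executed
```

Keep the current boot environment around, as suggested in this thread, so the old build can still be booted if the update goes wrong.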

The other major problem was that the package renaming in the early 130s
took an absolute shedload of memory.
Post by h***@bofh.ca
http://opensolaris.org/jive/thread.jspa?messageID=452103&#452103
(Last post)
As always, keep your current boot environment available so you can
back out if it goes badly, and certainly don't do a "zpool upgrade"
until you're sure the new environment is stable.
Also, IMO, only upgrade the pool when you've got a bootable CD that
supports that pool version.
--
Chris
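Both warnings above rest on one rule: pool versions only move forward. Once `zpool upgrade` raises a pool's version, any environment whose software supports only older versions can no longer import it, including the boot environment kept for backing out. The helpers below (hypothetical names) just encode that rule; the versions a given build actually supports can be listed with `zpool upgrade -v` on that build.

```python
def importable_by(pool_version, environments):
    """Names of environments that can still import a pool at `pool_version`.

    `environments` maps a name (boot environment, live CD, ...) to the highest
    ZFS pool version that environment's software supports.
    """
    return sorted(name for name, highest in environments.items()
                  if highest >= pool_version)


def safe_to_upgrade(target_version, environments, rescue_media):
    """True if at least one bootable rescue medium supports `target_version`."""
    return any(environments[name] >= target_version for name in rescue_media)
```

For example, with made-up versions {"current BE": 10, "new BE": 20, "old live CD": 10}, upgrading to version 20 leaves only "new BE" able to import the pool, and safe_to_upgrade(20, environments, ["old live CD"]) stays False until a live CD supporting version 20 is on hand.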
AZ Nomad
2010-06-02 21:02:27 UTC
Post by h***@bofh.ca
Post by Chris Ridd
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
What this means is that you'll probably have to use 'pkg' to add the
dev repository before doing the upgrade (there's guides online for how
to do that).
Of course there's no pkg in b123. :-p

-bash-3.2$ ls /usr/bin/pkg*
/usr/bin/pkg-config /usr/bin/pkgcond /usr/bin/pkgparam
/usr/bin/pkg-get /usr/bin/pkgdata /usr/bin/pkgproto
/usr/bin/pkg2du /usr/bin/pkginfo /usr/bin/pkgtrans
/usr/bin/pkgadm /usr/bin/pkgmk
-bash-3.2$

I think I'll stick with plan A and wait for the next opensolaris?
Chris Ridd
2010-06-03 05:25:29 UTC
Post by AZ Nomad
Post by h***@bofh.ca
Post by Chris Ridd
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
What this means is that you'll probably have to use 'pkg' to add the
dev repository before doing the upgrade (there's guides online for how
to do that).
Of course there's no pkg in b123. :-p
-bash-3.2$ ls /usr/bin/pkg*
/usr/bin/pkg-config /usr/bin/pkgcond /usr/bin/pkgparam
/usr/bin/pkg-get /usr/bin/pkgdata /usr/bin/pkgproto
/usr/bin/pkg2du /usr/bin/pkginfo /usr/bin/pkgtrans
/usr/bin/pkgadm /usr/bin/pkgmk
-bash-3.2$
Try looking in /usr/sbin.
--
Chris
Chris Ridd
2010-06-03 09:13:23 UTC
Post by Chris Ridd
Post by AZ Nomad
Post by h***@bofh.ca
Post by Chris Ridd
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
What this means is that you'll probably have to use 'pkg' to add the
dev repository before doing the upgrade (there's guides online for how
to do that).
Of course there's no pkg in b123. :-p
-bash-3.2$ ls /usr/bin/pkg*
/usr/bin/pkg-config /usr/bin/pkgcond /usr/bin/pkgparam
/usr/bin/pkg-get /usr/bin/pkgdata /usr/bin/pkgproto
/usr/bin/pkg2du /usr/bin/pkginfo /usr/bin/pkgtrans
/usr/bin/pkgadm /usr/bin/pkgmk
-bash-3.2$
Try looking in /usr/sbin.
Sorry, pkg is in /usr/bin after all. I don't think you can have an
OpenSolaris system without /usr/bin/pkg, so perhaps you have Solaris
Express or something instead? What does /etc/release say?
--
Chris
AZ Nomad
2010-06-03 14:57:44 UTC
Post by Chris Ridd
Post by Chris Ridd
Post by AZ Nomad
Post by h***@bofh.ca
Post by Chris Ridd
No, build 134 is the version from the "dev" repository, which you can
'pkg image-update' to. You can also get a b134 live CD via genunix.org.
What this means is that you'll probably have to use 'pkg' to add the
dev repository before doing the upgrade (there's guides online for how
to do that).
Of course there's no pkg in b123. :-p
-bash-3.2$ ls /usr/bin/pkg*
/usr/bin/pkg-config /usr/bin/pkgcond /usr/bin/pkgparam
/usr/bin/pkg-get /usr/bin/pkgdata /usr/bin/pkgproto
/usr/bin/pkg2du /usr/bin/pkginfo /usr/bin/pkgtrans
/usr/bin/pkgadm /usr/bin/pkgmk
-bash-3.2$
Try looking in /usr/sbin.
/usr/sbin/pkgadd /usr/sbin/pkgask /usr/sbin/pkgchk /usr/sbin/pkgrm
Post by Chris Ridd
Sorry, pkg is in /usr/bin after all. I don't think you can have an
OpenSolaris system without /usr/bin/pkg, so perhaps you have Solaris
Express or something instead? What does /etc/release say?
yup

Solaris Express Community Edition snv_123 X86
Copyright 2009 Sun Microsystems, Inc.
All Rights Reserved.
Use is subject to license terms.
Assembled 09 September 2009
-
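The check suggested above boils down to reading the first line of /etc/release: OpenSolaris ships the IPS `pkg` client, while Solaris Express Community Edition (SXCE) uses the older SVR4 tools such as pkgadd and pkgrm. A small parser makes the distinction explicit. The matching strings are assumptions based on the output above, and `parse_release`/`has_ips` are hypothetical helpers.

```python
import re


def parse_release(first_line):
    """Return (distribution, build) from the first line of /etc/release.

    Only the two distributions discussed in this thread are recognised, and
    the matching strings are assumptions based on the output above.
    """
    match = re.search(r"snv_(\d+)", first_line)
    build = int(match.group(1)) if match else None
    if "Solaris Express Community Edition" in first_line:
        return "SXCE", build
    if "OpenSolaris" in first_line:
        return "OpenSolaris", build
    return "unknown", build


def has_ips(distribution):
    """OpenSolaris ships the IPS `pkg` client; SXCE uses SVR4 pkgadd/pkgrm."""
    return distribution == "OpenSolaris"
```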
h***@bofh.ca
2010-06-03 15:06:27 UTC
Post by AZ Nomad
Solaris Express Community Edition snv_123 X86
Copyright 2009 Sun Microsystems, Inc.
Okay, then yeah, you're stuck where you are. I would wait until the next
release, as you've already said. Chances are you're going to have to
flatten the box and reinstall anyway.

However, it might be worth your while to download one of the latest
live CDs from genunix.org, and just try to format your test drive using
it (without installing), if only to determine whether the future upgrade
will fix your problem.
--
Brandon Hume - hume -> BOFH.Ca, http://WWW.BOFH.Ca/
AZ Nomad
2010-06-03 15:23:49 UTC
Post by h***@bofh.ca
Post by AZ Nomad
Solaris Express Community Edition snv_123 X86
Copyright 2009 Sun Microsystems, Inc.
Okay, then yeah, you're stuck where you are. I would wait until the next
release, as you've already said. Chances are you're going to have to
flatten the box and reinstall anyway.
However, it might be worth your while to download one of the latest
live CDs from genunix.org, and just try to format your test drive using
it (without installing), if only to determine whether the future upgrade
will fix your problem.
My problem is hardware. One drive kind of worked, but used WD's
"Advanced Format", which means that the drive moves 4K on 512-byte
requests and runs predictably at 1/8th the performance. The other
drive caused the kernel panic and is simply a bad drive. It fails on
two different SATA interfaces, as well as when formatting for UFS.
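The 1/8th figure above comes from the ratio of sector sizes: the drive stores data in 4096-byte physical sectors but accepts 512-byte requests, and 4096 / 512 = 8. A write that doesn't cover whole physical sectors forces a read-modify-write, so a lone 512-byte write can cost a full 4K rewrite, and a 4K write that starts off a 4K boundary touches two physical sectors. The sketch below only models that arithmetic; real slowdowns depend on workload and the drive's caching, and the helper names are hypothetical.

```python
LOGICAL = 512    # sector size the OS addresses (bytes)
PHYSICAL = 4096  # physical sector size on an "Advanced Format" drive (bytes)


def physical_span(offset, length, physical=PHYSICAL):
    """Return (first, last) indices of the physical sectors a byte range touches."""
    return offset // physical, (offset + length - 1) // physical


def needs_read_modify_write(offset, length, physical=PHYSICAL):
    """True if a write covers only part of some physical sector.

    The drive then has to read the whole 4K sector, merge in the new bytes and
    write all 4K back, instead of simply writing.
    """
    return offset % physical != 0 or (offset + length) % physical != 0


def bytes_rewritten(offset, length, physical=PHYSICAL):
    """Bytes the drive actually rewrites to satisfy the write."""
    first, last = physical_span(offset, length, physical)
    return (last - first + 1) * physical


if __name__ == "__main__":
    # A 512-byte write and a 4 KiB write at LBA 63 (a common legacy partition
    # start), compared with a 4 KiB write at LBA 64, which is 4K-aligned.
    for lba, length in [(63, 512), (63, 4096), (64, 4096)]:
        offset = lba * LOGICAL
        print(lba, length, bytes_rewritten(offset, length),
              needs_read_modify_write(offset, length))
```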

Kernel panics are a good thing in such a situation.

Plan is to replace and resilver them, and wait for the next release.
I don't mind about doing a full reinstall as it is just for a file
server.
