
2013.04.09

Rebuilding the RAID, Yet Again

■ Yet another error
 The server's RAID has reported an error for the first time in a year and a half.

Subject: Fail event on /dev/md2:hoge
This is an automatically generated mail message from mdadm
running on hoge
A Fail event had been detected on md device /dev/md2.
It could be related to component device /dev/sda1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1]
md2 : active raid1 sda1[0](F) sdb1[1]
4094968 blocks super 1.1 [2/1] [_U]
md1 : active raid1 sda2[0] sdb2[1]
1023988 blocks super 1.0 [2/2] [UU]
md0 : active raid1 sda3[0](F) sdb3[1]
971639676 blocks super 1.1 [2/1] [_U]
bitmap: 4/8 pages [16KB], 65536KB chunk
unused devices: <none>
 
Looks like an error all right. So I'll poke around, working from my notes from last time.

First, the current state.
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda1[0](F) sdb1[1]
4094968 blocks super 1.1 [2/1] [_U]

md1 : active raid1 sda2[0](F) sdb2[1]
1023988 blocks super 1.0 [2/1] [_U]

md0 : active raid1 sda3[0](F) sdb3[1]
971639676 blocks super 1.1 [2/1] [_U]
bitmap: 7/8 pages [28KB], 65536KB chunk

unused devices: <none>
Unlike last time, the sda half of each mirror is still recognized, but it is marked as failed.
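Picking the (F) entries out of that output by eye gets tedious, so here is a minimal sketch that does it with awk. The function name list_failed_members is my own invention, and it assumes the /proc/mdstat layout shown above (array name, a colon, then the member list on the same line).

```shell
#!/bin/sh
# list_failed_members (hypothetical helper): read /proc/mdstat-style text on
# stdin and print "<array> <member>" for every member flagged (F).
list_failed_members() {
    awk '
        # Array lines look like: md2 : active raid1 sda1[0](F) sdb1[1]
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)/) {
                    member = $i
                    sub(/\[.*/, "", member)   # strip the "[0](F)" suffix
                    print $1, member
                }
            }
        }
    '
}
```

On the machine itself this would be run as `list_failed_members < /proc/mdstat`; for the state above it should list sda1, sda2 and sda3 against md2, md1 and md0.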
 
First, the details of md0.
# mdadm --query /dev/md0
/dev/md0: 926.63GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.1
Creation Time : Wed Jun 9 22:12:15 2010
Raid Level : raid1
Array Size : 971639676 (926.63 GiB 994.96 GB)
Used Dev Size : 971639676 (926.63 GiB 994.96 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Apr 9 23:03:17 2013
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

Name : localhost.localdomain:2
UUID : c2acb682:8308f70d:e5c0eda4:ce8530d8
Events : 4048082

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 19 1 active sync /dev/sdb3

0 8 3 - faulty spare /dev/sda3
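The line that matters most in that dump is "State : active, degraded". As a rough sketch, the State field can be pulled out of saved mdadm --detail output like this. The helper names array_state and is_degraded are hypothetical, and the parsing assumes the "Field : value" layout shown above.

```shell
#!/bin/sh
# array_state (hypothetical helper): read `mdadm --detail` output on stdin
# and print the value of the State field.
array_state() {
    # Split on " : " so that "Update Time : ..." does not match "State".
    awk -F' : ' '$1 ~ /^[ \t]*State$/ { print $2; exit }'
}

# is_degraded (hypothetical helper): exit 0 when the State field
# contains the word "degraded", non-zero otherwise.
is_degraded() {
    array_state | grep -q 'degraded'
}
```

Something like `mdadm --detail /dev/md0 | is_degraded && echo "md0 is degraded"` would then work as a quick check.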
 
I don't fully understand all of that, but next comes an error check.
# badblocks -vs -o sda1.bad /dev/sda1
Checking blocks 0 to 4095999
Checking for bad blocks (read-only test): done
Pass completed, 4096000 bad blocks found.
Bad sectors have shown up. This is bad.
# badblocks -vs -o sda2.bad /dev/sda2
Checking blocks 0 to 1023999
Checking for bad blocks (read-only test): done
Pass completed, 1024000 bad blocks found.
They showed up on the other partition too. The biggest one, under md0, looks like it would take a long time, so I'm skipping it.
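One thing worth noticing in those numbers: badblocks checked blocks 0 to 4095999, which is 4096000 blocks, and reported 4096000 bad, so every single block failed. That may point to the whole device being unreadable rather than a few scattered bad sectors, although that is only a guess from the counts. A small sketch of the arithmetic, where bad_ratio is a made-up helper name:

```shell
#!/bin/sh
# bad_ratio LAST_BLOCK BAD_COUNT (hypothetical helper)
# Print the percentage of checked blocks (0..LAST_BLOCK) reported bad,
# using integer arithmetic (the result is truncated, not rounded).
bad_ratio() {
    last_block=$1
    bad_count=$2
    total=$((last_block + 1))          # blocks are numbered from 0
    echo "$((bad_count * 100 / total))%"
}

# Numbers taken from the two runs above:
#   bad_ratio 4095999 4096000   -> 100%
#   bad_ratio 1023999 1024000   -> 100%
```

The bad count could also be taken from the -o file with `wc -l < sda1.bad`, since badblocks writes one bad block number per line there.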
 
Now that bad sectors have shown up, I'm removing sda from the RAID.
# mdadm /dev/md0 -r sda3
mdadm: hot removed sda3 from /dev/md0
# mdadm /dev/md1 -r sda2
mdadm: hot removed sda2 from /dev/md1
# mdadm /dev/md2 -r sda1
mdadm: hot removed sda1 from /dev/md2
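The three removals above follow directly from what /proc/mdstat reports, so they can be generated instead of typed. The sketch below only prints the commands (a dry run) and never runs mdadm itself. removal_commands is a hypothetical name, --remove is the long form of the -r used above, and the output uses full /dev/ paths as an assumption about what is safest to pass. A member that is not yet marked (F) would need `mdadm --fail` first, which this sketch deliberately does not do.

```shell
#!/bin/sh
# removal_commands DISK (hypothetical helper): read /proc/mdstat-style text
# on stdin and, for each partition of DISK flagged (F), print the mdadm
# command that would hot-remove it. Nothing is executed.
removal_commands() {
    disk=$1
    awk -v disk="$disk" '
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                member = $i
                if (member !~ /\(F\)/) continue   # only failed members
                sub(/\[.*/, "", member)           # "sda3[0](F)" -> "sda3"
                if (index(member, disk) == 1)     # member name starts with DISK
                    printf "mdadm /dev/%s --remove /dev/%s\n", $1, member
            }
        }
    '
}
```

Running `removal_commands sda < /proc/mdstat` before the removal should print one line per array, which can be reviewed and then pasted into a root shell.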
 
Checking the state after the removal:
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb1[1]
4094968 blocks super 1.1 [2/1] [_U]

md1 : active raid1 sdb2[1]
1023988 blocks super 1.0 [2/1] [_U]

md0 : active raid1 sdb3[1]
971639676 blocks super 1.1 [2/1] [_U]
bitmap: 7/8 pages [28KB], 65536KB chunk

unused devices: <none>
sda is gone from the RAID.
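The arrays are still degraded, of course: each status line still shows [2/1], meaning two devices expected and one active. A rough sketch for listing degraded arrays from that counter follows. degraded_arrays is a hypothetical name, and it assumes the "[expected/active]" field sits on the "blocks" line right after each array line, as in the output above.

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): read /proc/mdstat-style text on
# stdin and print "<array> <expected>/<active>" for every array whose
# active device count is below the expected count.
degraded_arrays() {
    awk '
        $2 == ":" && $1 ~ /^md/ { name = $1; next }
        # Status line: "4094968 blocks super 1.1 [2/1] [_U]"
        name != "" && $2 == "blocks" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) {
                    counts = substr($i, 2, length($i) - 2)   # drop brackets
                    split(counts, n, "/")
                    if (n[2] + 0 < n[1] + 0) print name, counts
                }
            }
            name = ""
        }
    '
}
```

With the state shown above, `degraded_arrays < /proc/mdstat` should list md2, md1 and md0, each with 2/1.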

At this point I rebooted once.
After the reboot I'll try running badblocks again. That's all for now.
