Although I was happy to have received the new hotswap-capable enclosure a while ago (more than a year), I had never actually performed a hotswap. Today I sinned by ignoring one of IT's golden rules: if it works, DON'T touch it.
Knowing that bad luck always sits right next to us when everything is blowing up, I went ahead and tried it.
Environment:
I have a file server that, unfortunately, was left with a RAID1 array running on a single disk. Since I only use it as the Recycle Bin for the Samba accounts, I honestly don't care much about what is stored there, so if it broke the damage would not be serious.
The current state is:
root@server:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdf[1]
      976761424 blocks super 1.2 [2/1] [_U]
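In that output, [2/1] means the array is configured for 2 devices but only 1 is active. [_U] shows one character per slot: _ for the missing member and U for the member that is up (sdf, in slot 1). Below is a minimal sketch that reads that counter. It assumes the /proc/mdstat layout shown above, and md_status is a hypothetical helper name, not part of mdadm:

```shell
#!/usr/bin/env bash
# Sketch: report how many members of an md array are active, based on the
# "[total/active]" counter on the line after the array's "mdX :" header.
# Assumption: the mdstat layout matches the output shown above.
md_status() {
    local array="$1" mdstat_file="${2:-/proc/mdstat}"
    awk -v arr="$array" '
        $1 == arr { found = 1; next }   # header line of the requested array
        found && match($0, /\[[0-9]+\/[0-9]+]/) {
            # Strip the brackets and split "2/1" into total and active.
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            state = (n[2] + 0 < n[1] + 0) ? "degraded" : "clean"
            print arr ": " n[2] " of " n[1] " devices active (" state ")"
            exit
        }
    ' "$mdstat_file"
}
```

For example, md_status md0 run against the mdstat output above reports 1 of 2 devices active and marks the array as degraded.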
These were the steps I followed:
- Unmount the filesystem:
umount /mnt/md0/
- Stop the RAID array:
mdadm --manage /dev/md0 --stop
- Put the hard disk to sleep (hdparm -Y; lowercase -y would put it in standby instead):
hdparm -Y /dev/sdf
- Pull it out while the machine is powered on.
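The risky part of this procedure is pulling a disk that is still an active array member or still mounted. The sketch below checks both conditions before that last step, using the standard /proc/mdstat and /proc/mounts files. safe_to_pull is a hypothetical helper name. It matches both the whole disk (sdf) and its partitions (sdf1, ...):

```shell
#!/usr/bin/env bash
# Sketch of a pre-removal sanity check (hypothetical helper, not a standard tool).
# Returns 0 only if the disk is neither in an assembled md array nor mounted.
safe_to_pull() {
    local disk="$1"                          # e.g. sdf
    local mdstat_file="${2:-/proc/mdstat}"
    local mounts_file="${3:-/proc/mounts}"

    # An assembled array lists its members as "sdf[1]" on its "mdX :" line.
    if grep -Eq "^md[0-9]+ : .*[ ]${disk}[0-9]*\[" "$mdstat_file"; then
        echo "$disk is still part of an active md array" >&2
        return 1
    fi
    # The first field of /proc/mounts is the mounted device (disk or partition).
    if awk -v d="/dev/$disk" '$1 ~ ("^" d "[0-9]*$") { found = 1 } END { exit !found }' "$mounts_file"; then
        echo "$disk (or a partition of it) is still mounted" >&2
        return 1
    fi
    return 0
}
```

Usage: safe_to_pull sdf && echo "sdf can be pulled"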
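I read somewhere that, before pulling the disk, we should also remove it from the kernel (echo 1 > /sys/block/sdf/device/delete).
Since I hadn't done that, I checked the logs: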
Dec 2 12:51:45 server kernel: [793838.144423] ata6: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
Dec 2 12:51:45 server kernel: [793838.144481] ata6: irq_stat 0x00400040, connection status changed
Dec 2 12:51:45 server kernel: [793838.144528] ata6: SError: { PHYRdyChg 10B8B DevExch }
Dec 2 12:51:45 server kernel: [793838.144572] ata6: hard resetting link
Dec 2 12:51:46 server kernel: [793838.863111] ata6: SATA link down (SStatus 0 SControl 300)
Dec 2 12:51:51 server kernel: [793843.852521] ata6: hard resetting link
Dec 2 12:51:51 server kernel: [793844.171849] ata6: SATA link down (SStatus 0 SControl 300)
Dec 2 12:51:51 server kernel: [793844.171860] ata6: limiting SATA link speed to 1.5 Gbps
Dec 2 12:51:56 server kernel: [793849.161272] ata6: hard resetting link
Dec 2 12:51:57 server kernel: [793849.480594] ata6: SATA link down (SStatus 0 SControl 310)
Dec 2 12:51:57 server kernel: [793849.480603] ata6.00: disabled
Dec 2 12:51:57 server kernel: [793849.480614] ata6: EH complete
Dec 2 12:51:57 server kernel: [793849.480623] ata6.00: detaching (SCSI 5:0:0:0)
Dec 2 12:51:57 server kernel: [793849.480940] sd 5:0:0:0: [sdf] Synchronizing SCSI cache
Dec 2 12:51:57 server kernel: [793849.480977] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 2 12:51:57 server kernel: [793849.480981] sd 5:0:0:0: [sdf] Stopping disk
Dec 2 12:51:57 server kernel: [793849.480988] sd 5:0:0:0: [sdf] START_STOP FAILED
Dec 2 12:51:57 server kernel: [793849.480990] sd 5:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
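This log shows the order of events. The link drops first (SATA link down, then detaching). Only afterwards does the kernel try to flush the cache and stop sdf. Those commands fail with DID_BAD_TARGET and START_STOP FAILED because the disk is no longer there. The function below is a rough heuristic, not an official kernel diagnostic: it flags a removal as unclean if those strings appear for the given disk. removal_was_clean is a hypothetical name. The log file path is an assumption (for example /var/log/kern.log on Debian-based systems):

```shell
#!/usr/bin/env bash
# Heuristic sketch (assumption, not an official diagnostic): a removal is
# considered unclean if the kernel logged failed SCSI commands for the disk,
# as happens when it tries to flush/stop a disk that was already pulled.
removal_was_clean() {
    local disk="$1" log_file="$2"
    # First keep only lines about this disk, then look for the failure markers.
    if grep -F "[$disk]" "$log_file" | grep -Fq -e 'DID_BAD_TARGET' -e 'START_STOP FAILED'; then
        echo "unclean: $disk was pulled while the kernel still had it attached"
        return 1
    fi
    echo "clean: no failed commands logged for $disk"
}
```

Usage: removal_was_clean sdf /var/log/kern.log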
Out of curiosity, I tried the procedure again, this time adding the kernel removal step:
- Unmount the filesystem:
umount /mnt/md0/
- Stop the RAID array:
mdadm --manage /dev/md0 --stop
- Put the hard disk to sleep:
hdparm -Y /dev/sdf
- Remove the disk from the kernel:
echo 1 > /sys/block/sdf/device/delete
- Pull it out while the machine is powered on.
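The removal steps can be combined into a single function. This is a sketch with several assumptions. It must run as root. The mount point, array and disk (/mnt/md0, /dev/md0, sdf) are passed as arguments. hotswap_remove and run are hypothetical helper names. Setting DRY_RUN=1 only prints the commands, which is a safe way to review them first. The log that follows shows that the delete step itself wakes sdf from sleep to synchronize its cache and then stops it.

```shell
#!/usr/bin/env bash
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of the removal sequence above (hypothetical helper, run as root).
hotswap_remove() {
    local mnt="$1" md="$2" disk="$3"
    run umount "$mnt" || return 1
    run mdadm --manage "$md" --stop || return 1
    run hdparm -Y "/dev/$disk" || return 1
    # The redirection is wrapped in "sh -c" so that nothing is written in
    # dry-run mode; "run echo 1 > file" would redirect in this shell anyway.
    run sh -c "echo 1 > /sys/block/$disk/device/delete" || return 1
    echo "Now it is safe to pull /dev/$disk out of the bay."
}
```

For example, run DRY_RUN=1 hotswap_remove /mnt/md0 /dev/md0 sdf first, then the same command without DRY_RUN once the printed commands look right.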
Logs:
Dec 2 14:36:15 srv-it kernel: [800094.929215] sd 5:0:0:0: [sdf] Synchronizing SCSI cache
Dec 2 14:36:15 srv-it kernel: [800094.929259] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Dec 2 14:36:15 srv-it kernel: [800094.929328] ata6.00: waking up from sleep
Dec 2 14:36:15 srv-it kernel: [800094.929367] ata6: hard resetting link
Dec 2 14:36:16 srv-it kernel: [800095.245764] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 2 14:36:16 srv-it kernel: [800095.246223] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Dec 2 14:36:16 srv-it kernel: [800095.246232] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT5._GTF] (Node ffff8806060c0588), AE_NOT_FOUND (20110623/psparse-536)
Dec 2 14:36:16 srv-it kernel: [800095.246854] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Dec 2 14:36:16 srv-it kernel: [800095.246860] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT5._GTF] (Node ffff8806060c0588), AE_NOT_FOUND (20110623/psparse-536)
Dec 2 14:36:16 srv-it kernel: [800095.247041] ata6.00: configured for UDMA/133
Dec 2 14:36:16 srv-it kernel: [800095.247045] ata6.00: retrying FLUSH 0xea Emask 0x0
Dec 2 14:36:16 srv-it kernel: [800095.247112] ata6: EH complete
Dec 2 14:36:16 srv-it kernel: [800095.247186] sd 5:0:0:0: [sdf] Stopping disk
Dec 2 14:36:16 srv-it kernel: [800095.247235] sdf: detected capacity change from 0 to 1000204886016
Dec 2 14:36:20 srv-it kernel: [800099.758977] ata6.00: disabled
Dec 2 14:36:38 srv-it kernel: [800117.854892] ata6: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
Dec 2 14:36:38 srv-it kernel: [800117.854950] ata6: irq_stat 0x00400040, connection status changed
Dec 2 14:36:38 srv-it kernel: [800117.854997] ata6: SError: { PHYRdyChg 10B8B DevExch }
Dec 2 14:36:38 srv-it kernel: [800117.855043] ata6: hard resetting link
Dec 2 14:36:39 srv-it kernel: [800118.576295] ata6: SATA link down (SStatus 0 SControl 300)
Dec 2 14:36:39 srv-it kernel: [800118.576306] ata6: EH complete
Now to reconnect it: plug the disk back in and then run:
- Assemble the already-configured RAID array:
mdadm -A /dev/md0
- Check whether it came up:
cat /proc/mdstat
- If it shows (auto-read-only), run:
mdadm --readwrite /dev/md0
- Now we can mount the filesystem (mount -a mounts the filesystems listed in /etc/fstab):
mount -a
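The reconnection steps can be sketched the same way. This assumes /dev/md0 is defined in mdadm.conf, so that mdadm -A /dev/md0 can locate its members. It also assumes /mnt/md0 is listed in /etc/fstab, so that mount -a picks it up. hotswap_reattach, is_auto_read_only and run are hypothetical names. mdadm --readwrite is only called when /proc/mdstat shows (auto-read-only), mirroring the conditional step above. DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# True if the array's header line in mdstat says "(auto-read-only)".
is_auto_read_only() {
    local array="$1" mdstat_file="${2:-/proc/mdstat}"
    grep -q "^${array} : active (auto-read-only)" "$mdstat_file"
}

# Sketch of the reconnection sequence above (hypothetical helper, run as root).
hotswap_reattach() {
    local md="$1" mdstat_file="${2:-/proc/mdstat}"
    run mdadm -A "$md" || return 1
    # mdstat names the array without the /dev/ prefix (e.g. "md0").
    if is_auto_read_only "${md#/dev/}" "$mdstat_file"; then
        run mdadm --readwrite "$md" || return 1
    fi
    run mount -a
}
```

Usage: hotswap_reattach /dev/md0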
In the end it looks something like this:
root@server:~# mdadm -A /dev/md0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
root@server:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active (auto-read-only) raid1 sdf[1]
      976761424 blocks super 1.2 [2/1] [_U]
root@server:~# mdadm --readwrite /dev/md0
root@server:~# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdf[1]
      976761424 blocks super 1.2 [2/1] [_U]
root@server:~# mount -a
root@server:~# dfc
FILESYSTEM (=) USED FREE (-) %USED AVAILABLE TOTAL MOUNTED ON
/dev/md0 [==------------------] 5% 870.1G 916.9G /mnt/md0
root@server:~#
Connection log:
Dec 2 14:50:21 server kernel: [800938.675339] ata6: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Dec 2 14:50:21 server kernel: [800938.675397] ata6: irq_stat 0x00000040, connection status changed
Dec 2 14:50:21 server kernel: [800938.675444] ata6: SError: { CommWake DevExch }
Dec 2 14:50:21 server kernel: [800938.675485] ata6: hard resetting link
Dec 2 14:50:27 server kernel: [800944.425243] ata6: link is slow to respond, please be patient (ready=0)
Dec 2 14:50:28 server kernel: [800945.934056] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 2 14:50:28 server kernel: [800945.950971] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Dec 2 14:50:28 server kernel: [800945.950980] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT5._GTF] (Node ffff8806060c0588), AE_NOT_FOUND (20110623/psparse-536)
Dec 2 14:50:28 server kernel: [800945.951226] ata6.00: ATA-8: ST1000DM003-9YN162, CC4B, max UDMA/133
Dec 2 14:50:28 server kernel: [800945.951229] ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Dec 2 14:50:28 server kernel: [800945.951618] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Dec 2 14:50:28 server kernel: [800945.951624] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT5._GTF] (Node ffff8806060c0588), AE_NOT_FOUND (20110623/psparse-536)
Dec 2 14:50:28 server kernel: [800945.951826] ata6.00: configured for UDMA/133
Dec 2 14:50:28 server kernel: [800945.951832] ata6: EH complete
Dec 2 14:50:28 server kernel: [800945.951931] scsi 5:0:0:0: Direct-Access ATA ST1000DM003-9YN1 CC4B PQ: 0 ANSI: 5
Dec 2 14:50:28 server kernel: [800945.952113] sd 5:0:0:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Dec 2 14:50:28 server kernel: [800945.952116] sd 5:0:0:0: [sdf] 4096-byte physical blocks
Dec 2 14:50:28 server kernel: [800945.952158] sd 5:0:0:0: Attached scsi generic sg5 type 0
Dec 2 14:50:28 server kernel: [800945.952240] sd 5:0:0:0: [sdf] Write Protect is off
Dec 2 14:50:28 server kernel: [800945.952245] sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Dec 2 14:50:28 server kernel: [800945.952286] sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 2 14:50:28 server kernel: [800945.968907] sdf: unknown partition table
Dec 2 14:50:28 server kernel: [800945.969128] sd 5:0:0:0: [sdf] Attached SCSI disk
Sources:
http://blog.kihltech.com/2012/12/sata-hotswap-drive-in-mdadm-raid-array/