Hardware RAID

3ware: Replacing a Degraded Drive with tw_cli

Repairing a RAID can be a frightening task if you aren’t comfortable with what you are doing. It can also be difficult to find a reliable, clear, and straightforward article about how the process is done. As a technician in a data center, I replace drives in RAID arrays regularly. I have written this article to give you a simple, broken-down version of the commands you need to replace a failed drive in a RAID, and to show how to use them.

tw_cli info

 

This command lists the controllers in the machine, along with a summary of the ports, drives, and units on each one.

 

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9690SA-4I    4         4          1       0       1       1     -
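If you want to check this summary from a script, the table can be parsed with a few lines of Python. This is only a minimal sketch: the helper name list_controllers is made up for illustration, and it assumes the column order shown in the sample output above (NotOpt is the sixth column and counts the units that are not in an optimal state).

```python
def list_controllers(info_text):
    """Return controller summaries parsed from `tw_cli info` output.

    Assumes the column layout shown in the sample output:
    Ctl, Model, (V)Ports, Drives, Units, NotOpt, ...
    """
    controllers = []
    for line in info_text.splitlines():
        fields = line.split()
        # Data rows start with a controller ID such as c0;
        # the header and the dashed rule do not.
        if len(fields) >= 6 and fields[0].startswith("c") and fields[0][1:].isdigit():
            controllers.append({
                "id": fields[0],
                "model": fields[1],
                "drives": int(fields[3]),
                "units": int(fields[4]),
                "not_optimal": int(fields[5]),
            })
    return controllers


SAMPLE_INFO = """\
Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9690SA-4I    4         4          1       0       1       1     -
"""

for ctl in list_controllers(SAMPLE_INFO):
    print(ctl["id"], ctl["model"], "units not optimal:", ctl["not_optimal"])
```

A non-zero not_optimal value is the hint that the controller deserves a closer look with “tw_cli info c0”.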

tw_cli info c0

 

This command shows the array (unit) information and lists all the hard drives connected to that controller. The controller number (c0 here) comes from the output of the previous command.

 

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       256K    549.459   RiW    OFF
 
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    OK             u0   279.39 GB SAS   2   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS
 

 

tw_cli info c0

 

Running this command when a hard drive in the array is degraded gives us the information we need to proceed. It shows which member of the array needs to be replaced.

 

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   DEGRADED       -       -       256K    549.459   RiW    OFF
 
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    DEGRADED       u0   279.39 GB SAS   2   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS

 

Notice that the status of the disk labelled “p2” is DEGRADED. This is the drive we want to remove.
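Spotting the bad port by eye is easy on a four-drive box, but it can also be done by a script. The sketch below assumes the port table layout shown above (port, status, unit as the first three columns); parse_ports and ports_needing_attention are hypothetical helper names, not part of tw_cli.

```python
def parse_ports(info_text):
    """Return the port rows (p0, p1, ...) from `tw_cli info c0` output."""
    ports = []
    for line in info_text.splitlines():
        fields = line.split()
        # Port rows start with p<number>; unit rows start with u<number>.
        if len(fields) >= 3 and fields[0].startswith("p") and fields[0][1:].isdigit():
            ports.append({"port": fields[0], "status": fields[1], "unit": fields[2]})
    return ports


def ports_needing_attention(info_text):
    """Return the IDs of all ports whose status is anything other than OK."""
    return [p["port"] for p in parse_ports(info_text) if p["status"] != "OK"]


SAMPLE_DEGRADED = """\
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   DEGRADED       -       -       256K    549.459   RiW    OFF

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    DEGRADED       u0   279.39 GB SAS   2   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS
"""

print(ports_needing_attention(SAMPLE_DEGRADED))
```

On a live system the text could come from something like subprocess.run(["tw_cli", "info", "c0"], capture_output=True, text=True).stdout instead of the embedded sample.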

 

tw_cli maint remove c0 p2

 

This command removes the degraded disk from the array. Once you have done this, run the “tw_cli info c0” command again; the output should look like this:

 

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   DEGRADED       -       -       256K    549.459   RiW    OFF
 
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS

 

Now physically pull the drive you just removed from the array out of the machine. If the machine is currently handling traffic, the LEDs of the drives that are still functioning in the array should be blinking, so the drive whose LED is not blinking is the one to pull. This is the best indicator. If none of the LEDs are blinking, my best suggestion is to plan a downtime, take the server offline, and physically trace the cables. This will save you the headache of lost data. Another good indicator is the serial number, if you can read it from the front of your server. When in doubt, your best bet is always a short downtime so you can be 100% sure. Once you have replaced the drive with a new one, we can proceed.

 

tw_cli maint rescan

 

This command rescans the controller for new devices and should detect the drive you just installed. The output should look similar to this:

 

Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p2].
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   DEGRADED       -       -       256K    549.459   RiW    OFF
 
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    OK             -    279.39 GB SAS   2   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS
 

 

You should now see that your new drive has been detected. Its Unit column shows “-”, which means it does not belong to a unit yet. We must now add it to the unit so it can rebuild back to “OK” status.
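Before starting the rebuild it is worth confirming two things from the output above: the rescan reported the new drive, and that port is not yet assigned to a unit. Here is a rough sketch of that check. The function names are hypothetical, and the comma-separated handling of several drives inside the brackets is an assumption, since the sample only shows one.

```python
import re


def drives_found_by_rescan(rescan_text):
    """Return the drive paths listed on the 'Found the following drive(s)' line."""
    match = re.search(r"Found the following drive\(s\): \[(.*?)\]", rescan_text)
    if not match or match.group(1).strip() == "none":
        return []
    # Assumption: several drives would be separated by commas.
    return [item.strip() for item in match.group(1).split(",")]


def unassigned_ports(info_text):
    """Return ports whose Unit column is '-', i.e. not part of any unit."""
    ports = []
    for line in info_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("p") and fields[0][1:].isdigit():
            if fields[2] == "-":
                ports.append(fields[0])
    return ports


SAMPLE_RESCAN = """\
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p2].
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    OK             -    279.39 GB SAS   2   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS
"""

print(drives_found_by_rescan(SAMPLE_RESCAN))
print(unassigned_ports(SAMPLE_RESCAN))
```

If both lists point at the port you just replaced, it is reasonable to go ahead with the rebuild on that port.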

 

tw_cli maint rebuild c0 u0 p2

 

This command starts the rebuild process on the drive, which in this example is “p2”. You should see output similar to this:

 

Sending rebuild start request to /c0/u0 on 1 disk(s) [2] ... Done.
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   REBUILDING     50%     -       256K    549.459   RiW    OFF
 
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   279.39 GB SAS   0   -            SEAGATE ST3300657SS
p1    OK             u0   279.39 GB SAS   1   -            SEAGATE ST3300657SS
p2    DEGRADED       u0   279.39 GB SAS   2   -            SEAGATE ST3300657SS
p3    OK             u0   279.39 GB SAS   3   -            SEAGATE ST3300657SS

(Rebuild processes for RAID 10 will begin at 50%.) There you have it: you have now replaced the bad drive, and it is just a matter of time until it rebuilds. Keep an eye on the rebuild and monitor its progress. Once it is complete, the unit should return to “OK” status.
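To monitor the rebuild without reading the table each time, the unit row can be reduced to a status and a percentage. This is a minimal sketch, assuming the unit table layout shown above (status in the third column, %RCmpl in the fourth); unit_rebuild_state is a hypothetical helper name.

```python
import re


def unit_rebuild_state(info_text, unit="u0"):
    """Return (status, percent_complete) for a unit, or None if it is not listed.

    percent_complete is None when the %RCmpl column shows '-'.
    """
    for line in info_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == unit:
            match = re.match(r"(\d+)%", fields[3])
            percent = int(match.group(1)) if match else None
            return fields[2], percent
    return None


SAMPLE_REBUILDING = """\
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   REBUILDING     50%     -       256K    549.459   RiW    OFF
"""

SAMPLE_OK = """\
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       256K    549.459   RiW    OFF
"""

print(unit_rebuild_state(SAMPLE_REBUILDING))
print(unit_rebuild_state(SAMPLE_OK))
```

Running “tw_cli info c0” every few minutes and feeding the output through a function like this is enough to tell when the unit has gone from REBUILDING back to OK.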

 

This method is the quickest and requires no downtime, which in most cases is ideal. If you are ever not 100% certain which drive to pull, I HIGHLY suggest scheduling the 10-15 minutes of downtime it takes to be sure, so that you do not pull the wrong drive and lose your data.
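As a recap, the whole sequence from this article can be written down as an ordered checklist for any controller, unit, and port. The sketch below only builds the command lines used above; it does not run them. replacement_steps is a hypothetical helper, and the ID check simply assumes the c0 / u0 / p2 naming seen in the output.

```python
import re


def replacement_steps(controller, unit, port):
    """Return the tw_cli commands from this article, in order, as argument lists."""
    for name, value, prefix in (
        ("controller", controller, "c"),
        ("unit", unit, "u"),
        ("port", port, "p"),
    ):
        # Guard against typos such as passing "2" instead of "p2".
        if not re.fullmatch(prefix + r"\d+", value):
            raise ValueError(f"{name} must look like {prefix}0, got {value!r}")
    return [
        ["tw_cli", "info", controller],                        # find the degraded port
        ["tw_cli", "maint", "remove", controller, port],       # drop it from the array
        # ... physically swap the drive here ...
        ["tw_cli", "maint", "rescan"],                         # detect the new drive
        ["tw_cli", "maint", "rebuild", controller, unit, port],  # start the rebuild
        ["tw_cli", "info", controller],                        # watch the progress
    ]


for step in replacement_steps("c0", "u0", "p2"):
    print(" ".join(step))
```

Keeping the steps as data rather than executing them leaves room for the physical drive swap, and for a human to double-check the port number before anything is removed.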
