Here is a fairly robust way to make sure a drive is safe to put into service. I have used this procedure before and caught drives that would have failed shortly after being put into prod, and some that would have failed once they were more than half full.

  1. Check S.M.A.R.T info: confirm that Seek Error Rate, Read Error Rate, Reallocated Sector Count and Uncorrectable Sector Count are all zero (0)

  2. Run Short S.M.A.R.T test

  3. Repeat Step 1

  4. Run Conveyance S.M.A.R.T test

  5. Repeat Step 1

  6. Run Destructive Badblocks test (read and write)

  7. Repeat Step 1

  8. Perform a FULL Format (Overwrite with Zeros)

  9. Repeat Step 1

  10. Run Extended S.M.A.R.T test

  11. Repeat Step 1
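The "Repeat Step 1" check can be scripted instead of read by eye. Below is a minimal sketch, assuming smartmontools is installed and that the four counters from step 1 correspond to SMART attribute IDs 1, 5, 7 and 198 in `smartctl -A` output. That mapping and the helper names are my assumptions, not something the procedure above specifies, and some vendors encode the raw values of attributes 1 and 7 differently, so a nonzero value there is not always a fault.

```python
import subprocess

# SMART attribute IDs assumed to match the four counters in step 1
# (names as printed by smartctl; the mapping itself is an assumption).
WATCHED = {
    1: "Raw_Read_Error_Rate",
    5: "Reallocated_Sector_Ct",
    7: "Seek_Error_Rate",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} for the watched attributes."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        # Column 10 is RAW_VALUE; skip rows whose raw value is not a plain integer.
        if attr_id in WATCHED and fields[9].isdigit():
            values[attr_id] = int(fields[9])
    return values


def read_drive(device):
    """Run `smartctl -A` on a device (usually needs root)."""
    # No check=True: smartctl uses its exit status as a bitmask of warnings.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)


def nonzero_attributes(values):
    """Return {attribute_name: raw_value} for every watched counter above zero."""
    return {WATCHED[i]: v for i, v in values.items() if v != 0}
```

Calling `nonzero_attributes(read_drive("/dev/sdX"))` after each test (with `/dev/sdX` replaced by the real device) should give an empty dict on a drive that passes step 1.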

Return the drive if either of the following is true:

A) The formatting speed falls more than 10MB/s below 80MB/s, i.e. under about 70MB/s (my defective one was ~40MB/s from first power-on)

B) The S.M.A.R.T tests show an error count increasing at any step
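The two return criteria can be written down as a small check. This is only a sketch: it reads criterion A as "slower than 80MB/s minus 10MB/s", which is my interpretation, and the function names and the snapshot format (one dict of counters per "Repeat Step 1") are made up for illustration.

```python
BASELINE_MB_S = 80   # reference speed from criterion A
TOLERANCE_MB_S = 10  # allowed shortfall below the baseline


def speed_too_low(observed_mb_s):
    """Criterion A: speed is more than the tolerance below the baseline."""
    return observed_mb_s < BASELINE_MB_S - TOLERANCE_MB_S


def counters_increased(snapshots):
    """Criterion B: names of counters that grew between consecutive snapshots.

    snapshots is a list of {counter_name: raw_value} dicts in test order.
    A counter missing from the earlier snapshot is treated as zero.
    """
    grown = set()
    for prev, curr in zip(snapshots, snapshots[1:]):
        for name, value in curr.items():
            if value > prev.get(name, 0):
                grown.add(name)
    return sorted(grown)


def should_return(observed_mb_s, snapshots):
    """True if either criterion says the drive should go back."""
    return speed_too_low(observed_mb_s) or bool(counters_increased(snapshots))
```

For example, `should_return(40, [{"reallocated": 0}, {"reallocated": 0}])` is true because of the speed alone, which matches the ~40MB/s defective drive mentioned above.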

It is also highly advisable to stagger the testing (and repeat some of it) if you plan on using multiple drives in a pool/RAID config. This way the wear on the drives differs, which reduces the likelihood of them failing at the same time. For example, I re-ran either the full format or the badblocks test on some of the drives, so some drives have 48 hours of testing, some have 72, and some have 96. This way, the chance of multiple drive failures during a rebuild is lower.
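One way to plan the staggering is to cycle the number of extra test passes across the drives. The sketch below assumes each extra pass (a full format or a badblocks run) adds roughly 24 hours, which is inferred from the 48/72/96 figures above rather than measured. The function name and the device names in the example are hypothetical.

```python
def stagger_plan(drives, base_hours=48, extra_pass_hours=24, max_extra_passes=2):
    """Assign each drive a total test duration so neighbours differ in wear.

    Drives get 0, 1, ..., max_extra_passes extra passes in rotation.
    """
    plan = {}
    for index, drive in enumerate(drives):
        extra = index % (max_extra_passes + 1)
        plan[drive] = base_hours + extra * extra_pass_hours
    return plan
```

With four drives, `stagger_plan(["sda", "sdb", "sdc", "sdd"])` gives 48, 72, 96 and 48 hours, so at most two drives in the pool share the same amount of testing.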

  • kon_dev@alien.topB
    1 year ago

    I guess having a backup and an error-correcting file system like ZFS or BTRFS will help you more in the long term. Sure, watch for SMART values, but imho don’t go overboard with tests. I do an extended SMART test, rebuild/extend my RAID, run a quick SMART test again and that’s it. Drives can die at any time, even if they were fine after a long test cycle. The 3-2-1 rule should save you from data loss.