Here is a fairly robust way to ensure a drive is safe to put into service. I have used this before and it caught drives that would have failed shortly after being put into prod, and some that would have failed once they were more than half full. (A sketch of the matching Linux commands follows at the end of this post.)
1. Check S.M.A.R.T. info: confirm Seek Error Rate, Read Error Rate, Reallocated Sector Count, and Uncorrectable Sector Count are all zero (0)
2. Run a short S.M.A.R.T. test
3. Repeat step 1
4. Run a conveyance S.M.A.R.T. test
5. Repeat step 1
6. Run a destructive badblocks test (read and write)
7. Repeat step 1
8. Perform a FULL format (overwrite with zeros)
9. Repeat step 1
10. Run an extended S.M.A.R.T. test
11. Repeat step 1
Return the drive if either of the following is true:
A) The formatting speed drops more than 10MB/s below 80MB/s, i.e. under ~70MB/s (my defective drive ran at ~40MB/s from first power-on)
B) Any S.M.A.R.T. error count increases at any step
It is also highly advisable to stagger the testing (and repeat some of it) if you plan on using multiple drives in a pool/RAID config. That way the wear on the drives differs, which reduces the likelihood of them failing at the same time. For example, I re-ran either the full format or the badblocks test on some of the drives, so some drives have 48 hours of testing, some 72, and some 96. This lowers the chance of multiple drive failures during a rebuild.
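For anyone who wants to reproduce this on Linux, here is a minimal sketch of how the steps might map onto smartmontools, badblocks, and dd. This is a best-guess translation, not necessarily the OP's exact commands; /dev/sdX is a placeholder for your drive, and the badblocks and dd steps destroy everything on it.

```
# Step 1: dump the SMART attribute table; watch Seek Error Rate,
# Read Error Rate, Reallocated Sector Count, and the uncorrectable
# sector counters. Re-run this between every test below.
smartctl -A /dev/sdX

# Short, conveyance, and extended self-tests (steps 2, 4, and 10).
# These run on the drive itself in the background.
smartctl -t short /dev/sdX
smartctl -t conveyance /dev/sdX
smartctl -t long /dev/sdX
smartctl -l selftest /dev/sdX    # check results after each test finishes

# Step 6: destructive badblocks pass (-w write-mode, -s progress, -v verbose).
badblocks -wsv /dev/sdX

# Step 8: full zero format; status=progress doubles as the live
# speed readout used for return criterion A.
dd if=/dev/zero of=/dev/sdX bs=1M status=progress
```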
Jeez, you’re burning through so much of the drive’s lifespan just checking the damn thing. If a failed drive will cause problems worthy of this much burn-in time, you need a more robust setup.
I run all used eBay drives. Apart from a glance at the SMART data before adding them to the array, I don’t test them at all. I just keep an extra drive or two on hand as spares. Life’s easier when you plan for failure instead of fighting it.
Same, except I also use Scrutiny to flag drives for my attention. It makes an educated pass/fail guess by applying vendor-specific interpretations of the SMART values, matched against the failure thresholds from the Backblaze survey. It can tell you things like “the current value of the Command Timeout attribute for this drive falls into the 1-10% failure-probability bracket according to Backblaze”.
It helps me plan ahead. If, for example, I have 3 drives that Scrutiny says “smell funny”, it would be nice to have 2-3 spares on hand rather than just 1. Or if two of those drives happen to sit together in a two-way mirror, perhaps I can swap one somewhere else.
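For reference, getting Scrutiny running is roughly this. It is a sketch of the all-in-one Docker deployment from the project (https://github.com/AnalogJ/scrutiny); check its README for the current flags, and the device paths here are just examples:

```
# All-in-one Scrutiny container (web UI + collector + InfluxDB).
# SYS_RAWIO lets the collector issue SMART commands; each --device
# flag exposes one drive to monitor. Dashboard ends up on port 8080.
docker run -d --name scrutiny \
  -p 8080:8080 \
  -v /run/udev:/run/udev:ro \
  --cap-add SYS_RAWIO \
  --device=/dev/sda \
  --device=/dev/sdb \
  ghcr.io/analogj/scrutiny:master-omnibus
```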
I didn’t know all of this until today. I just plug in and use 😅.
Way Overkill.
A single-pass read (a SMART test is fine) and a single-pass write (ones, zeros, random, whatever you want) is more than adequate to catch any issues a new disk has out of the gate, unless you want to isolate some fringe-case condition and waste time and drive wear doing so.
I do it the other way around: first write (zero wipe), then read (SMART long test). Served me well for many disks. :)
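In case it helps anyone, that write-then-read order is roughly the following (a sketch only; /dev/sdX is a placeholder and the dd step wipes the drive):

```
# Write pass: zero the entire drive (destructive).
dd if=/dev/zero of=/dev/sdX bs=1M status=progress

# Read pass: extended SMART self-test, then check the outcome.
smartctl -t long /dev/sdX
smartctl -l selftest /dev/sdX   # did it complete without error?
smartctl -A /dev/sdX            # any reallocated/pending sectors appear?
```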
For real. I suppose if you kept a single copy of your data on just that one drive you’d want to really, really make sure? But then again, why would you keep one copy of anything?
TLDR: smart is smort enuf
What program are you using to run those tests? Is it usable on windows 10? Thanks for putting your guide up!
I just full format and check smart, seems like a lot of work for each drive…
The most I would do is a write test followed by a read test, and then check the SMART counters.
I just plug it in and glance over smart data.
A single full read and full write test should be plenty. Drives tend to fail really early on, or not at all until EOL.
Question: How do you monitor format speed?
I guess having backups and an error-correcting filesystem like ZFS or BTRFS will help you more long-term. Sure, watch the SMART values, but IMHO don’t go overboard with tests. I do an extended SMART test, rebuild/extend my RAID, run a quick SMART test again, and that’s it. Drives can die at any time, even if they were fine after a long test cycle. The 3-2-1 rule should save you from data loss.
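Concretely, that routine might look like this on a ZFS box (a sketch with hypothetical pool and device names, not an exact recipe):

```
# Extended self-test on the new drive before trusting it.
smartctl -t long /dev/sdX
smartctl -l selftest /dev/sdX        # confirm it finished clean

# Swap the drive into the pool and let it resilver
# ("tank" and the old device name are hypothetical).
zpool replace tank /dev/sdOLD /dev/sdX
zpool status tank                    # watch the resilver progress

# Quick re-check once the rebuild is done.
smartctl -t short /dev/sdX
smartctl -l selftest /dev/sdX
```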
How about if I have already filled the new hard drive (I still have the data on the source drives) and just want to make sure all of it is readable (before erasing the data from the source drives), without having to copy all the data off the new drive?
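Not answered in the thread, but one non-destructive option consistent with the tools mentioned above: badblocks in its default read-only mode reads every sector without modifying anything (sketch; /dev/sdX is a placeholder):

```
# Read-only surface scan: no -w flag, so nothing is written.
# -s shows progress, -v reports any unreadable blocks.
badblocks -sv /dev/sdX

# Then re-check SMART for new pending/reallocated sectors.
smartctl -A /dev/sdX
```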
Seek Error Rate and Read Error Rate can’t be zero.
Yeah I was under the impression these two attributes vary so wildly between vendors that they’re basically void of meaning by now.
But why do all this if you’re using RAID with a hot spare? If a new drive fails, just replace it once it’s detected as failed?