In a past career I spent years working with engineers and hard drive manufacturers to come up with a better way to determine whether a hard drive is going bad. After thinking about writing my own software to do so, I finally found that someone else had already written a great utility that covers everything I want.
The utility is SMARTUtility. This small but powerful utility uses the open source smartmontools project to perform some quick magic. It looks at what I call the key SMART attributes and has logic to determine whether the drive is healthy or needs replacing. This logic is valuable because it predicts drive failure before the manufacturer's definition of a SMART failure is met. The utility also shows you the raw drive attributes and values, and the author provides good supporting documentation on the attributes and their definitions.
A bit of background: most software utilities look only at a drive's overall SMART status to determine whether the drive is good or bad. While that is a step in the right direction, SMART failure criteria vary from vendor to vendor, and most drive manufacturers do not want to set overly aggressive thresholds because doing so would drive up warranty costs.
A hard drive keeps a table of attributes, each with a defined threshold, that determines whether the SMART test has passed or failed. In a factory environment, I believe that once a drive is installed in a product and thoroughly tested, the key attributes in this table should not change. The best case is for a company to have the resources to do a full write pass, a full read pass, and a random read/write test of a drive, and then check the drive attributes. Most computer and storage manufacturers do not have the manufacturing time to do such a test.
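To make the difference between the vendor threshold and a stricter check concrete, here is a minimal Python sketch. It assumes the usual ATA convention that an attribute fails when its normalized value drops to or below its threshold. The `early_warnings` rule (any nonzero raw count on a few key attributes) is my own illustration, not SMARTUtility's actual logic, and the attribute values are made up.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int      # normalized value reported by the drive (higher is healthier)
    threshold: int  # vendor-defined failure threshold
    raw: int        # raw counter, e.g. number of reallocated sectors


# Assumption: these IDs are the "key" attributes (reallocated, pending,
# and offline-uncorrectable sectors). The list is illustrative only.
KEY_ATTRIBUTE_IDS = {5, 197, 198}


def vendor_failed(attrs):
    """True if any attribute has tripped the vendor's own threshold."""
    # A threshold of 0 means the attribute can never fail.
    return any(a.threshold > 0 and a.value <= a.threshold for a in attrs)


def early_warnings(attrs):
    """Names of key attributes whose raw count is already nonzero."""
    return [a.name for a in attrs if a.attr_id in KEY_ATTRIBUTE_IDS and a.raw > 0]


# Hypothetical readings, loosely modeled on a drive with 30 reallocated sectors.
attrs = [
    SmartAttribute(5, "Reallocated_Sector_Ct", 100, 36, 30),
    SmartAttribute(194, "Temperature_Celsius", 60, 0, 40),
    SmartAttribute(197, "Current_Pending_Sector", 100, 0, 0),
]

print(vendor_failed(attrs))   # False: SMART still reports "passed"
print(early_warnings(attrs))  # ['Reallocated_Sector_Ct']
```

The point of the sketch is that the drive can still report a SMART pass while a stricter look at the raw counters already shows something worth watching.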
This past weekend our iMac would not boot. After dropping to single-user mode and seeing fsck fail, I quickly booted from a backup FireWire drive and ran Mac OS X Disk Utility, which also failed. I then booted the Tech Tool Deluxe DVD and saw the full surface scan fail.
Based on my past experience, I wanted a quick way to see the HDD SMART attributes and find out what the drive firmware was registering. I did a quick search and found SMARTUtility. After a quick Mac install and launch I saw that my HDD had been over temperature at some point and that over 30 sectors had been reallocated. I knew that a 250GB drive can have roughly 2,000 or more reallocated sectors before SMART trips. So seeing only 30 led me to believe that I had some time, maybe seconds or maybe years, before I needed to panic that the drive had to be replaced NOW.
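If you prefer the command line, the smartmontools `smartctl -A` command prints the same attribute table as text. The sketch below parses that table format in Python. The `SAMPLE` text is a hand-written illustration, not output captured from my drive, and `parse_attribute_table` is a hypothetical helper name.

```python
# Illustrative text in the column layout printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       30
194 Temperature_Celsius     0x0022   060   045   000    Old_age   Always       -       40
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_attribute_table(text):
    """Return {attribute_name: {...}} for each attribute row in the table."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header line and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "threshold": int(fields[5]),
            # Raw values can carry extra text on some drives, so keep a string.
            "raw": fields[9],
        }
    return rows


table = parse_attribute_table(SAMPLE)
print(table["Reallocated_Sector_Ct"]["raw"])  # 30
```

In real use the text would come from running `smartctl -A` against the drive's device node, which usually requires administrator privileges.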
My next step, to determine whether the drive was failing quickly, was to do a full write pass and then a full read pass on the drive. A full write pass forces the drive to reallocate any bad sectors it encounters, and the reallocation count afterward tells me whether more sectors went bad. Luckily the reallocation count did not grow after a full write pass using Disk Utility followed by a full read pass using the Tech Tool Deluxe DVD. To me, this meant the drive, while bad, was healthy enough to continue.
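The before-and-after comparison described above can be sketched in a few lines of Python. The counter names and numbers here are illustrative assumptions, and `grown_counters` is a hypothetical helper: record the raw counts before the write and read passes, record them again afterward, and report anything that grew.

```python
def grown_counters(before, after):
    """Return {name: increase} for every raw counter that grew between snapshots."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


# Hypothetical raw counts taken before and after the full write and read passes.
before = {"Reallocated_Sector_Ct": 30, "Current_Pending_Sector": 0}
after_stable = {"Reallocated_Sector_Ct": 30, "Current_Pending_Sector": 0}
after_worse = {"Reallocated_Sector_Ct": 42, "Current_Pending_Sector": 3}

print(grown_counters(before, after_stable))  # {} -> nothing grew
print(grown_counters(before, after_worse))   # counters that grew, with the increase
```

An empty result matches what I saw: the drive had bad sectors, but the count was not climbing under load.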
If I had had a spare drive I would have replaced it right away, but we had important work to do that weekend. After my testing and a Time Machine restore we were able to get our weekend work done, losing only one day of work. I was able to do this only because of SMARTUtility. This is a great app and I highly recommend it.
This is my experience, and I will be replacing this drive as soon as I ship the iMac to Apple for a free drive swap, thanks to APP. I am not telling others that a drive that has been over temperature and has “x” number of bad sectors is good enough to run production on. I truly believe, and highly recommend, that if SMARTUtility says the drive is failing, you swap it out as soon as you can.
Have a similar experience? Let me know.