Experiences in the community

smartmontools for better monitoring

Hi all,
This is going to be a long, long post, so consider this a warning in advance. I know I was supposed to be talking about the game “My Tribe”, but that had to be put on hold because there were issues with my hard disk (HDD). I had to use smartmontools to understand the state of the disk and then apply some workarounds. This post talks about smartmontools and the kind of output one can expect.

First up, some stats

First, the kernel on which I’m running smartmontools. I’m on Ubuntu Intrepid (8.10).


$ uname -a
Linux shirish-desktop 2.6.27-9-generic #1 SMP Thu Nov 20 21:57:00 UTC 2008 i686 GNU/Linux

Now the version output from smartctl:


$ smartctl --version
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

smartctl comes with ABSOLUTELY NO WARRANTY. This
is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License Version 2.
See http://www.gnu.org for further details.

CVS version IDs of files used to build this code are:
Module: atacmdnames.cpp revision: 1.16 date: 2008/03/04
uses: atacmdnames.h revision: 1.6 date: 2008/03/04
Module: atacmds.cpp revision: 1.190 date: 2008/03/04
uses: atacmds.h revision: 1.90 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: extern.h revision: 1.54 date: 2008/03/04
uses: int64.h revision: 1.17 date: 2008/03/04
uses: scsiata.h revision: 1.2 date: 2006/07/01
uses: utility.h revision: 1.51 date: 2008/03/04
Module: ataprint.cpp revision: 1.185 date: 2008/03/04
uses: atacmdnames.h revision: 1.6 date: 2008/03/04
uses: atacmds.h revision: 1.90 date: 2008/03/04
uses: ataprint.h revision: 1.31 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: extern.h revision: 1.54 date: 2008/03/04
uses: int64.h revision: 1.17 date: 2008/03/04
uses: knowndrives.h revision: 1.18 date: 2008/03/04
uses: smartctl.h revision: 1.25 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04
Module: knowndrives.cpp revision: 1.166 date: 2008/02/02
uses: atacmds.h revision: 1.90 date: 2008/03/04
uses: ataprint.h revision: 1.31 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: extern.h revision: 1.54 date: 2008/03/04
uses: int64.h revision: 1.17 date: 2008/03/04
uses: knowndrives.h revision: 1.18 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04
Module: os_linux.cpp revision: 1.100 date: 2008/03/04
uses: atacmds.h revision: 1.90 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: int64.h revision: 1.17 date: 2008/03/04
uses: os_linux.h revision: 1.27 date: 2008/03/04
uses: scsicmds.h revision: 1.66 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04
Module: scsicmds.cpp revision: 1.96 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: extern.h revision: 1.54 date: 2008/03/04
uses: int64.h revision: 1.17 date: 2008/03/04
uses: scsicmds.h revision: 1.66 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04
Module: scsiprint.cpp revision: 1.121 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: extern.h revision: 1.54 date: 2008/03/04
uses: int64.h revision: 1.17 date: 2008/03/04
uses: scsicmds.h revision: 1.66 date: 2008/03/04
uses: scsiprint.h revision: 1.21 date: 2008/03/04
uses: smartctl.h revision: 1.25 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04
Module: smartctl.cpp revision: 1.169 date: 2008/03/04
uses: atacmds.h revision: 1.90 date: 2008/03/04
uses: ataprint.h revision: 1.31 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: extern.h revision: 1.54 date: 2008/03/04
uses: int64.h revision: 1.17 date: 2008/03/04
uses: knowndrives.h revision: 1.18 date: 2008/03/04
uses: scsicmds.h revision: 1.66 date: 2008/03/04
uses: scsiprint.h revision: 1.21 date: 2008/03/04
uses: smartctl.h revision: 1.25 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04
Module: utility.cpp revision: 1.65 date: 2008/03/04
uses: configure.in revision: 1.135 date: 2008/03/10
uses: int64.h revision: 1.17 date: 2008/03/04
uses: utility.h revision: 1.51 date: 2008/03/04

smartmontools release 5.38 dated 2008/03/10 at 10:44:07 GMT
smartmontools build host: i686-pc-linux-gnu
smartmontools build configured: 2008/07/30 20:12:27 UTC
smartctl compile dated Jul 30 2008 at 20:12:41
smartmontools configure arguments: '--prefix=/usr' '--sysconfdir=/etc' '--mandir=/usr/share/man' '--with-initscriptdir=/etc/init.d' '--with-docdir=/usr/share/doc/smartmontools' 'CXXFLAGS=-g -O2' 'LDFLAGS=-Wl,-Bsymbolic-functions' 'CPPFLAGS=' 'CFLAGS=-fsigned-char -Wall -O2'

So this is basically smartmontools 5.38. What is cool about this output is that it is verbose enough to list the source modules, with their revisions, that went into building this version of smartmontools.

To install smartmontools, go to System > Administration > Synaptic Package Manager, search for smartmontools and install it.

Unfortunately this beautiful tool is command-line only at the moment, but there are efforts to package a GUI frontend for it:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508512

Once it gets packaged, I’m sure it will get merged into Ubuntu sooner or later.

First, a little background on what the “smart” in smartmontools stands for.

S.M.A.R.T., or Self-Monitoring, Analysis, and Reporting Technology as it is better known, was introduced circa 1995.

Smartmontools started life in November 2002 on ATA specification 5 and was inspired by another project called smartsuite, which started in August 2002.

http://sourceforge.net/projects/smartsuite

Now let’s have a look at what smartmontools can do for us.

The way to work with smartmontools is through a command called smartctl.

Its man page describes smartctl as the “Control and Monitor Utility for SMART Disks”.

So without further ado, let’s dive in.


$ smartctl -h
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Usage: smartctl [options] device

============================================ SHOW INFORMATION OPTIONS =====

-h, --help, --usage
Display this help and exit

-V, --version, --copyright, --license
Print license, copyright, and version information and exit

-i, --info
Show identity information for device

-a, --all
Show all SMART information for device

================================== SMARTCTL RUN-TIME BEHAVIOR OPTIONS =====

-q TYPE, --quietmode=TYPE (ATA)
Set smartctl quiet mode to one of: errorsonly, silent, noserial

-d TYPE, --device=TYPE
Specify device type to one of: ata, scsi, marvell, sat, 3ware,N

-T TYPE, --tolerance=TYPE (ATA)
Tolerance: normal, conservative, permissive, verypermissive

-b TYPE, --badsum=TYPE (ATA)
Set action on bad checksum to one of: warn, exit, ignore

-r TYPE, --report=TYPE
Report transactions (see man page)

-n MODE, --nocheck=MODE (ATA)
No check if: never, sleep, standby, idle (see man page)

============================== DEVICE FEATURE ENABLE/DISABLE COMMANDS =====

-s VALUE, --smart=VALUE
Enable/disable SMART on device (on/off)

-o VALUE, --offlineauto=VALUE (ATA)
Enable/disable automatic offline testing on device (on/off)

-S VALUE, --saveauto=VALUE (ATA)
Enable/disable Attribute autosave on device (on/off)

======================================= READ AND DISPLAY DATA OPTIONS =====

-H, --health
Show device SMART health status

-c, --capabilities (ATA)
Show device SMART capabilities

-A, --attributes
Show device SMART vendor-specific Attributes and values

-l TYPE, --log=TYPE
Show device log. TYPE: error, selftest, selective, directory,
background, scttemp[sts,hist]

-v N,OPTION , --vendorattribute=N,OPTION (ATA)
Set display OPTION for vendor Attribute N (see man page)

-F TYPE, --firmwarebug=TYPE (ATA)
Use firmware bug workaround: none, samsung, samsung2,
samsung3, swapid

-P TYPE, --presets=TYPE (ATA)
Drive-specific presets: use, ignore, show, showall

============================================ DEVICE SELF-TEST OPTIONS =====

-t TEST, --test=TEST
Run test. TEST: offline short long conveyance select,M-N
pending,N afterselect,[on|off] scttempint,N[,p]

-C, --captive
Do test in captive mode (along with -t)

-X, --abort
Abort any non-captive test on device

=================================================== SMARTCTL EXAMPLES =====

smartctl --all /dev/hda (Prints all SMART information)

smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
(Enables SMART on first disk)

smartctl --test=long /dev/hda (Executes extended disk self-test)

smartctl --attributes --log=selftest --quietmode=errorsonly /dev/hda
(Prints Self-Test & Attribute errors)
smartctl --all --device=3ware,2 /dev/sda
smartctl --all --device=3ware,2 /dev/twe0
smartctl --all --device=3ware,2 /dev/twa0
(Prints all SMART info for 3rd ATA disk on 3ware RAID controller)
smartctl --all --device=hpt,1/1/3 /dev/sda
(Prints all SMART info for the SATA disk attached to the 3rd PMPort
of the 1st channel on the 1st HighPoint RAID controller)

And that’s just the list of things it can do.

Let’s find out a bit about the HDD I have: a traditional Seagate 160 GB, 7200 rpm hard disk.


$ sudo smartctl --all /dev/hda
[sudo] password for shirish:
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl open device: /dev/hda failed: No such file or directory

Whoops, what happened? Nothing useful came back.

Ah, I forgot that the naming convention has moved from hda to sda:

https://wiki.ubuntu.com/LibAtaForAtaDisks

D’oh, I should have first checked how many hard disks I have, though.


$ sudo fdisk -l

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x48254824

Device Boot Start End Blocks Id System
/dev/sda1 * 1 24 192748+ 83 Linux
/dev/sda2 25 19457 156095572+ 5 Extended
/dev/sda5 25 1220 9606838+ 83 Linux
/dev/sda6 1221 19335 145508706 83 Linux
/dev/sda7 19336 19457 979933+ 82 Linux swap / Solaris

Ah, that makes my job so much easier: it’s just a single hard disk divided into partitions.

So, trying again with sda now.


$ sudo smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST3160021A
Serial Number: 4JS26D4P
Firmware Version: 8.01
User Capacity: 160,041,885,696 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Mon Dec 29 18:47:38 2008 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 111) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 056 051 006 Pre-fail Always - 57827127
3 Spin_Up_Time 0x0003 099 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 096 096 020 Old_age Always - 4217
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 088 060 030 Pre-fail Always - 687085616
9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 16026
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 092 092 020 Old_age Always - 8290
194 Temperature_Celsius 0x0022 039 056 000 Old_age Always - 39
195 Hardware_ECC_Recovered 0x001a 056 051 000 Old_age Always - 57827127
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 13
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:14:07.011 READ DMA
27 00 00 00 00 00 f0 00 00:14:07.005 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:14:06.864 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:14:06.841 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:14:08.081 READ NATIVE MAX ADDRESS EXT

Error 11 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:14:07.011 READ DMA
27 00 00 00 00 00 f0 00 00:14:07.005 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:14:06.864 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:14:06.841 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:14:06.832 READ NATIVE MAX ADDRESS EXT

Error 10 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:02:46.696 READ DMA
27 00 00 00 00 00 f0 00 00:02:46.691 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:02:46.622 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:02:46.615 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:02:46.586 READ NATIVE MAX ADDRESS EXT

Error 9 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:02:45.467 READ DMA
27 00 00 00 00 00 f0 00 00:02:45.460 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:02:45.320 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:02:45.056 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:02:45.048 READ NATIVE MAX ADDRESS EXT

Error 8 occurred at disk power-on lifetime: 14509 hours (604 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 e0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 e0 00 00:00:22.304 READ DMA
27 00 00 00 00 00 e0 00 00:00:37.310 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 00:00:37.302 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 02 00:00:37.294 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:37.286 READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 15925 -
# 2 Extended offline Interrupted (host reset) 90% 15884 -
# 3 Short offline Completed without error 00% 15884 -
# 4 Short offline Completed without error 00% 15884 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Wow, that’s a whole lot of information. Let’s break it down into pieces and try to understand it.


=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST3160021A
Serial Number: 4JS26D4P
Firmware Version: 8.01
User Capacity: 160,041,885,696 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Mon Dec 29 18:47:38 2008 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The important parts to note here are that SMART support is enabled and that the disk is in the smartctl database.

Without SMART enabled, the rest of the information we got would not be available; the database entry is what lets smartctl apply drive-specific presets when it displays this drive’s vendor-specific attributes.

This block of information could also have been obtained with just


$ sudo smartctl -i /dev/sda

If SMART is for some reason disabled on the drive, you can just do


$ sudo smartctl -s on /dev/sda
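
If you want to script that check, here is a minimal sketch. It assumes the "SMART support is:" wording shown in the 5.38 output above; `smart_is_enabled` is a helper name I made up, not part of smartmontools, and the sample text is pasted in so the snippet runs without touching a disk.

```shell
# Reads `smartctl -i` output on stdin and succeeds only if the drive
# reports SMART as enabled. (smart_is_enabled is a hypothetical helper.)
smart_is_enabled() {
    grep -Eq '^SMART support is:[[:space:]]*Enabled'
}

# Sample lines copied from the output above, so no disk is needed here.
sample='SMART support is: Available - device has SMART capability.
SMART support is: Enabled'

if printf '%s\n' "$sample" | smart_is_enabled; then
    echo "SMART is enabled"
else
    echo "SMART is disabled; try: sudo smartctl -s on /dev/sda"
fi
```

On a real machine the input would come from `sudo smartctl -i /dev/sda | smart_is_enabled` instead of the pasted sample.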

Now the next part of the information given:


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 111) minutes.

This part of the output tells us which checks my hard disk supports.
I could have got the same information with

$ sudo smartctl -c /dev/sda

It gives a lot of information, including how long a long (extended) self-test would take should I have issues and want to run one: 111 minutes here. Of course, with hard disks reaching a terabyte, selective tests might soon become the order of the day.
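
To pull that estimate out before kicking off a long test, something like the sketch below could work. It assumes the two-line layout shown above ("Extended self-test routine" followed by "recommended polling time"); `extended_test_minutes` is a name I invented for illustration.

```shell
# Reads `smartctl -c` output on stdin and prints the recommended polling
# time, in minutes, for the extended self-test.
# (extended_test_minutes is a hypothetical helper.)
extended_test_minutes() {
    awk '
        /Extended self-test routine/ { want = 1; next }
        want && /recommended polling time/ {
            gsub(/[^0-9]/, "", $0)   # keep only the digits on this line
            print
            exit
        }
    '
}

# Sample lines copied from the output above.
sample='Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 111) minutes.'

minutes=$(printf '%s\n' "$sample" | extended_test_minutes)
echo "An extended self-test should take about $minutes minutes"
```

With a real disk you would feed it `sudo smartctl -c /dev/sda`, start the test with `sudo smartctl -t long /dev/sda`, and look at the result afterwards with `sudo smartctl -l selftest /dev/sda`.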


SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 056 051 006 Pre-fail Always - 57827127
3 Spin_Up_Time 0x0003 099 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 096 096 020 Old_age Always - 4217
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 088 060 030 Pre-fail Always - 687085616
9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 16026
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 092 092 020 Old_age Always - 8290
194 Temperature_Celsius 0x0022 039 056 000 Old_age Always - 39
195 Hardware_ECC_Recovered 0x001a 056 051 000 Old_age Always - 57827127
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 13
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

This information could also have been obtained with

$ sudo smartctl -A /dev/sda

This tells us various attributes of the hard disk, such as

3 Spin_Up_Time: basically the time taken to go from rest to fully operational. The 099 shown under VALUE is a normalized score, not seconds; higher is better, and it only becomes a concern if it drops to the THRESH value.

OR

198 Offline_Uncorrectable: the number of sectors that could not be read during offline scans. A non-zero raw value would signal the beginning of problems on the disk surface or in the disk sub-system, even though the disk can still be used; mine is 0. There is a lot more to be inferred from this table, but I leave that for you to explore.
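
One way to read the attribute table is to compare the normalized VALUE column with THRESH and see how much headroom each attribute has. The sketch below does only that; it assumes the ten-column layout shown above, and `attr_margins` is a hypothetical helper name, not a smartmontools feature.

```shell
# Reads `smartctl -A` output on stdin. For every attribute row it prints
# the normalized VALUE, the THRESH, the margin between them and the raw value.
# (attr_margins is a hypothetical helper.)
attr_margins() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        # $2 = name, $4 = VALUE, $6 = THRESH, $10 = RAW_VALUE
        printf "%s value=%d thresh=%d margin=%d raw=%s\n", $2, $4 + 0, $6 + 0, $4 - $6, $10
    }'
}

# Three rows copied from the table above.
sample='1 Raw_Read_Error_Rate 0x000f 056 051 006 Pre-fail Always - 57827127
3 Spin_Up_Time 0x0003 099 097 000 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 13'

printf '%s\n' "$sample" | attr_margins
```

A margin at or near zero is the thing to watch for; the raw values are vendor-specific, so I would not read too much into them without the drive's documentation.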

The next part is the error log .


SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:14:07.011 READ DMA
27 00 00 00 00 00 f0 00 00:14:07.005 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:14:06.864 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:14:06.841 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:14:08.081 READ NATIVE MAX ADDRESS EXT

Error 11 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:14:07.011 READ DMA
27 00 00 00 00 00 f0 00 00:14:07.005 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:14:06.864 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:14:06.841 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:14:06.832 READ NATIVE MAX ADDRESS EXT

Error 10 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:02:46.696 READ DMA
27 00 00 00 00 00 f0 00 00:02:46.691 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:02:46.622 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:02:46.615 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:02:46.586 READ NATIVE MAX ADDRESS EXT

Error 9 occurred at disk power-on lifetime: 14588 hours (607 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 f0 00 00:02:45.467 READ DMA
27 00 00 00 00 00 f0 00 00:02:45.460 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 b0 02 00:02:45.320 IDENTIFY DEVICE
ef 03 45 00 00 00 b0 02 00:02:45.056 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 f0 00 00:02:45.048 READ NATIVE MAX ADDRESS EXT

Error 8 occurred at disk power-on lifetime: 14509 hours (604 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 07 00 00 e0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 e0 00 00:00:22.304 READ DMA
27 00 00 00 00 00 e0 00 00:00:37.310 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 02 00:00:37.302 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 02 00:00:37.294 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 00:00:37.286 READ NATIVE MAX ADDRESS EXT

These are the most recent errors the drive has logged (only the last five of the 12 are kept), along with the commands that led up to each one.

This information could also be had by doing

$ sudo smartctl -l error /dev/sda
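
For a quick summary of the error log, a small sketch like this can pull out the total count and the number of ICRC entries still in the log. ICRC stands for interface CRC error, which is commonly associated with cabling or connector trouble and not the platters, though that is a rule of thumb and not something this output proves. Both function names are made up for illustration.

```shell
# Both helpers read `smartctl -l error` output on stdin.
# (ata_error_count and icrc_entries are hypothetical helper names.)

# Prints the lifetime error count from the "ATA Error Count:" line.
ata_error_count() {
    awk '/^ATA Error Count:/ { print $4; exit }'
}

# Prints how many of the logged entries are ICRC errors.
icrc_entries() {
    grep -c 'Error: ICRC'
}

# A trimmed sample copied from the log above.
sample='ATA Error Count: 12 (device log contains only the most recent five errors)
84 51 01 07 00 00 f0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7
84 51 01 07 00 00 e0 Error: ICRC, ABRT 1 sectors at LBA = 0x00000007 = 7'

echo "Errors logged in total: $(printf '%s\n' "$sample" | ata_error_count)"
echo "ICRC entries in this sample: $(printf '%s\n' "$sample" | icrc_entries)"
```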

The last bit of information given:


SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 15925 -
# 2 Extended offline Interrupted (host reset) 90% 15884 -
# 3 Short offline Completed without error 00% 15884 -
# 4 Short offline Completed without error 00% 15884 -

This could also have been obtained with:

$ sudo smartctl -l selftest /dev/sda

This tells us that the hard disk passed its most recent self-tests. The LifeTime(hours) column also shows how long the hard disk had been powered on when each test ran.

16000 / 24 = 666.67

which is a little under 2 years if the hard disk had been up 24x7.

Unfortunately, with Indian conditions, that is not possible.

This hard disk is almost 5 years old now.
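
The power-on arithmetic above can be redone with the exact Power_On_Hours raw value (16026) from the attribute table. This is just a sketch of the conversion; `power_on_summary` is a hypothetical helper name.

```shell
# Converts a power-on hour count into days and years of continuous running.
# (power_on_summary is a hypothetical helper.)
power_on_summary() {
    awk -v h="$1" 'BEGIN {
        days = h / 24
        printf "%d hours = %.2f days = %.1f years of continuous use\n", h, days, days / 365
    }'
}

power_on_summary 16026
# prints: 16026 hours = 667.75 days = 1.8 years of continuous use
```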

Last but not least, let’s find out a bit more about what commands were run under the hood.


$ sudo smartctl -r ioctl -i /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

[inquiry: 12 00 00 00 24 00 ]
scsi_status=0x0, host_status=0x0, driver_status=0x0
info=0x0 duration=0 milliseconds resid=0
status=0x0
[ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
scsi_status=0x0, host_status=0x0, driver_status=0x0
info=0x0 duration=8 milliseconds resid=0
status=0x0
Detected SAT interface, switch to device type 'sat'

REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE
[ata pass-through(16): 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00 ]
scsi_status=0x0, host_status=0x0, driver_status=0x0
info=0x0 duration=4 milliseconds resid=0
status=0x0
REPORT-IOCTL: DeviceFD=3 Command=IDENTIFY DEVICE returned 0
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST3160021A
Serial Number: 4JS26D4P
Firmware Version: 8.01
User Capacity: 160,041,885,696 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Mon Dec 29 19:39:17 2008 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS
[ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
scsi_status=0x2, host_status=0x0, driver_status=0x8
info=0x1 duration=84 milliseconds resid=0
status=2: [desc] sense_key=0 asc=0 ascq=0
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK
[ata pass-through(16): 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 ]
scsi_status=0x2, host_status=0x0, driver_status=0x8
info=0x1 duration=116 milliseconds resid=0
status=2: [desc] sense_key=0 asc=0 ascq=0
REPORT-IOCTL: DeviceFD=3 Command=SMART STATUS CHECK returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES
[ata pass-through(16): 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 ]
scsi_status=0x0, host_status=0x0, driver_status=0x0
info=0x0 duration=80 milliseconds resid=0
status=0x0
REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE VALUES returned 0

REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS
[ata pass-through(16): 85 08 0e 00 d1 00 01 00 01 00 4f 00 c2 00 b0 00 ]
scsi_status=0x0, host_status=0x0, driver_status=0x0
info=0x0 duration=28 milliseconds resid=0
status=0x0
REPORT-IOCTL: DeviceFD=3 Command=SMART READ ATTRIBUTE THRESHOLDS returned 0

This is useful for debugging in case of an issue.

More interesting for admins is the ability to have the smartd daemon running by default.

To do this, one needs to edit /etc/default/smartmontools and uncomment the line

#start_smartd=yes

to

start_smartd=yes

and then do


$ sudo /etc/init.d/smartmontools start

This starts the daemon right away, and from the next boot onwards it will come up automatically.

One can also configure smartd, if one wants, using /etc/smartd.conf,
but I leave that for you to explore.
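
As a starting point for that exploration, here is a sketch that prints one candidate line for /etc/smartd.conf. The directives are from the smartd.conf man page: -a (default monitoring), -o on and -S on (the same offline-test and autosave switches as in smartctl), -s (a self-test schedule, here a short test daily at 02:00 and a long test on Saturdays at 03:00) and -m (mail address for warnings). The schedule, the mail recipient and the helper name `smartd_conf_line` are my assumptions, not something from my actual setup.

```shell
# Prints a candidate smartd.conf line for one device.
# $1 = device node, $2 = mail recipient for warnings.
# (smartd_conf_line is a hypothetical helper; the schedule is only an example.)
smartd_conf_line() {
    printf '%s -a -o on -S on -s (S/../.././02|L/../../6/03) -m %s\n' "$1" "$2"
}

smartd_conf_line /dev/sda root
# prints: /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
```

Review the line before adding it to the file by hand. Note that, per the man page, smartd ignores everything after a DEVICESCAN line, so a per-device line needs to go before it or replace it.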

Find support to smartmontools at smartmontools-support@lists.sourceforge.net

That’s all for now.
