Monday, May 28, 2007

XP System Rescue

Last week I lost my XP desktop system.

Not that I didn't know where it was...I mean the system crashed. Hard.

Three times last Saturday it locked up with the HDD indicator light solid on and the system unresponsive. I had to redo several posts lost in the process each time it occurred.

The last time it happened I was running Process Explorer to see if there was a certain application that had caused it to lock up. But nothing indicative was found.

Reboots seemed to get it going again.

Late Sunday Lavie told me there was an error on it: it showed a system file load error.

At that point I rebooted again so I could bring it up in Safe Mode and see if that cleaned things up. Unfortunately, it wouldn't boot in Safe Mode at all and kept looping a boot process. Not a good sign. Each time it tried to come up, it seemed to indicate a new file problem.

So I knew I was looking forward to what was likely to be an involved XP System rescue.

Fortunately, I already had a basic plan ready to execute.

Step One...Do Nothing

Yes. That's correct. My very first response was to do nothing at all.

Why? Well actually, I mean "Do Nothing to recover the system immediately."

All too often users try to begin to hack and recover a system without having a plan of attack. In the process they end up doing more harm than good.

So despite every inclination in my IT guy's bones to rip into it...especially considering how much data I had on it...I stepped away and just left it powered off and unplugged until I could better evaluate the situation.

I didn't want to move too fast and do more harm or make a mistake while tired and frustrated. Even worse I might decide to skip something because I didn't have a disk or other needed item handy at the moment which could just make the situation even worse.

Step Two...Assess what you Know

During the "down-time" over the next week I evaluated what I knew about my system and what problems I had been running into.

I knew that the issue seemed related to the hard-drive. Was it a physical hard-drive failure or was it an OS error causing the HDD to lock up?

I knew that I had a backup of all my critical files. It wasn't "this-month" current, but it was enough to take some of the edge off accepting an unrecoverable HDD failure.

I knew, based on my last views of Process Explorer, that it was unlikely that a "rogue" program or virus was to blame for the failure.

I don't "dual-boot" my systems, using virtualization instead, and I was seeing the Windows loader even if it couldn't fully execute, so it was unlikely that a critical master boot record or boot.ini failure had occurred.

My hard drive was a "newer" model that supported S.M.A.R.T. disk technology, so I would be able to read some of the health information from it.

It was XP SP2 Home running IE7. This would factor into my recovery approach.

Only a few weeks earlier I had run Belarc Advisor and printed out a copy of its report...which contained quite a lot of system information as well as my Windows Keys.

And finally I had partitioned my 120 GB hard drive into four separate partitions: an NTFS C: drive, and three FAT-32 partitions to store my pictures, music, and archive files.

Step Three...Make a Plan

My plan was thus made.

I identified all the software/disks needed to do a full XP repair/reinstallation if needed.

I wanted to have a plan to get my custom system drivers recovered if I did need to reinstall the system from scratch. (More on that later.)

I would try a "Bart PE" "live disk" boot to recover critical data to one of the other partitions in case the system partition was failing.

I would run a HDD health check to see if the HDD was failing using a nice little utility, HDTune.

I would identify and prioritize the user data files on the system, from the ones I most needed to recover to the ones I least needed to recover. This might allow me to rescue the most important ones in the event that the drive gave out mid-way through.

I would then pull the drive to place into an external drive case and connect it to my Vista system.

I would then recover my critical user files off it to my Vista laptop if possible.

I would then run drive repair commands (file structure and bad sector scans) at my leisure without concern about additional data loss.

Next I would put the drive back in the original system, boot it and (hopefully, if the system booted) run my driver backup utilities and save the results to a USB stick/CD-ROM.

If it was up enough and still unstable, I would first uninstall Internet Explorer 7 and roll back to IE 6.

I would run an XP system rebuild/repair attempt first.

If that didn't help I would wipe my C: partition and then do a fresh XP install.

Of course, if my hard drive itself turned out bad...I would just have to buy a new one and go with what I had managed to recover from scratch. Then destroy the bad platters with a hammer for security.

Each step was designed to build upon the previous and give me important information needed to determine how I would progress through the next stages. It isn't a perfect plan...as I chose to first make a set of recovery files on another partition on the same drive instead of on a separate drive, but that would only be done if it looked like the drive reported itself to be "healthy."

Step Four...Making Material

Since my system was a custom "home-brew" Shuttle SFF, I had an original OEM XP Home CD. This was a great start. However, as it was so old, it was SP1. I wanted to have an up-to-date XP SP2 setup disk. What to do?

Funny that despite all my slipstreaming work on the job, I had never taken the time to make a slipstreamed copy of my own for XP SP2. Fortunately, that is no longer a difficult thing to do.

Slipstreaming is basically the process of adding later service packs into a prior Windows setup disk. I could have done it "manually" using these easy-to-follow instructions. But there is now an even easier method: AutoStreamer.

This application takes your original Windows system disk and your downloaded SP2 file, and merges them with little extra effort into an ISO file which you burn to a CD-ROM. Done. In a little over 30 minutes using Lavie's XP laptop I had a freshly slipstreamed XP SP2 Home CD ready to go. I tested it in a Virtual PC 2007 session and it worked flawlessly.

I had previously made a custom Bart's PE boot disk that would allow me to boot the system without needing to "touch" the installed drive for the most part.

I already own an external USB drive enclosure. Just had to remove the DVD ROM player in it first.

I had an XP Home laptop as well as a Vista Home Premium laptop at my disposal. That also took a ton of pressure off, as I could use these to browse the web and look up any technical leads I might need, as well as (hopefully) copy the critical files off the drive onto them.

I also downloaded and tested three system-driver backup utilities before I attempted to use them. They fit nicely on my USB stick so I had them ready to go.

I also had in hand a CD-ROM copy of a recently run heise Offline Update disk so I could bring all the critical updates up to date before I put the system back online, if a full system reload was required.

Step Five...In System "Live" Assessment

Resisting the urge to "try to boot it one more time" I booted the system instead with a copy of Bart PE that I had customized with a ton of specialized utilities.

The hardware booted off the CD-ROM and seemed to run fine. That indicated to me I wasn't having a problem with the systemboard, CPU, PSU or memory. That was a good start.

I had a copy of HDTune on my Bart's PE disk so I next ran that to take a look at the drive health.

All the S.M.A.R.T. monitoring points reported they were operating well within proper tolerances. So I could proceed knowing that it wasn't likely (though still not guaranteed) that my drive was going to die in the middle of anything I was going to progressively attempt.
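
For anyone who would rather script this kind of check than eyeball it, here is a minimal Python sketch of the same idea. It is not how HDTune works internally. It assumes you have a text report in the attribute-table layout that the smartmontools `smartctl -A` command prints, and it simply flags any attribute whose normalized value has dropped to or below its threshold. The sample rows are made up for illustration and are not readings from the drive in this post.

```python
from typing import List, NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized current value
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-defined failure threshold


def parse_attributes(report: str) -> List[SmartAttribute]:
    """Parse the data rows of a smartctl-style attribute table."""
    attributes = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # the header row and blank lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attributes.append(SmartAttribute(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            threshold=int(fields[5]),
        ))
    return attributes


def failing_attributes(attributes: List[SmartAttribute]) -> List[SmartAttribute]:
    """Return attributes whose normalized value is at or below threshold.

    Attributes with a threshold of 0 are informational and are ignored.
    """
    return [a for a in attributes if a.threshold > 0 and a.value <= a.threshold]


# Made-up sample rows for illustration only.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       2
194 Temperature_Celsius     0x0022   120   100   000    Old_age   Always       -       38
"""

if __name__ == "__main__":
    for attribute in failing_attributes(parse_attributes(SAMPLE_REPORT)):
        print("At or below threshold:", attribute.name)
```

As with HDTune, a clean result here only says the drive is not reporting trouble. It does not guarantee the drive is healthy.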

Hurdle one appeared cleared.

Next thing I did was to consult the list of data files I needed to recover and set about backing them up onto another partition on the drive. I started with my Thunderbird email files, my password management file, my Firefox profiles. From there I went on to my financial management software data files and finally each of our Windows "My Documents" folders in each user profile account.

Once safely stored and having met the requirements of my critical targeted files I wanted to recover, I resisted the temptation to be greedy and get more and I shut the system down. I didn't want to run the risk of causing more potential damage until I had gotten a copy of the files off the drive itself.
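
I did this copying by hand, but the "most important first" idea is easy to sketch in Python. The function below is only an illustration under my own assumptions: the name `rescue_in_priority_order` and the priority numbers are made up, and any paths you feed it are yours to supply. It copies each item in priority order and records failures instead of stopping, so one unreadable file doesn't block everything behind it.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def rescue_in_priority_order(
    sources: Iterable[Tuple[int, Path]], destination: Path
) -> List[Tuple[Path, str]]:
    """Copy each source into destination, lowest priority number first.

    Returns (source, status) pairs so that a failure part-way through
    is visible without stopping the remaining copies.
    """
    results = []
    destination.mkdir(parents=True, exist_ok=True)
    for _, source in sorted(sources, key=lambda item: item[0]):
        target = destination / source.name
        try:
            if source.is_dir():
                shutil.copytree(source, target)  # whole folder, e.g. a mail profile
            else:
                shutil.copy2(source, target)     # single file, keeps timestamps
            results.append((source, "ok"))
        except OSError as error:  # shutil.Error is a subclass of OSError
            results.append((source, f"failed: {error}"))
    return results
```

You would call it with something like a list of `(1, password_file)`, `(2, mail_profile_folder)`, `(3, documents_folder)` and a destination folder on the partition you trust most, then read through the returned statuses before shutting down.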

Step Six...HDD On the Operating Table.

I could have uploaded the files over FTP to one of the other systems, but I felt pretty comfortable at this point that the problem was now a Windows OS issue and not a physical drive issue.

I removed my drive from my Shuttle system and placed it into my USB external drive enclosure. Everyone should have at least one of these cheap things around just for such emergencies. They are dead-useful in a pinch.

I connected it up to our Vista system which quickly recognized the unit, the drive, and each of the partitions on it. Good sign!

I then copied the files I had already rescued to another partition on that drive over to my Vista system. After that I also grabbed the music and photo files that I had previously skipped.

With all my data now safely backed up on another system the pressure was off.

Another key hurdle down.

I first used Vista to check the drive for errors and repair them. It ran about fifteen minutes and reported that errors were found and fixed. I then ran it again, this time telling it to also scan for bad sectors. That pass took better than 30 minutes to finish. Again errors were found and repaired.

It reparsed one record, processed 5 unindexed files, found bad clusters in a memory.dmp file, added two bad clusters to the drive's bad cluster file, and corrected errors in the Volume Bitmap.
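
I ran both passes from Vista's disk-checking dialog, but they correspond roughly to the command-line `chkdsk` switches `/f` (fix file system errors) and `/r` (also locate bad sectors and recover readable data, which implies `/f`). Here is a small Python sketch that only builds the command line for the two passes. The function name and the `E:` drive letter are hypothetical; use whatever letter Windows assigns to your external enclosure.

```python
from typing import List


def build_chkdsk_command(drive: str, scan_bad_sectors: bool = False) -> List[str]:
    """Build a chkdsk argument list for the given drive letter.

    /f fixes file system errors; /r also locates bad sectors and
    recovers readable data (and implies /f).
    """
    letter = drive.rstrip(":\\").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"expected a drive letter such as 'E:', got {drive!r}")
    switch = "/r" if scan_bad_sectors else "/f"
    return ["chkdsk", f"{letter}:", switch]


if __name__ == "__main__":
    # First pass: quick file system repair. Second pass: bad-sector scan.
    for full_scan in (False, True):
        print(" ".join(build_chkdsk_command("E:", scan_bad_sectors=full_scan)))
```

The sketch deliberately stops at printing the commands. To actually run one you would pass the list to `subprocess.run` from an elevated prompt on Windows, and only after your data is safely copied elsewhere.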

Whew!

Through the process the drive ran great and never locked up or died. Doing a bad sector scan is a very intensive activity for a drive, and since it didn't die during the process, any last bits of concern about a flaming drive crash pretty much disappeared.

Step Seven...Physical Therapy

I put the drive back into its host body and rebooted.

It took it a while to come up, and it ran another checkdisk/scandisk operation as it booted. It reported a scary-sounding error about fixing some Uppercase settings, but that is not actually as serious as it appears.

A bit longer and I was back at my XP profile login screen!

Despite this good sign, I resolutely stuck to my plan. I popped in my USB stick and ran all three system driver backup utilities in succession and copied the driver files to my USB.

Since I had homebuilt this system, having the actual drivers is a bit of an issue. While XP would be able to bring it up in the event of a system reinstall, having the original drivers right on hand would save a bunch of time hunting for them on the Net and through all the CD-ROMs I've had to collect for them over the years.

One more hurdle down, one more to possibly go.

Once I had those I could now decide what I wanted to do next.

According to Microsoft, you need to uninstall IE 7 from your system before attempting an XP repair or you will likely break both IE 6 and IE7 on the system for good. Others have stated in forum posts that they did XP repairs with IE 7 in place with no ill effects. For now, I'm prepared to follow Microsoft until I get information otherwise.

I decided that I would let the system run for a full day to see whether the drive locked up and whether it handled reboots without ill effects before I would decide to attempt an XP repair/reinstall.

Step Eight...Not Needed to Date.

So far the drive and OS have been running for several days with no more lockups or bad behavior. The system has returned to rock-solid performance.

Had I needed, I would have next first attempted to uninstall IE 7, then run a Windows XP repair. Depending on how that worked I would have reloaded any Windows Updates broken in the repair process.

If that didn't restore the system to good health, I would have then wiped the primary partition (probably zeroed it out) then reformatted and reinstalled XP Home SP2 from my slipstreamed disk. I could then follow it up with my backed up custom system drivers.

To date, I haven't had to jump that final hurdle. But I am ready for it, should it be needed.

Final Thoughts

While admittedly stressful to undergo, this experience has actually left me in a better position for any future problems. I will continue to make backups on a regular basis. I have an XP SP2 slipstream disk on hand in the event of a future system reinstall. I have all my custom system drivers backed up as well on CD-ROM. I'm set.

I will begin to do monthly bad-sector scans on the drive now. I don't think I have done that on this drive since I put it in at system build, so having a few found after all that time doesn't (in itself) concern me that much just yet. This system uses an Ultra ATA (EIDE) drive connection instead of SATA connections. Those ATA drives are getting harder to find, so I might just pick a new one up in a few months while I can still find one fairly painlessly. (Hard Drives: ATA versus SATA)

If problems start to arise again, I will also just go ahead and remove IE7 and run an XP repair.

And the cause of all this grief?

Based on the information I gathered during this process, I suspect that a file somewhere ended up getting corrupted or landed on a bad sector. When the system kept encountering this file, it would lock up the drive as the system kept re-reading the bad file and not moving on or erroring out. The disk repair process run under Vista seems to have fixed the issue, and the integrity of all my Windows system files seems fine again now.

I am hopeful that this is correct, as the system is running fine again with no system error messages being reported at boot. I never did an XP repair process, so that's why I remain hopeful that none of the system files were actually corrupted. As time goes on I will probably find out if it is indeed a failing hard drive, some corrupted system file, or maybe something yet to be identified.

And, amazingly, the XP installation remains the original one from when I built the system, back in February 2004. I know many Windows users advocate "repaving" their Windows systems with regular format/installs, but by being careful about what I install, keeping it clean of malware, and keeping it running on a UPS to avoid system "crashes" it has done a great job all this time.

So, with the right tools, a good plan and quite a bit of patience and luck, even the hardest crashing systems might just be restored...just as long as the physical drive itself isn't toasted.

Good Luck!

--Claus

2 comments:

Anonymous said...

Claus,

Have you looked at running Spinrite on the drive? If some clusters went bad, more might be getting close to going. Spinrite is really good for checking those out.

Not free, not even terribly cheap, but a great tool to recover with. It might well have been able to recover your drive in-place.

Anonymous said...

That's a very good suggestion.

I actually do have an older version of GRC's SpinRite and completely forgot about it.

SpinRite

I don't think I've used it for years.

I'll give it a toss on the drive by the weekend and post what it came up with.

Sometimes I wonder if I have too many tools and utilities for my own good!

Thanks for the tip and Reminder!