Saturday, February 21, 2009

Wait, Wait, I know this one!

Faithful readers of the GSD blog may vaguely remember this post from the archives:

In summary, enterprise had deployed some Dell systems with dual-core CPUs, and we later discovered that incorrect hal.dll and kernel files (single-core rather than multi-core versions) were included in the image, rendering the dual-barrel CPUs single shooters.

In the post I outlined a method we ended up deploying to fix them “on the fly” in the field instead of having to reimage them with a corrected image build.

So, many, many successfully fixed systems later, I started getting calls saying weirdness was rearing its head again.

Strike One

A field tech gave me a call after trying the fix and reported a problem.

He had followed the steps, replaced the single-core hal.dll and ntoskrnl.exe files with the multi-core versions particular to our system, and rebooted.

And was presented with the following error on a baby-blue Windows screen:

"autochk not found - skipping autocheck"

Rebooting the system repeated the error, and neither System Restore nor Last Known Good Configuration helped. Even after he copied the original files back (renamed .old in our process), the error would not go away.

We used a WinPE boot disk and verified that the autochk file was still present and accounted for in the C:\Windows\system32 folder.

Puzzled, we took down the notes, recovered the user data to a USB drive, and reimaged the system to get it going again with the fixed dual-core system image.

Strike Two

A few weeks later Mr. No (one of our senior network watchers) was in the field leading a project to update re-deployed systems. As part of that, he was also checking the core files and updating them on any systems where he found the fix had not been correctly applied.

And he ended up with the same error.

Now my whole attention was on this.  I could understand if a field tech made a mistake in the dual-core enablement process, but Mr. No?  Not likely.

After a considerable amount of troubleshooting assistance over the phone, we again collected our notes and Mr. No bailed and reimaged the system, again after recovering the user’s data.

Why after a long run of success with this technique were both field techs and senior staff finding the process no longer working?

It’s Outta da Park!

Then I figured it out…while taking my morning shower last week…go figure.

It was simple.

When I got to the office I asked Mr. No what Service Pack level the system was at.  He didn’t know because he hadn’t checked.  Yep.  Suspected as much.

So I fired up my image-building system (XP Pro with SP3), applied the dual-core fix to it, rebooted, and…

"autochk not found - skipping autocheck"

I had replicated exactly the error message the team was seeing.

What I realized (and later confirmed) is that the staff were first applying XP SP3 to the systems they were checking, and only then applying the dual-core fix. (Yes, yes, enterprise still hasn’t pushed out SP3 to our systems yet, so we are having to do the updates ourselves at this point. I know, but not my department.)

The files in our fix were SP2 versions. When the autochk process ran at boot on an SP3 system, it knew that these system files were incorrect versions, thus borking the fix and the boot.

So I extracted the XP SP3 file versions and issued updated instructions that everyone now has to check to see what SP version the XP system is running, then apply the correct multi-core files to the system.

As the files are captured on our systems:

XP Pro SP2 

hal.dll & halmacpi.dll – file version 5.1.2600.2705

ntoskrnl.exe & ntkrnlmp.exe – file version 5.1.2600.3093

XP Pro SP3

hal.dll & halmacpi.dll – file version 5.1.2600.5512

ntoskrnl.exe & ntkrnlmp.exe – file version 5.1.2600.5657

Repeated tests on the imaging systems demonstrated this fixed that problem and would restore dual-core functionality to the appropriate systems.
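
For anyone who would rather script the check than eyeball it, here is a minimal Python sketch of the idea. It is an illustration only, not the procedure we deployed: the function name `mismatched_files` and the table layout are made up for this example, and it assumes you have already read the Service Pack level and the file versions off the machine (for instance from each file's Properties dialog) and typed them in yourself.

```python
# Multi-core file versions as recorded in this post for our systems.
# Keys are the names the files carry once installed in system32.
# Other hardware or hotfix levels may well carry different numbers.
EXPECTED_VERSIONS = {
    "SP2": {
        "hal.dll": "5.1.2600.2705",
        "ntoskrnl.exe": "5.1.2600.3093",
    },
    "SP3": {
        "hal.dll": "5.1.2600.5512",
        "ntoskrnl.exe": "5.1.2600.5657",
    },
}


def mismatched_files(sp_level, found_versions):
    """Return {file: (expected, found)} for each file whose version
    does not match the table entry for the given Service Pack level.

    sp_level       -- "SP2" or "SP3" (hypothetical labels for this sketch)
    found_versions -- dict of file name -> version string read by hand
    """
    expected = EXPECTED_VERSIONS[sp_level]
    return {
        name: (want, found_versions.get(name))
        for name, want in expected.items()
        if found_versions.get(name) != want
    }


if __name__ == "__main__":
    # The situation from this post: SP2 files about to go onto an SP3 system.
    about_to_copy = {
        "hal.dll": "5.1.2600.2705",
        "ntoskrnl.exe": "5.1.2600.3093",
    }
    problems = mismatched_files("SP3", about_to_copy)
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found}")
```

An empty result means the files you are about to copy match the Service Pack level of the target system; anything else is a reason to stop before rebooting.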

So the lesson is this: if you have corrupted or incorrect core Windows system files and you seek to replace them with ones from another system or a Windows setup disk, be very, very sure that you use ones from the same Service Pack level.  At the very least, check the file properties if possible and note the version number.

It might save you some headaches.

Good luck,

--Claus V.
