Sunday, December 02, 2007

Seeing Double

Being an avid fan of small utilities and portable applications, I sometimes create problems for myself.

For instance, here is my general routine.

  1. Find a new or updated utility or application.
  2. Download and try to extract the files without installing it.
  3. If that fails, I install the program, copy the application folder out, then uninstall it.
  4. Next I save the application's program folder in an appropriate location in my "standalone applications" folder pile.
  5. Finally, I must try very hard to remember where I put everything.

I probably have more than a hundred such applications right now.  I've tried hard to organize them under a logical (to me) file structure. 

At work I have one "portable" application folder and structure.

At home I have another.

Then there is the set on my primary sysadmin USB stick.

Between the desktop PC and the two laptops, my work system, and my USB stick, I must struggle to keep them all updated and consistent.  That's where having a good sync program or two comes in really handy.

Unfortunately, I still sometimes end up finding multiple copies of a particular application squirreled away in multiple locations.

So periodically, I sweep my system for duplicate files with a duplicate file finder utility.

One must be very careful with these things.  Often multiple copies of important system or program files are scattered across a system.  I prefer to run these on my "personal" folders and locations rather than on entire drives.

When I used one of these at work on my "D:" partition, where space was getting tight with all my utilities, applications, and setup programs, I ended up freeing over 300 MB of space.  That seems like a lot, but most of it came from about ten network installation programs and server/client administration tools for which I had kept multiple older copies.  I had forgotten about those archived versions, so I was able to burn them to a CD and then delete the older ones.

Not bad....

A brief explanation of the test setup, and a disclaimer...

Just for kicks, I decided to run an informal comparison of all the tools listed against each other.  I limited the scan location to my "portable apps" folder kept on my system hard drive. The folder size is 1.27 GB, and it contains 1,780 folders and 12,546 files. No filtering beyond the default application settings was used, nor were any advanced comparison techniques, where that was possible.  All scans were run on the same system, so they should be fairly consistent.  Scan times by others will no doubt vary from mine based on hard-drive speed, buffering, CPU/memory speeds, file-system fragmentation, and a host of other variables.  All this means that my results are effectively my own.  Some differences in the way the applications report their results may make exact result comparisons difficult. However, every attempt was made to compare them as usefully as possible. The result descriptions can be useful information in and of themselves.

All programs will run on Windows 2000/XP/Vista. They should run on older versions of Windows as well.  However, I did notice that some applications were able to view and access more folders on Vista than others could.  This could be either because the application is designed to protect/skip certain system folders, hidden folders, or files/folders with special attributes set, or because the application doesn't have sufficient rights to access some locations under Vista that it may have under Windows 2000/XP/Win9x. Keep this in mind as well in your decisions.

Freeware Duplicate File Finders: The List

DupKiller - (freeware) - Mention of this utility has occurred at least three times in the course of this blog.  It appears I am quite taken by it. The feature that makes it so appealing is the speed at which it is able to sort through entire drive contents.  You can narrow the search down to folders, a drive, or multiple drives.  You can filter and exclude certain file extensions from the results.  However, this program doesn't just look for similarly named, dated, or sized files; it can also compare files by contents and by percent-of-match factors. It includes preview tools for graphic and text files. It provides detailed statistics on file duplication and space consumed by duplicate files.  You can also get it to search system and hidden files normally passed up by some scanners.

Results: 13,523 files scanned, 2,620 duplicate files found. Scan time was 10 seconds.

Easy Duplicate File Finder - (freeware) - This is my second-favorite duplicate file finder.  Like DupKiller, you can mask/filter your searches. You can select a single location or multiple locations for your duplication searches.  You can set it to skip system files and folders (nice for newbies) and, instead of deleting directly, move the duplicate files to the recycle bin for a while before you empty/delete the files for good.  You can manually pick and choose which files you wish to delete. Or, you can move duplicate files to an alternative folder to make sure you don't delete something important--think of it like a pending drawer. When you are ready, delete the folder for good. The user interface is pretty simple to navigate.  File searches are fast.  It runs off a single executable file that is about 1.2 MB in size. Nicely portable for USB drives.

Results: 12,388 files scanned, 4,847 duplicate files found: 1,786 originals and 3,061 duplicates. Scan time was 1:06 minutes.
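
The "pending drawer" idea is easy to try by hand, too. Here is a minimal Python sketch of it, not how Easy Duplicate File Finder itself works: the function name `move_to_pending` and the `_1`, `_2` renaming scheme are my own assumptions for illustration. It moves the files you pass in to a holding folder instead of deleting them, and renames on a name clash so nothing gets overwritten.

```python
import shutil
from pathlib import Path


def move_to_pending(duplicates, pending_dir):
    """Move files into a holding folder instead of deleting them.

    `move_to_pending` is a hypothetical helper for illustration only.
    Returns the new locations of the moved files.
    """
    pending = Path(pending_dir)
    pending.mkdir(parents=True, exist_ok=True)

    moved = []
    for src in map(Path, duplicates):
        target = pending / src.name
        counter = 1
        # Duplicates often share a file name, so pick a free name
        # rather than overwrite something already in the drawer.
        while target.exists():
            target = pending / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved
```

Once you are sure nothing important landed in the holding folder, you can delete it in one go.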

Duplicate File Finder (DupFinder) - (freeware) - This tiny (237 KB) tool is a standalone utility. It automatically searches all files in all subdirectories. Duplicate files are compared with byte-for-byte reviews (100% accurate), and to speed up comparing large files there is the "Fast Searching" option, which will look at the first and last 10 megabytes of large files, which the developer claims is 99% accurate. Additional features include multithreaded operation to provide extremely fast search results, file-type filtering to home in on particular file types, and location of empty files. Selecting a listed file will display a detailed information dialog.  If you prefer your tools small and tiny, consider this one.

Results: 13,523 files scanned, 3,224 duplicate files found. Scan time was 2:20 minutes.
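
To make the difference between a byte-for-byte comparison and a "first and last 10 megabytes" shortcut concrete, here is a rough Python sketch. It is only my illustration of the general idea, not DupFinder's actual code: the names `head_tail_equal` and `find_duplicates` are hypothetical, and grouping files by size before comparing them is my own assumption about a sensible first step.

```python
import filecmp
import os
from collections import defaultdict

CHUNK = 10 * 1024 * 1024  # 10 MB, the figure quoted for "Fast Searching"


def head_tail_equal(a, b, chunk=CHUNK):
    """Quick check: compare only the first and last `chunk` bytes."""
    size = os.path.getsize(a)
    if size != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        if fa.read(chunk) != fb.read(chunk):
            return False
        if size > chunk:
            # Jump to the tail; anything in the middle is never read.
            fa.seek(size - chunk)
            fb.seek(size - chunk)
            return fa.read(chunk) == fb.read(chunk)
    return True


def find_duplicates(root, fast=False):
    """Return lists of paths under `root` that have matching contents."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        remaining = sorted(paths)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            if fast:
                same = [p for p in rest if head_tail_equal(first, p)]
            else:
                # shallow=False forces a full byte-for-byte comparison.
                same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                groups.append([first] + same)
            remaining = [p for p in rest if p not in same]
    return groups
```

The shortcut can be fooled by two same-sized files that differ only somewhere in the middle, which is presumably why it is described as less than 100% accurate.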

DoubleKiller - (freeware for private use) - The free version is a bit more limited than the paid "Pro" version. In the freeware product, you may search an unlimited number of files/folders. Files over 2 GB will be skipped.  No installation is needed, but you have limited sorting control over the listed results. You can save/export the results, as well as delete and move files. It also runs slower than the "Pro" version.  You cannot move files to the recycle bin, delete empty directories, or do advanced or bit-to-bit comparisons of files either. At just over 800 KB it is quite a bit larger than DupFinder, but as a single executable file it remains portable and useful enough for inclusion and consideration.  It seems quite a bit slower than some of the alternatives as well.

Results: 4,865 duplicate files found. Scan time was 1:02 minutes.

Duplicate File Finder (DupFiles) - (trialware/$) - Although this is not a freeware product, I haven't ever run into any limitations with it. The price is low ($15).  Although it is not a single executable, the program itself is "portable" and just over 1 MB in size.  It runs fast, and if large and complicated interfaces are not your thing, then you will likely find the simple interface and options easy to learn and use.  It will base the duplication matches on CRC32 and Checksum or MD5 matches. You can do advanced filtering by date, file size, and file attributes.  File-masking is supported, and you can set single or multiple search-path locations.  It is quite snappy and worth looking at.

Results: 3,247 duplicate files found. Scan time was 51 seconds.
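
For the curious, matching on CRC32 or MD5 values can be sketched in a few lines of Python with the standard `zlib` and `hashlib` modules. This is a rough illustration under my own assumptions, not DupFiles' implementation: the helper names `file_digests` and `group_by_md5` are hypothetical, and pairing the MD5 value with the file size as the grouping key is my choice.

```python
import hashlib
import os
import zlib
from collections import defaultdict


def file_digests(path, block_size=1 << 20):
    """Return (crc32, md5_hex) for a file, reading it in blocks."""
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            md5.update(block)
            crc = zlib.crc32(block, crc)  # running CRC over all blocks
    return crc & 0xFFFFFFFF, md5.hexdigest()


def group_by_md5(root):
    """Group files under `root` whose size and MD5 digest both match."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            _crc, digest = file_digests(path)
            groups[(os.path.getsize(path), digest)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

CRC32 is only 32 bits wide, so unrelated files can occasionally share a value; a CRC match is best treated as a hint and confirmed with MD5 or a direct comparison before anything gets deleted.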

Easy Cleaner - (freeware) - This isn't really just a duplicate file finder tool.  It actually contains a number of excellent system-cleaning tools rolled up into one.  As such it weighs in at over 5 MB in folder size.  Not as small as the others, but considering its design, not too shabby.  The duplicate file finding feature is very simple, but quite fast.  Unfortunately, you can't filter or sort the results, and there are no "advanced" options. However, if you are looking for a jack-of-all-trades type of cleaning tool, it is handy to have this feature rolled up along with the rest.

Results: 2,739 duplicate files found. Scan time was 42 seconds.

As you can see, different applications using different methods and different default settings will find different sets of duplicate files. Confused? Yeah, it's kinda like that.

Suffice it to say, it would probably be good practice to select one or two utilities, take the time to understand how they work, and be specific about the criteria you use when running your scans.

Then carefully run the first program and delete the unnecessary duplicate files.  Then run the other program.  Between two careful sweeps, you should be able to identify and clean most of your targets.

Actually, the real time spent won't be in the running and identifying of the files, but in being careful and deliberate in deciding which duplicate files you need to keep and which ones you can remove.

That takes a lot of time, and a good deal of experience as well.

Happy hunting.

--Claus
