One of the processes we have in the shop is to archive a series of network graphs for various URL locations generated by a specialized MRTG (Multi Router Traffic Grapher) deployment.
Once the web-page screenshots of the graphs are captured, they are combined into a single Word document for that day, which is then archived for historical reference and distribution to management.
The result is that an analyst spends about an hour each day clicking through a large Excel table containing each of the URL links, grabbing a screenshot, pasting it into the Word document, then moving on to the next URL.
This has been going on for some time and, unfortunately, the madness of my other projects has kept me from turning my attention to making the process more efficient.
Last week was a bit lighter at my workbench so I could consider the issue for a few minutes.
It took me about five minutes to locate the free command-line tool IECapt - An Internet Explorer Web Page Rendering Capture Utility coded by Björn Höhrmann.
It’s just 102 kB unpacked, and though it requires gdiplus.dll, I had no problem finding that file already present on our XP Pro systems (and about fifteen more copies in various portable utility program folders on my own system).
My solution for this daily task was very simple.
I created a folder "C:\graphdumps" and copied both IECapt.exe and (for good measure) a gdiplus.dll I had on my system into it.
I then created a batch file with a line for each of the separate MRTG page URLs we need to access. In my case that was roughly 50 URLs, each on its own line.
As an example, each line in the batch file looks something like the following (all on a single line):
IECapt --url=http://www.uhcougars.com/ --out=GoCougs.jpg --min-width=800 --delay=5 --silent
I also chose a simple output filename for each URL line that clearly indicated the logical location the URL represented.
For now, I’m outputting in JPEG format for maximum compatibility with the folks who receive the final file; however, IECapt supports a number of output formats such as PNG, BMP, JPEG, EMF, and probably a few others not listed in the help. I like the idea of using PNG instead and may do some comparisons between the two formats moving forward.
I did have one “gotcha” I had to overcome first.
Every time I ran the batch file, I would get an output error saying IECapt was unable to generate the thumbnail image.
I checked around and found a forum post, "IECapt does not work when --url contains a query string," which seemed to confirm that the issue was the query strings in the URLs in my batch file. I didn’t really like the suggested options (recode the program or use a URL-shortening service). On a hunch, I wondered what would happen if I wrapped the URL parameter in double quotes.
It worked perfectly. Each line in my batch file was changed to add the double quotes accordingly, so it now looked more like the following (again, all on a single line):
IECapt --url="http://weather.chron.com/radar/station.asp?ID=HGX19&NOHEADER=1#MAPZOOM" --out=radar.jpg --min-width=800 --delay=5 --silent
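Since the URL-to-filename mapping already lives in a table, the batch file itself can be generated rather than hand-edited. Here is a minimal sketch in Python (the `pages` mapping and filenames are made-up examples, not our real list); it double-quotes every URL automatically so query strings survive:

```python
# Generate a batch file of IECapt commands from a URL-to-filename mapping.
# The pages dict below is illustrative only, not our real MRTG list.
pages = {
    "radar": "http://weather.chron.com/radar/station.asp?ID=HGX19&NOHEADER=1#MAPZOOM",
    "GoCougs": "http://www.uhcougars.com/",
}

def make_capture_line(name, url, fmt="jpg"):
    # Double-quote the URL so query strings survive the command line.
    return ('IECapt --url="%s" --out=%s.%s --min-width=800 --delay=5 --silent'
            % (url, name, fmt))

lines = [make_capture_line(name, url) for name, url in sorted(pages.items())]
with open("capture.bat", "w") as f:
    f.write("\r\n".join(lines) + "\r\n")
```

Re-running the script whenever the URL table changes keeps the batch file in sync without retyping fifty lines.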
My test run of the batch file took just under 1.5 minutes to pull and save all the pages. I then opened a blank Word document, selected all the output JPG files that had just been generated in my folder, and dragged and dropped them into the Word doc. I then saved it with the daily file name and was done. From about 60 minutes of dreary click-and-save work to under 2 minutes of mostly-automated grabbing and pasting. Sweet.
Now if I could just find a way to automatically import these images into a templated Word/RTF document (with the images embedded, not linked) I’d be set. I’ve looked at "mail-merging images" into Word but I’m not sold on the process yet. There should be an easier way to just pipe the output into an RTF "word pad" document, but I haven’t figured that out yet. This alone is a big improvement, so for now a little drag/drop into Word isn’t a deal-breaker. Thoughts/suggestions?
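On the "pipe it into an RTF document" idea: RTF stores pictures inline as hex data inside a \pict group, so a plain script can embed the JPGs directly (no links) without any Word automation. A rough sketch in Python; the folder path in the usage comment is hypothetical, and a production version would likely want \picwgoal/\pichgoal sizing controls added to the \pict group:

```python
import binascii

def rtf_embed_jpeg(jpeg_bytes):
    # RTF embeds a picture as hex data inside a \pict group;
    # \jpegblip declares that the data is a JPEG stream.
    hexdata = binascii.hexlify(jpeg_bytes).decode("ascii")
    return r"{\pict\jpegblip " + hexdata + "}"

def build_rtf(jpeg_files):
    # Bare-bones RTF document: one embedded picture per paragraph.
    parts = [r"{\rtf1\ansi"]
    for path in jpeg_files:
        with open(path, "rb") as f:
            parts.append(rtf_embed_jpeg(f.read()))
        parts.append(r"\par")
    parts.append("}")
    return "\n".join(parts)

# Hypothetical usage: sweep the day's captures into one WordPad-readable file.
# import glob
# with open("daily.rtf", "w") as out:
#     out.write(build_rtf(sorted(glob.glob("C:/graphdumps/*.jpg"))))
```

The resulting file opens in WordPad or Word with the images carried inside the document itself, which is the embedded-not-linked behavior we want for archiving.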
I also considered the robust freeware tool SiteShoter by Nir Sofer. It supports both a GUI and a CLI mode and is pretty sophisticated; however, for this application, IECapt worked perfectly and is dead-trim. SiteShoter can read the URLs it acts on from a text file, so it is a different technique that could be better in some circumstances.
How to automatically capture images of a series of web sites and create thumbnails of the resulting image files. - Post by Paul Bradley that put me onto IECapt and how easy the CLI is to use.
Remembering to actually stop what we are in the middle of doing (especially annoying in the middle of a meeting) when the established URL-capture hour comes around is quite challenging as well. More than a few days the designated team member has forgotten and had to run the captures a few hours later. Because this process uses a batch file, one can easily set it to execute automatically as a scheduled task when the capture hour occurs. Then (as long as the system is running) we can come back later that afternoon and assemble the archive document from the JPGs that were generated automatically. Super-sweet.
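The scheduling step can itself be scripted: Windows ships schtasks.exe (present on XP Pro), and a small helper like the sketch below assembles the registration command. The task name, batch-file path, and time here are hypothetical placeholders; an admin would run the resulting command once on the capture box:

```python
def schtasks_args(task_name, bat_path, start_time):
    # Argument list for Windows' schtasks.exe: /sc daily runs the task
    # every day at start_time (HH:MM:SS). Pass the list to subprocess.run()
    # on the Windows box, or print it and paste into a cmd window.
    return ["schtasks", "/create",
            "/tn", task_name,
            "/tr", bat_path,
            "/sc", "daily",
            "/st", start_time]

args = schtasks_args("MRTGcapture", r"C:\graphdumps\capture.bat", "14:00:00")
print(" ".join(args))
```

Building the command as a list and handing it to subprocess also sidesteps the same shell-quoting headaches the double-quoted URLs solved in the batch file.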