Monday, August 07, 2006

AOL's Pandora Project...Sweet!

Unless Internet privacy just isn't your thing, you may have heard that the brilliant minds at AOL recently released some choice "Grade A" search data on the net.

AOL Search data release: via TechCrunch

So what did quick users get with this data download? How about:

20 million web queries from 650,000 AOL users,
Just over 2GB of uncompressed, tab-delimited text data,
All searches from those users over a three-month period this year,
Whether they clicked on a result,
What that result was, and
Where it appeared on the results page.

It includes:

personal names,
addresses,
social security numbers, and
search query data.
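
For the curious, here is a rough sketch of how a tab-delimited file like the one described above could be read in Python. The column names (AnonID, Query, QueryTime, ItemRank, ClickURL) and their order are my assumption about the layout, and the sample rows are made up for illustration; none of it is copied from the real file.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class SearchRecord(NamedTuple):
    """One search event. Field names are assumed, not taken from AOL's docs."""
    anon_id: str
    query: str
    query_time: str
    item_rank: Optional[int]   # position of the clicked result, if any
    click_url: Optional[str]   # clicked result, if any


def parse_records(text: str) -> List[SearchRecord]:
    """Parse tab-delimited search log text into SearchRecord tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        # Skip blank lines and the (assumed) header row.
        if not row or row[0] == "AnonID":
            continue
        # Rows without a click may have fewer columns; pad them out.
        row = row + [""] * (5 - len(row))
        anon_id, query, query_time, rank, url = row[:5]
        records.append(
            SearchRecord(anon_id, query, query_time,
                         int(rank) if rank else None,
                         url or None)
        )
    return records


# Made-up sample rows, for illustration only.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "500\tjobs san francisco\t2006-03-01 10:15:00\t\t\n"
    "500\tjohn doe\t2006-03-02 09:00:00\t1\thttp://mysite.com\n"
)

records = parse_records(SAMPLE)
for record in records:
    print(record)
```

A row with no click simply ends up with `None` for the rank and URL, which matches the "if they clicked on a result" item in the list.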

How much is that doggie in the window? (Arf-Arf!)

Free. Gratis. Free as in beer.

Sweet!

AOL quickly realized the error of its ways (or probably just didn't want its servers hammered) and pulled the file.

But that's OK. It's now mirrored all over the tubal Internet! (Note: link removed. You really don't need to see it that badly; if you do, find the link somewhere else.)

Did you know that AOL's search engine is just a pimped-up rebrand of Google's? So those are also (indirectly) Google's search results? Boy, I bet Google is really pissed off--what with the "DOJ can't have our data until they pry it from our dead fingers" stand they took recently.

Oh, what was that? Google's own cache still has a copy of the original page? Interesting.

Well, surely nothing bad can come of this. Right? I mean, after all, dear friend, the researchers carefully considered the privacy angle and had the foresight to strip each AOL user's name from the searches and swap in an "AnonID - an anonymous user ID number" instead.

Oh, what was that from over at Google Blogoscoped?

"What's really interesting is that queries were connected to a user ID... and there goes your privacy. Based on a sequence of searches it is often trivial to connect a person to a user ID. For example, user 500 may search for "link:mysite.com", and then user 500 may search for the name "John Doe." Now you can verify that mysite.com's webmaster is John Doe from San Francisco, and you have a good indicator that user 500 is indeed John Doe. Finally, you look at other queries from this user -- like, "jobs San Francisco" -- and you have strong indicators that John Doe is looking for a job behind his current boss's back."
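
To make that concrete, here is a minimal sketch of the grouping step the quote describes: collect every query under its user ID and read the list top to bottom. The rows are invented to mirror the "user 500" example, and the function name `queries_by_user` is just a hypothetical helper, not anything from the released data or its tools.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def queries_by_user(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (user_id, query) pairs into a per-user list, preserving order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for anon_id, query in rows:
        grouped[anon_id].append(query)
    return dict(grouped)


# Invented rows that mirror the "user 500" example in the quote.
rows = [
    ("500", "link:mysite.com"),
    ("501", "weather houston"),
    ("500", "John Doe"),
    ("500", "jobs San Francisco"),
]

profile = queries_by_user(rows)
for query in profile["500"]:
    print(query)
```

No single query gives the game away. It is the pile of them sitting under one ID that does the damage, which is why swapping the username for a number doesn't buy much.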

Anything else to add from anyone? What's that Michael?

Michael Arrington wraps it up by saying, "The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to." Sure, the data is of great value to researchers, but as Asdf comments in the forum, "Poor AOL users."

Oh me-oh-my....what a Pandora's box has been opened up here. And all this time we've been concerning ourselves about data going on the lam from missing laptops, USB keys and credit brokers....

What's poor Claus supposed to do?

Geesh! I'm a sociology and IT guy! Why, I'm going to download that tarball of data from one of the mirrors, run some queries on it and see if any interesting Texas/Houston nuggets can be found stuck in that steaming hot cow-chip! That's what I'm going to do!

I'll share anything interesting.

Then again, maybe it's best left in the back pasture attracting flies: AOL User 927 Illuminated via the Consumerist.

In the meantime, you Google users out there should consider checking out imilli.com's article on Anonymizing Google's cookie. Follow the steps and create a bookmark with the listed JavaScript code. Works like a charm (so I hope)!

--Claus
