September 13, 2005

On Data Mining

Data Mining is one of those technologies that lots of people talk about, few people understand, and fewer still have actually done successfully. Now that this 'Able Danger' mess has revealed itself, people all over are piping up about whether or not data mining is dangerous.

Like lots of things related to security and technology, you really cannot comment definitively unless you have an open discussion. It's highly unlikely that we're going to find out exactly how these probes work. However, we do have a good way of showing how useful (or not) an arbitrary data mining technique might be. How? Use Google.

Google is the closest most of us lay people are ever going to get to sophisticated text searches. But it doesn't take long for anyone to realize Google's shortcomings. Google's original success was based upon the '6 degrees' idea: results rise to the top depending upon how many links point to them. But the right connections between those results aren't always drawn. Here's an example.

Google 'Cabbage Patch' and 'Smurf'. Now there are a lot of possible connections between those two that have nothing to do with the very specific connection I have in mind.

Seth Grimes notes:

While decision-making is still largely based on well-established methods for exploring historical data, we're starting to hear more about successes gained with predictive analytics that turn the gaze on the future. These methods offer data classification, clustering and forecasting to help organizations apply knowledge to operational decision-making and planning. Significant barriers remain. A big problem is that the algorithms are generally abstruse, designed by and for statisticians. The results often defy lay explanation.

By the way, I was talking about dances. The contexts available for widely known data (there are millions of Americans who have danced the Cabbage Patch & the Smurf) are easy to come by. Nobody is trying to hide those facts, and the right connection still isn't the obvious one. Drawing the right connections between individuals who are trying to hide their information is even more difficult.
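To make the Cabbage Patch and Smurf point concrete, here is a rough Python sketch of a toy search that ranks matches purely by how many links point at them. This is not how Google actually works. The pages, their text, and the link counts are all invented for illustration, and `Page` and `search` are hypothetical names. The only point is that a popularity ranking can bury the one connection you actually care about.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    text: str
    inbound_links: int  # invented popularity figure, not real data


# A made-up corpus: three contexts link the two terms, one page is a near miss.
PAGES = [
    Page("80s toy crazes", "cabbage patch dolls and smurf figurines sold out", 120),
    Page("Saturday morning cartoons", "smurf and cabbage patch kids specials aired", 85),
    Page("Old-school dance moves", "do the cabbage patch, then the smurf", 12),
    Page("Garden tips", "plant a cabbage patch in spring", 40),
]


def search(pages, terms):
    """Return pages containing every term, most-linked first."""
    hits = [p for p in pages if all(t in p.text.lower() for t in terms)]
    return sorted(hits, key=lambda p: p.inbound_links, reverse=True)


results = search(PAGES, ["cabbage patch", "smurf"])
for rank, page in enumerate(results, start=1):
    print(rank, page.title, page.inbound_links)
```

In this toy corpus the dance page matches both terms but lands last, behind the toys and the cartoons, because fewer pages link to it. The ranking did its job. It just answered a different question than the one I had in mind.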

I wouldn't be so confident in data mining models I haven't lived with for years. Even then, it often takes a leap of intuition. It's all about the interpretation, and sometimes you just know.

Posted by mbowen at September 13, 2005 06:58 PM
