Metadata Celebrity Cage Match!
It looks like Mike Daconta is going to take on Cory Doctorow's venerable Metacrap rant, point by point. The series begins clumsily by questioning Doctorow's qualifications for having an opinion about metadata.
"Of course, publicity is not any indicator of truth and I surmise that most links to his article are merely people commiserating the fact that metadata is hard to do right. Fortunately, there are those of us that still believe that "hard" does not equate to "wrong" and that this is a temporary state due to lack of expertise."
I'm curious to see where the critique goes when it reaches Doctorow's point #3, wherein he agreed (five years ago) that, as difficult as it may be, improving metadata is a worthy goal.
So, the first "straw man" (I've never understood the usage here, but it makes for a catchy title) is that People Lie. Daconta calls this claim "ridiculous," because it doesn't apply to the corporate intranet. Well, lying is admittedly less of a problem on the intranet. However, liars don't have to gain access to your metadata to affect you. By stuffing the keywords field, spammers rendered it useless for everyone, including all the honest taggers out there.
Daconta also makes the counterpoint that people can lie about data (in addition to metadata), and yet Doctorow doesn't suggest that the Web itself should be distrusted. In discussions like this, I think it's very important to carefully define the line between what's metadata and what's data. After all, metadata is data too, and data may be metadata. In the context of Doctorow's essay, I think we can agree that data is the visible content on Web pages, while metadata is the structured and often hidden data that describes it. I can't speak for Doctorow, but I think he would recommend that you take the visible content on the Web with a grain of salt too. That's just not the point of the essay.
I think Daconta is absolutely right about the importance of knowing the pedigree of your metadata, but that simply reinforces Doctorow's basic point. To be useful, all of your metadata must be reliable. If some (in fact a good chunk) of that metadata is unreliable, then the metadata is not useful.
In fact, applications that have made successful use of metadata, such as Google's search engine and Yahoo!'s directory, succeeded by solving the pedigree problem. For example, a quick way to get your site banned by Google.com is to stuff your pages with hidden text that doesn't relate to what you're showing your human visitors.
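To make the pedigree idea concrete, here is a rough sketch of one way a crawler might sanity-check self-declared keywords against what human visitors actually see. This is emphatically not how Google does it; the function names and the word-overlap scoring are assumptions invented purely for illustration.

```python
import re


def words(text):
    """Return the set of lowercase alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_support(declared_keywords, visible_text):
    """Fraction of declared keywords whose words all appear in the visible text.

    Hypothetical heuristic: a score near 1.0 means the hidden metadata agrees
    with the visible page; a score near 0.0 suggests keyword stuffing.
    """
    visible = words(visible_text)
    declared = [k.strip() for k in declared_keywords if k.strip()]
    if not declared:
        return 1.0  # nothing was claimed, so there is nothing to distrust
    supported = [k for k in declared if words(k) <= visible]
    return len(supported) / len(declared)


honest = keyword_support(
    ["metadata", "search engine"],
    "Notes on metadata and how a search engine uses it.",
)
stuffed = keyword_support(
    ["metadata", "free ringtones", "cheap flights", "casino"],
    "Notes on metadata.",
)
```

A consumer of the metadata could then ignore or down-weight keywords from pages whose score falls below some threshold, which amounts to trusting the metadata only as far as the visible content backs it up.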
Daconta's also right in saying that you'll get better mileage on the intranet -- at least in regard to the lying problem. But I can tell you first-hand stories about how real the remaining problems are in that sheltered intranet environment. This should be an interesting series.
(Updated to change some really awkward phrasing. I'll never be a poet.)
Posted November 26, 2006 10:59 PM