January 31, 2007

For you Web standards geeks...

There's an interesting discussion regarding XML and XHTML error handling over on Tim Bray's blog, wherein he points to Anne van Kesteren's suggestion that the next iteration of the XML standard allow for a more nuanced approach to malformedness. (Be sure to read the comments.)

The current XML standard requires parsers to throw up their hands and quit upon encountering illegal characters, unclosed tags, and other well-formedness errors. This is good when you're talking about bank transactions, but applying the same rule to run-of-the-mill HTML content, as attempted by the XHTML standard, makes the format too brittle for most content. This is why browsers still have tag soup parsers.
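
To make the contrast concrete, here is a minimal sketch using two parsers from Python's standard library. The sample markup and the helper names (`strict_parse_ok`, `lenient_text`, `TextCollector`) are made up for illustration, and `html.parser` is only a stand-in for a forgiving parser, not a browser's actual tag soup algorithm.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: the <b> element is never closed.
SNIPPET = "<p>An <b>unclosed bold tag</p>"


def strict_parse_ok(markup):
    """Return True if an XML parser accepts the markup, False if it gives up."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A well-formedness error is fatal: no partial tree is returned.
        return False


class TextCollector(HTMLParser):
    """Collects text content and keeps going past mismatched tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    """Return whatever text a forgiving HTML parser can recover."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(strict_parse_ok(SNIPPET))  # the XML parser rejects the document
    print(lenient_text(SNIPPET))     # the HTML parser still recovers the text
```

The strict parser refuses the whole document because of one missing end tag, while the forgiving one still hands back the readable text, which is roughly the trade-off the discussion is about.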


January 21, 2007

What Wikipedia could offer the Semantic Web

Tim Bray started an interesting discussion on the nature of linking, and particularly on linking to the Wikipedia page for a given topic. This is closely related to some of my recent ponderings. Martin Hepp et al. presented a paper last year on using wikis for ontology management, including an examination of the stability of Wikipedia URIs. Here's the PDF, and the abstract:

In this paper, we (1) show that standard Wiki technology can be easily used as an ontology development environment without modification, reducing entry barriers for the participation of users in the creation and maintenance of lightweight ontologies, (2) present a quantitative analysis of current Wikipedia entries and their properties, (3) prove that the URIs of Wikipedia entries are surprisingly reliable identifiers for ontology concepts, and (4) demonstrate how the entries available in Wikipedia can be used as ontology elements.

Hepp also has an excellent article in the Jan/Feb IEEE Internet Computing ("Possible Ontologies"). He points out several problems with the typical approach to ontologies, including the delay between recognition of an important entity and its insertion into the ontology, and the fact that ontologies are often presented in a way that offers scant detail about an entity. These problems and others may be addressed using wikis.

In Wikipedia, you have a potential community taxonomy of many different types of entities. Multiple articles on a topic are usually combined quickly into one, which gives the topic a canonical URI. There are also uniform ways of disambiguating names, mapping multiple alternate names to one, and of course, there's a standard way to categorize articles. All these traits of Wikipedia, combined with the increasing number of inbound links from average bloggers, make for a Web that is increasingly semantic -- ironically without the use of any of the Semantic Web technologies.
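
As a rough sketch of how alternate names could collapse onto one canonical identifier, the snippet below maps a few names to a single Wikipedia-style URI. The redirect table is hypothetical illustration data, not pulled from Wikipedia, and the helper name `canonical_uri` and the `https://en.wikipedia.org/wiki/` base are assumptions for the example. The title handling (spaces become underscores, first letter capitalized) follows Wikipedia's usual URL convention.

```python
from urllib.parse import quote

# Assumed base URI for English Wikipedia articles.
BASE = "https://en.wikipedia.org/wiki/"

# Hypothetical redirect table: alternate name -> canonical article title.
REDIRECTS = {
    "WWW": "World Wide Web",
    "The Web": "World Wide Web",
}


def canonical_uri(name, redirects=REDIRECTS):
    """Resolve a name to one canonical, Wikipedia-style URI."""
    # Follow a redirect if one is known, otherwise use the name as given.
    title = redirects.get(name, name).strip()
    # Capitalize the first letter and swap spaces for underscores.
    title = title[:1].upper() + title[1:]
    return BASE + quote(title.replace(" ", "_"))


if __name__ == "__main__":
    for name in ("WWW", "The Web", "World Wide Web"):
        print(name, "->", canonical_uri(name))
```

All three names end up at the same URI, which is what lets an ordinary inbound link double as a pointer to a shared concept.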

Whether or not there's gold in them thar Wikipedia links remains to be seen.