Using Wikipedia as a Web Database

Ever want to programmatically query Wikipedia? It's a tempting dataset, with over 1.6 million articles, but it has no official API yet. While there have been rumors that the Wikipedia team will supply an API at some point, for now you can use an API we just listed here: the DBpedia API. It's a project headed by a team of German university researchers, and as they describe it, "DBpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data." More from their Introduction:

Wikipedia is by far the largest publicly available encyclopedia on the Web. Wikipedia editions are available in over 100 languages, with the English edition accounting for more than 1.6 million articles. Wikipedia has the problem that its search capabilities are limited to full-text search, which allows only very limited access to this valuable knowledge base.

Semantic Web technologies enable expressive queries against structured information on the Web. The Semantic Web has the problem that there is not much RDF data online yet and that up-to-date terms and ontologies are missing for many application domains.

The DBpedia.org project approaches both problems by extracting structured information from Wikipedia and by making this information available on the Semantic Web. DBpedia.org allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to DBpedia data.

Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates and links to external Web pages. This structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content.
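To make the idea of extracting infobox data a little more concrete, here is a minimal Python sketch that turns infobox-style key/value pairs into (subject, predicate, object) triples. The `infobox_to_triples` helper, the sample fields, and the choice of namespace URIs are assumptions for illustration only; DBpedia's real extraction framework handles templates, datatypes, and links far more carefully.

```python
from typing import Dict, List, Tuple

# Assumed namespaces for illustration; DBpedia's actual vocabularies differ in detail.
RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"

Triple = Tuple[str, str, str]


def infobox_to_triples(article_title: str, infobox: Dict[str, str]) -> List[Triple]:
    """Convert infobox key/value pairs into RDF-style triples (hypothetical helper).

    The article becomes the subject, each infobox key becomes a predicate,
    and each non-empty value becomes a plain literal object.
    """
    subject = RESOURCE_NS + article_title.replace(" ", "_")
    triples: List[Triple] = []
    for key, value in infobox.items():
        value = value.strip()
        if not value:
            continue  # skip empty infobox fields
        predicate = PROPERTY_NS + key.strip().replace(" ", "_")
        triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    # Hypothetical infobox for an illustrative article.
    sample = {"founded": "2001", "motto": ""}
    for triple in infobox_to_triples("Example Article", sample):
        print(triple)
```

Treating every value as a literal string keeps the sketch short; a real extractor would also recognize wiki links and turn them into resource URIs.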

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. We use the SPARQL query language to query this data.
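As a rough sketch of what querying the data looks like, the Python below builds an HTTP GET request for a SPARQL endpoint and flattens the standard SPARQL JSON results format into plain dictionaries. The endpoint URL, its `query`/`format` parameters, and the `dbo:`/`dbr:` namespaces reflect today's public DBpedia service and are assumptions here (the 2007 dataset used different property URIs); `build_query_url`, `parse_bindings`, and `run_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: DBpedia's public SPARQL endpoint, which accepts "query" and "format" parameters.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: current DBpedia ontology/resource namespaces.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?place WHERE {
  dbr:Johnny_Cash dbo:birthPlace ?place .
}
"""


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL asking for JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"


def parse_bindings(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str) -> List[Dict[str, str]]:
    """Send the query to the endpoint and return flattened rows (needs network access)."""
    with urllib.request.urlopen(build_query_url(query), timeout=30) as resp:
        return parse_bindings(json.load(resp))
```

Calling `run_query(QUERY)` should return a list of rows such as `{"place": "<resource URI>"}`; what comes back depends entirely on the live dataset. Keeping URL building and result parsing separate from the network call makes both easy to test offline.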

The DBpedia dataset currently consists of around 91 million RDF triples, which have been extracted from the English, German, French, Spanish, Italian, Portuguese, Polish, Swedish, Dutch, Japanese and Chinese versions of Wikipedia. The DBpedia dataset describes 1,600,000 concepts, including at least 58,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It contains 557,000 links to images, 1,300,000 links to relevant external web pages, 207,000 Wikipedia categories and 75,000 YAGO categories.

The project also has some interesting utilities, like an integrated online debugger and a tool called the Relationship Finder that lets you explore the relationship between any two things in their dataset. Their example shows the N degrees of separation between Kevin Bacon and Johnny Cash.
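The Relationship Finder's internals aren't described here, but the underlying idea can be illustrated with a breadth-first search over a set of triples, which finds the shortest chain of links between two resources. This is a hypothetical sketch, not the tool's actual algorithm; `find_path` and the toy triples are made up for illustration, and links are treated as undirected.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def find_path(triples: List[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Return the shortest chain linking start to goal, or None if unconnected.

    The result alternates nodes and predicates: [node, predicate, node, ...].
    Edges are followed in both directions, since a relationship reads either way.
    """
    # Adjacency list: node -> list of (predicate, neighbour)
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((p, s))

    queue = deque([[start]])
    visited: Set[str] = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for predicate, neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [predicate, neighbour])
    return None


if __name__ == "__main__":
    # Toy, hypothetical data: not taken from DBpedia.
    toy = [
        ("Actor_A", "starredIn", "Film_X"),
        ("Actor_B", "starredIn", "Film_X"),
        ("Actor_B", "marriedTo", "Singer_C"),
    ]
    path = find_path(toy, "Actor_A", "Singer_C")
    print(path)
    if path:
        print("degrees of separation:", (len(path) - 1) // 2)
```

Because breadth-first search explores one hop at a time, the first path that reaches the goal is guaranteed to use the fewest links.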



It will be interesting to see what sorts of applications get built on this API and if we start to see more public SPARQL/RDF APIs appearing.

John Musser

Comments

Danny

Hi John, great to see this on ProgrammableWeb!

See also: the LinkingOpenData project (http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenD...), which includes dbpedia. It's 20 or so independent datasets, all interlinked (scroll down for a diagram). "Collectively, the datasets consist of over one billion RDF triples, which are interlinked by 250,000 RDF links (July 2007)."

Hi Danny, good to hear from you and thanks for the pointer to the LinkingOpenData project. Very interesting to see the scale of the linked datasets. And yes, a handy diagram to boot! Very good resource to know about.

jimmy

dbpedia: lovely stuff. Real lovely. 10 points to them.

[...] Programmable Web has announced the availability of a new API for automating queries to Wikipedia. That may not sound very exciting, but stay with me - it gets better. [...]

I would like to introduce an alternative: querying Wikipedia with XQuery.

Here is a demo of WikiXMLDB: a Wikipedia dump parsed into XML and loaded into the Sedna XML database.

http://wikixmldb.dyndns.org/

Enjoy!

Wow, very interesting! I'll probably be able to use this in one of my next projects! Cheers

[...] make mashups out of it. (See Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird ; Using Wikipedia as a database). It should be easier to make those mashups by just pulling RDF (maybe using RDFa or GRDDL) or [...]
