Open Library API: Cataloging 13 Million Books

Raymond Yee
May. 16 2008, 12:46AM EDT

The Open Library is a project of the non-profit Internet Archive, whose long-term goal is to present "one web page for every book ever published." A recent release of the Open Library brought the total number of book records to over 13.4 million, including over 234,000 records with full-text for the book. A new public Open Library API was also announced to give read access to the Open Library (and note that the addition of the Open Library API profile here means there are now 14 different book-oriented APIs on ProgrammableWeb).

Consider one of my favorite books on the Python programming language: David Beazley's Python Essential Reference, 3rd Edition. You can find the Open Library record for the book at

http://www.openlibrary.org/b/OL7668717M

On that page, you will see metadata about the book (e.g., title, author, language, the ISBN, etc.), as well as links to booksellers and libraries that might be able to sell you a copy of the book. Something that might be surprising is that you (and anyone else) can edit the record and see the history of revisions to the record. Think of Open Library as a big book-oriented wiki.

Using this book as an example, let's look at how to apply the three parts of the Open Library API:

  • get (to get an object)
  • things (to query for objects)
  • versions (to look for versions of objects)

    Note that a good place to try out the API is the Open Library API Sandbox, which allows you to issue a query to the API and see the response.

    First, you can issue the following get query:

    curl http://www.openlibrary.org/api/get?key=/b/OL7668717M

    to get a JSON object that holds metadata about the book. In fact, you can get the JSON response to show up nicely in the browser by attaching the following parameters to the URL

    &prettyprint=true&text=true

    to pretty-print the response and send it as plain text.
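
As a rough sketch, the same get request can be assembled in Python. The base URL and the key, prettyprint, and text parameters come from the example above; the function name build_get_url is made up for illustration, and the sketch only builds the URL rather than contacting the service.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example in the article.
API_ROOT = "http://www.openlibrary.org/api"


def build_get_url(key, pretty=False):
    """Return the URL for a get request on one Open Library object.

    build_get_url is a hypothetical helper, not part of any Open Library client.
    """
    params = {"key": key}
    if pretty:
        # The two optional parameters described in the article.
        params["prettyprint"] = "true"
        params["text"] = "true"
    # safe="/" leaves the slashes in the key unescaped, as in the curl example.
    return API_ROOT + "/get?" + urlencode(params, safe="/")


print(build_get_url("/b/OL7668717M", pretty=True))
```

The resulting URL can be pasted into a browser or handed to curl or urllib.request.urlopen.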

    Second, let's figure out how to get the Open Library identifier for this book (which you needed for the get query) using a things query. If you have the book's ISBN-10 (here, 0672328623), you can issue a things query whose value is

    {"type":"\/type\/edition", "isbn_10":"0672328623"}

    to generate the corresponding response.
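
Here is a minimal Python sketch of how such a query value might be serialized and URL-encoded. Two things are assumptions, not statements from this article: that the things endpoint sits next to get at /api/things, and that it takes the JSON in a parameter named query. Check the API documentation or the Sandbox before relying on either. The helper name build_things_url is hypothetical.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: endpoint path and the "query" parameter name are not given in
# the article; they are guesses modeled on the get URL.
THINGS_URL = "http://www.openlibrary.org/api/things"


def build_things_url(query):
    """Serialize a things query dict to JSON and URL-encode it (hypothetical helper)."""
    return THINGS_URL + "?" + urlencode({"query": json.dumps(query)})


# The ISBN-10 lookup from the article. json.dumps writes "/" where the
# article shows "\/"; both spellings denote the same JSON string.
isbn_query = {"type": "/type/edition", "isbn_10": "0672328623"}

print(build_things_url(isbn_query))
```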

    You can also use a things query to search for books by title. Here you can put a wildcard (*) at the end of the value you are matching, provided you mark the field name with a trailing ~:

    {"type":"\/type\/edition",
     "title~":"Python Essential Reference*"}

    to generate a response containing the corresponding Open Library identifier.
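
The wildcard convention can be wrapped in a small helper. This is only a sketch: title_prefix_query is a made-up name, and it encodes nothing beyond what the example above shows, namely a ~ appended to the field name and a * appended to the value.

```python
import json


def title_prefix_query(prefix):
    """Build a things query for editions whose title begins with `prefix`.

    Hypothetical helper: it appends "~" to the field name and "*" to the
    value, mirroring the title query shown in the article.
    """
    return {"type": "/type/edition", "title~": prefix + "*"}


query = title_prefix_query("Python Essential Reference")

# Print the JSON value that would be sent as the things query.
print(json.dumps(query))
```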

    Third, we can get at the different versions of a record in Open Library by performing a versions query whose value is

    {"key": "\/b\/OL7668717M",
     "sort":"-created", "limit":10}

    to generate a JSON object holding version data for the record.
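
A versions query can be sketched the same way. As before, the /api/versions path and the query parameter name are assumptions modeled on the get URL, and the helper names are hypothetical. The key, sort, and limit fields are the ones shown above; the article does not say what the leading minus in "-created" means, so the sketch passes the sort value through unchanged.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: endpoint path and "query" parameter name are not stated in the article.
VERSIONS_URL = "http://www.openlibrary.org/api/versions"


def versions_query(key, sort="-created", limit=10):
    """Return the query dict for a record's versions (hypothetical helper)."""
    return {"key": key, "sort": sort, "limit": limit}


def build_versions_url(key, sort="-created", limit=10):
    """URL-encode the versions query as a single JSON parameter."""
    query = versions_query(key, sort=sort, limit=limit)
    return VERSIONS_URL + "?" + urlencode({"query": json.dumps(query)})


print(build_versions_url("/b/OL7668717M"))
```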

    There's plenty more to explore in the documentation of the API, including the list of types supported in the API.

    It will be interesting to see whether the API gets a following primarily in the library community or in the larger world. The announcement of the API on the code4lib list (a "forum for discussion of computer programming in the area of libraries and information science") immediately prompted the question of why the API was not implemented using SRU (a protocol used, for example, at the Library of Congress). The range of responses in the thread (from SRU is "incomprehensible to non-librarians" to "non librarian students look at the [SRU] document and start working with it straight away.") should be familiar to anyone who has struggled with the question of whether to adopt an existing standard or protocol or to create one's own. How could concepts that are crystal clear to one group appear so obscure to another group? Who exactly is the audience for a given API? How do you accommodate multiple audiences in the design of an API?

