The challenge of curation

From the olpcsf blog

Last year, right after the OLPC SF Community Summit 2011, I had the pleasure of attending Books in Browsers (BiB-II) at the Internet Archive. SJ and I made a plan on the fly to take the Pathagar OPDS Book Server and put it on a 4-watt SheevaPlug. The very cool and awesome duo of Mary Lou Jepsen and John Ryan helped us present the unit. We live-tested it: 150 people hit the box and it held up. Load tests revealed it could serve 500 simultaneous users!

So we had a self-contained book server that could run off a solar panel and arguably serve thousands of books in the middle of nowhere – a Wi-Fi bubble that serves up books to all within its reach. Heck, we even have a virtual machine, complete with Pathagar on it! Where do we get the books? The Internet Archive, of course! With its 3 million plus books, it's a vast ocean to fish from. The bigger challenge is fishing well.

How do you curate content for your little Wi-Fi bubble? And once you do so, how do you pull it all together?

Raj Kumar (@rajbot) at the Internet Archive has the answer… They have been working on a script that pulls the books/media directly from the Archive. The script needs to be fed a bookmark file, which you can create after signing up at the Internet Archive. After a few conversations and a few trials, Raj pointed me to the very cool fetch_IA_item script.

To get rolling:

  1. Sign up for an account on the Internet Archive.
  2. Log in.
  3. Look for stuff on the Archive’s pages, and when you find something interesting, bookmark it.
  4. Go to your “Patron Info” page, and grab the link for your bookmark file.
  5. Go to and get the script, either as a .zip file or via git:
       git clone git://
  6. In the fetch_IA_item folder, edit the file & replace the sample user id with your own id (mine is sverma in the example).
  7. Run it. You’ll need python for this.
  8. If you have books or other media in your bookmarks, media and its metadata will start coming in. You can interrupt it in the middle (CTRL-c) and pick up where you left off.
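
The interrupt-and-resume behaviour in step 8 is worth a small illustration. The following is only a sketch of how such resumability might work, not the actual fetch_IA_item code: the function name `fetch_all` and the `download` callable are hypothetical, and it assumes one file per item.

```python
import os


def fetch_all(item_ids, dest_dir, download):
    """Fetch each item once, skipping anything already saved.

    `download` is a hypothetical callable (an assumption for this sketch):
    it takes an item id and returns the item's bytes.
    Returns the list of ids fetched during this run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for item_id in item_ids:
        target = os.path.join(dest_dir, item_id)
        if os.path.exists(target):
            continue  # saved on an earlier run, so skip it
        data = download(item_id)
        # Write to a ".part" file first, then rename, so an interrupted
        # run never leaves a half-written file that looks finished.
        partial = target + ".part"
        with open(partial, "wb") as fh:
            fh.write(data)
        os.replace(partial, target)
        fetched.append(item_id)
    return fetched
```

Running it a second time over the same list only fetches whatever is still missing, which is the "pick up where you left off" behaviour described above.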

Works like a charm! Thank you Raj and the team at the Internet Archive. You guys rock! Next we need to work on getting the metadata into an appropriate JSON or CSV format for Pathagar, but that’s another project…
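
As a starting point for that next project, here is a rough sketch of flattening metadata records into CSV. The column names in `FIELDS` and the function `metadata_to_csv` are assumptions made up for illustration; the layout Pathagar actually expects may well differ.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
FIELDS = ["identifier", "title", "creator", "language"]


def metadata_to_csv(records, fields=FIELDS):
    """Flatten a list of metadata dicts into CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = {}
        for name in fields:
            value = rec.get(name, "")  # missing fields become empty cells
            if isinstance(value, (list, tuple)):
                # Multi-valued fields (e.g. several authors) are joined.
                value = "; ".join(str(v) for v in value)
            row[name] = value
        writer.writerow(row)
    return out.getvalue()
```

Keys that are not listed in `fields` are simply dropped, so the same function can be pointed at richer records without changes.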
