From the olpcsf blog
Last year, right after the OLPC SF Community Summit 2011, I had the pleasure of attending Books in Browsers (BiB-II) at the Internet Archive. SJ and I made a plan on the fly to take the Pathagar OPDS Book Server and put it on a 4-watt SheevaPlug. The very cool and awesome duo of Mary Lou Jepsen and John Ryan helped us present the unit. We live-tested it: 150 people hit the box and it held up. Load tests revealed it could serve 500 simultaneous users!
So we had a self-contained book server that could run off a solar panel and arguably serve thousands of books in the middle of nowhere: a Wi-Fi bubble that serves up books to all within its reach. Heck, we even have a virtual machine, complete with Pathagar on it! Where do we get the books? The Internet Archive, of course! With its 3 million plus books, it’s a vast ocean to fish from. The bigger challenge is fishing well.
How do you curate content for your little Wi-Fi bubble? And once you do so, how do you pull it all together?
Raj Kumar (@rajbot) at the Internet Archive has the answer… They have been working on a script that pulls books and media directly from the Archive. The script needs to be fed a bookmark file, which you can create after signing up at the Internet Archive. After a few conversations and a few trials, Raj pointed me to the very cool fetch_IA_item script.
To get rolling:
- Sign up for an account on the Internet Archive.
- Log in.
- Look for stuff on the Archive’s pages, and when you find something interesting, bookmark it.
- Go to your “Patron Info” page, and grab the link for your bookmark file.
- Go to https://github.com/rajbot/fetch_ia_item and get the script, either as a .zip file or via git:
  git clone git://github.com/rajbot/fetch_ia_item.git
- In the fetch_IA_item folder, edit the fetch_IA_item.py file and replace the sample user id with your own archive.org id (mine is sverma in the example).
- Run it. You’ll need Python for this.
  python fetch_IA_item.py
- If you have books or other media in your bookmarks, media and its metadata will start coming in. You can interrupt it in the middle (CTRL-c) and pick up where you left off.
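The interrupt-and-resume behavior in the last step can be pictured with a small sketch. This is not the code in fetch_IA_item.py; it only illustrates one common way to make a batch download resumable: skip anything that already exists on disk, and write each file under a temporary name until it is complete. The function name `fetch_all`, the `download` callback, and the one-file-per-identifier layout are all assumptions made for illustration, and the sketch targets Python 3.

```python
import os


def fetch_all(identifiers, dest_dir, download):
    """Fetch each identifier into dest_dir, skipping ones already present.

    `download` is a hypothetical callback that takes an identifier and
    returns the item's content as bytes. Returns the list of identifiers
    fetched during this run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for identifier in identifiers:
        final_path = os.path.join(dest_dir, identifier)
        if os.path.exists(final_path):
            # Completed in an earlier run, so skip it.
            continue
        data = download(identifier)
        # Write under a temporary name, then rename, so an interrupted
        # run never leaves a half-written file under the final name.
        tmp_path = final_path + ".part"
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, final_path)
        fetched.append(identifier)
    return fetched
```

Run twice over the same list, the second call downloads only what the first one did not finish, which is the "pick up where you left off" effect.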
Works like a charm! Thank you Raj and the team at the Internet Archive. You guys rock! Next we need to work on getting the metadata into an appropriate JSON or CSV format for Pathagar, but that’s another project…
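As a starting point for that follow-up, here is a rough sketch of flattening per-item metadata into CSV text with Python 3's standard `csv` module. The column names are hypothetical placeholders, not the fields Pathagar actually expects, and the input is assumed to be a list of plain dicts already loaded from the downloaded metadata.

```python
import csv
import io

# Hypothetical column set; the fields Pathagar actually expects may differ.
COLUMNS = ("identifier", "title", "creator", "language")


def metadata_to_csv(records, columns=COLUMNS):
    """Render a list of metadata dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {}
        for column in columns:
            value = record.get(column, "")
            # A field may hold several values; join them into one cell.
            if isinstance(value, (list, tuple)):
                value = "; ".join(str(v) for v in value)
            row[column] = value
        writer.writerow(row)
    return buffer.getvalue()
```

Keys that are not in the column list are dropped and missing ones become empty cells, so uneven metadata across items still produces a rectangular table.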