Scraping Books to Build a Linked "Thingiverse" of Literary Data

July 19, 2012 Paul M. Davis

The portable and linkable nature of hypertext enables much of the big data explosion, as we produce an ever-growing amount of linked, machine-readable text every second. But what of the wealth of stories and information to be found in books? Is it doomed to become data exhaust, encased within print media or proprietary eBook formats? The startup Small Demons aims to address that. It is building a vast relational database of the people, places, things, and ideas in books, scraping publisher-provided eBooks and tapping user-generated human intelligence for additional insight and context.

Small Demons is currently in beta. In time, the startup plans to release APIs and hook into open databases to reveal a once-unimaginable tapestry of connections between the content of books and the wealth of knowledge on the web. In a feature at Shareable Magazine, I speak with Richard Nash, publishing futurist and VP of content and community at Small Demons, about the project and the role data plays in the future of books and storytelling. (Please note: some of the language in the interview may be considered NSFW.)
