Project Gutenberg

Project Gutenberg is a project that gathers public domain books in different languages; its website is http://www.gutenberg.org. The purpose of this project is to build a sustainable way to create a ZIM file that provides the Project Gutenberg ebooks in a manner similar to gutenberg.org.

Goals

 * A script (Python/Perl/Node.js) able to quickly create a ZIM file with all books in all languages.
 * The data should be scraped from www.gutenberg.org.
 * The texts should be available in HTML and EPUB.
 * The ZIM should provide a simple filtering/search solution to find content (by author, language, title, etc.).

One way to achieve it

 * 1) Retrieve the list of books published by Project Gutenberg in XML/RDF format
 * 2) Parse the XML/RDF and store the data in a structured form (in memory or in a local DB)
 * 3) Download the necessary HTML+EPUB data from gutenberg.org, based on the XML/RDF catalog, into a target directory
 * 4) Create the necessary templates for the index web pages (for the search/filter feature, a client-side JavaScript solution should be tried)
 * 5) Fill the HTML templates with the data from the XML/RDF and write the index pages in a target directory
 * 6) Run zimwriterfs to create the corresponding ZIM file from your target directory
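
As a rough illustration of steps 2 and 5, here is a minimal Python sketch. It assumes the catalog records use the rdf, dcterms and pgterms namespaces and the element layout shown in the hand-written sample below; the real catalog should be checked before relying on this. The function names (parse_book, store_books, render_index), the table layout and the tiny HTML template are hypothetical and only serve the example.

```python
import html
import sqlite3
import xml.etree.ElementTree as ET
from string import Template

# Assumption: these namespaces match the ones used in the XML/RDF catalog.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Simplified, hand-written sample record; real records carry many more fields.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:title>Pride and Prejudice</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Austen, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:language>
      <rdf:Description>
        <rdf:value>en</rdf:value>
      </rdf:Description>
    </dcterms:language>
  </pgterms:ebook>
</rdf:RDF>
"""

# Hypothetical, deliberately tiny index page template (step 4).
INDEX_TEMPLATE = Template(
    "<html><body><h1>$heading</h1><ul>\n$items\n</ul></body></html>"
)


def parse_book(rdf_text):
    """Step 2: extract id, title, author and language from one RDF record."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get("{%s}about" % NS["rdf"])  # e.g. "ebooks/1342"
    return {
        "id": int(about.rsplit("/", 1)[-1]),
        "title": ebook.findtext("dcterms:title", default="", namespaces=NS),
        "author": ebook.findtext(
            "dcterms:creator/pgterms:agent/pgterms:name",
            default="", namespaces=NS),
        "language": ebook.findtext(
            "dcterms:language/rdf:Description/rdf:value",
            default="", namespaces=NS),
    }


def store_books(conn, books):
    """Step 2: keep the parsed metadata in a local SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books "
        "(id INTEGER PRIMARY KEY, title TEXT, author TEXT, language TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO books VALUES (:id, :title, :author, :language)",
        books,
    )
    conn.commit()


def render_index(books):
    """Step 5: fill the index template with one list item per book."""
    items = "\n".join(
        "<li>%s (%s, %s)</li>" % (
            html.escape(b["title"]),
            html.escape(b["author"]),
            html.escape(b["language"]),
        )
        for b in books
    )
    return INDEX_TEMPLATE.substitute(heading="Books", items=items)


if __name__ == "__main__":
    book = parse_book(SAMPLE_RDF)
    conn = sqlite3.connect(":memory:")
    store_books(conn, [book])
    print(render_index([book]))
```

Keeping the metadata in SQLite rather than only in memory makes it cheap to regenerate the index pages per author or per language without parsing the catalog again.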