eLibrary & Project Gutenberg

eLibrary is a project started a few years ago which never worked all that well, in part because of the crappiness of Project Gutenberg’s database. Basically I had the program downloading one giant directory listing of raw filenames and one [inconsistent, apparently hand-edited] listing file, then attempting to correlate the two to create a usable index of titles and filenames. Given the unfriendly source data there was only so much I could do: many books were missing or incorrectly indexed, and in some cases whole slabs of text files got mislabelled as "audio books". By the time that started happening I had kind of given up on ever getting it right; the approach [which took a lot of tweaking with hideous regular expressions] was so brittle that there wasn’t much I could do to fix it.

Not any more!

Finally Project Gutenberg offers a machine-readable index file [possibly they always had one, but if they did it sure was well hidden]. So now eLibrary can get a revision that takes full advantage of the amazing power of XML and doesn’t screw up author and volume names! [e.g. in the current version I note that "The Bible, King James" was written by "Book 1, Genesis".]
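To give a rough idea of why a structured index helps, here is a minimal Python sketch of building a title/author/filename index from XML. It is not eLibrary's actual code, and the element names (`catalog`, `etext`, `title`, `creator`, `file`), the sample entries and the `build_index` helper are all made up for illustration; the real Project Gutenberg index has its own schema, which would need to be checked before relying on any of this.

```python
import xml.etree.ElementTree as ET

# Invented, simplified catalog. The element names and entries are
# assumptions for illustration, NOT the real Project Gutenberg schema.
SAMPLE_CATALOG = """\
<catalog>
  <etext id="1">
    <title>The Bible, King James, Book 1: Genesis</title>
    <file>example-0001.txt</file>
  </etext>
  <etext id="2">
    <title>An Example Novel</title>
    <creator>Author, Example</creator>
    <file>example-0002.txt</file>
    <file>example-0002.zip</file>
  </etext>
</catalog>
"""


def build_index(xml_text):
    """Return one dict per book with its id, title, author and files."""
    root = ET.fromstring(xml_text)
    index = []
    for etext in root.findall("etext"):
        # Title and author live in separate elements, so there is no
        # guessing about where one ends and the other begins.
        title = etext.findtext("title", default="").strip()
        author = etext.findtext("creator", default="Unknown").strip()
        files = [f.text.strip() for f in etext.findall("file") if f.text]
        index.append(
            {
                "id": etext.get("id"),
                "title": title,
                "author": author,
                "files": files,
            }
        )
    return index


if __name__ == "__main__":
    for book in build_index(SAMPLE_CATALOG):
        print(book["id"], book["title"], "/", book["author"], book["files"])
```

Because each field comes from its own element, a missing author just falls back to "Unknown" instead of half of the title being promoted to author, which is the kind of mistake the regular expression approach kept making.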

There are still other problems with PG texts though, namely shitty formatting, but I can moan about that in another post.

One Response to “eLibrary & Project Gutenberg”

  1. Shane Klingonsmith Says:

    First off, I love eLibrary.

    This is very good news, at least for me, since I hope to get my Home Theater PC (meedio.com) to launch BookReader so I can read from the sofa. Being able to download selected books into my digital library via eLibrary from a 10-foot interface would satisfy all my dreams.
