eLibrary & Project Gutenberg
eLibrary is a project I started a few years ago that never worked all that well, in part because of the crappiness of Project Gutenberg’s database. Basically, I had the program download one giant directory listing of raw filenames and one [inconsistent, apparently hand-edited] listing file, then attempt to correlate them to create a usable index of titles and filenames. Given the unfriendly source data, there was only so much I could do: many books were missing or incorrectly indexed, and in some cases whole slabs of text files got mislabelled as "audio books". By the time that started happening I had more or less given up on ever getting it right; the approach [which took a lot of tweaking with hideous regular expressions] was so fragile that there wasn’t much I could do to fix it.
Not any more!
Finally, Project Gutenberg offers a machine-readable index file [possibly they always had one, but if they did it sure was well hidden]. So now eLibrary can get a revision that takes full advantage of the amazing power of XML and doesn’t screw up author and volume names! [e.g. in the current version I note that "The Bible, King James" was written by "Book 1, Genesis".]
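To give a rough idea of why an XML index helps, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `catalog`/`etext`/`title`/`creator`/`file` element names, the filenames, and the `load_index` helper are all made up, and the real Project Gutenberg index uses its own schema. The point is only that title and author arrive as separate, explicit fields, so nothing has to be guessed by splitting on commas.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified index. The element names and filenames are
# invented for this sketch and are NOT the real Project Gutenberg schema.
SAMPLE_INDEX = """\
<catalog>
  <etext id="10">
    <title>The Bible, King James</title>
    <file>10.txt</file>
  </etext>
  <etext id="11">
    <title>Alice's Adventures in Wonderland</title>
    <creator>Carroll, Lewis</creator>
    <file>11.txt</file>
  </etext>
</catalog>
"""


def load_index(xml_text):
    """Return one dict per book, read from explicit fields (no regex guessing)."""
    root = ET.fromstring(xml_text)
    books = []
    for etext in root.findall("etext"):
        books.append({
            "id": etext.get("id"),
            # The title is taken whole, commas and all.
            "title": etext.findtext("title", default="(untitled)"),
            # A missing <creator> becomes "(unknown)" instead of a chunk of the title.
            "author": etext.findtext("creator", default="(unknown)"),
            "file": etext.findtext("file"),
        })
    return books


books = load_index(SAMPLE_INDEX)
for book in books:
    print(f'{book["title"]} by {book["author"]} -> {book["file"]}')
```

With the fields separated like this, a title containing commas stays a title, and a book with no listed author is simply marked unknown rather than being credited to part of its own name.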
There are still other problems with PG texts though, namely shitty formatting, but I can moan about that in another post.
May 19th, 2005 at 2:38 pm
First off, I love eLibrary.
This is very good news, at least for me, since I hope to get my Home Theater PC (meedio.com) to launch BookReader so I can read from the sofa. Being able to download selected books into my digital library via eLibrary from a 10-foot interface would satisfy all my dreams.