# A Million+ free Google ePub files: How to read them on the Kindle



## artsandhistoryfan (Feb 7, 2009)

I posted this today at the Amazon forum and others have been able to get this to work, with one person finding that it provided her that nice chapter navigation with her 5-way, for the book she chose. The thread there is under the same title, in case anyone wants to read questions people had etc. One gal with a Mac had a problem at first with the ePub 'folder' and a 'zip' file.

The ePub file format, which so many companies are favoring now because it's an open, standardized format, has been in the news because of Sony's focused use of it in the coming months. Purchased ePub books will usually have Adobe's digital-rights-management protection (DRM) added to the file.

I saw the story about the Google stash of over a million files released yesterday in ePub format now, in addition to the PDF format released earlier.

PDFs tend to be hard to read on an e-reader due to tiny lettering in order to fit a large 'page' to the smaller e-screens. Also, they're limited for Kindle users in connection with annotations, dictionary-use, and Kindle-wide searches of words within them.

The ePub format supports text reflow, which allows larger characters. And Google has done that work for us instead of making us try to convert 'page-oriented' PDFs to MOBI/PRC.

While the Kindle doesn't natively read ePub format (yet -- but Amazon now owns Stanza which specializes in ePub), we are able to do a very simple conversion from ePub to the Kindle-friendly MOBI format.

The Kindle format itself is more or less a MOBI format anyway, with different identifiers added.

That conversion also lets you change the title and author wording, add a forced table of contents if there is none, and choose layout options.
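For anyone who would rather script the conversion than click through Calibre's windows, here is a minimal sketch. It assumes a Calibre install that ships the `ebook-convert` command-line tool on your PATH. The helper names, file names, title and author are made up for illustration, and the option names (`--title`, `--authors`, `--use-auto-toc`) are the ones in current Calibre releases, so check `ebook-convert --help` on your own version before relying on them.

```python
import shutil
import subprocess


def build_convert_command(epub_path, mobi_path, title=None, authors=None, force_toc=False):
    """Build the argument list for Calibre's ebook-convert tool (hypothetical helper)."""
    # The output format is chosen from the extension of the second path.
    cmd = ["ebook-convert", epub_path, mobi_path]
    if title:
        cmd += ["--title", title]        # override the title metadata
    if authors:
        cmd += ["--authors", authors]    # override the author metadata
    if force_toc:
        cmd.append("--use-auto-toc")     # prefer Calibre's generated table of contents
    return cmd


def convert(epub_path, mobi_path, **options):
    """Run the conversion; raises if Calibre's tool is missing or the run fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    subprocess.run(build_convert_command(epub_path, mobi_path, **options), check=True)


if __name__ == "__main__":
    # Example values only; nothing is converted here, the command is just printed.
    print(build_convert_command("book.epub", "book.mobi",
                                title="Example Title", authors="Example Author",
                                force_toc=True))
```

Building the command separately from running it means you can print it and look it over first; calling `convert("book.epub", "book.mobi", force_toc=True)` would then do the actual work.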

I've put the detailed instructions into a blog article at

*http://bit.ly/milkbooks*

Here's *Google's own wording on what they did with these books*

And *their more-detailed explanation* with examples of their scans and what happens in processing it to text.

I tried it last night, and it's quite easy to do and allows for some creativity. This is the same software that allows you to organize computer-record-keeping for your Kindle files and it offers other nice features too. Hope some will enjoy this capability.


----------



## cjpatrick (Jan 4, 2009)

Wow what an awesomely informative post. Thanks a ton. I will check this out.


----------



## Rasputina (May 6, 2009)

This is great! I hope they have some of the books I have on pdf in epub.


----------



## BookishMom (Oct 30, 2008)

Calibre works pretty well in reformatting ePub to Mobi. The only problem I've had is the title & author's name and the page number are inserted in the middle of every other page. Has anyone else had this problem? Does anyone know of a way to fix it, or is it due to the ePub to Kindle/mobi conversion and we have to take what we can get?


----------



## artsandhistoryfan (Feb 7, 2009)

BookishMom said:


> Calibre works pretty well in reformatting ePub to Mobi. The only problem I've had is the title & author's name and the page number are inserted in the middle of every other page. Has anyone else had this problem? Does anyone know of a way to fix it, or is it due to the ePub to Kindle/mobi conversion and we have to take what we can get?


The software received a big update a few weeks ago. In the book I did last night I have no problem with the author name and page number + title showing in the middle of every other page. It could be that the book you had was laid out to fit a fixed page as part of a PDF file, and the conversion then put the title bar in the middle of every other converted page.

Mine shows just a title starting at upper left margin and is on every page, but no page number and no author, and always at the very top.

Best, its Table of Contents was there as an image of exactly how the older book looked - though it did have links. In the conversion it was just an image, tiny letters, and no links. So I clicked the option to "force" a table of contents and Calibre created one, with links, and put it at the back. REALLY nice.

However, there had been an index in the back too, which had links in the original as seen on the Google books page, but the links are not there. However, I can just search the words if I want. I don't use the index much.


----------



## artsandhistoryfan (Feb 7, 2009)

cjpatrick and Rasputina,

Glad it's of interest! So far, it's been rewarding. Certainly, they load faster too when not in pdf format. Google did a great service by doing the conversion to text-reflow style ePub while we can still see the PDF also if we want to know what the original page layouts were like. I have a book on the history of ancient Egypt with lots of illustrations, and page turns aren't affected.


----------



## libros_lego (Mar 24, 2009)

Do you just get these books from Google books? I know it's a stupid question, but I just want to make sure.


----------



## artsandhistoryfan (Feb 7, 2009)

Jenni said:


> Do you just get these books from Google books? I know it's a stupid question, but I just want to make sure.


You go to http://books.google.com

But if you read the post at my blog there are a lot more steps involved in how to get a book from there and onto your computer etc. It may be more difficult for people not used to working with computers though.


----------



## BookishMom (Oct 30, 2008)

artsandhistoryfan said:


> The software received a big update a few weeks ago. In the book I did last night I have no problem with the author name and page number + title showing in the middle of every other page. It could be that the book you had was laid out to fit a fixed page as part of a PDF file, and the conversion then put the title bar in the middle of every other converted page.


You know, now that I think about it, I think the books I've been converting were PDFs originally, not ePubs. I'll recheck to make sure, but it sounds exactly like you described above (the forced title bar ending up mid-page every other page). I'll check again and if the latest was an ePub, I'll come back and ask for more help.

*Edited to add*: yes, it was a PDF. I thought it was an ePub. I'm going to download an ePub book now and see how it does. I may be back with more questions! 

*Edited again, to add*: Wow! What a difference. The ePub is much better. It converts to Mobi (via Calibre) very well (at least the book I tested did). No page numbers, no forced breaks, indented text (long quotes) stayed indented, no odd spaces between words, sentences or paragraphs. Just wow... it's definitely going to be my format of choice if I can't have Mobi.

Also, do you prefer the public domain books from Google or from the other public domain sites? Are ePubs so much better that Google (and the resulting reformatting for use on the Kindle) is a good choice?

*Edited to clarify my question*... I'm curious as to the difference between the public domain books offered at Google and those at other sites. I get the impression that they're supposed to be better (?), but am not sure if I'm just misunderstanding what it's all about. I hope this makes better sense.


----------



## WinterBorn (Jun 13, 2009)

BookishMom said:


> ...
> 
> *Edited to clarify my question*... I'm curious as to the difference between the public domain books offered at Google and those at other sites. I get the impression that they're supposed to be better (?), but am not sure if I'm just misunderstanding what it's all about. I hope this makes better sense.


Hi,

My first post! I've lurked for a while but wanted to offer an answer to the above question.

The first site to offer public domain books was (and still is) Gutenberg.org. They have _text files_ of many books in the public domain, and these files have often been reformatted and offered elsewhere. Many (but not all) have been proofed, so you have a fair chance of getting all the text that was originally printed. (They also offer a few other formats, but it depends on the book.)

Google Books is a project that has scanned in books from libraries all over the world. When you download a PDF of a public domain book, you have not only a graphic image of the page, you usually also get OCR'd text along with it. The problem here is that the scanning, while great, can have pages that are blurred or half-turned, etc., and the OCR is only automated; there are no human proofreaders (afaik) for the text.

It's my understanding that EPUB takes the OCR'd text and offers it in a reflowable format. Meaning if you view it on your computer, the page will be one size, if you view it on an ereader, it will be another. The text simply reflows to fill the size, much like the text of a book you buy for your K2 would reflow when you download it to a KDX.
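A toy illustration of reflow, not how an EPUB reader is actually implemented: the same string of text is broken into lines for two made-up "screen" widths using Python's standard `textwrap` module. The sample sentence and the widths of 60 and 30 characters are arbitrary choices for the demonstration.

```python
import textwrap

# Sample text; an EPUB stores text without fixed line breaks, roughly like this.
text = ("The text simply reflows to fill the size of whatever screen "
        "it is shown on, instead of keeping the printed page's line breaks.")


def reflow(text, width):
    """Break text into lines no wider than `width` characters."""
    return textwrap.wrap(text, width=width)


# The same words, laid out for a wide screen and then a narrow one.
for width in (60, 30):
    lines = reflow(text, width)
    print(f"--- {width} characters wide: {len(lines)} lines ---")
    print("\n".join(lines))
```

The words never change; only the line breaks do, which is why a narrower screen (or a larger font) simply means more, shorter lines.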

I tend to favor the PDFs since they are image files and not (possibly mangled) OCR'd text. I do have Acrobat Pro 8, so I'm able to crop them a bit to fit the screen better on my DX, thus making the text appear larger on the screen.

You can also download books from the Internet Archive. I think I've seen EPUB there on a few books, but I don't recall any specific ones. The IA offers more formats than Google Books, but they don't have nearly as many books.

So, here's my personal take: For straight text, Gutenberg.org is probably best. It's the source of text for a lot of the other sites that offer reformatted public domain books for specific ereaders.

For books that you want to actually see the page in the book, then the PDFs from Google Books are best. Imo, there is also a much wider selection of books from Google Books. For books that you will be reading on various devices, then EPUB might be a good choice. The caveat here is that there's been no proofing of the text and you sometimes just get garbage. (In my experience, the older a book is, the more likely it is that there will be OCR mistakes. And if you are wanting to read 18th century books that were printed with the long 's' (the letter that almost looks like an 'f' but isn't), then it's not worth bothering with OCR text at all. Either go with Gutenberg's proofed versions, or with PDFs.)

Internet Archive is in the middle, offering PDFs and text and several other formats, but they have fewer books overall (afaik, but note that I have not done any research into it; that's just my perception), though they do sometimes have books that aren't found on the other two sites.

Hope this helps a little. Sorry for the long first post!


----------



## cjpatrick (Jan 4, 2009)

WELCOME TO KINDLEBOARDS!


----------



## artsandhistoryfan (Feb 7, 2009)

BookishMom said:


> *Edited again, to add*: Wow! What a difference. The ePub is much better. It converts to Mobi (via Calibre) very well (at least the book I tested did). No page numbers, no forced breaks, indented text (long quotes) stayed indented, no odd spaces between words, sentences or paragraphs. Just wow... it's definitely going to be my format of choice if I can't have Mobi.


Really glad you're squared away on that!



> Also, do you prefer the public domain books from Google or from the other public domain sites? Are ePubs so much better that Google (and the resulting reformatting for use on the Kindle) is a good choice?


 I have no preference except that I'd just rather press a button and get a nicely formatted file and not have to do anything else! The problem is that with all of them, the people who decide to provide editions are of really varying standards (and probably ability) so I'm glad for Amazon's sample feature where I can get a really good idea in a few seconds.

Google had a paragraph yesterday, which I pasted a link to here, today, which said it was automated and there will be errors in reading. And we've seen these in public domain documents from almost everywhere. I'm surprised at how good a job it DOES do but I guess we have a wealth of options now, and there's no guarantee whose edition will be best. Some people are meticulous and make sure all the links work and that there is a nice linked table of contents. SO many of the editions don't. So, I look for that in samples.

What I like is that we have one new extremely large bunch of books from which to choose, and with Google we can actually get a 'full view' of the entire book online when it's a free public domain book.

They have a feature now where you ask for a book and if you go for a preview, it usually means it's a book that costs money. Then they show you all the places that have that book and how much it costs at each, I THINK. And it shows when they don't have a book currently.

But with all the problems from bad conversions from PDFs to Kindle-compatible MOBI that we've had, this is really nice. At least we know we're not left out from the 1 million they keep saying Sony users get but not Kindle users.


----------



## CegAbq (Mar 17, 2009)

WinterBorn said:


> Hi,
> 
> My first post! I've lurked for a while but wanted to offer an answer to the above question.
> ...
> Hope this helps a little. Sorry for the long first post!


Welcome to our merry bunch! and never worry about a "long" post, first, middle, or last. We're glad you finally decided to jump into our waters.


----------



## BookishMom (Oct 30, 2008)

WinterBorn said:


> Hi,
> 
> My first post! I've lurked for a while but wanted to offer an answer to the above question.


Welcome, and thank you so much for your explanation. It's very much appreciated!


----------



## Scheherazade (Apr 11, 2009)

So I guess this means Amazon can start advertising a library of eleventy-billion books too?  I can't wait to try this out, I can't even imagine what sort of hidden goodies I'll find in here for my term papers!


----------



## lmk2045 (Jun 21, 2009)

I am interested in what you all think about the quality of the scanned Google books before and after the conversion to Kindle. I personally was not impressed with the quality of the books, but I tend to be somewhat fussy about aesthetics.


----------



## artsandhistoryfan (Feb 7, 2009)

WinterBorn said:


> Hi,
> 
> My first post! I've lurked for a while but wanted to offer an answer to the above question.
> 
> The first site to offer public domain books was (and still is) Gutenberg.org. They have _text files_ of many books in the public domain, and these files have often been reformatted and offered elsewhere. Many (but not all) have been proofed, so you have a fair chance of getting all the text that was originally printed. (They also offer a few other formats, but it depends on the book.)


Welcome! 
I think Gutenberg and Mobile Reference (which also has teams of proofers and some wonderful layout people) both have some of the best editions, but some have sat there and not been updated.

There's now a 'Magic Catalog' you can download to your Kindle direct, and from that you can see latest offerings, popular ones, and then you can search the entire catalog for the book(s) you want.
If you haven't already done it, see http://bit.ly/kgutenb



> Google Books is a project that has scanned in books from libraries all over the world. When you download a PDF of a public domain book, you have not only a graphic image of the page, you usually also get OCR'd text along with it. The problem here is that the scanning, while great, can have pages that are blurred or half-turned, etc., and the OCR is only automated; there are no human proofreaders (afaik) for the text.
> 
> It's my understanding that EPUB takes the OCR'd text and offers it in a reflowable format. Meaning if you view it on your computer, the page will be one size, if you view it on an ereader, it will be another. The text simply reflows to fill the size, much like the text of a book you buy for your K2 would reflow when you download it to a KDX.


 You do very good explanations.



> I tend to favor the PDFs since they are image files and not (possibly mangled) OCR'd text. I do have Acrobat Pro 8, so I'm able to crop them a bit to fit the screen better on my DX, thus making the text appear larger on the screen.


I like having my electronics manuals and some educational PDFs from the Net on history and art stuff, but for most things just give me text that I can highlight and add notes to, and that gets included when I run a Kindle search. Amazon does not have highlighting or note-adding OR in-line dictionary for PDFs.

Also I was pleased with how fast the ePubs load, even with photos in them.



> You can also download books from the Internet Archive. I think I've seen EPUB there on a few books, but I don't recall any specific ones. The IA offers more formats than Google Books, but they don't have nearly as many books.


 And then there's http://feedbooks.com, http://fictionwise.com, http://mnybks.net and a few others that people use around here. SO much to choose from. But I have to say that only Google lets you see an entire public domain book before you decide to download it for keeping. Easier way to decide which of many versions you want. I was impressed last night.

Have always disliked the PDFs because they are often harder to read for some reason.



> For books that you want to actually see the page in the book, then the PDFs from Google Books are best. Imo, there is also a much wider selection of books from Google Books. For books that you will be reading on various devices, then EPUB might be a good choice. The caveat here is that there's been no proofing of the text and you sometimes just get garbage. (In my experience, the older a book is, the more likely it is that there will be OCR mistakes.


True. In my case I like to keep some PDFs to use as reference but I want to convert many so I can highlight away and take notes and look up words. I am just a highlight/lookup maniac. It also makes sure I can find those segments important to me and bring them up with friends when we get together, or I can send them to people from my My Clippings file.



> And if you are wanting to read 18th century books that were printed with the long 's' (the letter that almost looks like an 'f' but isn't), then it's not worth bothering with OCR text at all. Either go with Gutenberg's proofed versions, or with PDFs.)


I use MobileRef as much as I use Gutenberg. They do a very good job usually, and some fantastic jobs on things like the CIA Atlas for each year. If I get MobileRef from Amazon I can have my highlighting and notes backed up to their servers and to my personal server area. The latter is very nice, since you can see all your notes made for a book on one long page.



> Hope this helps a little. Sorry for the long first post!


As I said earlier, I enjoy your explanations, WinterBorn. Thank you!


----------



## artsandhistoryfan (Feb 7, 2009)

lmk2045 said:


> I am interested in what you all think about the quality of the scanned Google books before and after the conversion to Kindle. I personally was not impressed with the quality of the books, but I tend to be somewhat fussy about aesthetics.


I think some of them look as if they were scanned by someone who just woke up and had to get his hour's work in before the next shift came in, without any training in keeping the pages straight or keeping his fingers from being scanned as well.

It's sort of charming to me - like a very busy friend did a special scan for me. They have funny little sticky labels that get picked up too. BUT having said that, the book I had Calibre convert last night is just beautiful! Whoever did the book made sure most illustrations were placed just so, and with typefont-text replacing images of the pages (which I do not like to try to read on a pc monitor!) I really find it very nice.

I know that some people on the other thread reported back they were really pleased, and one person said he got 'spectacular' results. We have such an explosion of goodies, books-wise.


----------



## L.Canton (Jan 21, 2009)

Thanks for posting all this information, it's really quite useful. Smart move on Google's part to make all of these available for free.


----------



## CS (Nov 3, 2008)

Can you guys post any good books you're finding through Google? Maybe I'm doing it wrong, but I can't find anything decent.


----------



## legalbs2 (May 27, 2009)

CS said:


> Can you guys post any good books you're finding through Google? Maybe I'm doing it wrong, but I can't find anything decent.


Me too. I did not find millions of free downloadable books in any category. I did download three books that I recognized, but most were in different languages.


----------



## gadgetgirl003 (Mar 22, 2009)

legalbs2 said:


> Me too. I did not find millions of free downloadable books in any category. I did download three books that I recognized, but most were in different languages.


I'm glad to see that I'm not the only one who just isn't seeing them.


----------



## artsandhistoryfan (Feb 7, 2009)

CS said:


> Can you guys post any good books you're finding through Google? Maybe I'm doing it wrong, but I can't find anything decent.


 I've found so many good books just on Amazon and then Project Gutenberg, MobileRef, Feedbooks, Mnybks.net, etc., that I haven't looked, beyond doing the test and blog-entry the other day on converting a Google ePub public-domain book to a Kindle-readable one.

I just now went there and, from the categories at the left, I chose "Literature"

This brought up 
*this set of books under 'literature' category*

Seeing all the attractive, colorful pictures of covers that resulted, I then realized these probably include a lot of books that are being SOLD rather than just free public domain ones.

Looking up at the pull-down menu, I saw that it said: 
"Showing: Limited preview and fullview"

which means this search result includes books you have to buy.

So, I then changed the pull-down menu to the choice:
"Showing: Full view only" (which is available only for free, public domain books and magazines)

And then most of the covers seen were the plain, generic types, and they included books in other languages, as some have noted.
They should have searches limited to a certain language!

Even then, this resulted in only 42 books, with maybe half or more of them in other languages. I do realize Google said they included libraries from all over the world, but ...

THEN I changed the pull-down menu to show the option:
"Showing: All books"

This got me some rather esoteric selections. Since I love the singing of a baritone named William Sharp, I clicked on a book on the front page (claiming 4,424 books listed) that was named

"The sexual tensions of William Sharp" 

and saw the following description of that book (excerpted below)
=======
Title: The sexual tensions of William Sharp: a study of the birth of Fiona Macleod, incorporating two lost works, Ariadne in Naxos and "Beatrice"
Volume 2 of Studies in nineteenth-century British literature
Volume 2 of Studies in Modern Poetry
Author: Terry L. Meyers
Publisher: P. Lang, 1996
Original from: the University of Michigan
Digitized: Mar 13, 2008
...
=======

So, I guess these million books include a lot of esoteric academic treatments of various topics, not a bad feature! and generally not available in the usual public domain offerings elsewhere.

Note that there are a lot of categories in the front page's left column. Also, the front page has an assortment of the types of books or magazines available, which are grouped in the following way (today):

Interesting
Classics
Magazines
Highly cited
Random Subject: (Today's, or that moment's, was "Alcoholics Fiction")

*TRANSLATING books written in another language:*

SO, I decided to go back to "Literature" and see if I could produce a Google translation of one of these books written in another language, and sure enough Google has made it possible, although there is currently no option saying "translate" on the page.

I chose one of the Italian books on the front page of the public-domain books search-results:

"Scritti letterari di un Italiano vivente (1847) by Giuseppe Mazzini "

I then browsed through it until I came to a page that began with a new paragraph which I thought would be a good example for a translation. I chose page 24.

In our case, knowing which page we want, we don't have to tediously arrow through each page. I just entered '24' in the input box there, pressed my Enter key, and it took me to page 24.

What you'll see there is the scanned image that Google made from that page.

But they give us the option to see that in "Plain text" -- so I clicked on that option above the text.

This took me to a "plain-text" version showing plain text from pages 20-24 (see top left).

In the web browser's URL (or Location) field, the long URL was:
"http://books.google.com/books?ei=zbqYSqGUOp7CzQSwh-XbDg&rview=1&output=text&as_brr=1&id=w4hptHSmqvMC&dq=subject%3A%22+Literature+%22&jtp=24"

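If that long URL looks like noise, here is a small sketch that splits it into its named parts with Python's standard `urllib.parse`. The URL is the one quoted above; what each parameter means is my guess from the names and from what the page showed (`output=text` for the plain-text view, `jtp=24` for the page jumped to, `id` for the book), not something Google documents here.

```python
from urllib.parse import urlsplit, parse_qs

# The plain-text page URL from the post, split across lines for readability.
url = ("http://books.google.com/books?ei=zbqYSqGUOp7CzQSwh-XbDg&rview=1"
       "&output=text&as_brr=1&id=w4hptHSmqvMC"
       "&dq=subject%3A%22+Literature+%22&jtp=24")

# parse_qs returns a list of values per name; each name appears once here,
# so keep just the first value.
params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}

for name, value in params.items():
    print(f"{name} = {value}")
```

Seeing the parts separately makes it easier to tell which piece is the book and which is the page, even though the whole URL still has to be pasted as-is into the translation page.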
SO, I did a Ctrl-N for a new window (or Ctrl-T in Firefox for a new tab), and that brought up a window in which I could go to Google's Translation page, which is at

http://www.google.com/language_tools?hl=en

On that translation page, I put the full, long URL for the Italian book's "plain text" version of the page -- which gives us text from pages 20-24 -- into the "Translate a web page" feature at the bottom left of the Google translation page.
Then I chose Italian to English below that input box and clicked on "Translate"...

That brought me to Google's book again, this time displaying pages 20-24 in English.

Remember that this is an automated and usually somewhat-primitive web-translation, but it definitely gives you the gist of what is being said.

When your mouse hovers for a bit over the translated text, Google shows you the original-language's text for that small area and asks if you'd like to contribute a better translation for it. If you do, you click on that to put your suggestion into the pop-up input box.

Then I scrolled to the bottom of the page and clicked on "Continue" at the bottom right.

This brought up pages 25-29 translated into English also, and it goes on like that throughout the rest of the book if you like.

Quite amazing.

I'll probably make this my next blog-entry  
Yesterday I put up Google's explanation of how they do conversions from books, etc.

- Andrys


----------

