Passwords from E-books
Despite the numerous wordlists that can be found on the Internet, it seems that no one has gone to the effort of converting E-book archives to plain text and parsing out the data in various ways.  Such data could produce a new genre of dictionary files for use with hashcat (or any similar program).

There would be many challenges for such a project.  Here are a few.

1. The required disk storage.  Some E-book sites that I have seen held close to 80 GB (or more) for just one genre.  Several terabytes could be necessary for someone who wants to work with a large data set.

2. Where to get the data.  There are numerous websites that have E-books available.  Obtaining that content is a task all by itself.

3. Converting the available formats.  EPUB and PDF are common, but working with the data effectively requires plain text.  Calibre seems to be the most obvious choice for converting to plain text, but there may be other options that I don't know about.
https://calibre-ebook.com/download
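
As a rough sketch of the conversion step: Calibre ships a command-line tool, ebook-convert, that picks the output format from the extension of the output file.  The Python below walks a directory and converts each EPUB or PDF to .txt.  The helper names (build_command, convert_tree) and the directory names are made up for illustration, and it assumes ebook-convert is on the PATH.

```python
import subprocess
from pathlib import Path

# Input formats to convert (assumption: only these two matter here).
SOURCE_SUFFIXES = {".epub", ".pdf"}


def build_command(src: Path, out_dir: Path) -> list[str]:
    """Return the ebook-convert command line for one book."""
    dst = out_dir / (src.stem + ".txt")
    return ["ebook-convert", str(src), str(dst)]


def convert_tree(in_dir: Path, out_dir: Path) -> int:
    """Convert every EPUB/PDF under in_dir to plain text; return the number attempted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(in_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() in SOURCE_SUFFIXES:
            # check=False: one unreadable book should not stop the whole batch.
            subprocess.run(build_command(src, out_dir), check=False)
            count += 1
    return count
```

Calling convert_tree(Path("ebooks"), Path("plaintext")) would process a whole archive.  Note that two books with the same file name in different subdirectories would overwrite each other's output in this sketch.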

4. How to parse the data.  Simply using every word in the text is not adequate and completely misses the point of getting all of this content into plain text.  Pulling out phrases and sentences, delimited by commas, periods, and double quotes, would yield candidates that could be useful.  Sentences or phrases with the spaces removed, or with the spaces replaced by other characters, also seem worthwhile.  In addition, one method that I once saw on YouTube for choosing a "secure" password was to take a sentence from your favorite book and use the first letter of every word in that sentence as the password.
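
The parsing ideas in point 4 could be sketched roughly like this.  It splits on commas, periods, and double quotes, then emits each phrase as-is, with spaces removed, with spaces replaced, and as first letters.  The function names and the choice of "_" and "-" as replacement characters are my own assumptions, not anything standard.

```python
import re

# Delimiters from point 4: commas, periods, and double quotes
# (straight and curly, since converted E-books often contain curly quotes).
DELIMITERS = re.compile(r'[,."\u201c\u201d]+')


def phrases(text: str) -> list[str]:
    """Split text into phrases, collapsing whitespace and dropping empties."""
    parts = (" ".join(p.split()) for p in DELIMITERS.split(text))
    return [p for p in parts if p]


def variants(phrase: str, separators: str = "_-") -> list[str]:
    """Return the phrase plus space-removed, space-replaced, and first-letter forms."""
    words = phrase.split()
    out = [phrase, "".join(words)]
    out += [sep.join(words) for sep in separators]
    out.append("".join(w[0] for w in words))
    # De-duplicate while preserving order (matters for one-word phrases).
    return list(dict.fromkeys(out))


sample = "Call me Ishmael. Some years ago, never mind how long precisely."
for p in phrases(sample):
    print(variants(p))
```

Splitting on every period also cuts at abbreviations such as "Mr.", so a real run would produce some junk fragments that need filtering.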

5. Sentence length.  Entire sentences from books would be mostly useless in many cases because of the length limitations of several hash types in hashcat.  I haven't investigated whether those limitations are also present in hashcat's alternative, John the Ripper, but for several of the common fast hashes (MD5 and SHA1), full sentences would be rejected for being too long.
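
One way to deal with the length problem would be to drop over-long candidates before they ever reach the cracker.  A minimal sketch follows; the max_bytes value of 32 is only a placeholder and not a documented hashcat limit, since the real limit depends on the hash type and the hashcat version.  The function name is hypothetical as well.

```python
from typing import Iterable, Iterator


def within_length(candidates: Iterable[str], max_bytes: int, min_bytes: int = 1) -> Iterator[str]:
    """Yield candidates whose UTF-8 encoded length is between min_bytes and max_bytes."""
    for cand in candidates:
        # Measure bytes, not characters: accented letters take more than one byte.
        size = len(cand.encode("utf-8"))
        if min_bytes <= size <= max_bytes:
            yield cand


# Placeholder limit of 32 bytes; substitute the limit for the hash type in use.
kept = list(within_length(["CmI", "Call me Ishmael", "x" * 40], max_bytes=32))
print(kept)
```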

Has anyone here ever bothered with this?  It seems like a lot of work with not a lot of reward.


Messages In This Thread
Passwords from E-books - by devilsadvocate - 03-21-2017, 09:37 AM
RE: Passwords from E-books - by kabirthapar - 03-22-2017, 09:14 AM
RE: Passwords from E-books - by Kgx Pnqvhm - 03-23-2017, 12:07 AM