hashcat Forum
.PST to wordlist - Printable Version




.PST to wordlist - tedix - 04-26-2017

Hi guys,

I am looking for a solution to convert Outlook 2010 .pst files to wordlists. During investigations, I often encounter encrypted files in .pst archives, and I am pretty sure that 80% of those .pst archives do contain the password of those encrypted files. Therefore, I would like to convert all mail conversations (subject/body) in the archive to a wordlist/dictionary which would allow me to run a dictionary attack on the files.

So far, I have managed to convert the .pst files to readable text with readpst (driven from Python), but I have not found a way to turn that text into a large wordlist.

How would you guys approach this problem? Am I looking in the right direction or are there other/better ways to extract a wordlist out of a .pst archive? 

Greetings,

Tedix


RE: .PST to wordlist - royce - 04-26-2017

Interesting idea. I haven't worked with readpst before, but if the results are plain text, then the remaining question is how to turn emails into wordlists. Splitting the resulting text files into component word tokens and using space as a delimiter should be pretty easy.

What research have you done into such splitting?
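
For illustration, the space-delimited splitting described above might look like the following minimal Python sketch. It assumes the readpst output has already been read into a string; the function name `tokenize` and the choice to keep both the raw token and a punctuation-stripped variant are assumptions made here, not something proposed in the thread.

```python
import string


def tokenize(text):
    """Split text on whitespace and return a sorted list of unique candidates.

    Each raw token is kept as-is (a password may legitimately contain
    punctuation), and a second variant with leading/trailing punctuation
    stripped is added as well.
    """
    words = set()
    for raw in text.split():
        words.add(raw)
        stripped = raw.strip(string.punctuation)
        if stripped:
            words.add(stripped)
    return sorted(words)


sample = "The password is Hunter2! Please don't share it."
for word in tokenize(sample):
    print(word)
```

Keeping both variants makes the list larger, but it avoids losing a candidate such as `Hunter2!` when the trailing character is actually part of the password.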


RE: .PST to wordlist - rvn - 04-26-2017

I would suggest using readpst from the libpst utilities to convert the archive to mbox, then cat everything together, replace the spaces with newlines using sed (s/ /\n/g), and finally sort and uniq the result.

http://www.five-ten-sg.com/libpst/
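
Since the original question asked specifically about subject and body text, the same idea can be sketched in Python with the standard-library `mailbox` module instead of cat and sed. This is only a rough sketch under assumptions: it presumes readpst has already written mbox files, it looks only at `text/plain` parts, and the function names `message_text` and `mbox_words` are made up for this example.

```python
import mailbox
from email.header import decode_header, make_header


def message_text(msg):
    """Yield the decoded subject and every text/plain body part of a message."""
    subject = msg.get("Subject")
    if subject:
        try:
            # Decode RFC 2047 encoded words such as =?utf-8?...?=
            yield str(make_header(decode_header(subject)))
        except (LookupError, UnicodeError):
            yield str(subject)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None
        if not payload:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to UTF-8
            yield payload.decode("utf-8", errors="replace")


def mbox_words(path):
    """Return the sorted unique whitespace-separated tokens of one mbox file."""
    words = set()
    for msg in mailbox.mbox(path):
        for text in message_text(msg):
            words.update(text.split())
    return sorted(words)
```

Calling `mbox_words()` on one of the mbox files produced by readpst and writing the result one word per line gives a first wordlist. HTML-only mails and attachments are ignored by this sketch.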


RE: .PST to wordlist - rvn - 04-26-2017

lol i just saw that royce was faster in answering the question ^^


RE: .PST to wordlist - jimby - 04-26-2017

(04-26-2017, 10:26 AM)tedix Wrote: So far, I managed to convert the .pst format in Python to readable text with readpst, but I have not found a solution to convert those files to a large wordlist.


If you have access to a Unix host, try massaging the "readable text file" you mention above:

    tr -s '[:space:]' '\n' < readable_text_file | sort -u > wordlist1.txt

That should get you started.  I'm sure it will need a lot of cleanup, particularly if there are still binary or Unicode characters in the file.

You might want to compare that output with

    strings PSTFILE | sort -u > wordlist2.txt


or maybe combine the two and uniq the result:

    sort -u wordlist1.txt wordlist2.txt > wordlist3.txt



Best,
Jim B.
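
The cleanup and merge steps mentioned above could be sketched in Python roughly as follows. The length limits (4 to 64 characters) and the rule of dropping any token that contains non-printable or undecodable characters are assumptions chosen for illustration; the names `clean_words` and `merge_wordlists` are likewise hypothetical.

```python
def clean_words(words, min_len=4, max_len=64):
    """Yield unique, printable candidates in first-seen order.

    min_len and max_len are arbitrary illustration values; adjust them
    to the password policy you expect.
    """
    seen = set()
    for word in words:
        word = word.strip()
        if not (min_len <= len(word) <= max_len):
            continue
        # Drop tokens with control characters or undecodable bytes
        # (the latter show up as U+FFFD after errors="replace").
        if not word.isprintable() or "\ufffd" in word:
            continue
        if word not in seen:
            seen.add(word)
            yield word


def merge_wordlists(paths, out_path):
    """Merge several wordlist files into one cleaned file; return the word count."""
    def lines():
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as fh:
                yield from fh

    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for word in clean_words(lines()):
            out.write(word + "\n")
            count += 1
    return count
```

For example, `merge_wordlists(["wordlist1.txt", "wordlist2.txt"], "wordlist3.txt")` would play the role of the final sort and uniq step, with the filtering applied on the way.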