hashcat Forum
Dump Scraper - Printable Version



Dump Scraper - vladimir125 - 03-19-2015

As you already know, the Internet is full of passwords (both plain and hashed): when a leak occurs, it's usually posted to PasteBin.
The pace of these dumps is so high that it's not humanly possible to collect them all, so we have to rely on a bot that scrapes the PasteBin site for interesting files.

Dump Monitor does exactly this: every time leaked information is posted on PasteBin, it tweets the link.

Sadly, Dump Monitor is not very accurate: its tweets contain a lot of "false positives" (debug data, log files, antivirus scan results) or stuff we're not interested in (RSA private keys, API keys, lists of email addresses).

Moreover, once you have the raw data, you still need to extract the useful information and remove all the garbage.

That's the reason why Dump Scraper was born: inside this repository you will find several scripts to fetch the latest tweets from Dump Monitor, analyze them (discarding useless files) and extract the hashes or the passwords.
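
To illustrate the extraction step described above, here is a minimal sketch in Python. It is not Dump Scraper's actual code: the `HASH_PATTERNS` table and the `extract_hashes` function are hypothetical names, and the approach (matching hex strings whose length equals that of an MD5 or SHA-1 digest) is only an assumption about what "extracting the hashes" could look like.

```python
import re

# Hypothetical patterns (not taken from Dump Scraper): hex strings whose
# length matches a common digest size. \b keeps a 40-char string from also
# being reported as a 32-char match.
HASH_PATTERNS = {
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
}


def extract_hashes(text):
    """Return {label: sorted unique lowercase matches} for each pattern."""
    found = {}
    for label, pattern in HASH_PATTERNS.items():
        found[label] = sorted({m.lower() for m in pattern.findall(text)})
    return found


# Made-up sample resembling a raw dump with one line of garbage in it.
sample = (
    "user1@example.com:5f4dcc3b5aa765d61d8327deb882cf99\n"
    "debug: connection reset by peer\n"
    "user2@example.com:5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8\n"
)
result = extract_hashes(sample)
```

A length-based match like this cannot tell different algorithms with the same digest size apart, so the labels are only a guess at the hash type.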

https://github.com/tampe125/dump-scraper/releases

Please remember to read the wiki before continuing:
https://github.com/tampe125/dump-scraper/wiki

Finally, this is a super-alpha release, so things may be broken or not work as expected. Moreover, I know it's kind of "hackish": a single program with a GUI would be 100 times better. Sadly, I'm running out of time and I don't know anything about Python GUI development: if anyone wants to contribute, they would be more than welcome!

Please leave your thoughts and opinions here.


RE: Dump Scraper - atom - 03-19-2015

Many thanks!


RE: Dump Scraper - Si2006 - 03-19-2015

Can't get it to work on Ubuntu. I filled in the Twitter auth keys, renamed settings-dist.json, and installed the dependencies.

PHP 5.5.22-1+deb.sury.org~precise+1 | Python 2.7.3

php scrape.php

PHP Warning: require_once(vendor/autoload.php): failed to open stream: No such file or directory in /home/xxxxx/dump-scraper/scrape.php on line 8
PHP Fatal error: require_once(): Failed opening required 'vendor/autoload.php' (include_path='.:/usr/share/php:/usr/share/pear') in /home/xxxxx/dump-scraper/scrape.php on line 8


RE: Dump Scraper - vladimir125 - 03-19-2015

Ah crap, I forgot to put that in the wiki!
You have to get Composer (https://getcomposer.org/download/) and run:

php composer.phar install

Sigh, that's the risk of always working in a dev environment...
Don't worry, though: if everything goes smoothly, I think I'll release a new Python-only version with a single entry point.


RE: Dump Scraper - Si2006 - 03-19-2015

That did the trick! Thanks


RE: Dump Scraper - Si2006 - 03-19-2015

One more problem.

It doesn't seem to create the data folder after processing the tweets with "php scrape.php", and it also displays a PHP notice in the terminal.

Code:
processed 2000 tweets
    Found 0 removed tweets in this batch
PHP Notice:  Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 97

Notice: Trying to get property of non-object in /home/xxxxxx/dump-scraper/scrape.php on line 97
PHP Notice:  Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 100

Notice: Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 100
PHP Notice:  Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 103

Notice: Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 103
PHP Notice:  Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 103

Notice: Trying to get property of non-object in /home/xxxxx/dump-scraper/scrape.php on line 103

    processed 2001 tweets
    Found 0 removed tweets in this batch

Total processed tweets: 2001



RE: Dump Scraper - vladimir125 - 03-19-2015

Ignore the notices: it seems the tweet doesn't have any data (I'll add a check for it).
Please create the data/raw folder manually.

Tomorrow I'll release a new version addressing these issues...
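
For reference, creating the missing folder can also be done from a script. The following is a minimal sketch, not part of Dump Scraper: `ensure_dir` is a hypothetical helper, and it assumes Python 3.2 or later (the `exist_ok` argument is not available in the Python 2.7 mentioned earlier in this thread).

```python
import os


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists.

    Returns True when the directory exists afterwards.
    """
    os.makedirs(path, exist_ok=True)  # exist_ok requires Python 3.2+
    return os.path.isdir(path)
```

Calling `ensure_dir(os.path.join("data", "raw"))` from the dump-scraper folder would create data/raw relative to the current working directory.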


RE: Dump Scraper - winxp5421 - 03-20-2015

Got everything working up till "python classify.py"

Running Ubuntu 14.04, Python 2.7, scipy 0.13.3, sklearn 0.15.2.

Error: http://pastebin.com/e2QMSmKs


RE: Dump Scraper - vladimir125 - 03-20-2015

Can you please post the training/features.csv file?
I think there are some invalid values in it.
You can upload it to PasteBin and put the link here.

Thank you very much!
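
A quick way to look for such values locally is sketched below. This is only an illustration: the layout of training/features.csv is not described in this thread, so the sketch assumes a header row followed by purely numeric columns, and `find_invalid_cells` is a hypothetical helper rather than anything shipped with Dump Scraper. Written for Python 3.

```python
import csv
import math


def find_invalid_cells(path, has_header=True):
    """Return (line_number, column_index, raw_value) for every cell that is
    empty, non-numeric, NaN, or infinite.

    Assumption: every column is expected to hold a number; a text column
    (for example a label) would be reported as invalid too.
    """
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            for col, raw in enumerate(row):
                try:
                    value = float(raw)
                except ValueError:
                    # Empty strings and non-numeric text end up here.
                    problems.append((reader.line_num, col, raw))
                    continue
                if math.isnan(value) or math.isinf(value):
                    problems.append((reader.line_num, col, raw))
    return problems
```

Running `find_invalid_cells("training/features.csv")` returns an empty list when every cell parses as a finite number; otherwise each tuple points at the line and column to inspect.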


RE: Dump Scraper - winxp5421 - 03-20-2015

After you mentioned that the training CSV had invalid information, I looked at the wiki again and noticed that the training folder structure listed "train" instead of the more logical "trash". I had a hunch this was a typo, so I made the adjustment and everything works fine now. Thanks!
http://prntscr.com/6j5lck