Facebook's directory
#6
Thanks for the feedback and I'm glad you like it!
ati6990: you're welcome ;)

All I did was make a lot of HTTP GET requests to a public part of facebook.com, parse the results and store them in files.
I want to make that clear before anyone (not from this community) accuses me of "hacking" Facebook.
I'll write more about it in a later post.

Take a look at the picture in the attachment and guess when atom tweeted the link to this post. (that's megaBYTES per second)
Whoever caused the spike: nicely done and I hope you'll seed the torrent as long as you can!

Back to the directory...
The crawling was done during the first two weeks in December 2014.
I've also got the data from the pages and places, but they are only included in the raw.tar.xz archive since I didn't process them further.
The Latin names were converted to lowercase before sorting.
I didn't do that for the non-Latin names because I wasn't sure whether it would break the UTF-8 characters.
If anyone knows more about it, please post it here.
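
One thing that should be safe either way: in UTF-8 every byte of a multi-byte character has its high bit set, so a byte-wise conversion that only touches A-Z cannot corrupt non-Latin characters. A minimal sketch, assuming a POSIX tr; the sample string is made up. Note that this leaves capitals outside ASCII (such as É or И) unchanged, so it is not a full Unicode lowercase, which would need a locale-aware tool.

```shell
# Made-up sample mixing ASCII capitals with multi-byte UTF-8 letters.
sample='ÉMILE Ärger Иван'
# LC_ALL=C makes tr work byte by byte, so only the bytes for A-Z are changed.
lowered=$(printf '%s\n' "$sample" | LC_ALL=C tr 'A-Z' 'a-z')
printf '%s\n' "$lowered"
```

The multi-byte characters come out byte-for-byte identical; only the plain ASCII capitals are folded.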

Here are some commands to deal with the dataset.
For Windows users there is Cygwin, but I recommend taking a look at a Unix-based OS.

Get rid of the count in the processed files (so you can use it as a dictionary)
Code:
$ cut -b9- first_names.txt > first_names.dic
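
The cut above only works if the count fills exactly the first eight bytes of every line. A small sketch of that, assuming the layout GNU `uniq -c` prints (a 7-wide right-aligned count plus one space); the sample lines and the file name sample_counts.txt are made up. The sed variant is a width-independent alternative, not the command used for the dataset.

```shell
# Made-up sample in the assumed layout: 7-wide count, one space, then the name.
printf '%7d %s\n' 2 anna 1 bob > sample_counts.txt
# Fixed-width variant, same idea as the cut command above.
by_cut=$(cut -b9- sample_counts.txt)
# Width-independent variant: drop leading spaces, the digits and one space.
by_sed=$(sed -E 's/^ *[0-9]+ //' sample_counts.txt)
printf '%s\n' "$by_sed"
```

If a count ever had more than seven digits, the fixed-width cut would clip it wrongly while the sed variant would still work.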

Get a list of usernames and exclude the ones which are IDs
Code:
$ cut -d";" -f1 names/fbdata.* | awk '! /^[0-9]+$/' > usernames.txt
Repeat that for the non-Latin names, pages and places. The output should already be unique.
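
To show what the filter does, here is a small sketch on made-up lines, assuming the first semicolon-separated field is either a username or a numeric ID. The sample records and the file name sample_fbdata.txt are hypothetical. The last command is one way to double-check uniqueness by counting duplicated entries.

```shell
# Made-up records: field 1 is either a username or a numeric ID.
printf '%s\n' 'john.doe;John Doe' '100001234567;Jane Roe' 'jane.roe.5;Jane Roe' > sample_fbdata.txt
# Keep field 1 and drop the lines that consist only of digits.
usernames=$(cut -d';' -f1 sample_fbdata.txt | awk '! /^[0-9]+$/')
printf '%s\n' "$usernames"
# Count usernames that occur more than once; 0 means the list is already unique.
dupes=$(printf '%s\n' "$usernames" | sort | uniq -d | wc -l)
```

On the real files, a non-zero duplicate count would mean piping the list through `sort -u` before using it.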


Attached Files
.png   traffic.png (Size: 11.31 KB / Downloads: 68)


Messages In This Thread
Facebook's directory - by hops - 02-22-2015, 02:11 AM
RE: Facebook's directory - by Szulik - 02-22-2015, 10:58 AM
RE: Facebook's directory - by Saint - 02-22-2015, 11:32 AM
RE: Facebook's directory - by ati6990 - 02-22-2015, 12:54 PM
RE: Facebook's directory - by Si2006 - 02-22-2015, 01:08 PM
RE: Facebook's directory - by hops - 02-22-2015, 04:18 PM
RE: Facebook's directory - by ati6990 - 02-22-2015, 05:02 PM
RE: Facebook's directory - by Si2006 - 02-22-2015, 05:08 PM
RE: Facebook's directory - by Saint - 02-22-2015, 08:11 PM
RE: Facebook's directory - by _NSAKEY - 02-22-2015, 08:59 PM
RE: Facebook's directory - by rockroland - 02-11-2017, 09:45 PM
RE: Facebook's directory - by NoxFish - 03-08-2017, 01:27 PM