11-22-2011, 03:56 AM
"Finding and creating effective input dictionaries is a non-trivial problem."
From Matt Weir's "Using Probabilistic Techniques to aid in Password Cracking Attacks" found at his tools site, under Presentations and Papers
http://sites.google.com/site/reusablesec/
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Some URLs from my notes:
http://en.wikipedia.org/wiki/Corpus_of_American_English
http://corpus.byu.edu/coca/wordfreq.asp?s=y
http://www.wordfrequency.info/
http://googleresearch.blogspot.com/2006/...o-you.html
http://googlesystem.blogspot.com/2010/12...iewer.html
http://ngrams.googlelabs.com/datasets
http://www.english-for-students.com/Words-List.html
http://en.wikipedia.org/wiki/American_National_Corpus
http://www.americannationalcorpus.org/frequency.html
http://www.anc.org/MASC/Download.html
http://en.wikipedia.org/wiki/British_National_Corpus
http://ucrel.lancs.ac.uk/bncfreq/flists.html
http://www.kilgarriff.co.uk/bnc-readme.html
http://www.natcorp.ox.ac.uk/corpus/index...D=products
http://faculty.washington.edu/dillon/Gra...ml#wintree
http://courses.washington.edu/englhtml/e...ources.htm
http://www.ota.ox.ac.uk/catalogue/index-id.html
http://wwwm.coventry.ac.uk/researchnet/B.../BAWE.aspx
http://conc.lextutor.ca/tuples
http://pie.usna.edu/
http://web-ngram.research.microsoft.com/info/
http://en.wiktionary.org/wiki/Wiktionary...ency_lists
http://www.pitt.edu/~naraehan/ling2050/r...rpora.html
http://xaira.sourceforge.net/
http://www.oucs.ox.ac.uk/rts/xaira/
http://www.americannationalcorpus.org/xaira.html
http://www.webcorp.org.uk/guide/howworks.html
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Unlike the linguists, all we want are the base words, which may be more accessible from frequency lists.
Full-blown corpus tools may be needed to extract those words from formal corpora.
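As a rough sketch of pulling base words out of a frequency list: the snippet below assumes each line is "word count" separated by whitespace, which is only a guess at the layout. The lists linked above each use their own columns, so the field positions would need adjusting per source. The function name base_words and the min_len cutoff are made up for illustration.

```python
import re
from collections import Counter

# Keep only plain lowercase alphabetic tokens as candidate base words.
WORD_RE = re.compile(r"^[a-z]+$")

def base_words(lines, min_len=3):
    """Return base words ordered by descending total frequency.

    Assumes (hypothetically) that each line looks like "word count".
    Lines that do not fit that shape are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        word, freq = fields[0].lower(), fields[1]
        if not freq.isdigit():
            continue  # header, comment, or unexpected column layout
        if len(word) < min_len or not WORD_RE.match(word):
            continue  # drop very short tokens, contractions, punctuation
        counts[word] += int(freq)  # merge case variants into one entry
    # Most frequent first; ties broken alphabetically for stable output.
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

sample = [
    "the 500",
    "Password 120",
    "password 30",
    "it's 80",
    "dragon 90",
    "# comment line",
    "a 400",
]
words = base_words(sample)
print(words)
```

Ordering by frequency puts the most common words at the top of the dictionary, so they get tried first.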