10-18-2024, 01:08 PM
Wow, please don't do it this way. No need to hammer Wikipedia's site - it's slow and unproductive.
Wikipedia publishes dumps here: https://dumps.wikimedia.org. Just parse them and extract the words. The nice thing is you can also pull article revisions and catch misspellings and other interesting stuff.
Back in 2011 I wrote a similar script, which parses the dump on the fly and puts the wordlist into an SQLite DB. It's Python 2 and was written for a specific case, but it can easily be adapted. Check it out here: https://sec.stanev.org/?download
If there's interest, I can clean it up a bit.
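
For anyone who wants the general idea without my old script, here's a minimal Python 3 sketch of the same approach: stream the bz2-compressed pages-articles dump through an incremental XML parser and count words into SQLite. The file name, table layout, and word regex are my assumptions, not taken from the original script, and there's no wikitext cleanup, so markup keywords will land in the list too.

import bz2
import re
import sqlite3
import xml.etree.ElementTree as ET

DUMP = "enwiki-latest-pages-articles.xml.bz2"  # assumed file name
WORD_RE = re.compile(r"[A-Za-z]{3,}")          # assumed: ASCII words, 3+ chars

db = sqlite3.connect("wordlist.db")
db.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, freq INTEGER)")

with bz2.open(DUMP, "rt", encoding="utf-8") as f:
    # iterparse streams the XML, so the multi-GB dump never sits in memory
    for event, elem in ET.iterparse(f, events=("end",)):
        # dump tags are namespaced, e.g. {http://www.mediawiki.org/xml/export-0.10/}text
        if elem.tag.endswith("}text") and elem.text:
            for word in WORD_RE.findall(elem.text):
                # upsert needs SQLite 3.24+; use INSERT OR IGNORE + UPDATE on older versions
                db.execute(
                    "INSERT INTO words VALUES (?, 1) "
                    "ON CONFLICT(word) DO UPDATE SET freq = freq + 1",
                    (word.lower(),),
                )
            elem.clear()  # drop the article text so memory stays flat
db.commit()
db.close()

Swap the regex for a Unicode-aware one if you're after a non-English wiki, and commit in batches if the single transaction gets too big for your taste.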