Wordlist optimisation based on ruleset
#3
Thank you for your reply; rurasort looks interesting.


I'm still thinking about how I would build this without making it painfully slow. It might be really complicated and not worth it.

EDIT:
Ouch, it's only reading at 2-3 MB/s while hitting 100% on a single core. I'll see if I can improve on this; if I do, I'll post it here.

EDIT2:

I'm really bad at keeping up with forum posts, so I'm just going to post the rough piece of code.

rurasort.py --digit-trim --special-trim --lower: 34.844s for 10M lines
The code below, doing the same thing: 2.503s for 10M lines

Code:
import multiprocessing as mp
import os

path = "/mnt/NVMe/wordlist_10M.txt"
cores = 8

def process(line):
    # trim leading/trailing digits, then leading/trailing specials, then lowercase
    newstring = line.strip('0123456789')
    newstring = newstring.strip("!\"#$%&'()*+,-./:;?@[\\]^_`{|}~")
    print(newstring.lower())


def process_wrapper(chunkStart, chunkSize):
    # each worker re-opens the file and handles one byte range
    with open(path, 'rb') as f:
        f.seek(chunkStart)
        lines = f.read(chunkSize).splitlines()
        for line in lines:
            process(line.decode('utf-8', errors='ignore'))

def chunkify(fname, size=1024*1024):
    # yield (start, length) byte ranges that always end on a line boundary
    fileEnd = os.path.getsize(fname)
    with open(fname, 'rb') as f:
        chunkEnd = f.tell()
        while True:
            chunkStart = chunkEnd
            f.seek(size, 1)   # jump roughly one chunk forward
            f.readline()      # then advance to the end of the current line
            chunkEnd = f.tell()
            yield chunkStart, chunkEnd - chunkStart
            if chunkEnd >= fileEnd:
                break

if __name__ == '__main__':
    #init objects
    pool = mp.Pool(cores)
    jobs = []

    #create jobs
    for chunkStart, chunkSize in chunkify(path):
        jobs.append(pool.apply_async(process_wrapper, (chunkStart, chunkSize)))

    #wait for all jobs to finish
    for job in jobs:
        job.get()

    #clean up
    pool.close()
    pool.join()
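One caveat with the version above: every worker writes to the same stdout, so when the output is redirected to a file, lines from different processes can end up interleaved. A minimal sketch of an alternative, reusing chunkify(), path and cores from the script above and writing to a hypothetical cleaned_10M.txt, has each worker return its cleaned chunk so the parent does all the writing:

Code:
import multiprocessing as mp

# assumes chunkify(), path and cores are defined as in the script above
def process_chunk(args):
    # worker: clean one byte range and return the lines instead of printing them
    chunkStart, chunkSize = args
    out = []
    with open(path, 'rb') as f:
        f.seek(chunkStart)
        for line in f.read(chunkSize).splitlines():
            s = line.decode('utf-8', errors='ignore')
            s = s.strip('0123456789')
            s = s.strip("!\"#$%&'()*+,-./:;?@[\\]^_`{|}~")
            out.append(s.lower())
    return out

if __name__ == '__main__':
    # 'cleaned_10M.txt' is just a placeholder output path
    with mp.Pool(cores) as pool, open('cleaned_10M.txt', 'w') as dst:
        # imap streams the cleaned chunks back in order as the workers finish them
        for chunk in pool.imap(process_chunk, chunkify(path)):
            dst.write('\n'.join(chunk) + '\n')

I haven't benchmarked this variant against the print() version, so treat it as a starting point rather than a drop-in replacement.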