Copy and reuse dictionary cache
#1
Hi,

I have a dictionary of size 100GB. Every time I run hashcat, I need to wait at least 15 minutes for it to build the cache of this dictionary.
How can I copy this cache from machine to machine to avoid building the cache every time?

What is the path of this cache file?

Kind Regards.
#2
That's ... a big wordlist. This isn't a direct answer to your question, but you might consider:

- Splitting your dictionary into multiple chunks, using the `split` command on Unix-likes
- If the wordlist is a mashup from multiple sources, running the sources individually
~
#3
There are even bigger wordlists out there. This is my own optimized mashup.
I already did that and split it into 4 parts, but it's still slow.

You completely ignored my question: where is this cache stored? Or maybe it's not stored at all, and it's just kept in memory?

There must be a solution for this real problem.
#4
My having explicitly said "This isn't a direct answer to your question" isn't exactly "completely ignoring" your question, yes?

The canonical solution to this problem is to not do what you're doing. Just because there are lists bigger than 100GB doesn't mean that it's a good practice. This may not be the advice you're looking for, but it may be the advice you need. :)

Mashing up all of your lists into a single list is rarely necessary, and has no inherent efficiency gain. Multiple lists can be specified, or an entire directory name can be specified, on the hashcat command line.
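
For example (the hash mode and filenames here are just placeholders):

Code:
# straight mode accepts more than one dictionary ...
hashcat -m 0 -a 0 hashes.txt list1.txt list2.txt list3.txt

# ... or an entire directory of wordlists
hashcat -m 0 -a 0 hashes.txt wordlists/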

If the purpose of your 100GB wordlist was deduplication, it is not necessary to do this via a single massive wordlist (and it is less efficient than the alternatives, such as using rli from the hashcat-utils suite to deduplicate across multiple wordlists).
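
As a sketch (filenames are placeholders; see the hashcat-utils docs for rli's exact syntax):

Code:
# write to list1.deduped.txt only the lines of list1.txt that do not
# already appear in list2.txt or list3.txt
rli list1.txt list1.deduped.txt list2.txt list3.txt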

If the purpose of your 100GB wordlist is to optimize attack order, simply split the file into smaller chunks, and supply them to hashcat in order on the command line. The end result will be identical, but the dictionary cache building cost will be distributed across the number of chunks. If the wait time is larger than desired, increase the number of chunks.
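
A rough sketch of that workflow (chunk count and filenames are arbitrary):

Code:
# split the big list into 10 chunks without breaking lines
split -d -n l/10 biglist.txt chunk.

# supply the chunks in order; the cache-building cost is then paid
# per chunk rather than all at once up front
hashcat -m 0 -a 0 hashes.txt chunk.*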

But if you wish to persist in mashing up your wordlists, I'm not aware of a way to automatically distribute dictionary caches across installations. You could experiment with copying the file yourself, but I'm not sure how effective that will be.

On Linux, the dictstat2 file is in ~/.hashcat/. Wherever the default Windows hashcat directory is, that's where it will be on Windows.
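
If you do want to experiment with copying it, something like this is the obvious starting point (hostnames are placeholders, and the exact file name can vary between hashcat versions, so check what's actually in your ~/.hashcat/ first):

Code:
# copy the dictionary cache from the machine that already built it
scp ~/.hashcat/hashcat.dictstat2 user@otherbox:~/.hashcat/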
~
#5
Thanks for replying and answering my question. I appreciate your help. :)

Yes, my purpose is de-duplication and then splitting, but even with 4 splits it's still slow, so I need to make more chunks, as you said.

I wasn't aware of the rli tool; I used to use the sort command with the -u and -m parameters to remove duplicates. So rli will be more efficient because I don't need to rewrite the whole file the way the sort command does, right?

Regarding the cache file, dictstat2: if I always use the same machine (same hardware), will it be fine?
I think you mean that you are worried about the caching mechanism, that maybe it's affected by the hardware type?

Thanks again!
#6
I forgot to tell you that I use sort with --parallel= in order to utilize all CPU cores and make the operation faster.
Is rli multi-threaded?
#7
rli is for deduplication *across files* - see this example: https://hashcat.net/wiki/doku.php?id=hashcat_utils#rli

If you use 'split', you don't have to re-sort. Just use 'split' to take your existing 4 files and split each of them into 2 or 3 pieces.
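
Something like this, assuming the four existing parts are named part0 through part3:

Code:
# split each existing part into 3 smaller pieces, keeping lines intact
for f in part0 part1 part2 part3; do
    split -d -n l/3 "$f" "$f."
done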

I'm not sure if dictionary stats change based on hardware. But I do know (just from experience) that they have to be updated depending on the attack type. So the same dictionary may need to be updated more than once, if the attack changes.
~
#8
Thanks, royce!
Which is faster, rli or rli2?
#9
rli2 is definitely faster - once you've paid the initial cost of sorting the input files first. But it only accepts a single file to be removed as input.
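
Roughly like this (filenames are placeholders; as far as I recall, both inputs must be sorted and unique, and the result goes to stdout - check rli2's usage output to be sure):

Code:
# both inputs must be sorted and unique first
sort -u big.txt > big.sorted
sort -u remove.txt > remove.sorted

# keep only the lines of big.sorted that are not in remove.sorted
rli2 big.sorted remove.sorted > big.cleaned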

There's also a new project, 'rling', in progress (https://github.com/Cynosureprime/rling) that has some knobs to customize trade-offs. It's already useful, but it's still actively being debugged and modified, so YMMV, and support for it is probably off-topic here.
~
#10
It's unfortunate that rli2 doesn't support multiple files. A script will need to be made, then.
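
Something along these lines, perhaps (an untested sketch with placeholder filenames, assuming rli2 takes two sorted files and writes its result to stdout):

Code:
#!/bin/bash
# remove the contents of several sorted, unique remove-files from big.sorted,
# one rli2 pass per remove-file
cp big.sorted result.txt
for remove in remove1.sorted remove2.sorted remove3.sorted; do
    rli2 result.txt "$remove" > result.tmp
    mv result.tmp result.txt
done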

Regarding the topic of this thread, I found that there's an option:

Code:
--markov-hcstat2 | File | Specify hcstat2 file to use | --markov-hcstat2=my.hcstat2

I don't understand what this option does or whether it's related to caching the dictionary.

------------------------------------------

Also, one question regarding split, please.
I use the following command to split without breaking lines, but it still breaks lines, and the files are not even close to equal in size.

Code:
# split file into 3 chunks

split -d -n l/3 wpa.txt wpa


I'm using Debian 10 and the docs say:
Code:
l/N    split into N files without splitting lines/records


Code:
Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N  generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE  put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -x                      use hex suffixes starting at 0, not alphabetic
      --hex-suffixes[=FROM]  same as -x, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS    generate CHUNKS output files; see explanation below
  -t, --separator=SEP    use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose          print a diagnostic just before each
                            output file is opened
      --help    display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).

CHUNKS may be:
  N      split into N files based on size of input
  K/N    output Kth of N to stdout
  l/N    split into N files without splitting lines/records
  l/K/N  output Kth of N to stdout without splitting lines/records
  r/N    like 'l' but use round robin distribution
  r/K/N  likewise but only output Kth of N to stdout