foreign words and --remove
#2
For question number 1, it depends on the type of hashes and also on your hardware (SSD, RAM, CPU etc.). In my experience, you shouldn't bother too much about removing hashes unless you crack (and therefore need to remove) at least 15 % of them. I mean, if hashcat only removes a few hundred hashes from a multi-million hash list, the remove wouldn't change much at all: the change is negligible, and the time spent removing this < 1 % of the hashes might be much higher than the time needed to load them.
--remove mainly has an impact on the initialization/startup phase, when the hashes are being loaded. Hashcat will remove (and therefore ignore) all hashes that are already present in the pot file anyway (unless you disable potfile support with --potfile-disable). The cracking speed will be almost exactly the same as if you had not loaded those hashes at all (i.e. as if they had already been removed externally or with --remove on a previous run). Therefore, the main impact is on the first few seconds, when the hashes need to be parsed and checked (but as I already mentioned above, this is negligible if you remove only very few hashes from a multi-million hash list).
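
If you want a rough feeling for how large the cracked share of your list is, something like the following sketch could help. It only illustrates the idea of "removing externally". The file names are made up, and it assumes unsalted hashes (one per line) and pot file lines of the form hash:plain, so it will not work as-is for salted formats that contain extra colons.

```shell
# Hypothetical sample data, just for illustration.
workdir=$(mktemp -d)
printf '%s\n' aaa111 bbb222 ccc333 > "$workdir/hashes.txt"
printf '%s\n' 'bbb222:password' > "$workdir/cracked.pot"

# Take the hash part of each pot file line (assumes hash:plain, unsalted).
cut -d: -f1 "$workdir/cracked.pot" > "$workdir/cracked_hashes.txt"

# Keep only the hashes that are NOT in the pot file (exact, fixed-string match).
grep -v -x -F -f "$workdir/cracked_hashes.txt" "$workdir/hashes.txt" > "$workdir/remaining.txt"

total=$(wc -l < "$workdir/hashes.txt" | tr -d ' ')
remaining=$(wc -l < "$workdir/remaining.txt" | tr -d ' ')
percent=$(( (total - remaining) * 100 / total ))

echo "$remaining of $total hashes left, about $percent % already cracked"
```

With a real list you would compare the printed percentage against the rule of thumb from above before deciding whether removal is worth the effort.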

As for question number 2, it depends on the encoding. hashcat (and actually the hashing algorithms themselves) works on a byte-by-byte level. That means the same character ñ can end up as different bytes, depending on the character encoding. If it is encoded in utf8, it takes 2 bytes:
Code:
echo -n ñ | xxd -p
c3b1

Masks also work on a byte-by-byte level. Therefore, this character (if utf8 was the encoding of the input that was fed to the original hash generation algorithm) needs a mask of length 2 just to crack "one character": for instance a mask of ?b?b (but of course you can use a much more specific mask of length 2).
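
As a small sketch of that byte counting (not something hashcat does for you): the snippet below counts the bytes of a utf8 string and builds a generic ?b mask of the same length. The variable names are my own, and the ñ is written as octal escapes so the result does not depend on the encoding of your terminal or script file.

```shell
# ñ as its two utf8 bytes 0xc3 0xb1, written as octal escapes.
word=$(printf '\303\261')

# Count bytes, not characters.
nbytes=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Repeat "?b" once per byte: %.0s consumes each argument without printing it.
mask=$(printf '?b%.0s' $(seq 1 "$nbytes"))

echo "$nbytes bytes -> mask $mask"
```

For the ñ above this gives a mask of ?b?b, which matches the length-2 mask mentioned before.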

On the other hand, if a different encoding was used (attention: you need to generate it yourself, since the forum software will always convert it to utf8), for instance this one (ISO-8859-1):
Code:
echo -en "\xf1"
ñ
only a mask of length 1 is needed, since hex 0xf1 is just one byte long (compared to 0xc3b1 which is 2 bytes long for utf8).

Yeah, character encoding is kind of difficult (even for some experienced programmers/"experts"). The good thing is that a lot of hashes out there just use utf8, so you do not need to waste too much time finding the correct character encoding... but the problem is that utf8/utf16/utf32 etc. use multi-byte characters, and therefore the masks need to reflect this by using the correct length (e.g. 2 vs 1 in the example above).
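
If you are unsure how many bytes a character needs in a given encoding, you can simply convert it and count. This is only a sketch: it assumes iconv is installed and knows these encoding names, and enc_len is a helper name I made up for this example.

```shell
# Print how many bytes ñ (given here as utf8 bytes 0xc3 0xb1) takes
# after conversion to the encoding named in $1.
enc_len() {
  printf '\303\261' | iconv -f UTF-8 -t "$1" | wc -c | tr -d ' '
}

for enc in UTF-8 ISO-8859-1 UTF-16LE UTF-32LE; do
  echo "$enc: $(enc_len "$enc") byte(s)"
done
```

The number printed for each encoding is the mask length you would need for this one character if the hashed input used that encoding.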


Messages In This Thread
foreign words and --remove - by rsberzerker - 08-09-2017, 04:56 AM
RE: foreign words and --remove - by philsmd - 08-09-2017, 08:57 AM
RE: foreign words and --remove - by epixoip - 08-09-2017, 07:25 PM
RE: foreign words and --remove - by atom - 08-10-2017, 04:06 PM