Now, thinking about the padding attack... I think an alternative could work without revealing the whole encseed.
The thing is that the pre-sale algorithm uses padding (as already mentioned a couple of times above) and AES-128-CBC, see https://en.wikipedia.org/wiki/Block_ciph...ning_(CBC).
It might be possible to reveal only the last x blocks of encseed (each block is 16 bytes) and test certain bytes within the padding, for instance the padding length: the last byte holds the length y, and that same value is repeated across the last y bytes. This works even with CBC block cipher mode, but the decrypted block has to be XORed with the previous ciphertext block, so at least 2 blocks of encseed must be revealed. This could lead to some false positives, but it could nonetheless filter out a lot of "impossible passwords".
Fortunately, the pre-sale algorithm uses PKCS#7 padding (see https://github.com/ethereum/go-ethereum/...#L119-L138), which could be misused as a "test"/filter. As far as I know, the padding length is more or less known, because we know the raw seed length.
I think this could be the only strategy that avoids revealing the whole encseed value while still filtering out many unlikely/impossible passwords.
Attention: an attack like this would need the last 32 bytes (or slightly more) of the encseed, not the bytes at the very beginning.
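To make the idea concrete, here is a minimal Python sketch of the check described above. It relies only on the CBC property that the last plaintext block equals the decrypted last ciphertext block XORed with the previous ciphertext block. The names `last_plaintext_block`, `pkcs7_pad_len` and `toy_cipher` are made up for illustration, and `toy_cipher` is NOT AES: it is a trivial XOR stand-in so the sketch runs without third-party packages. A real filter would plug in a single-block AES-128 decryption with the key derived from the password candidate.

```python
from typing import Callable, Optional

BLOCK = 16  # AES block size in bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def last_plaintext_block(decrypt_block: Callable[[bytes], bytes], tail: bytes) -> bytes:
    """CBC: P_n = D_k(C_n) XOR C_(n-1), so two ciphertext blocks are enough."""
    if len(tail) < 2 * BLOCK or len(tail) % BLOCK != 0:
        raise ValueError("need at least two whole ciphertext blocks")
    prev_block, last_block = tail[-2 * BLOCK:-BLOCK], tail[-BLOCK:]
    return xor_bytes(decrypt_block(last_block), prev_block)


def pkcs7_pad_len(block: bytes) -> Optional[int]:
    """Return the PKCS#7 padding length, or None if the padding is invalid."""
    n = block[-1]
    if 1 <= n <= BLOCK and block[-n:] == bytes([n]) * n:
        return n
    return None


def toy_cipher(key: bytes) -> Callable[[bytes], bytes]:
    """Hypothetical stand-in for AES block decryption (plain XOR, not secure)."""
    return lambda block: xor_bytes(block, key)


if __name__ == "__main__":
    key = bytes(range(16))
    prev = bytes([0xAA]) * BLOCK                 # second-to-last ciphertext block
    plain = b"seedseed" + bytes([8]) * 8         # 8 data bytes + 8 bytes of 0x08
    last = xor_bytes(xor_bytes(plain, prev), key)  # toy CBC encryption of the block
    tail = prev + last                           # the only part that must be revealed

    print(pkcs7_pad_len(last_plaintext_block(toy_cipher(key), tail)))        # right key
    print(pkcs7_pad_len(last_plaintext_block(toy_cipher(bytes(16)), tail)))  # wrong key
```

A candidate is kept only if `pkcs7_pad_len` returns a value (ideally the one expected from the seed length); everything else can be rejected without ever seeing the earlier blocks.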
I didn't mention it above, but needless to say, this new algorithm would also have to be implemented in hashcat, with a kernel function slightly different from the one currently in use.
It's also important to note that this type of padding attack is most effective if you know exactly how long the padding is. For instance, if you know that the seed length is always a multiple of 16, you know that the padding is exactly 16 bytes, i.e. 0x10101010101010101010101010101010 (16 times the value 16; 0x10 is 16 in decimal).
If the padding is just 1 byte (i.e. 0x01), the chance of false positives is very high: roughly 1 in 256 wrong passwords would still pass, assuming a wrong key decrypts to essentially random bytes. If it is two bytes, i.e. 0x0202, there will still be a lot of false positives (roughly 1 in 65536), and so on. The longer the padding (the range is from 1 to 16 bytes), the fewer false positives you need to expect.
This is why it helps to know the raw seed length; for instance, you could test for a single padding only, e.g. 0x10101010101010101010101010101010, if you know that the length of the raw (not encrypted) seed is always a multiple of 16.
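To put rough numbers on the effect of the padding length, here is a small Python sketch. It assumes that decrypting with a wrong key gives essentially uniformly random bytes, so a specific padding of n bytes matches by chance with probability 256^-n. The function names are hypothetical and only serve this illustration.

```python
def false_positive_rate(pad_len: int) -> float:
    """Chance that a wrong key yields exactly pad_len bytes of value pad_len.

    Assumption: a wrong key decrypts to uniformly random bytes.
    """
    if not 1 <= pad_len <= 16:
        raise ValueError("PKCS#7 padding for a 16-byte block is 1..16 bytes")
    return 256.0 ** -pad_len


def expected_false_positives(candidates: int, pad_len: int) -> float:
    """Expected number of wrong passwords that survive the padding filter."""
    return candidates * false_positive_rate(pad_len)


if __name__ == "__main__":
    for n in (1, 2, 4, 16):
        print(n, expected_false_positives(10**9, n))
```

Under that assumption, out of a billion candidates about 3.9 million would survive a 0x01 check and about 15 thousand a 0x0202 check, while a full 16-byte padding block leaves practically none.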
update2: I just tried to investigate to see if there are known weaknesses in how the pre-sale algorithm generated the random seed.
I didn't really find answers by just searching for it, but I discovered this interesting change in the seed generation code, committed a couple of years ago:
https://github.com/ethereum/pyethsaletoo...0913fee51e
If I'm interpreting this correctly, the seed was previously (before the patch) generated as a hex string, then encrypted, and then hex encoded again (basically it was hex encoded twice, before and after the encryption). This would mean that all those long encseed values (for instance the ones with 1248 hex chars and therefore 624 bytes) have a very reduced charset (lower entropy per byte) once decrypted with the correct password.
If this turns out to be true, one could attack this by decrypting the encseed and checking whether all the chars are within the range 0-9a-f (all hexadecimal values). This would actually be quite a scandal.
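As a rough sketch of that check in Python: hex-encoding a random seed doubles its length and leaves only 16 possible byte values, so a decrypted fragment can be tested against that set. The name `looks_like_hex` is made up for this example, and lowercase hex output is an assumption based on the 0-9a-f range mentioned above.

```python
import binascii
import os

HEX_CHARS = frozenset(b"0123456789abcdef")  # assumption: lowercase hex only


def looks_like_hex(data: bytes) -> bool:
    """True if every byte of a decrypted fragment is an ASCII hex digit."""
    return len(data) > 0 and all(b in HEX_CHARS for b in data)


if __name__ == "__main__":
    raw_seed = os.urandom(32)              # stand-in for a freshly generated seed
    hexed = binascii.hexlify(raw_seed)     # what would have been encrypted
    print(len(raw_seed), len(hexed))       # hex encoding doubles the length
    print(looks_like_hex(hexed))           # a correctly decrypted block passes
    print(looks_like_hex(b"\x00\xff" * 8)) # arbitrary bytes do not
```

If a wrong key decrypts to random bytes, each byte passes with probability 16/256, so a single 16-byte block passes by chance with probability (1/16)^16 = 2^-64.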
Unfortunately, I have no example to prove that this was really the case for the accounts with a very long encseed. If someone could share a real-world example (with the password, so that we can decrypt it), that would be very nice.
Unlike the padding attack, this low-entropy check could also be done by decrypting the first bytes (or the last bytes before the padding, or any substring that is not padding).
It would therefore also work without sharing the whole encseed, and so without the risk of exposing the private key.
I'm not yet 100% convinced that an attack like this is possible, because I have no examples, but the commit above seems to indicate that this low-entropy attack is possible for a small set of accounts with very long encseed values (which were hex encoded twice by accident).
update3: I've explained here https://github.com/ethereum/mist/issues/...-366007905 how I think the pre-sale website generated the wallet file, and that it also used a reduced charset (!!! wtf?) consisting of the characters "abcdefghijklmnopqrstuvwxyz234567" (see the app.min.js file linked in the issue). This could be used to filter out password candidates that decrypt the encseed to a seed that does not consist of those characters (like the method explained in update2).
It turns out that the web app was also on GitHub, so you can run the web server locally and create your own wallet files with that code. I did some tests here: https://pastebin.com/raw/aXrn4t8d
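The charset filter from update3 could be sketched in Python roughly like this. The alphabet is taken from the text above; `only_uses`, `survivors` and the `decrypt_fragment` callback are hypothetical names, and the callback stands for whatever derives the key from a password candidate and decrypts a few non-padding blocks of encseed.

```python
from typing import Callable, Iterable, Iterator

SEED_ALPHABET = b"abcdefghijklmnopqrstuvwxyz234567"  # charset quoted in the post


def only_uses(data: bytes, alphabet: bytes = SEED_ALPHABET) -> bool:
    """True if data is non-empty and contains only bytes from alphabet."""
    # translate(None, alphabet) deletes every allowed byte; any leftover
    # byte is a charset violation.
    return len(data) > 0 and not data.translate(None, alphabet)


def survivors(candidates: Iterable[str],
              decrypt_fragment: Callable[[str], bytes]) -> Iterator[str]:
    """Yield only the candidates whose decrypted fragment fits the alphabet."""
    for password in candidates:
        if only_uses(decrypt_fragment(password)):
            yield password


if __name__ == "__main__":
    # Fake decryption results, for illustration only.
    fake = {"right": b"seed22abcdefghij", "wrong": b"\x00\x17\xfe" + b"abc"}
    print(list(survivors(fake, fake.__getitem__)))
```

With 32 allowed values out of 256, each random byte passes with probability 1/8, so one 16-byte block lets a wrong candidate through with probability 8^-16 = 2^-48, again assuming a wrong key produces random-looking output.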