Language Character Sets
#1
atom is changing the way oclhashcat-plus deals with different character sets.


atom Wrote: My idea is to drop the language-specific charset presets from the binary and add them as charset files that can be used with the -1 parameter. That way we can produce lots of files, one for each language.

Sure, it's work, but it's also quality work. So we should do it.


What is needed now is for native speakers of languages other than English to contribute. Spanish is already being worked on by rub3nCT and dudux.

So, if you would like your language's special character set to be included, please post your files here, following atom's instructions.

atom Wrote: What I need is the data as a file. Can you please attach all the data as files? Please also make sure it's in the correct encoding.

Also make sure its not including ?l, ?d, ?s or ?u.

What the above means is remove...

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

...from your character set and supply only what is unique to your language.

Thanks.
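A minimal sketch of that filtering step in Python, assuming you start from a string containing every character your language uses. The Spanish sample string and the `language_only` helper are illustrative only and are not the actual Spanish contribution mentioned above.

```python
import string

# Characters already covered by the built-in ?l, ?u, ?d and ?s sets,
# matching the list quoted in the post above (letters, digits, space, punctuation).
BUILTIN = set(
    string.ascii_lowercase
    + string.ascii_uppercase
    + string.digits
    + " "
    + string.punctuation
)

def language_only(chars: str) -> str:
    """Drop built-in characters and duplicates, keeping the original order."""
    seen = set()
    kept = []
    for ch in chars:
        if ch not in BUILTIN and ch not in seen:
            seen.add(ch)
            kept.append(ch)
    return "".join(kept)

# Hypothetical sample input, not a complete Spanish character set.
sample = "abcñáéíóúü¿¡123!?"
unique_chars = language_only(sample)
print(unique_chars)
```

Only the characters outside the built-in sets remain, which is what the charset file should contain.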
#2
I could do Russian, but I'd like to know which encoding I should use for the file itself: the native one (cp-1251) or something universal (utf-7/8)?
#3
Definitely a native encoding. UTF would defeat the purpose of this.
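One way to see the difference, sketched in Python: in a native single-byte code page each Cyrillic letter is exactly one byte, while UTF-8 needs two bytes per letter. The assumption here (not stated in the thread) is that a charset file is consumed byte by byte, so each byte should correspond to one character. The sample word is arbitrary.

```python
# Arbitrary Russian sample word, used only to compare encoded lengths.
sample = "пароль"

native = sample.encode("cp1251")  # native single-byte Windows code page
utf8 = sample.encode("utf-8")     # multi-byte universal encoding

print(len(sample), len(native), len(utf8))
```

With cp1251 the byte count equals the character count; with UTF-8 it doubles for this word.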
#4
Lithuanian:
Encoding Windows-1257
Not used characters: qwxQWX
addition to English chars: ąčęėįšųūžĄČĘĖĮŠŲŪŽ
#5
(02-06-2013, 08:03 AM)Rolf Wrote: I could do Russian,

Thank you very much for helping. :)
#6
(02-06-2013, 11:25 AM)KT819GM Wrote: Lithuanian:
Encoding Windows-1257
Not used characters: qwxQWX
addition to English chars: ąčęėįšųūžĄČĘĖĮŠŲŪŽ

Thank you for helping. :)

atom said he wants the special characters unique to your language in a file which is encoded properly. I have to admit I don't understand what he means by encoded properly but it looks as if epixoip might know.
#7
(02-06-2013, 11:25 AM)KT819GM Wrote: Lithuanian:
Encoding Windows-1257
Not used characters: qwxQWX
addition to English chars: ąčęėįšųūžĄČĘĖĮŠŲŪŽ

Posting them to the forum will not work. They need to be in a file encoded as Windows-1257.
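A minimal sketch of producing such a file with Python, using the Lithuanian characters posted above. The `write_charset` helper and the output file name are hypothetical, and the file is written to the system temp directory only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

# The Lithuanian additions exactly as posted in the thread.
LITHUANIAN = "ąčęėįšųūžĄČĘĖĮŠŲŪŽ"

def write_charset(chars: str, path: Path, encoding: str) -> bytes:
    """Encode chars in the given code page and write the raw bytes, no newline."""
    # encode() raises UnicodeEncodeError if a character is missing from the code page.
    data = chars.encode(encoding)
    path.write_bytes(data)
    return data

# Hypothetical file name; pick whatever naming the project settles on.
out_path = Path(tempfile.gettempdir()) / "lithuanian_cp1257.charset"
data = write_charset(LITHUANIAN, out_path, "cp1257")

print(len(LITHUANIAN), len(data))
```

Writing bytes directly avoids a trailing newline or byte order mark, either of which would end up as an extra entry in the character set.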
#8
Lithuanian special characters

I thought about making a proper file before posting, but it seems I cannot attach anything on the forum. I hope everything is correct now: after extracting the zip, the file contains only the special characters used in the Lithuanian language, encoded in Windows-1257.
#9
Dar she is.
It's win-1251 and also, I've included "№" there for good reason - it's not a part of ?s, but it's present on Russian keyboards.
Okay.
If you don't like it, it's the first character, should be easy to get rid of.

Can't upload anything, so I'll do something unorthodox and post the file as hex:
Code:
B9A8C9D6D3CAC5CDC3D8D9C7D5DADDC6C4CBCED0CFC0C2DBD4DFD7D1CCC8D2DCC1DEB8E9F6F3EAE5EDE3F8F9E7F5FAFDE6E4EBEEF0EFE0E2FBF4FFF7F1ECE8F2FCE1FE

Now, I suspect that anyone with a hex editor may create that file.
Also, Russian alphabet has 33 characters, so it's № + 33 chars in uppercase + 33 chars in lowercase.

Of course, when (if) atom fixes the upload of files, I'll provide you with the charset file.
The proper way of doing things.
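For anyone without a hex editor, a short Python sketch can rebuild the file from the hex string above and check that it decodes as described: the numero sign first, then 33 uppercase and 33 lowercase letters. The output file name is hypothetical, and the file goes to the system temp directory only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

# Hex dump copied from the post above.
HEX = (
    "B9A8C9D6D3CAC5CDC3D8D9C7D5DADDC6C4CBCED0CFC0C2DBD4DFD7D1CCC8D2DCC1DE"
    "B8E9F6F3EAE5EDE3F8F9E7F5FAFDE6E4EBEEF0EFE0E2FBF4FFF7F1ECE8F2FCE1FE"
)

data = bytes.fromhex(HEX)     # raw cp1251 bytes, one byte per character
text = data.decode("cp1251")  # decoded only for a visual sanity check

# Hypothetical file name; the raw bytes are written unchanged.
out_path = Path(tempfile.gettempdir()) / "russian_cp1251.charset"
out_path.write_bytes(data)

print(len(data))
print(text)
```

To drop the leading numero sign, write `data[1:]` instead of `data`.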
#10
@KT819GM
@Rolf

Wow, thank you very much guys for your contribution! :)

I am sorry about the attachments problem. I don't know what's wrong; atom might read this thread and fix it. Please keep hold of your files, and as soon as the Trac is working again I will give you guys a link to join a Trac ticket where you can upload files.

I really appreciate your help.

I know Spanish is on the way and I think I know someone I can ask for a French language file.