Password Manager .rpf file analysis
A few days ago a hashcat user contacted me privately (via PM) and asked whether hashcat could deal with database files from RoboForm (a seemingly well-known and often recommended password manager).

Since I didn't remember seeing it in the --help output or on the example hashes page, I told them that hashcat most likely doesn't support it. I also didn't find much information about the format with simple web searches (besides the FAQ/marketing pages of the vendor itself). So at least these first searches suggested that the format is proprietary and hadn't been well analyzed yet (I didn't search very extensively after that, so maybe some analysis was already done and I just didn't find it).

I decided to see whether I could find out myself what RoboForm does with the sensitive data it is given (as far as the encrypted password storage is concerned).
So I downloaded and installed the RoboForm portable software, which was also used by the user asking for help with these .rpf files (they had a "Contact Info.rpf" file, the default file created by this password manager, which is enough to determine the master password). By the way, RoboForm seems to prioritize/push its WebExtension alternative instead: a browser extension for the most common browsers that is meant to sync with their servers all the time (I think there are some "offline" options, but its main idea is probably more like a "sync with all devices everywhere and we keep your passwords securely stored on our servers" password manager). I think one of the main reasons that RoboForm no longer really maintains/prioritizes the offline version (like RoboForm portable) is that modern browsers made it more complicated to integrate native software with the modern WebExtension standard (for instance, older Firefox extensions had more permissions/possibilities, and extensions are more limited now)... but I could also be wrong here.

Anyway, I decided to analyze this .rpf file format (offline RoboForm Passcode file), even though it probably won't be the most used format among RoboForm users in the future: the majority will probably use the browser extension directly and sync with the RoboForm servers (the WebExtension version seems to use .json files for temporary "encrypted" master password storage).

This is an example of such a file that I generated with the Windows software:

[Image: roboform_contact_info.jpg]

As you can immediately see, it is kind of a weird format. There seem to be 3 different parts/sections (URL3:ver3:, +PROTECTED-2+ and the final section after the two newlines).
This is also what I immediately noticed: it's not a "binary" format, and the number of distinct characters after the +PROTECTED-2+ is exactly 64. My first idea was that these are base64-encoded strings, but with a non-standard alphabet.

It's also important to note that you can use the software to protect and "unprotect" the file, so you can observe what is going on (e.g. what raw data these files actually store). For instance, you can immediately see that once the file is protected it suddenly becomes much, much longer (which is kind of weird: why should the encrypted data need that many more bytes?).
I also immediately noticed that there is a line-length limit of around 50 characters per line within the file (starting with the PROTECTED line). Funnily enough, whenever there are shorter lines, consecutive shorter lines together again add up to 50 characters: it seems that the output code doesn't take newline characters already present in the string into account and still inserts an additional newline after exactly 50 characters. So at first it looks weird that some lines are shorter than others, but given this strict newline-after-50-characters rule and the fact that some lines could already contain newline characters before the string was written to disk, it makes sense.
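A minimal Python sketch of splitting such a file into its three parts, assuming (from the description above and the example file) that the first part follows URL3:ver3: on the first line, the second part follows the +PROTECTED-2+ line, the third part follows the first empty line (two consecutive newlines), and every other newline is just wrapping. `split_rpf` is a hypothetical helper name, and the CRLF normalization is only a precaution:

```python
def split_rpf(text: str) -> tuple[str, str, str]:
    """Split .rpf text into (part1, part2, part3) with wrapping newlines removed."""
    # Assumption: the file may use Windows (CRLF) line endings
    text = text.replace("\r\n", "\n")
    head, sep, rest = text.partition("+PROTECTED-2+")
    if not sep:
        raise ValueError("no +PROTECTED-2+ marker found")
    if not head.startswith("URL3:ver3:"):
        raise ValueError("missing URL3:ver3: prefix")
    part1 = head[len("URL3:ver3:"):].replace("\n", "")
    # Parts 2 and 3 are separated by an empty line (two consecutive newlines)
    part2, sep, part3 = rest.partition("\n\n")
    if not sep:
        raise ValueError("no empty line separating parts 2 and 3")
    # All remaining newlines are only the ~50-character line wrapping
    return part1, part2.replace("\n", ""), part3.replace("\n", "")
```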
Back to the base64 encoding: if you create some .rpf files (this is what I did, but it turns out that a single file would have been enough to discover this) and look at all the different characters used below the PROTECTED line, it turns out that exactly 64 characters are used, from 0x20 to 0x5f (ASCII " " to "_"). This seems to be a convenient choice for the alphabet, because you can just add 32 (0x20) to each 6-bit value to encode the data (if the characters are used in sequence), so there is no need for a base64 lookup table etc. This was basically just an educated guess and I wasn't too sure about it yet, so I tried to base64-decode the parts with this encoding/decoding scheme.
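The guessed alphabet is essentially the character set uuencode uses (each 6-bit value v is written as chr(0x20 + v)). A decoding sketch, assuming the 6-bit groups are packed most-significant-bit first exactly like standard base64 and that no padding character is used (both are assumptions, not confirmed details of the RoboForm format). The hypothetical `rpf_b64encode` is only there to make round trips testable:

```python
def rpf_b64decode(encoded: str) -> bytes:
    """Decode base64 with the alphabet chr(0x20)..chr(0x5f), MSB-first bit order."""
    bits = 0    # bit accumulator
    nbits = 0   # number of not-yet-emitted bits in the accumulator
    out = bytearray()
    for c in encoded:
        if c in "\r\n":          # wrapping newlines carry no data
            continue
        value = ord(c) - 0x20
        if not 0 <= value < 64:
            raise ValueError(f"character {c!r} outside the 0x20..0x5f alphabet")
        bits = (bits << 6) | value
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1   # keep only the unread bits
    return bytes(out)


def rpf_b64encode(data: bytes) -> str:
    """Inverse of rpf_b64decode (no padding characters)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        groups = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # emit only the characters that carry data bits: 1 byte -> 2, 2 -> 3, 3 -> 4
        out.extend(chr(0x20 + g) for g in groups[:len(chunk) + 1])
    return "".join(out)
```

For comparison, standard base64 encodes b"abc" as "YWJj" (values 24, 22, 9, 35); with the 0x20 offset the same values become "86)C".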

It also turned out that only the final/third part of the file changes whenever you change the master password of the RoboForm file. The other parts "stay" the same.

Of course, by base64-decoding the parts I wasn't really able to see much (let alone the plaintext passwords)... after all the data/passwords had to be encrypted too of course!

I reviewed again which options the RoboForm software offers and what the RoboForm web page says about the encryption and key derivation.

The default option in the native software (and the description on their web site) indicates that they use AES encryption, and according to the web page the key derivation uses pbkdf2-hmac. Again, I just went all-in with this little bit of information and tested whether I could come up with a working AES decryption key or something similar (remember that I already had the base64-decoded, possibly wrongly decoded, encrypted data).

Long story short (I didn't plan to write a novel here after all!), I pretty quickly discovered (I admit I was pretty lucky; it normally doesn't work that easily) that 2 bytes within the first bytes of the base64-decoded data (the decoded version of the final part after the two newlines) always match part of the pbkdf2-hmac-sha1 output of the password, using 1000 iterations and the first 8 bytes of the decoded 3rd part of the file as the salt (i.e. the "data" input of the HMAC). Why did I even consider using the first few bytes as input? Well, most AES modes of operation need an IV, and my thought was that there is either a footer or a header where some "metadata" for the encryption is stored. Back to pbkdf2-hmac-sha1: whenever I changed the master password with the software, the 2 changed bytes also matched the pbkdf2 output of the new password! That's kind of weird. I immediately started wondering why parts of what I thought to be the "encryption key" were within the base64-decoded (and, as I thought until then, possibly "encrypted") data. That doesn't really make much sense.
BTW: by that time I had already manually generated at least hundreds of different .rpf files (using different passwords, all saved to disk to "compare" them)! So it still took me a little while, but the pattern with the 2 bytes was quite easy to discover given all these collected examples, because I knew from the RoboForm FAQ that pbkdf2-hmac was used and I also knew the passwords and base64-decoded data of all of these files (and of course HMAC always needs 2 inputs, the password and some data, but what could the data be? Some bytes from the start were my guess, and I was right!).

My conclusions (after some further investigation):
- the first 8 bytes of the base64 decoded 3rd part of the file seem to be random (something like an initialization vector was my idea)
- the next 2 bytes can be found within the pbkdf2-hmac-sha1 output of the password (the 33rd and 34th bytes of the pbkdf2-hmac-sha1 output)
- the next 10 bytes seem to be some kind of a file checksum (this took me a little bit, because it also seemed to be random data)
- the main data starts after these 8+2+10 = 20 bytes
- the main data is always exactly 1024 bytes long (at least if the raw data is short, i.e. below this limit)! (hmm, my thought: maybe the data is padded with random bytes before or after the main data?)
- we have a total base64 decoded length (again this is the investigation of the 3rd part of the file) of 8+2+10+1024=1044 bytes
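The layout from the list above, expressed as a small parser (a sketch; the field names are my own labels, not RoboForm's):

```python
from typing import NamedTuple


class RpfPart3(NamedTuple):
    iv: bytes        # bytes 1-8: random, IV-like
    check: bytes     # bytes 9-10: equal to bytes 33-34 of PBKDF2-HMAC-SHA1(password)
    checksum: bytes  # bytes 11-20: checksum, looks random
    data: bytes      # bytes 21 and up: main data, normally 1024 bytes


HEADER_LEN = 8 + 2 + 10


def parse_part3(decoded: bytes) -> RpfPart3:
    """Split the base64-decoded 3rd part of an .rpf file into its fields."""
    if len(decoded) < HEADER_LEN:
        raise ValueError(f"expected at least {HEADER_LEN} bytes, got {len(decoded)}")
    return RpfPart3(decoded[:8], decoded[8:10], decoded[10:20], decoded[20:])
```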

Fast forward again: with some further educated guesses and observations I found out that the first 16 bytes of the pbkdf2-hmac-sha1 output are used for the AES encryption and the next 16 bytes are used as the key for the file checksum (now it all started to make sense: 16 bytes for encryption/decryption, 16 bytes for calculating the checksum and 2 bytes that we had already found within the "header" of the encrypted data). Now all bytes up to the 34th byte of the pbkdf2 output are used somewhere. Bingo!
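A sketch of the key derivation and split described above, using Python's `hashlib.pbkdf2_hmac`. Which text encoding RoboForm applies to the master password before hashing isn't stated here, so the hypothetical `derive_rpf_keys` takes raw bytes (an assumption you would need to verify against a real file):

```python
import hashlib

ITERATIONS = 1000  # fixed in the files analyzed here


def derive_rpf_keys(password: bytes, salt8: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (aes_key, mac_key, check) derived with PBKDF2-HMAC-SHA1."""
    # salt8 = first 8 bytes of the decoded 3rd part of the file
    dk = hashlib.pbkdf2_hmac("sha1", password, salt8, ITERATIONS, dklen=34)
    aes_key = dk[0:16]   # bytes 1-16: AES-128-CTR key
    mac_key = dk[16:32]  # bytes 17-32: HMAC-SHA1 checksum key
    check = dk[32:34]    # bytes 33-34: also stored in the header (quick check)
    return aes_key, mac_key, check
```

Requesting 34 bytes gives exactly the first 34 bytes of any longer PBKDF2 output, so the split doesn't depend on how many bytes RoboForm actually requests.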

I tested some AES modes of operation and was lucky to find out pretty quickly that CTR mode is used (well, there aren't that many widely used AES modes after all, and once you find non-random-looking plaintext and the "note" you saved with the software within the decrypted output, you can be sure that you used the correct key and AES mode of operation). This (AES-128-CTR) is actually an interesting choice: the 16-byte ciphertext blocks do not depend on each other. This probably also makes data forensics/data recovery much, much easier. Each 16-byte block depends only on the key and the incrementing counter; it's not like CBC mode, where each ciphertext block is chained to the previous one and any change within the input bytes significantly changes all later blocks.
If you think about it, a hardware problem (e.g. bit flips caused by misbehaving RAM) could make the checksum invalid, but you could still recover the unaffected 16-byte blocks (and if you are lucky enough, exactly the blocks that contain the password data).
By the way, the key used for AES is 16 bytes long (AES-128-CTR, 16 * 8 = 128 bits), and the counter block (IV) must also be 16 bytes long, since that is the AES block size... but we only have 8 bytes within our 8+2+10 header. It turns out that the remaining 8 bytes of the IV are just zero bytes (you can simply append 8 NUL bytes to the first 8 bytes of the decoded 3rd part of the file, because the counter uses little-endian byte order anyway).

My assumption about the data always being filled up to the 1024-byte limit turned out to be true: the decrypted data had seemingly random data before the main data. It's also interesting that the random data never contains NUL bytes, except that its last byte is *always* a NUL byte (which seems to separate the random data from the important data, the notes/passwords etc).
So the decrypted data has some preceding garbage data (not containing, but ending with, a NUL byte). It is decrypted with AES-CTR using the first 16 bytes of the pbkdf2-hmac-sha1 (1000 iterations) key derived from the password. The counter uses little-endian byte order and RFC 3686-style incrementing, i.e. the first block uses IV + 1 as the counter and each following 16-byte block increments the counter by 1.
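A sketch of the decryption and padding removal, with the counter interpreted as (IV || 8 NUL bytes) read as a little-endian 128-bit integer, plus 1, plus the block index (my reading of the description above, not verified against RoboForm code). Python's standard library has no AES, so the AES-128 block encryption is passed in as a function; all helper names are hypothetical:

```python
from typing import Callable

# Encrypts one 16-byte block with AES-128 (raw block/ECB encryption) under the
# first 16 bytes of the PBKDF2 output. Supplied by the caller.
BlockEncrypt = Callable[[bytes], bytes]


def counter_block(iv8: bytes, block_index: int) -> bytes:
    """Counter block for the 0-based block_index."""
    # (IV || 8 NUL bytes) interpreted as a little-endian 128-bit integer ...
    base = int.from_bytes(iv8 + bytes(8), "little")
    # ... starting at IV + 1 for the first block, +1 for every further block
    return ((base + 1 + block_index) % (1 << 128)).to_bytes(16, "little")


def ctr_decrypt(ciphertext: bytes, iv8: bytes, encrypt_block: BlockEncrypt) -> bytes:
    """CTR mode: XOR each 16-byte ciphertext block with E(counter)."""
    out = bytearray()
    for offset in range(0, len(ciphertext), 16):
        keystream = encrypt_block(counter_block(iv8, offset // 16))
        chunk = ciphertext[offset:offset + 16]
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)


def strip_padding(plaintext: bytes) -> bytes:
    """Drop the random filler (no NUL bytes inside, terminated by one NUL byte)."""
    nul = plaintext.find(b"\x00")
    if nul == -1:
        raise ValueError("no NUL separator found (wrong key or corrupted data?)")
    return plaintext[nul + 1:]
```

With the third-party `cryptography` package, `Cipher(algorithms.AES(aes_key), modes.ECB()).encryptor().update` could serve as `encrypt_block`. Its built-in `modes.CTR` increments the counter as a big-endian integer, so it would not reproduce the little-endian counter described here, which is why the counter is built by hand.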
My feelings were kind of two-fold when I discovered all of these details: it's understandable that the random data was used to make it more difficult to misuse the checksum (explained below) to guess the sensitive password data etc... but overall I had the impression that most of the concepts used here just obscure the data rather than really protecting/securing it (security by obscurity?). I mean, why use such a weird proprietary file format? Why use a non-default base64 encoding? Why store some bytes of the key within the viewable (base64-decoded) output? (Well, I know that this lets you determine whether the password is correct without decrypting the data... but it's also kind of dangerous, because crackers can use it as an early-reject check too, even though with only 16 bits there will be "some" collisions.) Why not publicly document the format (so that, for instance, other password managers could easily import these files)? All in all it's a very strange format, and many things don't really make sense from a security point of view (I instead got the impression that they want to hide/protect something... but as you can see, it's way too easy to find out what is going on just by looking at which output bytes change and making educated guesses about the key derivation and encryption/decryption scheme).

What does all this mean for users who want to recover data from these files or recover the master password? Hashcat could probably easily add a new hash mode that computes pbkdf2-hmac-sha1 and checks the 2 bytes (the 9th and 10th bytes of the decoded 3rd part of the file must match the 33rd and 34th bytes of the pbkdf2-hmac-sha1 output). If they match, it could decrypt the last 16-byte block and check whether some expected bytes (like a sequence of carriage returns and line feeds) match, or alternatively whether the entropy is low enough to assume that the bytes were correctly decrypted with AES-128-CTR (the default encryption setting of the software; there are others too, but I don't think they are widely used).
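The early-reject step of such a hypothetical hash mode could look like this sketch (same password-encoding assumption as above; `quick_check_candidates` is a made-up name). With only 16 bits, roughly 1 in 65536 wrong candidates will also pass, so survivors still need the decryption/checksum check:

```python
import hashlib
from typing import Iterable, Iterator


def quick_check_candidates(salt8: bytes, stored_check: bytes,
                           candidates: Iterable[bytes]) -> Iterator[bytes]:
    """Yield candidates whose PBKDF2 bytes 33-34 equal the 2 stored header bytes."""
    for candidate in candidates:
        dk = hashlib.pbkdf2_hmac("sha1", candidate, salt8, 1000, dklen=34)
        if dk[32:34] == stored_check:
            # Only 16 bits matched: verify by decrypting / checking the HMAC next
            yield candidate
```

Since SHA-1 outputs 20 bytes, PBKDF2 computes these 34 bytes as two independent output blocks, and bytes 33-34 fall into the second one (bytes 21-40). A cracker could therefore compute only that second block for the early reject, i.e. 1000 instead of 2000 HMAC-SHA1 iterations per candidate.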

What happened with the original request of the user asking for help? Well, by the time I had figured everything out, they were already done (automatically!) testing several candidate passwords with something like AutoHotkey (only a few password candidates were needed to discover the correct password). Therefore, the usefulness of adding this scheme (and my motivation, hehe) is pretty low currently. I also have no clue how many users use this password manager and its offline version with these .rpf files. Well, if you want hashcat to support this, you should probably request it on GitHub.

By the way, I later also found out how the 2nd part of the file is encrypted. It always uses DES (Data Encryption Standard) with a fixed hex key: 1206101300000000. This part of the file does not depend on the password, so it doesn't change when the master password is changed; the key seems to be hard-coded and can easily be found multiple times by binary-grepping the executable/DLL (I think you need to search for 1200060010001300 because of unicode). It seems weird to me that such a useless form of protection (a static, hard-coded and easily greppable key) is used at all. At least this part of the file doesn't normally seem to contain very sensitive data. Again, you first need to base64-decode the data after +PROTECTED-2+ up to the 2 newlines (all other newlines within the file that do not separate the 3 "parts" can/should just be removed/skipped/ignored).
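The search string mentioned above is just the first 4 key bytes, each followed by a NUL byte (as 16-bit little-endian values would be stored). A small sketch for locating it in a binary read as bytes; the function names are hypothetical, and the DES mode of operation is not covered because it isn't stated:

```python
DES_KEY = bytes.fromhex("1206101300000000")


def utf16_style_pattern(key: bytes, nbytes: int = 4) -> bytes:
    """Interleave the first nbytes key bytes with NUL bytes (12 00 06 00 ...)."""
    return b"".join(bytes([b, 0]) for b in key[:nbytes])


def find_key_offsets(binary: bytes) -> list[int]:
    """Return every offset where the NUL-interleaved key pattern occurs."""
    pattern = utf16_style_pattern(DES_KEY)
    offsets = []
    start = 0
    while True:
        i = binary.find(pattern, start)
        if i == -1:
            return offsets
        offsets.append(i)
        start = i + 1
```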

The 10-byte checksum (after the 8-byte IV and the 2 bytes of the 34-byte key derived by pbkdf2-hmac-sha1) within the decoded 3rd part of the file is just a truncated version (the first 10 bytes) of an HMAC-SHA1 over the following 1024 bytes of raw data (the decrypted bytes, not the encrypted ones). I guess that in most situations the data stored by the user doesn't exceed this threshold; otherwise the HMAC would probably cover all remaining bytes. The key for this "hash"/checksum is the 16-byte substring of the pbkdf2-hmac-sha1 output at offset 16 (remember: the first 16 of the 34 bytes are used for the AES encryption, the next 16 bytes by the checksum algorithm, and the last 2 bytes, also found within the 8+2+10 header, for the "quick" password check).
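A sketch of the checksum verification with Python's `hmac` module, where `mac_key` is bytes 17-32 (offset 16) of the PBKDF2 output and `plaintext` is the full decrypted data, random filler included (my reading of the description above). The helper names are hypothetical:

```python
import hashlib
import hmac


def rpf_checksum(mac_key: bytes, plaintext: bytes) -> bytes:
    """HMAC-SHA1 over the decrypted data, truncated to the first 10 bytes."""
    return hmac.new(mac_key, plaintext, hashlib.sha1).digest()[:10]


def checksum_ok(mac_key: bytes, plaintext: bytes, stored: bytes) -> bool:
    """Compare against the 10 stored header bytes in constant time."""
    return hmac.compare_digest(rpf_checksum(mac_key, plaintext), stored)
```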

The first part of the file (after URL3:ver3:) is also kind of base64-encoded (not exactly! See the example below. It is also never encrypted), but it inverts all input bits (bitwise NOT?) and uses 0x40 as the ASCII alphabet offset o.O (there are rumors that some people like obscurity a lot!). The original/decoded string of the first part uses unicode (UTF-16) character encoding (i.e. what hashcat calls "unicode": for ASCII text, every 2nd byte is a NUL byte, lol. Remember, doing unicode correctly on GPU is hard/inefficient :( ), which is why the encoded form also has a very noticeable pattern. A full encoding example for one 2-byte unicode character goes like this (for instance if the raw data starts with an "h", i.e. the UTF-16LE bytes 0x68 0x00 or the 16-bit value 0x0068, like in "https://"):

  00000000 01101000 (0x00 0x68), now invert all bits (bitwise NOT?)

  11111111 10010111 (0xff 0x97), we have 2 times 8 = 16 bits

     1111  111110  010111 (make 18 out of 16 bits: 2 leading 0 bits)

   001111  111110  010111 (add 0x40 to each block)
 +1000000 1000000 1000000

  1001111 1111110 1010111 (0x4f 0x7e 0x57)

btw: this encoding is also very easy to figure out by just changing a single input character with the RoboForm software (this first part of the file contains the URLs, so by changing one character of the URL you can see which output bytes change and how). You will immediately notice that there is a 1:1 mapping between the input character code and the output characters.
It's not exactly the same as base64 encoding, because here every 2 bytes always become 3 characters (the ratio for base64 is 4:3), but it is similar enough (6 bits, i.e. 64 possibilities, per character) to easily recognize this strange encoding.
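The first-part encoding as an encode/decode pair (a sketch that treats the text as 16-bit code units, matching the UTF-16 interpretation of the example; the function names are hypothetical). Encoding "h" reproduces the 0x4f 0x7e 0x57 ("O~W") example above:

```python
def encode_part1(text: str) -> str:
    """Each 16-bit code unit -> NOT -> 4+6+6 bit groups -> chr(0x40 + group)."""
    out = []
    for ch in text:
        unit = ord(ch)
        if unit > 0xFFFF:
            raise ValueError("only BMP characters fit into one 16-bit code unit")
        v = ~unit & 0xFFFF                 # invert all 16 bits
        for shift in (12, 6, 0):           # 18 bits = 2 leading zero bits + 16 bits
            out.append(chr(0x40 + ((v >> shift) & 0x3F)))
    return "".join(out)


def decode_part1(encoded: str) -> str:
    """Inverse of encode_part1; wrapping newlines are ignored."""
    encoded = encoded.replace("\r", "").replace("\n", "")
    if len(encoded) % 3:
        raise ValueError("encoded length must be a multiple of 3")
    chars = []
    for i in range(0, len(encoded), 3):
        a, b, c = (ord(x) - 0x40 for x in encoded[i:i + 3])
        v = (a << 12) | (b << 6) | c
        chars.append(chr(~v & 0xFFFF))     # undo the bitwise NOT
    return "".join(chars)
```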

Anyway, to recover the protected data or the master password, only the third part of the file is needed.

Of course I have some sample data of each step if somebody is interested, but I think with the details I've discovered and I've mentioned above it should be pretty easy to decrypt your own test .rpf files. Feel free to PM me if you need more details/examples.

Both password recovery and data recovery (in case of corrupted file systems etc) should be pretty easy. The most important thing you need when it comes to file corruption is the initialization vector (if it's missing, you would have a hard time brute-forcing the random 8 bytes ;) ). Otherwise the bytes should be easy to recover if you know the password. If the data is heavily corrupted, you could even brute-force the password and the CTR counter used by the AES decryption, i.e. the index of the 16-byte block you are currently trying to decrypt (normally this index is known anyway!), but you need the 8-byte IV, otherwise you won't easily recover the data.
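Because every CTR block is independent, a single 16-byte ciphertext block can be decrypted on its own if you know the IV and the block index, and if the index is unknown you can try all 64 block indices of the 1024-byte payload. A sketch with the same assumed little-endian counter as described above; `encrypt_block` is an AES-128 block encryption you supply, and the "printable text" test is just one possible plausibility check (all names are hypothetical):

```python
from typing import Callable, List


def decrypt_single_block(ct_block: bytes, iv8: bytes, block_index: int,
                         encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """Decrypt one 16-byte CTR block without touching any other block."""
    base = int.from_bytes(iv8 + bytes(8), "little")   # IV || 8 NUL bytes, little-endian
    counter = ((base + 1 + block_index) % (1 << 128)).to_bytes(16, "little")
    keystream = encrypt_block(counter)
    return bytes(c ^ k for c, k in zip(ct_block, keystream))


def candidate_indices(ct_block: bytes, iv8: bytes,
                      encrypt_block: Callable[[bytes], bytes],
                      n_blocks: int = 64) -> List[int]:
    """Block indices for which the block decrypts to printable text."""
    hits = []
    for idx in range(n_blocks):
        pt = decrypt_single_block(ct_block, iv8, idx, encrypt_block)
        if all(32 <= b < 127 or b in (9, 10, 13) for b in pt):
            hits.append(idx)
    return hits
```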

What can we learn from all this? Probably not too much. Maybe just how easy it is to find out these "secrets"/formats/schemes, and that it's usually counter-productive to roll your own schemes/formats/crypto etc (don't try to hide something and don't try to be too clever; it doesn't help, because it's too easy to figure out what is going on. I assume that the main reason for this weird format was to make it difficult for users and alternative password managers to figure out what is going on, but I could be wrong). I'm also not convinced that part of the key (2 bytes) should be stored within the file, that the iterations should be limited to 1000, or that they should not be configurable (I didn't check whether the browser extension or other RoboForm products use a different key derivation function with more secure/configurable parameters).

Just to be very clear: this is not really a vulnerability and it doesn't imply a major security risk, because the 1000 iterations of pbkdf2-hmac-sha1 still make decrypting a file without knowing the password hard (if your master password is not easily guessable/crackable).
(06-06-2018, 06:47 AM)philsmd Wrote: RoboForm (a seemingly well-known and often recommended password manager).
I am doubtful. Never heard of it and obviously for good reasons.

However, thank you for the detailed analysis! Luckily they don't solely rely on obscurity… it makes it safer moving the passwords saved there to a more reputable password manager. Hopefully their (P)RNG for generating passwords is any good.