> user passwords are saved in safe form using hash_hmac and sha256 algo with salt and pepper, all md5() passwords are deleted
Wait, what? Definitely lesson not learned:
- sha256 is not the proper way to store passwords, it's still vulnerable to the same attack as md5, rainbow tables, because it's a FAST algorithm (sure md5 is also poor for collisions, meaning it's worse, but practical attacks for lists of hashed passwords are rainbow tables). At least with salt+pepper it limits attack surface, but instead you should be hashing your passwords with:
- `password_hash`[1] should be used instead of `hash_hmac`, with the algorithm being "PASSWORD_BCRYPT" or better[2]. This is a slow hashing method, meaning anyone trying to rainbow-table attack your passwords will have a hard time.
- A common technique is not to delete old passwords, but instead to rehash them with the new algorithm. This would be useful for moving from sha256 => bcrypt, since collisions on sha256 are not practical, but if the original hash was md5 then I think it's fine to delete the md5 passwords and require a new one. Good luck to those who changed their email in 15 years though.
[2] I haven't followed the space too closely for 3-4 years, I'm not sure if bcrypt/blowfish is still the recommended algorithm or there's newer better ones
Defense against rainbow tables is obtained via salt, not via slow hashes. A rainbow table is a space-time tradeoff (you give space, and you get time), so using a slow hash only "encourages" (for lack of me knowing a better word) creating rainbow tables.
Adding long salts on the other hand requires the attacker to create an infeasible number of rainbow tables (one for each possible value of the salt).
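To make that concrete, here is a minimal Python sketch (standard library only). It assumes a plain lookup dictionary as a stand-in for a real rainbow table, which would use hash chains to save space; the function names and the tiny wordlist are made up for illustration. The point is only that a table precomputed without the salt never matches a salted hash, even with a fast hash like SHA-256:

```python
import hashlib
import os

def unsalted_hash(password: str) -> str:
    # Fast, unsalted hash: the same password always gives the same digest.
    return hashlib.sha256(password.encode()).hexdigest()

def salted_hash(password: str, salt: bytes) -> str:
    # Same fast hash, but a random per-password salt is mixed in.
    return hashlib.sha256(salt + password.encode()).hexdigest()

# Precomputed once, up front (stand-in for a rainbow table; illustrative wordlist).
precomputed = {unsalted_hash(p): p for p in ["123456", "password", "qwerty"]}

leaked_unsalted = unsalted_hash("password")
salt = os.urandom(16)  # stored next to the hash, not secret
leaked_salted = salted_hash("password", salt)

print(precomputed.get(leaked_unsalted))  # → password
print(precomputed.get(leaked_salted))    # → None
```

With a 16-byte salt the attacker would need a separate table for each of the 2^128 possible salt values, which is the "infeasible number of rainbow tables" above.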
Technically this class of trades also incurs a lot more hashing (so the expense of the hash function does matter), but the trade is valuable because you do all the work once, up front, and spend none of that time on any particular password hash once you have it.
So e.g. computing a rainbow table for all possible Windows 2000 passwords (up to 14 characters but you're actually only doing 56-bit inputs) in the LANMAN scheme took ages, and produced a fairly large file, but having done so that's it, you, or anyone you give the file to, can reverse LANMAN hashes into working passwords almost instantly.
(Microsoft's LANMAN hash lacked salt and is stupid in various other ways. MD5 would actually have been a better choice than LANMAN: it allows arbitrary input passwords, so good passwords are stronger instead of impossible, even though MD5 is much too fast for a good password hash.)
Nobody is saying bcrypt isn't a good choice here, they're saying that rainbow tables (and all time-space trades) are made infeasible by salt regardless of whether you're using a good password hash like bcrypt.
But both you & GGP are talking about bcrypt as though it was only a password hash. If someone says, "I'm using bcrypt", then they are using both a password hash and unique per-password salts, or they're not using bcrypt. What makes bcrypt and other such systems nice (and makes this kind of mistake basically inexcusable in 2022) is that if you're using a library or package which implements it (which you should), you don't need to think about salts; you just call GenerateFromPassword(<new_password>) and CompareHashAndPassword(<entered_password>, <stored_hash>) and forget about it.
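For a rough idea of what such a two-call API does under the hood, here is a sketch in Python using `hashlib.scrypt` from the standard library rather than bcrypt (Python ships no bcrypt). The function names, the cost parameters, and the `$`-separated storage format are assumptions made for this example, not any library's real format. In practice you would call a maintained library instead of writing this yourself:

```python
import base64
import hashlib
import hmac
import os

# Hypothetical cost parameters for this sketch; tune for your hardware.
N, R, P = 2**14, 8, 1

def hash_password(password: str) -> str:
    # A fresh random salt per password, generated for the caller.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    # Parameters, salt and digest travel together in one string.
    fields = ["scrypt", N, R, P,
              base64.b64encode(salt).decode(),
              base64.b64encode(digest).decode()]
    return "$".join(str(f) for f in fields)

def verify_password(password: str, stored: str) -> bool:
    _name, n, r, p, salt_b64, digest_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=int(n), r=int(r), p=int(p), dklen=len(expected))
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(candidate, expected)

stored = hash_password("hunter2")
print(verify_password("hunter2", stored))  # → True
print(verify_password("wrong", stored))    # → False
```

The caller never touches the salt: it is generated inside `hash_password` and read back out of the stored string inside `verify_password`.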
EDIT: I mean, I understand maybe why in 2006 you used md5 without salt. But a few years ago, when I tossed together my first webapp (a scheduling system for the community I'm involved in), I just googled "password hash" and immediately bcrypt came up as a recommendation; there was a package in my target programming language, so after half an hour of research and 5 minutes of programming I was done. I don't understand how the opensubtitles team, after having had their password database compromised, came up with "use sha256 without salt" instead of "use bcrypt or one of the many libraries which takes care of all of that for you".
I mean, sure, but, notice that you're still way behind state of the art for the end of the 20th century. I blame Tim (Berners-Lee, the man whose toy hypermedia system we're all stuck with because it became popular)
You note that when you googled "Password hash" you got pointed to a decent password hash. But, knowing what questions to ask is half the battle. Too many people figured hey, I should use a cryptographic hash and got pointed to MD5 (or SHA1 or even SHA256) because that's what those are.
The words you should have googled weren't "password hash" but maybe "web authentication" and then the answer is clearly you shouldn't use passwords or any sort of shared secret.
You can have a copy of the authentication database for the toy system I maintain, but it wouldn't help you sign into it because all the information in it is public, a trick we've known how to do in principle for decades, and which works today, on the public Web, with readily available devices (e.g. my phone) and yet, here we are on Hacker News discussing which password hash is best like it's still 1985.
> with readily available devices (e.g. my phone) and yet
I'm pretty sure it wasn't anywhere near ready to use five years ago when I wrote v1 of my webapp. And googling again, it's still not clear whether it would work on whatever random software setup some of my zealous-for-software-freedom colleagues use. With passwords I don't need to worry: if they can render the HTML+CSS (even the JS is optional, because some of my colleagues prefer NoScript), they can log in.
True, I might be mixing names. What I mean is that with fast hashes it's now feasible to take each user's specific salt, hash the 1,000,000 most common passwords with it, and repeat for every user. With modern homemade GPU rigs making billions[1] of computations per second, you are testing thousands of users per second against a 1M-word dictionary, so you are going to find matches.
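A minimal sketch of that per-user dictionary attack, in Python with only the standard library. The leaked rows, the four-word wordlist, and the rate of one billion hashes per second are made-up assumptions for illustration; a real attack would run on GPUs with a far larger list:

```python
import hashlib
import os

def salted_sha256(password: str, salt: bytes) -> bytes:
    # Fast salted hash, standing in for the scheme being attacked.
    return hashlib.sha256(salt + password.encode()).digest()

def make_row(user: str, password: str):
    # Hypothetical leaked database row: (user, salt, hash).
    salt = os.urandom(16)
    return user, salt, salted_sha256(password, salt)

leaked = [make_row("alice", "qwerty"),
          make_row("bob", "correct horse battery staple")]
wordlist = ["123456", "password", "qwerty", "letmein"]

def crack(rows, words):
    found = {}
    for user, salt, digest in rows:
        # The salt is known, so each guess costs exactly one hash per user.
        for word in words:
            if salted_sha256(word, salt) == digest:
                found[user] = word
                break
    return found

cracked = crack(leaked, wordlist)
print(cracked)  # → {'alice': 'qwerty'}

# Back-of-the-envelope rate, using the assumed figures above.
hashes_per_second = 1_000_000_000
dictionary_size = 1_000_000
users_per_second = hashes_per_second // dictionary_size
print(users_per_second)  # → 1000
```

The salt does its job here (no precomputed table helps), but it does nothing to slow down each guess. That part is what a slow password hash is for.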
The salt can be stored directly in front of the hash in the same string in the DB (many password-hashing functions output it this way). It can be plaintext, since the goal is to add a random component so that rainbow tables aren't possible: there's always more to the string being hashed. That's where it becomes a time problem.
Yeah, you could rebuild a rainbow table yourself until you find the collision, but you have to search every bit of the potential hash space.
> I'm not sure if bcrypt/blowfish is still the recommended algorithm or there's newer better ones
While bcrypt is already much, much better than using MD5 or SHA256, the best practice is to use Argon2.
In practice, it is more important to use a hashing algorithm that is designed for passwords (e.g. bcrypt, argon2, scrypt) than it is to choose the best one. As this breach shows, many sites are still using insecure hashes, like MD5, because it worked 15 years ago.
I haven't had to write any password storage code since before Argon2 won the 2015 competition so I haven't gone deep on the "side channel attack vs. memory hardness" tradeoff question when picking the mode. I am curious which mode others are using.
Hybrid (Argon2id) is probably the safest bet if you are unsure.
Which mode is useful depends on the threat model you are protecting against. Argon2i protects better against timing/side-channel attacks and Argon2d protects better against brute-forcing attacks.
In my opinion, a developer should (in most cases) focus on application security instead of picking the perfect hashing algorithm, as application security is far more crucial.
A properly implemented use of password_hash() would also allow them to use the same field and code for different algorithms over time.
What I mean is, the stored data records which algorithm was used. So they can change, in their code or configuration, which algorithm to use and how many rounds it should run. Then on login they can verify the password against the stored hash and also check whether that hash needs to be rehashed under the current settings; if so, they create a new hash from the password the user just entered and store that in the database.
Then you get automatic hash upgrades to match the current settings of the hashing of the passwords on the site with basically no user interaction (other than the act of logging in to have the password in plain text).
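The login flow described above can be sketched like this. In PHP the pieces would be `password_verify()`, `password_needs_rehash()` and `password_hash()`; the version below is a Python stand-in built on `hashlib.scrypt`, and the storage format, the parameter values, the dict used as a database, and all function names are assumptions made for the example:

```python
import base64
import hashlib
import hmac
import os

# Hypothetical "current" cost settings; raise them over time.
CURRENT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str, params: dict = CURRENT_PARAMS) -> str:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **params)
    fields = ["scrypt", params["n"], params["r"], params["p"],
              base64.b64encode(salt).decode(),
              base64.b64encode(digest).decode()]
    return "$".join(str(f) for f in fields)

def parse(stored: str):
    # The stored string itself says which parameters produced it.
    _name, n, r, p, salt_b64, digest_b64 = stored.split("$")
    params = {"n": int(n), "r": int(r), "p": int(p)}
    return params, base64.b64decode(salt_b64), base64.b64decode(digest_b64)

def verify_password(password: str, stored: str) -> bool:
    params, salt, expected = parse(stored)
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               dklen=len(expected), **params)
    return hmac.compare_digest(candidate, expected)

def needs_rehash(stored: str) -> bool:
    params, _salt, _digest = parse(stored)
    return params != CURRENT_PARAMS

def login(db: dict, user: str, password: str) -> bool:
    stored = db.get(user)
    if stored is None or not verify_password(password, stored):
        return False
    if needs_rehash(stored):
        # The plaintext is only available now, so upgrade the hash in place.
        db[user] = hash_password(password)
    return True

# A row stored under older, weaker settings gets upgraded on first login.
db = {"alice": hash_password("s3cret", {"n": 2**10, "r": 8, "p": 1})}
old = db["alice"]
print(login(db, "alice", "s3cret"))  # → True
print(db["alice"] != old)            # → True
print(needs_rehash(db["alice"]))     # → False
```

Users who never log in again keep their old hash, so a site would still need a policy for those rows.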
Or how about "we moved to an open source library or third party auth provider to host all our user data. We learned we didn't want to be responsible for such a critical security area."
There are lots and lots of options. Seems like they are a php shop, so even moving to any of these open source PHP server libraries[0] would have been a better move.
Disclosure: I work for a third party auth provider, FusionAuth.
[1] https://www.php.net/manual/en/function.password-hash.php