Implications of storing a hashed password as a UTF-8 string?

I found the following code that is used to hash a password before storing it in an MSSQL database (the column is of type NVARCHAR).

string HashPassword(string password)
{
    var encoding = Encoding.UTF8;
    var plainBytes = encoding.GetBytes(password);
    var hashedBytes = MD5.Create().ComputeHash(plainBytes);
    return encoding.GetString(hashedBytes); //<-- Bad practice?
}

At first I thought it was really strange to store random bytes as a UTF-8 string, and that I should change this to Base64 encoding. But are there any real implications of doing it this way, other than it being bad practice?

Also, if anyone got hold of the database, wouldn't this make it impossible to use a rainbow table or similar to reverse the hashes, since the original bytes are lost?

Jon Skeet

You're weakening the security by reducing the number of distinct strings a hash can end up as. Any time your hash contains an invalid UTF-8 sequence, the decoder substitutes U+FFFD (the Unicode "replacement" character) for it. That means multiple hashes end up with the same string:

using System;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        byte[] hash1 = FillBytes(128);
        byte[] hash2 = FillBytes(129);
        string text1 = Encoding.UTF8.GetString(hash1);
        string text2 = Encoding.UTF8.GetString(hash2);
        // Prints True: both arrays decode to strings made up solely of U+FFFD.
        Console.WriteLine(text1 == text2);
    }

    static byte[] FillBytes(byte data)
    {
        byte[] bytes = new byte[16];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = data;
        }
        return bytes;
    }
}

It's also possible that the text returned by GetString won't be correctly stored in SQL Server, depending on how you've got it configured. (If the field is set up so that it can store anything in Unicode, that part is okay.) If it is losing data, that's even worse - the stored correct hash won't match the computed correct hash, so someone typing in the right password will still be denied access. As I say, this may not be a problem - but you haven't given us enough information to say for sure, so it's at least worth considering. This part wouldn't be a problem if you used Base64 or hex, both of which end up with ASCII data.
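
As a rough illustration of why Base64 or hex sidesteps both problems, the sketch below round-trips an arbitrary 16-byte value through each representation. The sample bytes and the RoundTripDemo class name are made up for the example, and Convert.ToHexString / Convert.FromHexString assume .NET 5 or later.

```csharp
using System;
using System.Linq;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // Hypothetical 16-byte "hash" containing bytes that are not valid UTF-8.
        byte[] hash = { 0x80, 0xFF, 0xC0, 0x00, 0x41, 0xFE, 0x9A, 0x10,
                        0xE2, 0x28, 0xA1, 0x7F, 0xD8, 0x01, 0xB5, 0x33 };

        // Lossy: invalid sequences become U+FFFD, so the bytes cannot be recovered.
        string asUtf8 = Encoding.UTF8.GetString(hash);
        byte[] fromUtf8 = Encoding.UTF8.GetBytes(asUtf8);
        Console.WriteLine(hash.SequenceEqual(fromUtf8));   // False

        // Lossless: Base64 is plain ASCII and decodes to the same bytes.
        string asBase64 = Convert.ToBase64String(hash);
        byte[] fromBase64 = Convert.FromBase64String(asBase64);
        Console.WriteLine(hash.SequenceEqual(fromBase64)); // True

        // Lossless: hex is plain ASCII too (these two calls need .NET 5+).
        string asHex = Convert.ToHexString(hash);
        byte[] fromHex = Convert.FromHexString(asHex);
        Console.WriteLine(hash.SequenceEqual(fromHex));    // True
    }
}
```

Because the Base64 and hex strings contain only ASCII characters, they also survive storage in a non-Unicode column unchanged.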

Using MD5 to hash a password is a bad idea to start with - weakening it still further with a lossy text transformation is worse. It makes it significantly easier for an attacker to find an incorrect password that still ends up with the same text.

I would suggest:

  • Use a more secure hashing approach (e.g. bcrypt or PBKDF2) - see Jeff Atwood's blog post for more details (and read a security book for more still).
  • Store the hash either as a blob (the bytes directly) or as Base64 or hex text, so that the full information is preserved.
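
A minimal sketch of how the two suggestions might fit together, assuming .NET 6 or later (for Rfc2898DeriveBytes.Pbkdf2 and RandomNumberGenerator.GetBytes). The PasswordHasher class, the "iterations.salt.hash" storage format and the parameter values are illustrative choices for this example, not something prescribed by the answer; pick an iteration count that suits your own hardware and threat model.

```csharp
using System;
using System.Security.Cryptography;

static class PasswordHasher
{
    // Illustrative parameters only (assumptions for this sketch).
    const int SaltSize = 16;        // bytes
    const int HashSize = 32;        // bytes
    const int Iterations = 100_000;

    // Returns "iterations.salt.hash", with the binary parts Base64-encoded,
    // so the stored value is plain ASCII and loses no information.
    public static string Hash(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA256, HashSize);
        return $"{Iterations}.{Convert.ToBase64String(salt)}.{Convert.ToBase64String(hash)}";
    }

    // Recomputes the hash with the stored salt and iteration count,
    // then compares the bytes in constant time.
    public static bool Verify(string password, string stored)
    {
        string[] parts = stored.Split('.');
        if (parts.Length != 3) return false;

        int iterations = int.Parse(parts[0]);
        byte[] salt = Convert.FromBase64String(parts[1]);
        byte[] expected = Convert.FromBase64String(parts[2]);

        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, iterations, HashAlgorithmName.SHA256, expected.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expected);
    }
}

class Program
{
    static void Main()
    {
        string stored = PasswordHasher.Hash("correct horse");
        Console.WriteLine(PasswordHasher.Verify("correct horse", stored)); // True
        Console.WriteLine(PasswordHasher.Verify("wrong", stored));         // False
    }
}
```

Keeping the salt and iteration count alongside the hash means the stored value carries everything needed to verify it later, even if the parameters are raised for new passwords.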

See more on this question at Stack Overflow