Strange behaviour of Convert.FromBase64String

Why does the following code

var s = "2I==";
var b = Convert.FromBase64String(s);
var new_s = Convert.ToBase64String(b);

end up with new_s being 2A==?

s was originally a longer string (96 chars) but I couldn't include it because it is a secret key.

Jon Skeet

"2I==" represents the numbers 54, 8, (padding x2), as per Wikipedia.

In other words, the bits represented are:

110110 001000 XXXXXX XXXXXX

(Where X represents an "I don't care, it's from padding")

However, because the padding indicates that there's only one byte of information here, the last 4 bits from the second character are irrelevant. As ever, we can reformat the 4 pieces of 6-bit information into 3 pieces of 8-bit information, at which point it becomes clearer:

11011000 1000XXXX XXXXXXXX

You can see that the second byte must be padding, as some of its bits come from a padding character. So only the first character and the top two bits of the second character are relevant - it decodes to just the single byte 0b11011000.
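To check the bit arithmetic by hand, here is a minimal sketch that looks up each character's 6-bit value in the standard Base64 alphabet and rebuilds the byte manually. The names Alphabet, first, second, manual and discarded are just illustrative; only Convert.FromBase64String comes from the question.

```csharp
using System;

// Standard Base64 alphabet; a character's index is its 6-bit value.
const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

int first = Alphabet.IndexOf('2');   // 54 -> 110110
int second = Alphabet.IndexOf('I');  // 8  -> 001000

// The decoded byte is all six bits of the first character followed by
// the top two bits of the second; the low four bits are thrown away.
byte manual = (byte)((first << 2) | (second >> 4));
int discarded = second & 0b1111;

byte[] decoded = Convert.FromBase64String("2I==");

Console.WriteLine(Convert.ToString(first, 2).PadLeft(6, '0'));      // 110110
Console.WriteLine(Convert.ToString(second, 2).PadLeft(6, '0'));     // 001000
Console.WriteLine(Convert.ToString(manual, 2).PadLeft(8, '0'));     // 11011000
Console.WriteLine(Convert.ToString(discarded, 2).PadLeft(4, '0'));  // 1000
Console.WriteLine(decoded.Length == 1 && decoded[0] == manual);     // True
```

The non-zero value in discarded is exactly the information that gets lost on the round trip.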

Now when you encode 0b11011000, you know that you'll have two padding characters, and the first character must be '2' (to represent bits '110110') but the second character can be any character whose first two bits represent '00'. It just happens that Convert.ToBase64String uses 'A', which has 0 bits for the irrelevant parts.
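As a rough illustration of "any character whose first two bits represent '00'": those are the 16 characters at indices 0 to 15 of the alphabet ('A' to 'P'). The sketch below assumes Convert.FromBase64String stays as lenient about the unused bits as it was for "2I==" in the question; Alphabet and variants are illustrative names.

```csharp
using System;
using System.Linq;

const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Indices 0..15 are exactly the 6-bit values whose top two bits are 00.
string[] variants = Alphabet.Take(16).Select(c => "2" + c + "==").ToArray();

foreach (string v in variants)
{
    byte[] bytes = Convert.FromBase64String(v);
    // Each variant should decode to the same single byte and re-encode as 2A==.
    Console.WriteLine($"{v} -> 0x{bytes[0]:X2} -> {Convert.ToBase64String(bytes)}");
}
```

So "2I==" is just one of sixteen spellings of the same byte, and the encoder always emits the one with zeroed unused bits.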

The question in my mind is why an encoder would choose to use 'I' instead of 'A'. I don't think it's invalid to do this in Base64, but it's an odd choice.
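If you need to spot input like this (for example, the longer 96-character key), one possible approach is to round-trip it and compare. IsCanonicalBase64 is a hypothetical helper, not a framework method, and it assumes the input has no embedded whitespace or line breaks, which FromBase64String would silently skip.

```csharp
using System;

// Hypothetical helper: true when re-encoding the decoded bytes
// reproduces the original text exactly.
static bool IsCanonicalBase64(string text) =>
    Convert.ToBase64String(Convert.FromBase64String(text)) == text;

Console.WriteLine(IsCanonicalBase64("2A=="));  // True
Console.WriteLine(IsCanonicalBase64("2I=="));  // False
```

A False result here doesn't mean the decoded bytes are wrong, only that the text wasn't the encoding ToBase64String would have produced.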

See more on this question at Stack Overflow