I'm not a Java developer, but I need to understand what some Java code is doing.
I have some code like IOUtils.copy(InputStream a, Writer b, "ISO-8859-1"), or words to that effect.
The docs for this method say that "inputEncoding - the encoding to use for the input stream, null means platform default".
As I understand it, this is just saying that a is expected to be in ISO-8859-1. It is not doing any kind of conversion? What is the significance of this? What would happen if the input stream were encoded as UTF-8?
As I understand it this is just saying that a is expected to be in ISO-8859-1.
Well, it expects the stream that a refers to to contain text encoded in ISO-8859-1.
It is not doing any kind of conversion?
Yes, it is doing a conversion. It decodes the binary data read from the stream as text in the given encoding, and then writes that text to the writer. (At least, that's what I assume given the method name.)
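To make that conversion concrete, here is a minimal JDK-only sketch of what a copy-with-encoding does. It assumes, based on the method name and the documentation quoted above rather than on the Commons IO source, that IOUtils.copy behaves roughly like wrapping the stream in an InputStreamReader with the given charset. The class DecodeDemo and the helper copyDecoding are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper (not the Commons IO implementation): decode the bytes
    // from 'in' as text in charset 'cs', then write the resulting chars to 'out'.
    static void copyDecoding(InputStream in, Writer out, Charset cs) throws IOException {
        Reader reader = new InputStreamReader(in, cs); // the bytes -> chars step happens here
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.flush(); // streams are left open; the caller owns them
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded in ISO-8859-1: one byte per character, é is 0xE9
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        StringWriter sw = new StringWriter();
        copyDecoding(new ByteArrayInputStream(latin1), sw, StandardCharsets.ISO_8859_1);
        System.out.println(sw.toString().equals("caf\u00e9")); // true
    }
}
```

Passing StandardCharsets.ISO_8859_1 here plays the role of the "ISO-8859-1" argument in the question: it tells the reader how to turn each byte into a character.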
What would happen if the inputstream was encoded as UTF-8?
If the data were actually text encoded in UTF-8, then garbled text would be written to the writer (b). Each byte would be converted to a single character (as ISO-8859-1 uses one byte per character), and that character would then be written to the writer. Characters in the ASCII range are encoded identically in UTF-8 and ISO-8859-1, so they would come through unchanged; but if the data contained a character that UTF-8 encodes as multiple bytes, the writer would receive multiple characters for that single original character.
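A small sketch of that failure mode, using only the JDK: text is encoded as UTF-8 and then wrongly decoded as ISO-8859-1. The class MojibakeDemo and the method misdecode are hypothetical names; the byte values in the comments are the standard UTF-8 encodings of é and €.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1); // one char per byte
    }

    public static void main(String[] args) {
        String ascii = "cafe";
        String accented = "caf\u00e9"; // "café": é is 2 bytes in UTF-8 (C3 A9)
        String euro = "\u20ac";        // "€": 3 bytes in UTF-8 (E2 82 AC)

        System.out.println(misdecode(ascii).equals(ascii));                 // true: ASCII survives
        System.out.println(misdecode(accented).equals("caf\u00c3\u00a9")); // true: "cafÃ©"
        System.out.println(misdecode(euro).length());                      // 3
    }
}
```

The single character é turns into the two characters Ã©, and € turns into three characters, which is exactly the "multiple characters for a single original character" effect described above.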
Basically, if you get the encoding wrong, the data can easily be garbled. It's like trying to play a WAV file as if it were an MP3 file, except without the safeguards that make it obviously broken in that case.
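One way to see the "no safeguards" point: every possible byte value is a valid ISO-8859-1 character, so decoding as ISO-8859-1 never fails, no matter what the bytes really are. A strict UTF-8 decoder, by contrast, rejects byte sequences that aren't valid UTF-8. This is a sketch using CharsetDecoder, whose default action is to report malformed input rather than replace it; the class NoSafeguardDemo and the method decodesCleanly are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NoSafeguardDemo {
    // Strict decode: newDecoder() reports malformed input by default
    // (unlike new String(bytes, cs), which silently substitutes replacement chars).
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Every possible byte value 0x00..0xFF
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        System.out.println(decodesCleanly(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesCleanly(all, StandardCharsets.UTF_8));      // false
    }
}
```

So a wrong guess of ISO-8859-1 produces no error at all; the only symptom is the garbled text itself.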
See more on this question at Stack Overflow.