Convert a byte array from Encoding A to Encoding B

I have a pretty interesting topic, at least for me. Given a ByteArrayOutputStream containing bytes in, for example, UTF-8, I need a function that can "translate" those bytes into a new ByteArrayOutputStream in, for example, UTF-16, ASCII, or any other encoding. My naive approach was to use an InputStreamReader and pass in the desired encoding, but that didn't work because it reads into a char[], and I can only write a byte[] to the new BAOS.

public byte[] convertStream(Charset encoding) {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);
    ByteArrayOutputStream converted = new ByteArrayOutputStream();

    int readCount;
    char[] buffer = new char[4096];
    while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1)
        converted.write(buffer, 0, readCount);

    return converted.toByteArray();
}

Now, this obviously doesn't work and I'm looking for a way to make this scenario possible, without building a String out of the byte[].

@Edit: Since the obvious things seem rather hard to read: 1) raw: a ByteArrayOutputStream containing the bytes of a BINARY object sent to us by clients. The bytes usually arrive in UTF-8 as part of an HTTP message. 2) The goal here is to forward this BINARY data to an internal system that's not flexible (well, it is an internal system) and that accepts such attachments in UTF-16. I don't know why, so don't even ask; it just does.

So, to justify my question: is there a way to convert a byte array from Charset A to Charset B, or to an encoding of your choice? Once again, building a String is NOT what I'm after.

Thank you, and I hope that clears up the questionable parts :).

Jon Skeet

As mentioned in comments, I'd just convert to a string:

String text = new String(raw.toByteArray(), encoding);
byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

However, if that's not feasible (for some unspecified reason...) what you've got now is nearly there - you just need to add an OutputStreamWriter into the mix:

// Nothing here should throw IOException in reality - work out what you want to do.
public byte[] convertStream(Charset encoding) throws IOException {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);

    int readCount;
    char[] buffer = new char[4096];
    try (ByteArrayOutputStream converted = new ByteArrayOutputStream()) {
        try (Writer writer = new OutputStreamWriter(converted, StandardCharsets.UTF_8)) {
            while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
        return converted.toByteArray();
    }
}

Note that you're still creating an extra temporary copy of the data in memory, admittedly in UTF-8 rather than UTF-16... but fundamentally this is hardly any more efficient than creating a string.

If memory efficiency is a particular concern, you could perform multiple passes in order to work out how many bytes will be required, create a byte array of the right length, and then adjust the code to write straight into that byte array.
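
A minimal sketch of that multi-pass idea, not a definitive implementation: the first pass pushes the converted output through a stream that only counts bytes, and the second pass repeats the conversion into an array of exactly that size. The names TwoPassTranscoder, CountingOutputStream, FixedArrayOutputStream, pump and transcode are hypothetical and not part of the original code, and the sketch assumes the source bytes are already available as a byte[] (for example from raw.toByteArray()).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class; all names here are illustrative assumptions.
final class TwoPassTranscoder {

    /** First pass: counts the bytes written without storing them. */
    private static final class CountingOutputStream extends OutputStream {
        int count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Second pass: writes straight into a preallocated array. */
    private static final class FixedArrayOutputStream extends OutputStream {
        private final byte[] target;
        private int position;

        FixedArrayOutputStream(byte[] target) {
            this.target = target;
        }

        @Override
        public void write(int b) {
            target[position++] = (byte) b;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            System.arraycopy(b, off, target, position, len);
            position += len;
        }
    }

    /** Decodes input from the source charset and encodes it to out in the target charset. */
    private static void pump(byte[] input, Charset source, Charset target, OutputStream out)
            throws IOException {
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), source);
             Writer writer = new OutputStreamWriter(out, target)) {
            int readCount;
            while ((readCount = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
    }

    /** Converts input from one charset to another without building a String. */
    static byte[] transcode(byte[] input, Charset source, Charset target) throws IOException {
        // Pass 1: work out how many bytes the converted data needs.
        CountingOutputStream counter = new CountingOutputStream();
        pump(input, source, target, counter);

        // Pass 2: convert again, this time into an array of exactly that length.
        byte[] result = new byte[counter.count];
        pump(input, source, target, new FixedArrayOutputStream(result));
        return result;
    }
}
```

A call might look like TwoPassTranscoder.transcode(raw.toByteArray(), StandardCharsets.UTF_8, StandardCharsets.UTF_16). The trade-off is that the data is decoded and encoded twice, so this saves the intermediate copy at the cost of extra CPU time.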


See more on this question at Stackoverflow