I'm pulling JSON data through a REST API using Requests in Python. Unfortunately, one of the fields contains all sorts of unescaped and control characters that break the JSON.
I don't control the data, but I can request it undecoded as a string that the application stores as a Java byte array.
For example: [B@1cf3bd82
The question is: how do I decode the string back into the original UTF-8 text as I'm working through the JSON? All of the examples I've found seem to work with a bytes object, not an encoded string.
Thoughts?
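You're currently printing out the result of calling toString() on the byte[]. That's never a good idea - arrays don't override toString(), so all you get is the type marker ([B for a byte array) followed by a hash code, not the contents. The original text can't be recovered from a string like [B@1cf3bd82.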
You should use the new String(byte[], Charset) constructor:
String text = new String(bytes, StandardCharsets.UTF_8);
It's not entirely clear to me from the question what is happening where in terms of the data, but basically you need to modify the Java code - any Python code is probably irrelevant here.
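To make the difference concrete, here is a minimal sketch contrasting the two calls. The class name ByteArrayDecoding and the helper decodeUtf8 are made up for illustration, and the sketch assumes the bytes really are UTF-8, as the question states; it is not the application's actual code.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayDecoding {
    // Hypothetical helper: decode raw bytes as UTF-8 text.
    // Naming the charset explicitly avoids depending on the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample data with a non-ASCII character and a control character (tab).
        byte[] bytes = "h\u00e9llo\tworld".getBytes(StandardCharsets.UTF_8);

        // Inherited Object.toString(): type marker plus hash code, e.g. [B@1cf3bd82.
        // The hash differs from run to run and carries none of the content.
        System.out.println(bytes.toString());

        // Proper decoding: the original text.
        System.out.println(decodeUtf8(bytes));
    }
}
```

Once the Java side emits the decoded string through a JSON library, the control characters get escaped on serialization and the Python side should be able to parse the response as usual.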