I'm parsing XML using Hadoop, and I got the code from here.
But I'm getting the following error:
FINISH_TIME="1385387129970" HOSTNAME="DEV140" ERROR="java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[18,3] Message: Invalid byte 1 of 1-byte UTF-8 sequence.
But my XML is encoded as UTF-8 only. So how can I handle it?
I suspect this is the problem - it's at least a problem:
XMLStreamReader reader = XMLInputFactory.newInstance()
.createXMLStreamReader(new ByteArrayInputStream(document.getBytes()));
That call to getBytes() will use the platform default encoding, rather than UTF-8. You could specify "utf-8" as the encoding name - but it would be simpler to create a StringReader:
XMLStreamReader reader = XMLInputFactory.newInstance()
.createXMLStreamReader(new StringReader(document));
Of course that may not be the only error, but it's at least something to look at.
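To see the difference in isolation, here is a minimal sketch. It assumes the JDK's built-in StAX implementation; the class name EncodingDemo, its helper methods, and the sample document are made up for illustration and are not from the original code. ISO-8859-1 stands in for a non-UTF-8 platform default, since the real default depends on the machine.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original Hadoop code.
class EncodingDemo {

    // Walks the whole stream and concatenates all character data.
    static String readText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            }
        }
        reader.close();
        return text.toString();
    }

    // Encodes the document with an explicit charset, then parses the bytes.
    // With no XML declaration, the parser treats the bytes as UTF-8.
    static String parseFromBytes(String document, Charset charset)
            throws XMLStreamException {
        byte[] bytes = document.getBytes(charset);
        return readText(XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(bytes)));
    }

    // Parses the characters directly, so no byte encoding is involved.
    static String parseFromString(String document) throws XMLStreamException {
        return readText(XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(document)));
    }

    public static void main(String[] args) throws XMLStreamException {
        String document = "<doc>caf\u00e9</doc>"; // one non-ASCII character

        System.out.println(parseFromString(document));
        System.out.println(parseFromBytes(document, StandardCharsets.UTF_8));

        try {
            // Simulates a platform whose default encoding is not UTF-8.
            parseFromBytes(document, StandardCharsets.ISO_8859_1);
        } catch (XMLStreamException e) {
            System.out.println("Parse failed: " + e.getMessage());
        }
    }
}
```

The first two calls parse cleanly. The third should fail with an "Invalid byte ..." message of the same kind as in the question, because the single ISO-8859-1 byte for the accented character is not a valid UTF-8 sequence.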