Convert Windows-1252 XML file to UTF-8

Is there any approach to convert a large XML file (500+ MB) from 'Windows-1252' encoding to 'UTF-8' encoding in Java?

Jon Skeet

Sure:

  • Open a FileInputStream wrapped in an InputStreamReader with the Windows-1252 encoding for the input
  • Open a FileOutputStream wrapped in an OutputStreamWriter with the UTF-8 encoding for the output
  • Create a char array to use as a buffer (e.g. 16K characters)
  • Repeatedly read into the array and write out however many characters were read:

    char[] buffer = new char[16 * 1024];
    int charsRead;
    // read() returns -1 at end of stream; otherwise write exactly the chars just read
    while ((charsRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, charsRead);
    }
    
  • Don't forget to close the output afterwards! (Otherwise there could be buffered data which never gets written to disk.)
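Putting those steps together, a minimal self-contained sketch might look like this. The class name Cp1252ToUtf8 is hypothetical, and the code assumes the JVM provides the "windows-1252" charset (mainstream JDKs do, although it is not one of the charsets every Java implementation is required to support). Closing via try-with-resources also flushes the writer, which covers the "don't forget to close" point.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, for illustration only
public class Cp1252ToUtf8 {

    /** Decodes in as Windows-1252 and re-encodes it to out as UTF-8, 16K chars at a time. */
    public static void convert(InputStream in, OutputStream out) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        // try-with-resources closes (and therefore flushes) both ends, even on exceptions
        try (Reader input = new InputStreamReader(in, cp1252);
             Writer output = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[16 * 1024];
            int charsRead;
            while ((charsRead = input.read(buffer)) != -1) {
                output.write(buffer, 0, charsRead);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Cp1252ToUtf8 input.xml output.xml
        convert(new FileInputStream(args[0]), new FileOutputStream(args[1]));
    }
}
```

For example, the Windows-1252 byte 0x80 (the euro sign) comes out as the three UTF-8 bytes E2 82 AC, so the output file will usually be somewhat larger than the input.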

Note that as it's XML, you may well need to change the XML declaration manually as well, since it should currently be declaring that the file is in Windows-1252, and after conversion it needs to say UTF-8 instead.
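One way to fix the declaration without giving up streaming is to read just the leading declaration up to its closing ?>, rewrite the encoding attribute, and then copy the rest as before. This is a sketch under assumptions: the class name XmlDeclarationFixer is hypothetical, it assumes any declaration sits at the very start of the file (as XML requires), and it uses a simple regular expression rather than a real XML parser. A declaration with no encoding attribute is left alone, since UTF-8 is the XML default.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class XmlDeclarationFixer {

    // Matches encoding="..." or encoding='...' (same quote character on both sides)
    private static final Pattern ENCODING_ATTR =
            Pattern.compile("encoding\\s*=\\s*([\"'])[^\"']*\\1");

    public static void convert(InputStream in, OutputStream out) throws IOException {
        try (Reader input = new BufferedReader(
                     new InputStreamReader(in, Charset.forName("windows-1252")));
             Writer output = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {

            // Read one char at a time until we see the end of the declaration ("?>")
            // or know that the file does not start with "<?xml".
            StringBuilder head = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                head.append((char) c);
                int n = head.length();
                if (n <= 5 && !"<?xml".startsWith(head.toString())) {
                    break; // no declaration: head is ordinary content
                }
                if (n > 5 && head.charAt(n - 2) == '?' && head.charAt(n - 1) == '>') {
                    break; // reached the end of the declaration
                }
                if (n >= 1024) {
                    break; // safety cap: stop looking rather than buffer the whole file
                }
            }

            String prolog = head.toString();
            if (prolog.startsWith("<?xml") && prolog.endsWith("?>")) {
                prolog = ENCODING_ATTR.matcher(prolog).replaceFirst("encoding=\"UTF-8\"");
            }
            output.write(prolog);

            // Stream the rest of the document with the same buffered loop as before
            char[] buffer = new char[16 * 1024];
            int charsRead;
            while ((charsRead = input.read(buffer)) != -1) {
                output.write(buffer, 0, charsRead);
            }
        }
    }
}
```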

The fact that this works on a streaming basis means you don't need to worry about the size of the file: it only holds up to 16K characters in memory at a time.
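On Java 10 or later, Reader.transferTo(Writer) performs an equivalent read/write loop for you (the JDK implementation copies through a fixed-size buffer), so memory use still stays bounded regardless of file size. A sketch with a hypothetical class name: note that, unlike InputStreamReader, a reader from Files.newBufferedReader throws an exception on bytes it cannot decode instead of silently substituting a replacement character, which may or may not be what you want.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name, for illustration only (requires Java 10+ for transferTo)
public class TransferToConvert {

    /** Re-encodes source (Windows-1252) into target (UTF-8); returns the number of chars copied. */
    public static long convert(Path source, Path target) throws IOException {
        try (Reader input = Files.newBufferedReader(source, Charset.forName("windows-1252"));
             Writer output = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            // Copies chunk by chunk, so the whole file is never held in memory at once
            return input.transferTo(output);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TransferToConvert input.xml output.xml
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```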


See more on this question at Stack Overflow