Unexpected output with RandomAccessFile

I'm trying to learn about RandomAccessFile, but my test program is producing some bizarre output.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessFileTest
{
    public static void main(String[] args) throws IOException
    {
        // Create a new blank file
        File file = new File("RandomAccessFileTest.txt");
        file.createNewFile();

        // Open the file in read/write mode
        RandomAccessFile randomfile = new RandomAccessFile(file, "rw");

        // Write stuff
        randomfile.write("Hello World".getBytes());

        // Go to a location
        randomfile.seek(0);

        // Get the pointer to that location
        long pointer = randomfile.getFilePointer();
        System.out.println("location: " + pointer);

        // Read a char (two bytes?)
        char letter = randomfile.readChar();
        System.out.println("character: " + letter);

        randomfile.close();
    }
}

This program prints:

location: 0
character: ?

It turns out that the value of letter was '䡥' when it should have been 'H'.

I've found a question similar to this, and apparently this is caused by reading one byte instead of two, but it didn't explain how exactly to fix it.

Jon Skeet

You've written "Hello World" in the platform default encoding - which is likely to use a single byte per character.

You're then calling RandomAccessFile.readChar, which always reads two bytes. From the documentation:

Reads a character from this file. This method reads two bytes from the file, starting at the current file pointer. If the bytes read, in order, are b1 and b2, where 0 <= b1, b2 <= 255, then the result is equal to:

   (char)((b1 << 8) | b2)

This method blocks until the two bytes are read, the end of the stream is detected, or an exception is thrown.

So H and e are being combined into a single character: H is U+0048 and e is U+0065, so assuming they've been written as ASCII characters, you're reading bytes 0x48 and 0x65 and combining them into U+4865 (the '䡥' you saw), which is a Han character for "a moving cart".
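To make the arithmetic concrete, here is a minimal sketch that applies the formula from the documentation to the first two bytes of "Hello World". The class name ReadCharDemo and the helper combine are made up for this illustration, and the bytes are encoded explicitly as US-ASCII rather than with the platform default so that the result is predictable.

```java
import java.nio.charset.StandardCharsets;

class ReadCharDemo {
    // Hypothetical helper mirroring the documented formula: (char)((b1 << 8) | b2)
    static char combine(int b1, int b2) {
        return (char) ((b1 << 8) | b2);
    }

    public static void main(String[] args) {
        // Assumption: one byte per character, as in ASCII
        byte[] bytes = "Hello World".getBytes(StandardCharsets.US_ASCII);

        // Mask with 0xFF so each byte is treated as a value in 0..255
        char combined = combine(bytes[0] & 0xFF, bytes[1] & 0xFF);

        System.out.printf("U+%04X%n", (int) combined); // prints U+4865
    }
}
```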

Basically, you shouldn't be using readChar to try to read this data.
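For comparison, readChar does give back what you expect when the data was written in the matching two-bytes-per-character format, for example with writeChars. This is only a sketch to show the pairing, not a recommendation for storing text; the class name WriteCharsDemo and the method roundTrip are invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class WriteCharsDemo {
    // Hypothetical helper: writes text with writeChars, then reads the first char back
    static char roundTrip(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);     // discard any previous contents
            raf.writeChars(text); // two bytes per char, high byte first
            raf.seek(0);
            return raf.readChar(); // reads the same two bytes back
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("RandomAccessFileTest", ".bin");
        file.deleteOnExit();
        System.out.println("character: " + roundTrip(file, "Hello World")); // character: H
    }
}
```

Note that a file written this way is not a plain text file that most editors will display as "Hello World" by default.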

Usually, to read a text file, you want an InputStreamReader (with an appropriate encoding) wrapping an InputStream (e.g. a FileInputStream). It's not really ideal to try to do this with RandomAccessFile: you could read data into a byte[] and then convert that into a String, but there are all kinds of subtleties you'd need to think about, such as a seek landing in the middle of a multi-byte character.
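As a rough sketch of both options, assuming the file is small and was written as UTF-8 (the original code used the platform default encoding, so adjust the charset to match however the file was actually written). The class and method names here are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TextReadDemo {
    // Option 1: a Reader that decodes bytes using an explicit charset
    static char firstCharWithReader(File file) throws IOException {
        try (Reader reader = new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8)) {
            int c = reader.read();
            if (c == -1) {
                throw new IOException("file is empty");
            }
            return (char) c;
        }
    }

    // Option 2: RandomAccessFile, reading raw bytes and decoding them afterwards.
    // Assumes the whole file fits comfortably in memory.
    static String readAllWithRandomAccessFile(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] buffer = new byte[(int) raf.length()];
            raf.readFully(buffer);
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("RandomAccessFileTest", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "Hello World".getBytes(StandardCharsets.UTF_8));

        System.out.println("character: " + firstCharWithReader(file));   // character: H
        System.out.println("text: " + readAllWithRandomAccessFile(file)); // text: Hello World
    }
}
```

Option 2 decodes the whole file at once, which sidesteps the problem of a read starting or ending partway through a multi-byte character; reading arbitrary slices after a seek would need extra care.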
