Ok, I am reading a .docx file via a BufferedReader and want to show the text in an EditText. The .docx is not in English but in another language (Greek). I use:
File file = new File(file_Path);
try {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    StringBuilder text = new StringBuilder();
    while ((line = br.readLine()) != null) {
        text.append(line);
    }
    et1.setText(text);
} catch (IOException e) {
    e.printStackTrace();
}
And the result I get is this:
If the text is in English, it works fine. But in my case it isn't. How can I fix this? Thanks a lot.
Ok, I am reading a .docx file via a BufferedReader
Well, that's the first problem. BufferedReader is for plain text files. docx files are binary files in a specific format (assuming you mean the kind of file that Microsoft Word saves). You can't just read them like text files. Open the file in Notepad (not WordPad) and you'll see what I mean.
You might want to look at Apache POI.
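To make the "specific format" point concrete: a Word .docx is a ZIP package, and the body text lives in an XML part named word/document.xml. The following is only a rough sketch using the JDK's own zip and XML classes, not Apache POI and not a substitute for it. The class name DocxTextSketch and the method extractText are made up for illustration, and the code ignores tables, headers, footnotes and other parts that a real library handles.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class DocxTextSketch {
    // Namespace of the WordprocessingML elements (w:p, w:t, ...) in word/document.xml.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph followed by '\n'.
    static String extractText(File docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; not a .docx package?");
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            try (InputStream in = zip.getInputStream(entry)) {
                // The XML parser takes the encoding from the XML declaration,
                // so no charset has to be guessed here.
                Document xml = factory.newDocumentBuilder().parse(in);
                StringBuilder text = new StringBuilder();
                NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    // Each w:t element holds one piece of a paragraph's text.
                    NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                    text.append('\n');
                }
                return text.toString();
            }
        }
    }
}
```

With the question's variables, the call would look something like et1.setText(DocxTextSketch.extractText(file)). Paragraphs nested inside other paragraphs (text boxes, for instance) would show up twice with this approach, which is one more reason to prefer a proper library for real documents.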
From comments:
Testing to read a .txt file with the same text gave same results too
That's probably due to using the wrong encoding. FileReader always uses the platform default encoding, which is annoying. Assuming you're using Java 7 or higher, you'd be better off with Files.newBufferedReader:
try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
...
}
Adjust the charset to match the one you used when saving your text file, of course - if you have the option of using UTF-8, that's a pretty good choice. (Aside from anything else, pretty much everything can handle UTF-8.)
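As a fuller sketch of the same idea, the class below reads a whole file with an explicit charset. It assumes the text file was saved as UTF-8. The names CharsetReadSketch, readAll and readAllLegacy are invented for this example. The second method wraps a FileInputStream in an InputStreamReader, which may be the more practical route where java.nio.file is not available (on Android it needs API level 26 or later, as far as I know).

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class CharsetReadSketch {
    // Reads the file with the given charset, keeping a '\n' after each line.
    static String readAll(Path path, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Same result without java.nio.file: the charset is passed to InputStreamReader.
    static String readAllLegacy(String filePath, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Kalimera" ("good morning") in Greek letters, written with Unicode
        // escapes so the source compiles regardless of the source file encoding.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        Path path = Files.createTempFile("greek", ".txt");
        Files.write(path, greek.getBytes(StandardCharsets.UTF_8));

        // Decoding with the charset used for saving restores the text.
        System.out.println(readAll(path, StandardCharsets.UTF_8).trim().equals(greek)); // true
        // Decoding the same bytes as ISO-8859-1 yields different characters.
        System.out.println(readAll(path, StandardCharsets.ISO_8859_1).trim().equals(greek)); // false
    }
}
```

In the question's code this would replace the FileReader loop, for example et1.setText(CharsetReadSketch.readAllLegacy(file_Path, StandardCharsets.UTF_8)). Unlike the original loop, it also keeps the line breaks that readLine strips.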
See more on this question at Stack Overflow