I have a set of octal values, say 0177-0377. Whenever I find one of these values in a string, I have to replace it with ?.
String a = "sccce¼»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕerferferfer";
for (int i = 0177; i <= 0377; i++)
{
    char x = (char) i;
    a = a.replaceAll(Character.toString(x), "?");
}
System.out.print(a);
This works fine for a small file, but I have to perform this operation on a 1TB file.
How can we use a regex to achieve this?
You don't want to do this to the whole file in one go - you need a streaming approach. I'd do something like this:
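import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// TODO: Rename to something more appropriate
public static void replaceInvalidCharacters(Reader reader, Writer writer) throws IOException {
    char[] buffer = new char[16384]; // Adjust if you want
    int charsRead;
    while ((charsRead = reader.read(buffer)) > 0) {
        for (int i = 0; i < charsRead; i++) {
            // Octal 0177..0377 is U+007F..U+00FF
            if (buffer[i] >= 0177 && buffer[i] <= 0377) {
                buffer[i] = '?';
            }
        }
        // Write only the characters read in this pass, not the whole buffer
        writer.write(buffer, 0, charsRead);
    }
}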
So you'd open a reader (with the appropriate encoding) for the current file, a writer (with the appropriate encoding) for the output file, then call the method above. It will read a chunk of data at a time, replace all the "bad" characters in the chunk, then write the chunk out to the writer.
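As a rough sketch of that wiring, something like the following should work. The class name, the convertFile helper and the choice of ISO-8859-1 are my assumptions for illustration, not something from the question; use whatever encoding your file really has. The loop is repeated here so the example compiles on its own, and it writes only the charsRead characters read in each pass.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReplaceInvalidCharactersDemo {

    // Streams from reader to writer, replacing U+007F..U+00FF (octal 0177..0377) with '?'.
    static void replaceInvalidCharacters(Reader reader, Writer writer) throws IOException {
        char[] buffer = new char[16384];
        int charsRead;
        while ((charsRead = reader.read(buffer)) > 0) {
            for (int i = 0; i < charsRead; i++) {
                if (buffer[i] >= 0177 && buffer[i] <= 0377) {
                    buffer[i] = '?';
                }
            }
            // Only the first charsRead entries hold data from this pass.
            writer.write(buffer, 0, charsRead);
        }
    }

    // Hypothetical helper: opens both files with the same charset and streams between them.
    static void convertFile(Path input, Path output, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, charset);
             BufferedWriter writer = Files.newBufferedWriter(output, charset)) {
            replaceInvalidCharacters(reader, writer);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ReplaceInvalidCharactersDemo <input> <output>");
            return;
        }
        // Assumption: the input is ISO-8859-1, where every byte maps to exactly one char.
        convertFile(Paths.get(args[0]), Paths.get(args[1]), StandardCharsets.ISO_8859_1);
    }
}
```

The try-with-resources block closes (and so flushes) both streams even if an exception is thrown part-way through, which matters for a job that may run for hours. Memory use stays at one 16K-char buffer plus the stream buffers, regardless of file size.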
No need for regular expressions.
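That said, if you only ever have a short string in memory and still want the regex form, the 129 separate replaceAll calls in the question can be collapsed into a single character class. This is just a sketch for small inputs (the class and method names are made up for illustration); it doesn't help with the 1TB file, where you still want to stream.

```java
import java.util.regex.Pattern;

class RegexReplaceDemo {

    // One character class covering U+007F..U+00FF (octal 0177..0377), compiled once.
    private static final Pattern HIGH_CHARS = Pattern.compile("[\\x7F-\\xFF]");

    // Hypothetical helper: replaces every character in that range with '?'.
    static String replaceHighChars(String input) {
        return HIGH_CHARS.matcher(input).replaceAll("?");
    }

    public static void main(String[] args) {
        // \u00bc is the vulgar fraction one quarter; \u007f is DEL.
        System.out.println(replaceHighChars("abc\u00bcdef\u007f")); // prints "abc?def?"
    }
}
```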
Note that there are plenty of non-ASCII characters outside that range though - if you really want to remove all non-ASCII, you'd basically want
if (buffer[i] > 126) // Or 127; what do you want to do with U+007F?
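To make the difference concrete, here's a small sketch of the ASCII-only variant run against an in-memory string. The class and method names are hypothetical, and I've gone with > 126, so U+007F is replaced too.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class AsciiOnlyDemo {

    // Same streaming loop, but replaces everything above '~' (U+007E) with '?'.
    static void replaceNonAscii(Reader reader, Writer writer) throws IOException {
        char[] buffer = new char[16384];
        int charsRead;
        while ((charsRead = reader.read(buffer)) > 0) {
            for (int i = 0; i < charsRead; i++) {
                if (buffer[i] > 126) {
                    buffer[i] = '?';
                }
            }
            writer.write(buffer, 0, charsRead);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // \u00e9 (e acute) is inside 0177..0377; \u20ac (euro sign) is outside it.
        replaceNonAscii(new StringReader("caf\u00e9 \u20ac5"), out);
        System.out.println(out); // prints "caf? ?5"
    }
}
```

With the original 0177-0377 check the euro sign would pass through untouched. Note that this works per char, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) comes out as two question marks.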