My program reads x bytes from a file, checks whether they are all zeros, repeats the process for 20,000 files, and keeps a list of the files that contain non-zero bytes. To monitor performance, I made the number of bytes checked per file configurable (byteSize).
The problem is that the first run of the program takes ~5 minutes to complete (byteSize = 8192), but if I run it again it takes only 10 seconds, even if I close and restart the program. The only cause that comes to mind is that the byte array remains in memory.
The BinaryReader is inside a "using" statement, so as far as I know the stream should be closed when each using block ends. So why does the byte array remain? How can I delete it? I need to, in order to measure the actual performance each time I run the program.
byte[] readByte = new byte[byteSize];
for (int i = 0; i < readCycles; i++)
{
    // Read the next byteSize-byte chunk of the file into the shared buffer.
    using (BinaryReader reader = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read)))
    {
        reader.BaseStream.Seek(8192 + i * byteSize, SeekOrigin.Begin);
        reader.Read(readByte, 0, byteSize);
    }

    // Scan the buffer and stop at the first non-zero byte.
    foreach (byte b in readByte)
    {
        if (b != 0)
        {
            allZeros = false;
            break;
        }
        else
            allZeros = true;
    }

    if (allZeros == false) break;
}
This almost certainly has nothing to do with anything .NET is doing - it'll be the file system transparently caching the file data for you.
To test this, change your code to use FileStream directly and simply loop over the file, reading it into a buffer and ignoring the data:
using (var stream = File.OpenRead(...))
{
    var buffer = new byte[16384];
    while (stream.Read(buffer, 0, buffer.Length) > 0)
    {
        // Discard the data; only the I/O time matters here.
    }
}
I'm sure you'll see the same result - the first read will be relatively slow, then it'll be very fast.
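If you want to see the effect in numbers, here is a minimal sketch (not part of the original code) that times that same read-and-discard loop over a whole folder with a Stopwatch. The folder path and buffer size are placeholders; run it twice in a row and compare the elapsed times - the second run should be served largely from the OS file cache. To get genuinely cold-cache numbers you would have to evict the cache between runs, for example by rebooting or by reading a data set larger than RAM.

using System;
using System.Diagnostics;
using System.IO;

class CacheEffectDemo
{
    static void Main()
    {
        // Hypothetical folder - point this at the directory holding your files.
        string folder = @"C:\data";
        var buffer = new byte[16384];

        var sw = Stopwatch.StartNew();
        foreach (string file in Directory.EnumerateFiles(folder))
        {
            using (var stream = File.OpenRead(file))
            {
                // Read and throw away the data; we only care how long the I/O takes.
                while (stream.Read(buffer, 0, buffer.Length) > 0)
                {
                }
            }
        }
        sw.Stop();

        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }
}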