I'm trying to see how much memory is used when I have a lot of duplicate strings. I am using the method highlighted in this answer (at the bottom).
Here's me creating a list of ten million strings, where each string has only a few characters:
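import java.util.ArrayList;
import java.util.List;

public class Test1 {
    public static void main(String[] args) {
        int count = 10000000;
        List<String> names = new ArrayList<String>();
        // Add the same short literal ten million times
        for (int i = 0; i < count; i++) {
            names.add("test");
        }
        // Heap in use right now: live objects plus any uncollected garbage
        Runtime rt = Runtime.getRuntime();
        long usedMem = rt.totalMemory() - rt.freeMemory();
        System.out.println(usedMem / (1024 * 1024) + " MB");
    }
}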
I run it, and it says 88 MB. I'm not too sure what this represents, but I'll just take it as a number to compare with.
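For reference, totalMemory() - freeMemory() is the heap the JVM currently holds minus the part of it that is free, so it also counts the JVM's own startup objects and any garbage not yet collected (for example, the smaller backing arrays an ArrayList throws away as it grows). A slightly less noisy sketch takes a baseline before building the list and subtracts it. It assumes System.gc() is honoured, which the JVM does not guarantee, and HeapDelta and its methods are hypothetical names. Assuming compressed 4-byte references, the list's final backing array alone is at least 10,000,000 x 4 bytes, about 38 MB.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapDelta {
    // Total heap currently reserved by the JVM minus the free part of it.
    // This includes garbage that has not been collected yet.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // System.gc() is only a request; the JVM is free to ignore it.
    static long usedHeapAfterGcHint() {
        System.gc();
        return usedHeapBytes();
    }

    public static void main(String[] args) {
        long before = usedHeapAfterGcHint();

        int count = 10_000_000;
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("test");
        }

        long after = usedHeapAfterGcHint();
        System.out.println((after - before) / (1024 * 1024) + " MB more than the baseline");
        // Use the list afterwards so it is still reachable during the second measurement
        System.out.println(names.size() + " elements");
    }
}
```

The result still varies between runs and JVMs; it is a rough indicator, not a benchmark.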
Here's me doing the same test again, except I replaced the short string with some lorem ipsum text:
import java.util.ArrayList;
import java.util.List;

public class Test1 {
    public static void main(String[] args) {
        int count = 10000000;
        List<String> names = new ArrayList<String>();
        // Add the same long literal ten million times
        for (int i = 0; i < count; i++) {
            names.add("Lorem ipsum dolor sit amet, brute euismod eleifend te quo, ne qui iudicabit hendrerit. Ea sit dolore assentior prodesset. In ludus adipiscing eos, ius erat graeco at, cu nec melius copiosae. Epicuri suavitate gubergren id sea, possim animal eu nam, cu error libris expetendis his. Te sea agam fabulas, vis eruditi complectitur ei. Ei sale modus vis, pri et iracundia temporibus. Mel mundi antiopam ad.");
        }
        // Heap in use right now: live objects plus any uncollected garbage
        Runtime rt = Runtime.getRuntime();
        long usedMem = rt.totalMemory() - rt.freeMemory();
        System.out.println(usedMem / (1024 * 1024) + " MB");
    }
}
I run this, and it says 88 MB again.
This is not meant to be a proper memory benchmark, but I was expecting the number for the lorem ipsum string to be noticeably larger, because it has roughly 100 times as many characters (about 400 versus 4).
How does Java store lists of strings in memory? Or am I doing something wrong?
Your List<String> isn't storing strings. It's storing string references.
In each case, you've got a single String object, and then a list with a lot of references to that same object: a string literal such as "test" always evaluates to the same (interned) String instance, however many times the loop runs. It's like having a single house and millions of pieces of paper, all with the same address written on them. All that paper takes up the same amount of room whether the house is a bungalow or a mansion.
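One way to check this is to count how many distinct objects the list holds by identity (==) rather than by equals(). Because a string literal always evaluates to the same interned String instance, a thousand additions of "test" should yield exactly one object. SameObjectDemo and its helpers are hypothetical names; the identity set is built from the standard IdentityHashMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SameObjectDemo {
    // Counts distinct objects by identity (==), not by equals()
    static int countDistinctObjects(List<String> list) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(list);
        return seen.size();
    }

    static List<String> fillWithLiteral(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("test"); // the literal is the same interned String every time
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = fillWithLiteral(1_000);
        System.out.println(names.size() + " references, "
                + countDistinctObjects(names) + " distinct String object(s)");
        // prints "1000 references, 1 distinct String object(s)"
    }
}
```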
If you want to see what happens when you create a different string for each entry in the list to refer to, try:
for (int i = 0; i < count; i++) {
    names.add("test" + i); // builds a new String object on every iteration
}
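The same identity-counting idea makes the difference visible. In the sketch below (ConcatDemo and its methods are hypothetical names), "test" + (i % distinctValues) is not a compile-time constant, so each iteration builds a fresh String object even when the contents repeat: with only two distinct values, a thousand iterations still give a thousand objects. That is the "lot of duplicate strings" situation the question set out to measure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ConcatDemo {
    static List<String> fillWithConcat(int count, int distinctValues) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Not a constant expression, so a new String is created each time
            names.add("test" + (i % distinctValues));
        }
        return names;
    }

    // Distinct objects by identity (==)
    static int distinctObjects(List<String> list) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(list);
        return seen.size();
    }

    // Distinct contents by equals()
    static int distinctContents(List<String> list) {
        return new HashSet<>(list).size();
    }

    public static void main(String[] args) {
        List<String> names = fillWithConcat(1_000, 2);
        System.out.println(distinctContents(names) + " distinct values, "
                + distinctObjects(names) + " distinct objects");
        // prints "2 distinct values, 1000 distinct objects"
    }
}
```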
Now memory use grows much faster, and you may well run out of heap, because each iteration creates a new String object that takes its own memory. The exact amount depends on the implementation. In older JDKs (Java 6 and early Java 7 releases) a String was an object containing a reference to a char[] (an array of characters), a start offset, a length, and a cached hash code; Java 7u6 dropped the offset and length fields, and since Java 9 the characters are held in a byte[], one byte per character when the text is Latin-1. Either way, for small strings the textual data is dwarfed by the overhead of object headers and bookkeeping fields, whereas for very large strings the character array takes the bulk of the space.
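To put rough numbers on that overhead, here is a back-of-the-envelope estimate. It is a sketch under assumptions that the JVM specification does not guarantee: a 64-bit HotSpot JVM with compressed references and class pointers (12-byte object header, 16-byte array header, 4-byte references, 8-byte alignment) and the Java 9+ compact-string layout with Latin-1 text, rather than the older char[]-plus-offset-and-length layout. StringSizeEstimate and its methods are hypothetical names.

```java
public class StringSizeEstimate {
    // ASSUMPTIONS (typical 64-bit HotSpot, not guaranteed):
    // compressed references, 12-byte object header, 16-byte array header,
    // 8-byte alignment, Java 9+ compact strings holding Latin-1 text.
    static final int OBJECT_HEADER = 12;
    static final int ARRAY_HEADER = 16;
    static final int REFERENCE = 4;
    static final int ALIGNMENT = 8;

    // Round up to the next multiple of the alignment
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // String object: header + value reference + int hash + byte coder
    // + boolean hashIsZero (Java 13+; still 24 bytes after padding without it)
    static long stringObjectBytes() {
        return align(OBJECT_HEADER + REFERENCE + 4 + 1 + 1);
    }

    // Backing byte[]: one byte per Latin-1 character
    static long latin1ArrayBytes(int chars) {
        return align(ARRAY_HEADER + (long) chars);
    }

    // One list slot + the String object + its byte[]
    static long perDistinctString(int chars) {
        return REFERENCE + stringObjectBytes() + latin1ArrayBytes(chars);
    }

    public static void main(String[] args) {
        for (int chars : new int[] {4, 400}) {
            System.out.println(chars + " chars -> about "
                    + perDistinctString(chars) + " bytes per distinct string");
        }
        // 4 chars -> about 52 bytes per distinct string
        // 400 chars -> about 444 bytes per distinct string
    }
}
```

Under these assumptions, only 4 of the 52 bytes for a 4-character string are text. A list that repeats one shared String costs just the 4-byte reference per element, whatever the string's length, which is consistent with both tests in the question printing the same figure. Ten million distinct 4-character strings would instead need roughly ten million times 52 bytes, around 500 MB.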