The size does matter! (partly)

Have you ever asked yourself how much memory an object or a primitive data type really uses?

In general, it uses a multiple of a word (http://en.wikipedia.org/wiki/Word_%28computing%29).

On the commonly used x86-derived systems, that means a multiple of 4 bytes (32-bit) or 8 bytes (64-bit).

For the details, let’s look at the architecture of the Sun HotSpot VM: http://java.sun.com/products/hotspot/whitepaper.html

Memory usage in detail

Every object has a two-word header. Therefore each empty object uses at least 2 × 4 = 8 bytes (32-bit) or 2 × 8 = 16 bytes (64-bit) on the heap. The reference pointing to the object usually takes another 4 bytes, because the pointer is now a hardware address. If that reference is a local variable, however, it lives on the stack rather than on the heap (afaik).

The first header word contains information such as the identity hash code and GC status information. The second is a reference to the object’s class.

So far, all we know is that we have an object of a specific type, with no data in it.

And it gets worse…

A simple Boolean object, which only holds true or false, needs 12 bytes (32-bit) or 24 bytes (64-bit) on the heap: the two-word header plus one more word for its value. A primitive boolean needs at least one byte in the Java language (I think for simplicity, so the VM does not have to address single bits). But because of the word architecture, the remaining bytes of that word are left empty (padding) so that access stays fast.
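The arithmetic above can be sketched in a few lines. This is a minimal model assuming the simplified word rules of this post (two-word header, fields padded up to a multiple of the word size); the class and method names are hypothetical, and the numbers are estimates, not measurements of a real VM, whose alignment and options may change them.

```java
// Hypothetical helper modelling the word-based arithmetic from this post.
// It only estimates sizes; it does not measure a real JVM.
class WordSizeModel {
    static final int HEADER_WORDS = 2;

    // Rounds bytes up to the next multiple of wordSize (this is the padding).
    static int alignUp(int bytes, int wordSize) {
        return (bytes + wordSize - 1) / wordSize * wordSize;
    }

    // Two-word header plus the raw bytes of all fields, padded to whole words.
    static int shallowSize(int wordSize, int fieldBytes) {
        return alignUp(HEADER_WORDS * wordSize + fieldBytes, wordSize);
    }

    public static void main(String[] args) {
        for (int wordSize : new int[] {4, 8}) {
            int empty = shallowSize(wordSize, 0); // like new Object()
            int bool = shallowSize(wordSize, 1);  // like a Boolean: one boolean field
            // prints "32-bit: empty object = 8 bytes, Boolean = 12 bytes"
            // and    "64-bit: empty object = 16 bytes, Boolean = 24 bytes"
            System.out.println((wordSize * 8) + "-bit: empty object = " + empty
                    + " bytes, Boolean = " + bool + " bytes");
        }
    }
}
```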

That's why you should always use Boolean.valueOf or autoboxing (which calls valueOf): the cached Boolean.TRUE or Boolean.FALSE instance is returned, and no new object is created.
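A small demonstration of the cache, comparing references with == on purpose to show that no new object appears. The class name is made up for illustration; Boolean.valueOf(boolean), Boolean.valueOf(String), Boolean.TRUE and Boolean.FALSE are the standard java.lang.Boolean members.

```java
// Shows that Boolean.valueOf and autoboxing hand out the two cached instances
// instead of allocating a new Boolean each time.
class BooleanCacheDemo {
    static Boolean box(boolean value) {
        return value; // autoboxing: javac inserts Boolean.valueOf(value) here
    }

    public static void main(String[] args) {
        // == compares references: true means "the very same cached object"
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE);   // prints true
        System.out.println(box(false) == Boolean.FALSE);             // prints true
        System.out.println(Boolean.valueOf("true") == Boolean.TRUE); // prints true
    }
}
```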

Adding more primitive data types

What happens if we add more primitive fields? Fortunately, they simply fill up the unused padding bytes! Of course, once we cross a word boundary, the remaining bytes of the new word are unused again.

An object containing 4 booleans (64-bit: 8) needs the same memory as one Boolean.
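The claim can be checked with the same simplified word model (two-word header, one byte per boolean, padding to whole words). The class and method names are hypothetical; this reproduces the post's arithmetic and is not a measurement of a real VM.

```java
// Hypothetical check: how many boolean fields fit into an object
// before it needs one more word than a single Boolean?
class BooleanPackingModel {
    static int alignUp(int bytes, int wordSize) {
        return (bytes + wordSize - 1) / wordSize * wordSize;
    }

    // Two-word header plus one byte per boolean field, padded to whole words.
    static int sizeWithBooleans(int wordSize, int booleanCount) {
        return alignUp(2 * wordSize + booleanCount, wordSize);
    }

    // Largest number of booleans that still fits into the size of one Boolean.
    static int maxBooleansAtBooleanSize(int wordSize) {
        int oneBoolean = sizeWithBooleans(wordSize, 1);
        int count = 1;
        while (sizeWithBooleans(wordSize, count + 1) == oneBoolean) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int wordSize : new int[] {4, 8}) {
            int n = maxBooleansAtBooleanSize(wordSize);
            // prints "32-bit: up to 4 booleans in 12 bytes, 5 booleans need 16 bytes"
            // and    "64-bit: up to 8 booleans in 24 bytes, 9 booleans need 32 bytes"
            System.out.println((wordSize * 8) + "-bit: up to " + n + " booleans in "
                    + sizeWithBooleans(wordSize, n) + " bytes, " + (n + 1)
                    + " booleans need " + sizeWithBooleans(wordSize, n + 1) + " bytes");
        }
    }
}
```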

What about objects inside objects?

What happens if an object contains another object? Well, that object has to be on the 4/8-byte grid. Otherwise the containing object would have to know the layout of all its sub-objects in detail. And we only have the pointer to an object, which…

Example:

Say you have an object with the following members:

boolean
boolean
Object
boolean
boolean

You would usually assume that unnecessary padding follows the first two and the last two booleans. Fortunately, the Sun HotSpot VM rearranges the fields so that less memory is wasted. This is called Object Packing.
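To illustrate why reordering helps, here is a hypothetical sketch that lays out the example's fields once in declaration order and once with the largest field first. It assumes the post's two-word header, that each field is aligned to its own size, and that a reference is one word (4 bytes on 32-bit, 8 bytes on 64-bit). HotSpot's real field-layout rules are more detailed; this only models the idea.

```java
import java.util.Arrays;

// Hypothetical model of field packing: same fields, two different orders.
class FieldPackingModel {
    // Rounds value up to the next multiple of alignment.
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    // Lays fields out in the given order: each field is aligned to its own size,
    // and the whole object is padded to a multiple of the word size.
    static int sizeInOrder(int wordSize, int[] fieldSizes) {
        int offset = 2 * wordSize; // two-word header
        for (int size : fieldSizes) {
            offset = alignUp(offset, size) + size;
        }
        return alignUp(offset, wordSize);
    }

    // "Packed" layout: largest fields first, so the small ones fill the gaps.
    static int sizePacked(int wordSize, int[] fieldSizes) {
        int[] sorted = fieldSizes.clone();
        Arrays.sort(sorted); // ascending
        int[] descending = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            descending[i] = sorted[sorted.length - 1 - i];
        }
        return sizeInOrder(wordSize, descending);
    }

    public static void main(String[] args) {
        // boolean, boolean, Object, boolean, boolean
        int[] fields32 = {1, 1, 4, 1, 1};
        int[] fields64 = {1, 1, 8, 1, 1};
        System.out.println(sizeInOrder(4, fields32) + " vs " + sizePacked(4, fields32)); // 20 vs 16
        System.out.println(sizeInOrder(8, fields64) + " vs " + sizePacked(8, fields64)); // 40 vs 32
    }
}
```

In this model, packing saves one word per object in both cases: the four booleans share the word after the reference instead of each pair getting its own padded word.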

Arrays (versus Lists)

Arrays (e.g. new byte[5]) are special: they have a three-word header instead of two, and the length is stored in the third word. Because the array knows its length and the data type of its elements, it only needs padding at the end of the array.
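Under the same simplified word model, the size of an array can be estimated as follows: three header words, the elements stored back to back, and padding only at the end. The class name is hypothetical and the results are estimates, not measurements.

```java
// Hypothetical estimate of an array's footprint under the post's word model.
class ArraySizeModel {
    static int alignUp(int bytes, int wordSize) {
        return (bytes + wordSize - 1) / wordSize * wordSize;
    }

    // Three-word header (the third word holds the length), then the elements
    // without gaps between them, padded to whole words only at the very end.
    static int arraySize(int wordSize, int length, int elementBytes) {
        return alignUp(3 * wordSize + length * elementBytes, wordSize);
    }

    public static void main(String[] args) {
        for (int wordSize : new int[] {4, 8}) {
            System.out.println((wordSize * 8) + "-bit: new byte[5] = " + arraySize(wordSize, 5, 1)
                    + " bytes, new int[5] = " + arraySize(wordSize, 5, 4) + " bytes");
        }
    }
}
```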

This brings us to the question of arrays of primitives versus a java.util.List of boxed types.
In terms of performance, the plain array will surely outperform a List, first of all because of the boxing of the primitive values. Second, a list such as java.util.ArrayList usually starts with a capacity of 10 and has to grow whenever an added element no longer fits. It grows by more than one slot at a time, so it doesn't have to resize on every add, but this spare buffer is wasted memory too.
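The growth behaviour can be simulated as below. This sketch assumes the policy of OpenJDK's java.util.ArrayList (default capacity 10, growing by 50% when the backing array is full); that policy is an implementation detail and may differ between JDK versions. The class is hypothetical and only models the capacity; it does not inspect a real list.

```java
// Hypothetical simulation of ArrayList-style growth (assumed: start at 10, grow by 50%).
class ArrayListGrowthModel {
    static final int DEFAULT_CAPACITY = 10;

    // Capacity of the backing array after `size` elements (size >= 1) were added one at a time.
    static int capacityAfter(int size) {
        int capacity = DEFAULT_CAPACITY;
        while (capacity < size) {
            capacity += capacity >> 1; // grow by 50% once the array is full
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 11, 16, 100}) {
            int capacity = capacityAfter(size);
            System.out.println(size + " elements -> capacity " + capacity
                    + ", unused slots " + (capacity - size));
        }
    }
}
```

Under this assumed policy, 100 elements end up in a backing array of 109 slots, and every used slot still points to a separate boxed object.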

Conclusion

For objects, lists are the preferred choice (and remember: an array is an object too!). If you know the exact size of the list (or the size it will certainly reach), you can pass it to the constructor.
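For example, when the final size is known up front, the standard ArrayList(int initialCapacity) constructor lets the list allocate its backing array once. The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PresizedListDemo {
    // The final size is known, so the backing array is allocated once with exactly
    // that capacity instead of growing step by step while elements are added.
    static List<Integer> squares(int count) {
        List<Integer> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(i * i); // still boxed: each value is stored as an Integer
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // prints [0, 1, 4, 9, 16]
    }
}
```

If the list ends up smaller than expected, ArrayList.trimToSize() can release the spare capacity afterwards.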

For primitive data types with a fixed size, you should use an array. If the size is not fixed, ask yourself whether the performance loss of using a List with boxed primitives is really something to worry about.
If it is, first ask yourself whether you can gain performance somewhere else to make up for this loss.

If not, you may have to look for a library that provides a special primitive “list” (for example, of bytes), or implement the array handling yourself, which can be a pain in the ass.
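Doing it yourself does not have to be much code, though. Here is a minimal sketch of a hypothetical growable list of primitive ints backed by an int[], with no boxing; a real library version would offer far more (removal, iteration, other primitive types).

```java
import java.util.Arrays;

// Hypothetical minimal growable list of primitive ints (no boxing), backed by an int[].
class IntList {
    private int[] elements;
    private int size;

    IntList() {
        this(10);
    }

    IntList(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("negative capacity: " + initialCapacity);
        }
        elements = new int[initialCapacity];
    }

    void add(int value) {
        if (size == elements.length) {
            // Grow by 50%, but by at least one slot (needed for capacities 0 and 1).
            int newCapacity = Math.max(size + 1, size + (size >> 1));
            elements = Arrays.copyOf(elements, newCapacity);
        }
        elements[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }

    // Copies only the used part into an exactly sized array.
    int[] toArray() {
        return Arrays.copyOf(elements, size);
    }

    public static void main(String[] args) {
        IntList list = new IntList(2);
        for (int i = 0; i < 5; i++) {
            list.add(i * i);
        }
        System.out.println(list.size() + " " + list.get(4) + " " + Arrays.toString(list.toArray()));
        // prints 5 16 [0, 1, 4, 9, 16]
    }
}
```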

This entry was posted in Java, Performance.
