Autoboxing Performance

Preface

Because of my previous post about autoboxing, I get many search queries about autoboxing performance. Now I’m curious, too, so let’s get ready for some benchmarks.

When we talk about boxing performance, there are always two sides: throughput and memory consumption.

Throughput with Arrays

For microbenchmarking I use the JMH framework, which I already described in another post.

To have a bigger test set I create an array of 2 million entries. We measure the throughput (operations/second).
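The benchmark methods boil down to filling an array in a loop. A minimal sketch in plain Java, without the JMH annotations so that it runs on its own; only the three method names come from the result table, while the class name, the SIZE constant, and the loop bodies are assumptions:

```java
// Sketch only: SIZE, the class name, and the loop bodies are assumptions.
class BoxingBenchmarkSketch {
    static final int SIZE = 2_000_000;

    // Boxing only: every int is converted to an Integer and stored in an Integer[].
    static Integer[] testIntegerBoxing() {
        Integer[] values = new Integer[SIZE];
        for (int i = 0; i < SIZE; i++) {
            values[i] = i; // autoboxing
        }
        return values;
    }

    // Boxing and unboxing: the value is boxed, then stored as a primitive again.
    static int[] testIntegerInclUnboxing() {
        int[] values = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            Integer boxed = i;  // autoboxing
            values[i] = boxed;  // auto-unboxing
        }
        return values;
    }

    // Baseline: primitives only, no wrapper objects involved.
    static int[] testPrimitiveIntegers() {
        int[] values = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            values[i] = i;
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(testIntegerBoxing().length);
        System.out.println(testIntegerInclUnboxing().length);
        System.out.println(testPrimitiveIntegers().length);
    }
}
```

In a real JMH run each method would carry a @Benchmark annotation, and returning the array keeps the JIT from eliminating the loop as dead code.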

 

The result (ops/s, higher is better)

Benchmark                    Score      Error
testIntegerBoxing          176 919    ± 5 843
testIntegerInclUnboxing    193 915    ± 4 597
testPrimitiveIntegers      521 617    ± 5 347

Primitives seem to be 2-3 times faster than their Integer equivalents. But wait a minute, why is boxing plus unboxing faster than boxing alone? The reason seems to be the handling of Integer for the array and the return value.

Still, we can create nearly 200 k arrays of 2 million entries each in one second using autoboxing and Integer arrays! But we are very deep in basic operations here, where a few milliseconds can have a big impact on the overall application performance.

Now I’ve done the same with the byte datatype. The result is the following:

Benchmark                    Score      Error
testByteBoxing             398 484    ±  3 886
testByteInclUnboxing     1 029 260    ±  7 629
testPrimitiveByte        1 025 612    ± 10 827

That looks even stranger. The performance loss here comes only from handling a Byte[] and returning it instead of a byte[]. Boxing itself has no performance penalty here, because all Byte values are cached inside the JVM. The same value range is cached for the other integer wrapper types, too, but not for Float and Double.

Memory consumption

Explanation of WORD

The memory consumption of objects and data types in the JVM is pretty straightforward. The most important factor is the architecture's data unit, the “WORD”. The word size depends on your system architecture and operating system. On most modern PCs it is currently either 64 bit or 32 bit. It defines the size of the chunks the CPU can handle at once.

Wrapper Types

Boolean, Byte

If autoboxing (or a manual valueOf) is used, there is no need to create a new instance, because all values of these types are already cached in the VM. Every JVM has to cache them.

But we still need a pointer (“oop”) to the instance.

32bit: 4 byte pointer
64bit: 4 or 8 byte pointer (see Hotspot enhancements)

Short, Integer, Character, Float

Values from -128 to 127 are cached. Character is unsigned, so only 0 to 127 are cached. Float and Double values have no cache.
Any value outside the cache yields a new instance. A Short or Character variable needs 4/8 bytes, too, because it has to fit the word size (padding). The same applies to int in a 64bit environment.
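The caching can be observed by comparing references. A small sketch (the class and method names are made up for illustration); it calls valueOf explicitly, which is also what javac emits for autoboxing:

```java
class WrapperCacheDemo {
    // True if two valueOf calls for the same int return the very same instance.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value); // reference comparison on purpose
    }

    public static void main(String[] args) {
        // Guaranteed by the valueOf contracts: all Byte values, Integer -128..127, Character 0..127.
        System.out.println("Byte 42:       " + (Byte.valueOf((byte) 42) == Byte.valueOf((byte) 42)));
        System.out.println("Integer 127:   " + sameInstance(127));
        System.out.println("Character 'a': " + (Character.valueOf('a') == Character.valueOf('a')));
        // Not guaranteed either way; typically false on HotSpot with default settings.
        System.out.println("Integer 128:   " + sameInstance(128));
        System.out.println("Float 1.0f:    " + (Float.valueOf(1.0f) == Float.valueOf(1.0f)));
    }
}
```

The first three lines print true on every JVM. The last two depend on the implementation and its settings, which is exactly why wrapper values should be compared with equals() in real code.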

32bit: 4 bytes pointer + 8 bytes instance header + 4 bytes content = 16 bytes
64bit: 4/8 bytes pointer + 16 bytes instance header + 8 bytes content = 28/32 bytes

Long, Double

32bit: 4 bytes pointer + 8 bytes instance header + 8 bytes content = 20 bytes
64bit: 4/8 bytes pointer + 16 bytes instance header + 8 bytes content = 28/32 bytes

Primitives

All data types will have at least word size. Therefore…

boolean, byte, short, char, int, float

32bit: 4 byte
64bit: 8 byte

long, double

32bit, 64bit: 8 byte

primitives in arrays

In arrays, primitives retain their size, but the array as a whole still has to be padded to the WORD size.

Some examples:

boolean[299]: 299 * 1 + 12/24 header + 4/8 pointer = 315/331 bytes + padding = 316/336 bytes
int[123]: 123 * 4 + 12/24 header + 4/8 pointer = 508/524 bytes + padding = 508/528 bytes
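The two examples follow the same recipe: elements plus header plus pointer, rounded up to the word size. A small sketch of that arithmetic (class and method names are made up; the header and pointer sizes are simply the figures used above, not values read from a running JVM):

```java
class ArraySizeSketch {
    // Rounds size up to the next multiple of wordBytes.
    static long pad(long size, int wordBytes) {
        return (size + wordBytes - 1) / wordBytes * wordBytes;
    }

    // Elements + array header + pointer to the array, padded to the word size.
    // The pointer is assumed to be one word wide (4 or 8 bytes), as in the examples above.
    static long arrayBytes(int length, int elementBytes, int headerBytes, int wordBytes) {
        long raw = (long) length * elementBytes + headerBytes + wordBytes;
        return pad(raw, wordBytes);
    }

    public static void main(String[] args) {
        System.out.println("boolean[299], 32bit: " + arrayBytes(299, 1, 12, 4)); // 316
        System.out.println("boolean[299], 64bit: " + arrayBytes(299, 1, 24, 8)); // 336
        System.out.println("int[123],     32bit: " + arrayBytes(123, 4, 12, 4)); // 508
        System.out.println("int[123],     64bit: " + arrayBytes(123, 4, 24, 8)); // 528
    }
}
```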

object sizes

Of course objects have to be padded to a multiple of 4/8 bytes. Internally, however, attributes can retain their sizes. That means primitives keep their original 1 to 8 bytes, and object references probably use a reduced 4 byte pointer on 64 bit systems (see Hotspot enhancements).

For example:
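A hypothetical class that matches the size breakdown in this section. The field names id, name, lastLogin, and enabled are taken from that breakdown; the class name, the field types, and the name of the short field are assumptions:

```java
// Hypothetical example class; types and the short field's name are assumed.
class UserAccount {
    int id;                    // primitive, 4 bytes
    String name;               // object reference, 4 or 8 bytes
    java.util.Date lastLogin;  // object reference, 4 or 8 bytes
    boolean enabled;           // primitive, 1 byte
    short loginCount;          // primitive, 2 bytes (field name assumed)
}
```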

 

Looking at the shallow memory consumption (ignoring the objects the attributes refer to), we get the following:

32bit: 4 (id) + 4 (name) + 4 (lastLogin) + 1 (enabled) + 2 (short) = 15 + padding = 16
most 64bit: 16
rare 64bit: 4 + 8 + 8 + 1 + 2 = 23 + padding = 24

All with wrappers instead of primitives:

32 bit: 5 * 4 = 20 + padding = 24
most 64 bit: 24
rare 64 bit: 5 * 8 = 40

Hotspot enhancements

Compressed Oops

This feature reduces the pointer size on 64bit systems from 8 bytes to 4 bytes if the heap is smaller than 32 GB. This works because the pointer no longer holds the hardware address. Instead, the pointer is an offset from the start of the heap space, and it counts in 8 byte aligned object slots rather than in bytes.
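The 32 GB limit follows directly from that encoding: 2^32 possible offsets, each standing for an 8 byte step. A conceptual sketch (the names are made up, and HotSpot's real encoding has several modes that differ in detail):

```java
class CompressedOopsSketch {
    // Objects are assumed to be aligned to 8 bytes = 2^3, the HotSpot default.
    static final int ALIGNMENT_SHIFT = 3;

    // Turns a 32-bit compressed reference into a 64-bit address.
    // heapBase is a hypothetical start address of the heap.
    static long decode(long heapBase, int compressed) {
        return heapBase + (Integer.toUnsignedLong(compressed) << ALIGNMENT_SHIFT);
    }

    // Largest heap that 32-bit compressed references can cover.
    static long maxHeapBytes() {
        return (1L << 32) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(maxHeapBytes() / (1024L * 1024 * 1024) + " GB"); // 32 GB
        System.out.println(decode(0x1000L, 2)); // 4096 + 2 * 8 = 4112
    }
}
```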

Conclusion

Boxing is still quite fast. I suggest using primitives for mandatory attributes in an object, for heavy CPU calculations, and as a last resort when fighting bottlenecks.

I wouldn’t recommend using primitives for optional attributes, because that creates evil magic numbers. Of course, in very rare cases where every tiny bit of performance counts, this might be needed, too.

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” (Donald Knuth)

The difference on a higher scale

Let’s say we have a JEE application with 5 million instances where the mandatory id is an Integer instead of a primitive int.

Assuming we have a 64bit system with 4 byte pointers:
Integer: 28 * 5 million = 140 000 000 bytes
int: 4 * 5 million = 20 000 000 bytes

That makes a difference of 120 MB that we can save.
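The same arithmetic as a tiny sketch, in case you want to plug in your own instance count (class and method names are made up; the per-id sizes of 28 and 4 bytes are the figures from above):

```java
class IdFootprintAtScale {
    // Total bytes spent on the id field across all instances.
    static long totalBytes(long instances, int bytesPerId) {
        return instances * bytesPerId;
    }

    public static void main(String[] args) {
        long instances = 5_000_000L;
        long integerBytes = totalBytes(instances, 28); // Integer: pointer + header + content
        long intBytes = totalBytes(instances, 4);      // primitive int inside the object
        long savedMb = (integerBytes - intBytes) / 1_000_000L;
        System.out.println("Saved: " + savedMb + " MB"); // Saved: 120 MB
    }
}
```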

Computer setup

CPU: 4 core Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Java: Java 8 update 25 64bit
OS: Linux Mint 17.1 64bit
Kernel: 3.13.0-37-generic
IDE used: Eclipse Luna


2 Responses to Autoboxing Performance

  1. bogdan says:

    testPrimitiveIntegers runs 2-3 times faster than the other tests because the benchmarks are wrong.
    You forgot about profile-guided optimizations (http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_12_Forking.java)
    Try running the tests one by one instead of all three together and you will see totally different results.
    Also do not forget about loop unrolling (http://hg.openjdk.java.net/code-tools/jmh/file/183e50c96c54/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_11_Loops.java)

    • keiki says:

      You are right that looking only at the boxing would give different results.

      The reason why I used loops is also to show that handling objects is way more time-consuming than handling primitives, even if it is not solely the boxing.
