Unknown Java Features Part 1: BitSet

BitSet – the holy grail of working with bits

What does it solve?

java.util.BitSet solves the memory consumption of boolean[]. Adding a tiny CPU overhead. And offering a lot of helpful methods. It’s available since Java 1.0.

Boolean and memory in detail

Have you ever wondered why a boolean takes up 1 byte instead of 1 bit? Of course because of the way the memory is accessed. Because in the memory everything is aligned to fit your architecture to 4 or 8 bytes depending on 32bit or 64 bit system. So even 1 byte boolean consumes 8 bytes if necessary. Further I will only calculate the 64bit consumption to make it simpler.

An improvement is an array or an Object. Because the space taken by other attributes will fill up to the 8 bytes or a multiple of 8 bytes.

Comparing different memory consumptions:

java.lang.Boolean (not cached): 4 bytes pointer to instance + 16 bytes instance header +  1 byte boolean + 7 bytes padding to be at 8 bytes = 28 bytes per boolean

8 booleans in an object (don’t use it. Just to make a point): 4 byte pointer + 16  byte header + 8 * 1 byte boolean= 28 / 8 = 3.5 bytes per boolean

boolean[8] : 4 bytes pointer + 16 bytes instance header + 8 bytes special array header + 8*1 byte boolean = 36 / 8 = 4.5 bytes per boolean

Of course it doesn’t make sense to make an object of x booleans just to save some memory. But it’s good to know that memory is automatically saved here if you got multiple boolean, short, etc attributes in your object.

How does BitSet solves the memory consumption?

In short BitSet saves the booleans as bits. It does that using internally a long[]. And it has two additional attributes a boolean and a int. I ignore any constants here of course.

Size comparison boolean[] versus BitSet

Let’s have a higher calculation size of 1000 booleans.

boolean[1000]: 4 bytes pointer + 16 bytes header + 8 bytes special array header + 1000*1 boolean = 1028 bytes in total for 1000 booleans

1028/1000 = 1.028 bytes per boolean

new BitSet(1000): 4 bytes pointer + 16 bytes header + 16 byte for boolean, int, pointer to long[] and padding + long[16] to save 1000 bits  (24+ 8*16=152 bytes) = 188 bytes in total

188/1000 = 0.188 bytes per boolean

For 1000 booleans BitSet saves 840 bytes. You can have 5 BitSets for one boolean[]! Of course with more booleans this will increase even further.

What does it costs?

To map the booleans to a bit. We need some CPU cycles. But the overhead is quite low to be honest. Some bit shifting here. Some range checks there. But of course a boolean array access is way more faster to access in terms of micro optimization. But in this case the memory benefit outperforms the CPU costs in probably most cases.

Wait what else does BitSet offer?

BitSet offers all methods you can think of for boolean.

  • Automatic size increase. The size is always doubled!
  • Setting bits to true/false in range
  • Finding next/previous clear(false) or set(true) bit
  • inverting bits
  • logical calculations with other BitSets! (AND, XOR, intersect…) – be aware a bitset is mutable. You change the BitSet instance all the time.

When should you use it?

Here are some rule of thumbs:

  • When it makes your code more readable
  • When you benefit from using the methods, instead of implementing them yourself (don’t reinvent the wheel)
  • When you got a lot of booleans in an array otherwise
  • When you do logical calculations (XOR, OR, AND)

When shouldn’t you use it?

  • When every nano second in read access counts more than a lot of memory usage
  • When it’s more obvious to use a boolean[] for the domain you are in
    • When your boolean array size is only in the hundreds. Then the memory advantage is only minimal. So it’s probably not worth it.

TL;TR

  • In general favor BitSet over boolean[]. Like you would favor ArrayList over []. Except that BitSet is not just a wrapper.
  • memory usage is quite high of boolean [] instead of BitSet. Will have 5 (1000 booleans) to nearly 8 times memory advantage. Depending on the size.
  • On smaller sizes use what makes the code more readable
This entry was posted in Unknown feature and tagged , , , , , , , . Bookmark the permalink.

3 Responses to Unknown Java Features Part 1: BitSet

  1. Roy van Rijn says:

    This article says: Beware BitSet is immutable.
    This is false, java.util.BitSet is *very* mutable and changes state!

    For example look at bitSet.and(otherBitSet); this is a void method and changes the state of the first BitSet.

  2. ZeeMan says:

    Please use bytes and bits with caution. JVM handles using boolean to take 1 to 2 bytes depending on the virtual machine and underlying architecture.

Leave a Reply

Your email address will not be published. Required fields are marked *