BitSet – the holy grail of working with bits
What does it solve?
java.util.BitSet solves the memory consumption of boolean. Adding a tiny CPU overhead. And offering a lot of helpful methods. It’s available since Java 1.0.
Boolean and memory in detail
Have you ever wondered why a boolean takes up 1 byte instead of 1 bit? Of course because of the way the memory is accessed. Because in the memory everything is aligned to fit your architecture to 4 or 8 bytes depending on 32bit or 64 bit system. So even 1 byte boolean consumes 8 bytes if necessary. Further I will only calculate the 64bit consumption to make it simpler.
An improvement is an array or an Object. Because the space taken by other attributes will fill up to the 8 bytes or a multiple of 8 bytes.
Comparing different memory consumptions:
java.lang.Boolean (not cached): 4 bytes pointer to instance + 16 bytes instance header + 1 byte boolean + 7 bytes padding to be at 8 bytes = 28 bytes per boolean
8 booleans in an object (don’t use it. Just to make a point): 4 byte pointer + 16 byte header + 8 * 1 byte boolean= 28 / 8 = 3.5 bytes per boolean
boolean : 4 bytes pointer + 16 bytes instance header + 8 bytes special array header + 8*1 byte boolean = 36 / 8 = 4.5 bytes per boolean
Of course it doesn’t make sense to make an object of x booleans just to save some memory. But it’s good to know that memory is automatically saved here if you got multiple boolean, short, etc attributes in your object.
How does BitSet solves the memory consumption?
In short BitSet saves the booleans as bits. It does that using internally a long. And it has two additional attributes a boolean and a int. I ignore any constants here of course.
Size comparison boolean versus BitSet
Let’s have a higher calculation size of 1000 booleans.
boolean: 4 bytes pointer + 16 bytes header + 8 bytes special array header + 1000*1 boolean = 1028 bytes in total for 1000 booleans
1028/1000 = 1.028 bytes per boolean
new BitSet(1000): 4 bytes pointer + 16 bytes header + 16 byte for boolean, int, pointer to long and padding + long to save 1000 bits (24+ 8*16=152 bytes) = 188 bytes in total
188/1000 = 0.188 bytes per boolean
For 1000 booleans BitSet saves 840 bytes. You can have 5 BitSets for one boolean! Of course with more booleans this will increase even further.
What does it costs?
To map the booleans to a bit. We need some CPU cycles. But the overhead is quite low to be honest. Some bit shifting here. Some range checks there. But of course a boolean array access is way more faster to access in terms of micro optimization. But in this case the memory benefit outperforms the CPU costs in probably most cases.
Wait what else does BitSet offer?
BitSet offers all methods you can think of for boolean.
- Automatic size increase. The size is always doubled!
- Setting bits to true/false in range
- Finding next/previous clear(false) or set(true) bit
- inverting bits
- logical calculations with other BitSets! (AND, XOR, intersect…) – be aware a bitset is mutable. You change the BitSet instance all the time.
When should you use it?
Here are some rule of thumbs:
- When it makes your code more readable
- When you benefit from using the methods, instead of implementing them yourself (don’t reinvent the wheel)
- When you got a lot of booleans in an array otherwise
- When you do logical calculations (XOR, OR, AND)
When shouldn’t you use it?
- When every nano second in read access counts more than a lot of memory usage
- When it’s more obvious to use a boolean for the domain you are in
- When your boolean array size is only in the hundreds. Then the memory advantage is only minimal. So it’s probably not worth it.
- In general favor BitSet over boolean. Like you would favor ArrayList over . Except that BitSet is not just a wrapper.
- memory usage is quite high of boolean  instead of BitSet. Will have 5 (1000 booleans) to nearly 8 times memory advantage. Depending on the size.
- On smaller sizes use what makes the code more readable