How are Java 9 sets created by the factory method Set.of randomized

Convenient factory methods for creating collections of objects is one of the nice feature that Java 9 brings. Here is how you would create a set:

Set<String> tshirtSizes = Set.of("XS", "S", "M", "L", "XL", "XXL", "3XL");

tshirtSizes.forEach(System.out::println);

Running this code a couple of times will show how every single run will create a different output. The Java 9 sets created by the newly added factory methods are intentionally randomized. We know since forever that, for example, hash sets don’t guarantee the order of the elements. However, if you create one and add some elements, you will get the same iteration order most of the time. In order to avoid writing code that depends on the order of the added elements, the new factory methods for collections randomize the elements. Let’s look in more details to this aspect.

First, what is the actual implementation of the set created by the Set.of static method? It is java.util.ImmutableCollections.SetN, an internal class which doesn’t have a public constructor. Looking carefully to this class we can see that the set is backed by a hash table represented by an array final E[] elements. The number of the elements allocated in the array is 2 times the number of the elements provided to the of method. So, in the above example, the array will have 14 elements.

I ran the code twice and I debugged it a bit – here is how the internal array was filled with elements:

Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13
1st run XS S 3XL M L XXL XL
2nd run 3XL L XS S M XL XXL

For every element given in the input, the index in the internal array is calculated using the following formula:

idx = Math.floorMod(element.hashCode() ^ SALT, 14)

The SALT is essential for randomizing the elements in the array. The SALT is initialized with the virtual machine initialization and it is based on the current system time:

    static final int SALT;
    static {
        long nt = System.nanoTime();
        SALT = (int)((nt >>> 32) ^ nt);
    }

The SALT for the first run was -1794585645 (and 325126010 for the second). Here are the indexes computed for the first run:

Element Index
XS 0
S 2
M 4
L 5
XL 13
XXL 9
3XL 2

If you check the indexes computed by the formula you can see that the index for “3XL” element doesn’t match the actual place of the element in the internal array. This is because on index 2 there is already an element (“S”). This is a collision and the element will be added at the next available index, which is index 3 in our example. Having the internal array twice bigger than the number of added elements will avoid iterating too much for finding the first available place.

Leave a Reply

Your email address will not be published. Required fields are marked *