It's now 2021, and I'm still struggling to understand some of the nuance behind the hashing/bucket finding algorithm. I understand how/what hash tables are, but I think with QuickJS's atom hash table, there's some optimizations going on that I haven't quite figured out.
Let's look through it, continuing from last time. The atom hash table is made of two primary data structures:
atom_hash is the list of indexes into
atom_array for a given hash for a given table size.
atom_hash also always has a length that is a power of two so that for each "resize" of the table, the hash can be masked. The basic usage seems to be:
- we have key string K, and atom value V
- we produce a hash of K
Hwhich is a 32bit integer
- since the initial hash size is say 256 (mask of
0xff) entries, we mask
H & 0ff. (
His then stored on
Hand we can reuse it later).
H1is used as the index into
Iis the index into
atom_array, such that
atom_array[I]gives us a container
C(linked list like object that contains
The difficulties I'm having, is that there is some pointer compression going on with
C so that the lowest(?) bit of
&C says if the entry is empty or not.
C also has the field
next_hash which returns an
I for the next entry, which is where I start to get confused. Does this mean there are multiple
I that map to the same hash? I think so, but it's not stored as a linked list like I would expect, instead the values all seem to be stored linearly in
atom_array, whereas I expected
atom_array to be buckets of linked lists.
What I want to answer for myself is if the whole datastructure in QuickJS is needed in Rust, or if I could just use some standard library datastructures that Rust provides and still maintain all the memory/speed benefits of the handwritten C version. At the moment, I feel like there's still too much escaping my understanding to be able to evaluate it.
I'm still planning/working on doing a port of the algorithm to Rust so I can understand it better, but all the indirection and bit/address manipulation is definitely pushing my abilities in Rust. It's a good thing in the long run, as it helps me udnerstand Rust's type/data system better, but it's quite the steep curve.