# Buckets and Van Emde Boas Trees

# Buckets

Exploit *indirect addressing* in the RAM model.

Review Dijkstra's shortest path algorithm.

In shortest paths, edge lengths are often small integers (say at most \(C\)).

Observe heap behavior:

heap min increasing (monotone property)

max \(C\) distinct values

(because don’t insert \(k+C\) until delete \(k\)).

Idea: lots of things have same value. Keep in buckets.

How to exploit?

standard heap of buckets: \(O(m\log C)\) (slow) or \(O(m+n\log C)\) with Fibonacci heaps (messy).

Dial’s algorithm: \(O(m+nC)\).

in fact, \(O(m+D)\) for max distance \(D\)

space?

use array of size \(C+1\)

wrap around
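Dial's algorithm with the wrap-around array can be sketched as follows (a minimal Python sketch; the function name and graph representation are my own, not from the notes):

```python
from collections import deque

def dial_shortest_paths(graph, source, C):
    """Dijkstra with a circular bucket array (Dial's algorithm).

    graph: adjacency dict {u: [(v, w), ...]} with integer weights in 1..C.
    Returns shortest distances from source; O(m + D) for max distance D.
    """
    INF = float('inf')
    dist = {u: INF for u in graph}
    dist[source] = 0
    # Pending keys always lie in [d, d + C] (monotone property),
    # so C + 1 buckets suffice: index by distance mod (C + 1).
    buckets = [deque() for _ in range(C + 1)]
    buckets[0].append(source)
    remaining = 1          # vertices inserted but not yet finalized
    d = 0
    while remaining > 0:
        b = buckets[d % (C + 1)]
        while b:
            u = b.popleft()
            if dist[u] != d:        # stale entry: u was re-bucketed lower
                continue
            remaining -= 1          # u is finalized at distance d
            for v, w in graph[u]:
                nd = d + w
                if nd < dist[v]:
                    if dist[v] == INF:
                        remaining += 1
                    dist[v] = nd    # old bucket entry (if any) goes stale
                    buckets[nd % (C + 1)].append(v)
        d += 1
    return dist
```

Decrease-key is implemented lazily here: a relaxed vertex is simply re-appended, and outdated entries are skipped when popped.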

Can we improve this?

where are we wasting a lot of effort?

scanning over big empty regions

can we arrange to leap whole regions?

some kind of summary structure?

2-level buckets.

make blocks of size \(b\)

add layer on top: \(nC/b\) block summaries

in each summary, keep count of items in block

insert updates block and summary: \(O(1)\)

ditto decrease key

delete min scans summaries, then scans one block

over whole algorithm, \(nC/b\) scanning summaries

also, scan one block per delete min: \(b\) work

over \(n\) delete mins, work \(nb+nC/b\)

balance, optimize with \(b=\sqrt{C}\)

result: SP in \(O(m+n\sqrt{C})\)

as before “space trick” to keep array sizes to \(C\)

if know \(D\) (or binary search for it) make bucket size \(\sqrt{D/n}\) and get \(O(m+\sqrt{nD})\)
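The two-level scheme can be sketched as a monotone priority queue (a minimal Python sketch; the class and method names are my own, decrease-key is omitted, and keys are stored in \([0,N)\) directly rather than wrapping mod \(C+1\)):

```python
import math

class TwoLevelBuckets:
    """Two-level bucket queue: blocks of size b ~ sqrt(N), with a
    per-block item count as the summary layer.  Assumes monotone use:
    inserted keys never fall below the last deleted minimum."""

    def __init__(self, N):
        self.b = max(1, math.isqrt(N))       # block size
        nblocks = -(-N // self.b)            # ceil(N / b) summaries
        self.buckets = [[] for _ in range(nblocks * self.b)]
        self.summary = [0] * nblocks         # count of items per block
        self.size = 0
        self.cur = 0                         # scan position (never retreats)

    def insert(self, key, item):             # O(1): update bucket + summary
        self.buckets[key].append(item)
        self.summary[key // self.b] += 1
        self.size += 1

    def delete_min(self):
        assert self.size > 0
        blk = self.cur // self.b
        while self.summary[blk] == 0:        # leap over whole empty blocks
            blk += 1
        self.cur = max(self.cur, blk * self.b)
        while not self.buckets[self.cur]:    # then scan inside one block
            self.cur += 1
        item = self.buckets[self.cur].pop()
        self.summary[blk] -= 1
        self.size -= 1
        return self.cur, item
```

Summary scanning advances `blk` at most once per block over the whole run, and each delete-min scans at most one block, matching the \(nb+nC/b\) accounting above.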

Generalize: 3 tiers?

blocks, superblocks, and summary

block size \(C^{1/3}\)

runtime \(O(m+nC^{1/3})\)

how far can this go? To \(m+n\)? no, because insert cost rises.

Tries.

depth \(k\) tree over arrays of size \(\Delta\)

range \(C\) (details of reduction from \(nC\) omitted)

expansion factor \(\Delta=(C+1)^{1/k}\) (power of 2 simplifies)

insert: \(O(k)\) (also find, delete-non-min, decrease-key)

delete-min: \(O(k\Delta)=O(kC^{1/k})\) to find next element

Shortest paths: \(O(km+knC^{1/k})\)

Balance: \(nC^{1/k}=m\) so \(C=(m/n)^k\) so \(k=\log(C)/\log(m/n)\)

Runtime: \(m\log_{m/n}(C)\)

Space: \(k\Delta=kC^{1/k}=km/n\le (m/n)\log C\) using circular array trick.

Problems: space and time

Idea: be lazy! (Denardo and Fox 1979)

don’t push items down trie until you must

unique *active* block (array) on each level

keep other stuff piled up in buckets in each level

keep count of items in (buckets of) each level (not counting below)

only expand bucket when needed (and update level counts)

Operations:

Insert

walk item down tree till stop in (inactive) bucket or bottom

increment level count

\(O(k)\) work

Decrease key

Remove from current bucket

put in proper bucket

maybe descend (and revise level counts)

new bucket must be a prior sibling since monotone

so bucket exists

and no higher levels need be touched

amortization:

items descend once per touch, but never rise,

(critically relies on monotone property)

so at most \(k\) expansions per item

alternatively, consider “potential” as “total heights of items”

inserting at top adds \(k\) potential

then all descents are free

Delete-min

remove item, update bottom level count

advance to new min

rise to first nonempty level

traverse to first nonempty bucket

expand till find min

time:

pushdowns are accounted for in insert cost

identifying min happens during pushdowns

rise to nonempty bucket costs less than pushdowns from it

so, cost to scan only one block

cost \(C^{1/k}\)

space to \(kC^{1/k}\)

Times:

\(O(k)\) insert (charge expansions to insert)

\(O(1)\) decrease key

\(O(C^{1/k})\) delete-min

paths runtime: \(O(m+n(k+C^{1/k}))\), choose \(k = 2\log C/\log \log C\): \(O(m+n(\log C)/\log\log C)\)

great, even if e.g. \(C=2^{32}\)
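A quick numeric check of that claim (hypothetical snippet, just evaluating the formulas above):

```python
import math

C = 2 ** 32
logC = math.log2(C)                  # log C = 32
k = 2 * logC / math.log2(logC)       # k = 2 log C / log log C = 12.8 levels
per_delete_min = C ** (1 / k)        # C^(1/k) = sqrt(log C), about 5.66
factor = logC / math.log2(logC)      # the (log C)/log log C factor = 6.4
```

So even for 32-bit edge lengths the per-node cost is a small constant, versus a \(\log n\) factor for comparison-based heaps.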

Further improvement: heap on top (HOT) queues get \(O(m+n(\log C)^{1/3})\) time. Cherkassky, Goldberg, and Silverstein. SODA 97.

Implementation experiments: a good model for a course project.

# VEB

van Emde Boas, “Design and implementation of an efficient priority queue,” Math. Systems Theory 10 (1977)

Thorup, “On RAM priority queues” SODA 1996.

Idea

in bucket heaps, the problem of finding the next nonempty bucket was itself a heap problem. Recurse!

\(b\)-bit words

\(\log b\) running times

Thorup's paper improves this to \(O(\log\log n)\)

consequence for sorting.

Algorithm.

array of buckets but fast lookup on top.

queue \(Q\) on \(b\) bits is struct

\(Q.\min\) is current min, *not* stored recursively

Array \(Q.low[]\) of \(\sqrt{U}\) queues on low-order bits, one per bucket

\(Q.high\), vEB queue on high-order bits of elements *other than the current min in the queue*

Insert \(x\):

if \(x < Q.\min\), swap

now insert \(x\) in recursive structs

expand \(x=(x_h,x_l)\) high and low half words

If \(Q.low[x_h]\) nonempty, then insert \(x_l\) in it

else, make new queue holding \(x_l\) at \(Q.low[x_h]\), and insert \(x_h\) in \(Q.high\)

note two inserts, but one to an empty queue, so constant time

Delete-min:

need to replace \(Q.\min\)

Look at \(Q.high.\min\); if null, the queue is now empty.

else, gives first nonempty bucket \(x_h\)

Delete min from \(Q.low[x_h]\) to finish finding \(Q.\min\)

If results in empty queue, Delete-min from \(Q.high\) to remove that bucket from consideration

Note two delete mins, but second only happens when first was constant time.
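The operations above can be sketched as follows (a minimal Python sketch assuming distinct keys; a `dict` stands in for the array of \(\sqrt{U}\) queues, which also previews the hash-table fix to the space problem discussed next):

```python
class VEB:
    """van Emde Boas priority queue over `bits`-bit keys.

    Q.min holds the current minimum and is NOT stored recursively;
    Q.low maps each high half-word to a queue on low half-words;
    Q.high is a vEB queue on the high half-words present in Q.low.
    Insert and delete-min make one non-constant recursive call each,
    giving O(log bits) time.
    """

    def __init__(self, bits):
        self.bits = bits
        self.min = None
        self.low = {}
        self.high = None

    def insert(self, x):
        if self.min is None:
            self.min = x                     # empty queue: O(1)
            return
        if x < self.min:
            x, self.min = self.min, x        # new min; recurse on old one
        h = (self.bits + 1) // 2             # bits in the low half-word
        xh, xl = x >> h, x & ((1 << h) - 1)
        if xh in self.low:
            self.low[xh].insert(xl)          # the one recursive insert
        else:
            q = VEB(h)
            q.insert(xl)                     # empty queue: O(1)
            self.low[xh] = q
            if self.high is None:
                self.high = VEB(self.bits - h)
            self.high.insert(xh)             # the one recursive insert

    def delete_min(self):
        x = self.min
        if self.high is None or self.high.min is None:
            self.min = None                  # nothing beyond the min
            return x
        h = (self.bits + 1) // 2
        xh = self.high.min                   # first nonempty bucket
        xl = self.low[xh].delete_min()       # recursive delete-min
        self.min = (xh << h) | xl            # new overall min
        if self.low[xh].min is None:         # bucket emptied...
            del self.low[xh]
            self.high.delete_min()           # ...so drop it from Q.high;
        return x                             # first call was O(1) then
```

Note how the code mirrors the accounting above: when insert creates a new bucket, the insert into it is constant time, and when delete-min must also delete from `Q.high`, the bucket held only its min, so the first delete-min was constant time.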

Problem: space

need constant time hash table.

non-trivial complexity theory,

but can manage with randomization or slight time loss.