• PSets
• Calendar
• Materials
• Information
• # Fibonacci Heaps

Shortest path/MST motivation. Discuss Prim/Dijkstra algorithm.

Note: lots more decrease-key than delete.

Response: balancing

• trade off costs of operations

• making different parts equal time.

$$d$$-heaps:

• $$m\log_d n + nd\log_d n$$.

• set $$d=m/n$$

• $$O(m\log_{m/n} n)$$

• already nice, gives $$O(n^2)$$ (matching trivial alg) on complete graph.

## Fibonacci Heaps

Fredman-Tarjan, JACM 34(3) 1987.

http://dl.acm.org/citation.cfm?id=28874

Key principles:

• Lazy: don’t work till you must

• If you must work, use your work to “simplify” data structure too

• force user (adversary) to spend lots of time to make you work

• analysis via potential function measuring “complexity” of structure.

• user has to do lots of insertions to raise potential,

• so you can spread cost of complex ops over many insertions.

• potential says how much work adversary has done that you haven’t had to deal with yet.

• You can charge your work against that work.

• another perspective: procrastinate. if you don’t do the work, might never need to.

• Why the name? Wait and see.

Lazy approach:

• During insertion, do minimum, i.e. nothing.

• Indeed, for only insert, don’t even need to remember item!

• but we also want delete min, so save items

• For first delete-min, cost is $$n$$

• So, amortized cost 1.

Problem?

• issue with second and further delete mins

• $$n$$ delete mins cost $$n^2$$—means amortized $$n$$

Solution: Use your work to simplify

• As do comparisons, remember outcomes

• point from loser to winner

• creates “heap ordered tree” (HOT)

• not full or balanced, but heap ordered

• in fact, this tree will be a caterpillar: one spine with legs

• but we can consider general trees with this property

• remembers everything we figured out while finding min

Benefit: on delete-min, new min is child of current root

• find by scanning children

• but if we build a star, no benefit over brute force

• can we force few children?

Can we be lazy and simultaneously prepare for future laziness?

• Problem is if we start with min and it beats everything

• it plays too often

• how can we ensure nobody plays too often?

• Tournament

• pairs compete

• winners pair to play each other

• winners of winners pair

• etc.

• Still $$n-1$$ comparisons

• But now each node plays only $$\log n$$ times

• What is resulting HOT tree structure?

• binomial trees

x  x    x  x    x  x    x  x

x       x       x       x
\       \       \       \
\       \       \       \
x       x       x       x

x---             x---
\  \             \  \
\  -             \  -
x  x             x  x
\                \
\                \
x                 x

x---------
\  \     \
\  -     ------
x  x          x---
\          \  \
\          \  -
x           x  x
\
\
x
• degree $$d$$ has $$2^d$$ descendants

• so max degree $$\log n$$

Generalize

• As we interleave inserts and deletes, can’t run a tournament from clean start

• but can achieve same result

• So max degree $$O(\log n)$$ for $$n$$ items

• how?

• record each node’s degree

• i.e., its round in the tournament so far

• only link HOTs of same degree

• same “union by rank” idea as union find

induction: if only link heaps of same degree, than any degree-$$d$$ heap has $$2^{d}$$ nodes.

• creates “binomial trees” saw above

• “Binomial heaps” do this aggressively—when delete items, split up trees to preserve exact tree shapes.

• but we’ll be sloppier/faster

Consolidation algorithm for list of arbitrary HOTS

• inserts/deletes may create arbitrary set of them

• create buckets

• bucket HOTs by degree (maintain degree in node)

• start at smallest bucket

• link pairs till $$<2$$ left.

• next bucket.

• at end, collect items from buckets and discard buckets

For delete min

• remove min

• consolidate to identify new min

Combine with lazy insert

• keep pointer to min root (or just scan on delete min)

• on insert, just add new (degree 0) HOT to set of HOTs

• Formalize notion that scan through inserted items is “paid for” by consolidation

Heap ordered trees implementation

• definition

• represent using left-child, parent, and sibling pointers (both directions)

• keep double linked list of HOTs

• and a pointer to min HOT root.

• so in constant time, can find min

• in constant time, can add item

• in contant time, can link two HOT lists (Fibonacci heaps are mergeable in constant time)

• time to delete-min equal to number of roots, and simplifies struct.

Insert solution idea: adversary has to do many insertions to make consolidation expensive.

• analysis: potential function $$\phi$$ equal to number of roots.

• amortized$$_i=$$ real$$_i+\phi_i-\phi_{i-1}$$

• then $$\sum a_i=\sum r_i+\phi_n-\phi_0$$

• upper bounds real cost if $$\phi_n \ge \phi_0$$.

• sufficient that $$\phi_n \ge 0$$ and $$\phi_0$$ fixed

• insertion real cost 1, potential cost 1. total 2.

• deletion: take $$r$$ roots and add $$c$$ children, then do $$r+c$$ scan work.

• $$r$$ roots at start, $$\log n$$ roots at end. So, $$r-\log n$$ potential decrease

• so, total work $$O(c+\log n)=O(\log n)$$

Result: $$O(1)$$ amortized insert, $$O(\log n)$$ amortized delete

• basically easy: cut off node from parent, make root.

• constant time decrease-key

• what goes wrong?

• may violate exponential-in-degree property

“saving private ryan”

• fix: if a node loses more than one child, cut it from parent, make it a root (adds 1 to root potential—ok).

• implement using “mark bit” in node if has lost 1 child (clear when becomes root)

• may cause “cascading cut” until reach unmarked node

• why 2 children? We’ll see.

Analysis: must show

• tree size is exponential in degree

Second potential function: number of mark bits.

• if cascading cut hits $$r$$ nodes, clears $$r$$ mark bits

• adds 1 mark bit where stops

• amortized cost: $$O(1)$$ per decrease key

• note: if cut without marking, couldn’t pay for cascade!

• this is binomial heaps approach. may do same $$O(\log n)$$ consolidation and cutting over and over.

• Wait, problem

• can’t have 2 incompatible potential functions

• new root per cut

• adds to first potential function

• and thus to amortized cost

• fix: double new potential function.

• use one unit to pay for cut, one to pay for increase in 1st potential

• so, doesn’t harm first potential function analysis

Analysis of tree size:

• node $$x$$. consider current children in order were added (indexed from 1, not 0).

• claim: $$i^{th}$$ remaining child (in addition order) has degree at least $$i-2$$

• proof:

• Let $$y$$ be $$i^{th}$$ added child

• When added, the $$i-1$$ items preceding it in the add-order were already there

• i.e., $$x$$ had degree $$\ge i-1$$

• So $$i^{th}$$ child $$y$$ had (same) degree $$\ge i-1$$

• $$y$$ could lose only 1 child before getting cut

• let $$S_k$$ be minimum number of descendants (inc self) of degree $$k$$ node. Deduce $$S_0 =1$$, $$S_1=2$$, and \begin{aligned} S_k &\ge 2+ \sum_{i=0}^{k-2} S_i\end{aligned}

• to upper bound, solve equality \begin{aligned} S_k &= 2+\sum_{i=0}^{k-2} S_i\\ S_k - S_{k-1} &= S_{k-2}\end{aligned}

• deduce $$S_k \ge F_{k+2}$$ fibonacci numbers

• reason for name

• we know $$F_k \ge \phi^k$$

• so biggest possible $$k=\log_\phi n$$

Practical?

• non-amortized versions with same bounds exist (good for realtime apps).

• ie fib heaps reduces comparisons on moderate sized problems

• we’ve done experiments with graph problems where fib wins

• but, regular heaps are in an array

• fib heaps use lots of pointer manipulations

• lose locality of reference, mess up cache.

• work on “implicit heaps” that don’t use pointers

• Pettie developed an optimal one

Cannot beat these bounds, since can use to sort.

## Minimum Spanning Tree

minimum spanning tree (and shortest path) easy in $$O(m+n\log n)$$.

More sophisticated MST:

• why $$n\log n$$? Because deleting from size-$$n$$ heap

• idea: keep heap small to reduce cost.

• choose a parameter $$k$$

• run prim till region has $$k$$ neighbors (i.e. heap members)

• set aside and start over elsewhere.

• heap size bounded by $$k$$, delete by $$\log k$$

• “contract” regions (a la Kruskal) and start over.

Formal:

• phase starts with $$t$$ vertices.

• pick max region size $$k$$ (TBD)

• unmark all vertices and repeat following

• choose unmarked vertex

• Prim until attach to marked vertex or heap reaches size $$k$$

• ie $$k$$ edges incident on current region

• (maybe both ends inside; that’s fine too)

• mark all vertices in region

• contract graph in $$O(m)$$ time and repeat

Analysis:

• time for phase: $$m$$ decrease keys, $$t$$ delete-mins from size-$$k$$ heaps, so $$O(m+t\log k)$$.

• balance:

• have to spend $$m$$

• so can make $$t\log k = m$$ “for free” (no asymptotic change)

• set $$k=2^{m/t}$$.

• runtime of phase $$O(m)$$

• number of phases:

• At end of phase, each compressed vertex “owns” $$k$$ edges (one or both endpoints)

• so next number of vertices $$t' \le m/k$$

• so $$k' = 2^{m/t'} \ge 2^k$$

• what is starting $$k$$?

• Initially $$t=n$$

• so need $$n\log k = O(m)$$ for $$O(m)$$-time phase

• so can take $$k=2^{m/n}\ge 2$$

• (note if $$m>n\log n$$ then $$k>n$$ and one phase suffices to finish)

• when reach $$k=n$$, Prim step solves whole graph, done.

• number of phases: $$\beta(m,n) = \min\set{i \mid \log^{(i)} n \le m/n} \le \log^* n$$.

Remarks:

• subsequently improved to $$O(m\log\beta(m,n))$$ using edge packets

• chazelle improved to $$O(m\alpha(n)\log\alpha(n))$$ using “error-prone heaps”

• ramachandran gave optimal algorithm (runtime not clear)

• randomization gives linear.

• fails for Dijkstra.

• Why? Because cannot contract.