# Persistent Data Structures

Sarnak, N., and Tarjan, R. E., "Planar Point Location Using Persistent Search Trees", Communications of the ACM 29(7) (1986), 669–679.

Driscoll, J. R., Sarnak, N., Sleator, D. D., and Tarjan, R. E., "Making Data Structures Persistent", Journal of Computer and System Sciences 38(1) (1989).

Idea: be able to query and/or modify past versions of data structure.

ephemeral: changes to struct destroy all past info

partial persistence: changes to most recent version, query to all past versions

full persistence: queries and changes to all past versions (creates “multiple worlds” situation)
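For concreteness, partial persistence can be sketched with a toy structure that snapshots on every change (full copies per version, so grossly space-inefficient; the names here are illustrative, and the rest of the lecture is about doing much better):

```python
# Toy illustration of partial persistence: every update produces a new
# version; old versions stay queryable, but only the newest is updatable.
class VersionedList:
    def __init__(self):
        self.versions = [[]]           # version 0 is the empty list

    def insert(self, x):
        """Update the most recent version, producing a new one."""
        new = self.versions[-1] + [x]  # full copy: simple but wasteful
        self.versions.append(new)
        return len(self.versions) - 1  # new version number

    def query(self, version):
        """Queries are allowed against any past version."""
        return self.versions[version]

vl = VersionedList()
v1 = vl.insert(3)
v2 = vl.insert(7)
print(vl.query(v1))  # [3]   -- the past is still visible
print(vl.query(v2))  # [3, 7]
```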

Goal: general technique that can be applied to *any* data
structure.

# Persistent Trees

Many data structures consist of fixed-size *nodes* connected by *pointers*, accessed from some *root* entry point.

data structure ops traverse pointers and modify node fields/pointers

runtime determined by number of traverse/modify steps

Our goal: use *simulation* to transform any ephemeral data
structure into a persistent one

measure of success: time and space cost for each *primitive traversal* and *primitive modification* step.

An exposed data structure operation will do some number of primitive steps

we will add persistence to primitive steps, so any data structure can be made persistent

we’ll measure cost of simulation in slowdown and storage

We’ll limit to trees.

But ideas generalize

Key principles:

Lazy: don’t work till you must

If you must work, use your work to “simplify” data structure too

force user (adversary) to spend lots of time to make you work

analysis via potential function measuring “complexity” of structure.

user has to do lots of changes to raise potential,

so you can spread cost of complex ops over many insertions.

potential says how much work the *adversary* has done that you haven’t had to deal with yet.

You can charge your work against that work.

another perspective: procrastinate. if you don’t do the work, might never need to.

Lazy approach:

During change, do minimum, i.e. nothing.

Indeed, with only changes (no lookups), we wouldn’t even need to remember the changes!

but we also want lookups, so record changes

Then apply relevant changes when a lookup happens

So, amortized cost 1.

Problem?

issue with second and further lookups

so, do need to incorporate

Full copy bad.
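The record-changes-then-apply-on-lookup idea might be sketched as follows (illustrative names, insertions only, no persistence yet):

```python
# Sketch of the lazy approach: an update just appends to a change log;
# the log is folded into the real structure only when a lookup forces it.
class LazySet:
    def __init__(self):
        self.data = set()
        self.pending = []          # deferred changes, in arrival order

    def insert(self, x):
        self.pending.append(x)     # O(1): do the minimum, i.e. nothing

    def contains(self, x):
        # First lookup pays to incorporate all deferred changes; later
        # lookups are cheap because the log is now empty. Without this
        # incorporation step, every lookup would rescan the whole log.
        while self.pending:
            self.data.add(self.pending.pop())
        return x in self.data
```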

Fat nodes method:

replace each (single-valued) field of data structure by list of all values taken, sorted by time.

requires \(O(1)\) space per data change (**unavoidable** if we keep old data)

to look up a data field, need to search based on time.

store values in binary tree

checking/changing a data field takes \(O(\log m)\) time after \(m\) updates

**multiplicative** slowdown of \(O(\log m)\) in structure access: a big number from a data structures perspective
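A per-field sketch of the fat node method (assuming updates arrive in time order, as in partial persistence; names are illustrative):

```python
import bisect

# Fat-node method, per field: keep every value the field has ever held,
# tagged with its change time; a read at time t binary-searches history.
class FatField:
    def __init__(self, t0, v0):
        self.times = [t0]          # change times, sorted (appends only)
        self.values = [v0]

    def set(self, t, v):
        # O(1) space per change -- unavoidable if old data is kept.
        self.times.append(t)
        self.values.append(v)

    def get(self, t):
        # Latest value written at or before t: O(log m) after m updates,
        # which is the multiplicative slowdown on every field access.
        i = bisect.bisect_right(self.times, t) - 1
        return self.values[i]

f = FatField(0, "a")
f.set(5, "b")
```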

Path copying:

can only reach node by traversing pointers starting from root

changes to a node only visible to *ancestors* in pointer structure

when we change a node, copy it and its ancestors (back to the root of the data structure)

keep list of roots sorted by update time

\(O(\log m)\) time to find right root (or constant, if times are integers): an **additive** slowdown

same access time as original structure: *additive* instead of multiplicative \(O(\log m)\)

modification time and space usage equal number of ancestors: possibly huge!

especially if structure not balanced!
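Path copying on an (unbalanced) binary search tree can be sketched as follows; an insert copies exactly the search path, and everything off that path is shared with older versions:

```python
# Path copying: insertion copies only the nodes on the root-to-leaf search
# path; all other subtrees are shared with previous versions.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a new version; the old version is untouched."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

roots = [None]                    # list of roots, sorted by update time
for k in [4, 2, 6]:
    roots.append(insert(roots[-1], k))
```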

Combined Solution (trees only):

Path copying is too aggressive

how can we be lazy about path copying?

fat nodes bad, but “pleasingly plump” ok.

use fat nodes, but when get too fat, make new path copy

in each node, store 1 *extra* time-stamped field

if full, it overrides one of the standard fields for any accesses later than the stamped time.

access rule

standard access, just check for overrides while following pointers

constant factor increase in access time.

update rule:

when need to change/copy pointer, use extra field if available.

otherwise, make new copy of node with new info, and recursively modify parent.

Idea: lazy path copying

copy only as much as you must
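The access and update rules above might be sketched for a BST under partial persistence as follows (one override slot per node; all names are illustrative, not from the papers, and there is no balancing):

```python
# Combined method sketch: each node keeps its original left/right pointers
# plus ONE extra time-stamped override slot. A pointer write uses the free
# slot if available; otherwise the node is copied and the copy is relinked
# into its parent, possibly cascading toward the root.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.extra = None                 # (time, "left"/"right", node) or None

    def child(self, side, t):
        # Access rule: the override applies to accesses at/after its stamp.
        if self.extra and self.extra[1] == side and self.extra[0] <= t:
            return self.extra[2]
        return getattr(self, side)

def set_child(node, side, new, t, path, roots):
    """Update rule: write pointer `side` of `node` at time t.
    `path` holds (ancestor, side) pairs; `roots` is sorted by time."""
    if node is None:                      # cascaded past the root
        roots.append((t, new))
    elif node.extra is None:
        node.extra = (t, side, new)       # lazy: fill the spare slot
    else:                                 # node is full: copy it
        copy = Node(node.key, node.child("left", t), node.child("right", t))
        setattr(copy, side, new)
        parent, pside = path.pop() if path else (None, None)
        set_child(parent, pside, copy, t, path, roots)

def insert(roots, key, t):
    node = roots[-1][1] if roots else None
    path = []
    while node is not None:
        side = "left" if key < node.key else "right"
        nxt = node.child(side, t)
        if nxt is None:
            break
        path.append((node, side))
        node = nxt
    if node is None:
        roots.append((t, Node(key)))      # first insertion: new root
    else:
        set_child(node, side, Node(key), t, path, roots)

def contains(roots, key, t):
    root = None
    for ts, r in roots:                   # pick the root live at time t
        if ts <= t:
            root = r
    while root is not None:
        if root.key == key:
            return True
        root = root.child("left" if key < root.key else "right", t)
    return False

roots = []
for t, k in enumerate([4, 2, 6], start=1):
    insert(roots, k, t)
```

The third insertion finds the root's spare slot already used, so the root is copied once and a new root version is recorded; the first two insertions cost no copies at all.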

Analysis idea: adversary has to do many changes to make copying expensive.

analysis:

**potential function** \(\phi\) measuring amount of “deferred work”: amortized cost \(a_i = r_i + \phi_i - \phi_{i-1}\)

then \(\sum a_i=\sum r_i+\phi_n-\phi_0\)

upper bounds real cost if \(\phi_n \ge \phi_0\).

sufficient that \(\phi_n \ge 0\) and \(\phi_0\) fixed
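Written out, the telescoping argument is:

\[
\sum_{i=1}^{n} a_i
  = \sum_{i=1}^{n} \left( r_i + \phi_i - \phi_{i-1} \right)
  = \sum_{i=1}^{n} r_i + \phi_n - \phi_0
  \;\ge\; \sum_{i=1}^{n} r_i
  \quad \text{whenever } \phi_n \ge \phi_0 .
\]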

Analysis

What represents work you need to do later?

full nodes (that may need copying)

that may still be modified (current values)

*live* node: pointed at by *current* root

potential function: number of *full* live nodes

copying a node is free (new copy not full, pays for copy space/time)

pay for filling an extra pointer (do only once, since can stop at that point).

amortized space per update: \(O(1)\).

# Application: planar point location.

a computational geometry foreshadow

Different model

algebraic operations are \(O(1)\)

e.g. computing line intersections or lengths

even when these return irrationals!

sweep precision under the rug.

planar subdivision

division of plane into polygons

query: “what polygon contains this point”

define complexity of input?

edges of polygons form \(n\) *segments*/edges

segments meet only at ends/vertices

numerous special-purpose solutions

One solution: dimension reduction

vertical line through each vertex

divides into slabs

in slab, segments maintain one vertical ordering

find query point slab by binary search

build binary search tree for slab with “above-below” queries

\(n\) binary search trees, size \(O(n^2)\), time \(O(n^2\log n)\)

observation: trees all very similar

think of \(x\) axis as time, slabs as “epochs”

at end of epoch, “delete” segments that end, “insert” those that start.

over all time, only \(n\) inserts, \(n\) deletes.

must be able to query over all times
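The sweep can be sketched as follows. Here each slab's version is a plain sorted copy for clarity (which is exactly the \(O(n^2)\) space we want to avoid); the persistent search tree is what replaces these copies to get \(O(n)\) space. Locating the segment directly below the query point identifies its face. All names are illustrative; segments are assumed non-vertical and non-crossing.

```python
import bisect

# Sweep sketch: x-coordinates of vertices delimit slabs ("epochs"); each
# epoch's version holds the segments crossing that slab, ordered by height.
# Segments are (x1, y1, x2, y2) with x1 < x2.

def y_at(seg, x):
    x1, y1, x2, y2 = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)   # non-vertical assumed

def build(segments):
    xs = sorted({x for s in segments for x in (s[0], s[2])})
    versions = []                       # (slab left endpoint, sorted segs)
    for i, x in enumerate(xs[:-1]):
        mid = (x + xs[i + 1]) / 2       # sample point inside the slab
        live = [s for s in segments if s[0] <= x and s[2] >= xs[i + 1]]
        live.sort(key=lambda s: y_at(s, mid))   # one vertical order per slab
        versions.append((x, live))
    return xs, versions

def locate(xs, versions, px, py):
    """Return the segment directly below (px, py), or None."""
    i = bisect.bisect_right(xs, px) - 1          # binary search for the slab
    if i < 0 or i >= len(versions):
        return None
    _, live = versions[i]
    lo = None
    for s in live:          # stand-in for the above-below tree search
        if y_at(s, px) <= py:
            lo = s
    return lo

segs = [(0, 0, 4, 0), (0, 2, 4, 2)]
xs, versions = build(segs)
```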

Persistent sorted sets:

find\((x,s,t)\) find (largest key below) \(x\) in set \(s\) at time \(t\)

insert\((i,s,t)\) insert \(i\) in \(s\) at time \(t\)

delete\((i,s,t)\).

# Method: persistent red-black trees

Red black trees

aggressive rebalancers

amortized cost \(O(1)\) to change a field.

store red/black bit in each node

use red/black bit to rebalance.

depth \(O(\log n)\)

Add persistence

updates only in “present”

search: standard binary tree search; \(O(1)\) slowdown for persistence

update: causes changes in red/black fields on path to item, \(O(1)\) rotations.

result: \(O(\log n)\) space *per insert/delete*

geometry does \(O(n)\) changes, so \(O(n\log n)\) space.

\(O(\log n)\) query time.

Improvement:

red-black bits used only for updates

only need current version of red-black bits

don’t persist old bit versions: just overwrite

only persistent updates needed are for \(O(1)\) rotations

so \(O(1)\) space per update

so \(O(n)\) space overall.

Result: \(O(n)\) space, \(O(\log n)\) query time for planar point location.

Extensions:

method extends to arbitrary pointer-based structures.

\(O(1)\) cost per update for any pointer-based structure with any constant indegree and constant number of fields.

use one extra pointer per field in node

and one inverse pointer per unit of indegree

full persistence with same bounds.