AVL tree

An AVL tree is a self-balancing binary search tree where the height difference between the left and right subtree of any node is at most 1.
AVL trees guarantee O(log n) worst-case time for search, insert, and delete.
They are named after their inventors, Adelson-Velsky and Landis (1962).

Balance Factor

The balance factor of a node is defined as the height of its left subtree minus the height of its right subtree.

BF > 0: left-heavy (left subtree taller)
BF = 0: balanced (equal heights)
BF < 0: right-heavy (right subtree taller)

AVL property: |BF| ≤ 1 for every node.
If this property is violated after an insertion or deletion, rotations are performed to restore balance.
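
The balance factor is cheap to read if each node caches its own height. A minimal sketch of this bookkeeping (the Node class and helper names are illustrative, not from any particular library):

  def height(node):
      # Convention: an empty subtree has height -1, a single leaf has height 0.
      return node.height if node is not None else -1

  class Node:
      # Each node caches its height so the balance factor is O(1) to compute.
      def __init__(self, key, left=None, right=None):
          self.key = key
          self.left = left
          self.right = right
          self.height = 1 + max(height(left), height(right))

  def balance_factor(node):
      # BF = height(left subtree) - height(right subtree); AVL requires |BF| <= 1.
      return height(node.left) - height(node.right)

  # Example: a node with a leaf as left child and an empty right subtree is left-heavy.
  n = Node(10, left=Node(5))
  assert balance_factor(n) == 1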

Insertion

Standard BST insertion
Update heights of ancestors bottom-up
Find the first unbalanced node (if any) on the path from the inserted node to root
Rebalance using rotation(s)/cut-and-link algorithm
After fixing the lowest unbalanced node, the tree is guaranteed to be balanced
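
A minimal sketch of these steps in the usual recursive form, assuming the Node/height/balance_factor helpers from the earlier sketch and a rebalance(node) routine as described in the Rebalancing section below (all names are illustrative):

  def insert(node, key):
      # 1. Standard BST insertion.
      if node is None:
          return Node(key)
      if key < node.key:
          node.left = insert(node.left, key)
      else:
          node.right = insert(node.right, key)

      # 2. Update this ancestor's height on the way back up.
      node.height = 1 + max(height(node.left), height(node.right))

      # 3./4. Rebalance the lowest unbalanced ancestor. After an insertion the
      # rebalanced subtree regains its previous height, so no ancestor above it
      # can still be unbalanced.
      if abs(balance_factor(node)) > 1:
          node = rebalance(node)
      return node

Because the recursion returns the (possibly new) root of each subtree, re-linking after a rotation is just the assignment in the parent call.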

Deletion

Perform standard BST deletion
Update heights of ancestors bottom-up
Find the first unbalanced node (if any) on the path from the deleted node to root
Rebalance using rotation(s)/cut-and-link algorithm
Rebalancing may also be needed at ancestors of that node; continue checking up to the root
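
Deletion has the same recursive shape; the difference is that the balance check runs at every ancestor on the way back to the root, because a rebalanced subtree may come out shorter and push the imbalance upward (a sketch under the same assumptions as the insertion sketch above):

  def delete(node, key):
      # 1. Standard BST deletion.
      if node is None:
          return None
      if key < node.key:
          node.left = delete(node.left, key)
      elif key > node.key:
          node.right = delete(node.right, key)
      else:
          if node.left is None or node.right is None:
              return node.left if node.left is not None else node.right
          # Two children: copy the in-order successor's key, then delete it.
          succ = node.right
          while succ.left is not None:
              succ = succ.left
          node.key = succ.key
          node.right = delete(node.right, succ.key)

      # 2.-4. Update the height and rebalance at every node on the unwind path.
      node.height = 1 + max(height(node.left), height(node.right))
      if abs(balance_factor(node)) > 1:
          node = rebalance(node)
      return node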

Rebalancing Intuition

An imbalance always involves three nodes (x, y, z) and four subtrees (T0, T1, T2, T3). Example (the “left left” case):

        z  (BF = +2)
       / \
      y   T3
     / \
    x   T2
   / \
  T0  T1

The rebalancing algorithm rearranges these 7 parts into the canonical balanced form:

       b
     /   \
    a     c
   / \   / \
  T0 T1 T2 T3

where (a, b, c) is the in-order sequence of (x, y, z), and the subtrees are placed in sorted order.

Rebalancing

Given three nodes: z (the unbalanced node, BF = ±2), y (the taller child of z), and x (the taller child of y):

  1. Let (a, b, c) be the in-order sequence of the nodes (x, y, z)

  2. Let (T0, T1, T2, T3) be the in-order sequence of their four subtrees

    Rotation algorithm:

  3. Replace the subtree rooted at z with a subtree rooted at b

  4. Make a the left child of b, with subtrees T0 (left) and T1 (right)

  5. Make c the right child of b, with subtrees T2 (left) and T3 (right)

    The same algorithm works for all four possible imbalances - only the mapping of (x, y, z) to (a, b, c) differs:

 Unbalanced node z with BF = +2 (left-heavy):
   - Child y has BF ≥ 0 (straight line)  → LL case: Single rotation
   - Child y has BF < 0 (zigzag)         → LR case: Double rotation
 Unbalanced node z with BF = -2 (right-heavy):
   - Child y has BF ≤ 0 (straight line)  → RR case: Single rotation
   - Child y has BF > 0 (zigzag)         → RL case: Double rotation
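
A sketch of this case analysis in code (it reuses the height/balance_factor helpers assumed earlier and is the rebalance() routine invoked by the insertion and deletion sketches; when BF(y) = 0, which can only happen after a deletion, it picks x on the same side as y so a single rotation suffices):

  def rebalance(z):
      # z is unbalanced (BF = ±2); y is the taller child of z, x the taller child of y.
      y = z.left if balance_factor(z) > 0 else z.right
      if balance_factor(y) > 0:
          x = y.left
      elif balance_factor(y) < 0:
          x = y.right
      else:
          x = y.left if y is z.left else y.right   # tie: same side as y

      # Steps 1-2: (a, b, c) = in-order sequence of (x, y, z);
      # (T0, T1, T2, T3) = in-order sequence of their four subtrees.
      if y is z.left and x is y.left:        # LL
          a, b, c = x, y, z
          t0, t1, t2, t3 = x.left, x.right, y.right, z.right
      elif y is z.left and x is y.right:     # LR
          a, b, c = y, x, z
          t0, t1, t2, t3 = y.left, x.left, x.right, z.right
      elif y is z.right and x is y.left:     # RL
          a, b, c = z, x, y
          t0, t1, t2, t3 = z.left, x.left, x.right, y.right
      else:                                  # RR
          a, b, c = z, y, x
          t0, t1, t2, t3 = z.left, y.left, x.left, x.right

      # Steps 3-5: b becomes the subtree root, a and c its children,
      # with T0..T3 attached in sorted order.
      a.left, a.right = t0, t1
      c.left, c.right = t2, t3
      b.left, b.right = a, c

      # Recompute cached heights bottom-up: a and c first, then b.
      a.height = 1 + max(height(a.left), height(a.right))
      c.height = 1 + max(height(c.left), height(c.right))
      b.height = 1 + max(height(b.left), height(b.right))
      return b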

Cut/Link implementation:

Alternative implementation that avoids case distinction:

  1. Number the 7 parts (x, y, z and T0, T1, T2, T3) by in-order traversal positions 1-7

  2. Create an array with indices 1-7 holding the parts in that order

  3. “Cut” the three nodes and four subtrees apart, placing each part at its position in the in-order array

  4. “Link” by making the element at position 4 (node b) the new root

  5. Make the elements at positions 2 and 6 (nodes a and c) the left and right children of the new root

  6. Attach the subtrees at positions 1 and 3 as children of the node at position 2, and the subtrees at positions 5 and 7 as children of the node at position 6

Advantage: a cleaner algorithm with no case distinction. Disadvantage: it may require slightly more code.
The time complexity is the same.
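
A sketch of the cut/link variant (same assumptions as before; Python indices 0-6 stand in for positions 1-7, and rebalance_cut_link is an illustrative name for an alternative to the rebalance() above):

  def rebalance_cut_link(z):
      # Same choice of y and x as in the rotation-based version.
      y = z.left if balance_factor(z) > 0 else z.right
      if balance_factor(y) > 0:
          x = y.left
      elif balance_factor(y) < 0:
          x = y.right
      else:
          x = y.left if y is z.left else y.right

      # "Cut": collect the 7 parts (3 nodes + 4 subtrees) in in-order.
      # The nodes x, y, z always land at the odd positions 1, 3, 5.
      def in_order_parts(n):
          if n is x or n is y or n is z:
              return in_order_parts(n.left) + [n] + in_order_parts(n.right)
          return [n]                      # a whole subtree (possibly None) is one part
      parts = in_order_parts(z)           # exactly 7 entries

      # "Link": the middle node becomes the root, the other two its children,
      # and the four subtrees are re-attached underneath in order.
      a, b, c = parts[1], parts[3], parts[5]
      a.left, a.right = parts[0], parts[2]
      c.left, c.right = parts[4], parts[6]
      b.left, b.right = a, c

      for node in (a, c, b):              # heights bottom-up
          node.height = 1 + max(height(node.left), height(node.right))
      return b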

Height updates after rebalancing:
Recompute the heights of the relinked nodes bottom-up (post-order): the two new children first, then the new subtree root

Logarithmic height bound

An AVL tree of height h has at least exponentially many nodes (exponential in h), which implies logarithmic height for a tree with n nodes.

The recurrence

Let N(h) = the minimum number of nodes in an AVL tree of height h.

A minimal tree of height h ≥ 2 has:

  • Root node
  • Minimal subtree of height h-1 (the taller child)
  • Minimal subtree of height h-2 (the shorter child, forced by the balance constraint)

with N(h) = 1 + N(h-1) + N(h-2). With the convention that a single node has height 0, the base cases are N(0) = 1 and N(1) = 2.

This is the Fibonacci recurrence shifted by 1: M(h) = N(h) + 1 satisfies M(h) = M(h-1) + M(h-2), compared to Fibonacci F(k) = F(k-1) + F(k-2). Specifically, N(h) = F(h+3) - 1.

Exponential lower bound

From the recurrence, N(h) > 2·N(h-2) for h ≥ 2 (since N(h-1) ≥ N(h-2)).

Applying this repeatedly:

N(h) > 2·N(h-2) > 4·N(h-4) > … > 2^k · N(h-2k)

Choosing k = ⌊h/2⌋ (so that N(h-2k) ≥ 1) gives N(h) ≥ 2^(h/2).
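
A quick numerical check of the recurrence, the Fibonacci connection, and the lower bound (min_nodes and fib are illustrative names; it uses the convention N(0) = 1, N(1) = 2 from above):

  def min_nodes(h):
      # N(h) = 1 + N(h-1) + N(h-2), with N(0) = 1 and N(1) = 2
      if h == 0:
          return 1
      if h == 1:
          return 2
      return 1 + min_nodes(h - 1) + min_nodes(h - 2)

  def fib(k):
      # Iterative Fibonacci with fib(1) = fib(2) = 1
      a, b = 0, 1
      for _ in range(k):
          a, b = b, a + b
      return a

  for h in range(20):
      n = min_nodes(h)
      assert n == fib(h + 3) - 1     # Fibonacci connection
      assert n >= 2 ** (h / 2)       # exponential lower bound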

Height bound

For a tree with n nodes and height h:

n ≥ N(h) ≥ 2^(h/2)

Taking logarithms:

log2(n) ≥ h/2

Therefore:

h ≤ 2·log2(n) = O(log n)

The constant factor can be tightened, and the tighter constant is achieved by the minimal (Fibonacci-shaped) trees: AVL height is at most about 1.44·log2(n), compared to about log2(n) for a perfectly balanced tree. In practice, a tree with a billion nodes has height at most about 43, versus about 30 when perfectly balanced.
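
For concreteness, the bounds at n = 10^9, comparing the 2·log2(n) bound derived above with the sharper ≈1.44·log2(n) constant and the perfectly balanced optimum:

  import math

  n = 10 ** 9
  lg = math.log2(n)
  print(round(lg))           # perfectly balanced height:  ~30
  print(round(1.44 * lg))    # sharper AVL bound:          ~43
  print(round(2 * lg))       # bound from the derivation:  ~60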