Post

Heap 堆

Data Structure: Heap 数据结构——堆

Heap 堆

Heap

Heap is a data structure useful for data with priorities, and has a great performance for getting the data with the highest(lowest) priority and keeping them in order.

1 Definition

  • A max heap is a complete binary tree. The priority of each node is greater than or equal to those(if any) in its children.
  • A min heap is a complete binary tree. The priority of each node is less than or equal to those(if any) in its children.

We will only cover min heaps in the following discussions, as the only difference between min heaps and max heaps is the comparator.

2 API

1
2
3
4
5
6
7
Heap<T>{
    T top(); # returns the element with max(min) priority and removes it
    T peek(); # returns the element with max(min) priority, but doesn't affect the heap
    void insert(element,priority);
    void remove(element);
    void update(element,priority);
}

3 Implementation

3.1 General

The root node has the minimum priority of all nodes. Each node has the minimum priority in the subtree of which the root node is itself. A heap can be implemented using an array. Assuming each node has an index $i$, the index of its parent node(if present) is $(i-1)\div 2$, and the indices of its child nodes(if present) are $i\times 2+1$ and $i\times 2+2$.

3.2 siftUp()

When a new element is added to the end of a heap, the siftUp() method is called to ensure the new structure conforms to the definition of heap.
The process starts from the parent node of the newly added node. The parent node checks whether its children nodes have greater priorities than itself. If not, swap itself with the child node with the minimum priority and continue with its parent recursively to the root node; if so, exit the process.
Here is the pseudocode of the process. It is improved by minimizing the count of swaps:

1
2
3
4
5
6
7
8
9
10
function siftUp(heapArray,index=size-1):
    current := heapArray[index]
    while (index > 0):
        parentIndex := (index-1)/2 # calculating parent index, as mentioned in 3.1
        if (heapArray[parentIndex].priority > current.priority):
            heapArray[index] = heapArray[parentIndex]
            index = parentIndex
        else:
            break
    heapArray[index] = current

3.3 siftDown()

When an element is removed from the top of the heap, the node at the end of the heap is moved to the top. Then the method siftDown() is called to ensure the rest elements still make up a heap.
The process starts from the root node. For each node, find its child node with the minimum priority. If the child node has less priority than the parent, swap the two nodes and continue the process down with this swapped parent node; if so, exit the process.
Here is the pseudocode of this method:

1
2
3
4
5
6
7
8
9
10
function siftDown(heapArray,index=0):
    current := heapArray[index]
    while (index < (size-2)/2): # index of the first leaf node
        childIndex := getMinPriorityChildIndex(index)
        if (heapArray[childIndex].priority < current.priority):
            heapArray[index] = heapArray[childIndex]
            index = childIndex
        else:
            break
    heapArray[index] = current

3.4 insert()

We already have the method siftUp() to restore the qualities of a heap. Now when we need to add an element to a heap, we can simply append it to the heap and call siftUp().

1
2
3
function insert(element,heapArray):
    heapArray.append(element)
    siftUp(heapArray)

3.5 top()

This method is easy to implement as well since we have siftDown().

1
2
3
4
5
6
7
function top(heapArray):
    res := heapArray[0]
    heapArray[0] = heapArray[size-1]
    heapArray.removeEnd()
    siftDown(heapArray)

    return res

3.6 update()

Changing of priority can lead to definition violation of either upper or lower part of the heap, so different kind of remedy should be taken according to the new priority compared to the old.

1
2
3
4
5
6
7
8
9
function update(element,priority,heapArray):
    oldPriority := element.priority
    element.priority = priority

    if(priority > oldPriority):
        siftDown(heapArray,element.index)
    else:
        siftUp(heapArray,element.index)

3.7 heapify()

siftDown() shows that if the subtrees of a node are heaps, the method can be called on that node to make the whole tree, that node being the root, a heap. To heapify an array, we can start from the parent of leaf nodes, make them heaps, and make their parent nodes heaps recursively using siftDown().

1
2
3
function heapify(array):
    for( i := (array.size-2)/2; i >= 0; i--):
        siftDown(array,i)

4 Complexity

OperationTime ComplexitySpace Complexity
insert$O(log~n)$$O(1)$
top$O(log~n)$$O(1)$
peek$O(1)$$O(1)$
heapify$O(n)$$O(1)$

4.1 siftUp() and siftDown()

In siftUp(), each time the node compares to its parent node at $O(1)$, and continues the process with its parent. For a heap with size of n, it has $\lfloor log_2 n\rfloor$ layers of nodes, so the process will at most repeat $\lfloor log_2 n\rfloor$ times. Thus, siftUp() has a complexity of $O(log~n)$. Therefore insert() has a complexity of $O(log~n)$.
Similarly, we can see that siftDown() and top() has a complexity of $O(log~n)$.

4.2 heapify()

Roughly estimating, we need to call siftDown() $n\over{2}$ times in a single call of heapify(), and siftDown() has a complexity of $O(log~n)$, so the complexity of heapify() will not exceed $O(nlog~n)$.
Let’s take a deeper thinking. There are $n\over 2$ leaf nodes, each performs at most $1$ swapping. For nodes at $2^{nd}$ layer, they can perform at most $2$ swappings and there are $n\over 4$ of them.
Thus, we have the following calculation1 of the total operations:

\[\overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{n\over{2^{h+1}}}\rceil\centerdot O(h)= O(n\overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil)\]


\[\because \overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil \leq \overset{\infty}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil = 2\]


\[\therefore O(n\overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil) \leq O(2n) = O(n)\]


We have proven that the complexity of heapify() is $O(n)$.

5 Extensions

5.1 Priority Queue

A priority queue is a collection of zero or more elements. Each element has a priority or value.
In a min(max) priority queue, the find operation finds the element with the minimum(maximum) priority, while delete operation delete this element.
Priority Queue is implemented using a heap.

Queue is a FIFO(first-in-first-out) collection in the narrow sense, so Priority Queue is NOT a specific implementation of Queue. Actually, they differ quite much in implementations.

5.2 d-Heap

A d-heap(min d-heap) conforms to the following constrains:

  • Each node has at most d child nodes.
  • D-heaps are left-aligned complete trees, where the height of the $i^{th}$ subtree is at most the count of its brother nodes on its left.
  • Each node has the minimum priority in the subtree with itself being the root.

If $d$ is 1, the heap becomes a sorted array. Heapify() is then reduced to insertion sorting algorithm, and has a complexity of $O(n^2)$, other operations $O(n)$.



堆可以快速处理含有优先级的数据, 并可快速获得一个数据集的 最高(最低)优先级 数据.

1 定义

  • 一个最大堆是一个完全二叉树. 每个节点的优先级比其所有子节点(如果存在)都更大 或相等.
  • 一个最小堆是一个完全二叉树. 每个节点的优先级比其所有子节点(如果存在)都更小 或相等.

我们将只讨论 最小堆, 因为最小堆和最大堆仅有比较符号的不同.

2 API

1
2
3
4
5
6
7
Heap<T>{
    T top(); # 返回优先级最小的元素, 并删除它
    T peek(); # 返回优先级最小的元素, 不会影响堆结构
    void insert(element,priority);
    void remove(element);
    void update(element,priority);
}

3 实现

3.1 基本结构

根节点具有所有节点中最小的优先级. 每个节点在其自身为根的子树中具有最小的优先级. 堆可以使用数组实现. 假设每个节点的索引为 $i$, 其父节点(如果存在)的索引为 $(i-1)\div 2$, 子节点(如果存在)的索引为$i\times 2+1$$i\times 2+2$.

3.2 上滤操作 siftUp()

当向堆的末尾添加新元素时, 会调用siftUp()方法以确保添加后的结构仍然符合堆的定义.
该过程从新添加节点的父节点开始. 父节点检查其子节点的优先级是否大于自身. 如果不是, 则与具有最小优先级的子节点交换, 并递归地继续向上处理父节点直到根节点;如果是, 则退出.
以下是通过减少交换次数优化的伪代码:

1
2
3
4
5
6
7
8
9
10
function siftUp(heapArray,index=size-1):
    current := heapArray[index]
    while (index > 0):
        parentIndex := (index-1)/2 # calculating parent index, as mentioned in 3.1
        if (heapArray[parentIndex].priority > current.priority):
            heapArray[index] = heapArray[parentIndex]
            index = parentIndex
        else:
            break
    heapArray[index] = current

3.3 下滤操作 siftDown()

当移除堆顶元素时, 将堆末尾节点移动至堆顶, 随后调用siftDown()方法确保剩余元素仍构成堆.
该过程从根节点开始. 对每个节点, 寻找其子节点中优先级最小的节点. 若子节点优先级小于父节点, 则交换两者并继续向下处理;否则终止过程.
伪代码如下:

1
2
3
4
5
6
7
8
9
10
function siftDown(heapArray,index=0):
    current := heapArray[index]
    while (index < (size-2)/2): # index of the first leaf node
        childIndex := getMinPriorityChildIndex(index)
        if (heapArray[childIndex].priority < current.priority):
            heapArray[index] = heapArray[childIndex]
            index = childIndex
        else:
            break
    heapArray[index] = current

3.4 插入操作 insert()

通过append添加元素后调用siftUp()即可维护堆结构:

1
2
3
function insert(element,heapArray):
    heapArray.append(element)
    siftUp(heapArray)

3.5 取出堆顶 top()

利用siftDown()实现:

1
2
3
4
5
6
7
function top(heapArray): 
    res := heapArray[0]
    heapArray[0] = heapArray[size-1]
    heapArray.removeEnd()
    siftDown(heapArray)

    return res

3.6 更新操作 update()

优先级变更会破坏堆的结构, 但只会向上向下单一方向破坏堆结构. 根据新旧优先级的关系, 修复方向会不同.

1
2
3
4
5
6
7
8
9
function update(element,priority,heapArray):
    oldPriority := element.priority
    element.priority = priority

    if(priority > oldPriority):
        siftDown(heapArray,element.index)
    else:
        siftUp(heapArray,element.index)

3.7 堆化操作 heapify()

siftDown() 操作将一个所有子树都是堆的树变成一个堆. 而单个节点显然符合堆的定义. 因此我们可以从叶节点开始向上递归调用 siftDown()来将所有顶点都变成一个堆, 最终传递至根节点.

1
2
3
function heapify(array):
    for( i := (array.size-2)/2; i >= 0; i--):
        siftDown(array,i)

4 复杂度分析

操作时间复杂度空间复杂度
insert$O(log~n)$$O(1)$
top$O(log~n)$$O(1)$
peek$O(1)$$O(1)$
heapify$O(n)$$O(1)$

4.1 上滤 siftUp() 和下滤 siftDown() 的复杂度

siftUp()操作中, 节点每次与父节点进行$O(1)$复杂度比较后继续向上处理. 对于规模为n的堆, 其节点层数为$\lfloor log_2 n\rfloor$, 因此该过程最多重复$\lfloor log_2 n\rfloor$次. 故siftUp()的时间复杂度为$O(log~n)$, 进而insert()操作的时间复杂度为$O(log~n)$.
同理可证, siftDown()top()操作的时间复杂度均为$O(log~n)$.

4.2 堆化 heapify()

粗略估算, 单次heapify()需要调用siftDown()约$n\over{2}$次, 而siftDown()的时间复杂度为$O(log~n)$, 因此heapify()的复杂度上限为$O(nlog~n)$.
让我们深入思考一下. 堆中包含 $n\over 2$个叶节点, 每个最多执行$1$次交换;第二层的$n\over 4$个节点最多执行$2$次交换, 以此向上类推
由此可通过如下计算1得出总操作量:

\[\overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{n\over{2^{h+1}}}\rceil\centerdot O(h)= O(n\overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil)\]


\[\because \overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil \leq \overset{\infty}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil = 2\]


\[\therefore O(n\overset{\lfloor log~n\rfloor}{\underset{h=0}{\sum}}\lceil{h\over{2^{h}}}\rceil) \leq O(2n) = O(n)\]


由此我们可以证明, 堆化heapify()的复杂度为 $O(n)$.

5 扩展

5.1 优先队列

优先队列是由零个或多个元素组成的集合, 每个元素具有优先级属性.
在最小(最大)优先队列中, find操作定位最小(最大)优先级元素, delete操作删除该元素.
优先队列的底层实现基于堆结构.

狭义队列是一个先进先出的集合,所以优先队列并不是队列的一个特殊实现. 事实上它们的实现几乎没有任何关系.

5.2 d叉堆

d叉堆​(最小d叉堆)遵循以下约束条件:

  • 每个节点最多包含d个子节点
  • 保持左对齐的完全树结构, 第$i^{th}$子树的高度不超过其左侧兄弟节点数量
  • 节点在其自身为根的子树中保持最小优先级

当$d=1$时, 堆结构退化为有序数组, 此时heapify()操作退化为插入排序算法, 时间复杂度升至 $O(n^2)$, 其他操作复杂度增至$O(n)$.

References:

  1. 马塞洛·拉·罗卡(Marcello La Rocca)著; 肖鉴明译. 高级算法和数据结构. 北京: 人民邮电出版社, 2023:33. ↩︎ ↩︎2

This post is licensed under CC BY 4.0 by the author.