---
comments: true
---
# 8.3 &nbsp; Top-k problem
!!! question

    Given an unordered array `nums` of length $n$, return the largest $k$ elements in the array.
For this problem, we will first introduce two straightforward solutions, then explain a more efficient heap-based method.
## 8.3.1 &nbsp; Method 1: Iterative selection
As shown in Figure 8-6, we can perform $k$ rounds of traversal, extracting the $1^{st}$, $2^{nd}$, $\dots$, $k^{th}$ largest element in turn, one per round. Each round takes $O(n)$ time, so the overall time complexity is $O(nk)$.

This method is only suitable when $k \ll n$: when $k$ is close to $n$, the time complexity approaches $O(n^2)$, which is very time-consuming.
![Iteratively finding the largest k elements](top_k.assets/top_k_traversal.png){ class="animation-figure" }
<p align="center"> Figure 8-6 &nbsp; Iteratively finding the largest k elements </p>
!!! tip

    When $k = n$, we obtain a fully ordered sequence, which is equivalent to the "selection sort" algorithm.
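The iterative selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the original text: the function name `top_k_iterative` is hypothetical, and it assumes $0 \le k \le n$. Each round swaps the largest remaining element to the front of the unsorted part, which is exactly the first $k$ rounds of a selection sort in descending order.

```python
def top_k_iterative(nums: list[int], k: int) -> list[int]:
    """Hypothetical sketch: k rounds of selection, O(nk) time. Assumes 0 <= k <= len(nums)."""
    arr = list(nums)  # work on a copy so the caller's list is not modified
    n = len(arr)
    for i in range(k):
        # Round i: find the index of the largest element in the unsorted part arr[i:]
        max_idx = i
        for j in range(i + 1, n):
            if arr[j] > arr[max_idx]:
                max_idx = j
        # Move the (i+1)-th largest element to position i
        arr[i], arr[max_idx] = arr[max_idx], arr[i]
    # The first k positions now hold the largest k elements in descending order
    return arr[:k]
```

For example, `top_k_iterative([1, 7, 6, 3, 2], 3)` returns `[7, 6, 3]`.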
## 8.3.2 &nbsp; Method 2: Sorting
As shown in Figure 8-7, we can first sort the array `nums` in ascending order and then return the last $k$ elements, with a time complexity of $O(n \log n)$.

Clearly, this method "overachieves" the task: we only need the largest $k$ elements, yet it also sorts all the remaining elements.
![Sorting to find the largest k elements](top_k.assets/top_k_sorting.png){ class="animation-figure" }
<p align="center"> Figure 8-7 &nbsp; Sorting to find the largest k elements </p>
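A minimal Python sketch of the sorting approach, assuming $0 \le k \le n$; the function name `top_k_sort` is hypothetical. Python's built-in `sorted` returns a new list in ascending order, so the largest $k$ elements are its last $k$ entries.

```python
def top_k_sort(nums: list[int], k: int) -> list[int]:
    """Hypothetical sketch: sort ascending, then take the last k elements; O(n log n) time."""
    if k <= 0:
        # Guard: the slice [-0:] would return the whole list instead of an empty one
        return []
    # sorted() leaves nums untouched and returns a new ascending list
    return sorted(nums)[-k:]
```

For example, `top_k_sort([1, 7, 6, 3, 2], 3)` returns `[3, 6, 7]`. The `k <= 0` guard is needed because `-0` equals `0` in Python, so slicing with `[-0:]` would return every element.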
## 8.3.3 &nbsp; Method 3: Heap
A heap lets us solve the Top-k problem more efficiently. The process is as follows.

1. Initialize a min heap, whose top element is the smallest.
2. Insert the first $k$ elements of the array into the heap.
3. Starting from the $(k+1)^{th}$ element, if the current element is greater than the heap's top element, pop the top element and push the current element into the heap.
4. After the traversal is complete, the heap contains the largest $k$ elements.
=== "<1>"
    ![Find the largest k elements based on heap](top_k.assets/top_k_heap_step1.png){ class="animation-figure" }

=== "<2>"
    ![top_k_heap_step2](top_k.assets/top_k_heap_step2.png){ class="animation-figure" }

=== "<3>"
    ![top_k_heap_step3](top_k.assets/top_k_heap_step3.png){ class="animation-figure" }

=== "<4>"
    ![top_k_heap_step4](top_k.assets/top_k_heap_step4.png){ class="animation-figure" }

=== "<5>"
    ![top_k_heap_step5](top_k.assets/top_k_heap_step5.png){ class="animation-figure" }

=== "<6>"
    ![top_k_heap_step6](top_k.assets/top_k_heap_step6.png){ class="animation-figure" }

=== "<7>"
    ![top_k_heap_step7](top_k.assets/top_k_heap_step7.png){ class="animation-figure" }

=== "<8>"
    ![top_k_heap_step8](top_k.assets/top_k_heap_step8.png){ class="animation-figure" }

=== "<9>"
    ![top_k_heap_step9](top_k.assets/top_k_heap_step9.png){ class="animation-figure" }
<p align="center"> Figure 8-8 &nbsp; Find the largest k elements based on heap </p>
Example code is as follows:
=== "Python"

    ```python title="top_k.py"
    import heapq

    def top_k_heap(nums: list[int], k: int) -> list[int]:
        """Use a heap to find the largest k elements in an array"""
        # Initialize a min-heap
        heap = []
        # Push the first k elements of the array into the heap
        for i in range(k):
            heapq.heappush(heap, nums[i])
        # Starting from the (k+1)-th element, keep the heap size at k
        for i in range(k, len(nums)):
            # If the current element is larger than the heap top, pop the top and push the current element
            if nums[i] > heap[0]:
                heapq.heappop(heap)
                heapq.heappush(heap, nums[i])
        return heap
    ```

=== "C++"

    ```cpp title="top_k.cpp"
    #include <functional>
    #include <queue>
    #include <vector>
    using namespace std;

    /* Use a heap to find the largest k elements in an array */
    priority_queue<int, vector<int>, greater<int>> topKHeap(vector<int> &nums, int k) {
        // Initialize a min-heap
        priority_queue<int, vector<int>, greater<int>> heap;
        // Push the first k elements of the array into the heap
        for (int i = 0; i < k; i++) {
            heap.push(nums[i]);
        }
        // Starting from the (k+1)-th element, keep the heap size at k
        for (int i = k; i < nums.size(); i++) {
            // If the current element is larger than the heap top, pop the top and push the current element
            if (nums[i] > heap.top()) {
                heap.pop();
                heap.push(nums[i]);
            }
        }
        return heap;
    }
    ```

=== "Java"

    ```java title="top_k.java"
    import java.util.PriorityQueue;
    import java.util.Queue;

    /* Use a heap to find the largest k elements in an array */
    Queue<Integer> topKHeap(int[] nums, int k) {
        // Initialize a min-heap
        Queue<Integer> heap = new PriorityQueue<Integer>();
        // Push the first k elements of the array into the heap
        for (int i = 0; i < k; i++) {
            heap.offer(nums[i]);
        }
        // Starting from the (k+1)-th element, keep the heap size at k
        for (int i = k; i < nums.length; i++) {
            // If the current element is larger than the heap top, pop the top and push the current element
            if (nums[i] > heap.peek()) {
                heap.poll();
                heap.offer(nums[i]);
            }
        }
        return heap;
    }
    ```

=== "C#"

    ```csharp title="top_k.cs"
    [class]{top_k}-[func]{TopKHeap}
    ```

=== "Go"

    ```go title="top_k.go"
    [class]{}-[func]{topKHeap}
    ```

=== "Swift"

    ```swift title="top_k.swift"
    [class]{}-[func]{topKHeap}
    ```

=== "JS"

    ```javascript title="top_k.js"
    [class]{}-[func]{pushMinHeap}

    [class]{}-[func]{popMinHeap}

    [class]{}-[func]{peekMinHeap}

    [class]{}-[func]{getMinHeap}

    [class]{}-[func]{topKHeap}
    ```

=== "TS"

    ```typescript title="top_k.ts"
    [class]{}-[func]{pushMinHeap}

    [class]{}-[func]{popMinHeap}

    [class]{}-[func]{peekMinHeap}

    [class]{}-[func]{getMinHeap}

    [class]{}-[func]{topKHeap}
    ```

=== "Dart"

    ```dart title="top_k.dart"
    [class]{}-[func]{topKHeap}
    ```

=== "Rust"

    ```rust title="top_k.rs"
    [class]{}-[func]{top_k_heap}
    ```

=== "C"

    ```c title="top_k.c"
    [class]{}-[func]{pushMinHeap}

    [class]{}-[func]{popMinHeap}

    [class]{}-[func]{peekMinHeap}

    [class]{}-[func]{getMinHeap}

    [class]{}-[func]{topKHeap}
    ```

=== "Kotlin"

    ```kotlin title="top_k.kt"
    [class]{}-[func]{topKHeap}
    ```

=== "Ruby"

    ```ruby title="top_k.rb"
    [class]{}-[func]{top_k_heap}
    ```

=== "Zig"

    ```zig title="top_k.zig"
    [class]{}-[func]{topKHeap}
    ```
The heap undergoes at most $n$ rounds of insertion and removal, and its size never exceeds $k$, so each operation costs $O(\log k)$ and the overall time complexity is $O(n \log k)$. This method is very efficient: when $k$ is small, the time complexity approaches $O(n)$; when $k$ is large, it does not exceed $O(n \log n)$.
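Splitting the traversal into its two phases makes this bound explicit. The accounting below is a sketch that assumes each push or pop on a heap holding at most $k$ elements costs $O(\log k)$:

$$
\underbrace{k \cdot O(\log k)}_{\text{push the first } k \text{ elements}} + \underbrace{(n - k) \cdot O(\log k)}_{\text{at most one pop and one push per remaining element}} = O(n \log k)
$$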
In addition, this method suits dynamic data streams: as new data keeps arriving, we keep updating the heap, so it always holds the largest $k$ elements seen so far.
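The streaming use case can be sketched in Python as a small class that processes one element at a time. This is a minimal illustration, not code from the original text: the class `TopKStream` and its methods `add` and `top_k` are hypothetical names. It relies on the standard-library function `heapq.heapreplace`, which pops the smallest item and pushes the new one in a single step.

```python
import heapq


class TopKStream:
    """Hypothetical sketch: track the largest k elements of a data stream."""

    def __init__(self, k: int):
        self.k = k
        self.heap: list[int] = []  # min-heap holding at most k elements

    def add(self, x: int) -> None:
        """Process one new element in O(log k) time."""
        if self.k <= 0:
            return
        if len(self.heap) < self.k:
            # Heap not full yet: push unconditionally
            heapq.heappush(self.heap, x)
        elif x > self.heap[0]:
            # Larger than the current k-th largest element: replace the heap top
            heapq.heapreplace(self.heap, x)

    def top_k(self) -> list[int]:
        """Return the current largest k elements in descending order."""
        return sorted(self.heap, reverse=True)


stream = TopKStream(3)
for x in [1, 7, 6, 3, 2]:
    stream.add(x)
print(stream.top_k())  # [7, 6, 3]
```

Each call to `add` touches only the heap, never the elements already seen, so memory stays at $O(k)$ no matter how long the stream runs. Sorting is deferred to `top_k`, which costs $O(k \log k)$ only when the result is actually requested.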