diff --git a/en/docs/chapter_hello_algo/index.md b/en/docs/chapter_hello_algo/index.md index fc65157df..0dc41f39e 100644 --- a/en/docs/chapter_hello_algo/index.md +++ b/en/docs/chapter_hello_algo/index.md @@ -9,13 +9,13 @@ A few years ago, I shared the "Sword for Offer" problem solutions on LeetCode, r Directly solving problems seems to be the most popular method — it's simple, direct, and effective. However, problem-solving is like playing a game of Minesweeper: those with strong self-study abilities can defuse the mines one by one, but those with insufficient basics might end up metaphorically bruised from explosions, retreating step by step in frustration. Going through textbooks is also common, but for those aiming for job applications, the energy spent on thesis writing, resume submissions, and preparation for written tests and interviews leaves little for tackling thick books, turning it into a daunting challenge. -If you're facing similar troubles, then this book are lucky to have found you. This book is my answer to the question. While it may not be the best solution, it is at least a positive attempt. This book may not directly land you an offer, but it will guide you through the "knowledge map" in data structures and algorithms, help you understand the shapes, sizes, and locations of different "mines," and enable you to master various "demining methods." With these skills, I believe you can solve problems and read literature more comfortably, gradually building a knowledge system. +If you're facing similar troubles, then this book is lucky to have found you. This book is my answer to the question. While it may not be the best solution, it is at least a positive attempt. This book may not directly land you an offer, but it will guide you through the "knowledge map" in data structures and algorithms, help you understand the shapes, sizes, and locations of different "mines," and enable you to master various "demining methods." 
With these skills, I believe you can solve problems and read literature more comfortably, gradually building a knowledge system. I deeply agree with Professor Feynman's statement: "Knowledge isn't free. You have to pay attention." In this sense, this book is not entirely "free." To not disappoint the precious "attention" you pay for this book, I will do my best, dedicating my utmost "attention" to this book. Knowing my limitations, although the content of this book has been refined over time, there are surely many errors remaining. I sincerely request critiques and corrections from all teachers and students. -![Hello Algorithms](../assets/covers/chapter_hello_algo.jpg){ class="cover-image" } +![Hello Algo](../assets/covers/chapter_hello_algo.jpg){ class="cover-image" }

Hello, Algo!

diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_example.png b/en/docs/chapter_searching/binary_search.assets/binary_search_example.png new file mode 100644 index 000000000..9051addbd Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_example.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_ranges.png b/en/docs/chapter_searching/binary_search.assets/binary_search_ranges.png new file mode 100644 index 000000000..93bc6a324 Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_ranges.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step1.png b/en/docs/chapter_searching/binary_search.assets/binary_search_step1.png new file mode 100644 index 000000000..2b3d7af3c Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step1.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step2.png b/en/docs/chapter_searching/binary_search.assets/binary_search_step2.png new file mode 100644 index 000000000..606373006 Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step2.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step3.png b/en/docs/chapter_searching/binary_search.assets/binary_search_step3.png new file mode 100644 index 000000000..0184c3a23 Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step3.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step4.png b/en/docs/chapter_searching/binary_search.assets/binary_search_step4.png new file mode 100644 index 000000000..54e8e12c3 Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step4.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step5.png 
b/en/docs/chapter_searching/binary_search.assets/binary_search_step5.png new file mode 100644 index 000000000..9f57fb75b Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step5.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step6.png b/en/docs/chapter_searching/binary_search.assets/binary_search_step6.png new file mode 100644 index 000000000..8a9e4e436 Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step6.png differ diff --git a/en/docs/chapter_searching/binary_search.assets/binary_search_step7.png b/en/docs/chapter_searching/binary_search.assets/binary_search_step7.png new file mode 100644 index 000000000..293a87837 Binary files /dev/null and b/en/docs/chapter_searching/binary_search.assets/binary_search_step7.png differ diff --git a/en/docs/chapter_searching/binary_search.md b/en/docs/chapter_searching/binary_search.md new file mode 100644 index 000000000..e84e63d6a --- /dev/null +++ b/en/docs/chapter_searching/binary_search.md @@ -0,0 +1,83 @@ +# Binary search + +Binary search is an efficient search algorithm based on the divide-and-conquer strategy. It utilizes the orderliness of data, reducing the search range by half each round until the target element is found or the search interval is empty. + +!!! question + + Given an array `nums` of length $n$, with elements arranged in ascending order and non-repeating. Please find and return the index of element `target` in this array. If the array does not contain the element, return $-1$. An example is shown below. + +![Binary search example data](binary_search.assets/binary_search_example.png) + +As shown in the figure below, we first initialize pointers $i = 0$ and $j = n - 1$, pointing to the first and last elements of the array, representing the search interval $[0, n - 1]$. Please note that square brackets indicate a closed interval, which includes the boundary values themselves. 
+ +Next, perform the following two steps in a loop. + +1. Calculate the midpoint index $m = \lfloor {(i + j) / 2} \rfloor$, where $\lfloor \: \rfloor$ denotes the floor operation. +2. Compare `nums[m]` with `target`, which gives the following three cases. + 1. If `nums[m] < target`, it indicates that `target` is in the interval $[m + 1, j]$, thus set $i = m + 1$. + 2. If `nums[m] > target`, it indicates that `target` is in the interval $[i, m - 1]$, thus set $j = m - 1$. + 3. If `nums[m] == target`, it indicates that `target` is found, thus return index $m$. + +If the array does not contain the target element, the search interval will eventually shrink to empty. In this case, return $-1$. + +=== "<1>" + ![Binary search process](binary_search.assets/binary_search_step1.png) + +=== "<2>" + ![binary_search_step2](binary_search.assets/binary_search_step2.png) + +=== "<3>" + ![binary_search_step3](binary_search.assets/binary_search_step3.png) + +=== "<4>" + ![binary_search_step4](binary_search.assets/binary_search_step4.png) + +=== "<5>" + ![binary_search_step5](binary_search.assets/binary_search_step5.png) + +=== "<6>" + ![binary_search_step6](binary_search.assets/binary_search_step6.png) + +=== "<7>" + ![binary_search_step7](binary_search.assets/binary_search_step7.png) + +It's worth noting that since $i$ and $j$ are both of type `int`, **$i + j$ might exceed the range of the `int` type** in languages with fixed-width integers. To avoid this overflow, we usually use the formula $m = \lfloor {i + (j - i) / 2} \rfloor$ to calculate the midpoint. + +The code is as follows: + +```src +[file]{binary_search}-[class]{}-[func]{binary_search} +``` + +**Time complexity is $O(\log n)$** : In the binary search loop, the interval is halved each round, hence the number of iterations is about $\log_2 n$. + +**Space complexity is $O(1)$** : Pointers $i$ and $j$ use a constant amount of space. 
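The two-step loop described above can be sketched in Python as follows. This is a minimal illustration only, not the repository source that the `src` placeholder pulls in; the function name `binary_search_sketch` and the sample array are assumptions made for this example.

```python
def binary_search_sketch(nums: list[int], target: int) -> int:
    """Binary search over the closed interval [i, j].

    Assumes `nums` is sorted in ascending order with no duplicates.
    Returns the index of `target`, or -1 if it is absent.
    """
    i, j = 0, len(nums) - 1
    # The interval [i, j] is empty once i > j
    while i <= j:
        # Overflow-safe midpoint (not strictly needed in Python,
        # whose integers are arbitrary precision)
        m = i + (j - i) // 2
        if nums[m] < target:
            i = m + 1  # target can only be in [m + 1, j]
        elif nums[m] > target:
            j = m - 1  # target can only be in [i, m - 1]
        else:
            return m  # target found
    return -1


# Hypothetical sample data
sample = [1, 3, 6, 8, 12, 15, 23, 26, 31, 35]
found_index = binary_search_sketch(sample, 6)
missing_index = binary_search_sketch(sample, 7)
```

With the sample array, searching for `6` stops at index 2, while searching for `7` empties the interval and yields `-1`.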
+ +## Interval representation methods + +Besides the aforementioned closed interval, a common interval representation is the "left-closed right-open" interval, defined as $[0, n)$, where the left boundary includes itself, and the right boundary does not include itself. In this representation, the interval $[i, j)$ is empty when $i = j$. + +We can implement a binary search algorithm with the same functionality based on this representation: + +```src +[file]{binary_search}-[class]{}-[func]{binary_search_lcro} +``` + +As shown in the figure below, in the two types of interval representations, the initialization of the binary search algorithm, the loop condition, and the narrowing interval operation are different. + +Since both boundaries in the "closed interval" representation are defined as closed, the operations to narrow the interval through pointers $i$ and $j$ are also symmetrical. This makes it less prone to errors, **therefore, it is generally recommended to use the "closed interval" approach**. + +![Two types of interval definitions](binary_search.assets/binary_search_ranges.png) + +## Advantages and limitations + +Binary search performs well in both time and space aspects. + +- Binary search is time-efficient. With large data volumes, the logarithmic time complexity has a significant advantage. For instance, when the data size $n = 2^{20}$, linear search requires $2^{20} = 1048576$ iterations, while binary search only requires $\log_2 2^{20} = 20$ iterations. +- Binary search does not require extra space. Compared to search algorithms that rely on additional space (like hash search), binary search is more space-efficient. + +However, binary search is not suitable for all situations, mainly for the following reasons. + +- Binary search is only applicable to ordered data. 
If the input data is unordered, it is not worth sorting it just to use binary search, as sorting algorithms typically have a time complexity of $O(n \log n)$, which is higher than both linear and binary search. For scenarios with frequent element insertion to maintain array order, inserting elements into specific positions has a time complexity of $O(n)$, which is also quite costly. +- Binary search is only applicable to arrays. Binary search requires non-continuous (jumping) element access, which is inefficient in linked lists, thus not suitable for use in linked lists or data structures based on linked lists. +- With small data volumes, linear search performs better. In linear search, each round only requires 1 decision operation; whereas in binary search, it involves 1 addition, 1 division, 1 to 3 decision operations, 1 addition (subtraction), totaling 4 to 6 operations; therefore, when data volume $n$ is small, linear search can be faster than binary search. diff --git a/en/docs/chapter_searching/binary_search_edge.assets/binary_search_edge_by_element.png b/en/docs/chapter_searching/binary_search_edge.assets/binary_search_edge_by_element.png new file mode 100644 index 000000000..b2799062a Binary files /dev/null and b/en/docs/chapter_searching/binary_search_edge.assets/binary_search_edge_by_element.png differ diff --git a/en/docs/chapter_searching/binary_search_edge.assets/binary_search_right_edge_by_left_edge.png b/en/docs/chapter_searching/binary_search_edge.assets/binary_search_right_edge_by_left_edge.png new file mode 100644 index 000000000..6f10b838e Binary files /dev/null and b/en/docs/chapter_searching/binary_search_edge.assets/binary_search_right_edge_by_left_edge.png differ diff --git a/en/docs/chapter_searching/binary_search_edge.md b/en/docs/chapter_searching/binary_search_edge.md new file mode 100644 index 000000000..3af95f050 --- /dev/null +++ b/en/docs/chapter_searching/binary_search_edge.md @@ -0,0 +1,56 @@ +# Binary search boundaries + +## Find 
the left boundary + +!!! question + + Given a sorted array `nums` of length $n$, which may contain duplicate elements, return the index of the leftmost element `target`. If the element is not present in the array, return $-1$. + +Recall the binary search for an insertion point: after the search is completed, $i$ points to the leftmost `target`, **thus searching for the insertion point is essentially searching for the index of the leftmost `target`**. + +Consider implementing the search for the left boundary using the function for finding an insertion point. Note that the array might not contain `target`, which could lead to the following two results: + +- The index $i$ of the insertion point is out of bounds. +- The element `nums[i]` is not equal to `target`. + +In these cases, simply return $-1$. The code is as follows: + +```src +[file]{binary_search_edge}-[class]{}-[func]{binary_search_left_edge} +``` + +## Find the right boundary + +So how do we find the rightmost `target`? The most straightforward way is to modify the code, changing how the pointers contract in the case of `nums[m] == target`. The code is omitted here, but interested readers can implement it on their own. + +Below we introduce two more clever methods. + +### Reusing the search for the left boundary + +In fact, we can use the function for finding the leftmost element to find the rightmost element, specifically by **transforming the search for the rightmost `target` into a search for the leftmost `target + 1`**. + +As shown in the figure below, after the search is completed, the pointer $i$ points to the leftmost `target + 1` (if it exists), while $j$ points to the rightmost `target`, **thus returning $j$ is sufficient**. 
+ +![Transforming the search for the right boundary into the search for the left boundary](binary_search_edge.assets/binary_search_right_edge_by_left_edge.png) + +Please note, the insertion point returned is $i$, therefore, it should be subtracted by $1$ to obtain $j$: + +```src +[file]{binary_search_edge}-[class]{}-[func]{binary_search_right_edge} +``` + +### Transforming into an element search + +We know that when the array does not contain `target`, $i$ and $j$ will eventually point to the first element greater and smaller than `target` respectively. + +Thus, as shown in the figure below, we can construct an element that does not exist in the array, to search for the left and right boundaries. + +- To find the leftmost `target`: it can be transformed into searching for `target - 0.5`, and return the pointer $i$. +- To find the rightmost `target`: it can be transformed into searching for `target + 0.5`, and return the pointer $j$. + +![Transforming the search for boundaries into the search for an element](binary_search_edge.assets/binary_search_edge_by_element.png) + +The code is omitted here, but two points are worth noting. + +- The given array does not contain decimals, meaning we do not need to worry about how to handle equal situations. +- Since this method introduces decimals, the variable `target` in the function needs to be changed to a floating point type (no change needed in Python). 
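The left-boundary and right-boundary searches described in this file can be sketched together in Python. This is a minimal sketch under stated assumptions: the elements are integers (so that `target + 1` is the next possible value), and the names `insertion_point`, `left_edge`, and `right_edge` are hypothetical helpers for this example, not the functions referenced by the `src` placeholders.

```python
def insertion_point(nums: list[int], target: int) -> int:
    """Return the leftmost index at which `target` could be inserted
    while keeping `nums` sorted (closed-interval binary search)."""
    i, j = 0, len(nums) - 1
    while i <= j:
        m = i + (j - i) // 2
        if nums[m] < target:
            i = m + 1
        else:
            # nums[m] >= target: keep looking to the left
            j = m - 1
    return i


def left_edge(nums: list[int], target: int) -> int:
    """Index of the leftmost `target`, or -1 if absent."""
    i = insertion_point(nums, target)
    # Either out of bounds, or the element at i is not target
    if i == len(nums) or nums[i] != target:
        return -1
    return i


def right_edge(nums: list[int], target: int) -> int:
    """Index of the rightmost `target`, or -1 if absent.

    Searches for the leftmost `target + 1`, then steps back by one.
    """
    i = insertion_point(nums, target + 1)
    j = i - 1
    if j == -1 or nums[j] != target:
        return -1
    return j


# Hypothetical sample data with duplicates
sample = [1, 3, 6, 6, 6, 6, 6, 10, 12, 15]
left = left_edge(sample, 6)
right = right_edge(sample, 6)
```

For the sample array, the run of `6`s occupies indices 2 through 6, so `left` is 2 and `right` is 6. Both helpers return `-1` for a value such as `7` that is not present.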
diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_example.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_example.png new file mode 100644 index 000000000..3803db33d Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_example.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_naive.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_naive.png new file mode 100644 index 000000000..b47ffd79a Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_naive.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step1.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step1.png new file mode 100644 index 000000000..7717114a2 Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step1.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step2.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step2.png new file mode 100644 index 000000000..13b9fec08 Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step2.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step3.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step3.png new file mode 100644 index 000000000..4f26ada24 Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step3.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step4.png 
b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step4.png new file mode 100644 index 000000000..262aebc6d Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step4.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step5.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step5.png new file mode 100644 index 000000000..2c739e9f2 Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step5.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step6.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step6.png new file mode 100644 index 000000000..56433a013 Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step6.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step7.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step7.png new file mode 100644 index 000000000..127eb70fc Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step7.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step8.png b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step8.png new file mode 100644 index 000000000..d322ffc3c Binary files /dev/null and b/en/docs/chapter_searching/binary_search_insertion.assets/binary_search_insertion_step8.png differ diff --git a/en/docs/chapter_searching/binary_search_insertion.md b/en/docs/chapter_searching/binary_search_insertion.md new file mode 100644 index 000000000..8eb4a06b0 --- /dev/null +++ b/en/docs/chapter_searching/binary_search_insertion.md 
@@ -0,0 +1,91 @@ +# Binary search insertion + +Binary search is not only used to search for target elements but also to solve many variant problems, such as searching for the insertion position of target elements. + +## Case with no duplicate elements + +!!! question + + Given an ordered array `nums` of length $n$ and an element `target`, where the array has no duplicate elements. Now insert `target` into the array `nums` while maintaining its order. If the element `target` already exists in the array, insert it to its left side. Please return the index of `target` in the array after insertion. See the example shown in the figure below. + +![Example data for binary search insertion point](binary_search_insertion.assets/binary_search_insertion_example.png) + +If you want to reuse the binary search code from the previous section, you need to answer the following two questions. + +**Question one**: When the array contains `target`, is the insertion point index the index of that element? + +The requirement to insert `target` to the left of equal elements means that the newly inserted `target` takes over the position of the original `target`. Thus, **when the array contains `target`, the insertion point index is the index of that `target`**. + +**Question two**: When the array does not contain `target`, what is the index of the insertion point? + +Further consider the binary search process: when `nums[m] < target`, pointer $i$ moves, meaning that pointer $i$ is approaching an element greater than or equal to `target`. Similarly, pointer $j$ is always approaching an element less than or equal to `target`. + +Therefore, at the end of the binary search, it is certain that: $i$ points to the first element greater than `target`, and $j$ points to the last element less than `target`. **It is easy to see that when the array does not contain `target`, the insertion index is $i$**. 
The code is as follows: + +```src +[file]{binary_search_insertion}-[class]{}-[func]{binary_search_insertion_simple} +``` + +## Case with duplicate elements + +!!! question + + Based on the previous question, assume the array may contain duplicate elements, all else remains the same. + +Suppose there are multiple `target`s in the array, ordinary binary search can only return the index of one of the `target`s, **and it cannot determine how many `target`s are to the left and right of that element**. + +The task requires inserting the target element to the very left, **so we need to find the index of the leftmost `target` in the array**. Initially consider implementing this through the steps shown in the figure below. + +1. Perform a binary search, get an arbitrary index of `target`, denoted as $k$. +2. Start from index $k$, and perform a linear search to the left until the leftmost `target` is found and return. + +![Linear search for the insertion point of duplicate elements](binary_search_insertion.assets/binary_search_insertion_naive.png) + +Although this method is feasible, it includes linear search, so its time complexity is $O(n)$. This method is inefficient when the array contains many duplicate `target`s. + +Now consider extending the binary search code. As shown in the figure below, the overall process remains the same, each round first calculates the midpoint index $m$, then judges the size relationship between `target` and `nums[m]`, divided into the following cases. + +- When `nums[m] < target` or `nums[m] > target`, it means `target` has not been found yet, thus use the normal binary search interval reduction operation, **thus making pointers $i$ and $j$ approach `target`**. +- When `nums[m] == target`, it indicates that the elements less than `target` are in the interval $[i, m - 1]$, therefore use $j = m - 1$ to narrow the interval, **thus making pointer $j$ approach elements less than `target`**. 
+ +After the loop, $i$ points to the leftmost `target`, and $j$ points to the first element less than `target`, **therefore index $i$ is the insertion point**. + +=== "<1>" + ![Steps for binary search insertion point of duplicate elements](binary_search_insertion.assets/binary_search_insertion_step1.png) + +=== "<2>" + ![binary_search_insertion_step2](binary_search_insertion.assets/binary_search_insertion_step2.png) + +=== "<3>" + ![binary_search_insertion_step3](binary_search_insertion.assets/binary_search_insertion_step3.png) + +=== "<4>" + ![binary_search_insertion_step4](binary_search_insertion.assets/binary_search_insertion_step4.png) + +=== "<5>" + ![binary_search_insertion_step5](binary_search_insertion.assets/binary_search_insertion_step5.png) + +=== "<6>" + ![binary_search_insertion_step6](binary_search_insertion.assets/binary_search_insertion_step6.png) + +=== "<7>" + ![binary_search_insertion_step7](binary_search_insertion.assets/binary_search_insertion_step7.png) + +=== "<8>" + ![binary_search_insertion_step8](binary_search_insertion.assets/binary_search_insertion_step8.png) + +In the code below, the branches for `nums[m] > target` and `nums[m] == target` perform the same operation, so the two can be combined. + +Even so, we can still keep the conditions expanded, as the logic is clearer and more readable that way. + +```src +[file]{binary_search_insertion}-[class]{}-[func]{binary_search_insertion} +``` + +!!! tip + + The code in this section uses "closed intervals". Interested readers can implement the "left-closed right-open" method themselves. + +In summary, binary search is merely about setting search targets for pointers $i$ and $j$, which might be a specific element (like `target`) or a range of elements (like elements less than `target`). + +In the continuous loop of binary search, pointers $i$ and $j$ gradually approach the predefined target. Ultimately, they either find the answer or stop after crossing the boundary. 
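As a rough illustration of the insertion-point search with duplicates, the sketch below keeps the three branches expanded, as the text suggests, and cross-checks the result against `bisect.bisect_left` from the Python standard library, which also returns the leftmost insertion index in a sorted list. The function name `binary_search_insertion_sketch` and the sample data are assumptions for this example; this is not the repository source behind the `src` placeholder.

```python
import bisect


def binary_search_insertion_sketch(nums: list[int], target: int) -> int:
    """Leftmost insertion index of `target` in sorted `nums`
    (duplicates allowed), using the closed interval [i, j]."""
    i, j = 0, len(nums) - 1
    while i <= j:
        m = i + (j - i) // 2
        if nums[m] < target:
            i = m + 1  # target lies in [m + 1, j]
        elif nums[m] > target:
            j = m - 1  # target lies in [i, m - 1]
        else:
            # Found one target; elements less than target lie in
            # [i, m - 1], so keep narrowing from the right
            j = m - 1
    # i now points to the leftmost target, or to the first
    # element greater than target if target is absent
    return i


# Hypothetical sample data with duplicates
sample = [1, 3, 6, 6, 6, 6, 6, 10, 12, 15]
results = {t: binary_search_insertion_sketch(sample, t) for t in (0, 6, 7, 20)}

# Cross-check against the standard library
agrees_with_bisect = all(
    results[t] == bisect.bisect_left(sample, t) for t in results
)
```

Merging the last two branches into a single `else` would give the same behavior; the expanded form simply mirrors the case analysis in the text.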
diff --git a/en/docs/chapter_searching/index.md b/en/docs/chapter_searching/index.md new file mode 100644 index 000000000..f8e3b8bc8 --- /dev/null +++ b/en/docs/chapter_searching/index.md @@ -0,0 +1,9 @@ +# Searching + +![Searching](../assets/covers/chapter_searching.jpg) + +!!! abstract + + Searching is an unknown adventure, where we may need to traverse every corner of a mysterious space, or perhaps quickly pinpoint our target. + + In this journey of discovery, each exploration may yield an unexpected answer. diff --git a/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_brute_force.png b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_brute_force.png new file mode 100644 index 000000000..33fca976a Binary files /dev/null and b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_brute_force.png differ diff --git a/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step1.png b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step1.png new file mode 100644 index 000000000..df323dd41 Binary files /dev/null and b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step1.png differ diff --git a/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step2.png b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step2.png new file mode 100644 index 000000000..bed0ac02f Binary files /dev/null and b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step2.png differ diff --git a/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step3.png b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step3.png new file mode 100644 index 000000000..accf61def Binary files /dev/null and b/en/docs/chapter_searching/replace_linear_by_hashing.assets/two_sum_hashtable_step3.png differ diff --git 
a/en/docs/chapter_searching/replace_linear_by_hashing.md b/en/docs/chapter_searching/replace_linear_by_hashing.md new file mode 100644 index 000000000..f02c65ade --- /dev/null +++ b/en/docs/chapter_searching/replace_linear_by_hashing.md @@ -0,0 +1,47 @@ +# Hash optimization strategies + +In algorithm problems, **we often reduce the time complexity of algorithms by replacing linear search with hash search**. Let's use an algorithm problem to deepen understanding. + +!!! question + + Given an integer array `nums` and a target element `target`, please search for two elements in the array whose "sum" equals `target`, and return their array indices. Any solution is acceptable. + +## Linear search: trading time for space + +Consider traversing all possible combinations directly. As shown in the figure below, we initiate a two-layer loop, and in each round, we determine whether the sum of the two integers equals `target`. If so, we return their indices. + +![Linear search solution for two-sum problem](replace_linear_by_hashing.assets/two_sum_brute_force.png) + +The code is shown below: + +```src +[file]{two_sum}-[class]{}-[func]{two_sum_brute_force} +``` + +This method has a time complexity of $O(n^2)$ and a space complexity of $O(1)$, which is very time-consuming with large data volumes. + +## Hash search: trading space for time + +Consider using a hash table, with key-value pairs being the array elements and their indices, respectively. Loop through the array, performing the steps shown in the figures below each round. + +1. Check if the number `target - nums[i]` is in the hash table. If so, directly return the indices of these two elements. +2. Add the key-value pair `nums[i]` and index `i` to the hash table. 
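The two approaches to this problem can be sketched side by side in Python. This is a minimal sketch, not the repository's `two_sum` source: the function names, the sample input, and the choice to return an empty list when no pair exists are assumptions made for this example.

```python
def two_sum_brute_force_sketch(nums: list[int], target: int) -> list[int]:
    """Check every pair with a two-layer loop: O(n^2) time, O(1) space."""
    n = len(nums)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []  # assumed convention when no pair exists


def two_sum_hash_table_sketch(nums: list[int], target: int) -> list[int]:
    """Single pass with a hash table: O(n) time, O(n) space."""
    seen: dict[int, int] = {}  # maps element value -> its index
    for i, num in enumerate(nums):
        complement = target - num
        # Step 1: is the complement already in the hash table?
        if complement in seen:
            return [seen[complement], i]
        # Step 2: record the current element and its index
        seen[num] = i
    return []


# Hypothetical sample input
sample = [2, 7, 11, 15]
pair_slow = two_sum_brute_force_sketch(sample, 13)
pair_fast = two_sum_hash_table_sketch(sample, 13)
```

Both functions find the pair at indices 0 and 2 (values 2 and 11) for this input. Checking the table before inserting the current element prevents an element from being paired with itself.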
+ +=== "<1>" + ![Help hash table solve two-sum](replace_linear_by_hashing.assets/two_sum_hashtable_step1.png) + +=== "<2>" + ![two_sum_hashtable_step2](replace_linear_by_hashing.assets/two_sum_hashtable_step2.png) + +=== "<3>" + ![two_sum_hashtable_step3](replace_linear_by_hashing.assets/two_sum_hashtable_step3.png) + +The implementation code is shown below, requiring only a single loop: + +```src +[file]{two_sum}-[class]{}-[func]{two_sum_hash_table} +``` + +This method reduces the time complexity from $O(n^2)$ to $O(n)$ by using hash search, greatly improving the running efficiency. + +As it requires maintaining an additional hash table, the space complexity is $O(n)$. **Nevertheless, this method has a more balanced time-space efficiency overall, making it the optimal solution for this problem**. diff --git a/en/docs/chapter_searching/searching_algorithm_revisited.assets/searching_algorithms.png b/en/docs/chapter_searching/searching_algorithm_revisited.assets/searching_algorithms.png new file mode 100644 index 000000000..fff1066ff Binary files /dev/null and b/en/docs/chapter_searching/searching_algorithm_revisited.assets/searching_algorithms.png differ diff --git a/en/docs/chapter_searching/searching_algorithm_revisited.md b/en/docs/chapter_searching/searching_algorithm_revisited.md new file mode 100644 index 000000000..6a88770ef --- /dev/null +++ b/en/docs/chapter_searching/searching_algorithm_revisited.md @@ -0,0 +1,84 @@ +# Search algorithms revisited + +Searching algorithms are used to search for one or several elements that meet specific criteria in data structures such as arrays, linked lists, trees, or graphs. + +Searching algorithms can be divided into the following two categories based on their implementation approaches. + +- **Locating the target element by traversing the data structure**, such as traversals of arrays, linked lists, trees, and graphs, etc. 
+- **Using the organizational structure of the data or the prior information contained in the data to achieve efficient element search**, such as binary search, hash search, and binary search tree search, etc. + +It is not difficult to notice that these topics have been introduced in previous chapters, so searching algorithms are not unfamiliar to us. In this section, we will revisit searching algorithms from a more systematic perspective. + +## Brute-force search + +Brute-force search locates the target element by traversing every element of the data structure. + +- "Linear search" is suitable for linear data structures such as arrays and linked lists. It starts from one end of the data structure, accesses each element one by one, until the target element is found or the other end is reached without finding the target element. +- "Breadth-first search" and "Depth-first search" are two traversal strategies for graphs and trees. Breadth-first search starts from the initial node and searches layer by layer, accessing nodes from near to far. Depth-first search starts from the initial node, follows a path until the end, then backtracks and tries other paths until the entire data structure is traversed. + +The advantage of brute-force search is its simplicity and versatility, **no need for data preprocessing and the help of additional data structures**. + +However, **the time complexity of this type of algorithm is $O(n)$**, where $n$ is the number of elements, so the performance is poor in cases of large data volumes. + +## Adaptive search + +Adaptive search uses the unique properties of data (such as order) to optimize the search process, thereby locating the target element more efficiently. + +- "Binary search" uses the orderliness of data to achieve efficient searching, only suitable for arrays. +- "Hash search" uses a hash table to establish a key-value mapping between search data and target data, thus implementing the query operation. 
+- "Tree search" works on a specific tree structure (such as a binary search tree) and quickly eliminates nodes based on node value comparisons, thus locating the target element.
+
+The advantage of these algorithms is high efficiency, **with time complexities reaching $O(\log n)$ or even $O(1)$**.
+
+However, **using these algorithms often requires data preprocessing**. For example, binary search requires sorting the array in advance, and hash search and tree search both require additional data structures. Maintaining these structures also incurs extra time and space overhead.
+
+!!! tip
+
+    Adaptive search algorithms are often referred to as search algorithms, **mainly used for quickly retrieving target elements in specific data structures**.
+
+## Choosing a search method
+
+Given a set of data of size $n$, we can use linear search, binary search, tree search, hash search, and other methods to search it for the target element. The working principles of these methods are shown in the following figure.
+
+![Various search strategies](searching_algorithm_revisited.assets/searching_algorithms.png)
+
+The operational efficiency and characteristics of the aforementioned methods are shown in the following table.
+
+

Table   Comparison of search algorithm efficiency

+
+|                    | Linear search | Binary search         | Tree search                 | Hash search                |
+| ------------------ | ------------- | --------------------- | --------------------------- | -------------------------- |
+| Search element     | $O(n)$        | $O(\log n)$           | $O(\log n)$                 | $O(1)$                     |
+| Insert element     | $O(1)$        | $O(n)$                | $O(\log n)$                 | $O(1)$                     |
+| Delete element     | $O(n)$        | $O(n)$                | $O(\log n)$                 | $O(1)$                     |
+| Extra space        | $O(1)$        | $O(1)$                | $O(n)$                      | $O(n)$                     |
+| Data preprocessing | /             | Sorting $O(n \log n)$ | Building tree $O(n \log n)$ | Building hash table $O(n)$ |
+| Data orderliness   | Unordered     | Ordered               | Ordered                     | Unordered                  |
+
+The choice of search algorithm also depends on the volume of data, search performance requirements, and data query and update frequency.
+
+**Linear search**
+
+- Good versatility, with no need for any data preprocessing. If we only need to query the data once, then the time for data preprocessing in the other three methods would be longer than the time for linear search.
+- Suitable for small volumes of data, where time complexity has a smaller impact on efficiency.
+- Suitable for scenarios with high data update frequency, because this method does not require any additional maintenance of the data.
+
+**Binary search**
+
+- Suitable for large data volumes, with stable performance and a worst-case time complexity of $O(\log n)$.
+- However, the data volume cannot be too large, because storing arrays requires contiguous memory space.
+- Not suitable for scenarios with frequent additions and deletions, because maintaining an ordered array incurs high overhead.
+
+**Hash search**
+
+- Suitable for scenarios with high query performance requirements, with an average time complexity of $O(1)$.
+- Not suitable for scenarios needing ordered data or range searches, because hash tables cannot maintain data orderliness.
+- Highly dependent on the hash function and the hash collision handling strategy, with a significant risk of performance degradation.
+- Not suitable for overly large data volumes, because hash tables need extra space to minimize collisions and provide good query performance.
+
+**Tree search**
+
+- Suitable for massive data, because tree nodes are stored in scattered locations in memory.
+- Suitable for maintaining ordered data or range searches.
+- As nodes are continually added and deleted, the binary search tree may become skewed, degrading the time complexity to $O(n)$.
+- If AVL trees or red-black trees are used, operations run stably in $O(\log n)$ time, but the operations that maintain tree balance add extra overhead.
diff --git a/en/docs/chapter_searching/summary.md b/en/docs/chapter_searching/summary.md
new file mode 100644
index 000000000..7c49cbdab
--- /dev/null
+++ b/en/docs/chapter_searching/summary.md
@@ -0,0 +1,8 @@
+# Summary
+
+- Binary search depends on the order of data and performs the search by iteratively halving the search interval. It requires the input data to be sorted and is only applicable to arrays or array-based data structures.
+- Brute-force search locates data by traversing the data structure. Linear search is suitable for arrays and linked lists, while breadth-first search and depth-first search are suitable for graphs and trees. These algorithms are highly versatile, requiring no preprocessing of data, but have a higher time complexity of $O(n)$.
+- Hash search, tree search, and binary search are efficient searching methods, capable of quickly locating target elements in specific data structures. These algorithms are highly efficient, with time complexities reaching $O(\log n)$ or even $O(1)$, but they usually require data preprocessing or additional data structures.
+- In practice, we need to analyze factors such as data volume, search performance requirements, and data query and update frequencies to choose the appropriate search method.
+- Linear search is suitable for small or frequently updated data; binary search is suitable for large, sorted data; hash search is suitable for scenarios requiring high query efficiency without the need for range queries; tree search is appropriate for large dynamic data that needs to maintain order and support range queries.
+- Replacing linear search with hash search is a common strategy to optimize runtime, reducing the time complexity of a lookup from $O(n)$ to $O(1)$.
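
As a rough illustration of the last point, the sketch below contrasts a brute-force two-sum with a hash-table version in Python. The function names `two_sum_brute_force` and `two_sum_hash_table` and the sample input are assumptions made for this example; it is a minimal sketch of the idea, not necessarily the code the book ships.

```python
def two_sum_brute_force(nums: list[int], target: int) -> list[int]:
    """Check every pair with two nested loops: O(n^2) time, O(1) extra space."""
    n = len(nums)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []  # no pair adds up to target


def two_sum_hash_table(nums: list[int], target: int) -> list[int]:
    """Single pass with a dict: O(n) time on average, O(n) extra space."""
    seen: dict[int, int] = {}  # maps a visited value to its index
    for i, num in enumerate(nums):
        complement = target - num
        # The inner linear scan is replaced by an average O(1) hash lookup.
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []  # no pair adds up to target


if __name__ == "__main__":
    nums = [2, 7, 11, 15]  # hypothetical sample input
    print(two_sum_brute_force(nums, 13))  # [0, 2]
    print(two_sum_hash_table(nums, 13))   # [0, 2]
```

Both functions return the same indices here. The hash-table version trades $O(n)$ extra space for dropping the inner loop, which is the time-space trade-off described above.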