Compare commits

...

5 commits

Author SHA1 Message Date
RafaelCaso
97b8481a79
Merge 5ec15c4a3e into 6348dbe18d 2024-12-04 17:51:58 +08:00
Thomas
6348dbe18d
translation: binary search updates (#1569)
* translation: binary search updates

* fix minor gramma and expression issues
2024-12-04 17:48:48 +08:00
YuZou
ca774eefbf
Update time_complexity.md (#1578)
* Bug fixes and improvements (#1577)
 * correct the implement of exp_recur function and remove +1 operation from the function to simulate the cell division process

* Update time_complexity.rs

* Update time_complexity.md

---------

Co-authored-by: zouy26 <zouy26@chinaunicom.cn>
Co-authored-by: Yudong Jin <krahets@163.com>
2024-12-04 17:36:11 +08:00
RafaelCaso
5ec15c4a3e
Update searching_algorithm_revisited.md
fix typo in original commit
2024-11-11 10:35:50 -05:00
RafaelCaso
a430cc2607
Update searching_algorithm_revisited.md 2024-11-11 10:00:38 -05:00
3 changed files with 40 additions and 40 deletions

@ -1120,7 +1120,7 @@ $$
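The biological process of "cell division" is a typical example of exponential growth: starting from $1$ cell, there are $2$ cells after one round of division, $4$ cells after two rounds, and so on, giving $2^n$ cells after $n$ rounds.
The figure below and the following code simulate the process of cell division, with a time complexity of $O(2^n)$
The figure below and the following code simulate the process of cell division, with a time complexity of $O(2^n)$. Note that the input $n$ denotes the number of division rounds, and the return value `count` denotes the total number of divisions.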
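As a rough illustration of the sentence above, the Python sketch below doubles the number of cells each round and counts every individual division. It is an assumed rendition for this note, not the repository's own `exponential` implementation; the variable name `cells` is hypothetical.

```python
def exponential(n: int) -> int:
    """Simulate n rounds of cell division and count the individual divisions."""
    count = 0  # total number of divisions performed so far
    cells = 1  # number of cells at the start of the current round
    for _ in range(n):
        # Every existing cell divides once during this round
        for _ in range(cells):
            count += 1
        cells *= 2
    # count = 1 + 2 + 4 + ... + 2^(n-1) = 2^n - 1
    return count
```

The loop body runs $2^n - 1$ times in total, which is where the $O(2^n)$ bound comes from.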
```src
[file]{time_complexity}-[class]{}-[func]{exponential}

@ -1,24 +1,24 @@
# Binary search
<u>Binary search</u> is an efficient search algorithm based on the divide-and-conquer strategy. It utilizes the orderliness of data, reducing the search range by half each round until the target element is found or the search interval is empty.
<u>Binary search</u> is an efficient search algorithm that uses a divide-and-conquer strategy. It takes advantage of the sorted order of elements in an array by reducing the search interval by half in each iteration, continuing until either the target element is found or the search interval becomes empty.
!!! question
Given an array `nums` of length $n$, with elements arranged in ascending order and non-repeating. Please find and return the index of element `target` in this array. If the array does not contain the element, return $-1$. An example is shown in the figure below.
Given an array `nums` of length $n$ whose elements are arranged in ascending order without duplicates, please find and return the index of element `target` in this array. If the array does not contain the element, return $-1$. An example is shown in the figure below.
![Binary search example data](binary_search.assets/binary_search_example.png)
As shown in the figure below, we first initialize pointers $i = 0$ and $j = n - 1$, pointing to the first and last elements of the array, representing the search interval $[0, n - 1]$. Please note that square brackets indicate a closed interval, which includes the boundary values themselves.
As shown in the figure below, we first initialize pointers $i = 0$ and $j = n - 1$, pointing to the first and last elements of the array respectively. Together they represent the whole search interval $[0, n - 1]$. Please note that square brackets indicate a closed interval, which includes the boundary values themselves.
Next, perform the following two steps in a loop.
Then, the following two steps are performed in a loop.
1. Calculate the midpoint index $m = \lfloor {(i + j) / 2} \rfloor$, where $\lfloor \: \rfloor$ denotes the floor operation.
2. Compare the size of `nums[m]` and `target`, divided into the following three scenarios.
2. Compare the value of `nums[m]` with `target`, and execute one of the following three cases accordingly.
1. If `nums[m] < target`, it indicates that `target` is in the interval $[m + 1, j]$, thus set $i = m + 1$.
2. If `nums[m] > target`, it indicates that `target` is in the interval $[i, m - 1]$, thus set $j = m - 1$.
3. If `nums[m] = target`, it indicates that `target` is found, thus return index $m$.
If the array does not contain the target element, the search interval will eventually reduce to empty. In this case, return $-1$.
If the array does not contain the target element, the search interval will eventually shrink to empty, and the search returns $-1$.
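The loop described in the steps above can be sketched in Python as follows. This is a minimal illustration of the closed-interval procedure and is not taken from the repository's `binary_search` source; the sample array is made up for the example.

```python
def binary_search(nums: list[int], target: int) -> int:
    """Closed-interval binary search on an ascending list without duplicates."""
    i, j = 0, len(nums) - 1  # search interval [i, j]
    while i <= j:  # the interval is empty once i > j
        m = (i + j) // 2  # midpoint index, floored
        if nums[m] < target:
            i = m + 1  # target can only be in [m + 1, j]
        elif nums[m] > target:
            j = m - 1  # target can only be in [i, m - 1]
        else:
            return m  # target found
    return -1  # interval shrank to empty


# Illustrative data only
sample = [1, 3, 6, 8, 12, 15, 23, 26, 31, 35]
found = binary_search(sample, 6)
missing = binary_search(sample, 7)
```

Here `found` is the index of `6` in `sample`, and `missing` is `-1` because `7` is absent.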
=== "<1>"
![Binary search process](binary_search.assets/binary_search_step1.png)
@ -41,7 +41,7 @@ If the array does not contain the target element, the search interval will event
=== "<7>"
![binary_search_step7](binary_search.assets/binary_search_step7.png)
It's worth noting that since $i$ and $j$ are both of type `int`, **$i + j$ might exceed the range of `int` type**. To avoid large number overflow, we usually use the formula $m = \lfloor {i + (j - i) / 2} \rfloor$ to calculate the midpoint.
It's worth noting that as $i$ and $j$ are both of type `int`, **$i + j$ might exceed the range of `int` type**. To avoid large number overflow, we usually use the formula $m = \lfloor {i + (j - i) / 2} \rfloor$ to calculate the midpoint.
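Python integers do not overflow, so the effect has to be simulated to be seen. The sketch below assumes 32-bit two's-complement wraparound (as in Java's `int`) through a hypothetical helper `wrap_int32`, and compares the naive midpoint with the overflow-safe formula.

```python
def wrap_int32(x: int) -> int:
    """Map an integer onto the signed 32-bit range, emulating wraparound."""
    return (x + 2**31) % 2**32 - 2**31


# Two valid 32-bit indices whose sum exceeds 2^31 - 1 (illustrative values)
i, j = 2_000_000_000, 2_100_000_000

naive_mid = wrap_int32(i + j) // 2  # i + j wraps to a negative number first
safe_mid = i + (j - i) // 2         # every intermediate value stays in range
```

With these values `naive_mid` becomes negative and falls outside $[i, j]$, while `safe_mid` stays inside the interval.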
The code is as follows:
@ -49,13 +49,13 @@ The code is as follows:
[file]{binary_search}-[class]{}-[func]{binary_search}
```
**Time complexity is $O(\log n)$** : In the binary loop, the interval reduces by half each round, hence the number of iterations is $\log_2 n$.
**Time complexity is $O(\log n)$** : In the binary loop, the interval decreases by half each round, hence the number of iterations is $\log_2 n$.
**Space complexity is $O(1)$** : Pointers $i$ and $j$ use constant size space.
**Space complexity is $O(1)$** : Pointers $i$ and $j$ occupy a constant amount of space.
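To make the logarithmic bound concrete, the following sketch counts loop rounds instead of returning an index. It assumes a virtual sorted array in which `nums[m] == m`, so no list has to be allocated; the function name `count_rounds` is hypothetical. For a closed interval, the exact worst case is $\lfloor \log_2 n \rfloor + 1$ rounds, which is still $O(\log n)$.

```python
import math


def count_rounds(n: int, target: int) -> int:
    """Count binary search rounds over the virtual sorted array 0, 1, ..., n - 1."""
    i, j = 0, n - 1
    rounds = 0
    while i <= j:
        rounds += 1
        m = i + (j - i) // 2
        if m < target:    # plays the role of nums[m] < target
            i = m + 1
        elif m > target:  # plays the role of nums[m] > target
            j = m - 1
        else:
            break
    return rounds


n = 2**20
worst = count_rounds(n, n)  # target larger than every element, so never found
bound = math.floor(math.log2(n)) + 1
```

Only two integer pointers and a counter are kept regardless of `n`, which matches the $O(1)$ space claim.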
## Interval representation methods
Besides the aforementioned closed interval, a common interval representation is the "left-closed right-open" interval, defined as $[0, n)$, where the left boundary includes itself, and the right boundary does not include itself. In this representation, the interval $[i, j)$ is empty when $i = j$.
Besides the above closed interval, another common interval representation is the "left-closed right-open" interval, defined as $[0, n)$, where the left boundary includes itself, and the right boundary does not. In this representation, the interval $[i, j)$ is empty when $i = j$.
We can implement a binary search algorithm with the same functionality based on this representation:
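A minimal Python sketch of this variant is given here for illustration. It is an assumption about how the "left-closed right-open" version might look, not a copy of the repository's `binary_search_lcro`.

```python
def binary_search_lcro(nums: list[int], target: int) -> int:
    """Binary search over the left-closed, right-open interval [i, j)."""
    i, j = 0, len(nums)  # search interval [0, n)
    while i < j:  # the interval is empty once i == j
        m = i + (j - i) // 2
        if nums[m] < target:
            i = m + 1  # target can only be in [m + 1, j)
        elif nums[m] > target:
            j = m  # target can only be in [i, m); m itself is excluded
        else:
            return m
    return -1
```

Compared with the closed-interval form, three things change: `j` starts at `len(nums)`, the loop condition is `i < j`, and the right boundary shrinks with `j = m` rather than `j = m - 1`.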
@ -63,9 +63,9 @@ We can implement a binary search algorithm with the same functionality based on
[file]{binary_search}-[class]{}-[func]{binary_search_lcro}
```
As shown in the figure below, in the two types of interval representations, the initialization of the binary search algorithm, the loop condition, and the narrowing interval operation are different.
As shown in the figure below, under the two types of interval representations, the initialization, loop condition, and narrowing interval operation of the binary search algorithm differ.
Since both boundaries in the "closed interval" representation are defined as closed, the operations to narrow the interval through pointers $i$ and $j$ are also symmetrical. This makes it less prone to errors, **therefore, it is generally recommended to use the "closed interval" approach**.
Since both boundaries in the "closed interval" representation are inclusive, the operations to narrow the interval through pointers $i$ and $j$ are also symmetrical. This makes it less prone to errors, **therefore, it is generally recommended to use the "closed interval" approach**.
![Two types of interval definitions](binary_search.assets/binary_search_ranges.png)
@ -73,11 +73,11 @@ Since both boundaries in the "closed interval" representation are defined as clo
Binary search performs well in both time and space aspects.
- Binary search is time-efficient. With large data volumes, the logarithmic time complexity has a significant advantage. For instance, when the data size $n = 2^{20}$, linear search requires $2^{20} = 1048576$ iterations, while binary search only requires $\log_2 2^{20} = 20$ iterations.
- Binary search does not require extra space. Compared to search algorithms that rely on additional space (like hash search), binary search is more space-efficient.
- Binary search is time-efficient. With large datasets, the logarithmic time complexity offers a major advantage. For instance, given a dataset of size $n = 2^{20}$, linear search requires $2^{20} = 1048576$ iterations, while binary search only requires $\log_2 2^{20} = 20$ iterations.
- Binary search does not need extra space. Compared to search algorithms that rely on additional space (like hash search), binary search is more space-efficient.
However, binary search is not suitable for all situations, mainly for the following reasons.
However, binary search may not be suitable for all scenarios due to the following concerns.
- Binary search is only applicable to ordered data. If the input data is unordered, it is not worth sorting it just to use binary search, as sorting algorithms typically have a time complexity of $O(n \log n)$, which is higher than both linear and binary search. For scenarios with frequent element insertion to maintain array order, inserting elements into specific positions has a time complexity of $O(n)$, which is also quite costly.
- Binary search is only applicable to arrays. Binary search requires non-continuous (jumping) element access, which is inefficient in linked lists, thus not suitable for use in linked lists or data structures based on linked lists.
- With small data volumes, linear search performs better. In linear search, each round only requires 1 decision operation; whereas in binary search, it involves 1 addition, 1 division, 1 to 3 decision operations, 1 addition (subtraction), totaling 4 to 6 operations; therefore, when data volume $n$ is small, linear search can be faster than binary search.
- Binary search can only be applied to sorted data. Unsorted data must be sorted first, which may not be worthwhile because sorting algorithms typically have a time complexity of $O(n \log n)$, higher than that of linear search, let alone binary search itself. For scenarios with frequent insertions, keeping the array in order is costly, since inserting a new element at a specific position takes $O(n)$ time.
- Binary search can only be used on arrays. It requires non-contiguous (jumping) element access, which is inefficient in linked lists. As a result, linked lists and data structures based on them are not suitable for this algorithm.
- Linear search performs better on small datasets. In linear search, only 1 decision operation is required for each iteration, whereas binary search involves 1 addition, 1 division, 1 to 3 decision operations, and 1 addition (subtraction), totaling 4 to 6 operations. Therefore, if the data size $n$ is small, linear search can be faster than binary search.

@ -1,28 +1,28 @@
# Search algorithms revisited
<u>Searching algorithms (searching algorithm)</u> are used to search for one or several elements that meet specific criteria in data structures such as arrays, linked lists, trees, or graphs.
<u>Searching algorithms (search algorithms)</u> are used to retrieve one or more elements that meet specific criteria within data structures such as arrays, linked lists, trees, or graphs.
Searching algorithms can be divided into the following two categories based on their implementation approaches.
Searching algorithms can be divided into the following two categories based on their approach.
- **Locating the target element by traversing the data structure**, such as traversals of arrays, linked lists, trees, and graphs, etc.
- **Using the organizational structure of the data or the prior information contained in the data to achieve efficient element search**, such as binary search, hash search, and binary search tree search, etc.
- **Using the organizational structure of the data or prior information contained in the data to achieve efficient element searches**, such as binary search, hash search, binary search tree search, etc.
It is not difficult to notice that these topics have been introduced in previous chapters, so searching algorithms are not unfamiliar to us. In this section, we will revisit searching algorithms from a more systematic perspective.
These topics were introduced in previous chapters, so they are not unfamiliar to us. In this section, we will revisit searching algorithms from a more systematic perspective.
## Brute-force search
Brute-force search locates the target element by traversing every element of the data structure.
A brute-force search locates the target element by traversing every element of the data structure.
- "Linear search" is suitable for linear data structures such as arrays and linked lists. It starts from one end of the data structure, accesses each element one by one, until the target element is found or the other end is reached without finding the target element.
- "Breadth-first search" and "Depth-first search" are two traversal strategies for graphs and trees. Breadth-first search starts from the initial node and searches layer by layer, accessing nodes from near to far. Depth-first search starts from the initial node, follows a path until the end, then backtracks and tries other paths until the entire data structure is traversed.
- "Linear search" is suitable for linear data structures such as arrays and linked lists. It starts from one end of the data structure and accesses each element one by one until the target element is found or the other end is reached without finding the target element.
- "Breadth-first search" and "Depth-first search" are two traversal strategies for graphs and trees. Breadth-first search starts from the initial node and searches layer by layer (left to right), accessing nodes from near to far. Depth-first search starts from the initial node, follows a path until the end (top to bottom), then backtracks and tries other paths until the entire data structure is traversed.
The advantage of brute-force search is its simplicity and versatility, **no need for data preprocessing and the help of additional data structures**.
The advantage of brute-force search is its simplicity and versatility, **no need for data preprocessing or the help of additional data structures**.
However, **the time complexity of this type of algorithm is $O(n)$**, where $n$ is the number of elements, so the performance is poor in cases of large data volumes.
However, **the time complexity of this type of algorithm is $O(n)$**, where $n$ is the number of elements, so the performance is poor with large data sets.
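As a simple instance of brute-force search, a linear search over a list might be sketched as below. The function name and sample data are illustrative and not taken from the repository.

```python
def linear_search(nums: list[int], target: int) -> int:
    """Scan from the first element to the last and return the first matching index."""
    for index, value in enumerate(nums):
        if value == target:
            return index
    return -1  # reached the other end without finding the target


# Works on unsorted data with no preprocessing (illustrative values)
unsorted = [31, 4, 15, 9, 26]
position = linear_search(unsorted, 9)
```

In the worst case every element is visited once, hence the $O(n)$ time complexity.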
## Adaptive search
Adaptive search uses the unique properties of data (such as order) to optimize the search process, thereby locating the target element more efficiently.
An adaptive search uses the unique properties of data (such as order) to optimize the search process, thereby locating the target element more efficiently.
- "Binary search" uses the orderliness of data to achieve efficient searching, only suitable for arrays.
- "Hash search" uses a hash table to establish a key-value mapping between search data and target data, thus implementing the query operation.
@ -30,7 +30,7 @@ Adaptive search uses the unique properties of data (such as order) to optimize t
The advantage of these algorithms is high efficiency, **with time complexities reaching $O(\log n)$ or even $O(1)$**.
However, **using these algorithms often requires data preprocessing**. For example, binary search requires sorting the array in advance, and hash search and tree search both require the help of additional data structures, maintaining these structures also requires extra time and space overhead.
However, **using these algorithms often requires data preprocessing**. For example, binary search requires sorting the array in advance, and hash search and tree search both require the help of additional data structures. Maintaining these structures also requires more overhead in terms of time and space.
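The preprocessing cost can be seen in a small sketch. It assumes Python's standard `bisect` module for the binary search and a plain `dict` as the hash table; the data and helper names are made up for the example.

```python
from bisect import bisect_left

data = [23, 5, 42, 8, 16]  # illustrative unsorted input

# Binary search: the data must be sorted first, an O(n log n) step
sorted_data = sorted(data)


def contains_sorted(target: int) -> bool:
    """O(log n) membership test on the pre-sorted copy."""
    pos = bisect_left(sorted_data, target)
    return pos < len(sorted_data) and sorted_data[pos] == target


# Hash search: an extra table must be built first, an O(n) step with O(n) space
index_of = {value: idx for idx, value in enumerate(data)}


def find_hashed(target: int) -> int:
    """Average O(1) lookup of the original index, or -1 if absent."""
    return index_of.get(target, -1)
```

Both `sorted_data` and `index_of` also have to be updated whenever `data` changes, which is the maintenance overhead mentioned above.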
!!! tip
@ -38,11 +38,11 @@ However, **using these algorithms often requires data preprocessing**. For examp
## Choosing a search method
Given a set of data of size $n$, we can use linear search, binary search, tree search, hash search, and other methods to search for the target element from it. The working principles of these methods are shown in the figure below.
Given a set of data of size $n$, we can use a linear search, binary search, tree search, hash search, or other methods to retrieve the target element. The working principles of these methods are shown in the figure below.
![Various search strategies](searching_algorithm_revisited.assets/searching_algorithms.png)
The operation efficiency and characteristics of the aforementioned methods are shown in the following table.
The characteristics and operational efficiency of the aforementioned methods are shown in the following table.
<p align="center"> Table <id> &nbsp; Comparison of search algorithm efficiency </p>
@ -55,23 +55,23 @@ The operation efficiency and characteristics of the aforementioned methods are s
| Data preprocessing | / | Sorting $O(n \log n)$ | Building tree $O(n \log n)$ | Building hash table $O(n)$ |
| Data orderliness | Unordered | Ordered | Ordered | Unordered |
The choice of search algorithm also depends on the volume of data, search performance requirements, data query and update frequency, etc.
The choice of search algorithm also depends on the volume of data, search performance requirements, frequency of data queries and updates, etc.
**Linear search**
- Good versatility, no need for any data preprocessing operations. If we only need to query the data once, then the time for data preprocessing in the other three methods would be longer than the time for linear search.
- Good versatility, no need for any data preprocessing operations. If we only need to query the data once, then the time for data preprocessing in the other three methods would be longer than the time for a linear search.
- Suitable for small volumes of data, where time complexity has a smaller impact on efficiency.
- Suitable for scenarios with high data update frequency, because this method does not require any additional maintenance of the data.
- Suitable for scenarios with very frequent data updates, because this method does not require any additional maintenance of the data.
**Binary search**
- Suitable for large data volumes, with stable efficiency performance, the worst time complexity being $O(\log n)$.
- The data volume cannot be too large, because storing arrays requires contiguous memory space.
- Not suitable for scenarios with frequent additions and deletions, because maintaining an ordered array incurs high overhead.
- Suitable for larger data volumes, with stable performance and a worst-case time complexity of $O(\log n)$.
- However, the data volume cannot be too large, because storing arrays requires contiguous memory space.
- Not suitable for scenarios with frequent additions and deletions, because maintaining an ordered array incurs a lot of overhead.
**Hash search**
- Suitable for scenarios with high query performance requirements, with an average time complexity of $O(1)$.
- Suitable for scenarios where fast query performance is essential, with an average time complexity of $O(1)$.
- Not suitable for scenarios needing ordered data or range searches, because hash tables cannot maintain data orderliness.
- High dependency on hash functions and hash collision handling strategies, with significant performance degradation risks.
- Not suitable for overly large data volumes, because hash tables need extra space to minimize collisions and provide good query performance.
@ -80,5 +80,5 @@ The choice of search algorithm also depends on the volume of data, search perfor
- Suitable for massive data, because tree nodes are stored scattered in memory.
- Suitable for maintaining ordered data or range searches.
- In the continuous addition and deletion of nodes, the binary search tree may become skewed, degrading the time complexity to $O(n)$.
- With the continuous addition and deletion of nodes, the binary search tree may become skewed, degrading the time complexity to $O(n)$.
- If using AVL trees or red-black trees, operations can run stably at $O(\log n)$ efficiency, but the operation to maintain tree balance adds extra overhead.