Compare commits

5 commits

Author SHA1 Message Date
yanedie
d9305ba281
Merge d11b140ef4 into b6939da46c 2024-12-04 18:05:17 +08:00
K3v123
b6939da46c
translation: Update replace_linear_by_hashing.md (#1551)
* translation: Update replace_linear_by_hashing.md

refined some parts of it.

* Update replace_linear_by_hashing.md

* Update replace_linear_by_hashing.md

---------

Co-authored-by: Yudong Jin <krahets@163.com>
2024-12-04 18:05:03 +08:00
Yudong Jin
abf1f115bf
Bug fixes and improvements (#1581)
* A bug fixes

* Sync zh and zh-hant versions.

* Fix a question in chapter_array_and_linkedlist/summary.md

* Optimize a definition in what_is_dsa.md

* Fix the Contributing guidelines for Chinese-to-English.

* Add a q&a in chapter_array_and_linkedlist/summary.md

* Sync zh and zh-hant versions.

* Update .gitignore

* Sync zh and zh-hant versions.
2024-12-04 17:58:28 +08:00
Thomas
6348dbe18d
translation: binary search updates (#1569)
* translation: binary search updates

* fix minor gramma and expression issues
2024-12-04 17:48:48 +08:00
YuZou
ca774eefbf
Update time_complexity.md (#1578)
* Bug fixes and improvements (#1577)
 * correct the implement of exp_recur function and remove +1 operation from the function to simulate the cell division process

* Update time_complexity.rs

* Update time_complexity.md

---------

Co-authored-by: zouy26 <zouy26@chinaunicom.cn>
Co-authored-by: Yudong Jin <krahets@163.com>
2024-12-04 17:36:11 +08:00
14 changed files with 62 additions and 45 deletions

5
.gitignore vendored
View file

@ -1,7 +1,7 @@
# macOS
.DS_Store
# Editor
# editors
.vscode/
**/.idea
@ -12,6 +12,3 @@
/build
/site
/utils
# test script
test.sh

View file

@ -71,6 +71,16 @@
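On the other hand, the main cases where linked lists are necessary are binary trees and graphs. Stacks and queues are usually implemented with the `stack` and `queue` provided by the programming language rather than with linked lists.
**Q**: Does the list initialization `res = [0] * self.size()` cause every element of `res` to reference the same address?
**Q**: The operation `res = [[0]] * n` creates a 2D list. Is each `[0]` in it independent?
No. However, two-dimensional arrays do have this problem: for example, initializing a 2D list with `res = [[0]] * self.size()` references the same list `[0]` multiple times.
They are not independent. In this 2D list, all the `[0]` entries are actually references to the same object. If we modify one of them, all the corresponding elements change with it.
If we want each `[0]` in the 2D list to be independent, we can use `res = [[0] for _ in range(n)]`. This works because it initializes $n$ independent `[0]` list objects.
**Q**: The operation `res = [0] * n` creates a list. Is each integer 0 in it independent?
In this list, all the integers 0 are references to the same object. This is because Python uses a cache pool for small integers (usually -5 to 256) to maximize object reuse and thereby improve performance.
Although they point to the same object, we can still modify each element of the list independently, because Python integers are "immutable objects". When we modify an element, we actually switch it to reference another object rather than changing the original object itself.
However, when the list elements are "mutable objects" (such as lists, dictionaries, or class instances), modifying an element changes the object itself, and every element that references that object shows the same change.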
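The aliasing behavior described in these answers can be checked with a short Python sketch. The variable names `shared`, `independent`, and `flat` are chosen here for illustration and do not come from the book.

```python
n = 3

# Multiplying a list copies references, so every row is the same inner list.
shared = [[0]] * n
shared[0][0] = 1
print(shared)  # [[1], [1], [1]]

# A list comprehension builds n separate inner list objects.
independent = [[0] for _ in range(n)]
independent[0][0] = 1
print(independent)  # [[1], [0], [0]]

# Integers are immutable: assigning to one slot rebinds only that slot.
flat = [0] * n
flat[0] = 1
print(flat)  # [1, 0, 0]
```

The first case changes all rows because `shared[0]`, `shared[1]`, and `shared[2]` are one and the same list object.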

View file

@ -1120,7 +1120,7 @@ $$
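Biological "cell division" is a classic example of exponential growth: starting with $1$ cell, there are $2$ cells after one round of division, $4$ after two rounds, and so on, giving $2^n$ cells after $n$ rounds.
The figure below and the following code simulate the cell division process, with a time complexity of $O(2^n)$.
The figure below and the following code simulate the cell division process, with a time complexity of $O(2^n)$. Note that the input $n$ is the number of division rounds, and the return value `count` is the total number of divisions.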
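To make the meaning of `count` concrete, here is a minimal Python sketch of such a simulation. It is written for illustration and is not necessarily identical to the repository's `exponential` function; the variable name `cells` is an assumption.

```python
def exponential(n):
    """Simulate n rounds of cell division and return the total number of divisions."""
    count = 0
    cells = 1  # number of cells before the current round
    for _ in range(n):
        # Every existing cell divides once in this round.
        for _ in range(cells):
            count += 1
        cells *= 2
    # count = 1 + 2 + 4 + ... + 2^(n-1) = 2^n - 1
    return count


print(exponential(3))  # 7
```

The total number of divisions after $n$ rounds is $2^n - 1$, which is why the running time grows as $O(2^n)$.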
```src
[file]{time_complexity}-[class]{}-[func]{exponential}

View file

@ -5,7 +5,7 @@
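- The process of sorting a hand of playing cards is very similar to the insertion sort algorithm. Insertion sort is suitable for sorting small datasets.
- The steps of making change are essentially a greedy algorithm, where each step takes the choice that looks best at the moment.
- An algorithm is a set of instructions or steps that solves a specific problem within a finite amount of time, while a data structure is the way data is organized and stored in a computer.
- Data structures and algorithms are closely linked. Data structures are the foundation of algorithms, and algorithms are the stage on which data structures come into play.
- Data structures and algorithms are closely linked. Data structures are the foundation of algorithms, and algorithms breathe life into data structures.
- We can compare data structures and algorithms to assembling building blocks: the blocks represent data, their shapes and the way they connect represent data structures, and the assembly steps correspond to the algorithm.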
### Q & A

View file

@ -26,7 +26,7 @@
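As shown in the figure below, data structures and algorithms are highly related and tightly integrated, specifically in the following three respects.
- Data structures are the foundation of algorithms. Data structures provide algorithms with structured storage of data and with methods for operating on that data.
- Algorithms are the stage on which data structures come into play. A data structure by itself only stores data; only in combination with an algorithm can it solve a specific problem.
- Algorithms breathe life into data structures. A data structure by itself only stores data; only in combination with an algorithm can it solve a specific problem.
- An algorithm can usually be implemented on top of different data structures, but the execution efficiency may differ greatly, so choosing a suitable data structure is key.
![Relationship between data structures and algorithms](what_is_dsa.assets/relationship_between_data_structure_and_algorithm.png)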

View file

@ -32,14 +32,14 @@ That is, our contributors are computer scientists, engineers, and students from
> [!important]
> Before diving in, ensure you're comfortable with the GitHub pull request workflow and have read the "Translation standards" and "Pseudo-code for translation" below.
1. **Self-assignment**: Visit [GitHub projects](https://github.com/users/krahets/projects/2/views/4) to select an unclaimed task and mark it as `In Progress`.
2. **Translation**: We encourage preserving the original meaning while ensuring the translation is natural and fluent.
3. **Peer review**: Please carefully check your changes before submitting a Pull Request (PR). After approval by two reviewers, it will be merged into the project.
1. **Task assignment**: Self-assign a task in the Notion workspace.
2. **Translation**: Optimize the translation on your local PC, referring to the “Translation Pseudo-Code” section below for more details.
3. **Peer review**: Carefully review your changes before submitting a Pull Request (PR). The PR will be merged into the main branch after approval from two reviewers.
## Translation standards
> [!tip]
> The "Accuracy" and "Authenticity" are primarily handled by native Chinese speakers and native English speakers, respectively.
> **The "Accuracy" and "Authenticity" are primarily handled by native Chinese speakers and native English speakers, respectively.**
>
> In some instances, "Accuracy (consistency)" and "Authenticity" represent a trade-off, where optimizing one aspect could significantly affect the other. In such cases, please leave a comment in the pull request for discussion.

View file

@ -1,24 +1,24 @@
# Binary search
<u>Binary search</u> is an efficient search algorithm based on the divide-and-conquer strategy. It utilizes the orderliness of data, reducing the search range by half each round until the target element is found or the search interval is empty.
<u>Binary search</u> is an efficient search algorithm that uses a divide-and-conquer strategy. It takes advantage of the sorted order of elements in an array by reducing the search interval by half in each iteration, continuing until either the target element is found or the search interval becomes empty.
!!! question
Given an array `nums` of length $n$, with elements arranged in ascending order and non-repeating. Please find and return the index of element `target` in this array. If the array does not contain the element, return $-1$. An example is shown in the figure below.
Given an array `nums` of length $n$, where elements are arranged in ascending order without duplicates. Please find and return the index of element `target` in this array. If the array does not contain the element, return $-1$. An example is shown in the figure below.
![Binary search example data](binary_search.assets/binary_search_example.png)
As shown in the figure below, we first initialize pointers $i = 0$ and $j = n - 1$, pointing to the first and last elements of the array, representing the search interval $[0, n - 1]$. Please note that square brackets indicate a closed interval, which includes the boundary values themselves.
As shown in the figure below, we first initialize pointers $i = 0$ and $j = n - 1$, pointing to the first and last elements of the array, respectively. Together they represent the whole search interval $[0, n - 1]$. Please note that square brackets indicate a closed interval, which includes the boundary values themselves.
Next, perform the following two steps in a loop.
Then, the following two steps are performed in a loop.
1. Calculate the midpoint index $m = \lfloor {(i + j) / 2} \rfloor$, where $\lfloor \: \rfloor$ denotes the floor operation.
2. Compare the size of `nums[m]` and `target`, divided into the following three scenarios.
2. Compare `nums[m]` with `target` and handle one of the following three cases.
1. If `nums[m] < target`, it indicates that `target` is in the interval $[m + 1, j]$, thus set $i = m + 1$.
2. If `nums[m] > target`, it indicates that `target` is in the interval $[i, m - 1]$, thus set $j = m - 1$.
3. If `nums[m] = target`, it indicates that `target` is found, thus return index $m$.
If the array does not contain the target element, the search interval will eventually reduce to empty. In this case, return $-1$.
If the array does not contain the target element, the search interval will eventually shrink to empty, in which case $-1$ is returned.
=== "<1>"
![Binary search process](binary_search.assets/binary_search_step1.png)
@ -41,7 +41,7 @@ If the array does not contain the target element, the search interval will event
=== "<7>"
![binary_search_step7](binary_search.assets/binary_search_step7.png)
It's worth noting that since $i$ and $j$ are both of type `int`, **$i + j$ might exceed the range of `int` type**. To avoid large number overflow, we usually use the formula $m = \lfloor {i + (j - i) / 2} \rfloor$ to calculate the midpoint.
It's worth noting that as $i$ and $j$ are both of type `int`, **$i + j$ might exceed the range of `int` type**. To avoid large number overflow, we usually use the formula $m = \lfloor {i + (j - i) / 2} \rfloor$ to calculate the midpoint.
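Python integers do not overflow, so the following sketch only simulates 32-bit signed arithmetic to show why the second midpoint formula is safer. The helper `wrap_int32` is hypothetical and mimics two's complement wraparound as seen in Java; in C and C++, signed overflow is undefined behavior rather than guaranteed wraparound.

```python
INT_MAX = 2**31 - 1  # largest 32-bit signed integer


def wrap_int32(x):
    """Reduce x to the 32-bit two's complement range (simulated wraparound)."""
    return (x + 2**31) % 2**32 - 2**31


i, j = INT_MAX - 10, INT_MAX - 2

# Naive midpoint: the intermediate sum i + j leaves the int range and wraps.
naive = wrap_int32(i + j) // 2

# Safe midpoint: j - i is small, so no intermediate value leaves the int range.
safe = i + wrap_int32(j - i) // 2

print(naive < 0)              # True: the wrapped sum is negative
print(safe == (i + j) // 2)   # True: matches the exact midpoint
```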
The code is as follows:
@ -49,13 +49,13 @@ The code is as follows:
[file]{binary_search}-[class]{}-[func]{binary_search}
```
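The `[file]{binary_search}` placeholder above is resolved by the book's build; the diff does not show the code itself. As a rough illustration of the closed-interval procedure described in the steps, a Python sketch might look like this (not necessarily the repository's exact code):

```python
def binary_search(nums, target):
    """Binary search on the closed interval [i, j]; returns an index or -1."""
    i, j = 0, len(nums) - 1
    # The interval [i, j] is empty once i > j.
    while i <= j:
        m = i + (j - i) // 2  # midpoint, written to avoid overflow in fixed-width languages
        if nums[m] < target:
            i = m + 1  # target is in [m + 1, j]
        elif nums[m] > target:
            j = m - 1  # target is in [i, m - 1]
        else:
            return m   # target found
    return -1


nums = [1, 3, 6, 8, 12, 15, 23, 26, 31, 35]
print(binary_search(nums, 6))  # 2
print(binary_search(nums, 7))  # -1
```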
**Time complexity is $O(\log n)$** : In the binary loop, the interval reduces by half each round, hence the number of iterations is $\log_2 n$.
**Time complexity is $O(\log n)$** : In the binary loop, the interval decreases by half each round, hence the number of iterations is $\log_2 n$.
**Space complexity is $O(1)$** : Pointers $i$ and $j$ use constant size space.
**Space complexity is $O(1)$** : Pointers $i$ and $j$ occupy a constant amount of space.
## Interval representation methods
Besides the aforementioned closed interval, a common interval representation is the "left-closed right-open" interval, defined as $[0, n)$, where the left boundary includes itself, and the right boundary does not include itself. In this representation, the interval $[i, j)$ is empty when $i = j$.
Besides the above closed interval, another common interval representation is the "left-closed right-open" interval, defined as $[0, n)$, where the left boundary includes itself, and the right boundary does not. In this representation, the interval $[i, j)$ is empty when $i = j$.
We can implement a binary search algorithm with the same functionality based on this representation:
@ -63,9 +63,9 @@ We can implement a binary search algorithm with the same functionality based on
[file]{binary_search}-[class]{}-[func]{binary_search_lcro}
```
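For comparison with the closed-interval version, here is a Python sketch of the "left-closed right-open" variant. It is an illustration under the same assumptions (ascending order, no duplicates), not necessarily the repository's `binary_search_lcro`.

```python
def binary_search_lcro(nums, target):
    """Binary search on the left-closed right-open interval [i, j)."""
    i, j = 0, len(nums)
    # The interval [i, j) is empty once i == j.
    while i < j:
        m = i + (j - i) // 2
        if nums[m] < target:
            i = m + 1  # target is in [m + 1, j)
        elif nums[m] > target:
            j = m      # target is in [i, m); the right bound is exclusive
        else:
            return m
    return -1


nums = [1, 3, 6, 8, 12, 15, 23, 26, 31, 35]
print(binary_search_lcro(nums, 6))  # 2
print(binary_search_lcro(nums, 7))  # -1
```

The initialization (`j = len(nums)`), the loop condition (`i < j`), and the right-bound update (`j = m`) are the three places where this version differs from the closed-interval one.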
As shown in the figure below, in the two types of interval representations, the initialization of the binary search algorithm, the loop condition, and the narrowing interval operation are different.
As shown in the figure below, under the two types of interval representations, the initialization, loop condition, and narrowing interval operation of the binary search algorithm differ.
Since both boundaries in the "closed interval" representation are defined as closed, the operations to narrow the interval through pointers $i$ and $j$ are also symmetrical. This makes it less prone to errors, **therefore, it is generally recommended to use the "closed interval" approach**.
Since both boundaries in the "closed interval" representation are inclusive, the operations to narrow the interval through pointers $i$ and $j$ are also symmetrical. This makes it less prone to errors, **therefore, it is generally recommended to use the "closed interval" approach**.
![Two types of interval definitions](binary_search.assets/binary_search_ranges.png)
@ -73,11 +73,11 @@ Since both boundaries in the "closed interval" representation are defined as clo
Binary search performs well in both time and space aspects.
- Binary search is time-efficient. With large data volumes, the logarithmic time complexity has a significant advantage. For instance, when the data size $n = 2^{20}$, linear search requires $2^{20} = 1048576$ iterations, while binary search only requires $\log_2 2^{20} = 20$ iterations.
- Binary search does not require extra space. Compared to search algorithms that rely on additional space (like hash search), binary search is more space-efficient.
- Binary search is time-efficient. With large datasets, the logarithmic time complexity offers a major advantage. For instance, given a dataset of size $n = 2^{20}$, linear search requires $2^{20} = 1048576$ iterations, while binary search only needs $\log_2 2^{20} = 20$ iterations.
- Binary search does not need extra space. Compared to search algorithms that rely on additional space (like hash search), binary search is more space-efficient.
However, binary search is not suitable for all situations, mainly for the following reasons.
However, binary search may not be suitable for all scenarios due to the following concerns.
- Binary search is only applicable to ordered data. If the input data is unordered, it is not worth sorting it just to use binary search, as sorting algorithms typically have a time complexity of $O(n \log n)$, which is higher than both linear and binary search. For scenarios with frequent element insertion to maintain array order, inserting elements into specific positions has a time complexity of $O(n)$, which is also quite costly.
- Binary search is only applicable to arrays. Binary search requires non-continuous (jumping) element access, which is inefficient in linked lists, thus not suitable for use in linked lists or data structures based on linked lists.
- With small data volumes, linear search performs better. In linear search, each round only requires 1 decision operation; whereas in binary search, it involves 1 addition, 1 division, 1 to 3 decision operations, 1 addition (subtraction), totaling 4 to 6 operations; therefore, when data volume $n$ is small, linear search can be faster than binary search.
- Binary search can only be applied to sorted data. Unsorted data must be sorted before applying binary search, which may not be worthwhile because sorting algorithms typically have a time complexity of $O(n \log n)$. That cost is higher than that of linear search, let alone binary search itself. For scenarios with frequent insertions, keeping the array in order is costly, because inserting a new element at a specific position has a time complexity of $O(n)$.
- Binary search can only be applied to arrays. Binary search requires non-continuous (jumping) element access, which is inefficient in linked lists. As a result, linked lists and data structures based on them are not suitable for this algorithm.
- Linear search performs better on small datasets. In linear search, only 1 decision operation is required per iteration, whereas binary search involves 1 addition, 1 division, 1 to 3 decision operations, and 1 addition (subtraction), totaling 4 to 6 operations. Therefore, when the data size $n$ is small, linear search can be faster than binary search.

View file

@ -1,6 +1,6 @@
# Hash optimization strategies
In algorithm problems, **we often reduce the time complexity of algorithms by replacing linear search with hash search**. Let's use an algorithm problem to deepen understanding.
In algorithm problems, **we often reduce the time complexity of an algorithm by replacing a linear search with a hash-based search**. Let's use an algorithm problem to deepen our understanding.
!!! question
@ -8,7 +8,7 @@ In algorithm problems, **we often reduce the time complexity of algorithms by re
## Linear search: trading time for space
Consider traversing all possible combinations directly. As shown in the figure below, we initiate a two-layer loop, and in each round, we determine whether the sum of the two integers equals `target`. If so, we return their indices.
Consider traversing through all possible combinations directly. As shown in the figure below, we initiate a nested loop, and in each iteration, we determine whether the sum of the two integers equals `target`. If so, we return their indices.
![Linear search solution for two-sum problem](replace_linear_by_hashing.assets/two_sum_brute_force.png)
@ -18,11 +18,11 @@ The code is shown below:
[file]{two_sum}-[class]{}-[func]{two_sum_brute_force}
```
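The diff only shows the `[func]{two_sum_brute_force}` placeholder, so here is a Python sketch of the nested-loop approach described above. It assumes the usual two-sum setup (an integer array `nums` and a `target`) and returns an empty list when no pair exists; that fallback is an assumption made for this example.

```python
def two_sum_brute_force(nums, target):
    """Check every pair (i, j) with i < j; O(n^2) time, O(1) extra space."""
    n = len(nums)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []  # no pair found (assumed fallback)


print(two_sum_brute_force([2, 7, 11, 15], 13))  # [0, 2]
```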
This method has a time complexity of $O(n^2)$ and a space complexity of $O(1)$, which is very time-consuming with large data volumes.
This method has a time complexity of $O(n^2)$ and a space complexity of $O(1)$, which can be very time-consuming with large data volumes.
## Hash search: trading space for time
Consider using a hash table, with key-value pairs being the array elements and their indices, respectively. Loop through the array, performing the steps shown in the figure below each round.
Consider using a hash table, where the key-value pairs are the array elements and their indices, respectively. Loop through the array, performing the steps shown in the figure below during each iteration.
1. Check if the number `target - nums[i]` is in the hash table. If so, directly return the indices of these two elements.
2. Add the key-value pair `nums[i]` and index `i` to the hash table.
@ -42,6 +42,6 @@ The implementation code is shown below, requiring only a single loop:
[file]{two_sum}-[class]{}-[func]{two_sum_hash_table}
```
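As an illustration of the two steps listed earlier, a Python sketch of the hash-table approach might look like the following. The dictionary name `seen` and the empty-list fallback are assumptions for this example, not taken from the repository.

```python
def two_sum_hash_table(nums, target):
    """Single pass with a dict mapping value -> index; O(n) time, O(n) space."""
    seen = {}
    for i, x in enumerate(nums):
        # Step 1: check whether the complement has already been stored.
        if target - x in seen:
            return [seen[target - x], i]
        # Step 2: record the current element and its index.
        seen[x] = i
    return []  # no pair found (assumed fallback)


print(two_sum_hash_table([2, 7, 11, 15], 13))  # [0, 2]
```

Checking before inserting means an element is never paired with itself.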
This method reduces the time complexity from $O(n^2)$ to $O(n)$ by using hash search, greatly improving the running efficiency.
This method reduces the time complexity from $O(n^2)$ to $O(n)$ by using hash search, significantly enhancing runtime efficiency.
As it requires maintaining an additional hash table, the space complexity is $O(n)$. **Nevertheless, this method has a more balanced time-space efficiency overall, making it the optimal solution for this problem**.

View file

@ -414,7 +414,7 @@
<!-- contributors -->
<div style="margin: 2em auto;">
<h3>Contributors</h3>
<p>This book has been optimized by the efforts of over 180 contributors. We sincerely thank them for their invaluable time and contributions!</p>
<p>This book has been refined by the efforts of over 180 contributors. We sincerely thank them for their invaluable time and contributions!</p>
<a href="https://github.com/krahets/hello-algo/graphs/contributors">
<img src="https://contrib.rocks/image?repo=krahets/hello-algo&max=300&columns=16" alt="Contributors" style="width: 100%; max-width: 38.5em;">
</a>

View file

@ -115,9 +115,9 @@ int main() {
int i = 1;
int l = abt.left(i), r = abt.right(i), p = abt.parent(i);
cout << "\nThe current node's index is " << i << ", value is " << abt.val(i) << "\n";
cout << "Its left child's index is " << l << ", value is " << (l != INT_MAX ? to_string(abt.val(l)) : "nullptr") << "\n";
cout << "Its right child's index is " << r << ", value is " << (r != INT_MAX ? to_string(abt.val(r)) : "nullptr") << "\n";
cout << "Its parent's index is " << p << ", value is " << (p != INT_MAX ? to_string(abt.val(p)) : "nullptr") << "\n";
cout << "Its left child's index is " << l << ", value is " << (abt.val(l) != INT_MAX ? to_string(abt.val(l)) : "nullptr") << "\n";
cout << "Its right child's index is " << r << ", value is " << (abt.val(r) != INT_MAX ? to_string(abt.val(r)) : "nullptr") << "\n";
cout << "Its parent's index is " << p << ", value is " << (abt.val(p) != INT_MAX ? to_string(abt.val(p)) : "nullptr") << "\n";
// Traverse the tree
vector<int> res = abt.levelOrder();

View file

@ -71,6 +71,16 @@
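On the other hand, the main cases where linked lists are necessary are binary trees and graphs. Stacks and queues are usually implemented with the `stack` and `queue` provided by the programming language rather than with linked lists.
**Q**: Does the list initialization `res = [0] * self.size()` cause every element of `res` to reference the same address?
**Q**: The operation `res = [[0]] * n` creates a 2D list. Is each `[0]` in it independent?
No. However, two-dimensional arrays do have this problem: for example, initializing a 2D list with `res = [[0]] * self.size()` references the same list `[0]` multiple times.
They are not independent. In this 2D list, all the `[0]` entries are actually references to the same object. If we modify one of them, all the corresponding elements change with it.
If we want each `[0]` in the 2D list to be independent, we can use `res = [[0] for _ in range(n)]`. This works because it initializes $n$ independent `[0]` list objects.
**Q**: The operation `res = [0] * n` creates a list. Is each integer 0 in it independent?
In this list, all the integers 0 are references to the same object. This is because Python uses a cache pool for small integers (usually -5 to 256) to maximize object reuse and thereby improve performance.
Although they point to the same object, we can still modify each element of the list independently, because Python integers are "immutable objects". When we modify an element, we actually switch it to reference another object rather than changing the original object itself.
However, when the list elements are "mutable objects" (such as lists, dictionaries, or class instances), modifying an element changes the object itself, and every element that references that object shows the same change.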

View file

@ -1120,7 +1120,7 @@ $$
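Biological "cell division" is a classic example of exponential growth: starting with $1$ cell, there are $2$ cells after one round of division, $4$ after two rounds, and so on, giving $2^n$ cells after $n$ rounds.
The figure below and the following code simulate the cell division process, with a time complexity of $O(2^n)$.
The figure below and the following code simulate the cell division process, with a time complexity of $O(2^n)$. Note that the input $n$ is the number of division rounds, and the return value `count` is the total number of divisions.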
```src
[file]{time_complexity}-[class]{}-[func]{exponential}

View file

@ -5,7 +5,7 @@
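- The process of sorting a hand of playing cards is very similar to the insertion sort algorithm. Insertion sort is suitable for sorting small datasets.
- The steps of making change are essentially a greedy algorithm, where each step takes the choice that looks best at the moment.
- An algorithm is a set of instructions or steps that solves a specific problem within a finite amount of time, while a data structure is the way data is organized and stored in a computer.
- Data structures and algorithms are closely linked. Data structures are the foundation of algorithms, and algorithms are the stage on which data structures come into play.
- Data structures and algorithms are closely linked. Data structures are the foundation of algorithms, and algorithms breathe life into data structures.
- We can compare data structures and algorithms to assembling building blocks: the blocks represent data, their shapes and the way they connect represent data structures, and the assembly steps correspond to the algorithm.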
### Q & A

View file

@ -26,7 +26,7 @@
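As shown in the figure below, data structures and algorithms are highly related and tightly integrated, specifically in the following three respects.
- Data structures are the foundation of algorithms. Data structures provide algorithms with structured storage of data and with methods for operating on that data.
- Algorithms are the stage on which data structures come into play. A data structure by itself only stores data; only in combination with an algorithm can it solve a specific problem.
- Algorithms breathe life into data structures. A data structure by itself only stores data; only in combination with an algorithm can it solve a specific problem.
- An algorithm can usually be implemented on top of different data structures, but the execution efficiency may differ greatly, so choosing a suitable data structure is key.
![Relationship between data structures and algorithms](what_is_dsa.assets/relationship_between_data_structure_and_algorithm.png)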