Compare commits

...

5 commits

Author SHA1 Message Date
RafaelCaso
f9ccf65412
Merge 5ec15c4a3e into b6939da46c 2024-12-04 18:07:38 +08:00
K3v123
b6939da46c
translation: Update replace_linear_by_hashing.md (#1551)
* translation: Update replace_linear_by_hashing.md

refined some parts of it.

* Update replace_linear_by_hashing.md

* Update replace_linear_by_hashing.md

---------

Co-authored-by: Yudong Jin <krahets@163.com>
2024-12-04 18:05:03 +08:00
Yudong Jin
abf1f115bf
Bug fixes and improvements (#1581)
* A bug fixes

* Sync zh and zh-hant versions.

* Fix a question in chapter_array_and_linkedlist/summary.md

* Optimize a definition in what_is_dsa.md

* Fix the Contributing guidelines for Chinese-to-English.

* Add a q&a in chapter_array_and_linkedlist/summary.md

* Sync zh and zh-hant versions.

* Update .gitignore

* Sync zh and zh-hant versions.
2024-12-04 17:58:28 +08:00
RafaelCaso
5ec15c4a3e
Update searching_algorithm_revisited.md
fix typo in original commit
2024-11-11 10:35:50 -05:00
RafaelCaso
a430cc2607
Update searching_algorithm_revisited.md 2024-11-11 10:00:38 -05:00
13 changed files with 64 additions and 47 deletions

5
.gitignore vendored

@ -1,7 +1,7 @@
# macOS
.DS_Store
# Editor
# editors
.vscode/
**/.idea
@ -12,6 +12,3 @@
/build
/site
/utils
# test script
test.sh


@ -71,6 +71,16 @@
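On the other hand, the cases where linked lists are truly necessary are mainly binary trees and graphs. Stacks and queues typically use the `stack` and `queue` provided by the programming language rather than linked lists.
**Q**: Does the list initialization `res = [0] * self.size()` cause every element of `res` to reference the same address?
**Q**: The operation `res = [[0]] * n` creates a two-dimensional list. Is each `[0]` in it independent?
No. However, two-dimensional arrays do have this problem: for example, initializing a two-dimensional list with `res = [[0]] * self.size()` references the same list `[0]` multiple times.
No, they are not independent. In this two-dimensional list, every `[0]` is actually a reference to the same object. If we modify one of the elements, all the corresponding elements change with it.
If we want each `[0]` in the two-dimensional list to be independent, we can use `res = [[0] for _ in range(n)]`. This works because it initializes $n$ independent `[0]` list objects.
**Q**: The operation `res = [0] * n` creates a list. Is each integer 0 in it independent?
In this list, every integer 0 is a reference to the same object. This is because Python uses a cache pool for small integers (typically -5 to 256) to maximize object reuse and thereby improve performance.
Although they point to the same object, we can still modify each element of the list independently, because Python integers are "immutable objects". When we modify an element, we are actually switching to a reference to another object rather than changing the original object itself.
However, when the list elements are "mutable objects" (such as lists, dictionaries, or class instances), modifying an element changes that object itself directly, and every element that references the object shows the same change.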
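The behaviour described in this Q&A can be checked directly in Python. The following is a minimal sketch; the variable names `shared`, `independent`, and `flat` are illustrative and do not come from the book.

```python
n = 3

# List repetition copies references, so every row is the same inner list object.
shared = [[0]] * n
shared[0][0] = 1
print(shared)  # [[1], [1], [1]]

# A comprehension evaluates [0] once per iteration, creating n separate lists.
independent = [[0] for _ in range(n)]
independent[0][0] = 1
print(independent)  # [[1], [0], [0]]

# Integers are immutable: assignment rebinds one slot and leaves the others alone.
flat = [0] * n
flat[0] = 1
print(flat)  # [1, 0, 0]
```

Comparing rows with `is` (for example `shared[0] is shared[1]`) is a quick way to tell whether two elements refer to the same object.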


@ -5,7 +5,7 @@
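- The process of organizing playing cards closely resembles the insertion sort algorithm. Insertion sort is well suited to sorting small data sets.
- The steps of making change are essentially a greedy algorithm, in which each step takes the choice that looks best at the moment.
- An algorithm is a set of instructions or operational steps that solves a specific problem within finite time, while a data structure is the way a computer organizes and stores data.
- Data structures and algorithms are closely connected. Data structures are the foundation of algorithms, and algorithms are the stage on which data structures play their role.
- Data structures and algorithms are closely connected. Data structures are the foundation of algorithms, and algorithms breathe life into data structures.
- We can liken data structures and algorithms to assembling building blocks: the blocks represent data, the shapes and connection methods of the blocks represent data structures, and the steps of assembling them correspond to algorithms.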
### Q & A


@ -26,7 +26,7 @@
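As shown in the figure below, data structures and algorithms are highly related and tightly integrated, specifically in the following three respects.
- Data structures are the foundation of algorithms. Data structures provide algorithms with structurally stored data and with methods for operating on that data.
- Algorithms are the stage on which data structures play their role. A data structure by itself only stores data; it solves specific problems only when combined with algorithms.
- Algorithms breathe life into data structures. A data structure by itself only stores data; it solves specific problems only when combined with algorithms.
- An algorithm can usually be implemented on top of different data structures, but the execution efficiency may differ greatly, so choosing an appropriate data structure is key.
![Relationship between data structures and algorithms](what_is_dsa.assets/relationship_between_data_structure_and_algorithm.png)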


@ -32,14 +32,14 @@ That is, our contributors are computer scientists, engineers, and students from
> [!important]
> Before diving in, ensure you're comfortable with the GitHub pull request workflow and have read the "Translation standards" and "Pseudo-code for translation" below.
1. **Self-assignment**: Visit [GitHub projects](https://github.com/users/krahets/projects/2/views/4) to select an unclaimed task and mark it as `In Progress`.
2. **Translation**: We encourage preserving the original meaning while ensuring the translation is natural and fluent.
3. **Peer review**: Please carefully check your changes before submitting a Pull Request (PR). After approval by two reviewers, it will be merged into the project.
1. **Task assignment**: Self-assign a task in the Notion workspace.
2. **Translation**: Optimize the translation on your local PC, referring to the “Translation Pseudo-Code” section below for more details.
3. **Peer review**: Carefully review your changes before submitting a Pull Request (PR). The PR will be merged into the main branch after approval from two reviewers.
## Translation standards
> [!tip]
> The "Accuracy" and "Authenticity" are primarily handled by native Chinese speakers and native English speakers, respectively.
> **The "Accuracy" and "Authenticity" are primarily handled by native Chinese speakers and native English speakers, respectively.**
>
> In some instances, "Accuracy (consistency)" and "Authenticity" represent a trade-off, where optimizing one aspect could significantly affect the other. In such cases, please leave a comment in the pull request for discussion.


@ -1,6 +1,6 @@
# Hash optimization strategies
In algorithm problems, **we often reduce the time complexity of algorithms by replacing linear search with hash search**. Let's use an algorithm problem to deepen understanding.
In algorithm problems, **we often reduce the time complexity of an algorithm by replacing a linear search with a hash-based search**. Let's use an algorithm problem to deepen our understanding.
!!! question
@ -8,7 +8,7 @@ In algorithm problems, **we often reduce the time complexity of algorithms by re
## Linear search: trading time for space
Consider traversing all possible combinations directly. As shown in the figure below, we initiate a two-layer loop, and in each round, we determine whether the sum of the two integers equals `target`. If so, we return their indices.
Consider traversing through all possible combinations directly. As shown in the figure below, we initiate a nested loop, and in each iteration, we determine whether the sum of the two integers equals `target`. If so, we return their indices.
![Linear search solution for two-sum problem](replace_linear_by_hashing.assets/two_sum_brute_force.png)
@ -18,11 +18,11 @@ The code is shown below:
[file]{two_sum}-[class]{}-[func]{two_sum_brute_force}
```
This method has a time complexity of $O(n^2)$ and a space complexity of $O(1)$, which is very time-consuming with large data volumes.
This method has a time complexity of $O(n^2)$ and a space complexity of $O(1)$, which can be very time-consuming with large data volumes.
## Hash search: trading space for time
Consider using a hash table, with key-value pairs being the array elements and their indices, respectively. Loop through the array, performing the steps shown in the figure below each round.
Consider using a hash table, where the key-value pairs are the array elements and their indices, respectively. Loop through the array, performing the steps shown in the figure below during each iteration.
1. Check if the number `target - nums[i]` is in the hash table. If so, directly return the indices of these two elements.
2. Add the key-value pair `nums[i]` and index `i` to the hash table.
@ -42,6 +42,6 @@ The implementation code is shown below, requiring only a single loop:
[file]{two_sum}-[class]{}-[func]{two_sum_hash_table}
```
This method reduces the time complexity from $O(n^2)$ to $O(n)$ by using hash search, greatly improving the running efficiency.
This method reduces the time complexity from $O(n^2)$ to $O(n)$ by using hash search, significantly enhancing runtime efficiency.
As it requires maintaining an additional hash table, the space complexity is $O(n)$. **Nevertheless, this method has a more balanced time-space efficiency overall, making it the optimal solution for this problem**.
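The two approaches compared in this file can be sketched in Python as follows. The function names mirror the `two_sum_brute_force` and `two_sum_hash_table` placeholders referenced above, but the bodies here are an illustrative sketch rather than the book's actual source, and the sample input is made up.

```python
def two_sum_brute_force(nums: list[int], target: int) -> list[int]:
    """Try every pair: O(n^2) time, O(1) extra space."""
    n = len(nums)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []


def two_sum_hash_table(nums: list[int], target: int) -> list[int]:
    """Single pass with a hash table: O(n) time, O(n) extra space."""
    seen: dict[int, int] = {}  # maps element value -> index
    for i, num in enumerate(nums):
        complement = target - num
        # Step 1: check whether the complement was seen earlier.
        if complement in seen:
            return [seen[complement], i]
        # Step 2: record the current element and its index.
        seen[num] = i
    return []


nums = [2, 7, 11, 15]
print(two_sum_brute_force(nums, 13))  # [0, 2]
print(two_sum_hash_table(nums, 13))   # [0, 2]
```

Checking the table before inserting the current element prevents an element from being paired with itself.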


@ -1,28 +1,28 @@
# Search algorithms revisited
<u>Searching algorithms (searching algorithm)</u> are used to search for one or several elements that meet specific criteria in data structures such as arrays, linked lists, trees, or graphs.
<u>Searching algorithms (search algorithms)</u> are used to retrieve one or more elements that meet specific criteria within data structures such as arrays, linked lists, trees, or graphs.
Searching algorithms can be divided into the following two categories based on their implementation approaches.
Searching algorithms can be divided into the following two categories based on their approach.
- **Locating the target element by traversing the data structure**, such as traversals of arrays, linked lists, trees, and graphs, etc.
- **Using the organizational structure of the data or the prior information contained in the data to achieve efficient element search**, such as binary search, hash search, and binary search tree search, etc.
- **Using the organizational structure of the data or existing data to achieve efficient element searches**, such as binary search, hash search, binary search tree search, etc.
It is not difficult to notice that these topics have been introduced in previous chapters, so searching algorithms are not unfamiliar to us. In this section, we will revisit searching algorithms from a more systematic perspective.
These topics were introduced in previous chapters, so they are not unfamiliar to us. In this section, we will revisit searching algorithms from a more systematic perspective.
## Brute-force search
Brute-force search locates the target element by traversing every element of the data structure.
A brute-force search locates the target element by traversing every element of the data structure.
- "Linear search" is suitable for linear data structures such as arrays and linked lists. It starts from one end of the data structure, accesses each element one by one, until the target element is found or the other end is reached without finding the target element.
- "Breadth-first search" and "Depth-first search" are two traversal strategies for graphs and trees. Breadth-first search starts from the initial node and searches layer by layer, accessing nodes from near to far. Depth-first search starts from the initial node, follows a path until the end, then backtracks and tries other paths until the entire data structure is traversed.
- "Linear search" is suitable for linear data structures such as arrays and linked lists. It starts from one end of the data structure and accesses each element one by one until the target element is found or the other end is reached without finding the target element.
- "Breadth-first search" and "Depth-first search" are two traversal strategies for graphs and trees. Breadth-first search starts from the initial node and searches layer by layer (left to right), accessing nodes from near to far. Depth-first search starts from the initial node, follows a path until the end (top to bottom), then backtracks and tries other paths until the entire data structure is traversed.
The advantage of brute-force search is its simplicity and versatility, **no need for data preprocessing and the help of additional data structures**.
The advantage of brute-force search is its simplicity and versatility: **it requires no data preprocessing and no additional data structures**.
However, **the time complexity of this type of algorithm is $O(n)$**, where $n$ is the number of elements, so the performance is poor in cases of large data volumes.
However, **the time complexity of this type of algorithm is $O(n)$**, where $n$ is the number of elements, so the performance is poor with large data sets.
## Adaptive search
Adaptive search uses the unique properties of data (such as order) to optimize the search process, thereby locating the target element more efficiently.
An adaptive search uses the unique properties of data (such as order) to optimize the search process, thereby locating the target element more efficiently.
- "Binary search" uses the orderliness of data to achieve efficient searching, only suitable for arrays.
- "Hash search" uses a hash table to establish a key-value mapping between search data and target data, thus implementing the query operation.
@ -30,7 +30,7 @@ Adaptive search uses the unique properties of data (such as order) to optimize t
The advantage of these algorithms is high efficiency, **with time complexities reaching $O(\log n)$ or even $O(1)$**.
However, **using these algorithms often requires data preprocessing**. For example, binary search requires sorting the array in advance, and hash search and tree search both require the help of additional data structures, maintaining these structures also requires extra time and space overhead.
However, **using these algorithms often requires data preprocessing**. For example, binary search requires sorting the array in advance, and hash search and tree search both require the help of additional data structures. Maintaining these structures also requires more overhead in terms of time and space.
!!! tip
@ -38,11 +38,11 @@ However, **using these algorithms often requires data preprocessing**. For examp
## Choosing a search method
Given a set of data of size $n$, we can use linear search, binary search, tree search, hash search, and other methods to search for the target element from it. The working principles of these methods are shown in the figure below.
Given a set of data of size $n$, we can use a linear search, binary search, tree search, hash search, or other methods to retrieve the target element. The working principles of these methods are shown in the figure below.
![Various search strategies](searching_algorithm_revisited.assets/searching_algorithms.png)
The operation efficiency and characteristics of the aforementioned methods are shown in the following table.
The characteristics and operational efficiency of the aforementioned methods are shown in the following table.
<p align="center"> Table <id> &nbsp; Comparison of search algorithm efficiency </p>
@ -55,23 +55,23 @@ The operation efficiency and characteristics of the aforementioned methods are s
| Data preprocessing | / | Sorting $O(n \log n)$ | Building tree $O(n \log n)$ | Building hash table $O(n)$ |
| Data orderliness | Unordered | Ordered | Ordered | Unordered |
The choice of search algorithm also depends on the volume of data, search performance requirements, data query and update frequency, etc.
The choice of search algorithm also depends on the volume of data, search performance requirements, frequency of data queries and updates, etc.
**Linear search**
- Good versatility, no need for any data preprocessing operations. If we only need to query the data once, then the time for data preprocessing in the other three methods would be longer than the time for linear search.
- Good versatility, no need for any data preprocessing operations. If we only need to query the data once, then the time for data preprocessing in the other three methods would be longer than the time for a linear search.
- Suitable for small volumes of data, where time complexity has a smaller impact on efficiency.
- Suitable for scenarios with high data update frequency, because this method does not require any additional maintenance of the data.
- Suitable for scenarios with very frequent data updates, because this method does not require any additional maintenance of the data.
**Binary search**
- Suitable for large data volumes, with stable efficiency performance, the worst time complexity being $O(\log n)$.
- The data volume cannot be too large, because storing arrays requires contiguous memory space.
- Not suitable for scenarios with frequent additions and deletions, because maintaining an ordered array incurs high overhead.
- Suitable for larger data volumes, with stable performance and a worst-case time complexity of $O(\log n)$.
- However, the data volume cannot be too large, because storing arrays requires contiguous memory space.
- Not suitable for scenarios with frequent additions and deletions, because maintaining an ordered array incurs a lot of overhead.
**Hash search**
- Suitable for scenarios with high query performance requirements, with an average time complexity of $O(1)$.
- Suitable for scenarios where fast query performance is essential, with an average time complexity of $O(1)$.
- Not suitable for scenarios needing ordered data or range searches, because hash tables cannot maintain data orderliness.
- High dependency on hash functions and hash collision handling strategies, with significant performance degradation risks.
- Not suitable for overly large data volumes, because hash tables need extra space to minimize collisions and provide good query performance.
@ -80,5 +80,5 @@ The choice of search algorithm also depends on the volume of data, search perfor
- Suitable for massive data, because tree nodes are stored scattered in memory.
- Suitable for maintaining ordered data or range searches.
- In the continuous addition and deletion of nodes, the binary search tree may become skewed, degrading the time complexity to $O(n)$.
- With the continuous addition and deletion of nodes, the binary search tree may become skewed, degrading the time complexity to $O(n)$.
- If using AVL trees or red-black trees, operations can run stably at $O(\log n)$ efficiency, but the operation to maintain tree balance adds extra overhead.
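The trade-offs listed in this file can be illustrated with a short Python sketch. It covers linear, binary, and hash search only; tree search is omitted because the Python standard library has no binary search tree. The helper names (`linear_search`, `binary_search`, `index_of`) and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left


def linear_search(nums: list[int], target: int) -> int:
    """No preprocessing; O(n) per query."""
    for i, num in enumerate(nums):
        if num == target:
            return i
    return -1


def binary_search(sorted_nums: list[int], target: int) -> int:
    """Requires sorted input; O(log n) per query."""
    i = bisect_left(sorted_nums, target)
    if i < len(sorted_nums) and sorted_nums[i] == target:
        return i
    return -1


data = [31, 4, 15, 9, 26]

# Linear search works on the raw, unordered data.
print(linear_search(data, 15))  # 2

# Binary search needs an O(n log n) sort first.
sorted_data = sorted(data)  # [4, 9, 15, 26, 31]
print(binary_search(sorted_data, 15))  # 2

# Hash search needs an O(n) table build, then O(1) average lookups.
index_of = {num: i for i, num in enumerate(data)}
print(index_of.get(15, -1))  # 2
print(index_of.get(7, -1))   # -1
```

For a single query the preprocessing dominates, which is why linear search is often the reasonable choice when the data is queried only once.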


@ -414,7 +414,7 @@
<!-- contributors -->
<div style="margin: 2em auto;">
<h3>Contributors</h3>
<p>This book has been optimized by the efforts of over 180 contributors. We sincerely thank them for their invaluable time and contributions!</p>
<p>This book has been refined by the efforts of over 180 contributors. We sincerely thank them for their invaluable time and contributions!</p>
<a href="https://github.com/krahets/hello-algo/graphs/contributors">
<img src="https://contrib.rocks/image?repo=krahets/hello-algo&max=300&columns=16" alt="Contributors" style="width: 100%; max-width: 38.5em;">
</a>


@ -115,9 +115,9 @@ int main() {
int i = 1;
int l = abt.left(i), r = abt.right(i), p = abt.parent(i);
cout << "\n當前節點的索引為 " << i << ",值為 " << abt.val(i) << "\n";
cout << "其左子節點的索引為 " << l << ",值為 " << (l != INT_MAX ? to_string(abt.val(l)) : "nullptr") << "\n";
cout << "其右子節點的索引為 " << r << ",值為 " << (r != INT_MAX ? to_string(abt.val(r)) : "nullptr") << "\n";
cout << "其父節點的索引為 " << p << ",值為 " << (p != INT_MAX ? to_string(abt.val(p)) : "nullptr") << "\n";
cout << "其左子節點的索引為 " << l << ",值為 " << (abt.val(l) != INT_MAX ? to_string(abt.val(l)) : "nullptr") << "\n";
cout << "其右子節點的索引為 " << r << ",值為 " << (abt.val(r) != INT_MAX ? to_string(abt.val(r)) : "nullptr") << "\n";
cout << "其父節點的索引為 " << p << ",值為 " << (abt.val(p) != INT_MAX ? to_string(abt.val(p)) : "nullptr") << "\n";
// 走訪樹
vector<int> res = abt.levelOrder();


@ -71,6 +71,16 @@
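On the other hand, the cases where linked lists are truly necessary are mainly binary trees and graphs. Stacks and queues typically use the `stack` and `queue` provided by the programming language rather than linked lists.
**Q**: Does the list initialization `res = [0] * self.size()` cause every element of `res` to reference the same address?
**Q**: The operation `res = [[0]] * n` creates a two-dimensional list. Is each `[0]` in it independent?
No. However, two-dimensional arrays do have this problem: for example, initializing a two-dimensional list with `res = [[0]] * self.size()` references the same list `[0]` multiple times.
No, they are not independent. In this two-dimensional list, every `[0]` is actually a reference to the same object. If we modify one of the elements, all the corresponding elements change with it.
If we want each `[0]` in the two-dimensional list to be independent, we can use `res = [[0] for _ in range(n)]`. This works because it initializes $n$ independent `[0]` list objects.
**Q**: The operation `res = [0] * n` creates a list. Is each integer 0 in it independent?
In this list, every integer 0 is a reference to the same object. This is because Python uses a cache pool for small integers (typically -5 to 256) to maximize object reuse and thereby improve performance.
Although they point to the same object, we can still modify each element of the list independently, because Python integers are "immutable objects". When we modify an element, we are actually switching to a reference to another object rather than changing the original object itself.
However, when the list elements are "mutable objects" (such as lists, dictionaries, or class instances), modifying an element changes that object itself directly, and every element that references the object shows the same change.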


@ -1120,7 +1120,7 @@ $$
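"Cell division" in biology is a typical example of exponential growth: the initial state is $1$ cell, which becomes $2$ after one round of division, $4$ after two rounds, and so on, giving $2^n$ cells after $n$ rounds.
The figure below and the following code simulate the process of cell division, with a time complexity of $O(2^n)$.
The figure below and the following code simulate the process of cell division, with a time complexity of $O(2^n)$. Note that the input $n$ is the number of division rounds, and the return value `count` is the total number of divisions.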
```src
[file]{time_complexity}-[class]{}-[func]{exponential}


@ -5,7 +5,7 @@
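- The process of organizing playing cards closely resembles the insertion sort algorithm. Insertion sort is well suited to sorting small data sets.
- The steps of making change are essentially a greedy algorithm, in which each step takes the choice that looks best at the moment.
- An algorithm is a set of instructions or operational steps that solves a specific problem within finite time, while a data structure is the way a computer organizes and stores data.
- Data structures and algorithms are closely connected. Data structures are the foundation of algorithms, and algorithms are the stage on which data structures play their role.
- Data structures and algorithms are closely connected. Data structures are the foundation of algorithms, and algorithms breathe life into data structures.
- We can liken data structures and algorithms to assembling building blocks: the blocks represent data, the shapes and connection methods of the blocks represent data structures, and the steps of assembling them correspond to algorithms.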
### Q & A


@ -26,7 +26,7 @@
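As shown in the figure below, data structures and algorithms are highly related and tightly integrated, specifically in the following three respects.
- Data structures are the foundation of algorithms. Data structures provide algorithms with structurally stored data and with methods for operating on that data.
- Algorithms are the stage on which data structures play their role. A data structure by itself only stores data; it solves specific problems only when combined with algorithms.
- Algorithms breathe life into data structures. A data structure by itself only stores data; it solves specific problems only when combined with algorithms.
- An algorithm can usually be implemented on top of different data structures, but the execution efficiency may differ greatly, so choosing an appropriate data structure is key.
![Relationship between data structures and algorithms](what_is_dsa.assets/relationship_between_data_structure_and_algorithm.png)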