diff --git a/docs-en/chapter_array_and_linkedlist/array.md b/docs-en/chapter_array_and_linkedlist/array.md index b080c0d86..499562d0b 100755 --- a/docs-en/chapter_array_and_linkedlist/array.md +++ b/docs-en/chapter_array_and_linkedlist/array.md @@ -287,7 +287,7 @@ Accessing elements in an array is highly efficient, allowing us to randomly acce } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -469,7 +469,7 @@ It's important to note that due to the fixed length of an array, inserting an el } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -628,7 +628,7 @@ Please note that after deletion, the former last element becomes "meaningless," } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -852,7 +852,7 @@ In most programming languages, we can traverse an array either by using indices } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1020,7 +1020,7 @@ Because arrays are linear data structures, this operation is commonly referred t } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1230,7 +1230,7 @@ To expand an array, it's necessary to create a larger array and then copy the e } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
diff --git a/docs-en/chapter_array_and_linkedlist/linked_list.md b/docs-en/chapter_array_and_linkedlist/linked_list.md index 49f861d0e..2857f6ac4 100755 --- a/docs-en/chapter_array_and_linkedlist/linked_list.md +++ b/docs-en/chapter_array_and_linkedlist/linked_list.md @@ -540,7 +540,7 @@ In contrast, the time complexity of inserting an element in an array is $O(n)$, } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -730,7 +730,7 @@ Note that although node `P` still points to `n1` after the deletion operation is } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -909,7 +909,7 @@ Note that although node `P` still points to `n1` after the deletion operation is } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1111,7 +1111,7 @@ Traverse the linked list to find a node with a value equal to `target`, and outp } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
diff --git a/docs-en/chapter_array_and_linkedlist/list.md b/docs-en/chapter_array_and_linkedlist/list.md index e562b009a..789d90086 100755 --- a/docs-en/chapter_array_and_linkedlist/list.md +++ b/docs-en/chapter_array_and_linkedlist/list.md @@ -2119,7 +2119,7 @@ To enhance our understanding of how lists work, we will attempt to implement a s } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
diff --git a/docs-en/chapter_computational_complexity/iteration_and_recursion.md b/docs-en/chapter_computational_complexity/iteration_and_recursion.md index d0005a962..ee661b9ef 100644 --- a/docs-en/chapter_computational_complexity/iteration_and_recursion.md +++ b/docs-en/chapter_computational_complexity/iteration_and_recursion.md @@ -182,7 +182,7 @@ The following function implements the sum $1 + 2 + \dots + n$ using a `for` loop } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -393,7 +393,7 @@ Below we use a `while` loop to implement the sum $1 + 2 + \dots + n$: } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -617,7 +617,7 @@ For example, in the following code, the condition variable $i$ is updated twice } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -836,7 +836,7 @@ We can nest one loop structure within another. Below is an example using `for` l } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1046,7 +1046,7 @@ Observe the following code, where calling the function `recur(n)` completes the } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1247,7 +1247,7 @@ For example, in calculating $1 + 2 + \dots + n$, we can make the result variable } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1460,7 +1460,7 @@ Using the recursive relation, and considering the first two numbers as terminati } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1783,7 +1783,7 @@ Therefore, **we can use an explicit stack to simulate the behavior of the call s } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
diff --git a/docs-en/chapter_computational_complexity/space_complexity.md b/docs-en/chapter_computational_complexity/space_complexity.md index 337a7b9b9..97c92e202 100644 --- a/docs-en/chapter_computational_complexity/space_complexity.md +++ b/docs-en/chapter_computational_complexity/space_complexity.md @@ -1065,7 +1065,7 @@ Note that memory occupied by initializing variables or calling functions in a lo } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1334,7 +1334,7 @@ Linear order is common in arrays, linked lists, stacks, queues, etc., where the } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1480,7 +1480,7 @@ As shown below, this function's recursive depth is $n$, meaning there are $n$ in } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1705,7 +1705,7 @@ Quadratic order is common in matrices and graphs, where the number of elements i } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1869,7 +1869,7 @@ As shown below, the recursive depth of this function is $n$, and in each recursi } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -2047,7 +2047,7 @@ Exponential order is common in binary trees. Observe the below image, a "full bi } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
diff --git a/docs-en/chapter_computational_complexity/time_complexity.md b/docs-en/chapter_computational_complexity/time_complexity.md index 0af6f5def..8ca18e698 100644 --- a/docs-en/chapter_computational_complexity/time_complexity.md +++ b/docs-en/chapter_computational_complexity/time_complexity.md @@ -1119,7 +1119,7 @@ Constant order means the number of operations is independent of the input data s } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1276,7 +1276,7 @@ Linear order indicates the number of operations grows linearly with the input da } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1449,7 +1449,7 @@ Operations like array traversal and linked list traversal have a time complexity } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1651,7 +1651,7 @@ Quadratic order means the number of operations grows quadratically with the inpu } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -1936,7 +1936,7 @@ For instance, in bubble sort, the outer loop runs $n - 1$ times, and the inner l } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -2169,7 +2169,7 @@ The following image and code simulate the cell division process, with a time com } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -2309,7 +2309,7 @@ In practice, exponential order often appears in recursive functions. For example } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -2491,7 +2491,7 @@ The following image and code simulate the "halving each round" process, with a t } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -2631,7 +2631,7 @@ Like exponential order, logarithmic order also frequently appears in recursive f } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -2829,7 +2829,7 @@ Linear-logarithmic order often appears in nested loops, with the complexities of } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -3040,7 +3040,7 @@ Factorials are typically implemented using recursion. As shown in the image and } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
@@ -3407,7 +3407,7 @@ The "worst-case time complexity" corresponds to the asymptotic upper bound, deno } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
Full Screen >
diff --git a/docs-en/chapter_hashing/hash_algorithm.md b/docs-en/chapter_hashing/hash_algorithm.md new file mode 100644 index 000000000..773a52994 --- /dev/null +++ b/docs-en/chapter_hashing/hash_algorithm.md @@ -0,0 +1,886 @@ +--- +comments: true +--- + +# 6.3   Hash Algorithms + +The previous two sections introduced the working principle of hash tables and the methods to handle hash collisions. However, both open addressing and chaining can **only ensure that the hash table functions normally when collisions occur, but cannot reduce the frequency of hash collisions**. + +If hash collisions occur too frequently, the performance of the hash table will deteriorate drastically. As shown in the Figure 6-8 , for a chaining hash table, in the ideal case, the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$. + +![Ideal and Worst Cases of Hash Collisions](hash_algorithm.assets/hash_collision_best_worst_condition.png){ class="animation-figure" } + +

Figure 6-8   Ideal and Worst Cases of Hash Collisions

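+
+To make the two cases in the figure concrete, the short sketch below (an illustrative addition, not part of the book's code listings) counts how many keys land in each bucket under a well-spread hash function versus a degenerate one that sends every key to the same bucket:
+
+```python
+from collections import Counter
+
+# Illustrative sketch: bucket occupancy under a well-spread hash
+# versus a degenerate hash that maps every key to bucket 0
+capacity = 10
+keys = range(100)
+
+even_buckets = Counter(hash(key) % capacity for key in keys)
+worst_buckets = Counter(0 for _ in keys)
+
+print(dict(even_buckets))   # about 10 pairs per bucket -> O(1) average lookup
+print(dict(worst_buckets))  # all 100 pairs in bucket 0 -> lookup degrades to O(n)
+```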
+ +**The distribution of key-value pairs is determined by the hash function**. Recalling the steps of calculating a hash function, first compute the hash value, then modulo it by the array length: + +```shell +index = hash(key) % capacity +``` + +Observing the above formula, when the hash table capacity `capacity` is fixed, **the hash algorithm `hash()` determines the output value**, thereby determining the distribution of key-value pairs in the hash table. + +This means that, to reduce the probability of hash collisions, we should focus on the design of the hash algorithm `hash()`. + +## 6.3.1   Goals of Hash Algorithms + +To achieve a "fast and stable" hash table data structure, hash algorithms should have the following characteristics: + +- **Determinism**: For the same input, the hash algorithm should always produce the same output. Only then can the hash table be reliable. +- **High Efficiency**: The process of computing the hash value should be fast enough. The smaller the computational overhead, the more practical the hash table. +- **Uniform Distribution**: The hash algorithm should ensure that key-value pairs are evenly distributed in the hash table. The more uniform the distribution, the lower the probability of hash collisions. + +In fact, hash algorithms are not only used to implement hash tables but are also widely applied in other fields. + +- **Password Storage**: To protect the security of user passwords, systems usually do not store the plaintext passwords but rather the hash values of the passwords. When a user enters a password, the system calculates the hash value of the input and compares it with the stored hash value. If they match, the password is considered correct. +- **Data Integrity Check**: The data sender can calculate the hash value of the data and send it along; the receiver can recalculate the hash value of the received data and compare it with the received hash value. If they match, the data is considered intact. + +For cryptographic applications, to prevent reverse engineering such as deducing the original password from the hash value, hash algorithms need higher-level security features. + +- **Unidirectionality**: It should be impossible to deduce any information about the input data from the hash value. +- **Collision Resistance**: It should be extremely difficult to find two different inputs that produce the same hash value. +- **Avalanche Effect**: Minor changes in the input should lead to significant and unpredictable changes in the output. + +Note that **"Uniform Distribution" and "Collision Resistance" are two separate concepts**. Satisfying uniform distribution does not necessarily mean collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple, and all `key` with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password. + +## 6.3.2   Design of Hash Algorithms + +The design of hash algorithms is a complex issue that requires consideration of many factors. However, for some less demanding scenarios, we can also design some simple hash algorithms. + +- **Additive Hash**: Add up the ASCII codes of each character in the input and use the total sum as the hash value. +- **Multiplicative Hash**: Utilize the non-correlation of multiplication, multiplying each round by a constant, accumulating the ASCII codes of each character into the hash value. 
+- **XOR Hash**: Accumulate the hash value by XORing each element of the input data. +- **Rotating Hash**: Accumulate the ASCII code of each character into a hash value, performing a rotation operation on the hash value before each accumulation. + +=== "Python" + + ```python title="simple_hash.py" + def add_hash(key: str) -> int: + """加法哈希""" + hash = 0 + modulus = 1000000007 + for c in key: + hash += ord(c) + return hash % modulus + + def mul_hash(key: str) -> int: + """乘法哈希""" + hash = 0 + modulus = 1000000007 + for c in key: + hash = 31 * hash + ord(c) + return hash % modulus + + def xor_hash(key: str) -> int: + """异或哈希""" + hash = 0 + modulus = 1000000007 + for c in key: + hash ^= ord(c) + return hash % modulus + + def rot_hash(key: str) -> int: + """旋转哈希""" + hash = 0 + modulus = 1000000007 + for c in key: + hash = (hash << 4) ^ (hash >> 28) ^ ord(c) + return hash % modulus + ``` + +=== "C++" + + ```cpp title="simple_hash.cpp" + /* 加法哈希 */ + int addHash(string key) { + long long hash = 0; + const int MODULUS = 1000000007; + for (unsigned char c : key) { + hash = (hash + (int)c) % MODULUS; + } + return (int)hash; + } + + /* 乘法哈希 */ + int mulHash(string key) { + long long hash = 0; + const int MODULUS = 1000000007; + for (unsigned char c : key) { + hash = (31 * hash + (int)c) % MODULUS; + } + return (int)hash; + } + + /* 异或哈希 */ + int xorHash(string key) { + int hash = 0; + const int MODULUS = 1000000007; + for (unsigned char c : key) { + hash ^= (int)c; + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + int rotHash(string key) { + long long hash = 0; + const int MODULUS = 1000000007; + for (unsigned char c : key) { + hash = ((hash << 4) ^ (hash >> 28) ^ (int)c) % MODULUS; + } + return (int)hash; + } + ``` + +=== "Java" + + ```java title="simple_hash.java" + /* 加法哈希 */ + int addHash(String key) { + long hash = 0; + final int MODULUS = 1000000007; + for (char c : key.toCharArray()) { + hash = (hash + (int) c) % MODULUS; + } + return (int) hash; + } + + /* 乘法哈希 */ + int mulHash(String key) { + long hash = 0; + final int MODULUS = 1000000007; + for (char c : key.toCharArray()) { + hash = (31 * hash + (int) c) % MODULUS; + } + return (int) hash; + } + + /* 异或哈希 */ + int xorHash(String key) { + int hash = 0; + final int MODULUS = 1000000007; + for (char c : key.toCharArray()) { + hash ^= (int) c; + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + int rotHash(String key) { + long hash = 0; + final int MODULUS = 1000000007; + for (char c : key.toCharArray()) { + hash = ((hash << 4) ^ (hash >> 28) ^ (int) c) % MODULUS; + } + return (int) hash; + } + ``` + +=== "C#" + + ```csharp title="simple_hash.cs" + /* 加法哈希 */ + int AddHash(string key) { + long hash = 0; + const int MODULUS = 1000000007; + foreach (char c in key) { + hash = (hash + c) % MODULUS; + } + return (int)hash; + } + + /* 乘法哈希 */ + int MulHash(string key) { + long hash = 0; + const int MODULUS = 1000000007; + foreach (char c in key) { + hash = (31 * hash + c) % MODULUS; + } + return (int)hash; + } + + /* 异或哈希 */ + int XorHash(string key) { + int hash = 0; + const int MODULUS = 1000000007; + foreach (char c in key) { + hash ^= c; + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + int RotHash(string key) { + long hash = 0; + const int MODULUS = 1000000007; + foreach (char c in key) { + hash = ((hash << 4) ^ (hash >> 28) ^ c) % MODULUS; + } + return (int)hash; + } + ``` + +=== "Go" + + ```go title="simple_hash.go" + /* 加法哈希 */ + func addHash(key string) int { + var hash int64 + var modulus int64 + + modulus = 1000000007 + for _, b := 
range []byte(key) { + hash = (hash + int64(b)) % modulus + } + return int(hash) + } + + /* 乘法哈希 */ + func mulHash(key string) int { + var hash int64 + var modulus int64 + + modulus = 1000000007 + for _, b := range []byte(key) { + hash = (31*hash + int64(b)) % modulus + } + return int(hash) + } + + /* 异或哈希 */ + func xorHash(key string) int { + hash := 0 + modulus := 1000000007 + for _, b := range []byte(key) { + fmt.Println(int(b)) + hash ^= int(b) + hash = (31*hash + int(b)) % modulus + } + return hash & modulus + } + + /* 旋转哈希 */ + func rotHash(key string) int { + var hash int64 + var modulus int64 + + modulus = 1000000007 + for _, b := range []byte(key) { + hash = ((hash << 4) ^ (hash >> 28) ^ int64(b)) % modulus + } + return int(hash) + } + ``` + +=== "Swift" + + ```swift title="simple_hash.swift" + /* 加法哈希 */ + func addHash(key: String) -> Int { + var hash = 0 + let MODULUS = 1_000_000_007 + for c in key { + for scalar in c.unicodeScalars { + hash = (hash + Int(scalar.value)) % MODULUS + } + } + return hash + } + + /* 乘法哈希 */ + func mulHash(key: String) -> Int { + var hash = 0 + let MODULUS = 1_000_000_007 + for c in key { + for scalar in c.unicodeScalars { + hash = (31 * hash + Int(scalar.value)) % MODULUS + } + } + return hash + } + + /* 异或哈希 */ + func xorHash(key: String) -> Int { + var hash = 0 + let MODULUS = 1_000_000_007 + for c in key { + for scalar in c.unicodeScalars { + hash ^= Int(scalar.value) + } + } + return hash & MODULUS + } + + /* 旋转哈希 */ + func rotHash(key: String) -> Int { + var hash = 0 + let MODULUS = 1_000_000_007 + for c in key { + for scalar in c.unicodeScalars { + hash = ((hash << 4) ^ (hash >> 28) ^ Int(scalar.value)) % MODULUS + } + } + return hash + } + ``` + +=== "JS" + + ```javascript title="simple_hash.js" + /* 加法哈希 */ + function addHash(key) { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash = (hash + c.charCodeAt(0)) % MODULUS; + } + return hash; + } + + /* 乘法哈希 */ + function mulHash(key) { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash = (31 * hash + c.charCodeAt(0)) % MODULUS; + } + return hash; + } + + /* 异或哈希 */ + function xorHash(key) { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash ^= c.charCodeAt(0); + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + function rotHash(key) { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash = ((hash << 4) ^ (hash >> 28) ^ c.charCodeAt(0)) % MODULUS; + } + return hash; + } + ``` + +=== "TS" + + ```typescript title="simple_hash.ts" + /* 加法哈希 */ + function addHash(key: string): number { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash = (hash + c.charCodeAt(0)) % MODULUS; + } + return hash; + } + + /* 乘法哈希 */ + function mulHash(key: string): number { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash = (31 * hash + c.charCodeAt(0)) % MODULUS; + } + return hash; + } + + /* 异或哈希 */ + function xorHash(key: string): number { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash ^= c.charCodeAt(0); + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + function rotHash(key: string): number { + let hash = 0; + const MODULUS = 1000000007; + for (const c of key) { + hash = ((hash << 4) ^ (hash >> 28) ^ c.charCodeAt(0)) % MODULUS; + } + return hash; + } + ``` + +=== "Dart" + + ```dart title="simple_hash.dart" + /* 加法哈希 */ + int addHash(String key) { + int hash = 0; + final int MODULUS = 1000000007; + for (int i = 0; i < key.length; 
i++) { + hash = (hash + key.codeUnitAt(i)) % MODULUS; + } + return hash; + } + + /* 乘法哈希 */ + int mulHash(String key) { + int hash = 0; + final int MODULUS = 1000000007; + for (int i = 0; i < key.length; i++) { + hash = (31 * hash + key.codeUnitAt(i)) % MODULUS; + } + return hash; + } + + /* 异或哈希 */ + int xorHash(String key) { + int hash = 0; + final int MODULUS = 1000000007; + for (int i = 0; i < key.length; i++) { + hash ^= key.codeUnitAt(i); + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + int rotHash(String key) { + int hash = 0; + final int MODULUS = 1000000007; + for (int i = 0; i < key.length; i++) { + hash = ((hash << 4) ^ (hash >> 28) ^ key.codeUnitAt(i)) % MODULUS; + } + return hash; + } + ``` + +=== "Rust" + + ```rust title="simple_hash.rs" + /* 加法哈希 */ + fn add_hash(key: &str) -> i32 { + let mut hash = 0_i64; + const MODULUS: i64 = 1000000007; + + for c in key.chars() { + hash = (hash + c as i64) % MODULUS; + } + + hash as i32 + } + + /* 乘法哈希 */ + fn mul_hash(key: &str) -> i32 { + let mut hash = 0_i64; + const MODULUS: i64 = 1000000007; + + for c in key.chars() { + hash = (31 * hash + c as i64) % MODULUS; + } + + hash as i32 + } + + /* 异或哈希 */ + fn xor_hash(key: &str) -> i32 { + let mut hash = 0_i64; + const MODULUS: i64 = 1000000007; + + for c in key.chars() { + hash ^= c as i64; + } + + (hash & MODULUS) as i32 + } + + /* 旋转哈希 */ + fn rot_hash(key: &str) -> i32 { + let mut hash = 0_i64; + const MODULUS: i64 = 1000000007; + + for c in key.chars() { + hash = ((hash << 4) ^ (hash >> 28) ^ c as i64) % MODULUS; + } + + hash as i32 + } + ``` + +=== "C" + + ```c title="simple_hash.c" + /* 加法哈希 */ + int addHash(char *key) { + long long hash = 0; + const int MODULUS = 1000000007; + for (int i = 0; i < strlen(key); i++) { + hash = (hash + (unsigned char)key[i]) % MODULUS; + } + return (int)hash; + } + + /* 乘法哈希 */ + int mulHash(char *key) { + long long hash = 0; + const int MODULUS = 1000000007; + for (int i = 0; i < strlen(key); i++) { + hash = (31 * hash + (unsigned char)key[i]) % MODULUS; + } + return (int)hash; + } + + /* 异或哈希 */ + int xorHash(char *key) { + int hash = 0; + const int MODULUS = 1000000007; + + for (int i = 0; i < strlen(key); i++) { + hash ^= (unsigned char)key[i]; + } + return hash & MODULUS; + } + + /* 旋转哈希 */ + int rotHash(char *key) { + long long hash = 0; + const int MODULUS = 1000000007; + for (int i = 0; i < strlen(key); i++) { + hash = ((hash << 4) ^ (hash >> 28) ^ (unsigned char)key[i]) % MODULUS; + } + + return (int)hash; + } + ``` + +=== "Zig" + + ```zig title="simple_hash.zig" + [class]{}-[func]{addHash} + + [class]{}-[func]{mulHash} + + [class]{}-[func]{xorHash} + + [class]{}-[func]{rotHash} + ``` + +??? pythontutor "Code Visualization" + +
+
Full Screen >
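+
+As a quick check on these simple designs (an illustrative addition that reuses the Python functions defined above), note that reordering the characters of a string changes neither its additive nor its XOR hash, since addition and XOR are commutative, whereas the multiplicative hash is order-sensitive:
+
+```python
+# Illustrative check: "algorithm" and "logarithm" contain exactly the same characters
+print(add_hash("algorithm") == add_hash("logarithm"))  # True: the sum of character codes is identical
+print(xor_hash("algorithm") == xor_hash("logarithm"))  # True: the XOR of character codes is identical
+print(mul_hash("algorithm") == mul_hash("logarithm"))  # False: character order matters
+```
+
+This weakness is discussed further in the section on common hash algorithms below.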
+ +It is observed that the last step of each hash algorithm is to take the modulus of the large prime number $1000000007$ to ensure that the hash value is within an appropriate range. It is worth pondering why emphasis is placed on modulo a prime number, or what are the disadvantages of modulo a composite number? This is an interesting question. + +To conclude: **Using a large prime number as the modulus can maximize the uniform distribution of hash values**. Since a prime number does not share common factors with other numbers, it can reduce the periodic patterns caused by the modulo operation, thus avoiding hash collisions. + +For example, suppose we choose the composite number $9$ as the modulus, which can be divided by $3$, then all `key` divisible by $3$ will be mapped to hash values $0$, $3$, $6$. + +$$ +\begin{aligned} +\text{modulus} & = 9 \newline +\text{key} & = \{ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, \dots \} \newline +\text{hash} & = \{ 0, 3, 6, 0, 3, 6, 0, 3, 6, 0, 3, 6,\dots \} +\end{aligned} +$$ + +If the input `key` happens to have this kind of arithmetic sequence distribution, then the hash values will cluster, thereby exacerbating hash collisions. Now, suppose we replace `modulus` with the prime number $13$, since there are no common factors between `key` and `modulus`, the uniformity of the output hash values will be significantly improved. + +$$ +\begin{aligned} +\text{modulus} & = 13 \newline +\text{key} & = \{ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, \dots \} \newline +\text{hash} & = \{ 0, 3, 6, 9, 12, 2, 5, 8, 11, 1, 4, 7, \dots \} +\end{aligned} +$$ + +It is worth noting that if the `key` is guaranteed to be randomly and uniformly distributed, then choosing a prime number or a composite number as the modulus can both produce uniformly distributed hash values. However, when the distribution of `key` has some periodicity, modulo a composite number is more likely to result in clustering. + +In summary, we usually choose a prime number as the modulus, and this prime number should be large enough to eliminate periodic patterns as much as possible, enhancing the robustness of the hash algorithm. + +## 6.3.3   Common Hash Algorithms + +It is not hard to see that the simple hash algorithms mentioned above are quite "fragile" and far from reaching the design goals of hash algorithms. For example, since addition and XOR obey the commutative law, additive hash and XOR hash cannot distinguish strings with the same content but in different order, which may exacerbate hash collisions and cause security issues. + +In practice, we usually use some standard hash algorithms, such as MD5, SHA-1, SHA-2, and SHA-3. They can map input data of any length to a fixed-length hash value. + +Over the past century, hash algorithms have been in a continuous process of upgrading and optimization. Some researchers strive to improve the performance of hash algorithms, while others, including hackers, are dedicated to finding security issues in hash algorithms. The Table 6-2 shows hash algorithms commonly used in practical applications. + +- MD5 and SHA-1 have been successfully attacked multiple times and are thus abandoned in various security applications. +- SHA-2 series, especially SHA-256, is one of the most secure hash algorithms to date, with no successful attacks reported, hence commonly used in various security applications and protocols. 
+- SHA-3 has lower implementation costs and higher computational efficiency than SHA-2, but it is not yet as widely used as the SHA-2 series.
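+
+In practice these algorithms are available through standard libraries; for example, Python's built-in `hashlib` module (shown below as an illustrative addition, not part of the book's code listings) can compute several of the digests listed in Table 6-2:
+
+```python
+import hashlib
+
+# Illustrative sketch: hashing the same input with several standard algorithms
+data = "Hello 算法".encode("utf-8")
+
+print(hashlib.md5(data).hexdigest())       # 128-bit digest (32 hex characters)
+print(hashlib.sha1(data).hexdigest())      # 160-bit digest (40 hex characters)
+print(hashlib.sha256(data).hexdigest())    # SHA-2 family, 256-bit digest
+print(hashlib.sha3_256(data).hexdigest())  # SHA-3 family, 256-bit digest
+```
+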

Table 6-2   Common Hash Algorithms

+ +
+ +| | MD5 | SHA-1 | SHA-2 | SHA-3 | +| --------------- | ----------------------------------------------- | ----------------------------------- | ----------------------------------------------------------------- | ---------------------------- | +| Release Year | 1992 | 1995 | 2002 | 2008 | +| Output Length | 128 bit | 160 bit | 256/512 bit | 224/256/384/512 bit | +| Hash Collisions | Frequent | Frequent | Rare | Rare | +| Security Level | Low, has been successfully attacked | Low, has been successfully attacked | High | High | +| Applications | Abandoned, still used for data integrity checks | Abandoned | Cryptocurrency transaction verification, digital signatures, etc. | Can be used to replace SHA-2 | + +
+ +# Hash Values in Data Structures + +We know that the keys in a hash table can be of various data types such as integers, decimals, or strings. Programming languages usually provide built-in hash algorithms for these data types to calculate the bucket indices in the hash table. Taking Python as an example, we can use the `hash()` function to compute the hash values for various data types. + +- The hash values of integers and booleans are their own values. +- The calculation of hash values for floating-point numbers and strings is more complex, and interested readers are encouraged to study this on their own. +- The hash value of a tuple is a combination of the hash values of each of its elements, resulting in a single hash value. +- The hash value of an object is generated based on its memory address. By overriding the hash method of an object, hash values can be generated based on content. + +!!! tip + + Be aware that the definition and methods of the built-in hash value calculation functions in different programming languages vary. + +=== "Python" + + ```python title="built_in_hash.py" + num = 3 + hash_num = hash(num) + # Hash value of integer 3 is 3 + + bol = True + hash_bol = hash(bol) + # Hash value of boolean True is 1 + + dec = 3.14159 + hash_dec = hash(dec) + # Hash value of decimal 3.14159 is 326484311674566659 + + str = "Hello 算法" + hash_str = hash(str) + # Hash value of string "Hello 算法" is 4617003410720528961 + + tup = (12836, "小哈") + hash_tup = hash(tup) + # Hash value of tuple (12836, '小哈') is 1029005403108185979 + + obj = ListNode(0) + hash_obj = hash(obj) + # Hash value of ListNode object at 0x1058fd810 is 274267521 + ``` + +=== "C++" + + ```cpp title="built_in_hash.cpp" + int num = 3; + size_t hashNum = hash()(num); + // Hash value of integer 3 is 3 + + bool bol = true; + size_t hashBol = hash()(bol); + // Hash value of boolean 1 is 1 + + double dec = 3.14159; + size_t hashDec = hash()(dec); + // Hash value of decimal 3.14159 is 4614256650576692846 + + string str = "Hello 算法"; + size_t hashStr = hash()(str); + // Hash value of string "Hello 算法" is 15466937326284535026 + + // In C++, built-in std::hash() only provides hash values for basic data types + // Hash values for arrays and objects need to be implemented separately + ``` + +=== "Java" + + ```java title="built_in_hash.java" + int num = 3; + int hashNum = Integer.hashCode(num); + // Hash value of integer 3 is 3 + + boolean bol = true; + int hashBol = Boolean.hashCode(bol); + // Hash value of boolean true is 1231 + + double dec = 3.14159; + int hashDec = Double.hashCode(dec); + // Hash value of decimal 3.14159 is -1340954729 + + String str = "Hello 算法"; + int hashStr = str.hashCode(); + // Hash value of string "Hello 算法" is -727081396 + + Object[] arr = { 12836, "小哈" }; + int hashTup = Arrays.hashCode(arr); + // Hash value of array [12836, 小哈] is 1151158 + + ListNode obj = new ListNode(0); + int hashObj = obj.hashCode(); + // Hash value of ListNode object utils.ListNode@7dc5e7b4 is 2110121908 + ``` + +=== "C#" + + ```csharp title="built_in_hash.cs" + int num = 3; + int hashNum = num.GetHashCode(); + // Hash value of integer 3 is 3; + + bool bol = true; + int hashBol = bol.GetHashCode(); + // Hash value of boolean true is 1; + + double dec = 3.14159; + int hashDec = dec.GetHashCode(); + // Hash value of decimal 3.14159 is -1340954729; + + string str = "Hello 算法"; + int hashStr = str.GetHashCode(); + // Hash value of string "Hello 算法" is -586107568; + + object[] arr = [12836, "小哈"]; + int hashTup = arr.GetHashCode(); + 
// Hash value of array [12836, 小哈] is 42931033; + + ListNode obj = new(0); + int hashObj = obj.GetHashCode(); + // Hash value of ListNode object 0 is 39053774; + ``` + +=== "Go" + + ```go title="built_in_hash.go" + // Go does not provide built-in hash code functions + ``` + +=== "Swift" + + ```swift title="built_in_hash.swift" + let num = 3 + let hashNum = num.hashValue + // Hash value of integer 3 is 9047044699613009734 + + let bol = true + let hashBol = bol.hashValue + // Hash value of boolean true is -4431640247352757451 + + let dec = 3.14159 + let hashDec = dec.hashValue + // Hash value of decimal 3.14159 is -2465384235396674631 + + let str = "Hello 算法" + let hashStr = str.hashValue + // Hash value of string "Hello 算法" is -7850626797806988787 + + let arr = [AnyHashable(12836), AnyHashable("小哈")] + let hashTup = arr.hashValue + // Hash value of array [AnyHashable(12836), AnyHashable("小哈")] is -2308633508154532996 + + let obj = ListNode(x: 0) + let hashObj = obj.hashValue + // Hash value of ListNode object utils.ListNode is -2434780518035996159 + ``` + +=== "JS" + + ```javascript title="built_in_hash.js" + // JavaScript does not provide built-in hash code functions + ``` + +=== "TS" + + ```typescript title="built_in_hash.ts" + // TypeScript does not provide built-in hash code functions + ``` + +=== "Dart" + + ```dart title="built_in_hash.dart" + int num = 3; + int hashNum = num.hashCode; + // Hash value of integer 3 is 34803 + + bool bol = true; + int hashBol = bol.hashCode; + // Hash value of boolean true is 1231 + + double dec = 3.14159; + int hashDec = dec.hashCode; + // Hash value of decimal 3.14159 is 2570631074981783 + + String str = "Hello 算法"; + int hashStr = str.hashCode; + // Hash value of string "Hello 算法" is 468167534 + + List arr = [12836, "小哈"]; + int hashArr = arr.hashCode; + // Hash value of array [12836, 小哈] is 976512528 + + ListNode obj = new ListNode(0); + int hashObj = obj.hashCode; + // Hash value of ListNode object Instance of 'ListNode' is 1033450432 + ``` + +=== "Rust" + + ```rust title="built_in_hash.rs" + use std::collections::hash_map::DefaultHasher; + use std::hash::{Hash, Hasher}; + + let num = 3; + let mut num_hasher = DefaultHasher::new(); + num.hash(&mut num_hasher); + let hash_num = num_hasher.finish(); + // Hash value of integer 3 is 568126464209439262 + + let bol = true; + let mut bol_hasher = DefaultHasher::new(); + bol.hash(&mut bol_hasher); + let hash_bol = bol_hasher.finish(); + // Hash value of boolean true is 4952851536318644461 + + let dec: f32 = 3.14159; + let mut dec_hasher = DefaultHasher::new(); + dec.to_bits().hash(&mut dec_hasher); + let hash_dec = dec_hasher.finish(); + // Hash value of decimal 3.14159 is 2566941990314602357 + + let str = "Hello 算法"; + let mut str_hasher = DefaultHasher::new(); + str.hash(&mut str_hasher); + let hash_str = str_hasher.finish(); + // Hash value of string "Hello 算法" is 16092673739211250988 + + let arr = (&12836, &"小哈"); + let mut tup_hasher = DefaultHasher::new(); + arr.hash(&mut tup_hasher); + let hash_tup = tup_hasher.finish(); + // Hash value of tuple (12836, "小哈") is 1885128010422702749 + + let node = ListNode::new(42); + let mut hasher = DefaultHasher::new(); + node.borrow().val.hash(&mut hasher); + let hash = hasher.finish(); + // Hash value of ListNode object RefCell { value: ListNode { val: 42, next: None } } is 15387811073369036852 + ``` + +=== "C" + + ```c title="built_in_hash.c" + // C does not provide built-in hash code functions + ``` + +=== "Zig" + + ```zig title="built_in_hash.zig" + + ``` + 
+??? pythontutor "Code Visualization" + +
+
Full Screen >
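+
+As the discussion below notes, only immutable objects are hashable in Python. A minimal check (an illustrative addition) shows that a mutable list raises an error when hashed, while an immutable tuple holding the same contents does not:
+
+```python
+# Illustrative check: mutable containers cannot be used as hash table keys
+key = [12836, "小哈"]
+try:
+    hash(key)                 # lists are mutable, hence unhashable
+except TypeError as e:
+    print(e)                  # unhashable type: 'list'
+
+print(hash((12836, "小哈")))  # an equivalent immutable tuple is hashable
+```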
+ +In many programming languages, **only immutable objects can serve as the `key` in a hash table**. If we use a list (dynamic array) as a `key`, when the contents of the list change, its hash value also changes, and we would no longer be able to find the original `value` in the hash table. + +Although the member variables of a custom object (such as a linked list node) are mutable, it is hashable. **This is because the hash value of an object is usually generated based on its memory address**, and even if the contents of the object change, the memory address remains the same, so the hash value remains unchanged. + +You might have noticed that the hash values output in different consoles are different. **This is because the Python interpreter adds a random salt to the string hash function each time it starts up**. This approach effectively prevents HashDoS attacks and enhances the security of the hash algorithm. diff --git a/docs-en/chapter_hashing/hash_collision.md b/docs-en/chapter_hashing/hash_collision.md new file mode 100644 index 000000000..37c42a3c5 --- /dev/null +++ b/docs-en/chapter_hashing/hash_collision.md @@ -0,0 +1,2882 @@ +--- +comments: true +--- + +# 6.2   Hash Collision + +As mentioned in the previous section, **usually the input space of a hash function is much larger than its output space**, making hash collisions theoretically inevitable. For example, if the input space consists of all integers and the output space is the size of the array capacity, multiple integers will inevitably map to the same bucket index. + +Hash collisions can lead to incorrect query results, severely affecting the usability of hash tables. To solve this problem, we expand the hash table whenever a hash collision occurs, until the collision is resolved. This method is simple and effective but inefficient due to the extensive data transfer and hash value computation involved in resizing the hash table. To improve efficiency, we can adopt the following strategies: + +1. Improve the data structure of the hash table, **allowing it to function normally in the event of a hash collision**. +2. Only perform resizing when necessary, i.e., when hash collisions are severe. + +There are mainly two methods for improving the structure of hash tables: "Separate Chaining" and "Open Addressing". + +## 6.2.1   Separate Chaining + +In the original hash table, each bucket can store only one key-value pair. "Separate chaining" transforms individual elements into a linked list, with key-value pairs as list nodes, storing all colliding key-value pairs in the same list. The Figure 6-5 shows an example of a hash table with separate chaining. + +![Separate Chaining Hash Table](hash_collision.assets/hash_table_chaining.png){ class="animation-figure" } + +

Figure 6-5   Separate Chaining Hash Table

+ +The operations of a hash table implemented with separate chaining have changed as follows: + +- **Querying Elements**: Input `key`, pass through the hash function to obtain the bucket index, access the head node of the list, then traverse the list and compare `key` to find the target key-value pair. +- **Adding Elements**: First access the list head node via the hash function, then add the node (key-value pair) to the list. +- **Deleting Elements**: Access the list head based on the hash function's result, then traverse the list to find and remove the target node. + +Separate chaining has the following limitations: + +- **Increased Space Usage**: The linked list contains node pointers, which consume more memory space than arrays. +- **Reduced Query Efficiency**: Due to the need for linear traversal of the list to find the corresponding element. + +The code below provides a simple implementation of a separate chaining hash table, with two things to note: + +- Lists (dynamic arrays) are used instead of linked lists for simplicity. In this setup, the hash table (array) contains multiple buckets, each of which is a list. +- This implementation includes a method for resizing the hash table. When the load factor exceeds $\frac{2}{3}$, we resize the hash table to twice its original size. + +=== "Python" + + ```python title="hash_map_chaining.py" + class HashMapChaining: + """链式地址哈希表""" + + def __init__(self): + """构造方法""" + self.size = 0 # 键值对数量 + self.capacity = 4 # 哈希表容量 + self.load_thres = 2.0 / 3.0 # 触发扩容的负载因子阈值 + self.extend_ratio = 2 # 扩容倍数 + self.buckets = [[] for _ in range(self.capacity)] # 桶数组 + + def hash_func(self, key: int) -> int: + """哈希函数""" + return key % self.capacity + + def load_factor(self) -> float: + """负载因子""" + return self.size / self.capacity + + def get(self, key: int) -> str | None: + """查询操作""" + index = self.hash_func(key) + bucket = self.buckets[index] + # 遍历桶,若找到 key ,则返回对应 val + for pair in bucket: + if pair.key == key: + return pair.val + # 若未找到 key ,则返回 None + return None + + def put(self, key: int, val: str): + """添加操作""" + # 当负载因子超过阈值时,执行扩容 + if self.load_factor() > self.load_thres: + self.extend() + index = self.hash_func(key) + bucket = self.buckets[index] + # 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for pair in bucket: + if pair.key == key: + pair.val = val + return + # 若无该 key ,则将键值对添加至尾部 + pair = Pair(key, val) + bucket.append(pair) + self.size += 1 + + def remove(self, key: int): + """删除操作""" + index = self.hash_func(key) + bucket = self.buckets[index] + # 遍历桶,从中删除键值对 + for pair in bucket: + if pair.key == key: + bucket.remove(pair) + self.size -= 1 + break + + def extend(self): + """扩容哈希表""" + # 暂存原哈希表 + buckets = self.buckets + # 初始化扩容后的新哈希表 + self.capacity *= self.extend_ratio + self.buckets = [[] for _ in range(self.capacity)] + self.size = 0 + # 将键值对从原哈希表搬运至新哈希表 + for bucket in buckets: + for pair in bucket: + self.put(pair.key, pair.val) + + def print(self): + """打印哈希表""" + for bucket in self.buckets: + res = [] + for pair in bucket: + res.append(str(pair.key) + " -> " + pair.val) + print(res) + ``` + +=== "C++" + + ```cpp title="hash_map_chaining.cpp" + /* 链式地址哈希表 */ + class HashMapChaining { + private: + int size; // 键值对数量 + int capacity; // 哈希表容量 + double loadThres; // 触发扩容的负载因子阈值 + int extendRatio; // 扩容倍数 + vector> buckets; // 桶数组 + + public: + /* 构造方法 */ + HashMapChaining() : size(0), capacity(4), loadThres(2.0 / 3.0), extendRatio(2) { + buckets.resize(capacity); + } + + /* 析构方法 */ + ~HashMapChaining() { + for (auto &bucket : buckets) { + for (Pair 
*pair : bucket) { + // 释放内存 + delete pair; + } + } + } + + /* 哈希函数 */ + int hashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + double loadFactor() { + return (double)size / (double)capacity; + } + + /* 查询操作 */ + string get(int key) { + int index = hashFunc(key); + // 遍历桶,若找到 key ,则返回对应 val + for (Pair *pair : buckets[index]) { + if (pair->key == key) { + return pair->val; + } + } + // 若未找到 key ,则返回空字符串 + return ""; + } + + /* 添加操作 */ + void put(int key, string val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor() > loadThres) { + extend(); + } + int index = hashFunc(key); + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for (Pair *pair : buckets[index]) { + if (pair->key == key) { + pair->val = val; + return; + } + } + // 若无该 key ,则将键值对添加至尾部 + buckets[index].push_back(new Pair(key, val)); + size++; + } + + /* 删除操作 */ + void remove(int key) { + int index = hashFunc(key); + auto &bucket = buckets[index]; + // 遍历桶,从中删除键值对 + for (int i = 0; i < bucket.size(); i++) { + if (bucket[i]->key == key) { + Pair *tmp = bucket[i]; + bucket.erase(bucket.begin() + i); // 从中删除键值对 + delete tmp; // 释放内存 + size--; + return; + } + } + } + + /* 扩容哈希表 */ + void extend() { + // 暂存原哈希表 + vector> bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets.clear(); + buckets.resize(capacity); + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (auto &bucket : bucketsTmp) { + for (Pair *pair : bucket) { + put(pair->key, pair->val); + // 释放内存 + delete pair; + } + } + } + + /* 打印哈希表 */ + void print() { + for (auto &bucket : buckets) { + cout << "["; + for (Pair *pair : bucket) { + cout << pair->key << " -> " << pair->val << ", "; + } + cout << "]\n"; + } + } + }; + ``` + +=== "Java" + + ```java title="hash_map_chaining.java" + /* 链式地址哈希表 */ + class HashMapChaining { + int size; // 键值对数量 + int capacity; // 哈希表容量 + double loadThres; // 触发扩容的负载因子阈值 + int extendRatio; // 扩容倍数 + List> buckets; // 桶数组 + + /* 构造方法 */ + public HashMapChaining() { + size = 0; + capacity = 4; + loadThres = 2.0 / 3.0; + extendRatio = 2; + buckets = new ArrayList<>(capacity); + for (int i = 0; i < capacity; i++) { + buckets.add(new ArrayList<>()); + } + } + + /* 哈希函数 */ + int hashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + double loadFactor() { + return (double) size / capacity; + } + + /* 查询操作 */ + String get(int key) { + int index = hashFunc(key); + List bucket = buckets.get(index); + // 遍历桶,若找到 key ,则返回对应 val + for (Pair pair : bucket) { + if (pair.key == key) { + return pair.val; + } + } + // 若未找到 key ,则返回 null + return null; + } + + /* 添加操作 */ + void put(int key, String val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor() > loadThres) { + extend(); + } + int index = hashFunc(key); + List bucket = buckets.get(index); + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for (Pair pair : bucket) { + if (pair.key == key) { + pair.val = val; + return; + } + } + // 若无该 key ,则将键值对添加至尾部 + Pair pair = new Pair(key, val); + bucket.add(pair); + size++; + } + + /* 删除操作 */ + void remove(int key) { + int index = hashFunc(key); + List bucket = buckets.get(index); + // 遍历桶,从中删除键值对 + for (Pair pair : bucket) { + if (pair.key == key) { + bucket.remove(pair); + size--; + break; + } + } + } + + /* 扩容哈希表 */ + void extend() { + // 暂存原哈希表 + List> bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets = new ArrayList<>(capacity); + for (int i = 0; i < capacity; i++) { + buckets.add(new ArrayList<>()); + } + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (List bucket : bucketsTmp) { + for (Pair pair : bucket) { + put(pair.key, pair.val); + } + } + } + + /* 
打印哈希表 */ + void print() { + for (List bucket : buckets) { + List res = new ArrayList<>(); + for (Pair pair : bucket) { + res.add(pair.key + " -> " + pair.val); + } + System.out.println(res); + } + } + } + ``` + +=== "C#" + + ```csharp title="hash_map_chaining.cs" + /* 链式地址哈希表 */ + class HashMapChaining { + int size; // 键值对数量 + int capacity; // 哈希表容量 + double loadThres; // 触发扩容的负载因子阈值 + int extendRatio; // 扩容倍数 + List> buckets; // 桶数组 + + /* 构造方法 */ + public HashMapChaining() { + size = 0; + capacity = 4; + loadThres = 2.0 / 3.0; + extendRatio = 2; + buckets = new List>(capacity); + for (int i = 0; i < capacity; i++) { + buckets.Add([]); + } + } + + /* 哈希函数 */ + int HashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + double LoadFactor() { + return (double)size / capacity; + } + + /* 查询操作 */ + public string? Get(int key) { + int index = HashFunc(key); + // 遍历桶,若找到 key ,则返回对应 val + foreach (Pair pair in buckets[index]) { + if (pair.key == key) { + return pair.val; + } + } + // 若未找到 key ,则返回 null + return null; + } + + /* 添加操作 */ + public void Put(int key, string val) { + // 当负载因子超过阈值时,执行扩容 + if (LoadFactor() > loadThres) { + Extend(); + } + int index = HashFunc(key); + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + foreach (Pair pair in buckets[index]) { + if (pair.key == key) { + pair.val = val; + return; + } + } + // 若无该 key ,则将键值对添加至尾部 + buckets[index].Add(new Pair(key, val)); + size++; + } + + /* 删除操作 */ + public void Remove(int key) { + int index = HashFunc(key); + // 遍历桶,从中删除键值对 + foreach (Pair pair in buckets[index].ToList()) { + if (pair.key == key) { + buckets[index].Remove(pair); + size--; + break; + } + } + } + + /* 扩容哈希表 */ + void Extend() { + // 暂存原哈希表 + List> bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets = new List>(capacity); + for (int i = 0; i < capacity; i++) { + buckets.Add([]); + } + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + foreach (List bucket in bucketsTmp) { + foreach (Pair pair in bucket) { + Put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + public void Print() { + foreach (List bucket in buckets) { + List res = []; + foreach (Pair pair in bucket) { + res.Add(pair.key + " -> " + pair.val); + } + foreach (string kv in res) { + Console.WriteLine(kv); + } + } + } + } + ``` + +=== "Go" + + ```go title="hash_map_chaining.go" + /* 链式地址哈希表 */ + type hashMapChaining struct { + size int // 键值对数量 + capacity int // 哈希表容量 + loadThres float64 // 触发扩容的负载因子阈值 + extendRatio int // 扩容倍数 + buckets [][]pair // 桶数组 + } + + /* 构造方法 */ + func newHashMapChaining() *hashMapChaining { + buckets := make([][]pair, 4) + for i := 0; i < 4; i++ { + buckets[i] = make([]pair, 0) + } + return &hashMapChaining{ + size: 0, + capacity: 4, + loadThres: 2.0 / 3.0, + extendRatio: 2, + buckets: buckets, + } + } + + /* 哈希函数 */ + func (m *hashMapChaining) hashFunc(key int) int { + return key % m.capacity + } + + /* 负载因子 */ + func (m *hashMapChaining) loadFactor() float64 { + return float64(m.size) / float64(m.capacity) + } + + /* 查询操作 */ + func (m *hashMapChaining) get(key int) string { + idx := m.hashFunc(key) + bucket := m.buckets[idx] + // 遍历桶,若找到 key ,则返回对应 val + for _, p := range bucket { + if p.key == key { + return p.val + } + } + // 若未找到 key ,则返回空字符串 + return "" + } + + /* 添加操作 */ + func (m *hashMapChaining) put(key int, val string) { + // 当负载因子超过阈值时,执行扩容 + if m.loadFactor() > m.loadThres { + m.extend() + } + idx := m.hashFunc(key) + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for i := range m.buckets[idx] { + if m.buckets[idx][i].key == key { + m.buckets[idx][i].val = val + return + 
} + } + // 若无该 key ,则将键值对添加至尾部 + p := pair{ + key: key, + val: val, + } + m.buckets[idx] = append(m.buckets[idx], p) + m.size += 1 + } + + /* 删除操作 */ + func (m *hashMapChaining) remove(key int) { + idx := m.hashFunc(key) + // 遍历桶,从中删除键值对 + for i, p := range m.buckets[idx] { + if p.key == key { + // 切片删除 + m.buckets[idx] = append(m.buckets[idx][:i], m.buckets[idx][i+1:]...) + m.size -= 1 + break + } + } + } + + /* 扩容哈希表 */ + func (m *hashMapChaining) extend() { + // 暂存原哈希表 + tmpBuckets := make([][]pair, len(m.buckets)) + for i := 0; i < len(m.buckets); i++ { + tmpBuckets[i] = make([]pair, len(m.buckets[i])) + copy(tmpBuckets[i], m.buckets[i]) + } + // 初始化扩容后的新哈希表 + m.capacity *= m.extendRatio + m.buckets = make([][]pair, m.capacity) + for i := 0; i < m.capacity; i++ { + m.buckets[i] = make([]pair, 0) + } + m.size = 0 + // 将键值对从原哈希表搬运至新哈希表 + for _, bucket := range tmpBuckets { + for _, p := range bucket { + m.put(p.key, p.val) + } + } + } + + /* 打印哈希表 */ + func (m *hashMapChaining) print() { + var builder strings.Builder + + for _, bucket := range m.buckets { + builder.WriteString("[") + for _, p := range bucket { + builder.WriteString(strconv.Itoa(p.key) + " -> " + p.val + " ") + } + builder.WriteString("]") + fmt.Println(builder.String()) + builder.Reset() + } + } + ``` + +=== "Swift" + + ```swift title="hash_map_chaining.swift" + /* 链式地址哈希表 */ + class HashMapChaining { + var size: Int // 键值对数量 + var capacity: Int // 哈希表容量 + var loadThres: Double // 触发扩容的负载因子阈值 + var extendRatio: Int // 扩容倍数 + var buckets: [[Pair]] // 桶数组 + + /* 构造方法 */ + init() { + size = 0 + capacity = 4 + loadThres = 2.0 / 3.0 + extendRatio = 2 + buckets = Array(repeating: [], count: capacity) + } + + /* 哈希函数 */ + func hashFunc(key: Int) -> Int { + key % capacity + } + + /* 负载因子 */ + func loadFactor() -> Double { + Double(size / capacity) + } + + /* 查询操作 */ + func get(key: Int) -> String? 
{ + let index = hashFunc(key: key) + let bucket = buckets[index] + // 遍历桶,若找到 key ,则返回对应 val + for pair in bucket { + if pair.key == key { + return pair.val + } + } + // 若未找到 key ,则返回 nil + return nil + } + + /* 添加操作 */ + func put(key: Int, val: String) { + // 当负载因子超过阈值时,执行扩容 + if loadFactor() > loadThres { + extend() + } + let index = hashFunc(key: key) + let bucket = buckets[index] + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for pair in bucket { + if pair.key == key { + pair.val = val + return + } + } + // 若无该 key ,则将键值对添加至尾部 + let pair = Pair(key: key, val: val) + buckets[index].append(pair) + size += 1 + } + + /* 删除操作 */ + func remove(key: Int) { + let index = hashFunc(key: key) + let bucket = buckets[index] + // 遍历桶,从中删除键值对 + for (pairIndex, pair) in bucket.enumerated() { + if pair.key == key { + buckets[index].remove(at: pairIndex) + } + } + size -= 1 + } + + /* 扩容哈希表 */ + func extend() { + // 暂存原哈希表 + let bucketsTmp = buckets + // 初始化扩容后的新哈希表 + capacity *= extendRatio + buckets = Array(repeating: [], count: capacity) + size = 0 + // 将键值对从原哈希表搬运至新哈希表 + for bucket in bucketsTmp { + for pair in bucket { + put(key: pair.key, val: pair.val) + } + } + } + + /* 打印哈希表 */ + func print() { + for bucket in buckets { + let res = bucket.map { "\($0.key) -> \($0.val)" } + Swift.print(res) + } + } + } + ``` + +=== "JS" + + ```javascript title="hash_map_chaining.js" + /* 链式地址哈希表 */ + class HashMapChaining { + #size; // 键值对数量 + #capacity; // 哈希表容量 + #loadThres; // 触发扩容的负载因子阈值 + #extendRatio; // 扩容倍数 + #buckets; // 桶数组 + + /* 构造方法 */ + constructor() { + this.#size = 0; + this.#capacity = 4; + this.#loadThres = 2.0 / 3.0; + this.#extendRatio = 2; + this.#buckets = new Array(this.#capacity).fill(null).map((x) => []); + } + + /* 哈希函数 */ + #hashFunc(key) { + return key % this.#capacity; + } + + /* 负载因子 */ + #loadFactor() { + return this.#size / this.#capacity; + } + + /* 查询操作 */ + get(key) { + const index = this.#hashFunc(key); + const bucket = this.#buckets[index]; + // 遍历桶,若找到 key ,则返回对应 val + for (const pair of bucket) { + if (pair.key === key) { + return pair.val; + } + } + // 若未找到 key ,则返回 null + return null; + } + + /* 添加操作 */ + put(key, val) { + // 当负载因子超过阈值时,执行扩容 + if (this.#loadFactor() > this.#loadThres) { + this.#extend(); + } + const index = this.#hashFunc(key); + const bucket = this.#buckets[index]; + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for (const pair of bucket) { + if (pair.key === key) { + pair.val = val; + return; + } + } + // 若无该 key ,则将键值对添加至尾部 + const pair = new Pair(key, val); + bucket.push(pair); + this.#size++; + } + + /* 删除操作 */ + remove(key) { + const index = this.#hashFunc(key); + let bucket = this.#buckets[index]; + // 遍历桶,从中删除键值对 + for (let i = 0; i < bucket.length; i++) { + if (bucket[i].key === key) { + bucket.splice(i, 1); + this.#size--; + break; + } + } + } + + /* 扩容哈希表 */ + #extend() { + // 暂存原哈希表 + const bucketsTmp = this.#buckets; + // 初始化扩容后的新哈希表 + this.#capacity *= this.#extendRatio; + this.#buckets = new Array(this.#capacity).fill(null).map((x) => []); + this.#size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (const bucket of bucketsTmp) { + for (const pair of bucket) { + this.put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + print() { + for (const bucket of this.#buckets) { + let res = []; + for (const pair of bucket) { + res.push(pair.key + ' -> ' + pair.val); + } + console.log(res); + } + } + } + ``` + +=== "TS" + + ```typescript title="hash_map_chaining.ts" + /* 链式地址哈希表 */ + class HashMapChaining { + #size: number; // 键值对数量 + #capacity: number; // 哈希表容量 + #loadThres: number; // 
触发扩容的负载因子阈值 + #extendRatio: number; // 扩容倍数 + #buckets: Pair[][]; // 桶数组 + + /* 构造方法 */ + constructor() { + this.#size = 0; + this.#capacity = 4; + this.#loadThres = 2.0 / 3.0; + this.#extendRatio = 2; + this.#buckets = new Array(this.#capacity).fill(null).map((x) => []); + } + + /* 哈希函数 */ + #hashFunc(key: number): number { + return key % this.#capacity; + } + + /* 负载因子 */ + #loadFactor(): number { + return this.#size / this.#capacity; + } + + /* 查询操作 */ + get(key: number): string | null { + const index = this.#hashFunc(key); + const bucket = this.#buckets[index]; + // 遍历桶,若找到 key ,则返回对应 val + for (const pair of bucket) { + if (pair.key === key) { + return pair.val; + } + } + // 若未找到 key ,则返回 null + return null; + } + + /* 添加操作 */ + put(key: number, val: string): void { + // 当负载因子超过阈值时,执行扩容 + if (this.#loadFactor() > this.#loadThres) { + this.#extend(); + } + const index = this.#hashFunc(key); + const bucket = this.#buckets[index]; + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for (const pair of bucket) { + if (pair.key === key) { + pair.val = val; + return; + } + } + // 若无该 key ,则将键值对添加至尾部 + const pair = new Pair(key, val); + bucket.push(pair); + this.#size++; + } + + /* 删除操作 */ + remove(key: number): void { + const index = this.#hashFunc(key); + let bucket = this.#buckets[index]; + // 遍历桶,从中删除键值对 + for (let i = 0; i < bucket.length; i++) { + if (bucket[i].key === key) { + bucket.splice(i, 1); + this.#size--; + break; + } + } + } + + /* 扩容哈希表 */ + #extend(): void { + // 暂存原哈希表 + const bucketsTmp = this.#buckets; + // 初始化扩容后的新哈希表 + this.#capacity *= this.#extendRatio; + this.#buckets = new Array(this.#capacity).fill(null).map((x) => []); + this.#size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (const bucket of bucketsTmp) { + for (const pair of bucket) { + this.put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + print(): void { + for (const bucket of this.#buckets) { + let res = []; + for (const pair of bucket) { + res.push(pair.key + ' -> ' + pair.val); + } + console.log(res); + } + } + } + ``` + +=== "Dart" + + ```dart title="hash_map_chaining.dart" + /* 链式地址哈希表 */ + class HashMapChaining { + late int size; // 键值对数量 + late int capacity; // 哈希表容量 + late double loadThres; // 触发扩容的负载因子阈值 + late int extendRatio; // 扩容倍数 + late List> buckets; // 桶数组 + + /* 构造方法 */ + HashMapChaining() { + size = 0; + capacity = 4; + loadThres = 2.0 / 3.0; + extendRatio = 2; + buckets = List.generate(capacity, (_) => []); + } + + /* 哈希函数 */ + int hashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + double loadFactor() { + return size / capacity; + } + + /* 查询操作 */ + String? 
get(int key) { + int index = hashFunc(key); + List bucket = buckets[index]; + // 遍历桶,若找到 key ,则返回对应 val + for (Pair pair in bucket) { + if (pair.key == key) { + return pair.val; + } + } + // 若未找到 key ,则返回 null + return null; + } + + /* 添加操作 */ + void put(int key, String val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor() > loadThres) { + extend(); + } + int index = hashFunc(key); + List bucket = buckets[index]; + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for (Pair pair in bucket) { + if (pair.key == key) { + pair.val = val; + return; + } + } + // 若无该 key ,则将键值对添加至尾部 + Pair pair = Pair(key, val); + bucket.add(pair); + size++; + } + + /* 删除操作 */ + void remove(int key) { + int index = hashFunc(key); + List bucket = buckets[index]; + // 遍历桶,从中删除键值对 + for (Pair pair in bucket) { + if (pair.key == key) { + bucket.remove(pair); + size--; + break; + } + } + } + + /* 扩容哈希表 */ + void extend() { + // 暂存原哈希表 + List> bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets = List.generate(capacity, (_) => []); + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (List bucket in bucketsTmp) { + for (Pair pair in bucket) { + put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + void printHashMap() { + for (List bucket in buckets) { + List res = []; + for (Pair pair in bucket) { + res.add("${pair.key} -> ${pair.val}"); + } + print(res); + } + } + } + ``` + +=== "Rust" + + ```rust title="hash_map_chaining.rs" + /* 链式地址哈希表 */ + struct HashMapChaining { + size: i32, + capacity: i32, + load_thres: f32, + extend_ratio: i32, + buckets: Vec>, + } + + impl HashMapChaining { + /* 构造方法 */ + fn new() -> Self { + Self { + size: 0, + capacity: 4, + load_thres: 2.0 / 3.0, + extend_ratio: 2, + buckets: vec![vec![]; 4], + } + } + + /* 哈希函数 */ + fn hash_func(&self, key: i32) -> usize { + key as usize % self.capacity as usize + } + + /* 负载因子 */ + fn load_factor(&self) -> f32 { + self.size as f32 / self.capacity as f32 + } + + /* 删除操作 */ + fn remove(&mut self, key: i32) -> Option { + let index = self.hash_func(key); + let bucket = &mut self.buckets[index]; + + // 遍历桶,从中删除键值对 + for i in 0..bucket.len() { + if bucket[i].key == key { + let pair = bucket.remove(i); + self.size -= 1; + return Some(pair.val); + } + } + + // 若未找到 key ,则返回 None + None + } + + /* 扩容哈希表 */ + fn extend(&mut self) { + // 暂存原哈希表 + let buckets_tmp = std::mem::replace(&mut self.buckets, vec![]); + + // 初始化扩容后的新哈希表 + self.capacity *= self.extend_ratio; + self.buckets = vec![Vec::new(); self.capacity as usize]; + self.size = 0; + + // 将键值对从原哈希表搬运至新哈希表 + for bucket in buckets_tmp { + for pair in bucket { + self.put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + fn print(&self) { + for bucket in &self.buckets { + let mut res = Vec::new(); + for pair in bucket { + res.push(format!("{} -> {}", pair.key, pair.val)); + } + println!("{:?}", res); + } + } + + /* 添加操作 */ + fn put(&mut self, key: i32, val: String) { + // 当负载因子超过阈值时,执行扩容 + if self.load_factor() > self.load_thres { + self.extend(); + } + + let index = self.hash_func(key); + let bucket = &mut self.buckets[index]; + + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + for pair in bucket { + if pair.key == key { + pair.val = val.clone(); + return; + } + } + let bucket = &mut self.buckets[index]; + + // 若无该 key ,则将键值对添加至尾部 + let pair = Pair { + key, + val: val.clone(), + }; + bucket.push(pair); + self.size += 1; + } + + /* 查询操作 */ + fn get(&self, key: i32) -> Option<&str> { + let index = self.hash_func(key); + let bucket = &self.buckets[index]; + + // 遍历桶,若找到 key ,则返回对应 val + for pair in bucket { + if pair.key == key { + 
return Some(&pair.val); + } + } + + // 若未找到 key ,则返回 None + None + } + } + ``` + +=== "C" + + ```c title="hash_map_chaining.c" + /* 链表节点 */ + typedef struct Node { + Pair *pair; + struct Node *next; + } Node; + + /* 链式地址哈希表 */ + typedef struct { + int size; // 键值对数量 + int capacity; // 哈希表容量 + double loadThres; // 触发扩容的负载因子阈值 + int extendRatio; // 扩容倍数 + Node **buckets; // 桶数组 + } HashMapChaining; + + /* 构造函数 */ + HashMapChaining *newHashMapChaining() { + HashMapChaining *hashMap = (HashMapChaining *)malloc(sizeof(HashMapChaining)); + hashMap->size = 0; + hashMap->capacity = 4; + hashMap->loadThres = 2.0 / 3.0; + hashMap->extendRatio = 2; + hashMap->buckets = (Node **)malloc(hashMap->capacity * sizeof(Node *)); + for (int i = 0; i < hashMap->capacity; i++) { + hashMap->buckets[i] = NULL; + } + return hashMap; + } + + /* 析构函数 */ + void delHashMapChaining(HashMapChaining *hashMap) { + for (int i = 0; i < hashMap->capacity; i++) { + Node *cur = hashMap->buckets[i]; + while (cur) { + Node *tmp = cur; + cur = cur->next; + free(tmp->pair); + free(tmp); + } + } + free(hashMap->buckets); + free(hashMap); + } + + /* 哈希函数 */ + int hashFunc(HashMapChaining *hashMap, int key) { + return key % hashMap->capacity; + } + + /* 负载因子 */ + double loadFactor(HashMapChaining *hashMap) { + return (double)hashMap->size / (double)hashMap->capacity; + } + + /* 查询操作 */ + char *get(HashMapChaining *hashMap, int key) { + int index = hashFunc(hashMap, key); + // 遍历桶,若找到 key ,则返回对应 val + Node *cur = hashMap->buckets[index]; + while (cur) { + if (cur->pair->key == key) { + return cur->pair->val; + } + cur = cur->next; + } + return ""; // 若未找到 key ,则返回空字符串 + } + + /* 添加操作 */ + void put(HashMapChaining *hashMap, int key, const char *val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor(hashMap) > hashMap->loadThres) { + extend(hashMap); + } + int index = hashFunc(hashMap, key); + // 遍历桶,若遇到指定 key ,则更新对应 val 并返回 + Node *cur = hashMap->buckets[index]; + while (cur) { + if (cur->pair->key == key) { + strcpy(cur->pair->val, val); // 若遇到指定 key ,则更新对应 val 并返回 + return; + } + cur = cur->next; + } + // 若无该 key ,则将键值对添加至链表头部 + Pair *newPair = (Pair *)malloc(sizeof(Pair)); + newPair->key = key; + strcpy(newPair->val, val); + Node *newNode = (Node *)malloc(sizeof(Node)); + newNode->pair = newPair; + newNode->next = hashMap->buckets[index]; + hashMap->buckets[index] = newNode; + hashMap->size++; + } + + /* 扩容哈希表 */ + void extend(HashMapChaining *hashMap) { + // 暂存原哈希表 + int oldCapacity = hashMap->capacity; + Node **oldBuckets = hashMap->buckets; + // 初始化扩容后的新哈希表 + hashMap->capacity *= hashMap->extendRatio; + hashMap->buckets = (Node **)malloc(hashMap->capacity * sizeof(Node *)); + for (int i = 0; i < hashMap->capacity; i++) { + hashMap->buckets[i] = NULL; + } + hashMap->size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (int i = 0; i < oldCapacity; i++) { + Node *cur = oldBuckets[i]; + while (cur) { + put(hashMap, cur->pair->key, cur->pair->val); + Node *temp = cur; + cur = cur->next; + // 释放内存 + free(temp->pair); + free(temp); + } + } + + free(oldBuckets); + } + + /* 删除操作 */ + void removeItem(HashMapChaining *hashMap, int key) { + int index = hashFunc(hashMap, key); + Node *cur = hashMap->buckets[index]; + Node *pre = NULL; + while (cur) { + if (cur->pair->key == key) { + // 从中删除键值对 + if (pre) { + pre->next = cur->next; + } else { + hashMap->buckets[index] = cur->next; + } + // 释放内存 + free(cur->pair); + free(cur); + hashMap->size--; + return; + } + pre = cur; + cur = cur->next; + } + } + + /* 打印哈希表 */ + void print(HashMapChaining *hashMap) { + for (int i = 
0; i < hashMap->capacity; i++) { + Node *cur = hashMap->buckets[i]; + printf("["); + while (cur) { + printf("%d -> %s, ", cur->pair->key, cur->pair->val); + cur = cur->next; + } + printf("]\n"); + } + } + ``` + +=== "Zig" + + ```zig title="hash_map_chaining.zig" + [class]{HashMapChaining}-[func]{} + ``` + +??? pythontutor "Code Visualization" + +
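To see the chaining hash table above in action, here is a brief usage sketch. It assumes the Python `HashMapChaining` class from the listing is available in the current scope; the two keys are hypothetical and chosen so that they collide under the initial capacity of 4 and therefore end up chained in the same bucket.

```python
# Usage sketch for the chaining hash table above (assumes HashMapChaining is defined)
hmap = HashMapChaining()

# 12836 % 4 == 0 and 20336 % 4 == 0, so both pairs land in bucket 0 and are chained
hmap.put(12836, "Xiao Ha")
hmap.put(20336, "Xiao Luo")

print(hmap.get(12836))  # Xiao Ha
print(hmap.get(20336))  # Xiao Luo

# Removing one key leaves the other pair in the same bucket untouched
hmap.remove(12836)
print(hmap.get(12836))  # None
print(hmap.get(20336))  # Xiao Luo
```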
+ + +It's worth noting that when the linked list is very long, the query efficiency of $O(n)$ is poor. **In this case, the linked list can be converted to an "AVL tree" or "Red-Black tree"** to optimize the time complexity of the query operation to $O(\log n)$. + +## 6.2.2 &nbsp; Open Addressing + +"Open addressing" does not introduce additional data structures; instead, it handles hash collisions through "multiple probes". The probing methods mainly include linear probing, quadratic probing, and double hashing. + +Let's use linear probing as an example to introduce the mechanism of open addressing hash tables. + +### 1. &nbsp; Linear Probing + +Unlike an ordinary hash table, linear probing resolves collisions with a fixed-step linear search. + +- **Inserting Elements**: Calculate the bucket index using the hash function. If the bucket already contains an element, linearly traverse forward from the conflict position (usually with a step size of $1$) until an empty bucket is found, then insert the element. +- **Searching for Elements**: When a hash collision occurs, traverse forward with the same step size until the corresponding element is found and return its `value`; if an empty bucket is encountered, the target element is not in the hash table, so return `None`. + +Figure 6-6 shows the distribution of key-value pairs in an open addressing (linear probing) hash table. Under this hash function, keys whose last two digits are the same are mapped to the same bucket; through linear probing, they are stored consecutively in that bucket and the buckets below it. A minimal code sketch of this probing loop follows the figure. + +![Distribution of Key-Value Pairs in Open Addressing (Linear Probing) Hash Table](hash_collision.assets/hash_table_linear_probing.png){ class="animation-figure" } + +

Figure 6-6   Distribution of Key-Value Pairs in Open Addressing (Linear Probing) Hash Table

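The probing loop described above takes only a few lines. The following is a minimal sketch, not the book's implementation: it uses a plain Python list as the bucket array, assumes the table never becomes completely full, and ignores resizing and deletion, which the full implementation later in this section handles.

```python
# Minimal linear probing sketch: fixed-size bucket array, no resizing or deletion
capacity = 100
buckets: list[tuple[int, str] | None] = [None] * capacity  # each bucket holds (key, val) or None

def put(key: int, val: str):
    index = key % capacity
    # Probe forward with step size 1, wrapping around at the end of the array
    while buckets[index] is not None and buckets[index][0] != key:
        index = (index + 1) % capacity
    buckets[index] = (key, val)

def get(key: int) -> str | None:
    index = key % capacity
    # Stop at the first empty bucket: the key cannot be further along the probe chain
    while buckets[index] is not None:
        if buckets[index][0] == key:
            return buckets[index][1]
        index = (index + 1) % capacity
    return None
```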
+ +However, **linear probing tends to create "clustering"**. Specifically, the longer a continuous run of occupied positions in the array grows, the more likely new keys are to collide within it, which in turn feeds the growth of the cluster and eventually degrades the efficiency of operations. + +It's important to note that **we cannot directly delete elements in an open addressing hash table**. Deleting an element leaves an empty bucket `None` in the array. When searching for elements, if linear probing reaches this empty bucket it stops and returns, so the elements stored beyond this bucket become unreachable; the program may incorrectly assume they do not exist, as shown in Figure 6-7. A short sketch after the figure reproduces this effect. + +![Query Issues Caused by Deletion in Open Addressing](hash_collision.assets/hash_table_open_addressing_deletion.png){ class="animation-figure" } + +

Figure 6-7   Query Issues Caused by Deletion in Open Addressing

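The few lines below reproduce the problem illustrated in Figure 6-7 in code. The keys are hypothetical and chosen only so that their last two digits collide under `key % 100`; after naively clearing the first key's bucket to `None`, the second key can no longer be found even though it is still stored in the array.

```python
# Naive deletion breaks linear probing lookups (standalone sketch)
capacity = 100
buckets: list[tuple[int, str] | None] = [None] * capacity
buckets[36] = (12836, "Xiao Ha")   # 12836 % 100 == 36
buckets[37] = (20936, "Xiao Luo")  # 20936 % 100 == 36 too, placed at 37 by linear probing

def get(key: int) -> str | None:
    index = key % capacity
    while buckets[index] is not None:
        if buckets[index][0] == key:
            return buckets[index][1]
        index = (index + 1) % capacity
    return None

buckets[36] = None       # "delete" key 12836 by clearing its bucket
print(get(20936))        # None: the search stops at the empty bucket 36 and misses 20936
```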
+ +To solve this problem, we can use a "lazy deletion" mechanism: instead of directly removing elements from the hash table, **use a constant `TOMBSTONE` to mark the bucket**. In this mechanism, both `None` and `TOMBSTONE` represent empty buckets and can hold key-value pairs. However, when linear probing encounters `TOMBSTONE`, it should continue traversing since there may still be key-value pairs below it. + +However, **lazy deletion may accelerate the degradation of hash table performance**. Every deletion operation produces a delete mark, and as `TOMBSTONE` increases, so does the search time, as linear probing may have to skip multiple `TOMBSTONE` to find the target element. + +Therefore, consider recording the index of the first `TOMBSTONE` encountered during linear probing and swapping the target element found with this `TOMBSTONE`. The advantage of this is that each time a query or addition is performed, the element is moved to a bucket closer to the ideal position (starting point of probing), thereby optimizing the query efficiency. + +The code below implements an open addressing (linear probing) hash table with lazy deletion. To make fuller use of the hash table space, we treat the hash table as a "circular array," continuing to traverse from the beginning when the end of the array is passed. + +=== "Python" + + ```python title="hash_map_open_addressing.py" + class HashMapOpenAddressing: + """开放寻址哈希表""" + + def __init__(self): + """构造方法""" + self.size = 0 # 键值对数量 + self.capacity = 4 # 哈希表容量 + self.load_thres = 2.0 / 3.0 # 触发扩容的负载因子阈值 + self.extend_ratio = 2 # 扩容倍数 + self.buckets: list[Pair | None] = [None] * self.capacity # 桶数组 + self.TOMBSTONE = Pair(-1, "-1") # 删除标记 + + def hash_func(self, key: int) -> int: + """哈希函数""" + return key % self.capacity + + def load_factor(self) -> float: + """负载因子""" + return self.size / self.capacity + + def find_bucket(self, key: int) -> int: + """搜索 key 对应的桶索引""" + index = self.hash_func(key) + first_tombstone = -1 + # 线性探测,当遇到空桶时跳出 + while self.buckets[index] is not None: + # 若遇到 key ,返回对应的桶索引 + if self.buckets[index].key == key: + # 若之前遇到了删除标记,则将键值对移动至该索引处 + if first_tombstone != -1: + self.buckets[first_tombstone] = self.buckets[index] + self.buckets[index] = self.TOMBSTONE + return first_tombstone # 返回移动后的桶索引 + return index # 返回桶索引 + # 记录遇到的首个删除标记 + if first_tombstone == -1 and self.buckets[index] is self.TOMBSTONE: + first_tombstone = index + # 计算桶索引,越过尾部则返回头部 + index = (index + 1) % self.capacity + # 若 key 不存在,则返回添加点的索引 + return index if first_tombstone == -1 else first_tombstone + + def get(self, key: int) -> str: + """查询操作""" + # 搜索 key 对应的桶索引 + index = self.find_bucket(key) + # 若找到键值对,则返回对应 val + if self.buckets[index] not in [None, self.TOMBSTONE]: + return self.buckets[index].val + # 若键值对不存在,则返回 None + return None + + def put(self, key: int, val: str): + """添加操作""" + # 当负载因子超过阈值时,执行扩容 + if self.load_factor() > self.load_thres: + self.extend() + # 搜索 key 对应的桶索引 + index = self.find_bucket(key) + # 若找到键值对,则覆盖 val 并返回 + if self.buckets[index] not in [None, self.TOMBSTONE]: + self.buckets[index].val = val + return + # 若键值对不存在,则添加该键值对 + self.buckets[index] = Pair(key, val) + self.size += 1 + + def remove(self, key: int): + """删除操作""" + # 搜索 key 对应的桶索引 + index = self.find_bucket(key) + # 若找到键值对,则用删除标记覆盖它 + if self.buckets[index] not in [None, self.TOMBSTONE]: + self.buckets[index] = self.TOMBSTONE + self.size -= 1 + + def extend(self): + """扩容哈希表""" + # 暂存原哈希表 + buckets_tmp = self.buckets + # 初始化扩容后的新哈希表 + self.capacity *= self.extend_ratio + 
self.buckets = [None] * self.capacity + self.size = 0 + # 将键值对从原哈希表搬运至新哈希表 + for pair in buckets_tmp: + if pair not in [None, self.TOMBSTONE]: + self.put(pair.key, pair.val) + + def print(self): + """打印哈希表""" + for pair in self.buckets: + if pair is None: + print("None") + elif pair is self.TOMBSTONE: + print("TOMBSTONE") + else: + print(pair.key, "->", pair.val) + ``` + +=== "C++" + + ```cpp title="hash_map_open_addressing.cpp" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + private: + int size; // 键值对数量 + int capacity = 4; // 哈希表容量 + const double loadThres = 2.0 / 3.0; // 触发扩容的负载因子阈值 + const int extendRatio = 2; // 扩容倍数 + vector buckets; // 桶数组 + Pair *TOMBSTONE = new Pair(-1, "-1"); // 删除标记 + + public: + /* 构造方法 */ + HashMapOpenAddressing() : size(0), buckets(capacity, nullptr) { + } + + /* 析构方法 */ + ~HashMapOpenAddressing() { + for (Pair *pair : buckets) { + if (pair != nullptr && pair != TOMBSTONE) { + delete pair; + } + } + delete TOMBSTONE; + } + + /* 哈希函数 */ + int hashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + double loadFactor() { + return (double)size / capacity; + } + + /* 搜索 key 对应的桶索引 */ + int findBucket(int key) { + int index = hashFunc(key); + int firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (buckets[index] != nullptr) { + // 若遇到 key ,返回对应的桶索引 + if (buckets[index]->key == key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone != -1) { + buckets[firstTombstone] = buckets[index]; + buckets[index] = TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if (firstTombstone == -1 && buckets[index] == TOMBSTONE) { + firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone == -1 ? index : firstTombstone; + } + + /* 查询操作 */ + string get(int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则返回对应 val + if (buckets[index] != nullptr && buckets[index] != TOMBSTONE) { + return buckets[index]->val; + } + // 若键值对不存在,则返回空字符串 + return ""; + } + + /* 添加操作 */ + void put(int key, string val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor() > loadThres) { + extend(); + } + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则覆盖 val 并返回 + if (buckets[index] != nullptr && buckets[index] != TOMBSTONE) { + buckets[index]->val = val; + return; + } + // 若键值对不存在,则添加该键值对 + buckets[index] = new Pair(key, val); + size++; + } + + /* 删除操作 */ + void remove(int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则用删除标记覆盖它 + if (buckets[index] != nullptr && buckets[index] != TOMBSTONE) { + delete buckets[index]; + buckets[index] = TOMBSTONE; + size--; + } + } + + /* 扩容哈希表 */ + void extend() { + // 暂存原哈希表 + vector bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets = vector(capacity, nullptr); + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (Pair *pair : bucketsTmp) { + if (pair != nullptr && pair != TOMBSTONE) { + put(pair->key, pair->val); + delete pair; + } + } + } + + /* 打印哈希表 */ + void print() { + for (Pair *pair : buckets) { + if (pair == nullptr) { + cout << "nullptr" << endl; + } else if (pair == TOMBSTONE) { + cout << "TOMBSTONE" << endl; + } else { + cout << pair->key << " -> " << pair->val << endl; + } + } + } + }; + ``` + +=== "Java" + + ```java title="hash_map_open_addressing.java" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + private int size; // 键值对数量 + private int capacity = 4; // 哈希表容量 + private final double loadThres = 2.0 / 3.0; // 触发扩容的负载因子阈值 + private final int extendRatio = 2; // 扩容倍数 
+ private Pair[] buckets; // 桶数组 + private final Pair TOMBSTONE = new Pair(-1, "-1"); // 删除标记 + + /* 构造方法 */ + public HashMapOpenAddressing() { + size = 0; + buckets = new Pair[capacity]; + } + + /* 哈希函数 */ + private int hashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + private double loadFactor() { + return (double) size / capacity; + } + + /* 搜索 key 对应的桶索引 */ + private int findBucket(int key) { + int index = hashFunc(key); + int firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (buckets[index] != null) { + // 若遇到 key ,返回对应的桶索引 + if (buckets[index].key == key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone != -1) { + buckets[firstTombstone] = buckets[index]; + buckets[index] = TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if (firstTombstone == -1 && buckets[index] == TOMBSTONE) { + firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone == -1 ? index : firstTombstone; + } + + /* 查询操作 */ + public String get(int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则返回对应 val + if (buckets[index] != null && buckets[index] != TOMBSTONE) { + return buckets[index].val; + } + // 若键值对不存在,则返回 null + return null; + } + + /* 添加操作 */ + public void put(int key, String val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor() > loadThres) { + extend(); + } + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则覆盖 val 并返回 + if (buckets[index] != null && buckets[index] != TOMBSTONE) { + buckets[index].val = val; + return; + } + // 若键值对不存在,则添加该键值对 + buckets[index] = new Pair(key, val); + size++; + } + + /* 删除操作 */ + public void remove(int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则用删除标记覆盖它 + if (buckets[index] != null && buckets[index] != TOMBSTONE) { + buckets[index] = TOMBSTONE; + size--; + } + } + + /* 扩容哈希表 */ + private void extend() { + // 暂存原哈希表 + Pair[] bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets = new Pair[capacity]; + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (Pair pair : bucketsTmp) { + if (pair != null && pair != TOMBSTONE) { + put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + public void print() { + for (Pair pair : buckets) { + if (pair == null) { + System.out.println("null"); + } else if (pair == TOMBSTONE) { + System.out.println("TOMBSTONE"); + } else { + System.out.println(pair.key + " -> " + pair.val); + } + } + } + } + ``` + +=== "C#" + + ```csharp title="hash_map_open_addressing.cs" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + int size; // 键值对数量 + int capacity = 4; // 哈希表容量 + double loadThres = 2.0 / 3.0; // 触发扩容的负载因子阈值 + int extendRatio = 2; // 扩容倍数 + Pair[] buckets; // 桶数组 + Pair TOMBSTONE = new(-1, "-1"); // 删除标记 + + /* 构造方法 */ + public HashMapOpenAddressing() { + size = 0; + buckets = new Pair[capacity]; + } + + /* 哈希函数 */ + int HashFunc(int key) { + return key % capacity; + } + + /* 负载因子 */ + double LoadFactor() { + return (double)size / capacity; + } + + /* 搜索 key 对应的桶索引 */ + int FindBucket(int key) { + int index = HashFunc(key); + int firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (buckets[index] != null) { + // 若遇到 key ,返回对应的桶索引 + if (buckets[index].key == key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone != -1) { + buckets[firstTombstone] = buckets[index]; + buckets[index] = TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if (firstTombstone == -1 && buckets[index] == TOMBSTONE) { + 
firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone == -1 ? index : firstTombstone; + } + + /* 查询操作 */ + public string? Get(int key) { + // 搜索 key 对应的桶索引 + int index = FindBucket(key); + // 若找到键值对,则返回对应 val + if (buckets[index] != null && buckets[index] != TOMBSTONE) { + return buckets[index].val; + } + // 若键值对不存在,则返回 null + return null; + } + + /* 添加操作 */ + public void Put(int key, string val) { + // 当负载因子超过阈值时,执行扩容 + if (LoadFactor() > loadThres) { + Extend(); + } + // 搜索 key 对应的桶索引 + int index = FindBucket(key); + // 若找到键值对,则覆盖 val 并返回 + if (buckets[index] != null && buckets[index] != TOMBSTONE) { + buckets[index].val = val; + return; + } + // 若键值对不存在,则添加该键值对 + buckets[index] = new Pair(key, val); + size++; + } + + /* 删除操作 */ + public void Remove(int key) { + // 搜索 key 对应的桶索引 + int index = FindBucket(key); + // 若找到键值对,则用删除标记覆盖它 + if (buckets[index] != null && buckets[index] != TOMBSTONE) { + buckets[index] = TOMBSTONE; + size--; + } + } + + /* 扩容哈希表 */ + void Extend() { + // 暂存原哈希表 + Pair[] bucketsTmp = buckets; + // 初始化扩容后的新哈希表 + capacity *= extendRatio; + buckets = new Pair[capacity]; + size = 0; + // 将键值对从原哈希表搬运至新哈希表 + foreach (Pair pair in bucketsTmp) { + if (pair != null && pair != TOMBSTONE) { + Put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + public void Print() { + foreach (Pair pair in buckets) { + if (pair == null) { + Console.WriteLine("null"); + } else if (pair == TOMBSTONE) { + Console.WriteLine("TOMBSTONE"); + } else { + Console.WriteLine(pair.key + " -> " + pair.val); + } + } + } + } + ``` + +=== "Go" + + ```go title="hash_map_open_addressing.go" + /* 开放寻址哈希表 */ + type hashMapOpenAddressing struct { + size int // 键值对数量 + capacity int // 哈希表容量 + loadThres float64 // 触发扩容的负载因子阈值 + extendRatio int // 扩容倍数 + buckets []pair // 桶数组 + removed pair // 删除标记 + } + + /* 构造方法 */ + func newHashMapOpenAddressing() *hashMapOpenAddressing { + buckets := make([]pair, 4) + return &hashMapOpenAddressing{ + size: 0, + capacity: 4, + loadThres: 2.0 / 3.0, + extendRatio: 2, + buckets: buckets, + removed: pair{ + key: -1, + val: "-1", + }, + } + } + + /* 哈希函数 */ + func (m *hashMapOpenAddressing) hashFunc(key int) int { + return key % m.capacity + } + + /* 负载因子 */ + func (m *hashMapOpenAddressing) loadFactor() float64 { + return float64(m.size) / float64(m.capacity) + } + + /* 查询操作 */ + func (m *hashMapOpenAddressing) get(key int) string { + idx := m.hashFunc(key) + // 线性探测,从 index 开始向后遍历 + for i := 0; i < m.capacity; i++ { + // 计算桶索引,越过尾部则返回头部 + j := (idx + i) % m.capacity + // 若遇到空桶,说明无此 key ,则返回 null + if m.buckets[j] == (pair{}) { + return "" + } + // 若遇到指定 key ,则返回对应 val + if m.buckets[j].key == key && m.buckets[j] != m.removed { + return m.buckets[j].val + } + } + // 若未找到 key ,则返回空字符串 + return "" + } + + /* 添加操作 */ + func (m *hashMapOpenAddressing) put(key int, val string) { + // 当负载因子超过阈值时,执行扩容 + if m.loadFactor() > m.loadThres { + m.extend() + } + idx := m.hashFunc(key) + // 线性探测,从 index 开始向后遍历 + for i := 0; i < m.capacity; i++ { + // 计算桶索引,越过尾部则返回头部 + j := (idx + i) % m.capacity + // 若遇到空桶、或带有删除标记的桶,则将键值对放入该桶 + if m.buckets[j] == (pair{}) || m.buckets[j] == m.removed { + m.buckets[j] = pair{ + key: key, + val: val, + } + m.size += 1 + return + } + // 若遇到指定 key ,则更新对应 val + if m.buckets[j].key == key { + m.buckets[j].val = val + return + } + } + } + + /* 删除操作 */ + func (m *hashMapOpenAddressing) remove(key int) { + idx := m.hashFunc(key) + // 遍历桶,从中删除键值对 + // 线性探测,从 index 开始向后遍历 + for i := 0; i < 
m.capacity; i++ { + // 计算桶索引,越过尾部则返回头部 + j := (idx + i) % m.capacity + // 若遇到空桶,说明无此 key ,则直接返回 + if m.buckets[j] == (pair{}) { + return + } + // 若遇到指定 key ,则标记删除并返回 + if m.buckets[j].key == key { + m.buckets[j] = m.removed + m.size -= 1 + } + } + } + + /* 扩容哈希表 */ + func (m *hashMapOpenAddressing) extend() { + // 暂存原哈希表 + tmpBuckets := make([]pair, len(m.buckets)) + copy(tmpBuckets, m.buckets) + + // 初始化扩容后的新哈希表 + m.capacity *= m.extendRatio + m.buckets = make([]pair, m.capacity) + m.size = 0 + // 将键值对从原哈希表搬运至新哈希表 + for _, p := range tmpBuckets { + if p != (pair{}) && p != m.removed { + m.put(p.key, p.val) + } + } + } + + /* 打印哈希表 */ + func (m *hashMapOpenAddressing) print() { + for _, p := range m.buckets { + if p != (pair{}) { + fmt.Println(strconv.Itoa(p.key) + " -> " + p.val) + } else { + fmt.Println("nil") + } + } + } + ``` + +=== "Swift" + + ```swift title="hash_map_open_addressing.swift" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + var size: Int // 键值对数量 + var capacity: Int // 哈希表容量 + var loadThres: Double // 触发扩容的负载因子阈值 + var extendRatio: Int // 扩容倍数 + var buckets: [Pair?] // 桶数组 + var TOMBSTONE: Pair // 删除标记 + + /* 构造方法 */ + init() { + size = 0 + capacity = 4 + loadThres = 2.0 / 3.0 + extendRatio = 2 + buckets = Array(repeating: nil, count: capacity) + TOMBSTONE = Pair(key: -1, val: "-1") + } + + /* 哈希函数 */ + func hashFunc(key: Int) -> Int { + key % capacity + } + + /* 负载因子 */ + func loadFactor() -> Double { + Double(size / capacity) + } + + /* 搜索 key 对应的桶索引 */ + func findBucket(key: Int) -> Int { + var index = hashFunc(key: key) + var firstTombstone = -1 + // 线性探测,当遇到空桶时跳出 + while buckets[index] != nil { + // 若遇到 key ,返回对应的桶索引 + if buckets[index]!.key == key { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if firstTombstone != -1 { + buckets[firstTombstone] = buckets[index] + buckets[index] = TOMBSTONE + return firstTombstone // 返回移动后的桶索引 + } + return index // 返回桶索引 + } + // 记录遇到的首个删除标记 + if firstTombstone == -1 && buckets[index] == TOMBSTONE { + firstTombstone = index + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % capacity + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone == -1 ? index : firstTombstone + } + + /* 查询操作 */ + func get(key: Int) -> String? 
{ + // 搜索 key 对应的桶索引 + let index = findBucket(key: key) + // 若找到键值对,则返回对应 val + if buckets[index] != nil, buckets[index] != TOMBSTONE { + return buckets[index]!.val + } + // 若键值对不存在,则返回 null + return nil + } + + /* 添加操作 */ + func put(key: Int, val: String) { + // 当负载因子超过阈值时,执行扩容 + if loadFactor() > loadThres { + extend() + } + // 搜索 key 对应的桶索引 + let index = findBucket(key: key) + // 若找到键值对,则覆盖 val 并返回 + if buckets[index] != nil, buckets[index] != TOMBSTONE { + buckets[index]!.val = val + return + } + // 若键值对不存在,则添加该键值对 + buckets[index] = Pair(key: key, val: val) + size += 1 + } + + /* 删除操作 */ + func remove(key: Int) { + // 搜索 key 对应的桶索引 + let index = findBucket(key: key) + // 若找到键值对,则用删除标记覆盖它 + if buckets[index] != nil, buckets[index] != TOMBSTONE { + buckets[index] = TOMBSTONE + size -= 1 + } + } + + /* 扩容哈希表 */ + func extend() { + // 暂存原哈希表 + let bucketsTmp = buckets + // 初始化扩容后的新哈希表 + capacity *= extendRatio + buckets = Array(repeating: nil, count: capacity) + size = 0 + // 将键值对从原哈希表搬运至新哈希表 + for pair in bucketsTmp { + if let pair, pair != TOMBSTONE { + put(key: pair.key, val: pair.val) + } + } + } + + /* 打印哈希表 */ + func print() { + for pair in buckets { + if pair == nil { + Swift.print("null") + } else if pair == TOMBSTONE { + Swift.print("TOMBSTONE") + } else { + Swift.print("\(pair!.key) -> \(pair!.val)") + } + } + } + } + ``` + +=== "JS" + + ```javascript title="hash_map_open_addressing.js" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + #size; // 键值对数量 + #capacity; // 哈希表容量 + #loadThres; // 触发扩容的负载因子阈值 + #extendRatio; // 扩容倍数 + #buckets; // 桶数组 + #TOMBSTONE; // 删除标记 + + /* 构造方法 */ + constructor() { + this.#size = 0; // 键值对数量 + this.#capacity = 4; // 哈希表容量 + this.#loadThres = 2.0 / 3.0; // 触发扩容的负载因子阈值 + this.#extendRatio = 2; // 扩容倍数 + this.#buckets = Array(this.#capacity).fill(null); // 桶数组 + this.#TOMBSTONE = new Pair(-1, '-1'); // 删除标记 + } + + /* 哈希函数 */ + #hashFunc(key) { + return key % this.#capacity; + } + + /* 负载因子 */ + #loadFactor() { + return this.#size / this.#capacity; + } + + /* 搜索 key 对应的桶索引 */ + #findBucket(key) { + let index = this.#hashFunc(key); + let firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (this.#buckets[index] !== null) { + // 若遇到 key ,返回对应的桶索引 + if (this.#buckets[index].key === key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone !== -1) { + this.#buckets[firstTombstone] = this.#buckets[index]; + this.#buckets[index] = this.#TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if ( + firstTombstone === -1 && + this.#buckets[index] === this.#TOMBSTONE + ) { + firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % this.#capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone === -1 ? 
index : firstTombstone; + } + + /* 查询操作 */ + get(key) { + // 搜索 key 对应的桶索引 + const index = this.#findBucket(key); + // 若找到键值对,则返回对应 val + if ( + this.#buckets[index] !== null && + this.#buckets[index] !== this.#TOMBSTONE + ) { + return this.#buckets[index].val; + } + // 若键值对不存在,则返回 null + return null; + } + + /* 添加操作 */ + put(key, val) { + // 当负载因子超过阈值时,执行扩容 + if (this.#loadFactor() > this.#loadThres) { + this.#extend(); + } + // 搜索 key 对应的桶索引 + const index = this.#findBucket(key); + // 若找到键值对,则覆盖 val 并返回 + if ( + this.#buckets[index] !== null && + this.#buckets[index] !== this.#TOMBSTONE + ) { + this.#buckets[index].val = val; + return; + } + // 若键值对不存在,则添加该键值对 + this.#buckets[index] = new Pair(key, val); + this.#size++; + } + + /* 删除操作 */ + remove(key) { + // 搜索 key 对应的桶索引 + const index = this.#findBucket(key); + // 若找到键值对,则用删除标记覆盖它 + if ( + this.#buckets[index] !== null && + this.#buckets[index] !== this.#TOMBSTONE + ) { + this.#buckets[index] = this.#TOMBSTONE; + this.#size--; + } + } + + /* 扩容哈希表 */ + #extend() { + // 暂存原哈希表 + const bucketsTmp = this.#buckets; + // 初始化扩容后的新哈希表 + this.#capacity *= this.#extendRatio; + this.#buckets = Array(this.#capacity).fill(null); + this.#size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (const pair of bucketsTmp) { + if (pair !== null && pair !== this.#TOMBSTONE) { + this.put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + print() { + for (const pair of this.#buckets) { + if (pair === null) { + console.log('null'); + } else if (pair === this.#TOMBSTONE) { + console.log('TOMBSTONE'); + } else { + console.log(pair.key + ' -> ' + pair.val); + } + } + } + } + ``` + +=== "TS" + + ```typescript title="hash_map_open_addressing.ts" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + private size: number; // 键值对数量 + private capacity: number; // 哈希表容量 + private loadThres: number; // 触发扩容的负载因子阈值 + private extendRatio: number; // 扩容倍数 + private buckets: Array; // 桶数组 + private TOMBSTONE: Pair; // 删除标记 + + /* 构造方法 */ + constructor() { + this.size = 0; // 键值对数量 + this.capacity = 4; // 哈希表容量 + this.loadThres = 2.0 / 3.0; // 触发扩容的负载因子阈值 + this.extendRatio = 2; // 扩容倍数 + this.buckets = Array(this.capacity).fill(null); // 桶数组 + this.TOMBSTONE = new Pair(-1, '-1'); // 删除标记 + } + + /* 哈希函数 */ + private hashFunc(key: number): number { + return key % this.capacity; + } + + /* 负载因子 */ + private loadFactor(): number { + return this.size / this.capacity; + } + + /* 搜索 key 对应的桶索引 */ + private findBucket(key: number): number { + let index = this.hashFunc(key); + let firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (this.buckets[index] !== null) { + // 若遇到 key ,返回对应的桶索引 + if (this.buckets[index]!.key === key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone !== -1) { + this.buckets[firstTombstone] = this.buckets[index]; + this.buckets[index] = this.TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if ( + firstTombstone === -1 && + this.buckets[index] === this.TOMBSTONE + ) { + firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % this.capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone === -1 ? 
index : firstTombstone; + } + + /* 查询操作 */ + get(key: number): string | null { + // 搜索 key 对应的桶索引 + const index = this.findBucket(key); + // 若找到键值对,则返回对应 val + if ( + this.buckets[index] !== null && + this.buckets[index] !== this.TOMBSTONE + ) { + return this.buckets[index]!.val; + } + // 若键值对不存在,则返回 null + return null; + } + + /* 添加操作 */ + put(key: number, val: string): void { + // 当负载因子超过阈值时,执行扩容 + if (this.loadFactor() > this.loadThres) { + this.extend(); + } + // 搜索 key 对应的桶索引 + const index = this.findBucket(key); + // 若找到键值对,则覆盖 val 并返回 + if ( + this.buckets[index] !== null && + this.buckets[index] !== this.TOMBSTONE + ) { + this.buckets[index]!.val = val; + return; + } + // 若键值对不存在,则添加该键值对 + this.buckets[index] = new Pair(key, val); + this.size++; + } + + /* 删除操作 */ + remove(key: number): void { + // 搜索 key 对应的桶索引 + const index = this.findBucket(key); + // 若找到键值对,则用删除标记覆盖它 + if ( + this.buckets[index] !== null && + this.buckets[index] !== this.TOMBSTONE + ) { + this.buckets[index] = this.TOMBSTONE; + this.size--; + } + } + + /* 扩容哈希表 */ + private extend(): void { + // 暂存原哈希表 + const bucketsTmp = this.buckets; + // 初始化扩容后的新哈希表 + this.capacity *= this.extendRatio; + this.buckets = Array(this.capacity).fill(null); + this.size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (const pair of bucketsTmp) { + if (pair !== null && pair !== this.TOMBSTONE) { + this.put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + print(): void { + for (const pair of this.buckets) { + if (pair === null) { + console.log('null'); + } else if (pair === this.TOMBSTONE) { + console.log('TOMBSTONE'); + } else { + console.log(pair.key + ' -> ' + pair.val); + } + } + } + } + ``` + +=== "Dart" + + ```dart title="hash_map_open_addressing.dart" + /* 开放寻址哈希表 */ + class HashMapOpenAddressing { + late int _size; // 键值对数量 + int _capacity = 4; // 哈希表容量 + double _loadThres = 2.0 / 3.0; // 触发扩容的负载因子阈值 + int _extendRatio = 2; // 扩容倍数 + late List _buckets; // 桶数组 + Pair _TOMBSTONE = Pair(-1, "-1"); // 删除标记 + + /* 构造方法 */ + HashMapOpenAddressing() { + _size = 0; + _buckets = List.generate(_capacity, (index) => null); + } + + /* 哈希函数 */ + int hashFunc(int key) { + return key % _capacity; + } + + /* 负载因子 */ + double loadFactor() { + return _size / _capacity; + } + + /* 搜索 key 对应的桶索引 */ + int findBucket(int key) { + int index = hashFunc(key); + int firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (_buckets[index] != null) { + // 若遇到 key ,返回对应的桶索引 + if (_buckets[index]!.key == key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone != -1) { + _buckets[firstTombstone] = _buckets[index]; + _buckets[index] = _TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if (firstTombstone == -1 && _buckets[index] == _TOMBSTONE) { + firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % _capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone == -1 ? index : firstTombstone; + } + + /* 查询操作 */ + String? 
get(int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则返回对应 val + if (_buckets[index] != null && _buckets[index] != _TOMBSTONE) { + return _buckets[index]!.val; + } + // 若键值对不存在,则返回 null + return null; + } + + /* 添加操作 */ + void put(int key, String val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor() > _loadThres) { + extend(); + } + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则覆盖 val 并返回 + if (_buckets[index] != null && _buckets[index] != _TOMBSTONE) { + _buckets[index]!.val = val; + return; + } + // 若键值对不存在,则添加该键值对 + _buckets[index] = new Pair(key, val); + _size++; + } + + /* 删除操作 */ + void remove(int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(key); + // 若找到键值对,则用删除标记覆盖它 + if (_buckets[index] != null && _buckets[index] != _TOMBSTONE) { + _buckets[index] = _TOMBSTONE; + _size--; + } + } + + /* 扩容哈希表 */ + void extend() { + // 暂存原哈希表 + List bucketsTmp = _buckets; + // 初始化扩容后的新哈希表 + _capacity *= _extendRatio; + _buckets = List.generate(_capacity, (index) => null); + _size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (Pair? pair in bucketsTmp) { + if (pair != null && pair != _TOMBSTONE) { + put(pair.key, pair.val); + } + } + } + + /* 打印哈希表 */ + void printHashMap() { + for (Pair? pair in _buckets) { + if (pair == null) { + print("null"); + } else if (pair == _TOMBSTONE) { + print("TOMBSTONE"); + } else { + print("${pair.key} -> ${pair.val}"); + } + } + } + } + ``` + +=== "Rust" + + ```rust title="hash_map_open_addressing.rs" + /* 开放寻址哈希表 */ + struct HashMapOpenAddressing { + size: usize, // 键值对数量 + capacity: usize, // 哈希表容量 + load_thres: f64, // 触发扩容的负载因子阈值 + extend_ratio: usize, // 扩容倍数 + buckets: Vec>, // 桶数组 + TOMBSTONE: Option, // 删除标记 + } + + + impl HashMapOpenAddressing { + /* 构造方法 */ + fn new() -> Self { + Self { + size: 0, + capacity: 4, + load_thres: 2.0 / 3.0, + extend_ratio: 2, + buckets: vec![None; 4], + TOMBSTONE: Some(Pair {key: -1, val: "-1".to_string()}), + } + } + + /* 哈希函数 */ + fn hash_func(&self, key: i32) -> usize { + (key % self.capacity as i32) as usize + } + + /* 负载因子 */ + fn load_factor(&self) -> f64 { + self.size as f64 / self.capacity as f64 + } + + /* 搜索 key 对应的桶索引 */ + fn find_bucket(&mut self, key: i32) -> usize { + let mut index = self.hash_func(key); + let mut first_tombstone = -1; + // 线性探测,当遇到空桶时跳出 + while self.buckets[index].is_some() { + // 若遇到 key,返回对应的桶索引 + if self.buckets[index].as_ref().unwrap().key == key { + // 若之前遇到了删除标记,则将建值对移动至该索引 + if first_tombstone != -1 { + self.buckets[first_tombstone as usize] = self.buckets[index].take(); + self.buckets[index] = self.TOMBSTONE.clone(); + return first_tombstone as usize; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if first_tombstone == -1 && self.buckets[index] == self.TOMBSTONE { + first_tombstone = index as i32; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % self.capacity; + } + // 若 key 不存在,则返回添加点的索引 + if first_tombstone == -1 { index } else { first_tombstone as usize } + } + + /* 查询操作 */ + fn get(&mut self, key: i32) -> Option<&str> { + // 搜索 key 对应的桶索引 + let index = self.find_bucket(key); + // 若找到键值对,则返回对应 val + if self.buckets[index].is_some() && self.buckets[index] != self.TOMBSTONE { + return self.buckets[index].as_ref().map(|pair| &pair.val as &str); + } + // 若键值对不存在,则返回 null + None + } + + /* 添加操作 */ + fn put(&mut self, key: i32, val: String) { + // 当负载因子超过阈值时,执行扩容 + if self.load_factor() > self.load_thres { + self.extend(); + } + // 搜索 key 对应的桶索引 + let index = self.find_bucket(key); + // 若找到键值对,则覆盖 val 并返回 + if self.buckets[index].is_some() && 
self.buckets[index] != self.TOMBSTONE { + self.buckets[index].as_mut().unwrap().val = val; + return; + } + // 若键值对不存在,则添加该键值对 + self.buckets[index] = Some(Pair { key, val }); + self.size += 1; + } + + /* 删除操作 */ + fn remove(&mut self, key: i32) { + // 搜索 key 对应的桶索引 + let index = self.find_bucket(key); + // 若找到键值对,则用删除标记覆盖它 + if self.buckets[index].is_some() && self.buckets[index] != self.TOMBSTONE { + self.buckets[index] = self.TOMBSTONE.clone(); + self.size -= 1; + } + } + + /* 扩容哈希表 */ + fn extend(&mut self) { + // 暂存原哈希表 + let buckets_tmp = self.buckets.clone(); + // 初始化扩容后的新哈希表 + self.capacity *= self.extend_ratio; + self.buckets = vec![None; self.capacity]; + self.size = 0; + + // 将键值对从原哈希表搬运至新哈希表 + for pair in buckets_tmp { + if pair.is_none() || pair == self.TOMBSTONE { + continue; + } + let pair = pair.unwrap(); + + self.put(pair.key, pair.val); + } + } + /* 打印哈希表 */ + fn print(&self) { + for pair in &self.buckets { + if pair.is_none() { + println!("null"); + } else if pair == &self.TOMBSTONE { + println!("TOMBSTONE"); + } else { + let pair = pair.as_ref().unwrap(); + println!("{} -> {}", pair.key, pair.val); + } + } + } + } + ``` + +=== "C" + + ```c title="hash_map_open_addressing.c" + /* 开放寻址哈希表 */ + typedef struct { + int size; // 键值对数量 + int capacity; // 哈希表容量 + double loadThres; // 触发扩容的负载因子阈值 + int extendRatio; // 扩容倍数 + Pair **buckets; // 桶数组 + Pair *TOMBSTONE; // 删除标记 + } HashMapOpenAddressing; + + /* 构造函数 */ + HashMapOpenAddressing *newHashMapOpenAddressing() { + HashMapOpenAddressing *hashMap = (HashMapOpenAddressing *)malloc(sizeof(HashMapOpenAddressing)); + hashMap->size = 0; + hashMap->capacity = 4; + hashMap->loadThres = 2.0 / 3.0; + hashMap->extendRatio = 2; + hashMap->buckets = (Pair **)malloc(sizeof(Pair *) * hashMap->capacity); + hashMap->TOMBSTONE = (Pair *)malloc(sizeof(Pair)); + hashMap->TOMBSTONE->key = -1; + hashMap->TOMBSTONE->val = "-1"; + + return hashMap; + } + + /* 析构函数 */ + void delHashMapOpenAddressing(HashMapOpenAddressing *hashMap) { + for (int i = 0; i < hashMap->capacity; i++) { + Pair *pair = hashMap->buckets[i]; + if (pair != NULL && pair != hashMap->TOMBSTONE) { + free(pair->val); + free(pair); + } + } + } + + /* 哈希函数 */ + int hashFunc(HashMapOpenAddressing *hashMap, int key) { + return key % hashMap->capacity; + } + + /* 负载因子 */ + double loadFactor(HashMapOpenAddressing *hashMap) { + return (double)hashMap->size / (double)hashMap->capacity; + } + + /* 搜索 key 对应的桶索引 */ + int findBucket(HashMapOpenAddressing *hashMap, int key) { + int index = hashFunc(hashMap, key); + int firstTombstone = -1; + // 线性探测,当遇到空桶时跳出 + while (hashMap->buckets[index] != NULL) { + // 若遇到 key ,返回对应的桶索引 + if (hashMap->buckets[index]->key == key) { + // 若之前遇到了删除标记,则将键值对移动至该索引处 + if (firstTombstone != -1) { + hashMap->buckets[firstTombstone] = hashMap->buckets[index]; + hashMap->buckets[index] = hashMap->TOMBSTONE; + return firstTombstone; // 返回移动后的桶索引 + } + return index; // 返回桶索引 + } + // 记录遇到的首个删除标记 + if (firstTombstone == -1 && hashMap->buckets[index] == hashMap->TOMBSTONE) { + firstTombstone = index; + } + // 计算桶索引,越过尾部则返回头部 + index = (index + 1) % hashMap->capacity; + } + // 若 key 不存在,则返回添加点的索引 + return firstTombstone == -1 ? 
index : firstTombstone; + } + + /* 查询操作 */ + char *get(HashMapOpenAddressing *hashMap, int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(hashMap, key); + // 若找到键值对,则返回对应 val + if (hashMap->buckets[index] != NULL && hashMap->buckets[index] != hashMap->TOMBSTONE) { + return hashMap->buckets[index]->val; + } + // 若键值对不存在,则返回空字符串 + return ""; + } + + /* 添加操作 */ + void put(HashMapOpenAddressing *hashMap, int key, char *val) { + // 当负载因子超过阈值时,执行扩容 + if (loadFactor(hashMap) > hashMap->loadThres) { + extend(hashMap); + } + // 搜索 key 对应的桶索引 + int index = findBucket(hashMap, key); + // 若找到键值对,则覆盖 val 并返回 + if (hashMap->buckets[index] != NULL && hashMap->buckets[index] != hashMap->TOMBSTONE) { + free(hashMap->buckets[index]->val); + hashMap->buckets[index]->val = (char *)malloc(sizeof(strlen(val) + 1)); + strcpy(hashMap->buckets[index]->val, val); + hashMap->buckets[index]->val[strlen(val)] = '\0'; + return; + } + // 若键值对不存在,则添加该键值对 + Pair *pair = (Pair *)malloc(sizeof(Pair)); + pair->key = key; + pair->val = (char *)malloc(sizeof(strlen(val) + 1)); + strcpy(pair->val, val); + pair->val[strlen(val)] = '\0'; + + hashMap->buckets[index] = pair; + hashMap->size++; + } + + /* 删除操作 */ + void removeItem(HashMapOpenAddressing *hashMap, int key) { + // 搜索 key 对应的桶索引 + int index = findBucket(hashMap, key); + // 若找到键值对,则用删除标记覆盖它 + if (hashMap->buckets[index] != NULL && hashMap->buckets[index] != hashMap->TOMBSTONE) { + Pair *pair = hashMap->buckets[index]; + free(pair->val); + free(pair); + hashMap->buckets[index] = hashMap->TOMBSTONE; + hashMap->size--; + } + } + + /* 扩容哈希表 */ + void extend(HashMapOpenAddressing *hashMap) { + // 暂存原哈希表 + Pair **bucketsTmp = hashMap->buckets; + int oldCapacity = hashMap->capacity; + // 初始化扩容后的新哈希表 + hashMap->capacity *= hashMap->extendRatio; + hashMap->buckets = (Pair **)malloc(sizeof(Pair *) * hashMap->capacity); + hashMap->size = 0; + // 将键值对从原哈希表搬运至新哈希表 + for (int i = 0; i < oldCapacity; i++) { + Pair *pair = bucketsTmp[i]; + if (pair != NULL && pair != hashMap->TOMBSTONE) { + put(hashMap, pair->key, pair->val); + free(pair->val); + free(pair); + } + } + free(bucketsTmp); + } + + /* 打印哈希表 */ + void print(HashMapOpenAddressing *hashMap) { + for (int i = 0; i < hashMap->capacity; i++) { + Pair *pair = hashMap->buckets[i]; + if (pair == NULL) { + printf("NULL\n"); + } else if (pair == hashMap->TOMBSTONE) { + printf("TOMBSTONE\n"); + } else { + printf("%d -> %s\n", pair->key, pair->val); + } + } + } + ``` + +=== "Zig" + + ```zig title="hash_map_open_addressing.zig" + [class]{HashMapOpenAddressing}-[func]{} + ``` + +### 2.   Quadratic Probing + +Quadratic probing is similar to linear probing and is one of the common strategies of open addressing. When a collision occurs, quadratic probing does not simply skip a fixed number of steps but skips "the square of the number of probes," i.e., $1, 4, 9, \dots$ steps. + +Quadratic probing has the following advantages: + +- Quadratic probing attempts to alleviate the clustering effect of linear probing by skipping the distance of the square of the number of probes. +- Quadratic probing skips larger distances to find empty positions, helping to distribute data more evenly. + +However, quadratic probing is not perfect: + +- Clustering still exists, i.e., some positions are more likely to be occupied than others. +- Due to the growth of squares, quadratic probing may not probe the entire hash table, meaning it might not access empty buckets even if they exist in the hash table. + +### 3.   
Double Hashing + +As the name suggests, the double hashing method uses multiple hash functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $\dots$ for probing. + +- **Inserting Elements**: If hash function $f_1(x)$ encounters a conflict, try $f_2(x)$, and so on, until an empty position is found and the element is inserted. +- **Searching for Elements**: Search in the same order of hash functions until the target element is found and returned; if an empty position is encountered or all hash functions have been tried, it indicates the element is not in the hash table, then return `None`. + +Compared to linear probing, double hashing is less prone to clustering, but the multiple hash functions introduce additional computation. + +!!! tip + + Please note that open addressing (linear probing, quadratic probing, and double hashing) hash tables all share the issue of "not being able to directly delete elements." + +## 6.2.3 &nbsp; Choice of Programming Languages + +Various programming languages have adopted different hash table implementation strategies. Here are a few examples: + +- Python uses open addressing. The `dict` dictionary uses pseudo-random numbers for probing. +- Java uses separate chaining. Since JDK 1.8, when the array length in `HashMap` reaches 64 and the length of a linked list reaches 8, the linked list is converted to a red-black tree to improve search performance. +- Go uses separate chaining. Go stipulates that each bucket can store up to 8 key-value pairs, and if the capacity is exceeded, an overflow bucket is connected; when there are too many overflow buckets, a special equal-size expansion operation is performed to ensure performance. + diff --git a/docs-en/chapter_hashing/hash_map.md b/docs-en/chapter_hashing/hash_map.md new file mode 100755 index 000000000..3abf28611 --- /dev/null +++ b/docs-en/chapter_hashing/hash_map.md @@ -0,0 +1,1678 @@ +--- +comments: true +--- + +# 6.1 &nbsp; Hash Table + +A "hash table", also known as a "hash map", achieves efficient element querying by establishing a mapping between keys and values. Specifically, when we input a `key` into the hash table, we can retrieve the corresponding `value` in $O(1)$ time. + +As shown in Figure 6-1, given $n$ students, each with two pieces of data, a "name" and a "student number", suppose we want to implement a query feature that returns the corresponding name when given a student number. This can be done with the hash table shown in Figure 6-1. + +![Abstract representation of a hash table](hash_map.assets/hash_table_lookup.png){ class="animation-figure" } + +

Figure 6-1   Abstract representation of a hash table

+ +Apart from hash tables, arrays and linked lists can also be used to implement query functionality; their efficiency is compared in Table 6-1. + +- **Adding Elements**: Simply append the element to the end of the array (or linked list), using $O(1)$ time. +- **Querying Elements**: Since the array (or linked list) is unordered, finding an element requires traversing all of its elements, using $O(n)$ time. +- **Deleting Elements**: First locate the element, then delete it from the array (or linked list), using $O(n)$ time. + +

Table 6-1   Comparison of Element Query Efficiency

+ +
+ +| | Array | Linked List | Hash Table | +| -------------- | ------ | ----------- | ---------- | +| Find Element | $O(n)$ | $O(n)$ | $O(1)$ | +| Add Element | $O(1)$ | $O(1)$ | $O(1)$ | +| Delete Element | $O(n)$ | $O(n)$ | $O(1)$ | + +
+ +Observations reveal that **the time complexity for adding, deleting, and querying in a hash table is $O(1)$**, which is highly efficient. + +## 6.1.1   Common Operations of Hash Table + +Common operations of a hash table include initialization, querying, adding key-value pairs, and deleting key-value pairs, etc. Example code is as follows: + +=== "Python" + + ```python title="hash_map.py" + # Initialize hash table + hmap: dict = {} + + # Add operation + # Add key-value pair (key, value) to the hash table + hmap[12836] = "Xiao Ha" + hmap[15937] = "Xiao Luo" + hmap[16750] = "Xiao Suan" + hmap[13276] = "Xiao Fa" + hmap[10583] = "Xiao Ya" + + # Query operation + # Input key into hash table, get value + name: str = hmap[15937] + + # Delete operation + # Delete key-value pair (key, value) from hash table + hmap.pop(10583) + ``` + +=== "C++" + + ```cpp title="hash_map.cpp" + /* Initialize hash table */ + unordered_map map; + + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map[12836] = "Xiao Ha"; + map[15937] = "Xiao Luo"; + map[16750] = "Xiao Suan"; + map[13276] = "Xiao Fa"; + map[10583] = "Xiao Ya"; + + /* Query operation */ + // Input key into hash table, get value + string name = map[15937]; + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.erase(10583); + ``` + +=== "Java" + + ```java title="hash_map.java" + /* Initialize hash table */ + Map map = new HashMap<>(); + + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map.put(12836, "Xiao Ha"); + map.put(15937, "Xiao Luo"); + map.put(16750, "Xiao Suan"); + map.put(13276, "Xiao Fa"); + map.put(10583, "Xiao Ya"); + + /* Query operation */ + // Input key into hash table, get value + String name = map.get(15937); + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.remove(10583); + ``` + +=== "C#" + + ```csharp title="hash_map.cs" + /* Initialize hash table */ + Dictionary map = new() { + /* Add operation */ + // Add key-value pair (key, value) to the hash table + { 12836, "Xiao Ha" }, + { 15937, "Xiao Luo" }, + { 16750, "Xiao Suan" }, + { 13276, "Xiao Fa" }, + { 10583, "Xiao Ya" } + }; + + /* Query operation */ + // Input key into hash table, get value + string name = map[15937]; + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.Remove(10583); + ``` + +=== "Go" + + ```go title="hash_map_test.go" + /* Initialize hash table */ + hmap := make(map[int]string) + + /* Add operation */ + // Add key-value pair (key, value) to the hash table + hmap[12836] = "Xiao Ha" + hmap[15937] = "Xiao Luo" + hmap[16750] = "Xiao Suan" + hmap[13276] = "Xiao Fa" + hmap[10583] = "Xiao Ya" + + /* Query operation */ + // Input key into hash table, get value + name := hmap[15937] + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + delete(hmap, 10583) + ``` + +=== "Swift" + + ```swift title="hash_map.swift" + /* Initialize hash table */ + var map: [Int: String] = [:] + + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map[12836] = "Xiao Ha" + map[15937] = "Xiao Luo" + map[16750] = "Xiao Suan" + map[13276] = "Xiao Fa" + map[10583] = "Xiao Ya" + + /* Query operation */ + // Input key into hash table, get value + let name = map[15937]! 
+ + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.removeValue(forKey: 10583) + ``` + +=== "JS" + + ```javascript title="hash_map.js" + /* Initialize hash table */ + const map = new Map(); + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map.set(12836, 'Xiao Ha'); + map.set(15937, 'Xiao Luo'); + map.set(16750, 'Xiao Suan'); + map.set(13276, 'Xiao Fa'); + map.set(10583, 'Xiao Ya'); + + /* Query operation */ + // Input key into hash table, get value + let name = map.get(15937); + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.delete(10583); + ``` + +=== "TS" + + ```typescript title="hash_map.ts" + /* Initialize hash table */ + const map = new Map(); + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map.set(12836, 'Xiao Ha'); + map.set(15937, 'Xiao Luo'); + map.set(16750, 'Xiao Suan'); + map.set(13276, 'Xiao Fa'); + map.set(10583, 'Xiao Ya'); + console.info('\nAfter adding, the hash table is\nKey -> Value'); + console.info(map); + + /* Query operation */ + // Input key into hash table, get value + let name = map.get(15937); + console.info('\nInput student number 15937, query name ' + name); + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.delete(10583); + console.info('\nAfter deleting 10583, the hash table is\nKey -> Value'); + console.info(map); + ``` + +=== "Dart" + + ```dart title="hash_map.dart" + /* Initialize hash table */ + Map map = {}; + + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map[12836] = "Xiao Ha"; + map[15937] = "Xiao Luo"; + map[16750] = "Xiao Suan"; + map[13276] = "Xiao Fa"; + map[10583] = "Xiao Ya"; + + /* Query operation */ + // Input key into hash table, get value + String name = map[15937]; + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + map.remove(10583); + ``` + +=== "Rust" + + ```rust title="hash_map.rs" + use std::collections::HashMap; + + /* Initialize hash table */ + let mut map: HashMap = HashMap::new(); + + /* Add operation */ + // Add key-value pair (key, value) to the hash table + map.insert(12836, "Xiao Ha".to_string()); + map.insert(15937, "Xiao Luo".to_string()); + map.insert(16750, "Xiao Suan".to_string()); + map.insert(13279, "Xiao Fa".to_string()); + map.insert(10583, "Xiao Ya".to_string()); + + /* Query operation */ + // Input key into hash table, get value + let _name: Option<&String> = map.get(&15937); + + /* Delete operation */ + // Delete key-value pair (key, value) from hash table + let _removed_value: Option = map.remove(&10583); + ``` + +=== "C" + + ```c title="hash_map.c" + // C does not provide a built-in hash table + ``` + +=== "Zig" + + ```zig title="hash_map.zig" + + ``` + +??? pythontutor "Code Visualization" + +
+ + +There are three common ways to traverse a hash table: traversing key-value pairs, keys, and values. Example code is as follows: + +=== "Python" + + ```python title="hash_map.py" + # Traverse hash table + # Traverse key-value pairs key->value + for key, value in hmap.items(): + print(key, "->", value) + # Traverse keys only + for key in hmap.keys(): + print(key) + # Traverse values only + for value in hmap.values(): + print(value) + ``` + +=== "C++" + + ```cpp title="hash_map.cpp" + /* Traverse hash table */ + // Traverse key-value pairs key->value + for (auto kv: map) { + cout << kv.first << " -> " << kv.second << endl; + } + // Traverse using iterator key->value + for (auto iter = map.begin(); iter != map.end(); iter++) { + cout << iter->first << "->" << iter->second << endl; + } + ``` + +=== "Java" + + ```java title="hash_map.java" + /* Traverse hash table */ + // Traverse key-value pairs key->value + for (Map.Entry kv: map.entrySet()) { + System.out.println(kv.getKey() + " -> " + kv.getValue()); + } + // Traverse keys only + for (int key: map.keySet()) { + System.out.println(key); + } + // Traverse values only + for (String val: map.values()) { + System.out.println(val); + } + ``` + +=== "C#" + + ```csharp title="hash_map.cs" + /* Traverse hash table */ + // Traverse key-value pairs Key->Value + foreach (var kv in map) { + Console.WriteLine(kv.Key + " -> " + kv.Value); + } + // Traverse keys only + foreach (int key in map.Keys) { + Console.WriteLine(key); + } + // Traverse values only + foreach (string val in map.Values) { + Console.WriteLine(val); + } + ``` + +=== "Go" + + ```go title="hash_map_test.go" + /* Traverse hash table */ + // Traverse key-value pairs key->value + for key, value := range hmap { + fmt.Println(key, "->", value) + } + // Traverse keys only + for key := range hmap { + fmt.Println(key) + } + // Traverse values only + for _, value := range hmap { + fmt.Println(value) + } + ``` + +=== "Swift" + + ```swift title="hash_map.swift" + /* Traverse hash table */ + // Traverse key-value pairs Key->Value + for (key, value) in map { + print("\(key) -> \(value)") + } + // Traverse keys only + for key in map.keys { + print(key) + } + // Traverse values only + for value in map.values { + print(value) + } + ``` + +=== "JS" + + ```javascript title="hash_map.js" + /* Traverse hash table */ + console.info('\nTraverse key-value pairs Key->Value'); + for (const [k, v] of map.entries()) { + console.info(k + ' -> ' + v); + } + console.info('\nTraverse keys only Key'); + for (const k of map.keys()) { + console.info(k); + } + console.info('\nTraverse values only Value'); + for (const v of map.values()) { + console.info(v); + } + ``` + +=== "TS" + + ```typescript title="hash_map.ts" + /* Traverse hash table */ + console.info('\nTraverse key-value pairs Key->Value'); + for (const [k, v] of map.entries()) { + console.info(k + ' -> ' + v); + } + console.info('\nTraverse keys only Key'); + for (const k of map.keys()) { + console.info(k); + } + console.info('\nTraverse values only Value'); + for (const v of map.values()) { + console.info(v); + } + ``` + +=== "Dart" + + ```dart title="hash_map.dart" + /* Traverse hash table */ + // Traverse key-value pairs Key->Value + map.forEach((key, value) { + print('$key -> $value'); + }); + + // Traverse keys only Key + map.keys.forEach((key) { + print(key); + }); + + // Traverse values only Value + map.values.forEach((value) { + print(value); + }); + ``` + +=== "Rust" + + ```rust title="hash_map.rs" + /* Traverse hash table */ + // Traverse key-value 
pairs Key->Value + for (key, value) in &map { + println!("{key} -> {value}"); + } + + // Traverse keys only Key + for key in map.keys() { + println!("{key}"); + } + + // Traverse values only Value + for value in map.values() { + println!("{value}"); + } + ``` + +=== "C" + + ```c title="hash_map.c" + // C does not provide a built-in hash table + ``` + +=== "Zig" + + ```zig title="hash_map.zig" + // Zig example is not provided + ``` + +??? pythontutor "Code Visualization" + +
+ + +## 6.1.2 &nbsp; Simple Implementation of Hash Table + +First, let's consider the simplest case: **implementing a hash table using just an array**. In the hash table, each empty slot in the array is called a "bucket", and each bucket can store one key-value pair. Therefore, the query operation involves finding the bucket corresponding to the `key` and retrieving the `value` from it. + +So, how do we locate the appropriate bucket based on the `key`? This is achieved through a "hash function". The role of the hash function is to map a larger input space to a smaller output space. In a hash table, the input space is all possible keys, and the output space is all buckets (array indices). In other words, given a `key`, **we can use the hash function to determine the storage location of the corresponding key-value pair in the array**. + +The calculation process of the hash function for a given `key` is divided into the following two steps: + +1. Calculate the hash value using a certain hash algorithm `hash()`. +2. Take the hash value modulo the number of buckets (the array length) `capacity` to obtain the array index `index`. + +```shell +index = hash(key) % capacity +``` + +Afterward, we can use `index` to access the corresponding bucket in the hash table and thereby retrieve the `value`. + +Assuming array length `capacity = 100` and hash algorithm `hash(key) = key`, the hash function is `key % 100`. Figure 6-2 uses the `key` as the student number and the `value` as the name to demonstrate the working principle of the hash function; a short worked example follows the figure. + +![Working principle of hash function](hash_map.assets/hash_function.png){ class="animation-figure" } + +
+<p align="center"> Figure 6-2 &nbsp; Working principle of hash function </p>
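+
+As a quick, concrete illustration of these two steps, here is a minimal sketch in Python. It uses the running example's `capacity = 100`; for small integers, Python's built-in `hash()` happens to return the value itself, which mirrors the assumed hash algorithm `hash(key) = key`:
+
+```python
+capacity = 100
+key = 12836
+# For small integers, Python's built-in hash() returns the value itself,
+# so this mirrors hash(key) = key from the example above
+index = hash(key) % capacity
+print(index)  # 36
+```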
+ +The following code implements a simple hash table. Here, we encapsulate `key` and `value` into a class `Pair` to represent the key-value pair. + +=== "Python" + + ```python title="array_hash_map.py" + class Pair: + """键值对""" + + def __init__(self, key: int, val: str): + self.key = key + self.val = val + + class ArrayHashMap: + """基于数组实现的哈希表""" + + def __init__(self): + """构造方法""" + # 初始化数组,包含 100 个桶 + self.buckets: list[Pair | None] = [None] * 100 + + def hash_func(self, key: int) -> int: + """哈希函数""" + index = key % 100 + return index + + def get(self, key: int) -> str: + """查询操作""" + index: int = self.hash_func(key) + pair: Pair = self.buckets[index] + if pair is None: + return None + return pair.val + + def put(self, key: int, val: str): + """添加操作""" + pair = Pair(key, val) + index: int = self.hash_func(key) + self.buckets[index] = pair + + def remove(self, key: int): + """删除操作""" + index: int = self.hash_func(key) + # 置为 None ,代表删除 + self.buckets[index] = None + + def entry_set(self) -> list[Pair]: + """获取所有键值对""" + result: list[Pair] = [] + for pair in self.buckets: + if pair is not None: + result.append(pair) + return result + + def key_set(self) -> list[int]: + """获取所有键""" + result = [] + for pair in self.buckets: + if pair is not None: + result.append(pair.key) + return result + + def value_set(self) -> list[str]: + """获取所有值""" + result = [] + for pair in self.buckets: + if pair is not None: + result.append(pair.val) + return result + + def print(self): + """打印哈希表""" + for pair in self.buckets: + if pair is not None: + print(pair.key, "->", pair.val) + ``` + +=== "C++" + + ```cpp title="array_hash_map.cpp" + /* 键值对 */ + struct Pair { + public: + int key; + string val; + Pair(int key, string val) { + this->key = key; + this->val = val; + } + }; + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + private: + vector buckets; + + public: + ArrayHashMap() { + // 初始化数组,包含 100 个桶 + buckets = vector(100); + } + + ~ArrayHashMap() { + // 释放内存 + for (const auto &bucket : buckets) { + delete bucket; + } + buckets.clear(); + } + + /* 哈希函数 */ + int hashFunc(int key) { + int index = key % 100; + return index; + } + + /* 查询操作 */ + string get(int key) { + int index = hashFunc(key); + Pair *pair = buckets[index]; + if (pair == nullptr) + return ""; + return pair->val; + } + + /* 添加操作 */ + void put(int key, string val) { + Pair *pair = new Pair(key, val); + int index = hashFunc(key); + buckets[index] = pair; + } + + /* 删除操作 */ + void remove(int key) { + int index = hashFunc(key); + // 释放内存并置为 nullptr + delete buckets[index]; + buckets[index] = nullptr; + } + + /* 获取所有键值对 */ + vector pairSet() { + vector pairSet; + for (Pair *pair : buckets) { + if (pair != nullptr) { + pairSet.push_back(pair); + } + } + return pairSet; + } + + /* 获取所有键 */ + vector keySet() { + vector keySet; + for (Pair *pair : buckets) { + if (pair != nullptr) { + keySet.push_back(pair->key); + } + } + return keySet; + } + + /* 获取所有值 */ + vector valueSet() { + vector valueSet; + for (Pair *pair : buckets) { + if (pair != nullptr) { + valueSet.push_back(pair->val); + } + } + return valueSet; + } + + /* 打印哈希表 */ + void print() { + for (Pair *kv : pairSet()) { + cout << kv->key << " -> " << kv->val << endl; + } + } + }; + ``` + +=== "Java" + + ```java title="array_hash_map.java" + /* 键值对 */ + class Pair { + public int key; + public String val; + + public Pair(int key, String val) { + this.key = key; + this.val = val; + } + } + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + private List buckets; + + public ArrayHashMap() { + // 初始化数组,包含 100 个桶 
+ buckets = new ArrayList<>(); + for (int i = 0; i < 100; i++) { + buckets.add(null); + } + } + + /* 哈希函数 */ + private int hashFunc(int key) { + int index = key % 100; + return index; + } + + /* 查询操作 */ + public String get(int key) { + int index = hashFunc(key); + Pair pair = buckets.get(index); + if (pair == null) + return null; + return pair.val; + } + + /* 添加操作 */ + public void put(int key, String val) { + Pair pair = new Pair(key, val); + int index = hashFunc(key); + buckets.set(index, pair); + } + + /* 删除操作 */ + public void remove(int key) { + int index = hashFunc(key); + // 置为 null ,代表删除 + buckets.set(index, null); + } + + /* 获取所有键值对 */ + public List pairSet() { + List pairSet = new ArrayList<>(); + for (Pair pair : buckets) { + if (pair != null) + pairSet.add(pair); + } + return pairSet; + } + + /* 获取所有键 */ + public List keySet() { + List keySet = new ArrayList<>(); + for (Pair pair : buckets) { + if (pair != null) + keySet.add(pair.key); + } + return keySet; + } + + /* 获取所有值 */ + public List valueSet() { + List valueSet = new ArrayList<>(); + for (Pair pair : buckets) { + if (pair != null) + valueSet.add(pair.val); + } + return valueSet; + } + + /* 打印哈希表 */ + public void print() { + for (Pair kv : pairSet()) { + System.out.println(kv.key + " -> " + kv.val); + } + } + } + ``` + +=== "C#" + + ```csharp title="array_hash_map.cs" + /* 键值对 int->string */ + class Pair(int key, string val) { + public int key = key; + public string val = val; + } + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + List buckets; + public ArrayHashMap() { + // 初始化数组,包含 100 个桶 + buckets = []; + for (int i = 0; i < 100; i++) { + buckets.Add(null); + } + } + + /* 哈希函数 */ + int HashFunc(int key) { + int index = key % 100; + return index; + } + + /* 查询操作 */ + public string? Get(int key) { + int index = HashFunc(key); + Pair? pair = buckets[index]; + if (pair == null) return null; + return pair.val; + } + + /* 添加操作 */ + public void Put(int key, string val) { + Pair pair = new(key, val); + int index = HashFunc(key); + buckets[index] = pair; + } + + /* 删除操作 */ + public void Remove(int key) { + int index = HashFunc(key); + // 置为 null ,代表删除 + buckets[index] = null; + } + + /* 获取所有键值对 */ + public List PairSet() { + List pairSet = []; + foreach (Pair? pair in buckets) { + if (pair != null) + pairSet.Add(pair); + } + return pairSet; + } + + /* 获取所有键 */ + public List KeySet() { + List keySet = []; + foreach (Pair? pair in buckets) { + if (pair != null) + keySet.Add(pair.key); + } + return keySet; + } + + /* 获取所有值 */ + public List ValueSet() { + List valueSet = []; + foreach (Pair? 
pair in buckets) { + if (pair != null) + valueSet.Add(pair.val); + } + return valueSet; + } + + /* 打印哈希表 */ + public void Print() { + foreach (Pair kv in PairSet()) { + Console.WriteLine(kv.key + " -> " + kv.val); + } + } + } + ``` + +=== "Go" + + ```go title="array_hash_map.go" + /* 键值对 */ + type pair struct { + key int + val string + } + + /* 基于数组实现的哈希表 */ + type arrayHashMap struct { + buckets []*pair + } + + /* 初始化哈希表 */ + func newArrayHashMap() *arrayHashMap { + // 初始化数组,包含 100 个桶 + buckets := make([]*pair, 100) + return &arrayHashMap{buckets: buckets} + } + + /* 哈希函数 */ + func (a *arrayHashMap) hashFunc(key int) int { + index := key % 100 + return index + } + + /* 查询操作 */ + func (a *arrayHashMap) get(key int) string { + index := a.hashFunc(key) + pair := a.buckets[index] + if pair == nil { + return "Not Found" + } + return pair.val + } + + /* 添加操作 */ + func (a *arrayHashMap) put(key int, val string) { + pair := &pair{key: key, val: val} + index := a.hashFunc(key) + a.buckets[index] = pair + } + + /* 删除操作 */ + func (a *arrayHashMap) remove(key int) { + index := a.hashFunc(key) + // 置为 nil ,代表删除 + a.buckets[index] = nil + } + + /* 获取所有键对 */ + func (a *arrayHashMap) pairSet() []*pair { + var pairs []*pair + for _, pair := range a.buckets { + if pair != nil { + pairs = append(pairs, pair) + } + } + return pairs + } + + /* 获取所有键 */ + func (a *arrayHashMap) keySet() []int { + var keys []int + for _, pair := range a.buckets { + if pair != nil { + keys = append(keys, pair.key) + } + } + return keys + } + + /* 获取所有值 */ + func (a *arrayHashMap) valueSet() []string { + var values []string + for _, pair := range a.buckets { + if pair != nil { + values = append(values, pair.val) + } + } + return values + } + + /* 打印哈希表 */ + func (a *arrayHashMap) print() { + for _, pair := range a.buckets { + if pair != nil { + fmt.Println(pair.key, "->", pair.val) + } + } + } + ``` + +=== "Swift" + + ```swift title="array_hash_map.swift" + /* 键值对 */ + class Pair: Equatable { + public var key: Int + public var val: String + + public init(key: Int, val: String) { + self.key = key + self.val = val + } + + public static func == (lhs: Pair, rhs: Pair) -> Bool { + lhs.key == rhs.key && lhs.val == rhs.val + } + } + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + private var buckets: [Pair?] = [] + + init() { + // 初始化数组,包含 100 个桶 + for _ in 0 ..< 100 { + buckets.append(nil) + } + } + + /* 哈希函数 */ + private func hashFunc(key: Int) -> Int { + let index = key % 100 + return index + } + + /* 查询操作 */ + func get(key: Int) -> String? 
{ + let index = hashFunc(key: key) + let pair = buckets[index] + return pair?.val + } + + /* 添加操作 */ + func put(key: Int, val: String) { + let pair = Pair(key: key, val: val) + let index = hashFunc(key: key) + buckets[index] = pair + } + + /* 删除操作 */ + func remove(key: Int) { + let index = hashFunc(key: key) + // 置为 nil ,代表删除 + buckets[index] = nil + } + + /* 获取所有键值对 */ + func pairSet() -> [Pair] { + var pairSet: [Pair] = [] + for pair in buckets { + if let pair = pair { + pairSet.append(pair) + } + } + return pairSet + } + + /* 获取所有键 */ + func keySet() -> [Int] { + var keySet: [Int] = [] + for pair in buckets { + if let pair = pair { + keySet.append(pair.key) + } + } + return keySet + } + + /* 获取所有值 */ + func valueSet() -> [String] { + var valueSet: [String] = [] + for pair in buckets { + if let pair = pair { + valueSet.append(pair.val) + } + } + return valueSet + } + + /* 打印哈希表 */ + func print() { + for pair in pairSet() { + Swift.print("\(pair.key) -> \(pair.val)") + } + } + } + ``` + +=== "JS" + + ```javascript title="array_hash_map.js" + /* 键值对 Number -> String */ + class Pair { + constructor(key, val) { + this.key = key; + this.val = val; + } + } + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + #buckets; + constructor() { + // 初始化数组,包含 100 个桶 + this.#buckets = new Array(100).fill(null); + } + + /* 哈希函数 */ + #hashFunc(key) { + return key % 100; + } + + /* 查询操作 */ + get(key) { + let index = this.#hashFunc(key); + let pair = this.#buckets[index]; + if (pair === null) return null; + return pair.val; + } + + /* 添加操作 */ + set(key, val) { + let index = this.#hashFunc(key); + this.#buckets[index] = new Pair(key, val); + } + + /* 删除操作 */ + delete(key) { + let index = this.#hashFunc(key); + // 置为 null ,代表删除 + this.#buckets[index] = null; + } + + /* 获取所有键值对 */ + entries() { + let arr = []; + for (let i = 0; i < this.#buckets.length; i++) { + if (this.#buckets[i]) { + arr.push(this.#buckets[i]); + } + } + return arr; + } + + /* 获取所有键 */ + keys() { + let arr = []; + for (let i = 0; i < this.#buckets.length; i++) { + if (this.#buckets[i]) { + arr.push(this.#buckets[i].key); + } + } + return arr; + } + + /* 获取所有值 */ + values() { + let arr = []; + for (let i = 0; i < this.#buckets.length; i++) { + if (this.#buckets[i]) { + arr.push(this.#buckets[i].val); + } + } + return arr; + } + + /* 打印哈希表 */ + print() { + let pairSet = this.entries(); + for (const pair of pairSet) { + console.info(`${pair.key} -> ${pair.val}`); + } + } + } + ``` + +=== "TS" + + ```typescript title="array_hash_map.ts" + /* 键值对 Number -> String */ + class Pair { + public key: number; + public val: string; + + constructor(key: number, val: string) { + this.key = key; + this.val = val; + } + } + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + private readonly buckets: (Pair | null)[]; + + constructor() { + // 初始化数组,包含 100 个桶 + this.buckets = new Array(100).fill(null); + } + + /* 哈希函数 */ + private hashFunc(key: number): number { + return key % 100; + } + + /* 查询操作 */ + public get(key: number): string | null { + let index = this.hashFunc(key); + let pair = this.buckets[index]; + if (pair === null) return null; + return pair.val; + } + + /* 添加操作 */ + public set(key: number, val: string) { + let index = this.hashFunc(key); + this.buckets[index] = new Pair(key, val); + } + + /* 删除操作 */ + public delete(key: number) { + let index = this.hashFunc(key); + // 置为 null ,代表删除 + this.buckets[index] = null; + } + + /* 获取所有键值对 */ + public entries(): (Pair | null)[] { + let arr: (Pair | null)[] = []; + for (let i = 0; i < this.buckets.length; i++) { + if 
(this.buckets[i]) { + arr.push(this.buckets[i]); + } + } + return arr; + } + + /* 获取所有键 */ + public keys(): (number | undefined)[] { + let arr: (number | undefined)[] = []; + for (let i = 0; i < this.buckets.length; i++) { + if (this.buckets[i]) { + arr.push(this.buckets[i].key); + } + } + return arr; + } + + /* 获取所有值 */ + public values(): (string | undefined)[] { + let arr: (string | undefined)[] = []; + for (let i = 0; i < this.buckets.length; i++) { + if (this.buckets[i]) { + arr.push(this.buckets[i].val); + } + } + return arr; + } + + /* 打印哈希表 */ + public print() { + let pairSet = this.entries(); + for (const pair of pairSet) { + console.info(`${pair.key} -> ${pair.val}`); + } + } + } + ``` + +=== "Dart" + + ```dart title="array_hash_map.dart" + /* 键值对 */ + class Pair { + int key; + String val; + Pair(this.key, this.val); + } + + /* 基于数组实现的哈希表 */ + class ArrayHashMap { + late List _buckets; + + ArrayHashMap() { + // 初始化数组,包含 100 个桶 + _buckets = List.filled(100, null); + } + + /* 哈希函数 */ + int _hashFunc(int key) { + final int index = key % 100; + return index; + } + + /* 查询操作 */ + String? get(int key) { + final int index = _hashFunc(key); + final Pair? pair = _buckets[index]; + if (pair == null) { + return null; + } + return pair.val; + } + + /* 添加操作 */ + void put(int key, String val) { + final Pair pair = Pair(key, val); + final int index = _hashFunc(key); + _buckets[index] = pair; + } + + /* 删除操作 */ + void remove(int key) { + final int index = _hashFunc(key); + _buckets[index] = null; + } + + /* 获取所有键值对 */ + List pairSet() { + List pairSet = []; + for (final Pair? pair in _buckets) { + if (pair != null) { + pairSet.add(pair); + } + } + return pairSet; + } + + /* 获取所有键 */ + List keySet() { + List keySet = []; + for (final Pair? pair in _buckets) { + if (pair != null) { + keySet.add(pair.key); + } + } + return keySet; + } + + /* 获取所有值 */ + List values() { + List valueSet = []; + for (final Pair? 
pair in _buckets) { + if (pair != null) { + valueSet.add(pair.val); + } + } + return valueSet; + } + + /* 打印哈希表 */ + void printHashMap() { + for (final Pair kv in pairSet()) { + print("${kv.key} -> ${kv.val}"); + } + } + } + ``` + +=== "Rust" + + ```rust title="array_hash_map.rs" + /* 键值对 */ + #[derive(Debug, Clone, PartialEq)] + pub struct Pair { + pub key: i32, + pub val: String, + } + + /* 基于数组实现的哈希表 */ + pub struct ArrayHashMap { + buckets: Vec> + } + + impl ArrayHashMap { + pub fn new() -> ArrayHashMap { + // 初始化数组,包含 100 个桶 + Self { buckets: vec![None; 100] } + } + + /* 哈希函数 */ + fn hash_func(&self, key: i32) -> usize { + key as usize % 100 + } + + /* 查询操作 */ + pub fn get(&self, key: i32) -> Option<&String> { + let index = self.hash_func(key); + self.buckets[index].as_ref().map(|pair| &pair.val) + } + + /* 添加操作 */ + pub fn put(&mut self, key: i32, val: &str) { + let index = self.hash_func(key); + self.buckets[index] = Some(Pair { + key, + val: val.to_string(), + }); + } + + /* 删除操作 */ + pub fn remove(&mut self, key: i32) { + let index = self.hash_func(key); + // 置为 None ,代表删除 + self.buckets[index] = None; + } + + /* 获取所有键值对 */ + pub fn entry_set(&self) -> Vec<&Pair> { + self.buckets.iter().filter_map(|pair| pair.as_ref()).collect() + } + + /* 获取所有键 */ + pub fn key_set(&self) -> Vec<&i32> { + self.buckets.iter().filter_map(|pair| pair.as_ref().map(|pair| &pair.key)).collect() + } + + /* 获取所有值 */ + pub fn value_set(&self) -> Vec<&String> { + self.buckets.iter().filter_map(|pair| pair.as_ref().map(|pair| &pair.val)).collect() + } + + /* 打印哈希表 */ + pub fn print(&self) { + for pair in self.entry_set() { + println!("{} -> {}", pair.key, pair.val); + } + } + } + ``` + +=== "C" + + ```c title="array_hash_map.c" + /* 键值对 int->string */ + typedef struct { + int key; + char *val; + } Pair; + + /* 基于数组实现的哈希表 */ + typedef struct { + Pair *buckets[HASHTABLE_CAPACITY]; + } ArrayHashMap; + + /* 构造函数 */ + ArrayHashMap *newArrayHashMap() { + ArrayHashMap *hmap = malloc(sizeof(ArrayHashMap)); + return hmap; + } + + /* 析构函数 */ + void delArrayHashMap(ArrayHashMap *hmap) { + for (int i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + free(hmap->buckets[i]->val); + free(hmap->buckets[i]); + } + } + free(hmap); + } + + /* 添加操作 */ + void put(ArrayHashMap *hmap, const int key, const char *val) { + Pair *Pair = malloc(sizeof(Pair)); + Pair->key = key; + Pair->val = malloc(strlen(val) + 1); + strcpy(Pair->val, val); + + int index = hashFunc(key); + hmap->buckets[index] = Pair; + } + + /* 删除操作 */ + void removeItem(ArrayHashMap *hmap, const int key) { + int index = hashFunc(key); + free(hmap->buckets[index]->val); + free(hmap->buckets[index]); + hmap->buckets[index] = NULL; + } + + /* 获取所有键值对 */ + void pairSet(ArrayHashMap *hmap, MapSet *set) { + Pair *entries; + int i = 0, index = 0; + int total = 0; + /* 统计有效键值对数量 */ + for (i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + total++; + } + } + entries = malloc(sizeof(Pair) * total); + for (i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + entries[index].key = hmap->buckets[i]->key; + entries[index].val = malloc(strlen(hmap->buckets[i]->val) + 1); + strcpy(entries[index].val, hmap->buckets[i]->val); + index++; + } + } + set->set = entries; + set->len = total; + } + + /* 获取所有键 */ + void keySet(ArrayHashMap *hmap, MapSet *set) { + int *keys; + int i = 0, index = 0; + int total = 0; + /* 统计有效键值对数量 */ + for (i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + total++; + } + } 
+ keys = malloc(total * sizeof(int)); + for (i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + keys[index] = hmap->buckets[i]->key; + index++; + } + } + set->set = keys; + set->len = total; + } + + /* 获取所有值 */ + void valueSet(ArrayHashMap *hmap, MapSet *set) { + char **vals; + int i = 0, index = 0; + int total = 0; + /* 统计有效键值对数量 */ + for (i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + total++; + } + } + vals = malloc(total * sizeof(char *)); + for (i = 0; i < HASHTABLE_CAPACITY; i++) { + if (hmap->buckets[i] != NULL) { + vals[index] = hmap->buckets[i]->val; + index++; + } + } + set->set = vals; + set->len = total; + } + + /* 打印哈希表 */ + void print(ArrayHashMap *hmap) { + int i; + MapSet set; + pairSet(hmap, &set); + Pair *entries = (Pair *)set.set; + for (i = 0; i < set.len; i++) { + printf("%d -> %s\n", entries[i].key, entries[i].val); + } + free(set.set); + } + ``` + +=== "Zig" + + ```zig title="array_hash_map.zig" + // 键值对 + const Pair = struct { + key: usize = undefined, + val: []const u8 = undefined, + + pub fn init(key: usize, val: []const u8) Pair { + return Pair { + .key = key, + .val = val, + }; + } + }; + + // 基于数组实现的哈希表 + fn ArrayHashMap(comptime T: type) type { + return struct { + bucket: ?std.ArrayList(?T) = null, + mem_allocator: std.mem.Allocator = undefined, + + const Self = @This(); + + // 构造函数 + pub fn init(self: *Self, allocator: std.mem.Allocator) !void { + self.mem_allocator = allocator; + // 初始化一个长度为 100 的桶(数组) + self.bucket = std.ArrayList(?T).init(self.mem_allocator); + var i: i32 = 0; + while (i < 100) : (i += 1) { + try self.bucket.?.append(null); + } + } + + // 析构函数 + pub fn deinit(self: *Self) void { + if (self.bucket != null) self.bucket.?.deinit(); + } + + // 哈希函数 + fn hashFunc(key: usize) usize { + var index = key % 100; + return index; + } + + // 查询操作 + pub fn get(self: *Self, key: usize) []const u8 { + var index = hashFunc(key); + var pair = self.bucket.?.items[index]; + return pair.?.val; + } + + // 添加操作 + pub fn put(self: *Self, key: usize, val: []const u8) !void { + var pair = Pair.init(key, val); + var index = hashFunc(key); + self.bucket.?.items[index] = pair; + } + + // 删除操作 + pub fn remove(self: *Self, key: usize) !void { + var index = hashFunc(key); + // 置为 null ,代表删除 + self.bucket.?.items[index] = null; + } + + // 获取所有键值对 + pub fn pairSet(self: *Self) !std.ArrayList(T) { + var entry_set = std.ArrayList(T).init(self.mem_allocator); + for (self.bucket.?.items) |item| { + if (item == null) continue; + try entry_set.append(item.?); + } + return entry_set; + } + + // 获取所有键 + pub fn keySet(self: *Self) !std.ArrayList(usize) { + var key_set = std.ArrayList(usize).init(self.mem_allocator); + for (self.bucket.?.items) |item| { + if (item == null) continue; + try key_set.append(item.?.key); + } + return key_set; + } + + // 获取所有值 + pub fn valueSet(self: *Self) !std.ArrayList([]const u8) { + var value_set = std.ArrayList([]const u8).init(self.mem_allocator); + for (self.bucket.?.items) |item| { + if (item == null) continue; + try value_set.append(item.?.val); + } + return value_set; + } + + // 打印哈希表 + pub fn print(self: *Self) !void { + var entry_set = try self.pairSet(); + defer entry_set.deinit(); + for (entry_set.items) |item| { + std.debug.print("{} -> {s}\n", .{item.key, item.val}); + } + } + }; + } + ``` + +??? pythontutor "Code Visualization" + +
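+
+To see how these operations fit together, below is a brief usage sketch of the array-based hash table above (Python version). The method names follow the `ArrayHashMap` listing; the student numbers and names are illustrative assumptions:
+
+```python
+# Usage sketch: relies on the ArrayHashMap class defined above
+hmap = ArrayHashMap()
+
+# Add operation: store some student-number -> name pairs
+hmap.put(12836, "Ha")
+hmap.put(15937, "Luo")
+
+# Query operation: the key is hashed to its bucket and the value is read
+print(hmap.get(12836))  # Ha
+
+# Remove operation: the bucket is reset to None
+hmap.remove(12836)
+print(hmap.get(12836))  # None
+
+# Traverse all remaining key-value pairs
+for pair in hmap.entry_set():
+    print(pair.key, "->", pair.val)
+```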
+
+
+## 6.1.3 &nbsp; Hash Collision and Resizing
+
+Fundamentally, the role of the hash function is to map the entire input space of all keys to the output space of all array indices. However, the input space is often much larger than the output space. Therefore, **theoretically, there must be situations where "multiple inputs correspond to the same output"**.
+
+For the hash function in the example above, if the last two digits of the input `key` are the same, the output of the hash function will also be the same. For example, when querying the students with student numbers 12836 and 20336, we find:
+
+```shell
+12836 % 100 = 36
+20336 % 100 = 36
+```
+
+As shown in Figure 6-3, both student numbers point to the same name, which is obviously incorrect. This situation, where multiple inputs correspond to the same output, is known as "hash collision".
+
+![Example of hash collision](hash_map.assets/hash_collision.png){ class="animation-figure" }
+
+<p align="center"> Figure 6-3 &nbsp; Example of hash collision </p>
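+
+This collision can be reproduced with the simple hash table above: since both student numbers end in the same two digits, the second `put` silently overwrites the first. The sketch below assumes the `ArrayHashMap` class defined earlier and uses illustrative names:
+
+```python
+hmap = ArrayHashMap()
+hmap.put(12836, "Ha")    # 12836 % 100 = 36 -> stored in bucket 36
+hmap.put(20336, "Shan")  # 20336 % 100 = 36 -> overwrites bucket 36
+print(hmap.get(12836))   # prints "Shan": the wrong name, caused by the hash collision
+```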
+
+It is easy to see that the larger the capacity $n$ of the hash table, the lower the probability that multiple keys are assigned to the same bucket, and the fewer the collisions. Therefore, **expanding the capacity of the hash table can reduce hash collisions**.
+
+As shown in Figure 6-4, before the expansion the key-value pairs `(136, A)` and `(236, D)` collided; after the expansion, the collision is resolved.
+
+![Hash table expansion](hash_map.assets/hash_table_reshash.png){ class="animation-figure" }
+
+<p align="center"> Figure 6-4 &nbsp; Hash table expansion </p>
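+
+The improvement in Figure 6-4 comes entirely from the change of modulus: once the number of buckets grows, the two keys map to different indices. A quick check (a sketch; the pre-expansion capacity of 100 follows the running example, and the new capacity of 150 is an illustrative assumption):
+
+```python
+# Before expansion: both keys fall into the same bucket
+print(136 % 100, 236 % 100)  # 36 36
+
+# After expanding to 150 buckets, the indices differ and the collision disappears
+print(136 % 150, 236 % 150)  # 136 86
+```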
+ +Similar to array expansion, resizing a hash table requires migrating all key-value pairs from the original hash table to the new one, which is time-consuming. Furthermore, since the capacity `capacity` of the hash table changes, we need to recalculate the storage positions of all key-value pairs using the hash function, which adds to the computational overhead of the resizing process. Therefore, programming languages often reserve a sufficiently large capacity for the hash table to prevent frequent resizing. + +The "load factor" is an important concept for hash tables. It is defined as the ratio of the number of elements in the hash table to the number of buckets. It is used to measure the severity of hash collisions and **is often used as a trigger for resizing the hash table**. For example, in Java, when the load factor exceeds $0.75$, the system will resize the hash table to twice its original size. diff --git a/docs-en/chapter_hashing/index.md b/docs-en/chapter_hashing/index.md new file mode 100644 index 000000000..5daf4c5ba --- /dev/null +++ b/docs-en/chapter_hashing/index.md @@ -0,0 +1,25 @@ +--- +comments: true +icon: material/table-search +--- + +# Chapter 6.   Hash Table + +
+ +![Hash Table](../assets/covers/chapter_hashing.jpg){ class="cover-image" } + +
+
+!!! abstract
+
+    In the world of computing, a hash table is like a wise librarian.
+
+    He knows how to calculate index numbers, allowing him to quickly locate the target book.
+
+## Chapter Contents
+
+- [6.1 &nbsp; Hash Table](https://www.hello-algo.com/en/chapter_hashing/hash_map/)
+- [6.2 &nbsp; Hash Collision](https://www.hello-algo.com/en/chapter_hashing/hash_collision/)
+- [6.3 &nbsp; Hash Algorithm](https://www.hello-algo.com/en/chapter_hashing/hash_algorithm/)
+- [6.4 &nbsp; Summary](https://www.hello-algo.com/en/chapter_hashing/summary/)
diff --git a/docs-en/chapter_hashing/summary.md b/docs-en/chapter_hashing/summary.md
new file mode 100644
index 000000000..ec68c7859
--- /dev/null
+++ b/docs-en/chapter_hashing/summary.md
@@ -0,0 +1,51 @@
+---
+comments: true
+---
+
+# 6.4 &nbsp; Summary
+
+### 1. &nbsp; Key Review
+
+- Given an input `key`, a hash table can retrieve the corresponding `value` in $O(1)$ time, which is highly efficient.
+- Common hash table operations include querying, adding key-value pairs, deleting key-value pairs, and traversing the hash table.
+- The hash function maps a `key` to an array index, allowing access to the corresponding bucket to retrieve the `value`.
+- Two different keys may end up with the same array index after hashing, leading to erroneous query results. This phenomenon is known as hash collision.
+- The larger the capacity of the hash table, the lower the probability of hash collisions. Therefore, hash table resizing can mitigate hash collisions. Similar to array resizing, hash table resizing is costly.
+- Load factor, defined as the ratio of the number of elements to the number of buckets in the hash table, reflects the severity of hash collisions and is often used as a trigger for resizing the hash table.
+- Chaining addresses hash collisions by converting each bucket into a linked list and storing all colliding elements in the same list. However, excessively long lists can reduce query efficiency, which can be improved by converting the lists into red-black trees.
+- Open addressing handles hash collisions through multiple probes. Linear probing uses a fixed step size, but it cannot delete elements directly and is prone to clustering. Multiple hashing uses several hash functions for probing, making it less susceptible to clustering but increasing computational load.
+- Different programming languages adopt various hash table implementations. For example, Java's `HashMap` uses chaining, while Python's `dict` employs open addressing.
+- In hash tables, we desire hash algorithms with determinism, high efficiency, and uniform distribution. In cryptography, hash algorithms should also possess collision resistance and the avalanche effect.
+- Hash algorithms typically use large prime numbers as moduli to ensure uniform distribution of hash values and reduce hash collisions.
+- Common hash algorithms include MD5, SHA-1, SHA-2, and SHA-3. MD5 is often used for file integrity checks, while SHA-2 is commonly used in secure applications and protocols.
+- Programming languages usually provide built-in hash algorithms for data types to calculate bucket indices in hash tables. Generally, only immutable objects are hashable.
+
+### 2. &nbsp; Q & A
+
+**Q**: When does the time complexity of a hash table degrade to $O(n)$?
+
+The time complexity of a hash table can degrade to $O(n)$ when hash collisions are severe. When the hash function is well-designed, the capacity is set appropriately, and collisions are evenly distributed, the time complexity is $O(1)$.
We usually consider the time complexity to be $O(1)$ when using built-in hash tables in programming languages. + +**Q**: Why not use the hash function $f(x) = x$? This would eliminate collisions. + +Under the hash function $f(x) = x$, each element corresponds to a unique bucket index, which is equivalent to an array. However, the input space is usually much larger than the output space (array length), so the last step of a hash function is often to take the modulo of the array length. In other words, the goal of a hash table is to map a larger state space to a smaller one while providing $O(1)$ query efficiency. + +**Q**: Why can hash tables be more efficient than arrays, linked lists, or binary trees, even though they are implemented using these structures? + +Firstly, hash tables have higher time efficiency but lower space efficiency. A significant portion of memory in hash tables remains unused. + +Secondly, they are only more efficient in specific use cases. If a feature can be implemented with the same time complexity using an array or a linked list, it's usually faster than using a hash table. This is because the computation of the hash function incurs overhead, making the constant factor in the time complexity larger. + +Lastly, the time complexity of hash tables can degrade. For example, in chaining, we perform search operations in a linked list or red-black tree, which still risks degrading to $O(n)$ time. + +**Q**: Does multiple hashing also have the flaw of not being able to delete elements directly? Can space marked as deleted be reused? + +Multiple hashing is a form of open addressing, and all open addressing methods have the drawback of not being able to delete elements directly; they require marking elements as deleted. Marked spaces can be reused. When inserting new elements into the hash table, and the hash function points to a position marked as deleted, that position can be used by the new element. This maintains the probing sequence of the hash table while ensuring efficient use of space. + +**Q**: Why do hash collisions occur during the search process in linear probing? + +During the search process, the hash function points to the corresponding bucket and key-value pair. If the `key` doesn't match, it indicates a hash collision. Therefore, linear probing will search downwards at a predetermined step size until the correct key-value pair is found or the search fails. + +**Q**: Why can resizing a hash table alleviate hash collisions? + +The last step of a hash function often involves taking the modulo of the array length $n$, to keep the output within the array index range. When resizing, the array length $n$ changes, and the indices corresponding to the keys may also change. Keys that were previously mapped to the same bucket might be distributed across multiple buckets after resizing, thereby mitigating hash collisions. diff --git a/docs-en/chapter_stack_and_queue/deque.md b/docs-en/chapter_stack_and_queue/deque.md index 5b08cbff7..3129f7f17 100644 --- a/docs-en/chapter_stack_and_queue/deque.md +++ b/docs-en/chapter_stack_and_queue/deque.md @@ -340,7 +340,7 @@ Similarly, we can directly use the double-ended queue classes implemented in pro ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
diff --git a/docs-en/chapter_stack_and_queue/index.md b/docs-en/chapter_stack_and_queue/index.md index 04c2f508b..9668fdaa2 100644 --- a/docs-en/chapter_stack_and_queue/index.md +++ b/docs-en/chapter_stack_and_queue/index.md @@ -13,9 +13,9 @@ icon: material/stack-overflow !!! abstract - Stacks are like stacking cats, while queues are like cats lining up. + A stack is like cats placed on top of each other, while a queue is like cats lined up one by one. - They respectively represent the logical relationships of Last-In-First-Out (LIFO) and First-In-First-Out (FIFO). + They represent the logical relationships of Last-In-First-Out (LIFO) and First-In-First-Out (FIFO), respectively. ## Chapter Contents diff --git a/docs-en/chapter_stack_and_queue/queue.md b/docs-en/chapter_stack_and_queue/queue.md index 6d1e0f3c7..514908285 100755 --- a/docs-en/chapter_stack_and_queue/queue.md +++ b/docs-en/chapter_stack_and_queue/queue.md @@ -318,7 +318,7 @@ We can directly use the ready-made queue classes in programming languages: ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
@@ -1212,7 +1212,7 @@ Below is the code for implementing a queue using a linked list: } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
@@ -2128,7 +2128,7 @@ For a circular array, `front` or `rear` needs to loop back to the start of the a } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
diff --git a/docs-en/chapter_stack_and_queue/stack.md b/docs-en/chapter_stack_and_queue/stack.md index 45ee29c7a..d8498903f 100755 --- a/docs-en/chapter_stack_and_queue/stack.md +++ b/docs-en/chapter_stack_and_queue/stack.md @@ -312,7 +312,7 @@ Typically, we can directly use the stack class built into the programming langua ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
@@ -1084,7 +1084,7 @@ Below is an example code for implementing a stack based on a linked list: } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
@@ -1695,7 +1695,7 @@ Since the elements to be pushed onto the stack may continuously increase, we can } ``` -??? pythontutor "Visualizing Code" +??? pythontutor "Code Visualization"
diff --git a/docs/chapter_hashing/hash_algorithm.md b/docs/chapter_hashing/hash_algorithm.md index 8ed1a7fcb..a7b901cb2 100644 --- a/docs/chapter_hashing/hash_algorithm.md +++ b/docs/chapter_hashing/hash_algorithm.md @@ -841,7 +841,6 @@ $$ let mut dec_hasher = DefaultHasher::new(); dec.to_bits().hash(&mut dec_hasher); let hash_dec = dec_hasher.finish(); - println!("小数 {} 的哈希值为 {}", dec, hash_dec); // 小数 3.14159 的哈希值为 2566941990314602357 let str = "Hello 算法"; diff --git a/docs/chapter_sorting/selection_sort.md b/docs/chapter_sorting/selection_sort.md index 7fafc6e6c..343f0fc54 100644 --- a/docs/chapter_sorting/selection_sort.md +++ b/docs/chapter_sorting/selection_sort.md @@ -241,6 +241,9 @@ comments: true ```rust title="selection_sort.rs" /* 选择排序 */ fn selection_sort(nums: &mut [i32]) { + if nums.is_empty() { + return; + } let n = nums.len(); // 外循环:未排序区间为 [i, n-1] for i in 0..n-1 { diff --git a/docs/index.md b/docs/index.md index 73e2818e3..831b3b287 100644 --- a/docs/index.md +++ b/docs/index.md @@ -93,14 +93,14 @@ hide: -
+
hello-algo-typing-svg -

- 动画图解、一键运行的数据结构与算法教程 -