hashmap API
===========

The hashmap API is a generic implementation of hash-based key-value mappings.

Data Structures
---------------

`struct hashmap`::

The hash table structure. Members can be used as follows, but should
not be modified directly:
+
The `size` member keeps track of the total number of entries (0 means the
hashmap is empty).
+
`tablesize` is the allocated size of the hash table. A non-0 value indicates
that the hashmap is initialized. It may also be useful for statistical purposes
(i.e. `size / tablesize` is the current load factor).
+
`cmpfn` stores the comparison function specified in `hashmap_init()`. In
advanced scenarios, it may be useful to change this, e.g. to switch between
case-sensitive and case-insensitive lookup.
+
When `disallow_rehash` is set, automatic rehashes are prevented during inserts
and deletes.

`struct hashmap_entry`::

An opaque structure representing an entry in the hash table, which must
be used as first member of user data structures. Ideally it should be
followed by an int-sized member to prevent unused memory on 64-bit
systems due to alignment.
+
The `hash` member is the entry's hash code and the `next` member points to the
next entry in case of collisions (i.e. if multiple entries map to the same
bucket).
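+
As a minimal sketch of why the entry must come first: when `struct
hashmap_entry` is the first member, a pointer to the entry and a pointer to the
enclosing structure refer to the same address, so one can be cast to the other.
The `struct hashmap_entry` definition below is a stand-in that contains only
the two members described above, and `struct name_entry` and `name_entry_from`
are hypothetical names used for illustration.
+
```c
#include <stddef.h>

/* Stand-in with the two documented members; not the library's actual header. */
struct hashmap_entry {
	struct hashmap_entry *next;
	unsigned int hash;
};

/* Hypothetical user structure: the entry is the first member. */
struct name_entry {
	struct hashmap_entry ent;
	int flags;
	const char *name;
};

/* Recover the user structure from a pointer to its embedded entry. */
static struct name_entry *name_entry_from(struct hashmap_entry *e)
{
	return (struct name_entry *)e;
}
```
+
The cast is only valid because `offsetof(struct name_entry, ent)` is 0.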

`struct hashmap_iter`::

An iterator structure, to be used with hashmap_iter_* functions.

Types
-----

`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::

User-supplied function to test two hashmap entries for equality. Shall
return 0 if the entries are equal.
+
This function is always called with non-NULL `entry` / `entry_or_key`
parameters that have the same hash code. When looking up an entry, the `key`
and `keydata` parameters to hashmap_get and hashmap_remove are always passed
as second and third argument, respectively. Otherwise, `keydata` is NULL.

Functions
---------

`unsigned int strhash(const char *buf)`::
`unsigned int strihash(const char *buf)`::
`unsigned int memhash(const void *buf, size_t len)`::
`unsigned int memihash(const void *buf, size_t len)`::
`unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)`::

Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
http://www.isthe.com/chongo/tech/comp/fnv).
+
`strhash` and `strihash` take 0-terminated strings, while `memhash` and
`memihash` operate on arbitrary-length memory.
+
`strihash` and `memihash` are case insensitive versions.
+
`memihash_cont` is a variant of `memihash` that allows a computation to be
continued with another chunk of data.
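+
The following is a minimal sketch of 32-bit FNV-1 as described at the URL
above, not a copy of the library's code. The names `fnv1_mem` and
`fnv1_memi_cont` are hypothetical, and folding case with `toupper()` is an
assumption about how a case-insensitive variant could work.
+
```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define FNV32_BASE  0x811c9dc5u
#define FNV32_PRIME 0x01000193u

/* FNV-1: multiply by the prime, then XOR in the next byte. */
static uint32_t fnv1_mem(const void *buf, size_t len)
{
	uint32_t hash = FNV32_BASE;
	const unsigned char *p = buf;

	while (len--)
		hash = (hash * FNV32_PRIME) ^ *p++;
	return hash;
}

/* Case-insensitive variant that continues from a previous hash value. */
static uint32_t fnv1_memi_cont(uint32_t seed, const void *buf, size_t len)
{
	uint32_t hash = seed;
	const unsigned char *p = buf;

	while (len--)
		hash = (hash * FNV32_PRIME) ^ (uint32_t)toupper(*p++);
	return hash;
}
```
+
In this sketch, starting from `FNV32_BASE`, hashing `"foo"` and then continuing
with `"bar"` gives the same value as hashing `"FOOBAR"` in a single call.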

`unsigned int sha1hash(const unsigned char *sha1)`::

Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
for use in hash tables. Cryptographic hashes are supposed to have
uniform distribution, so in contrast to `memhash()`, this just copies
the first `sizeof(int)` bytes without shuffling any bits. Note that
the results will be different on big-endian and little-endian
platforms, so they should not be stored or transferred over the net.
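+
A minimal sketch of the copy described above, assuming the digest buffer holds
at least `sizeof(int)` bytes. The name `digest_to_hash` is hypothetical, and
using `memcpy()` is one way to do the copy, not necessarily how the library
does it.
+
```c
#include <string.h>

static unsigned int digest_to_hash(const unsigned char *digest)
{
	unsigned int hash;

	/* Copy the leading bytes as-is; the byte order is the platform's own. */
	memcpy(&hash, digest, sizeof(hash));
	return hash;
}
```
+
With a 4-byte `int`, a digest starting with the bytes 01 02 03 04 yields
0x04030201 on a little-endian platform and 0x01020304 on a big-endian one,
which is why the result should stay in memory.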

`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::

Initializes a hashmap structure.
+
`map` is the hashmap to initialize.
+
The `equals_function` can be specified to compare two entries for equality.
If NULL, entries are considered equal if their hash codes are equal.
+
If the total number of entries is known in advance, the `initial_size`
parameter may be used to preallocate a sufficiently large table and thus
prevent expensive resizing. If 0, the table is dynamically resized.

`void hashmap_free(struct hashmap *map, int free_entries)`::

Frees a hashmap structure and allocated memory.
+
`map` is the hashmap to free.
+
If `free_entries` is true, each hashmap_entry in the map is freed as well
(using stdlib's free()).

`void hashmap_entry_init(void *entry, unsigned int hash)`::

Initializes a hashmap_entry structure.
+
`entry` points to the entry to initialize.
+
`hash` is the hash code of the entry.
+
The hashmap_entry structure does not hold references to external resources,
and it is safe to just discard it once you are done with it (i.e. if
your structure was allocated with xmalloc(), you can just free(3) it,
and if it is on stack, you can just let it go out of scope).

`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::

Returns the hashmap entry for the specified key, or NULL if not found.
+
`map` is the hashmap structure.
+
`key` is a hashmap_entry structure (or user data structure that starts with
hashmap_entry) that has at least been initialized with the proper hash code
(via `hashmap_entry_init`).
+
If an entry with matching hash code is found, `key` and `keydata` are passed
to `hashmap_cmp_fn` to decide whether the entry matches the key.

`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::

Returns the hashmap entry for the specified hash code and key data,
or NULL if not found.
+
`map` is the hashmap structure.
+
`hash` is the hash code of the entry to look up.
+
If an entry with matching hash code is found, `keydata` is passed to
`hashmap_cmp_fn` to decide whether the entry matches the key. The
`entry_or_key` parameter points to a bogus hashmap_entry structure that
should not be used in the comparison.

`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::

Returns the next equal hashmap entry, or NULL if not found. This can be
used to iterate over duplicate entries (see `hashmap_add`).
+
`map` is the hashmap structure.
+
`entry` is the hashmap_entry to start the search from, obtained via a previous
call to `hashmap_get` or `hashmap_get_next`.

`void hashmap_add(struct hashmap *map, void *entry)`::

Adds a hashmap entry. Unlike `hashmap_put`, this allows adding duplicate
entries (i.e. separate values with the same key according to hashmap_cmp_fn).
+
`map` is the hashmap structure.
+
`entry` is the entry to add.
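+
To illustrate how duplicates added this way can be walked with a get /
get-next pair, here is a toy chained table. It is a self-contained stand-in
with a fixed bucket count, not the library's implementation, and all `toy_*`
names are hypothetical.
+
```c
#include <stddef.h>

#define TOY_BUCKETS 8

struct toy_entry {
	struct toy_entry *next;
	unsigned int hash;
	int key;
	int value;
};

struct toy_map {
	struct toy_entry *table[TOY_BUCKETS];
};

/* Like hashmap_add: always links the entry in, even if an equal key exists. */
static void toy_add(struct toy_map *map, struct toy_entry *e)
{
	struct toy_entry **bucket = &map->table[e->hash % TOY_BUCKETS];

	e->next = *bucket;
	*bucket = e;
}

/* Walk a chain starting at e and return the first entry equal to key. */
static struct toy_entry *toy_find(struct toy_entry *e, const struct toy_entry *key)
{
	for (; e; e = e->next)
		if (e->hash == key->hash && e->key == key->key)
			return e;
	return NULL;
}

/* Like hashmap_get: search the bucket selected by the key's hash code. */
static struct toy_entry *toy_get(struct toy_map *map, const struct toy_entry *key)
{
	return toy_find(map->table[key->hash % TOY_BUCKETS], key);
}

/* Like hashmap_get_next: continue the search after a previous match. */
static struct toy_entry *toy_get_next(const struct toy_entry *entry)
{
	return toy_find(entry->next, entry);
}

/* Count all values stored under one key; the key doubles as its hash here. */
static int toy_count(struct toy_map *map, int key)
{
	struct toy_entry k = { NULL, (unsigned int)key, key, 0 };
	struct toy_entry *e;
	int n = 0;

	for (e = toy_get(map, &k); e; e = toy_get_next(e))
		n++;
	return n;
}
```
+
The loop in `toy_count` mirrors the intended pattern: one initial lookup,
then repeated get-next calls until NULL is returned.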

`void *hashmap_put(struct hashmap *map, void *entry)`::

Adds or replaces a hashmap entry. If the hashmap contains duplicate
entries equal to the specified entry, only one of them will be replaced.
+
`map` is the hashmap structure.
+
`entry` is the entry to add or replace.
+
Returns the replaced entry, or NULL if not found (i.e. the entry was added).

`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::

Removes a hashmap entry matching the specified key. If the hashmap
contains duplicate entries equal to the specified key, only one of
them will be removed.
+
`map` is the hashmap structure.
+
`key` is a hashmap_entry structure (or user data structure that starts with
hashmap_entry) that has at least been initialized with the proper hash code
(via `hashmap_entry_init`).
+
If an entry with matching hash code is found, `key` and `keydata` are
passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
Returns the removed entry, or NULL if not found.

`void hashmap_disallow_rehash(struct hashmap *map, unsigned value)`::

Disallow/allow automatic rehashing of the hashmap during inserts
and deletes.
+
This is useful if the caller knows that the hashmap will be accessed
by multiple threads.
+
The caller is still responsible for any necessary locking; this simply
prevents unexpected rehashing. The caller is also responsible for properly
sizing the initial hashmap to ensure good performance.
+
A call to allow rehashing does not force a rehash; that might happen
with the next insert or delete.

`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
`void *hashmap_iter_next(struct hashmap_iter *iter)`::
`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::

Used to iterate over all entries of a hashmap. Note that it is
not safe to add entries to or remove entries from the hashmap while
iterating.
+
`hashmap_iter_init` initializes a `hashmap_iter` structure.
+
`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
more entries.
+
`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
and returns the first entry, if any).

`const char *strintern(const char *string)`::
`const void *memintern(const void *data, size_t len)`::

Returns the unique, interned version of the specified string or data,
similar to the `String.intern` API in Java and .NET.
Interned strings remain valid for the entire lifetime of the process.
+
Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
strings / data must not be modified or freed.
+
Interned strings are best used for short strings with high probability of
duplicates.
+
Uses a hashmap to store the pool of interned strings.
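+
The interning contract can be sketched with a toy pool. For brevity this
stand-in keeps the pool in a linked list rather than a hashmap, so it shows the
semantics only, not the real lookup cost. The names `toy_intern` and
`struct pool_entry` are hypothetical.
+
```c
#include <stdlib.h>
#include <string.h>

struct pool_entry {
	struct pool_entry *next;
	char data[]; /* flexible array member holding the string */
};

static struct pool_entry *pool; /* entries are never freed */

/* Return the pooled copy of string, adding it to the pool on first use. */
static const char *toy_intern(const char *string)
{
	struct pool_entry *e;
	size_t len = strlen(string);

	for (e = pool; e; e = e->next)
		if (!strcmp(e->data, string))
			return e->data;

	e = malloc(sizeof(*e) + len + 1);
	if (!e)
		return NULL;
	memcpy(e->data, string, len + 1);
	e->next = pool;
	pool = e;
	return e->data;
}
```
+
Because equal strings share one pooled copy, two interned strings can be
compared by pointer instead of with `strcmp()`.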

Usage example
-------------

Here's a simple usage example that maps long keys to double values.
------------
struct hashmap map;

struct long2double {
	struct hashmap_entry ent; /* must be the first member! */
	long key;
	double value;
};

/* Returns 0 if both entries have the same key, as hashmap_cmp_fn requires. */
static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
{
	return e1->key != e2->key;
}

void long2double_init(void)
{
	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
}

void long2double_free(void)
{
	/* free_entries = 1: the entries themselves are freed as well */
	hashmap_free(&map, 1);
}

static struct long2double *find_entry(long key)
{
	/* stack-allocated key; only the hash code and key are filled in */
	struct long2double k;
	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
	k.key = key;
	return hashmap_get(&map, &k, NULL);
}

double get_value(long key)
{
	struct long2double *e = find_entry(key);
	return e ? e->value : 0;
}

void set_value(long key, double value)
{
	struct long2double *e = find_entry(key);
	if (!e) {
		/* xmalloc() dies on failure, so e is never NULL here */
		e = xmalloc(sizeof(struct long2double));
		hashmap_entry_init(e, memhash(&key, sizeof(long)));
		e->key = key;
		hashmap_add(&map, e);
	}
	e->value = value;
}
------------

Using variable-sized keys
-------------------------

The `hashmap_get` and `hashmap_remove` functions expect an ordinary
`hashmap_entry` structure as key to find the correct entry. If the key data is
variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
to create a full-fledged entry structure on the heap and copy all the key data
into the structure.

In this case, the `keydata` parameter can be used to pass
variable-sized key data directly to the comparison function, and the `key`
parameter can be a stripped-down, fixed size entry structure allocated on the
stack.

See test-hashmap.c for an example using arbitrary-length strings as keys.
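
The comparison side of this pattern can be sketched as follows. This is an
illustration under stated assumptions, not code taken from test-hashmap.c: the
`struct hashmap_entry` definition is a stand-in with only the two documented
members, and `struct str_entry`, `str_entry_cmp` and `str_entry_new` are
hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in with the two documented members; not the library's actual header. */
struct hashmap_entry {
	struct hashmap_entry *next;
	unsigned int hash;
};

/* Hypothetical entry whose key is a variable-sized string. */
struct str_entry {
	struct hashmap_entry ent;
	int value;
	char name[]; /* flexible array member */
};

/*
 * During a lookup, keydata carries the string and entry_or_key may be a bare
 * struct hashmap_entry, so its name must not be read. Otherwise keydata is
 * NULL and both arguments are full entries.
 */
static int str_entry_cmp(const void *entry, const void *entry_or_key,
			 const void *keydata)
{
	const struct str_entry *e1 = entry;
	const struct str_entry *e2 = entry_or_key;

	return strcmp(e1->name, keydata ? (const char *)keydata : e2->name);
}

/* Allocate a full entry; only needed when storing, not when looking up. */
static struct str_entry *str_entry_new(const char *name, unsigned int hash, int value)
{
	size_t len = strlen(name);
	struct str_entry *e = malloc(sizeof(*e) + len + 1);

	if (!e)
		return NULL;
	e->ent.next = NULL;
	e->ent.hash = hash;
	e->value = value;
	memcpy(e->name, name, len + 1);
	return e;
}
```

A lookup then needs only a fixed-size `struct hashmap_entry` on the stack that
carries the hash code, with the string itself passed as `keydata`.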