您的位置: 首页 > 文章 > Week 6 | Hash Tables | Princeton Algorithms

Week 6 | Hash Tables | Princeton Algorithms

分类: 文章 • 2024-05-10 11:10:10

一、Hash function

哈希函数：
均匀地打乱keys来生成表的索引。
→ 将key映射为表的索引。
→ 希望使用哈希表数组存储keys，而 f(key) 的值就是该key在数组中存储位置的索引，其中f(x)为哈希函数。

1. hashCode()

所有的java类都继承了hashCode()方法，返回一个32bits的int类型。
默认的hashCode()实现返回了该类的存储地址。

java库定义实现的hashCode()

Week 6 | Hash Tables | Princeton Algorithms
double类型的hashcode是将double这个数转换为64bits的二进制数，然后前32位与后32位。

Week 6 | Hash Tables | Princeton Algorithms
可以优化该方法，因为每次计算耗时太久，可以只计算一次储存起来，以后再计算就只取用即可。

自定义实现hashCode()：

Week 6 | Hash Tables | Princeton Algorithms
“Standard” recipe for user-defined types.
・ Combine each significant field using the 31x + y rule.
・ If field is a primitive type, use wrapper type hashCode().
・ If field is null, return 0.
・ If field is a reference type, use hashCode().
・ If field is an array, apply to each entry. or use Arrays.deepHashCode()

2. hash function

Week 6 | Hash Tables | Princeton Algorithms

二、collision resolution

当两个key被映射到同一个哈希表的索引怎么办？

1. 链表

Week 6 | Hash Tables | Princeton Algorithms

M是哈希表（数组）的长度。N为输入数据（键值对）的总个数。M太大，会导致很多空间浪费；M太小，会导致重叠太多，链表太长，时间太慢。一般取M = N/5。

2. 数组

当一个新的key与之前的key存储索引冲突，就找到下一个不为空的索引，将新key放在那里。

Hash. Map key to integer i between 0 and M-1.
Insert. Put at table index i if free; if not try i+1, i+2, etc.
Search. Search table index i; if occupied but no match, try i+1, i+2, etc.
Note. Array size M must be greater than number of key-value pairs N

Week 6 | Hash Tables | Princeton Algorithms

M为哈希表（数组的长度）。N为输入数据（键值对）的总个数。M太大，浪费空间；M太小，时间太慢（都花在遍历寻找非空索引或者遍历search上了）。M肯定要比N大，才能保证放得下。最好M = 2 N，能够实现最好性能。

三、Symbol Table实现方法总结

加入哈希表实现：

Week 6 | Hash Tables | Princeton Algorithms
如果不需要键值对有序，那么哈希表是最快的选择。

Week 6 | Hash Tables | Princeton Algorithms

四、Symble Table的应用

详见PPT

set集合
筛选词，命令行参数输入几个单词，得到txt中所有该单词的出现、或txt中所有除了这几个单词的其他单词
根据不同类型的key和value查词典
命令行参数输入一种文件类型和一个单词，返回所有此种文件类型中含该单词的文件的文件名
命令行参数输入一个文件名和一个单词，每次该单词在该文件中出现时，返回该单词的前后几个单词和他自己
矩阵相乘，当矩阵中有许多0时，可以简化点乘，而不需要N²复杂度