trie树

Trie树的名字有很多，比如字典树，前缀树等等。

比如“ 有一个1G大小的一个文件，里面每一行是一个词，词的大小不超过16字节，内存限制大小是1M。返回频数最高的100个词。” 该如何解决？有一种方案就是使用Trie树加排序实现。

Trie 树也就是常说的字典树，实际上相当于把单词的公共部分给拎出来，这样一层一层往上拎直到得到每个节点都是不可分的最小单元！

一：概念

下面我们有and,as,at,cn,com这些关键词，那么如何构建trie树呢？

trie树

从上面的图中，我们或多或少的可以发现一些好玩的特性。

第一：根节点不包含字符，除根节点外的每一个子节点都包含一个字符。

第二：从根节点到某一节点，路径上经过的字符连接起来，就是该节点对应的字符串。

第三：每个单词的公共前缀作为一个字符节点保存。

二：使用范围

对于单词出现的频率统计，以及查找公共前缀等问题，都可以很好的解决

第一：词频统计。

可能有人要说了，词频统计简单啊，一个hash或者一个堆就可以打完收工，但问题来了，如果内存有限呢？还能这么

玩吗？所以这里我们就可以用trie树来压缩下空间，因为公共前缀都是用一个节点保存的。

第二: 前缀匹配

就拿上面的图来说吧，如果我想获取所有以"a"开头的字符串，从图中可以很明显的看到是：and,as,at，如果不用trie树，

你该怎么做呢？很显然朴素的做法时间复杂度为O(N²) ，那么用Trie树就不一样了，它可以做到h，h为你检索单词的长度，

可以说这是秒杀的效果。

举个例子：现有一个编号为1的字符串”and“，我们要插入到trie树中，采用动态规划的思想，将编号”1“计入到每个途径的节点中，

那么以后我们要找”a“，”an“，”and"为前缀的字符串的编号将会轻而易举。

trie树

三：实际操作

public class Trie_Tree{  
       
      
    /** 
     * 内部节点类 
     * @author "zhshl" 
     * @date    2014-10-14 
     * 
     */  
    private class Node{  
        private int dumpli_num;////该字串的重复数目，  该属性统计重复次数的时候有用,取值为0、1、2、3、4、5……  
        private int prefix_num;///以该字串为前缀的字串数， 应该包括该字串本身！！！！！  
        private Node childs[];////此处用数组实现，当然也可以map或list实现以节省空间  
        private boolean isLeaf;///是否为单词节点  
        public Node(){  
            dumpli_num=0;  
            prefix_num=0;  
            isLeaf=false;  
            childs=new Node[26];  
        }  
    }     
      
      
    private Node root;///树根    
    public Trie_Tree(){  
        ///初始化trie 树  
        root=new Node();  
    }  
      
      
      
    /** 
     * 插入字串，用循环代替迭代实现 
     * @param words 
     */  
    public void insert(String words){  
        insert(this.root, words);  
    }  
    /** 
     * 插入字串，用循环代替迭代实现 
     * @param root 
     * @param words 
     */  
    private void insert(Node root,String words){  
        words=words.toLowerCase();////转化为小写  
        char[] chrs=words.toCharArray();  
          
        for(int i=0,length=chrs.length; i<length; i++){  
            ///用相对于a字母的值作为下标索引，也隐式地记录了该字母的值  
            int index=chrs[i]-'a';  
            if(root.childs[index]!=null){  
                ////已经存在了，该子节点prefix_num++  
                root.childs[index].prefix_num++;  
            }else{  
                ///如果不存在  
                root.childs[index]=new Node();  
                root.childs[index].prefix_num++;                  
            }     
              
            ///如果到了字串结尾，则做标记  
            if(i==length-1){  
                root.childs[index].isLeaf=true;  
                root.childs[index].dumpli_num++;  
            }  
            ///root指向子节点，继续处理  
            root=root.childs[index];  
        }  
          
    }  
      
      
      
      
    /** 
     * 遍历Trie树，查找所有的words以及出现次数 
     * @return HashMap<String, Integer> map 
     */  
    public HashMap<String,Integer> getAllWords(){  
//      HashMap<String, Integer> map=new HashMap<String, Integer>();  
              
        return preTraversal(this.root, "");  
    }  
      
    /** 
     * 前序遍历。。。 
     * @param root      子树根节点 
     * @param prefixs   查询到该节点前所遍历过的前缀 
     * @return 
     */  
    private  HashMap<String,Integer> preTraversal(Node root,String prefixs){  
        HashMap<String, Integer> map=new HashMap<String, Integer>();  
          
        if(root!=null){  
              
            if(root.isLeaf==true){  
            ////当前即为一个单词  
                map.put(prefixs, root.dumpli_num);  
            }  
              
            for(int i=0,length=root.childs.length; i<length;i++){  
                if(root.childs[i]!=null){  
                    char ch=(char) (i+'a');  
                    ////递归调用前序遍历  
                    String tempStr=prefixs+ch;  
                    map.putAll(preTraversal(root.childs[i], tempStr));  
                }  
            }  
        }         
          
        return map;  
    }  
      
      
      
      
    /** 
     * 判断某字串是否在字典树中 
     * @param word 
     * @return true if exists ,otherwise  false  
     */  
    public boolean isExist(String word){  
        return search(this.root, word);  
    }  
    /** 
     * 查询某字串是否在字典树中 
     * @param word 
     * @return true if exists ,otherwise  false  
     */  
    private boolean search(Node root,String word){  
        char[] chs=word.toLowerCase().toCharArray();  
        for(int i=0,length=chs.length; i<length;i++){  
            int index=chs[i]-'a';  
            if(root.childs[index]==null){  
                ///如果不存在，则查找失败  
                return false;  
            }             
            root=root.childs[index];              
        }  
          
        return true;  
    }  
      
    /** 
     * 得到以某字串为前缀的字串集，包括字串本身！ 类似单词输入法的联想功能 
     * @param prefix 字串前缀 
     * @return 字串集以及出现次数，如果不存在则返回null 
     */  
    public HashMap<String, Integer> getWordsForPrefix(String prefix){  
        return getWordsForPrefix(this.root, prefix);  
    }  
    /** 
     * 得到以某字串为前缀的字串集，包括字串本身！ 
     * @param root 
     * @param prefix 
     * @return 字串集以及出现次数 
     */  
    private HashMap<String, Integer> getWordsForPrefix(Node root,String prefix){  
        HashMap<String, Integer> map=new HashMap<String, Integer>();  
        char[] chrs=prefix.toLowerCase().toCharArray();  
        ////  
        for(int i=0, length=chrs.length; i<length; i++){  
              
            int index=chrs[i]-'a';  
            if(root.childs[index]==null){  
                return null;  
            }  
              
            root=root.childs[index];  
          
        }  
        ///结果包括该前缀本身  
        ///此处利用之前的前序搜索方法进行搜索  
        return preTraversal(root, prefix);  
    }  
         
}  

相关推荐