Counting the occurrences of each word in a large document

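Problem description:

I'd like to know which data structure to use to solve this problem. Can anyone explain it in detail? I'm thinking of using a tree.

There is a large document containing millions of words. What is the best way to count the occurrences of each word?

This question was asked at Microsoft. Any suggestions would be appreciated.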

Answer

Using a dictionary or hash map gives O(N) on average.

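To solve it in O(N) worst case, use a trie with one small change: add a counter to each word's representation in the trie, and each time an inserted word already exists, increment its counter.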
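The trie-with-counter idea can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name `CountingTrie` and its methods are made up for this example, and it assumes every word has already been normalized to lowercase ASCII letters `a` to `z`, so each node can use a fixed 26-slot array and each character costs constant time even in the worst case.

```java
// Hypothetical class for illustration; assumes words contain only 'a'..'z'.
class CountingTrie {
    // One node per character on the path from the root.
    private static final class Node {
        final Node[] children = new Node[26];
        int count; // how many times the word ending at this node was inserted
    }

    private final Node root = new Node();

    // Walks (or creates) the path for the word, then increments the counter at its last node.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (idx < 0 || idx >= 26) {
                throw new IllegalArgumentException("expected only a-z: " + word);
            }
            if (node.children[idx] == null) {
                node.children[idx] = new Node();
            }
            node = node.children[idx];
        }
        node.count++;
    }

    // Returns how often the word was added, or 0 if it was never seen.
    int count(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (idx < 0 || idx >= 26 || node.children[idx] == null) {
                return 0;
            }
            node = node.children[idx];
        }
        return node.count;
    }

    public static void main(String[] args) {
        CountingTrie trie = new CountingTrie();
        for (String w : "the cat and the hat".split(" ")) {
            trie.add(w);
        }
        System.out.println(trie.count("the")); // prints 2
        System.out.println(trie.count("dog")); // prints 0
    }
}
```

Each insertion touches one node per character, so the total work is proportional to the number of characters in the input, with no dependence on hash quality.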

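If you want to print all the counts at the end, you can keep the counters in a separate list and reference that list from the trie, instead of storing the counters in the trie itself.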
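One way to read the separate-list variant is sketched below. It is an assumption about what the answer has in mind, not a confirmed design: each trie node stores only an index (`slot`, a name invented here) into two parallel lists, so printing all counts is a plain loop over the lists with no trie traversal. As before, the class name is hypothetical and words are assumed to be lowercase `a` to `z`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; assumes words contain only 'a'..'z'.
class IndexedCountingTrie {
    private static final class Node {
        final Node[] children = new Node[26];
        int slot = -1; // index into words/counts, or -1 if no word ends here
    }

    private final Node root = new Node();
    final List<String> words = new ArrayList<>();   // distinct words in first-seen order
    final List<Integer> counts = new ArrayList<>(); // counts.get(i) belongs to words.get(i)

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (idx < 0 || idx >= 26) {
                throw new IllegalArgumentException("expected only a-z: " + word);
            }
            if (node.children[idx] == null) {
                node.children[idx] = new Node();
            }
            node = node.children[idx];
        }
        if (node.slot < 0) {
            // First occurrence: allocate a counter outside the trie.
            node.slot = words.size();
            words.add(word);
            counts.add(0);
        }
        counts.set(node.slot, counts.get(node.slot) + 1);
    }

    public static void main(String[] args) {
        IndexedCountingTrie trie = new IndexedCountingTrie();
        for (String w : "to be or not to be".split(" ")) {
            trie.add(w);
        }
        // Printing needs only the lists, not a walk over the trie.
        for (int i = 0; i < trie.words.size(); i++) {
            System.out.println(trie.words.get(i) + " " + trie.counts.get(i));
        }
    }
}
```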

A trie, considering it's only words. – Jack

Answer

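I'd just use a hash map (or Dictionary, since this is Microsoft ;)) from strings to integers. For each word of the input, add it to the dictionary if it is new; otherwise increment its count. That is O(n) in the length of the input, assuming the hash map implementation is decent.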
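For what it's worth, here is a minimal Java sketch of that approach. The class and method names (`HashWordCount`, `countWords`) are made up for this example, and it assumes that "word" simply means a whitespace-separated token; a real document would probably need case folding and punctuation handling. It reads from a `Reader` through a `Scanner`, so the whole document never has to sit in memory at once.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical class for illustration.
class HashWordCount {
    // Streams whitespace-separated tokens from the reader and counts each one.
    static Map<String, Integer> countWords(Reader input) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                // Inserts 1 for a new word, otherwise adds 1 to the stored count.
                counts.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new StringReader("a b a c b a"));
        System.out.println(counts.get("a")); // prints 3
    }
}
```

For a file on disk, the same method could be handed something like `Files.newBufferedReader(path)` instead of the `StringReader` used here.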

Answer

using System.Collections.Generic;

// Mutable wrapper so a count can be incremented in place
// without writing the value back into the dictionary.
class IntValue
{
    public IntValue(int value)
    {
        Value = value;
    }
    public int Value;
}

static void Main(string[] args)
{
    // Assuming `document` is an enumerable sequence of the words in the document:

    Dictionary<string, IntValue> dict = new Dictionary<string, IntValue>();
    foreach (string word in document)
    {
        IntValue intValue;
        if (!dict.TryGetValue(word, out intValue))
        {
            // First occurrence: start at zero; it is incremented below.
            intValue = new IntValue(0);
            dict.Add(word, intValue);
        }

        ++intValue.Value;
    }

    // Now dict contains the counts.
}

Answer

A tree won't work here.

import java.util.Hashtable;

Hashtable<String, Integer> ht = new Hashtable<>();
// Read each word of the text in order; for each word (oneWord):
if (ht.containsKey(oneWord)) // containsKey, because contains() tests values, not keys
{
    ht.put(oneWord, ht.get(oneWord) + 1);
}
else
{
    ht.put(oneWord, 1);
}

Why wouldn't a tree work? – svick
