Searching a string for a specific word. C#
Answer
My suggestion is a complete class.
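using System;
using System.Text;

class WordCount {
    const string Symbols = ",;.:-()\t!¡¿?\"[]{}&<>+-*/=#'";

    // Replaces every symbol with a space and lower-cases everything else.
    public static string normalize(string str)
    {
        var toret = new StringBuilder();
        for(int i = 0; i < str.Length; ++i) {
            if (Symbols.IndexOf(str[ i ]) > -1) {
                toret.Append(' ');
            } else {
                toret.Append(char.ToLower(str[ i ]));
            }
        }
        return toret.ToString();
    }

    private string word;
    public string Word {
        get { return this.word; }
        set { this.word = value; }
    }

    private string str;
    public string Str {
        get { return this.str; }
    }

    // The split is done lazily and cached, so it happens at most once.
    private string[] words = null;
    public string[] Words {
        get {
            if (this.words == null) {
                this.words = this.Str.Split(
                    new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            }
            return this.words;
        }
    }

    public WordCount(string str, string w)
    {
        // Padding with spaces lets " word " match at both ends of the text.
        this.str = ' ' + normalize(str) + ' ';
        this.word = w;
    }

    public int Times()
    {
        return this.Times(this.Word);
    }

    public int Times(string word)
    {
        int times = 0;
        // The stored text is lower-cased, so lower-case the search word too.
        word = ' ' + word.ToLower() + ' ';
        int wordLength = word.Length;
        int pos = this.Str.IndexOf(word, StringComparison.Ordinal);
        while(pos > -1) {
            ++times;
            // Resume on the trailing space, which may also open the next match.
            pos = this.Str.IndexOf(word, pos + wordLength - 1, StringComparison.Ordinal);
        }
        return times;
    }

    public double Percentage()
    {
        return this.Percentage(this.Word);
    }

    // Returns a fraction between 0 and 1 (multiply by 100 for a percentage).
    public double Percentage(string word)
    {
        // Cast to double to avoid integer division.
        return (double) this.Times(word) / this.Words.Length;
    }
}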
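Advantages: the result of splitting the string is cached, so there is no risk of doing it more than once. It is wrapped in a class, so it can easily be reused. There is no need for LINQ. Hope this helps.
Answer
The simplest way is to use LINQ: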
char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'};
var count =
    (from word in sentence.Split(separators)            // get all the words
     where word.ToLower() == searchedWord.ToLower()     // find the words that match
     select word).Count();                              // count them
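This only counts the number of times the word appears in the text. You can also count how many words the text contains:
var totalWords = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
and then get the percentage:
var result = count * 100.0 / totalWords; // 100.0 avoids integer division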
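There are so many corner cases that this will miss. If you search for "two" in the sentence "one, two, three", you won't get any match, because split will yield the element "two," (including the comma). This means you need to account for the various separators and strip them before splitting (unless the user is searching for them). – 2011-02-05 12:03:14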
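The corner case described in this comment can be reproduced with a small sketch. `SplitDemo` and `CountWord` are hypothetical names introduced only for illustration; the point is just how the choice of separators changes the result.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: counts whole-word, case-insensitive matches
    // after splitting the sentence on the given separators.
    public static int CountWord(string sentence, string searchedWord, char[] separators)
    {
        return sentence
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(w => w.Equals(searchedWord, StringComparison.OrdinalIgnoreCase));
    }
}
```

With `"one, two, three"` and only `' '` as separator, the elements are `"one,"`, `"two,"` and `"three"`, so `CountWord(sentence, "two", new[] { ' ' })` returns 0. Adding `','` to the separators gives `"one"`, `"two"`, `"three"`, and the same call returns 1.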
Answer
I suggest using the String.Equals overload that takes a StringComparison, for better performance.
var separators = new [] { ' ', ',', '.', '?', '!', ';', ':', '\"' };
var words = sentence.Split (separators, StringSplitOptions.RemoveEmptyEntries);
var matches = words.Count (w =>
    w.Equals (searchedWord, StringComparison.OrdinalIgnoreCase));
var percentage = matches / (float) words.Length; // arrays expose Length, not Count
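Note that percentage will be a float, e.g. 0.5 for 50%. You can format it for display using a ToString overload:
var formatted = percentage.ToString ("P0"); // 0.1234 => 12 %
You can also change the format specifier to show decimal places:
var formatted = percentage.ToString ("P2"); // 0.1234 => 12.34 %
Keep in mind that this approach is inefficient for long strings, because it creates a string instance for every word found. You may want to use a StringReader and read word by word manually.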
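The answer only hints at reading the text word by word to avoid creating one string per word. A minimal sketch of that idea follows. It swaps the StringReader for a plain index scan over the string, which is a substitution made here, not the answer's own code. `WordScanner`, `CountWord` and the `Separators` constant are hypothetical names, and the separator set is assumed to match the one used above.

```csharp
using System;

public static class WordScanner
{
    // Assumed separator set, mirroring the one used in the answer above.
    private const string Separators = " ,.?!;:\"";

    // Counts whole-word, case-insensitive occurrences of searchedWord and
    // reports the total number of words, without allocating per-word strings.
    public static int CountWord(string sentence, string searchedWord, out int totalWords)
    {
        int matches = 0;
        totalWords = 0;
        int i = 0;

        while (i < sentence.Length)
        {
            // Skip any run of separators.
            while (i < sentence.Length && Separators.IndexOf(sentence[i]) >= 0) i++;
            if (i >= sentence.Length) break;

            // Find the end of the current word.
            int start = i;
            while (i < sentence.Length && Separators.IndexOf(sentence[i]) < 0) i++;
            int length = i - start;
            totalWords++;

            // Compare the slice in place instead of creating a substring.
            if (length == searchedWord.Length &&
                string.Compare(sentence, start, searchedWord, 0, length,
                               StringComparison.OrdinalIgnoreCase) == 0)
            {
                matches++;
            }
        }
        return matches;
    }
}
```

Calling `WordScanner.CountWord("This is a test. Is it?", "is", out total)` returns 2 and sets `total` to 6, so the fraction is `2 / (float) total`.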
Answer
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The words you want to search for
var words = new string[] { "this", "is" };
// Build a regular expression query of the form \b(this|is)\b
var wordRegexQuery = new System.Text.StringBuilder();
wordRegexQuery.Append("\\b(");
for (var wordIndex = 0; wordIndex < words.Length; wordIndex++)
{
    // Escape each word so regex metacharacters are matched literally
    wordRegexQuery.Append(Regex.Escape(words[wordIndex]));
    if (wordIndex < words.Length - 1)
    {
        wordRegexQuery.Append('|');
    }
}
wordRegexQuery.Append(")\\b");
// Find matches and return them as a string[]
var regex = new Regex(wordRegexQuery.ToString(), RegexOptions.IgnoreCase);
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray();
// Display results (the percentage is relative to the matched words only, not to all words in the text)
foreach (var word in words)
{
    var wordCount = matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase));
    Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f/matches.Length);
}
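The code above reports each word's share among the matched words only. If the percentage is instead meant relative to every word in the text, a variation along these lines might work. This is a sketch, not part of the original answer: `RegexWordStats` and `Percentage` are hypothetical names, and treating `\w+` as the definition of a "word" is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWordStats
{
    // Returns the share (0..100) of words in text that equal word, ignoring case.
    public static double Percentage(string text, string word)
    {
        // Assumption: a "word" is a run of letters, digits or underscores (\w+).
        int totalWords = Regex.Matches(text, @"\w+").Count;
        if (totalWords == 0) return 0.0;

        // Regex.Escape keeps characters such as '.' or '+' in the search word literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        int hits = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

        // Multiply by 100.0 first so the division is done in floating point.
        return hits * 100.0 / totalWords;
    }
}
```

For example, `RegexWordStats.Percentage("a b a b", "a")` returns 50, because two of the four words match.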
What exactly do you mean by percentage? – 2011-02-05 11:55:34
I assume he means (number_of_times_word_to_find_occurs / total_number_of_words) * 100. – david 2011-02-05 12:13:05