C#: removing accents from characters?
Question:
Possible duplicates:
How do I remove diacritics (accents) from a string in .NET?
How to change diacritic characters to non-diacritic ones
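How can I convert á to a in C#?
For example: aéíúö => aeiuo
Hmm, having read those threads (I didn't know these were called diacritics, so I couldn't search for them):
I want to "drop" all diacritics except ñ.
Currently I have: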
public static string RemoveDiacritics(this string text)
{
    string normalized = text.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder();
    foreach (char c in from c in normalized
                       let u = CharUnicodeInfo.GetUnicodeCategory(c)
                       where u != UnicodeCategory.NonSpacingMark
                       select c)
    {
        sb.Append(c);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}
What would be the best way to leave ñ untouched?
My solution is to do the following after the foreach:
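var result = sb.ToString();
// Restoring by index only works if every character maps one-to-one.
if (text.Length != result.Length)
    throw new ArgumentOutOfRangeException();
int position = -1;
// Use >= 0 so that an ñ at index 0 is restored as well.
while ((position = text.IndexOf('ñ', position + 1)) >= 0)
{
    result = result.Remove(position, 1).Insert(position, "ñ");
}
// Return the patched string; sb.ToString() would discard the restored ñ.
return result;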
But I assume there is a less "manual" way to do this?
Answer
If you don't want to remove ñ, this is an option. It's fast.
static string[] pats3 = { "é", "É", "á", "Á", "í", "Í", "ó", "Ó", "ú", "Ú" };
static string[] repl3 = { "e", "E", "a", "A", "i", "I", "o", "O", "u", "U" };
static Dictionary<string, string> _var = null;
static Dictionary<string, string> dict
{
    get
    {
        if (_var == null)
        {
            _var = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value);
        }
        return _var;
    }
}
private static string RemoveAccent(string text)
{
    // using Zip as a shortcut, otherwise setup dictionary differently as others have shown
    //var dict = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value);
    //string input = "åÅæÆäÄöÖøØèÈàÀìÌõÕïÏ";
    string pattern = String.Join("|", dict.Keys.Select(k => k)); // use ToArray() for .NET 3.5
    string result = Regex.Replace(text, pattern, m => dict[m.Value]);
    //Console.WriteLine("Pattern: " + pattern);
    //Console.WriteLine("Input: " + text);
    //Console.WriteLine("Result: " + result);
    return result;
}
If you do want to remove ñ as well, a faster option is: Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(text));
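For a less "manual" variant of the normalization approach from the question, one possibility is to keep the combining tilde (U+0303) only when it follows n or N while stripping every other non-spacing mark. The class and method names below are hypothetical, and this is only a sketch under the assumption that ñ is the single character to preserve:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical helper: strips combining marks but keeps the tilde on n/N.
    public static string RemoveDiacriticsExceptEnye(string text)
    {
        const char CombiningTilde = '\u0303';

        // FormD splits each accented letter into base letter + combining marks.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        char lastBase = '\0';

        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                // Ordinary character: keep it and remember it as the current base.
                sb.Append(c);
                lastBase = c;
            }
            else if (c == CombiningTilde && (lastBase == 'n' || lastBase == 'N'))
            {
                // Keep one tilde on n/N so FormC can rebuild ñ/Ñ.
                sb.Append(c);
                lastBase = '\0';
            }
            // Any other combining mark is dropped.
        }

        // FormC recombines n + tilde back into a single ñ.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Because the decision is made on the decomposed string, no index bookkeeping against the original text is needed, and an input such as "año" keeps its ñ while "aéíúö" becomes "aeiuo".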
See this post: http://*.com/questions/249087/how-do-i-remove-diacritics-网络中的重音符号 – keyboardP 2011-06-08 23:01:40
It depends on the underlying code points. http://unicode.org/faq/char_combmark.html – Tim 2011-06-08 23:03:18
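The point about underlying code points can be seen with a short check: ñ may be stored either as the single code point U+00F1 or as n followed by the combining tilde U+0303, and ordinal operations such as IndexOf(char) treat the two differently. The class name below is made up for illustration:

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    public static void Main()
    {
        string composed = "\u00F1";     // ñ as a single code point
        string decomposed = "n\u0303";  // n followed by COMBINING TILDE

        Console.WriteLine(composed.Length);    // 1
        Console.WriteLine(decomposed.Length);  // 2

        // Ordinal comparison sees two different strings.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal)); // False

        // Normalization converts between the two representations.
        Console.WriteLine(composed.Normalize(NormalizationForm.FormD) == decomposed); // True
        Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == composed); // True

        // A search for the composed character misses the decomposed form.
        Console.WriteLine(decomposed.IndexOf('\u00F1')); // -1
    }
}
```

This is why index-based patching of 'ñ' is only reliable when the input is already in composed form (FormC).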