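A question about the compare method of String's case-insensitive comparator (CaseInsensitiveComparator) in Java
I have been reading the JDK source code recently and happened to come across the case-insensitive comparator in the String class. Its source is shown below.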
private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }

    /** Replaces the de-serialized object. */
    private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
}
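Inside the loop, the code first converts c1 and c2 to uppercase and compares them; if they still differ, it converts both to lowercase and compares again. At first I thought the lowercase comparison was redundant: if two characters differ after being converted to uppercase, surely they must also differ after being converted to lowercase. So I sent the code to a few friends and we discussed it.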
In the end I found a very helpful comment in the source of String's public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len) method:
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion. So we need to make one last check before exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
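So it turns out that Java adds one extra step, converting to lowercase and comparing once more, specifically to cope with the unusual case-conversion rules of the Georgian alphabet. Java's internationalization support really is thorough, and I learned something new.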
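To see the extra lowercase step matter in practice, here is a minimal sketch. Note the assumptions: it does not use Georgian letters (whose mappings depend on the Unicode version your JDK ships), but two other characters whose simple case mappings behave the same way, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) and U+212A (KELVIN SIGN). The class name LowerCaseCheckDemo and the helper needsLowerCaseCheck are hypothetical names made up for this illustration, not part of the JDK.

```java
public class LowerCaseCheckDemo {

    /**
     * Hypothetical helper (not part of the JDK): returns true when the
     * uppercase forms of a and b differ but their lowercase forms are equal,
     * i.e. an uppercase-only comparison would report a mismatch.
     */
    static boolean needsLowerCaseCheck(char a, char b) {
        boolean upperDiffers = Character.toUpperCase(a) != Character.toUpperCase(b);
        boolean lowerMatches = Character.toLowerCase(a) == Character.toLowerCase(b);
        return upperDiffers && lowerMatches;
    }

    public static void main(String[] args) {
        char dottedCapitalI = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        char kelvinSign = '\u212A';     // KELVIN SIGN

        // Both pairs only match after the lowercase conversion.
        System.out.println(needsLowerCaseCheck(dottedCapitalI, 'i'));
        System.out.println(needsLowerCaseCheck(kelvinSign, 'k'));

        // An ordinary pair already matches at the uppercase step.
        System.out.println(needsLowerCaseCheck('a', 'A'));

        // The built-in comparator and equalsIgnoreCase treat these as equal.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("\u0130", "i"));
        System.out.println("\u212A".equalsIgnoreCase("k"));
    }
}
```

On the JDKs I am aware of, the first two calls should report true and the third false, which is exactly the situation the lowercase check in compare and regionMatches is there to catch. My original assumption that "different in uppercase implies different in lowercase" simply does not hold for every Unicode character.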
Comments are welcome; let's learn and improve together!