A question about the compare method of Java String's case-insensitive comparator (CaseInsensitiveComparator)

Recently, while reading the JDK source code, I happened upon the source of the String class's case-insensitive comparator. Here it is:

    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }

        /** Replaces the de-serialized object. */
        private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
    }
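The private class is reached through the public constant String.CASE_INSENSITIVE_ORDER, which is what readResolve() above returns. The sketch below shows that constant in ordinary use; the class name CaseInsensitiveSortDemo, the helper sortIgnoringCase and the sample words are illustrative choices, not anything from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CaseInsensitiveSortDemo {
    // Returns a sorted copy of the input, ordered by String.CASE_INSENSITIVE_ORDER,
    // the public instance that CaseInsensitiveComparator.readResolve() hands back.
    static List<String> sortIgnoringCase(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        // List.sort is stable, so "Apple" and "apple" (which compare equal) keep their input order
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Apple", "cherry", "apple");
        System.out.println(sortIgnoringCase(words)); // [Apple, apple, banana, cherry]
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("JAVA", "java")); // 0
    }
}
```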

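Inside the loop, compare first converts c1 and c2 to uppercase and compares them. If they still differ, it converts those uppercase results to lowercase and compares again. My first reaction was that the lowercase comparison was redundant: if the uppercase forms are unequal, surely the lowercase forms must be unequal too. So I sent the code to a few friends and we discussed it.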
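This intuition can be tested mechanically instead of argued about. The sketch below assumes only the single-char overloads Character.toUpperCase(char) and Character.toLowerCase(char), the same ones the loop uses. It scans all 65,536 char values and reports pairs whose uppercase forms differ but whose lowercase-of-uppercase forms agree, i.e. pairs where the final lowercase step changes the outcome. UpperLowerCheck and findCounterexamples are hypothetical names, and the exact results depend on the Unicode version bundled with the JDK that runs it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpperLowerCheck {
    // Returns pairs {a, b} where Character.toUpperCase(a) != Character.toUpperCase(b),
    // yet lowercasing those uppercase results makes them equal. Each such b is paired
    // with the first (lowest) char a that produced the same lowercase-of-uppercase key.
    static List<char[]> findCounterexamples() {
        Map<Character, Character> firstByKey = new HashMap<>();
        List<char[]> pairs = new ArrayList<>();
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            char upper = Character.toUpperCase(c);
            char key = Character.toLowerCase(upper); // same two steps as the JDK loop
            Character first = firstByKey.putIfAbsent(key, c);
            if (first != null && Character.toUpperCase(first) != upper) {
                // Uppercase alone says "different"; the lowercase step says "equal"
                pairs.add(new char[] {first, c});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<char[]> pairs = findCounterexamples();
        System.out.println("pairs found: " + pairs.size());
        // Print only the first few pairs, as code points
        for (int j = 0; j < Math.min(10, pairs.size()); j++) {
            char[] p = pairs.get(j);
            System.out.printf("U+%04X vs U+%04X%n", (int) p[0], (int) p[1]);
        }
    }
}
```

If the list is non-empty, the lowercase step is not redundant. One entry to look for is U+004B ('K') paired with U+212A (KELVIN SIGN).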


In the end, the answer turned up in the source of String's public boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len) method, which contains a very helpful comment:

    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }

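So it turns out Java adds this extra step, converting to lowercase and comparing once more, specifically to cope with the unusual case-conversion rules of the Georgian alphabet. Java's internationalization support really is thorough, and I learned something new.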
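The JDK comment names Georgian, but I have not checked which Georgian characters trigger it in a given JDK version. The sketch below therefore uses a different, well-known case: the Kelvin sign (U+212A), whose simple uppercase mapping is itself and whose simple lowercase mapping is the Latin 'k'. compareUpperOnly is a hypothetical helper: the JDK loop with the lowercase step removed, i.e. the version I first assumed would behave the same.

```java
class KelvinSignDemo {
    // Hypothetical variant of the JDK compare loop WITHOUT the final lowercase step.
    static int compareUpperOnly(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = Character.toUpperCase(s1.charAt(i));
            char c2 = Character.toUpperCase(s2.charAt(i));
            if (c1 != c2) {
                return c1 - c2; // no overflow: chars are promoted to int
            }
        }
        return n1 - n2;
    }

    public static void main(String[] args) {
        String kelvin = "\u212A"; // KELVIN SIGN, a different char from Latin 'K'
        String k = "k";
        // Uppercase only: U+212A vs U+004B, so the strings look different
        System.out.println(compareUpperOnly(kelvin, k) == 0); // false
        // JDK comparator: lowercasing both uppercase results yields 'k' == 'k'
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare(kelvin, k) == 0); // true
        System.out.println(kelvin.equalsIgnoreCase(k)); // true
    }
}
```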

Comments are welcome. Let's learn and improve together!