Recognizing CAPTCHAs with Tesseract --- Java

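During testing I found something interesting: if the versions of tesseract-ocr, tess4j, and jna are not matched properly, recognition fails, and the exception finally reported is something like "module not found".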

So before getting to the Java code, I will first spell out the version combinations that work.

tesseract 3.0.3 with tess4j 2.0.1 or 3.2.1, together with jna 4.2.1, works correctly.

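Without further ado, here is the screenshot: the network-drive address of the tesseract files and their version information.

I also tested tesseract 4.0 myself: it does not work with the combinations above, for reasons unknown.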


Java code

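import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

import net.sourceforge.tess4j.Tesseract;

public class TestOCR {

    /**
     * Runs OCR on a single image file.
     *
     * @param srImage path of the image
     * @param ZH_CN whether to use the Chinese training data; true = yes
     * @return the recognized text
     */
    public static String FindOCR(String srImage, boolean ZH_CN) {
        try {
            System.out.println("start");
            long start = System.currentTimeMillis();
            File imageFile = new File(srImage);
            if (!imageFile.exists()) {
                return "Image not found";
            }
            BufferedImage textImage = ImageIO.read(imageFile);
            if (textImage == null) {
                // ImageIO.read returns null when no registered reader can decode the file
                return "Unsupported image format";
            }
            Tesseract instance = Tesseract.getInstance();
            instance.setDatapath("D:\\RPA\\tesseract\\tessdata"); // location of the training data
            if (ZH_CN) {
                instance.setLanguage("chi_sim"); // recognize Simplified Chinese
            }
            String result = instance.doOCR(textImage);
            long end = System.currentTimeMillis();
            // Divide by 1000.0 so the elapsed time keeps its fractional seconds
            System.out.println("Elapsed: " + (end - start) / 1000.0 + " s");
            return result;
        } catch (Exception e) {
            e.printStackTrace();
            return "Unknown error occurred";
        }
    }

    public static void main(String[] args) {
        String picPath = "D:\\Test\\YZM\\resprotity\\download\\";
        // Recognize the test images 1.jpg through 20.jpg
        for (int i = 1; i < 21; i++) {
            String path = picPath + i + ".jpg";
            String result = FindOCR(path, false);
            System.out.println("path-->" + path + "<===> value --->" + result);
        }
    }
}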
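
One thing worth noting about the listing above: its `catch (Exception e)` does not cover `UnsatisfiedLinkError`, which is what JNA throws when a native library cannot be loaded. That is a likely form for the "module not found" failure from a bad version combination, and it is an `Error`, not an `Exception`. The following is only a minimal sketch of how such failures could be reported separately. The class `NativeLoadCheck` and the method `describeFailure` are hypothetical names made up for this illustration and are not part of tess4j; the OCR call is passed in as a `Callable` so that the sketch runs without tess4j on the classpath.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of tess4j: wraps an OCR call and reports
// native-library loading failures separately from ordinary exceptions.
public class NativeLoadCheck {

    public static String describeFailure(Callable<String> ocrCall) {
        try {
            return ocrCall.call();
        } catch (LinkageError e) {
            // Covers UnsatisfiedLinkError and NoClassDefFoundError, which are
            // Errors and therefore slip past a plain catch (Exception e).
            return "Native library problem (check the tesseract/tess4j/jna versions): "
                    + e.getMessage();
        } catch (Exception e) {
            return "OCR failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Simulated failure; with tess4j this would be a real call such as
        // () -> instance.doOCR(textImage)
        String message = describeFailure(() -> {
            throw new UnsatisfiedLinkError("The specified module could not be found.");
        });
        System.out.println(message);
    }
}
```

With this shape, a wrong version combination produces a message that points at the versions instead of killing the loop in `main` with an uncaught error.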
