Using iText pdf2Data

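pdf2Data provides a template-based component for processing the PDF tables you need: you define the parsing regions of a template online, then upload PDF files that match the template, and it parses them and returns the text you want. It can be integrated into Java code, but unfortunately a license has to be purchased first. If you or your company have already purchased an iText license, you only need to add this component to that purchase.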

API documentation: https://api.itextpdf.com/pdf2Data/java/latest/
Official site: https://itextpdf.com/de/products/itext-7/pdf2data

Dependency

<!-- pdf2data -->
<dependency>
    <groupId>com.duallab.pdf2data</groupId>
    <artifactId>pdf2data</artifactId>
    <version>2.1.7</version>
</dependency>

Java application code

// Make sure to load the license file before invoking any other code
LicenseKey.loadLicenseFile(pathToLicenseFile);

// Parse the template into an object that will be used later on.
// The API also offers an InputStream form; a path can be passed directly as well
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed the PDF file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);

// Save the result to XML, or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);
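
Once the result has been saved to XML, it can be read back with the JDK's own DOM parser. The sketch below is only an illustration and does not use any pdf2Data API: the class name XmlLeafDump, the method leafValues and the sample XML are all made up here, and the structure of the file that saveToXML really writes is not shown in this post, so the code makes no assumption about element names and simply lists every element that has no child elements together with its text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of pdf2Data: dumps the leaf elements of any XML document.
class XmlLeafDump {

    /** Returns "elementName=text" for every element that has no child elements. */
    static List<String> leafValues(InputStream xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(xml);
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers get a single unchecked exception
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk; an element is a leaf when none of its children are elements.
    private static void collect(Element element, List<String> out) {
        NodeList children = element.getChildNodes();
        boolean hasChildElement = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElement = true;
                collect((Element) child, out);
            }
        }
        if (!hasChildElement) {
            out.add(element.getTagName() + "=" + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        // Made-up sample; the real file written by saveToXML may look different.
        String sample = "<result><invoiceNumber>INV-001</invoiceNumber><total> 42.50 </total></result>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        leafValues(in).forEach(System.out::println);
    }
}
```

For the sample above this prints invoiceNumber=INV-001 and total=42.50 on separate lines. To inspect a real result, pass an InputStream opened on pathToOutXmlFile instead of the in-memory sample.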
