How can I avoid reading the DTD when parsing an XML file in Java?

Problem description:

I need to parse an XML document that begins with the following lines:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd"> 

<pdf2xml producer="poppler" version="0.22.0"> 
<page number="1" position="absolute" top="0" left="0" height="1263" width="892"> 
    <fontspec id="0" size="12" family="Times" color="#000000"/> 

I read it with the following code:

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
final DocumentBuilder builder = builderFactory.newDocumentBuilder();

Document document = builder.parse(new FileInputStream(aXmlFileName));

The last call fails with the following exception:

Exception in thread "main" java.io.FileNotFoundException: D:\dev\ro-2014-04-13-01\pdf2xml.dtd 
    at java.io.FileInputStream.open(Native Method) 
    at java.io.FileInputStream.<init>(FileInputStream.java:146) 
    at java.io.FileInputStream.<init>(FileInputStream.java:101) 
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90) 
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188) 
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:613) 

The file pdf2xml.dtd does not actually exist in that directory.

How can I change the code so that the document can be parsed even when pdf2xml.dtd is missing?

You need to implement an EntityResolver. See here: http://*.com/questions/155101/make-documentbuilder-parse-ignore-dtd-references – Wintermute

You need to use an EntityResolver:

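// Needs java.io.ByteArrayInputStream, java.nio.charset.StandardCharsets and org.xml.sax.*
myBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Substitute an empty stand-in for the missing DTD.
        if (systemId != null && systemId.contains("pdf2xml.dtd")) {
            return new InputSource(new ByteArrayInputStream(
                    "<?xml version='1.0' encoding='UTF-8'?>".getBytes(StandardCharsets.UTF_8)));
        }
        // Returning null lets the parser resolve every other entity as usual.
        return null;
    }
});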

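When the parser reaches the reference to "pdf2xml.dtd", it calls the entity resolver, which returns an essentially empty document (just an XML declaration) in place of the DTD, so the parser never tries to open the missing file.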
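
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name DtdFreeParseDemo and the helper parseIgnoringDtd are hypothetical names made up for illustration, and the sample document is a shortened, closed version of the one in the question. It assumes the DTD only needs to be skipped, not used (for example, the document does not rely on entities or default attribute values declared in it).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DtdFreeParseDemo {

    // Hypothetical helper: parses XML, answering any request for
    // pdf2xml.dtd with an empty stream instead of touching the file system.
    static Document parseIgnoringDtd(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("pdf2xml.dtd")) {
                // An empty character stream acts as an empty DTD.
                return new InputSource(new StringReader(""));
            }
            // null tells the parser to fall back to its default resolution.
            return null;
        });
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; pdf2xml.dtd is deliberately not provided.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n"
                + "<pdf2xml producer=\"poppler\" version=\"0.22.0\">"
                + "<page number=\"1\"/></pdf2xml>";
        Document doc = parseIgnoringDtd(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getTagName()); // prints "pdf2xml"
    }
}
```

The resolver here matches on the end of the system identifier because the parser typically expands the relative name to an absolute location, as the path in the stack trace above suggests. Returning null for everything else keeps the default behavior for any other external entity.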