Preserving HTML file structure after modifying it with BeautifulSoup

Problem description:

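I'm using Python and BeautifulSoup to find and replace some text on an HTML page. My problem is that I need to keep the file structure (indentation, whitespace, line breaks, etc.) unchanged and modify only the elements I need to. How can I do that? Both str(soup) and soup.prettify() alter the source file in many ways.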

P.S. Sample code:

 
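    from bs4 import BeautifulSoup  # with BeautifulSoup 3: from BeautifulSoup import BeautifulSoup

    # `text` holds the HTML source; `process` is the caller's text transformation.
    soup = BeautifulSoup(text)
    for element in soup.findAll(text=True):
        # Leave text alone inside tags whose content must stay verbatim.
        if element.parent.name not in ['style', 'script', 'head', 'title', 'pre']:
            element.replaceWith(process(element))
    result = str(soup)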

Answer

I would say there is no easy way (or no way at all). From the BeautifulStoneSoup documentation:

__str__(self, encoding='utf-8', prettyPrint=False, indentLevel=0) 
    Returns a string or Unicode representation of this tag and 
    its contents. To get Unicode, pass None for encoding. 

    NOTE: since Python's HTML parser consumes whitespace, this 
    method is not certain to reproduce the whitespace present in 
    the original string.

As the note says, the original whitespace is lost in the internal representation.

Maybe this is possible with other libraries? – 2012-02-03 19:01:29
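
One possible direction, offered only as a sketch and not as something the answer above proposed: skip re-serialization entirely. The standard library's `html.parser.HTMLParser` can report where each text segment starts in the raw source via `getpos()`, so the replacements can be spliced into the original string and every other byte stays as it was. The names `TextSpanFinder`, `replace_text_in_place` and `SKIP_TAGS` are hypothetical, and `process` stands for the function from the question. Unlike the question's code, which checks only the direct parent, this version skips text at any depth inside the listed tags.

```python
from html.parser import HTMLParser

# Text inside these tags is left untouched (same list as in the question).
SKIP_TAGS = {"style", "script", "head", "title", "pre"}


class TextSpanFinder(HTMLParser):
    """Records (start, end) offsets of text segments in the raw source."""

    def __init__(self, source):
        # convert_charrefs=False keeps handle_data() input identical to the
        # raw source, so reported positions and lengths line up with it.
        super().__init__(convert_charrefs=False)
        self.source = source
        # Absolute offset at which each line begins (getpos() is line-based).
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.skip_depth = 0
        self.spans = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        lineno, col = self.getpos()
        start = self.line_starts[lineno - 1] + col
        end = start + len(data)
        # Defensive check: only record a span that matches the source exactly.
        if self.source[start:end] == data:
            self.spans.append((start, end))


def replace_text_in_place(source, process):
    """Apply process() to each text segment; copy everything else verbatim."""
    finder = TextSpanFinder(source)
    finder.feed(source)
    finder.close()
    pieces, last = [], 0
    for start, end in finder.spans:
        pieces.append(source[last:start])
        pieces.append(process(source[start:end]))
        last = end
    pieces.append(source[last:])
    return "".join(pieces)
```

Calling `replace_text_in_place(text, process)` returns the modified page as a string. Tags, attributes, comments and character references such as `&amp;` are never passed to `process`, so a text node containing a reference reaches it in several fragments. Malformed markup may cause some segments to be skipped instead of replaced.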
