Python string processing, Unicode, and Beautiful Soup

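Problem description:

I have been looking for a solution to a problem I have, but I have not found one that I understand. Basically, if I use the string functions (translate, strip, etc.), I get Unicode errors ('ascii' codec can't encode character 'x' in position y: ordinal not in range(128)). When I use Beautiful Soup to process the text instead, I do not get the Unicode errors, but the difficulty (or, I should say, my unfamiliarity) is rather high. Here is an excerpt of my code: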

...

import urllib2,sys 
import re 
import os 
import urllib 
import string 
import time 
from BeautifulSoup import BeautifulSoup,NavigableString, SoupStrainer 
from string import maketrans 
import codecs 

trantab=string.maketrans(",",";") 
... 

       html5 = urllib2.urlopen(address5).read() 
       time.sleep(1.5) 

       soup5 = BeautifulSoup(html5) 

       for company in iter(soup5.findAll(height="20px")): 
        stream = "" 
        count_detail = 1 
        for tag in iter(company.findAll('td')): 
         if count_detail > 1: 
          stream = stream + string.translate(str(tag.text),trantab) 
          if count_detail < 4 : 
           stream=stream+"," 
         count_detail = count_detail + 1 
        print str(storenum)+","+branch_name_address+","+ stream

....

The script runs for a while and then bombs at stream = stream + string.translate(str(tag.text),trantab)

Basically, I am just trying to replace commas with semicolons in the fields I am processing.

I also tried to remove embedded blanks and whitespace with string.strip, but I get a similar error.

How can I do the same thing (just replace commas with semicolons and remove whitespace) using Beautiful Soup?

Or, if I just stick with the string functions, is there code that resolves these pesky Unicode errors?

Answer

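You are mixing str objects with unicode objects, which makes the Python interpreter coerce one type into the other. Coercion between str and unicode requires an encoding, and by default that encoding is ascii. When the text contains anything outside ASCII, that assumption fails and you get this kind of error.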
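The failure can be reproduced without Beautiful Soup. The following is a minimal sketch, assuming a made-up sample value that contains one non-ASCII character; the explicit encode call stands in for what str(tag.text) does implicitly in Python 2.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; any character outside ASCII triggers the problem.
text = u"Caf\u00e9, Main Street"

try:
    # In Python 2, str(text) performs this encode implicitly with the
    # default ascii codec.
    text.encode("ascii")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
    print(exc)

print(error_name)
```

This also suggests why the script only fails after running for a while: it works until the first field containing a non-ASCII character comes along.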

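The general solution is not to mix str with unicode: use unicode wherever possible, and make every conversion explicit with byte_string.decode('utf8', 'strict') and unicode_string.encode('utf8', 'strict') (UTF-8 is just an example encoding).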
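As a rough sketch of that rule, the snippet below decodes once on input, works on unicode text, and encodes once on output. The raw bytes are a hypothetical stand-in for a network read, and UTF-8 is an assumption about the page's encoding.

```python
# -*- coding: utf-8 -*-
# Hypothetical raw bytes, standing in for what a network read returns.
raw = b"Caf\xc3\xa9, Main Street"

# Decode once at the input boundary (UTF-8 is assumed here).
text = raw.decode("utf8", "strict")

# Do all processing on unicode text.
text = text.replace(u",", u";")

# Encode once at the output boundary.
out = text.encode("utf8", "strict")
```

In the question's script, Beautiful Soup appears to have done the decoding step already, so tag.text is unicode and only the output side needs an explicit encode.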

In this case, replace

stream = stream + string.translate(str(tag.text),trantab)

with

stream = stream + tag.text.replace(u',', u';')
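The same approach covers the whitespace part of the question, since unicode objects have their own strip method. A small sketch follows, where clean_field and the sample cells are hypothetical names and data rather than anything from the original script:

```python
# -*- coding: utf-8 -*-
def clean_field(text):
    """Replace commas with semicolons and trim surrounding whitespace."""
    return text.replace(u",", u";").strip()


# Hypothetical cell texts, standing in for tag.text values from the page.
cells = [u"  Caf\u00e9, Main Street ", u"\n12, High Road\t", u"Z\u00fcrich"]

# Join the cleaned fields with commas, staying in unicode throughout.
stream = u",".join(clean_field(cell) for cell in cells)
```

When printing the result in Python 2, an explicit stream.encode('utf8') may still be needed if the output is redirected to a file or pipe.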
