Python/Mako: How to get unicode strings/characters parsed properly?

Question:

I'm trying to get Mako to render some strings containing Unicode characters:

tempLook=TemplateLookup(..., default_filters=[], input_encoding='utf8',output_encoding='utf-8', encoding_errors='replace') 
... 
print sys.stdout.encoding 
uname=cherrypy.session['userName'] 
print uname 
kwargs['_toshow']=uname 
... 
return tempLook.get_template(page).render(**kwargs)

The relevant part of the template file:

...${_toshow}...

And the output is:

UTF-8 
Deşghfkskhü 
... 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)

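I don't think there is any problem with the string itself, since I can print it just fine.

Although I've played (a lot) with the input/output_encoding and default_filters parameters, it always complains about being unable to decode/encode with the ascii codec.

So I decided to try the examples found in the documentation, and the following works the "best":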

input_encoding='utf-8', output_encoding='utf-8' 
#(note : it still raised an error without output_encoding, despite tutorial not implying it)

with

${u"voix m’a réveillé."}

And the result is

voix mâ�a rÃ©veillÃ©

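I simply don't understand why this doesn't work. The "magic encoding comment" doesn't work either. All the files are encoded in UTF-8.

I've spent hours on this to no avail. Am I missing something?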

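~~Update:~~

I now have a simpler question:

~~Now that all the variables are unicode, how can I get Mako to render unicode strings without applying anything to them? Passing a blank filter / render_unicode() doesn't help.~~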

Answer

Yes, UTF-8 != Unicode.

UTF-8 is a specific string encoding, as are ASCII and ISO 8859-1. Try this:

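For any input string, do an inputstring.decode('utf-8') (or whatever encoding your input actually uses). For any output string, do an outputstring.encode('utf-8') (or whatever output encoding you want). For any internal use, work with unicode strings ('this is a normal string'.decode('utf-8') == u'this is a normal string').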
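A minimal sketch of the decode-on-input, encode-on-output advice above, assuming the input really is UTF-8. The helper names read_input and write_output are hypothetical (they are not part of Mako or CherryPy), and the byte literals are written with b'' and u'' prefixes so the snippet should also run on Python 3, where str is already the unicode type.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers illustrating "decode on input, encode on output".

def read_input(raw_bytes, encoding='utf-8'):
    """Turn incoming bytes into a unicode string at the input boundary."""
    return raw_bytes.decode(encoding)

def write_output(text, encoding='utf-8'):
    """Turn a unicode string back into bytes just before it leaves the program."""
    return text.encode(encoding)

raw = b'r\xc3\xa9veill\xc3\xa9'        # UTF-8 bytes for "réveillé"
text = read_input(raw)                 # unicode from here on
assert text == u'r\xe9veill\xe9'
assert len(raw) == 10 and len(text) == 8   # 10 bytes, but 8 characters

shouted = text.upper()                 # internal work happens on unicode
out = write_output(shouted)            # encode once, at the output boundary

# Decoding with the wrong codec does not raise; it silently garbles the text.
wrong = raw.decode('latin-1')          # u'r\xc3\xa9veill\xc3\xa9'
```

The last line is there because it reproduces the same kind of garbling as the "rÃ©veillÃ©" output in the question: UTF-8 bytes read back as if each byte were one ISO 8859-1 character.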

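'foo' is a (byte) string, u'foo' is a unicode string, which doesn't "have" an encoding (it can't be decoded). So any time Python wants to change the encoding of a normal string, it first tries to "decode" it and then to "encode" it. The default codec for that is "ascii", which fails more often than not :-)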
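The implicit step described above can be made visible by doing the ASCII decode by hand. This is only a sketch: the byte value is an illustrative stand-in, not the actual session data from the question, and the explicit .decode('ascii') call stands for what Python 2 does behind the scenes when a byte string is combined with a unicode string (on Python 3 that mix raises a TypeError instead).

```python
# Illustrative value: UTF-8 bytes for a name that starts with "De" + s-cedilla.
name_bytes = b'De\xc5\x9f'

# The equivalent of Python 2's implicit conversion: decode with the ascii codec.
try:
    name_bytes.decode('ascii')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    bad_pos = exc.start                           # index of the first non-ASCII byte
    bad_byte = name_bytes[exc.start:exc.end]      # the byte the ascii codec rejected

# The explicit fix: decode with the encoding the bytes really use,
# then combine unicode with unicode.
name = name_bytes.decode('utf-8')
greeting = u'Hello, ' + name
```

Once every value handed to the template is already unicode, there is nothing left for the ascii codec to trip over.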

Thank you very much for the clarification. – felace 2010-09-19 11:00:27
