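Converting a Chinese ASCII string to a Chinese-language string

Question:

I tried to use the sys module to set the default encoding for the string conversion, but it doesn't work.

The string is: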

`\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf`

In Chinese this means 益民核心增长混合. But how can I convert it to a Chinese string?

I tried this:

>>> string = '\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf' 
>>> print string.decode("gbk") 
益民核心增长混合 # As you can see here, got the right answer 
>>> new_str = string.decode("gbk") 
>>> new_str 
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408' # It returns the another encode type. 
>>> another = u"益民核心增长混合" 
>>> another 
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408' # same as new_str

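So I'm confused by this: why can I print string.decode("gbk") and get the right text, while new_str in my Python console just returns another encoding type?

My OS is Windows 10 and my Python version is 2.7. Thank you very much!

Answer

You are doing it correctly.

In this case, new_str is actually a unicode string, as the u prefix indicates.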

>>> new_str 
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408' # It returns the another encode type.

When you decode a GBK-encoded string, you get a unicode string. Each character of that string is a Unicode code point, for example:

>>> u'\u76ca' 
u'\u76ca' 
>>> print u'\u76ca' 
益 
>>> import unicodedata 
>>> unicodedata.name(u'\u76ca') 
'CJK UNIFIED IDEOGRAPH-76CA' 

>>> print new_str 
益民核心增长混合 
>>> print repr(new_str) 
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408'
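
To make the code-point idea concrete, here is a minimal sketch that decodes the GBK bytes from the question and lists the code point of each character. It is written so it should run on both Python 2.7 and Python 3; the names raw, text and code_points are mine, not from the question.

```python
from __future__ import print_function

# The GBK-encoded bytes from the question (a byte string, not text).
raw = b'\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf'

# Decoding turns the bytes into a unicode string.
text = raw.decode('gbk')

# Each character is a single code point; format it the way the repr shows it.
code_points = ['\\u%04x' % ord(ch) for ch in text]

print(len(raw), len(text))
print(code_points)
```

The 16 bytes decode to 8 characters because every character here takes two bytes in GBK.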

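This is how Python displays a unicode string in the interpreter: it uses repr to show it. When you print the string, however, Python converts it to your terminal's encoding (sys.stdout.encoding), which is why the string is displayed the way you expect.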
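
The two display paths can be approximated by hand. This is only a rough sketch of what happens, not the interpreter's actual implementation: the fallback to UTF-8 and the 'replace' error handler are my assumptions so that the snippet runs even when output is piped or the terminal cannot represent Chinese. It should work on both Python 2.7 and Python 3.

```python
import sys

text = u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408'

# sys.stdout.encoding can be None (for example when output is piped),
# so fall back to UTF-8. The fallback is an assumption of this sketch.
encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'

# Roughly what print does: encode the text into bytes for the terminal.
# 'replace' avoids an exception if the encoding cannot represent Chinese.
terminal_bytes = text.encode(encoding, 'replace')

# Roughly what the Python 2 prompt shows: an escaped, ASCII-only form.
escaped = text.encode('unicode_escape').decode('ascii')

print(escaped)
```

In both cases text itself is unchanged; only the way it is rendered differs.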

So it is not a different encoding of the string, it is just the way Python displays that string in the interpreter.
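
If it helps, the difference between "the text" and "an encoding of the text" can be checked directly. A small sketch, again intended to run on both Python 2.7 and Python 3, with variable names of my own choosing:

```python
gbk_bytes = b'\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf'

# One unicode string...
text = gbk_bytes.decode('gbk')

# ...can be written out as different byte sequences.
utf8_bytes = text.encode('utf-8')
print(gbk_bytes != utf8_bytes)

# Decoding either byte sequence with the matching codec gives the same text.
print(utf8_bytes.decode('utf-8') == text)
print(text.encode('gbk') == gbk_bytes)
```

Encodings only come into play when the text is turned into bytes or back; the escaped form shown at the prompt is not one of them.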
