Declaring the character encoding of Python source code

Running the following Python print statement:
print u'I "said" do not touch “this.""'
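The string contains a Chinese-style (curly) double quotation mark, so the Python interpreter reports an error. The error message is: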

[wangy@bogon 文档]$ python ex1.py
  File "ex1.py", line 7
SyntaxError: Non-ASCII character '\xe2' in file ex1.py on line 7, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
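
The '\xe2' in the message is the first byte of the curly quote once the file is saved as UTF-8. A minimal sketch that shows where the byte comes from, assuming the offending character is U+201C (the left curly double quote); the variable names are only for illustration:

```python
# Assumption: the character that triggered the error is U+201C.
quote = u"\u201c"

# UTF-8 stores this character as three bytes.
encoded = quote.encode("utf-8")

# bytearray gives integer bytes on both Python 2 and Python 3.
first_byte = bytearray(encoded)[0]

print(repr(encoded))
print(hex(first_byte))
```

The interpreter stops at the first non-ASCII byte it meets, which is why only that one byte appears in the error.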

See the link http://www.python.org/peps/pep-0263.html


The main points are as follows:


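In Python 2.1, Unicode literals could only be written using the Latin-1 based "unicode-escape" encoding. Latin-1 covers only Western European characters, so this was a real nuisance for programmers in Asia, who had to spell out their own characters as escape sequences.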
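
To give a feel for what the "unicode-escape" form looks like, here is a small sketch. It uses the standard "unicode_escape" codec on a sample string chosen for illustration; it is not taken from the PEP:

```python
# Sample text with curly quotes, written with escapes so the file stays ASCII.
text = u"\u201cthis.\u201d"

# The unicode_escape codec turns each non-Latin-1 character into a \uXXXX sequence.
escaped = text.encode("unicode_escape")
print(escaped)

# Decoding with the same codec restores the original text.
restored = escaped.decode("unicode_escape")
```

Writing every non-Latin-1 character this way is what the encoding declaration makes unnecessary.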


The solution is to declare the encoding of the source file explicitly, so that the interpreter knows how the source code is encoded.


How to define the encoding:
Python will default to ASCII as standard encoding if no other encoding hints are given.

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:


# coding=<encoding name>
or (using formats recognized by popular editors):


#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:


#!/usr/bin/python
# vim: set fileencoding=<encoding name> :




It is best to use the first or the second form.
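
All three forms work because the interpreter only looks for the text "coding:" or "coding=" inside a comment on the first or second line. The sketch below applies the regular expression given in PEP 263; the helper name declared_encoding is hypothetical, and the function is a simplification rather than the interpreter's actual logic:

```python
import re

# Regular expression from PEP 263 for the first or second line of a file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the encoding named in the first two lines, or None.

    Hypothetical helper for illustration; it only inspects the text.
    """
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


print(declared_encoding("# coding=utf-8\n"))
print(declared_encoding("#!/usr/bin/python\n# vim: set fileencoding=utf-8 :\n"))
print(declared_encoding("#!/usr/bin/python\n# latin-1\n"))
```

The vim form matches because "fileencoding=" happens to end in "coding=".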


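The PEP specifically mentions platforms such as Windows, which add a Unicode BOM mark to the beginning of Unicode files. To accommodate them, the UTF-8 signature '\xef\xbb\xbf' at the start of a file is also interpreted as a 'utf-8' encoding declaration, so such a file does not need a magic comment.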

If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.
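
The BOM rule can be tried out with the standard library. The sketch below assumes Python 3, where tokenize.detect_encoding applies the same checks as the interpreter; the helper name detect and the sample byte strings are made up for illustration:

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the encoding Python 3's tokenizer picks for these bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _ = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical file contents: BOM only, BOM plus utf-8, BOM plus latin-1.
bom_only = codecs.BOM_UTF8 + b"print('hi')\n"
bom_utf8 = codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\n"
bom_latin1 = codecs.BOM_UTF8 + b"# -*- coding: latin-1 -*-\n"

print(detect(bom_only))
print(detect(bom_utf8))

# A BOM combined with a non-utf-8 comment is rejected.
try:
    detect(bom_latin1)
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True
print(conflict_rejected)
```

Note that Python 3 defaults to UTF-8 when nothing is declared, while the Python 2 interpreter discussed here defaults to ASCII.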


Examples
These are some examples to clarify the different styles for defining the source code encoding at the top of a Python source file:

With interpreter binary and using Emacs style file encoding comment:

#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...


#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...

#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...
Without interpreter line, using plain text:
# This Python file uses the following encoding: utf-8
import os, sys
...
Text editors might have different ways of defining the file's encoding, e.g.:
#!/usr/local/bin/python
# coding: latin-1
import os, sys
...
Without encoding comment, Python's parser will assume ASCII text:
#!/usr/local/bin/python
import os, sys
...
Encoding comments which don't work:
Missing "coding:" prefix:
#!/usr/local/bin/python
# latin-1
import os, sys
...
Encoding comment not on line 1 or 2:
#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...
Unsupported encoding:
#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...
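
Whether a name such as utf-42 is supported can be checked before putting it in a file. A minimal sketch, assuming that a name is usable when codecs.lookup accepts it; the helper name is_supported is hypothetical:

```python
import codecs


def is_supported(name):
    """Return True if Python knows a codec with this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Encoding names that appear in the examples above.
for name in ("latin-1", "iso-8859-15", "ascii", "utf-8", "utf-42"):
    print(name, is_supported(name))
```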

Modify the source code and save it as UTF-8 (the editor used here was gedit on Linux):
# -*- coding: utf-8 -*-
print "hello world!"
print "hello Again"
print "I like trying this"
print "This is fun"
print 'Yay! Printing'
print "I'd much rather you 'not'."
print u'I "said" 这里有中文双引号 “this.""'

The script now prints normally.