Django encoding error when reading from a CSV
When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Most of my data gets created in the database, except for one particular row. When my script reaches this row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.
The specific row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
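I've looked at other similar threads with the same or similar issues, but most aren't specific to using Sqlite with Django. Any advice?
If it matters, I'm running the script by entering the Django shell with python manage.py shell and copy-pasting it in, rather than calling the script from the command line.
This is the stack trace I get: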
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
Edit: I decided to just import this entry into my database manually, rather than trying to read it from my CSV, based on Alastair McCormack's feedback:
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
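The mojibake diagnosis above can be checked directly against the bytes shown in the question's row output. The following is a minimal sketch, written for Python 3 (byte literals behave the same way in Python 2.7); the variable names are illustrative and not part of the original script.

```python
# Raw bytes of the player name, copied from the `row` output in the question.
raw_name = b'FR\xed\x8aD\xed\x8aRIC.ST-DENIS'

# Strict UTF-8 decoding fails because these bytes are not valid UTF-8.
try:
    raw_name.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# windows-1252 assigns a character to each byte used here, so decoding
# succeeds, but the result is still not the intended name.
as_cp1252 = raw_name.decode('windows-1252')

print(utf8_ok)
# U+00ED is "í" and U+0160 is "Š", i.e. the garbled form quoted above.
print(as_cp1252 == u'FR\xed\u0160D\xed\u0160RIC.ST-DENIS')
```

Since neither codec recovers the real name, the source data itself appears to be damaged, which is consistent with the decision to enter this one record by hand.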
Encode the player name as UTF-8 by calling .encode('utf-8') on it:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )
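When I add the encoding, I get the error: 'UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 2: ordinal not in range(128).' – Konrad
That's because the file is already 8-bit encoded. '.encode()' makes no sense here –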
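To illustrate why calling .encode() on a byte string produces an 'ascii' decode error: in Python 2, str.encode() first decodes the bytes implicitly with the default ASCII codec and only then encodes the result. The sketch below spells out those two steps explicitly so it also runs on Python 3. The byte string and the helper name py2_style_encode are hypothetical, chosen to put a 0xcc byte at position 2 as in the reported error.

```python
# Hypothetical byte string with a non-ASCII byte (0xcc) at position 2.
raw = b'FR\xccD'


def py2_style_encode(byte_string, encoding='utf-8'):
    """Mimic Python 2's str.encode(): implicit ASCII decode, then encode."""
    return byte_string.decode('ascii').encode(encoding)


try:
    py2_style_encode(raw)
    error = None
except UnicodeDecodeError as exc:
    # The failure comes from the hidden decode step, not from the encode.
    error = exc

print(error)
```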
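I suspect you're using Python 2, where open() returns str objects, which are just byte strings.
The error is telling you that you need to decode your text into Unicode strings before using it.
The simplest way is to decode each cell: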
import csv
with open('data.csv', 'r') as csvfile:  # 'U' means universal newline mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )
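This will work, but adding decodes everywhere is ugly, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Python 2.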
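As a rough illustration of the Python 3 behaviour described here, the sketch below writes a small UTF-8 sample file and reads it back in text mode. The sample contents and the temporary file are assumptions made so the example is self-contained; the Django get_or_create call is left out.

```python
import csv
import os
import tempfile

# Write a small UTF-8 sample file (hypothetical data) for the demonstration.
sample = u'Player,Team,Position\nFR\xc9D\xc9RIC.ST-DENIS,BOS,G\n'
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as handle:
    handle.write(sample.encode('utf-8'))

# Python 3: text mode with an explicit encoding yields str values directly,
# so no per-cell .decode() calls are needed. newline='' is what the csv
# module documentation recommends for files passed to csv readers.
with open(path, 'r', encoding='utf-8', newline='') as csvfile:
    rows = list(csv.DictReader(csvfile))

os.remove(path)
print(rows[0]['Team'])
```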
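To get the same functionality in Python 2, use the io module. It gives you an open() function that has an encoding option. Annoyingly, the Python 2.x csv module is broken with Unicode, so you need to install a backported version: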
pip install backports.csv
To tidy up your code and future-proof it, do this:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
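Note that a file containing invalid bytes, like the row in the question, would still raise UnicodeDecodeError with strict decoding. One possible way to keep an import running, not suggested in the thread itself, is the errors='replace' option of io.open(), which substitutes U+FFFD for undecodable bytes so that damaged rows can be flagged for manual review. This is a sketch under that assumption; it uses the standard csv module as on Python 3 (on Python 2 the backports.csv import above would take its place), and the sample file is hypothetical.

```python
import csv
import io
import os
import tempfile

# Sample file whose name column holds the invalid bytes from the question.
bad = b'Player,Team,Position\nFR\xed\x8aD\xed\x8aRIC.ST-DENIS,BOS,G\n'
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as handle:
    handle.write(bad)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
with io.open(path, 'r', encoding='utf-8', errors='replace',
             newline='') as csvfile:
    rows = list(csv.DictReader(csvfile))
os.remove(path)

# Rows containing U+FFFD are damaged and need manual review before saving.
suspect = [row for row in rows if u'\ufffd' in row['Player']]
print(len(suspect))
```

The replacement characters lose information, so this only helps to locate bad rows; it does not repair them.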
Python 2.x or 3.x? –
Python 2.x, but this is a new project, so if switching to 3.x would make my life easier, I would do that. – Konrad