Django encoding error when importing from CSV

Question:

import csv

with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )

Most of my data is created in the database, except for one particular row. When my script reaches this row, I get the error:

ProgrammingError: You must not use 8-bit bytestrings unless you use a 
text_factory that can interpret 8-bit bytestrings (like text_factory = str). 
It is highly recommended that you instead just switch your application to Unicode strings.

The particular row in the CSV that causes this error is:

>>> row 
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}

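I have looked at other similar threads with the same or a similar problem, but most are not specific to using SQLite with Django. Any suggestions?

If it matters, I am running the script by entering the Django shell with python manage.py shell and copy-pasting it in, rather than calling the script from the command line.

This is the stack trace I get: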

Traceback (most recent call last): 
    File "<console>", line 4, in <module> 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next 
    row = self.reader.next() 
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode 
    (result, consumed) = self._buffer_decode(data, self.errors, final) 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte

Edit: I decided to just import this entry into my database manually rather than trying to read it from my CSV, based on Alastair McCormack's feedback:

Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
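To make the mojibake point concrete, here is a minimal sketch that decodes the bytes shown in the question's row. It is written for Python 3, which is an assumption (the question uses Python 2), and the variable names are illustrative only.

```python
# Raw bytes of the player name, copied from the row shown in the question.
raw = b'FR\xed\x8aD\xed\x8aRIC.ST-DENIS'

# UTF-8 rejects these bytes: 0xed starts a three-byte sequence, but the
# byte after 0x8a is a plain 'D', which is not a continuation byte.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# windows-1252 maps every byte used here to some character, so decoding
# succeeds, but the result is still not the intended name.
as_cp1252 = raw.decode('windows-1252')
```

Here utf8_ok ends up False, and as_cp1252 holds the garbled FRíŠDíŠRIC.ST-DENIS spelling mentioned above, which suggests the damage happened before the file was saved and cannot be undone just by picking a different codec.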

Python 2.x or 3.x? –

Python 2.x, but this is a new project, so if switching to 3.x would make my life easier, I will do that. – Konrad

Answer

Encode the player's name as UTF-8 by calling .encode('utf-8') on it when importing the CSV:

with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )

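When I add the encode, I get the error: 'UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 2: ordinal not in range(128)'. – Konrad

That is because the file is already 8-bit encoded. '.encode()' makes no sense here –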
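The comment above can be illustrated with a small sketch. In Python 2, calling .encode() on a byte string first decodes it implicitly with the default ASCII codec, and that hidden step is what fails. The snippet below spells that step out so that it also runs on Python 3; the explicit decode('ascii') call is a stand-in for Python 2's implicit behaviour, not code from the question.

```python
# Bytes of the player name as shown in the question's row.
raw = b'FR\xed\x8aD\xed\x8aRIC.ST-DENIS'

try:
    # Python 2 effectively does this when .encode('utf-8') is called on a
    # byte string: an implicit ASCII decode, followed by the encode.
    raw.decode('ascii').encode('utf-8')
    message = None
except UnicodeDecodeError as exc:
    # The decode step fails on the first non-ASCII byte (index 2 here).
    message = str(exc)

print(message)
```

The message reports the 'ascii' codec and "ordinal not in range(128)", the same kind of error Konrad saw, even though the code only asked for an encode.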

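Answer

I suspect you are using Python 2, where open() returns str objects, which are just byte strings.

The error is telling you that you need to decode your text into Unicode strings before using it.

The simplest way is to decode each cell: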

with open('data.csv', 'r') as csvfile:  # 'U' means universal newline mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )

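This will work, but it is ugly to add decodes everywhere, and it will not work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Py2.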
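As a rough illustration of the Python 3 behaviour just described, the sketch below writes a small UTF-8 sample file and reads it back with the standard csv module. The sample file and the rows list are assumptions made to keep the example self-contained; in the real script the loop body would call Player.objects.get_or_create() instead.

```python
import csv
import os
import tempfile

# Build a small UTF-8 sample file so the sketch is self-contained.
# Column names mirror the question; the name uses the intended spelling.
sample = 'Player,Team,Position\nFR\u00c9D\u00c9RIC.ST-DENIS,BOS,G\n'
with tempfile.NamedTemporaryFile('w', encoding='utf-8', newline='',
                                 suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

rows = []
# Python 3: open() decodes while reading, so csv yields text strings
# and no per-cell .decode() calls are needed.
with open(path, 'r', encoding='utf-8', newline='') as csvfile:
    for row in csv.DictReader(csvfile):
        # Stand-in for Player.objects.get_or_create(...) in the real script.
        rows.append((row['Player'], row['Team'], row['Position']))

os.remove(path)  # clean up the temporary sample file
```

Passing newline='' follows the csv module's recommendation for files handed to a reader or writer. Note that this only helps when the file really is valid UTF-8; a mis-encoded row would still raise UnicodeDecodeError while reading.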

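To get the same functionality in Python 2, use the io module. It gives you an open() function that has an encoding option. Annoyingly, the Python 2.x csv module is broken with Unicode, so you need to install a backported version: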

pip install backports.csv

To tidy up your code and make it future-proof, do this:

import io
from backports import csv

with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )

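When I add the decode, I get this error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 2: invalid continuation byte. I am going to try your backports idea. – Konrad

Using backports did not work either. It gave me the error 'UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte' on the same troublesome record. I also had to use 'from io import open' – Konrad

Ah, I assumed the CSV was UTF-8 encoded. What encoding is the CSV? –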
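One way to start answering that question is to scan the raw bytes and see which lines are not valid UTF-8. The helper below is a hypothetical sketch written for Python 3, not part of the thread; find_undecodable_lines and the sample data are made-up names and values, and splitting on line breaks assumes no CSV field contains an embedded newline.

```python
def find_undecodable_lines(data, encoding='utf-8'):
    """Return (line_number, raw_line) pairs that fail to decode."""
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Made-up sample: a header, one clean row, and the row from the question.
sample = (b'Player,Team,Position\n'
          b'SOME.PLAYER,BOS,G\n'
          b'FR\xed\x8aD\xed\x8aRIC.ST-DENIS,BOS,G\n')

bad_lines = find_undecodable_lines(sample)
for number, line in bad_lines:
    # repr-style output keeps the offending bytes visible as \x escapes.
    print(number, line)
```

For a real file, the bytes could be read with open('data.csv', 'rb').read() and passed to the helper. If only one or two lines are reported, as in this thread, the file is probably UTF-8 with a few damaged records, and fixing those records by hand may be simpler than switching codecs.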
