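Unicode text from SQLite database appears to be corrupted
Question:
I'm using the bindings at http://hackage.haskell.org/package/sqlite-0.5.2.2 to access an SQLite database. The *.db file contains UTF-8 encoded text; I can confirm this both in a text editor and with the sqlite CLI tool.
When I connect to the database and retrieve the data, the text content comes out corrupted. A simple test looks like this: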
```haskell
import qualified Database.SQLite as SQL
import System.IO

-- Dump the body of every Skype message in the database to test.txt
buildSkypeMessages dbh =
    SQL.execStatement dbh query >>= either fail (writeIt . map f . concat)
  where
    query = "select chatname,author,timestamp,body_xml from messages order by chatname, timestamp"

    -- each entry is [body, chatname, author]; write only the body
    writeIt content = withFile "test.txt" WriteMode
        (\handle -> mapM_ (\(c:_a:_t:[]) -> hPutStrLn handle c) content)

    -- split off chatname and author, returning the remaining columns
    f' (("chatname", SQL.Text chatname) :
        ("author", SQL.Text author) :
        ("timestamp", SQL.Int _timestamp) :
        r) = ([chatname, author], r)

    -- prepend the message body (empty if NULL) to chatname and author
    f xs = let (partEntry, item:_) = f' xs
           in case item of
                ("body_xml", SQL.Text v) -> v : partEntry
                ("body_xml", SQL.Null)   -> "" : partEntry

    -- render any column value as a String (not used in this test)
    escape (_, SQL.Text v) = v
    escape (_, SQL.Null)   = ""
    escape (_, SQL.Int v)  = show v
```
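What could be wrong here? Am I missing something about SQLite, or about Haskell I/O and encodings?
Answer:
Actually, the problem has nothing to do with the SQLite bindings; it lies in how Haskell handles strings on output. The binding appears to hand back each UTF-8 byte as a separate Char, and a handle in text mode under a UTF-8 locale encodes each of those Chars again, which produces double-encoded output. What solved the problem was calling hSetBinaryMode on the handle before writing the data to it, so that every Char is written out as a single byte: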
```haskell
writeIt content = withFile "test.txt" WriteMode $ \handle -> do
    -- binary mode: each Char is written as its low 8 bits, with no re-encoding
    hSetBinaryMode handle True
    mapM_ (\(c:_a:_t:[]) -> hPutStrLn handle c) content
```
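If the text is needed as real Unicode characters inside the program (for example to count characters or search for non-ASCII text), an alternative to binary mode is to reinterpret the byte-per-Char String as UTF-8 right after it is read. This is only a sketch: it assumes, as the double-encoded output suggests, that the binding returns one Char per UTF-8 byte, and `fromUtf8Bytes` is a hypothetical helper name built on `GHC.Foreign` from base. Once decoded, the text can be written through an ordinary text-mode handle with a UTF-8 encoding.

```haskell
import qualified GHC.Foreign as GHC
import System.IO

-- Hypothetical helper (assumes one Char per raw byte in the input):
-- char8 turns each Char into one byte (its low 8 bits), then the
-- resulting byte sequence is decoded as UTF-8.
-- Decoding fails with an exception if the bytes are not valid UTF-8.
fromUtf8Bytes :: String -> IO String
fromUtf8Bytes s = GHC.withCStringLen char8 s (GHC.peekCStringLen utf8)

main :: IO ()
main = do
  -- "\195\169" is the UTF-8 encoding of U+00E9 (é), one Char per byte
  decoded <- fromUtf8Bytes "\195\169"
  print decoded                 -- prints "\233"
  withFile "test.txt" WriteMode $ \handle -> do
    hSetEncoding handle utf8    -- proper Unicode Chars get encoded exactly once
    hPutStrLn handle decoded
```

The trade-off: binary mode is the smallest change when the data only passes through, while decoding up front gives correct `length`, comparisons and other character-level operations on the text.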
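One place this could go wrong is when writing the file: GHC picks the default encoding for that operation from the current locale. You can test whether this is the problem by calling [hSetEncoding](http://hackage.haskell.org/packages/archive/base/latest/doc/html/System-IO.html#v:hSetEncoding). – 2012-07-08 06:11:32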
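One way to run that test is to write the same string through two handles whose encodings are set explicitly with `hSetEncoding`, then read the raw bytes back. The sketch assumes the data arrives as one Char per UTF-8 byte; the file names are arbitrary. With `utf8` (what an en_US.UTF-8 locale selects) each byte is encoded a second time, while `char8` writes each Char as its low 8 bits, much as binary mode does.

```haskell
import Data.Char (ord)
import System.IO

-- Bytes C3 A9 (the UTF-8 encoding of é), one Char per byte,
-- assumed to be how the SQLite binding returns the text
rawBytes :: String
rawBytes = "\195\169"

-- Write rawBytes through a handle that uses the given encoding
writeWith :: TextEncoding -> FilePath -> IO ()
writeWith enc path = withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStr h rawBytes

-- Read a file back as a list of byte values
readBytes :: FilePath -> IO [Int]
readBytes path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` return (map ord s)  -- force the lazy read before the handle closes

main :: IO ()
main = do
  writeWith utf8 "utf8.txt"        -- what a UTF-8 locale picks by default
  writeWith char8 "char8.txt"      -- one byte per Char, like binary mode
  readBytes "utf8.txt" >>= print   -- [195,131,194,169]: double-encoded
  readBytes "char8.txt" >>= print  -- [195,169]: the original bytes
```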
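@DanielWagner My current locale is en_US.UTF-8, so that shouldn't be the issue. The data in the text file looks as if it had been UTF-8 encoded twice. – jdevelop 2012-07-08 06:34:33
@DanielWagner Setting binary mode helped. Thanks! – jdevelop 2012-07-08 07:04:34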