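Reading a large CSV file in PHP
You might want to look at streaming the CSV file. Send the file location, the start position and the number of bytes to read as GET parameters to ProgressiveReader.php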
class NoFileFoundException extends Exception {
    public function __toString() {
        return '<h1><b>ERROR:</b> could not find ('
            .$this->getMessage().
            ') please check your settings.</h1>';
    }
}

class NoFileOpenException extends Exception {
    public function __toString() {
        return '<h1><b>ERROR:</b> could not open ('
            .$this->getMessage().
            ') please check your settings.</h1>';
    }
}
interface Reader {
    public function setFileName($fName);
    public function open();
    public function setBufferOffset($offset);
    public function bufferSize();
    public function isOffset();
    public function setPacketSize($size);
    public function read();
    public function isEOF();
    public function close();
    public function readAll();
}
class ProgressiveReader implements Reader {
    private $fName;
    private $fileHandler;
    private $offset = 0;
    private $packetSize = 0;

    public function setFileName($fName) {
        $this->fName = $fName;
        if(!file_exists($this->fName)) {
            throw new NoFileFoundException($this->fName);
        }
    }

    public function open() {
        // fopen() signals failure by returning false (plus a warning); it does not throw.
        $this->fileHandler = @fopen($this->fName, 'rb');
        if($this->fileHandler === false) {
            throw new NoFileOpenException($this->fName);
        }
        // Start reading at the configured byte offset.
        fseek($this->fileHandler, $this->offset);
    }

    public function setBufferOffset($offset) {
        $this->offset = $offset;
    }

    public function bufferSize() {
        return filesize($this->fName) - (($this->offset > 0) ? ($this->offset + 1) : 0);
    }

    public function isOffset() {
        return $this->offset !== 0;
    }

    public function setPacketSize($size) {
        $this->packetSize = $size;
    }

    // Read the next packet of at most $packetSize bytes.
    public function read() {
        return fread($this->fileHandler, $this->packetSize);
    }

    public function isEOF() {
        return feof($this->fileHandler);
    }

    public function close() {
        if($this->fileHandler) {
            fclose($this->fileHandler);
        }
    }

    // Read everything from the current position to the end of the file.
    public function readAll() {
        return stream_get_contents($this->fileHandler);
    }
}
Here are the unit tests:
require_once 'PHPUnit/Framework.php';
require_once dirname(__FILE__).'/../ProgressiveReader.php';

class ProgressiveReaderTest extends PHPUnit_Framework_TestCase {
    protected $reader;
    private $fp;
    private $fname = "Test.txt";

    protected function setUp() {
        $this->createTestFile();
        $this->reader = new ProgressiveReader();
    }

    protected function tearDown() {
        $this->reader->close();
        // Remove the fixture so test runs do not leave files behind.
        $this->deleteTestFile();
    }

    public function test_isValidFile() {
        $this->reader->setFileName($this->fname);
    }

    public function test_isNotValidFile() {
        try {
            $this->reader->setFileName("nothing.tada");
        }
        catch (Exception $e) {
            return;
        }
        $this->fail();
    }

    public function test_isFileOpen() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
    }

    public function test_couldNotOpenFile() {
        $this->reader->setFileName($this->fname);
        try {
            $this->deleteTestFile();
            $this->reader->open();
        }
        catch (Exception $e) {
            return;
        }
        $this->fail();
    }

    // assertEquals() takes the expected value first, then the actual one.
    public function test_bufferSizeZeroOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
        $this->assertEquals(12, $this->reader->bufferSize());
    }

    public function test_bufferSizeTwoOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->open();
        $this->assertEquals(9, $this->reader->bufferSize());
    }

    public function test_readBuffer() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertEquals("T", $this->reader->read());
    }

    public function test_readBufferWithOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertEquals("S", $this->reader->read());
    }

    public function test_readSuccesive() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(6);
        $this->reader->open();
        $this->assertEquals("TEST1\n", $this->reader->read());
        $this->assertEquals("TEST2\n", $this->reader->read());
    }

    public function test_readEntireBuffer() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
        $this->assertEquals("TEST1\nTEST2\n", $this->reader->readAll());
    }

    public function test_isNotEOF() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertFalse($this->reader->isEOF());
    }

    public function test_isEOF() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(15);
        $this->reader->open();
        $this->reader->read();
        $this->assertTrue($this->reader->isEOF());
    }

    public function test_isOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->assertTrue($this->reader->isOffset());
    }

    public function test_isNotOffset() {
        $this->reader->setFileName($this->fname);
        $this->assertFalse($this->reader->isOffset());
    }

    private function createTestFile() {
        $this->fp = fopen($this->fname, "wb");
        fwrite($this->fp, "TEST1\n");
        fwrite($this->fp, "TEST2\n");
        // fflush() flushes the file handle; flush() only flushes PHP's output buffer.
        fflush($this->fp);
        fclose($this->fp);
    }

    private function deleteTestFile() {
        if(file_exists($this->fname)) {
            unlink($this->fname);
        }
    }
}
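For illustration, the "send the file location, start position and number of bytes" idea boils down to the sketch below. It does not use the class above so that it runs on its own; the helper name readChunk(), the GET parameter names (file, offset, bytes) and the data/ directory are assumptions made for this example, not part of the original answer.

```php
<?php
// Hypothetical helper: return up to $length bytes of $path, starting at byte $offset.
function readChunk(string $path, int $offset, int $length): string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("could not open ($path)");
    }
    try {
        fseek($handle, $offset);
        $data = fread($handle, $length);
        return $data === false ? '' : $data;
    } finally {
        fclose($handle);
    }
}

// Assumed request shape: chunk.php?file=data.csv&offset=0&bytes=8192
if (isset($_GET['file'], $_GET['offset'], $_GET['bytes'])) {
    // basename() keeps the request from escaping the (assumed) data directory.
    $path   = __DIR__ . '/data/' . basename($_GET['file']);
    $offset = max(0, (int) $_GET['offset']);
    $bytes  = max(1, (int) $_GET['bytes']); // fread() rejects a length of 0
    echo readChunk($path, $offset, $bytes);
}
```

The client would then request successive chunks by advancing offset by the number of bytes it received, so no single request has to process the whole file.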
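Can you connect directly to the database server?
If so, I would consider using a third-party program such as SQLyog to import your CSV.
You could also upload the file and import the data directly from the mysql shell: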
LOAD DATA INFILE '/path/to/your_file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',';
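Your script is probably taking too long and is being terminated.
You should look for the max_execution_time directive in php.ini and set it to a value that suits you.
The default max_execution_time is 30 seconds, so your script is probably being killed.
If you have other scripts that should stay time-limited, you can raise the limit for this script alone by calling set_time_limit();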
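A minimal sketch of the per-script variant, assuming the host has not disabled set_time_limit(); the placeholder comment stands in for whatever import code you already have:

```php
<?php
// 0 removes the time limit for the current script only; php.ini is left untouched.
set_time_limit(0);

// ... the long-running CSV import would go here ...
```

Passing a positive number of seconds instead of 0 restarts the timer with that limit, which is safer if you only need a bit more headroom.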
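Have you tried importing your CSV into MySQL from bash/shell (if you are on Linux)? You could also use Ruby or Perl or whatnot, since I think you should use one of those rather than PHP (or any web application) to import the file.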
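This reads the entire CSV file into an array
All 50000+ rows?
Read only the chunk of the file you need from PHP: go through it line by line (fgets()) and add each (needed) line to an array; fgetcsv() gives you the line already parsed into an array.
Edit: I don't know the exact details, but I feel that reading everything into a data structure costs more than reading only what we need.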
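A rough sketch of the line-by-line approach with fgetcsv(). The function names and the "needed row" rule (non-empty first column) are assumptions for illustration; substitute your own condition:

```php
<?php
// Hypothetical helper: yield one parsed CSV row at a time instead of loading the whole file.
function readCsvRows(string $path): Generator
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("could not open ($path)");
    }
    try {
        // Separator, enclosure and escape are passed explicitly; adjust them to your file.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

// Keep only the rows that are needed; here (as an assumption) those with a non-empty first column.
function collectNeededRows(string $path): array
{
    $needed = [];
    foreach (readCsvRows($path) as $row) {
        if ($row[0] !== '') {
            $needed[] = $row;
        }
    }
    return $needed;
}
```

Because the generator holds only one row at a time, memory use depends on how many rows you keep rather than on the size of the file.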
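I would suggest using MySQL's fast LOAD DATA INFILE command:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
If that is not an option, you could split the CSV file into smaller pieces (assuming you have shell access).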
Bah! Ignore this answer. It's a duplicate. See fgetcsv() as mentioned by Scorchio above.
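Have you tried raising the maximum execution time with something like 'ini_set("max_execution_time", 0)'? – robjmills 2010-05-26 15:46:06
A few questions: - How are you importing the file into the database? - Are you uploading the file before importing it, or reading it in real time? – allnightgrocery 2010-05-26 15:48:49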