Speeding up writing to a file

Problem description:

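I have a program that writes its output when it finishes. One particular file takes a very long time to write, and I'm wondering whether I can do anything to speed it up.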

This file ends up being 25 MB or more. It has about 17,000 lines, and each line has about 500 fields.

This is how it works:

procedure CWaitList.WriteData(AFile : string; AReplicat : integer; AllFields : Boolean); 
var 
    fout : TextFile; 
    idx, ndx : integer; 
    MyPat : CPatientItem; 
begin 
    ndx := FList.Count - 1; 
    AssignFile(fout, AFile); 
    Append(fout); 
    for idx := 0 to ndx do 
    begin 
     MyPat := CPatientItem(FList.Objects[idx]); 
     if not Assigned(MyPat) then Continue; 
     MyPat.WriteItem(fout, AReplicat, AllFields); 
    end; 
    CloseFile(fout); 
end; 

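WriteItem is the procedure that gets all the values from MyPat and writes them to the file. It also calls 3 other functions that write values to the file as well.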

So overall, the WriteData loop runs about 17,000 times, and each line ends up with about 500 fields.

I'm just wondering whether there is anything I can do to improve its performance, given how much data it writes.

Thanks

Would you consider using streams instead of Pascal file I/O? – 2011-05-24 16:27:04

Or a TStringList with SaveToFile()? But first you should test how fast the loop over the data runs without writing the file at all. – 2011-05-24 16:31:43
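A minimal sketch of the TStringList approach: every line is built in memory and the whole list is written with a single SaveToFile call. BuildLine is a hypothetical stand-in for CPatientItem.WriteItem, and the row and field counts just mirror the question. Note that SaveToFile overwrites the target instead of appending, and the whole text is held in memory (twice, briefly, while SaveToFile builds it), so this trades RAM for fewer disk writes.

```pascal
program StringListDump;
{$APPTYPE CONSOLE}

// Sketch: accumulate all output lines in memory, then write them once.
// BuildLine is a hypothetical stand-in for CPatientItem.WriteItem.
uses
  SysUtils, Classes;

const
  RowCount = 17000;   // roughly the size described in the question
  FieldCount = 500;

// Hypothetical field formatter: one comma-separated line of FieldCount values.
function BuildLine(Row: Integer): string;
var
  F: Integer;
begin
  Result := IntToStr(Row);
  for F := 2 to FieldCount do
    Result := Result + ',' + IntToStr(F);
end;

var
  Lines: TStringList;
  Row: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Capacity := RowCount;   // avoid repeated list growth
    for Row := 1 to RowCount do
      Lines.Add(BuildLine(Row));
    // One bulk write. SaveToFile replaces the file; it does not append.
    Lines.SaveToFile('output.txt');
  finally
    Lines.Free;
  end;
end.
```

Timing the loop with the SaveToFile line commented out gives the "without writing the file" measurement suggested in the comment.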

Have you run a profiler? It will tell you where your program spends its time. – 2011-05-24 16:43:32

It's been a while since I've done this, but you should be able to set a larger text I/O buffer like this:

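procedure CWaitList.WriteData(AFile : string; AReplicat : integer; AllFields : Boolean);
var
    fout : TextFile;
    idx, ndx : integer;
    MyPat : CPatientItem;
    Buffer: array[0..65535] of char; // 64K chars - example size; must stay valid until CloseFile
begin
    ndx := FList.Count - 1;
    AssignFile(fout, AFile);
    SetTextBuf(fout, Buffer); // install the larger buffer before the file is opened
    Append(fout);
    for idx := 0 to ndx do
    begin
     MyPat := CPatientItem(FList.Objects[idx]);
     if not Assigned(MyPat) then Continue;
     MyPat.WriteItem(fout, AReplicat, AllFields);
    end;
    CloseFile(fout);
end;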

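Use Sysinternals' Process Explorer to watch the output. I think you'll see that you are writing thousands of small chunks. With stream I/O, where you write a lot of content in one I/O operation, things should improve noticeably.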

http://live.sysinternals.com/procexp.exe

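TextFile output is already buffered, but the default buffer size is only 128 bytes. Increasing the internal buffer size will reduce the time spent in the Windows file kernel. The TextFile implementation, even though it's an old technology, does use a form of buffered stream I/O. – 2011-05-26 06:00:14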
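To see the effect of the buffer size on a given machine, a small console program can time identical output written once with the default TextFile buffer and once with a 64 KB buffer installed via SetTextBuf. This is a sketch: the file names, line count and line content are placeholders, and the measured times will depend on the disk, the OS cache and the Delphi version.

```pascal
program TextBufDemo;
{$APPTYPE CONSOLE}

// Sketch: compare TextFile output using the default 128-byte buffer with
// output using a 64 KB buffer set by SetTextBuf. File names are placeholders.
uses
  Windows, SysUtils;

const
  LineCount = 200000;

function WriteLines(const FileName: string; UseBigBuffer: Boolean): Cardinal;
var
  F: TextFile;
  Buf: array[0..65535] of Byte;  // must remain valid until CloseFile
  I: Integer;
  T0: Cardinal;
begin
  T0 := GetTickCount;
  AssignFile(F, FileName);
  if UseBigBuffer then
    SetTextBuf(F, Buf);          // install before Rewrite, Size = SizeOf(Buf)
  Rewrite(F);
  try
    for I := 1 to LineCount do
      Writeln(F, I, ',', I * 2, ',', I * 3);
  finally
    CloseFile(F);
  end;
  Result := GetTickCount - T0;  // elapsed milliseconds
end;

begin
  Writeln('default buffer: ', WriteLines('default.txt', False), ' ms');
  Writeln('64 KB buffer:   ', WriteLines('bigbuf.txt', True), ' ms');
end.
```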

The proper way to speed up a TextFile is to use SetTextBuf, and possibly to add {$I-} .... {$I+} around all the file access:

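procedure CWaitList.WriteData(AFile : string; AReplicat : integer; AllFields : Boolean);
var
    fout : TextFile;
    idx, ndx : integer;
    MyPat : CPatientItem;
    TmpBuf: array[word] of byte; // 64 KB buffer; must stay valid until CloseFile
begin
    ndx := FList.Count - 1;
    {$I-} // report I/O errors through IOResult instead of raising exceptions
    AssignFile(fout, AFile);
    Append(fout);
    SetTextBuf(fOut,TmpBuf); // allowed immediately after Append, before any write
    for idx := 0 to ndx do
    begin
     MyPat := CPatientItem(FList.Objects[idx]);
     if not Assigned(MyPat) then Continue;
     MyPat.WriteItem(fout, AReplicat, AllFields);
    end;
    if ioresult<>0 then
     ShowMessage('Error writing file'); // ShowMessage is declared in the Dialogs unit
    CloseFile(fout);
    {$I+}
end;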

In any case, the old file API is rarely used nowadays...

The {$I-} .... {$I+} directives should also be added around all the subroutines that append content to the text file.

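I did some experiments with huge text files and buffer creation. I wrote a dedicated class named TTextWriter in the open-source SynCommons unit; it is UTF-8 oriented. I use it in particular for producing JSON and writing logs at the highest possible speed. It avoids most temporary heap allocations (e.g. when converting integer values), so it even scales very well in multi-threaded code. Some high-level methods are available to format text from an open array, like the format() function, but much faster.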

Here is the interface of this class:

/// simple writer to a Stream, specialized for the TEXT format 
    // - use an internal buffer, faster than string+string 
    // - some dedicated methods are able to encode any data with JSON escape
    TTextWriter = class 
    protected 
    B, BEnd: PUTF8Char; 
    fStream: TStream; 
    fInitialStreamPosition: integer; 
    fStreamIsOwned: boolean; 
    // internal temporary buffer 
    fTempBufSize: Integer; 
    fTempBuf: PUTF8Char; 
    // [0..4] for 'u0001' four-hex-digits template, [5..7] for one UTF-8 char 
    BufUnicode: array[0..7] of AnsiChar; 
    /// flush and go to next char 
    function FlushInc: PUTF8Char; 
    function GetLength: integer; 
    public 
    /// the data will be written to the specified Stream 
    // - aStream may be nil: in this case, it MUST be set before using any 
    // Add*() method 
    constructor Create(aStream: TStream; aBufSize: integer=1024); 
    /// the data will be written to an internal TMemoryStream 
    constructor CreateOwnedStream; 
    /// release fStream if it is owned
    destructor Destroy; override; 
    /// retrieve the data as a string 
    // - only works if the associated Stream Inherits from TMemoryStream: return 
    // '' if it is not the case 
    function Text: RawUTF8; 
    /// write pending data to the Stream 
    procedure Flush; 
    /// append one char to the buffer 
    procedure Add(c: AnsiChar); overload; {$ifdef HASINLINE}inline;{$endif} 
    /// append two chars to the buffer 
    procedure Add(c1,c2: AnsiChar); overload; {$ifdef HASINLINE}inline;{$endif} 
    /// append an Integer Value as a String 
    procedure Add(Value: Int64); overload; 
    /// append an Integer Value as a String 
    procedure Add(Value: integer); overload; 
    /// append a Currency from its Int64 in-memory representation 
    procedure AddCurr64(Value: PInt64); overload; 
    /// append a Currency from its Int64 in-memory representation 
    procedure AddCurr64(const Value: Int64); overload; 
    /// append a TTimeLog value, expanded as Iso-8601 encoded text 
    procedure AddTimeLog(Value: PInt64); 
    /// append a TDateTime value, expanded as Iso-8601 encoded text 
    procedure AddDateTime(Value: PDateTime); overload; 
    /// append a TDateTime value, expanded as Iso-8601 encoded text 
    procedure AddDateTime(const Value: TDateTime); overload; 
    /// append an Unsigned Integer Value as a String 
    procedure AddU(Value: cardinal); 
    /// append a floating-point Value as a String 
    // - double precision with max 3 decimals is default here, to avoid rounding 
    // problems 
    procedure Add(Value: double; decimals: integer=3); overload; 
    /// append strings or integers with a specified format 
    // - % = #37 indicates a string, integer, floating-point, or class parameter 
    // to be appended as text (e.g. class name) 
    // - $ = #36 indicates an integer to be written with 2 digits and a comma 
    // - £ = #163 indicates an integer to be written with 4 digits and a comma 
    // - µ = #181 indicates an integer to be written with 3 digits without any comma 
    // - ¤ = #164 indicates CR+LF chars 
    // - CR = #13 indicates CR+LF chars 
    // - § = #167 indicates to trim last comma 
    // - since some of these characters above are > #127, they are not UTF-8
    // ready, so we expect the input format to be WinAnsi, i.e. mostly English 
    // text (with chars < #128) with some values to be inserted inside 
    // - if StringEscape is false (by default), the text won't be escaped before 
    // adding; but if set to true text will be JSON escaped at writing 
    procedure Add(Format: PWinAnsiChar; const Values: array of const; 
     Escape: TTextWriterKind=twNone); overload; 
    /// append CR+LF chars 
    procedure AddCR; {$ifdef HASINLINE}inline;{$endif} 
    /// write the same character multiple times 
    procedure AddChars(aChar: AnsiChar; aCount: integer); 
    /// append an Integer Value as a 2 digits String with comma 
    procedure Add2(Value: integer); 
    /// append the current date and time, in a log-friendly format 
    // - e.g. append '20110325 19241502 ' 
    // - this method is very fast, and avoid most calculation or API calls 
    procedure AddCurrentLogTime; 
    /// append an Integer Value as a 4 digits String with comma 
    procedure Add4(Value: integer); 
    /// append an Integer Value as a 3 digits String without any added comma 
    procedure Add3(Value: integer); 
    /// append a line of text with CR+LF at the end 
    procedure AddLine(const Text: shortstring); 
    /// append a String 
    procedure AddString(const Text: RawUTF8); {$ifdef HASINLINE}inline;{$endif} 
    /// append a ShortString 
    procedure AddShort(const Text: ShortString); {$ifdef HASINLINE}inline;{$endif} 
    /// append a ShortString property name, as '"PropName":' 
    procedure AddPropName(const PropName: ShortString); 
    /// append an Instance name and pointer, as '"TObjectList(00425E68)"'+SepChar 
    // - Instance must be not nil 
    procedure AddInstanceName(Instance: TObject; SepChar: AnsiChar); 
    /// append an Instance name and pointer, as 'TObjectList(00425E68)'+SepChar 
    // - Instance must be not nil 
    procedure AddInstancePointer(Instance: TObject; SepChar: AnsiChar); 
    /// append an array of integers as CSV 
    procedure AddCSV(const Integers: array of Integer); overload; 
    /// append an array of doubles as CSV 
    procedure AddCSV(const Doubles: array of double; decimals: integer); overload; 
    /// append an array of RawUTF8 as CSV 
    procedure AddCSV(const Values: array of RawUTF8); overload; 
    /// write some data as hexa chars 
    procedure WrHex(P: PAnsiChar; Len: integer); 
    /// write some data Base64 encoded 
    // - if withMagic is TRUE, will write as '"\uFFF0base64encodedbinary"' 
    procedure WrBase64(P: PAnsiChar; Len: cardinal; withMagic: boolean); 
    /// write some #0 ended UTF-8 text, according to the specified format 
    procedure Add(P: PUTF8Char; Escape: TTextWriterKind); overload; 
    /// write some #0 ended UTF-8 text, according to the specified format 
    procedure Add(P: PUTF8Char; Len: PtrInt; Escape: TTextWriterKind); overload; 
    /// write some #0 ended Unicode text as UTF-8, according to the specified format 
    procedure AddW(P: PWord; Len: PtrInt; Escape: TTextWriterKind); overload; 
    /// append some chars to the buffer 
    // - if Len is 0, Len is calculated from zero-ended char 
    // - doesn't escape chars according to the JSON RFC
    procedure AddNoJSONEscape(P: Pointer; Len: integer=0); 
    /// append some binary data as hexadecimal text conversion 
    procedure AddBinToHex(P: Pointer; Len: integer); 
    /// fast conversion from binary data into hexa chars, ready to be displayed 
    // - using this function with Bin^ as an integer value will encode it 
    // in big-endian order (most significant byte first): use it for display
    // - up to 128 bytes may be converted 
    procedure AddBinToHexDisplay(Bin: pointer; BinBytes: integer); 
    /// add the pointer into hexa chars, ready to be displayed 
    procedure AddPointer(P: PtrUInt); 
    /// append some unicode chars to the buffer 
    // - WideCharCount is the unicode chars count, not the byte size 
    // - doesn't escape chars according to the JSON RFC
    // - will convert the Unicode chars into UTF-8 
    procedure AddNoJSONEscapeW(P: PWord; WideCharCount: integer); 
    /// append some UTF-8 encoded chars to the buffer 
    // - if Len is 0, Len is calculated from zero-ended char 
    // - escapes chars according to the JSON RFC 
    procedure AddJSONEscape(P: Pointer; Len: PtrInt=0); overload; 
    /// append some UTF-8 encoded chars to the buffer, from a generic string type 
    // - faster than AddJSONEscape(pointer(StringToUTF8(string)))
    // - if Len is 0, Len is calculated from zero-ended char 
    // - escapes chars according to the JSON RFC 
    procedure AddJSONEscapeString(const s: string); {$ifdef UNICODE}inline;{$endif} 
    /// append some Unicode encoded chars to the buffer 
    // - if Len is 0, Len is calculated from zero-ended widechar 
    // - escapes chars according to the JSON RFC 
    procedure AddJSONEscapeW(P: PWord; Len: PtrInt=0); 
    /// append an open array constant value to the buffer 
    // - "" will be added if necessary 
    // - escapes chars according to the JSON RFC 
    // - very fast (avoid most temporary storage) 
    procedure AddJSONEscape(const V: TVarRec); overload; 
    /// append a dynamic array content as UTF-8 encoded JSON array 
    // - expect a dynamic array TDynArray wrapper as incoming parameter 
    // - TIntegerDynArray, TInt64DynArray, TCardinalDynArray, TDoubleDynArray, 
    // TCurrencyDynArray, TWordDynArray and TByteDynArray will be written as 
    // numerical JSON values 
    // - TRawUTF8DynArray, TWinAnsiDynArray, TRawByteStringDynArray, 
    // TStringDynArray, TWideStringDynArray, TSynUnicodeDynArray, TTimeLogDynArray, 
    // and TDateTimeDynArray will be written as escaped UTF-8 JSON strings 
    // (and Iso-8601 textual encoding if necessary) 
    // - any other kind of dynamic array (including array of records) will be 
    // written as Base64 encoded binary stream, with a JSON_BASE64_MAGIC prefix 
    // (UTF-8 encoded \uFFF0 special code) 
    // - examples: '[1,2,3,4]' or '["\uFFF0base64encodedbinary"]' 
    procedure AddDynArrayJSON(const DynArray: TDynArray); 
    /// append some chars to the buffer in one line 
    // - P should be ended with a #0 
    // - will write #1..#31 chars as spaces (so content will stay on the same line) 
    procedure AddOnSameLine(P: PUTF8Char); overload; 
    /// append some chars to the buffer in one line 
    // - will write #0..#31 chars as spaces (so content will stay on the same line) 
    procedure AddOnSameLine(P: PUTF8Char; Len: PtrInt); overload; 
    /// append some wide chars to the buffer in one line 
    // - will write #0..#31 chars as spaces (so content will stay on the same line) 
    procedure AddOnSameLineW(P: PWord; Len: PtrInt); 
    /// serialize as JSON the given object 
    // - this default implementation will write null, or only write the 
    // class name and pointer if FullExpand is true - use TJSONSerializer. 
    // WriteObject method for full RTTI handling 
    // - default implementation will write TList/TCollection/TStrings/TRawUTF8List 
    // as appropriate array of class name/pointer (if FullExpand=true) or string 
    procedure WriteObject(Value: TObject; HumanReadable: boolean=false; 
     DontStoreDefault: boolean=true; FullExpand: boolean=false); virtual; 
    /// the last char appended is canceled 
    procedure CancelLastChar; {$ifdef HASINLINE}inline;{$endif} 
    /// the last char appended is canceled if it was a ',' 
    procedure CancelLastComma; {$ifdef HASINLINE}inline;{$endif} 
    /// rewind the Stream to the position when Create() was called 
    procedure CancelAll; 
    /// count of bytes added to the stream
    property TextLength: integer read GetLength; 
    /// the internal TStream used for storage 
    property Stream: TStream read fStream write fStream; 
    end; 

As you can see, some serialization is even available, and the CancelLastComma/CancelLastChar methods are very useful for producing JSON or CSV data quickly from a loop.
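As an illustration of the loop pattern described above, here is a sketch that writes CSV rows through TTextWriter into a TFileStream. It assumes the open-source SynCommons unit is on the search path and uses only members that appear in the interface listed above (Create, Add for integers and chars, CancelLastComma, AddCR, Flush). The interface does not say whether Destroy flushes pending data, so Flush is called explicitly. The row values are hypothetical.

```pascal
program TextWriterCsv;
{$APPTYPE CONSOLE}

// Sketch, assuming SynCommons is available. Only TTextWriter members shown
// in the interface above are used; the integer field values are hypothetical.
uses
  SysUtils, Classes, SynCommons;

const
  RowCount = 17000;
  FieldCount = 500;

var
  FS: TFileStream;
  W: TTextWriter;
  Row, Field: Integer;
begin
  FS := TFileStream.Create('output.csv', fmCreate);
  try
    W := TTextWriter.Create(FS, 65536);  // 64 KB internal buffer
    try
      for Row := 1 to RowCount do
      begin
        for Field := 1 to FieldCount do
        begin
          W.Add(Row * Field);  // integer appended as text, no temporary string
          W.Add(',');
        end;
        W.CancelLastComma;     // drop the trailing separator of the row
        W.AddCR;               // append CR+LF
      end;
      W.Flush;                 // write the remaining buffered bytes to FS
    finally
      W.Free;
    end;
  finally
    FS.Free;
  end;
end.
```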

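Regarding speed and timing, this routine is faster than my disk access, which is about 100 MB/s. I think it can reach around 500 MB/s when appending data to a TMemoryStream instead of a TFileStream.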

Hi, it seems that using a buffer doesn't speed it up. I will try using TFileStream – KingKong 2011-05-24 18:18:57

Using TFileStream naively won't help. You need to buffer those writes too. Could it be that your disk is simply slow? – 2011-05-24 23:04:06

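@David You are completely right. TFileStream is just a wrapper around the Windows file API, so it will be slow if you call Write() for every small piece of content. Buffering is the key. Another possibility is to use a TMemoryStream and then SaveToFile (under Delphi 6/7, TMemoryStream uses the slow GlobalAlloc API, so don't use it there). Buffering is exactly what our TTextWriter class does. The TextFile functions are fast when used with a large buffer. The bottleneck should be in the subroutines, not in the technique used to append data to the text content. – 2011-05-25 05:12:58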
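One way to "buffer those too" with TFileStream is to put a small write buffer in front of it, so that many small writes become a few large TStream.WriteBuffer calls. TBufferedWriter below is a hypothetical minimal class written for illustration only (it is not part of the VCL or SynCommons), and it assumes byte-oriented AnsiString output.

```pascal
program BufferedStreamDemo;
{$APPTYPE CONSOLE}

// Sketch: hypothetical minimal write buffer in front of any TStream.
uses
  SysUtils, Classes;

type
  TBufferedWriter = class
  private
    FStream: TStream;
    FBuf: array of Byte;
    FUsed: Integer;        // bytes currently held in FBuf
  public
    constructor Create(AStream: TStream; ABufSize: Integer = 65536);
    destructor Destroy; override;
    procedure Write(const S: AnsiString);
    procedure Flush;
  end;

constructor TBufferedWriter.Create(AStream: TStream; ABufSize: Integer);
begin
  inherited Create;
  FStream := AStream;
  SetLength(FBuf, ABufSize);
end;

destructor TBufferedWriter.Destroy;
begin
  Flush;                   // do not lose data still in the buffer
  inherited;
end;

procedure TBufferedWriter.Flush;
begin
  if FUsed > 0 then
  begin
    FStream.WriteBuffer(FBuf[0], FUsed);
    FUsed := 0;
  end;
end;

procedure TBufferedWriter.Write(const S: AnsiString);
var
  Len: Integer;
begin
  Len := Length(S);
  if Len = 0 then Exit;
  if FUsed + Len > Length(FBuf) then
    Flush;                 // make room first
  if Len >= Length(FBuf) then
    FStream.WriteBuffer(S[1], Len)   // larger than the buffer: write directly
  else
  begin
    Move(S[1], FBuf[FUsed], Len);
    Inc(FUsed, Len);
  end;
end;

var
  FS: TFileStream;
  W: TBufferedWriter;
  I: Integer;
begin
  FS := TFileStream.Create('output.txt', fmCreate);
  try
    W := TBufferedWriter.Create(FS);
    try
      for I := 1 to 100000 do
        W.Write(AnsiString(IntToStr(I)) + #13#10);
    finally
      W.Free;              // Destroy flushes the remaining bytes
    end;
  finally
    FS.Free;
  end;
end.
```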

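When I was working on an archiving package, I noticed a performance boost when I wrote in chunks of 512 bytes, which is the default disk sector size. Note that the disk sector size and the file system block size are two different things! There are WinAPI functions that will get the block size of your partition - have a look here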
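As one example of such a WinAPI call, GetDiskFreeSpace reports sectors per cluster and bytes per sector, whose product is the cluster (allocation unit) size. The sketch below queries it and rounds a 64 KB target buffer up to a whole number of clusters. 'C:\' is a placeholder root path, and the 64 KB target is an arbitrary example.

```pascal
program ClusterSize;
{$APPTYPE CONSOLE}

// Sketch: query the cluster size of a volume with Win32 GetDiskFreeSpace,
// then size a write buffer as a multiple of it. 'C:\' is a placeholder.
uses
  Windows, SysUtils;

function GetClusterSize(const RootPath: string): Cardinal;
var
  SectorsPerCluster, BytesPerSector, FreeClusters, TotalClusters: DWORD;
begin
  if not GetDiskFreeSpace(PChar(RootPath), SectorsPerCluster, BytesPerSector,
    FreeClusters, TotalClusters) then
    RaiseLastOSError;
  Result := SectorsPerCluster * BytesPerSector;
end;

var
  Cluster, BufSize: Cardinal;
begin
  Cluster := GetClusterSize('C:\');
  // Round a 64 KB target buffer up to a whole number of clusters.
  BufSize := ((65536 + Cluster - 1) div Cluster) * Cluster;
  Writeln('Cluster size: ', Cluster, ' bytes; buffer size: ', BufSize, ' bytes');
end.
```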

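I suggest switching to TFileStream or TMemoryStream instead of old-style file I/O. If you use TFileStream, you can set the size of the file according to the estimated need, instead of letting the program look for the next free block on every write. You can then extend or truncate it as needed. If you use a TMemoryStream, putting the data into it and then calling SaveToFile(), the whole thing is written from memory to the file at once. This should speed things up for you.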
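The preallocation idea can be sketched with TStream.Size: grow the file once to an estimated size, write from the start, then truncate to the position actually reached. The 25 MB estimate comes from the question; the line content is hypothetical. Setting Size on a file stream moves the file position, so the sketch resets Position to 0 explicitly. It still issues one write per line, so in practice it would be combined with a buffer.

```pascal
program PreallocDemo;
{$APPTYPE CONSOLE}

// Sketch: reserve the expected file size up front, write, then truncate.
// EstimatedSize is based on the ~25 MB in the question; lines are hypothetical.
uses
  SysUtils, Classes;

const
  EstimatedSize = 25 * 1024 * 1024;

var
  FS: TFileStream;
  Line: AnsiString;
  I: Integer;
begin
  FS := TFileStream.Create('output.txt', fmCreate);
  try
    FS.Size := EstimatedSize;   // grow the file once
    FS.Position := 0;           // setting Size moved the position
    for I := 1 to 17000 do
    begin
      Line := AnsiString(Format('%d,%d,%d', [I, I * 2, I * 3])) + #13#10;
      FS.WriteBuffer(Line[1], Length(Line));
    end;
    FS.Size := FS.Position;     // cut off the unused tail
  finally
    FS.Free;
  end;
end.
```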

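I suspect the writing time is not the problem. The time-consuming part of the routine is streaming out the 500 fields. Try replacing the field streaming with constant strings of equivalent length. I guarantee this will be faster. So, to optimize the routine, you need to optimize the field streaming, not the actual writing!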
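The experiment suggested above can be sketched as two timed runs through the same buffered TextFile: one that formats the fields for every row, and one that writes a constant line of the same length. FormatRow is a hypothetical stand-in for the real 500-field WriteItem logic, and the file names are placeholders. If the constant run is much faster, the cost is in formatting, not in writing.

```pascal
program FieldCostDemo;
{$APPTYPE CONSOLE}

// Sketch: compare writing formatted fields with writing a constant line of
// similar length. FormatRow is a hypothetical stand-in for WriteItem.
uses
  Windows, SysUtils;

const
  RowCount = 17000;
  FieldCount = 500;

function FormatRow(Row: Integer): string;
var
  F: Integer;
begin
  Result := '';
  for F := 1 to FieldCount do
    Result := Result + FloatToStrF(Row / F, ffFixed, 15, 2) + ',';
end;

function TimeRun(const FileName: string; UseConstant: Boolean;
  const ConstLine: string): Cardinal;
var
  F: TextFile;
  Buf: array[0..65535] of Byte;  // same large buffer for both runs
  Row: Integer;
  T0: Cardinal;
begin
  T0 := GetTickCount;
  AssignFile(F, FileName);
  SetTextBuf(F, Buf);
  Rewrite(F);
  try
    for Row := 1 to RowCount do
      if UseConstant then
        Writeln(F, ConstLine)
      else
        Writeln(F, FormatRow(Row));
  finally
    CloseFile(F);
  end;
  Result := GetTickCount - T0;   // elapsed milliseconds
end;

var
  ConstLine: string;
begin
  // Constant line with the length of a typical formatted row.
  ConstLine := StringOfChar('x', Length(FormatRow(RowCount)));
  Writeln('formatted fields: ', TimeRun('fields.txt', False, ''), ' ms');
  Writeln('constant line:    ', TimeRun('const.txt', True, ConstLine), ' ms');
end.
```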