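Improve speed on writing to files
I have a program that writes its output when it finishes, and one particular file takes a very long time. I was wondering whether there is anything I could do to improve its speed.
The file ends up being 25 MB or more. It has about 17000 lines, and each line has about 500 fields.
The way it works is: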
procedure CWaitList.WriteData(AFile : string; AReplicat : integer; AllFields : Boolean);
var
  fout : TextFile;
  idx, ndx : integer;
  MyPat : CPatientItem;
begin
  ndx := FList.Count - 1;
  AssignFile(fout, AFile);
  Append(fout);
  for idx := 0 to ndx do
  begin
    MyPat := CPatientItem(FList.Objects[idx]);
    if not Assigned(MyPat) then Continue;
    MyPat.WriteItem(fout, AReplicat, AllFields);
  end;
  CloseFile(fout);
end;
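WriteItem is a procedure that gets all the values from MyPat and writes them to the file. It also calls 3 other functions that write values to the file as well.
So overall, the WriteData loop ends up running about 1700 times, and each line ends up having about 500 fields.
I was just wondering if there is anything I can do to improve its performance, or whether it will always take a long time because of how much data it has to write.
Thanks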
I haven't done this in a while, but you should be able to set a larger text I/O buffer like this:
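var
  fout : TextFile;
  idx, ndx : integer;
  MyPat : CPatientItem;
  Buffer: array[0..65535] of char; // 64K - example (128K where Char is a 2-byte WideChar, i.e. Delphi 2009+)
begin
  ndx := FList.Count - 1;
  AssignFile(fout, AFile);
  SetTextBuf(fout, Buffer); // must be called before any I/O on fout
  Append(fout);
  // ... the rest of WriteData is unchanged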
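Use Process Explorer from Sysinternals to watch the output. I think you will see that you are writing thousands of small chunks. Using stream I/O, where you write the content in one I/O operation, will improve things significantly.
TextFile output is already buffered, but the initial buffer size is only 128 bytes. Increasing the internal buffer size will reduce the time spent in the Windows file kernel. The TextFile implementation, even though it is old technology, also uses a kind of stream I/O. – 2011-05-26 06:00:14
The proper way to speed up a TextFile is to use SetTextBuf, and perhaps to add {$I-} .... {$I+} around all file access.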
var
  TmpBuf: array[word] of byte;
..
  {$I-}
  AssignFile(fout, AFile);
  Append(fout);
  SetTextBuf(fOut,TmpBuf);
  for idx := 0 to ndx do
  begin
    MyPat := CPatientItem(FList.Objects[idx]);
    if not Assigned(MyPat) then Continue;
    MyPat.WriteItem(fout, AReplicat, AllFields);
  end;
  if ioresult<>0 then
    ShowMessage('Error writing file');
  CloseFile(fout);
  {$I+}
end;
In any case, the old file API should not be used nowadays... {$I-} .... {$I+} also has to be added around all the sub-routines that add content to the text file.
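To see whether a larger text buffer helps on a given machine, a small stand-alone test can write the same content twice, once with the default buffer and once with a 64 KB buffer. This is only a sketch: the file names, the dummy integer fields, and the `WriteSample` helper are invented for illustration, and the line and field counts are simply the figures quoted in the question.

```pascal
program TextBufDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  LineCount  = 17000;  // figures quoted in the question
  FieldCount = 500;

var
  // Global, so the 64 KB buffer does not live on the stack.
  BigBuf: array[0..65535] of Byte;

// Hypothetical helper: writes LineCount lines of FieldCount dummy fields.
procedure WriteSample(const AFileName: string; UseBigBuffer: Boolean);
var
  F: TextFile;
  Line, Field: Integer;
begin
  AssignFile(F, AFileName);
  if UseBigBuffer then
    SetTextBuf(F, BigBuf);  // replace the default 128-byte buffer before any I/O
  Rewrite(F);
  try
    for Line := 1 to LineCount do
    begin
      for Field := 1 to FieldCount do
        Write(F, Field, ',');
      Writeln(F);
    end;
  finally
    CloseFile(F);  // flushes whatever is still in the buffer
  end;
end;

var
  T0: TDateTime;
begin
  T0 := Now;
  WriteSample('default_buf.txt', False);
  Writeln('default buffer: ', Round((Now - T0) * MSecsPerDay), ' ms');

  T0 := Now;
  WriteSample('big_buf.txt', True);
  Writeln('64 KB buffer  : ', Round((Now - T0) * MSecsPerDay), ' ms');
end.
```

If both timings come out about the same, the bottleneck is probably not the file buffer but the code that produces the fields.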
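I have done some experiments with huge text files and buffer creation. I have written a dedicated class in the Open Source SynCommons unit, named TTextWriter, which is UTF-8 oriented. I use it in particular for JSON production or LOG writing at the highest possible speed. It avoids most temporary heap allocations (e.g. for conversion from an integer value), so it is even very good at multi-thread scaling. Some high-level methods are available to format some text from an open array, like the format() function, but much faster.
Here is the interface of this class: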
/// simple writer to a Stream, specialized for the TEXT format
// - use an internal buffer, faster than string+string
// - some dedicated methods is able to encode any data with JSON escape
TTextWriter = class
protected
B, BEnd: PUTF8Char;
fStream: TStream;
fInitialStreamPosition: integer;
fStreamIsOwned: boolean;
// internal temporary buffer
fTempBufSize: Integer;
fTempBuf: PUTF8Char;
// [0..4] for 'u0001' four-hex-digits template, [5..7] for one UTF-8 char
BufUnicode: array[0..7] of AnsiChar;
/// flush and go to next char
function FlushInc: PUTF8Char;
function GetLength: integer;
public
/// the data will be written to the specified Stream
// - aStream may be nil: in this case, it MUST be set before using any
// Add*() method
constructor Create(aStream: TStream; aBufSize: integer=1024);
/// the data will be written to an internal TMemoryStream
constructor CreateOwnedStream;
/// release fStream is is owned
destructor Destroy; override;
/// retrieve the data as a string
// - only works if the associated Stream Inherits from TMemoryStream: return
// '' if it is not the case
function Text: RawUTF8;
/// write pending data to the Stream
procedure Flush;
/// append one char to the buffer
procedure Add(c: AnsiChar); overload; {$ifdef HASINLINE}inline;{$endif}
/// append two chars to the buffer
procedure Add(c1,c2: AnsiChar); overload; {$ifdef HASINLINE}inline;{$endif}
/// append an Integer Value as a String
procedure Add(Value: Int64); overload;
/// append an Integer Value as a String
procedure Add(Value: integer); overload;
/// append a Currency from its Int64 in-memory representation
procedure AddCurr64(Value: PInt64); overload;
/// append a Currency from its Int64 in-memory representation
procedure AddCurr64(const Value: Int64); overload;
/// append a TTimeLog value, expanded as Iso-8601 encoded text
procedure AddTimeLog(Value: PInt64);
/// append a TDateTime value, expanded as Iso-8601 encoded text
procedure AddDateTime(Value: PDateTime); overload;
/// append a TDateTime value, expanded as Iso-8601 encoded text
procedure AddDateTime(const Value: TDateTime); overload;
/// append an Unsigned Integer Value as a String
procedure AddU(Value: cardinal);
/// append a floating-point Value as a String
// - double precision with max 3 decimals is default here, to avoid rounding
// problems
procedure Add(Value: double; decimals: integer=3); overload;
/// append strings or integers with a specified format
// - % = #37 indicates a string, integer, floating-point, or class parameter
// to be appended as text (e.g. class name)
// - $ = #36 indicates an integer to be written with 2 digits and a comma
// - £ = #163 indicates an integer to be written with 4 digits and a comma
// - µ = #181 indicates an integer to be written with 3 digits without any comma
// - ¤ = #164 indicates CR+LF chars
// - CR = #13 indicates CR+LF chars
// - § = #167 indicates to trim last comma
// - since some of this characters above are > #127, they are not UTF-8
// ready, so we expect the input format to be WinAnsi, i.e. mostly English
// text (with chars < #128) with some values to be inserted inside
// - if StringEscape is false (by default), the text won't be escaped before
// adding; but if set to true text will be JSON escaped at writing
procedure Add(Format: PWinAnsiChar; const Values: array of const;
Escape: TTextWriterKind=twNone); overload;
/// append CR+LF chars
procedure AddCR; {$ifdef HASINLINE}inline;{$endif}
/// write the same character multiple times
procedure AddChars(aChar: AnsiChar; aCount: integer);
/// append an Integer Value as a 2 digits String with comma
procedure Add2(Value: integer);
/// append the current date and time, in a log-friendly format
// - e.g. append '20110325 19241502 '
// - this method is very fast, and avoid most calculation or API calls
procedure AddCurrentLogTime;
/// append an Integer Value as a 4 digits String with comma
procedure Add4(Value: integer);
/// append an Integer Value as a 3 digits String without any added comma
procedure Add3(Value: integer);
/// append a line of text with CR+LF at the end
procedure AddLine(const Text: shortstring);
/// append a String
procedure AddString(const Text: RawUTF8); {$ifdef HASINLINE}inline;{$endif}
/// append a ShortString
procedure AddShort(const Text: ShortString); {$ifdef HASINLINE}inline;{$endif}
/// append a ShortString property name, as '"PropName":'
procedure AddPropName(const PropName: ShortString);
/// append an Instance name and pointer, as '"TObjectList(00425E68)"'+SepChar
// - Instance must be not nil
procedure AddInstanceName(Instance: TObject; SepChar: AnsiChar);
/// append an Instance name and pointer, as 'TObjectList(00425E68)'+SepChar
// - Instance must be not nil
procedure AddInstancePointer(Instance: TObject; SepChar: AnsiChar);
/// append an array of integers as CSV
procedure AddCSV(const Integers: array of Integer); overload;
/// append an array of doubles as CSV
procedure AddCSV(const Doubles: array of double; decimals: integer); overload;
/// append an array of RawUTF8 as CSV
procedure AddCSV(const Values: array of RawUTF8); overload;
/// write some data as hexa chars
procedure WrHex(P: PAnsiChar; Len: integer);
/// write some data Base64 encoded
// - if withMagic is TRUE, will write as '"\uFFF0base64encodedbinary"'
procedure WrBase64(P: PAnsiChar; Len: cardinal; withMagic: boolean);
/// write some #0 ended UTF-8 text, according to the specified format
procedure Add(P: PUTF8Char; Escape: TTextWriterKind); overload;
/// write some #0 ended UTF-8 text, according to the specified format
procedure Add(P: PUTF8Char; Len: PtrInt; Escape: TTextWriterKind); overload;
/// write some #0 ended Unicode text as UTF-8, according to the specified format
procedure AddW(P: PWord; Len: PtrInt; Escape: TTextWriterKind); overload;
/// append some chars to the buffer
// - if Len is 0, Len is calculated from zero-ended char
// - don't escapes chars according to the JSON RFC
procedure AddNoJSONEscape(P: Pointer; Len: integer=0);
/// append some binary data as hexadecimal text conversion
procedure AddBinToHex(P: Pointer; Len: integer);
/// fast conversion from binary data into hexa chars, ready to be displayed
// - using this function with Bin^ as an integer value will encode it
// in big-endian order (most-signignifican byte first): use it for display
// - up to 128 bytes may be converted
procedure AddBinToHexDisplay(Bin: pointer; BinBytes: integer);
/// add the pointer into hexa chars, ready to be displayed
procedure AddPointer(P: PtrUInt);
/// append some unicode chars to the buffer
// - WideCharCount is the unicode chars count, not the byte size
// - don't escapes chars according to the JSON RFC
// - will convert the Unicode chars into UTF-8
procedure AddNoJSONEscapeW(P: PWord; WideCharCount: integer);
/// append some UTF-8 encoded chars to the buffer
// - if Len is 0, Len is calculated from zero-ended char
// - escapes chars according to the JSON RFC
procedure AddJSONEscape(P: Pointer; Len: PtrInt=0); overload;
/// append some UTF-8 encoded chars to the buffer, from a generic string type
// - faster than AddJSONEscape(pointer(StringToUTF8(string))
// - if Len is 0, Len is calculated from zero-ended char
// - escapes chars according to the JSON RFC
procedure AddJSONEscapeString(const s: string); {$ifdef UNICODE}inline;{$endif}
/// append some Unicode encoded chars to the buffer
// - if Len is 0, Len is calculated from zero-ended widechar
// - escapes chars according to the JSON RFC
procedure AddJSONEscapeW(P: PWord; Len: PtrInt=0);
/// append an open array constant value to the buffer
// - "" will be added if necessary
// - escapes chars according to the JSON RFC
// - very fast (avoid most temporary storage)
procedure AddJSONEscape(const V: TVarRec); overload;
/// append a dynamic array content as UTF-8 encoded JSON array
// - expect a dynamic array TDynArray wrapper as incoming parameter
// - TIntegerDynArray, TInt64DynArray, TCardinalDynArray, TDoubleDynArray,
// TCurrencyDynArray, TWordDynArray and TByteDynArray will be written as
// numerical JSON values
// - TRawUTF8DynArray, TWinAnsiDynArray, TRawByteStringDynArray,
// TStringDynArray, TWideStringDynArray, TSynUnicodeDynArray, TTimeLogDynArray,
// and TDateTimeDynArray will be written as escaped UTF-8 JSON strings
// (and Iso-8601 textual encoding if necessary)
// - any other kind of dynamic array (including array of records) will be
// written as Base64 encoded binary stream, with a JSON_BASE64_MAGIC prefix
// (UTF-8 encoded \uFFF0 special code)
// - examples: '[1,2,3,4]' or '["\uFFF0base64encodedbinary"]'
procedure AddDynArrayJSON(const DynArray: TDynArray);
/// append some chars to the buffer in one line
// - P should be ended with a #0
// - will write #1..#31 chars as spaces (so content will stay on the same line)
procedure AddOnSameLine(P: PUTF8Char); overload;
/// append some chars to the buffer in one line
// - will write #0..#31 chars as spaces (so content will stay on the same line)
procedure AddOnSameLine(P: PUTF8Char; Len: PtrInt); overload;
/// append some wide chars to the buffer in one line
// - will write #0..#31 chars as spaces (so content will stay on the same line)
procedure AddOnSameLineW(P: PWord; Len: PtrInt);
/// serialize as JSON the given object
// - this default implementation will write null, or only write the
// class name and pointer if FullExpand is true - use TJSONSerializer.
// WriteObject method for full RTTI handling
// - default implementation will write TList/TCollection/TStrings/TRawUTF8List
// as appropriate array of class name/pointer (if FullExpand=true) or string
procedure WriteObject(Value: TObject; HumanReadable: boolean=false;
DontStoreDefault: boolean=true; FullExpand: boolean=false); virtual;
/// the last char appended is canceled
procedure CancelLastChar; {$ifdef HASINLINE}inline;{$endif}
/// the last char appended is canceled if it was a ','
procedure CancelLastComma; {$ifdef HASINLINE}inline;{$endif}
/// rewind the Stream to the position when Create() was called
procedure CancelAll;
/// count of add byte to the stream
property TextLength: integer read GetLength;
/// the internal TStream used for storage
property Stream: TStream read fStream write fStream;
end;
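As you can see, there is even some serialization available, and the CancelLastComma / CancelLastChar methods are very useful for producing fast JSON or CSV data from a loop.
About speed and timing, this routine is faster than my disk access, which is about 100 MB/s. I think it could reach about 500 MB/s when appending data to a TMemoryStream instead of a TFileStream.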
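A rough sketch of how the class might be used to produce CSV lines from a loop, based only on the interface listed above. It assumes the SynCommons unit is on the search path and that the methods behave as their comments describe; the file name, the dummy integer fields, and the 64 KB buffer size are placeholders.

```pascal
program TextWriterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils,
  SynCommons;  // third-party unit named in the answer; not part of the RTL

const
  LineCount  = 17000;  // figures quoted in the question
  FieldCount = 500;

var
  FS: TFileStream;
  W: TTextWriter;
  Line, Field: Integer;
begin
  FS := TFileStream.Create('textwriter.csv', fmCreate);
  try
    W := TTextWriter.Create(FS, 65536);  // aBufSize: 64 KB internal buffer
    try
      for Line := 1 to LineCount do
      begin
        for Field := 1 to FieldCount do
        begin
          W.Add(Field);  // Add(Value: integer) overload
          W.Add(',');    // Add(c: AnsiChar) overload
        end;
        W.CancelLastComma;  // drop the trailing separator of the line
        W.AddCR;            // append CR+LF
      end;
      W.Flush;  // write pending data to the stream before releasing anything
    finally
      W.Free;
    end;
  finally
    FS.Free;  // the writer was given the stream, so it is freed here
  end;
end.
```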
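Hi, it seems that using a buffer does not speed it up. I will try using a TFileStream – KingKong 2011-05-24 18:18:57
Naive use of TFileStream won't help. You need to buffer that too. Can your disk go any faster? – 2011-05-24 23:04:06
@David You are perfectly right. TFileStream is just a wrapper around the Windows file API, so it will be slow when each Write() call adds only a small piece of content. Buffering is the key. Another possibility would be to use a TMemoryStream and then SaveToFile (under Delphi 6/7 TMemoryStream uses the slow GlobalAlloc API, so do not use it there). This is exactly what our 'TTextWriter' class does. The FileText functions are fast when used with a big buffer. The bottleneck should be in the sub-routines, not in the technique used to append data to the text content. – 2011-05-25 05:12:58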
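As an illustration of what "buffering a TFileStream" can mean, here is a minimal sketch of a writer that collects small pieces in memory and hands them to the stream in large blocks. The `TBufferedWriter` class, its method names, and the file name are all invented for this example; it is not the implementation of TTextWriter or of any RTL class.

```pascal
program BufferedStreamDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

type
  // Hypothetical helper: gathers small writes and flushes them in big blocks.
  TBufferedWriter = class
  private
    FStream: TStream;
    FBuf: array of Byte;
    FUsed: Integer;
  public
    constructor Create(AStream: TStream; ABufSize: Integer = 65536);
    destructor Destroy; override;
    procedure Flush;
    procedure WriteText(const S: AnsiString);
  end;

constructor TBufferedWriter.Create(AStream: TStream; ABufSize: Integer);
begin
  inherited Create;
  FStream := AStream;
  SetLength(FBuf, ABufSize);
end;

destructor TBufferedWriter.Destroy;
begin
  Flush;  // do not lose the tail of the data
  inherited;
end;

procedure TBufferedWriter.Flush;
begin
  if FUsed > 0 then
  begin
    FStream.WriteBuffer(FBuf[0], FUsed);  // one large write instead of many small ones
    FUsed := 0;
  end;
end;

procedure TBufferedWriter.WriteText(const S: AnsiString);
var
  Len: Integer;
begin
  Len := Length(S);
  if Len = 0 then
    Exit;
  if Len > Length(FBuf) - FUsed then
    Flush;                            // not enough room left: empty the buffer first
  if Len > Length(FBuf) then
    FStream.WriteBuffer(S[1], Len)    // larger than the whole buffer: write directly
  else
  begin
    Move(S[1], FBuf[FUsed], Len);
    Inc(FUsed, Len);
  end;
end;

var
  FS: TFileStream;
  W: TBufferedWriter;
  Line, Field: Integer;
begin
  FS := TFileStream.Create('buffered.txt', fmCreate);
  try
    W := TBufferedWriter.Create(FS);
    try
      for Line := 1 to 1000 do
      begin
        for Field := 1 to 500 do
          W.WriteText(AnsiString(IntToStr(Field) + ','));
        W.WriteText(#13#10);
      end;
    finally
      W.Free;  // the destructor flushes the remaining bytes
    end;
  finally
    FS.Free;
  end;
end.
```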
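When I was working on an archiving package, I noticed a performance boost when I wrote chunks of 512 bytes each, which is the default size of a disk sector. Note that the size of a disk sector and the size of a file system block are two different things! There are WinAPI functions that will get the block size of your partition - have a look here.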
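One WinAPI call that reports these sizes is GetDiskFreeSpace; whether it is the function the link above pointed to is an assumption. A minimal Windows-only sketch, with the drive root 'C:\' as a placeholder:

```pascal
program SectorSizeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  SectorsPerCluster, BytesPerSector, FreeClusters, TotalClusters: DWORD;
begin
  // 'C:\' is a placeholder: pass the root of the drive that holds the output file.
  if GetDiskFreeSpace('C:\', SectorsPerCluster, BytesPerSector,
    FreeClusters, TotalClusters) then
  begin
    Writeln('Bytes per sector : ', BytesPerSector);
    // The cluster is the file system's allocation unit.
    Writeln('Bytes per cluster: ', SectorsPerCluster * BytesPerSector);
  end
  else
    RaiseLastOSError;
end.
```

A write buffer whose size is a multiple of the cluster size is a reasonable starting point, but the effect should be measured rather than assumed.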
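I would suggest switching to TFileStream or a memory stream rather than old-style file I/O. If you use TFileStream, you can set the size of the file once based on the estimated need, rather than having the program search for the next empty block to use for each write. You can then extend or truncate it as needed. If you use a TMemoryStream, saving the data to it and then calling SaveToFile(), the whole thing will be written from memory to the file at one time. That should speed things up for you.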
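The TMemoryStream variant could look roughly like the sketch below. The file name and the dummy integer fields are placeholders, and the whole output has to fit in memory (about 25 MB in the question). As noted in an earlier comment, TMemoryStream is reported to be slow under Delphi 6/7.

```pascal
program MemStreamDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

const
  LineCount  = 1000;  // kept small for the demo
  FieldCount = 500;

var
  MS: TMemoryStream;
  S: AnsiString;
  Line, Field: Integer;
begin
  MS := TMemoryStream.Create;
  try
    for Line := 1 to LineCount do
    begin
      for Field := 1 to FieldCount do
      begin
        S := AnsiString(IntToStr(Field) + ',');
        MS.WriteBuffer(S[1], Length(S));  // goes to memory, not to disk
      end;
      S := #13#10;
      MS.WriteBuffer(S[1], Length(S));
    end;
    MS.SaveToFile('memstream.txt');  // the only point where the disk is touched
  finally
    MS.Free;
  end;
end.
```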
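I suspect that the writing time is not the problem. The time-consuming part of the routine is streaming out the 500 fields. You can replace the field streaming with a constant string of equivalent length. I guarantee this will be much faster. So, to optimize the routine you need to optimize the field streaming, not the actual writing!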
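The experiment described here can be sketched as a small timing program: write the file once with fields that are formatted on the fly and once with a constant string of the same length, then compare. The `Format` call below is only a stand-in for the real field streaming in WriteItem, and the file names and the `WriteLines` helper are invented for illustration.

```pascal
program WhereIsTheTime;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  LineCount  = 17000;  // figures quoted in the question
  FieldCount = 500;

var
  BigBuf: array[0..65535] of Byte;  // same 64 KB text buffer for both runs

// Hypothetical helper: writes the same amount of text either way.
procedure WriteLines(const AFileName: string; UseConstant: Boolean);
var
  F: TextFile;
  Line, Field: Integer;
begin
  AssignFile(F, AFileName);
  SetTextBuf(F, BigBuf);
  Rewrite(F);
  try
    for Line := 1 to LineCount do
    begin
      for Field := 1 to FieldCount do
        if UseConstant then
          Write(F, '1234,')                        // 5 characters, no formatting work
        else
          Write(F, Format('%d,', [1000 + Field])); // also 5 characters, formatted each time
      Writeln(F);
    end;
  finally
    CloseFile(F);
  end;
end;

var
  T0: TDateTime;
begin
  T0 := Now;
  WriteLines('formatted.txt', False);
  Writeln('formatted fields: ', Round((Now - T0) * MSecsPerDay), ' ms');

  T0 := Now;
  WriteLines('constant.txt', True);
  Writeln('constant string : ', Round((Now - T0) * MSecsPerDay), ' ms');
end.
```

Both files have the same size, so any difference between the two timings comes from producing the fields rather than from writing them. In the real program the same comparison can be made by temporarily replacing the body of WriteItem.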
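Would you consider using streams instead of Pascal file I/O? – 2011-05-24 16:27:04
Or a TStringList with SaveToFile()? But first you should test how fast it is to loop through the data without writing to the file. – 2011-05-24 16:31:43
Have you run a profiler? It will tell you where your program is spending its time. – 2011-05-24 16:43:32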