MemoryError when uploading large files to a Tornado HTTP server

Question:

I'm working on a Flask application that handles large file uploads (250 MB+). I'm running the application with Tornado, using multiple processes, so that it can handle concurrent requests without blocking.

import os.path 
import tempfile 
from flask import Flask, request, jsonify 
from tornado.wsgi import WSGIContainer 
from tornado.httpserver import HTTPServer 
from tornado.ioloop import IOLoop 
from werkzeug.utils import secure_filename

app = Flask(__name__) 

@app.route("/test", methods=["GET"]) 
def test_route(): 

    return jsonify(msg='Ok'), 200 

@app.route("/upload", methods=["GET", "POST"]) 
def upload_file():
    if request.method == 'POST':
        # Save the uploaded file under a sanitized name in the temp directory.
        temp_directory = app.config['TMP_DIRECTORY']
        uploaded_file = request.files['filename']
        filename = secure_filename(uploaded_file.filename)
        uploaded_file.save(os.path.join(temp_directory, filename))
        return jsonify(msg="File upload successfully"), 200
    else:
        return jsonify(msg="Use POST to upload a file"), 200


if __name__ == '__main__': 
    app.config['TMP_DIRECTORY'] = tempfile.mkdtemp() 
    address = '0.0.0.0' 
    port = 8000 

    max_buffer_size = 500 * 1024 * 1024 
    server = HTTPServer(WSGIContainer(app), max_buffer_size=max_buffer_size) 
    server.bind(port=port, address=address) 

    print("Starting Tornado server on %s:%s" % (address, port)) 
    server.start(2) 
    IOLoop.instance().start() 

When uploading several large files at the same time, I get the following MemoryError:

$ curl -i -F name=file -F filename=@<large-file> http://127.0.0.1:8000/upload

ERROR:tornado.application:Uncaught exception 
Traceback (most recent call last): 
    File "/usr/lib64/python2.7/site-packages/tornado/http1connection.py", line 238, in _read_message 
    delegate.finish() 
    File "/usr/lib64/python2.7/site-packages/tornado/httpserver.py", line 285, in finish 
    self.request.body = b''.join(self._chunks) 
MemoryError 

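I believe Tornado stores the entire uploaded file in memory and only writes it to disk once the client has finished uploading. Is it possible to change this behavior so that chunks are written to disk as they arrive?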

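You're misunderstanding how Tornado works. It doesn't magically make your Flask app "able to handle concurrent requests without blocking": using Flask in Tornado's WSGIContainer is less scalable than using Flask on something like uwsgi or gunicorn. See the warning in WSGIContainer's documentation.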
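
For the gunicorn route, a minimal sketch of a gunicorn.conf.py might look like the following. It assumes the Flask app from the question lives in a module called upload_app.py (a hypothetical name), and the timeout value is only a guess at what slow 250 MB+ uploads would need. As far as I know, Werkzeug spools large multipart file parts to a temporary file rather than holding them in memory, so no Tornado-style buffer setting should be required.

```python
# gunicorn.conf.py -- a sketch, not a tuned production config.
# Assumption: the Flask app object is `app` in a module named upload_app.py.

# Same address and port as the Tornado setup in the question.
bind = "0.0.0.0:8000"

# Two worker processes, mirroring server.start(2) in the question.
workers = 2

# Seconds a worker may stay silent before gunicorn restarts it.
# Raised from the default of 30 as a guess for slow, large uploads.
timeout = 300
```

It would then be started with `gunicorn -c gunicorn.conf.py upload_app:app`.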

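If you were doing this as a native Tornado application (without Flask), you could use the tornado.web.stream_request_body decorator to handle large uploads without buffering the whole thing in memory.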
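
A rough sketch of what that could look like, assuming Python 3 and Tornado 5 or later. The names UploadHandler, make_app, main, MAX_BODY_SIZE and the upload_dir argument are made up for illustration. Note that with stream_request_body Tornado hands you the raw body and does not parse multipart/form-data, so this sketch expects the file as a plain PUT body rather than a form field.

```python
import asyncio
import tempfile

import tornado.web

# Hypothetical limit for this sketch: accept request bodies up to 500 MB.
MAX_BODY_SIZE = 500 * 1024 * 1024


@tornado.web.stream_request_body
class UploadHandler(tornado.web.RequestHandler):
    """Writes the raw request body to a temporary file, chunk by chunk."""

    def initialize(self, upload_dir):
        self.upload_dir = upload_dir

    def prepare(self):
        # Called once the headers are in, before any body data arrives.
        # Raise the per-request body limit, then open the destination file.
        self.request.connection.set_max_body_size(MAX_BODY_SIZE)
        self.temp_file = tempfile.NamedTemporaryFile(
            dir=self.upload_dir, delete=False
        )

    def data_received(self, chunk):
        # Called repeatedly as body data arrives; only this chunk is in memory.
        self.temp_file.write(chunk)

    def put(self):
        # Called after the whole body has been received.
        # Cleanup of partial files from aborted uploads is left out here.
        self.temp_file.close()
        self.write({"msg": "File uploaded successfully"})


def make_app(upload_dir):
    return tornado.web.Application(
        [(r"/upload", UploadHandler, dict(upload_dir=upload_dir))]
    )


async def main():
    # Serve on port 8000 until the process is interrupted.
    app = make_app(tempfile.mkdtemp())
    app.listen(8000, address="0.0.0.0")
    await asyncio.Event().wait()
```

To try it, run `asyncio.run(main())` and upload with something like `curl -T <large-file> http://127.0.0.1:8000/upload`, which sends the file as a raw PUT body.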


You are absolutely right. I will use gunicorn.