nginx + gunicorn + django intermittent 502s

Problem description:

Over the past few weeks we have been seeing more and more 502 errors. Our stack is currently nginx + gunicorn + django on an m1.large EC2 instance, backed by a small RDS instance.

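They seem to become more frequent as request load increases. In the browser I see the occasional 502, but our command-line scripts that hit the API (Tastypie) usually fail on the second or third request. However, if I add a sleep to the script before it makes a request, that request succeeds, but the request after it gets a 502. Note that we are using the requests library with digest authentication, wrapped by slumber, hence the 401, 200 pattern.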

To make debugging trickier, the problem resolves itself when Gunicorn is run with the --debug option. It also goes away if I remove --debug but explicitly limit Gunicorn to 1 worker.

My nginx.conf:

user www-data; 
worker_processes 4; 
pid /var/run/nginx.pid; 

events { 
    worker_connections 768; 
    # multi_accept on; 
} 

http { 

    ## 
    # Basic Settings 
    ## 

    sendfile on; 
    tcp_nopush on; 
    tcp_nodelay on; 
    keepalive_timeout 65; 
    types_hash_max_size 2048; 
    # server_tokens off; 

    # server_names_hash_bucket_size 64; 
    # server_name_in_redirect off; 

    include /etc/nginx/mime.types; 
    default_type application/octet-stream; 

    ## 
    # Logging Settings 
    ## 

    access_log /var/log/nginx/access.log; 
    error_log /var/log/nginx/error.log; 

    ## 
    # Gzip Settings 
    ## 

    gzip on; 
    gzip_disable "msie6"; 

     gzip_proxied any; 
     gzip_types application/x-ghi-packedschemafeatures-v1;
     gzip_http_version 1.1; 
     gzip_comp_level 1; 
     gzip_min_length 500; 

     proxy_buffering on; 
     proxy_http_version 1.1; 

    ## 
    # Virtual Host Configs 
    ## 

    include /etc/nginx/conf.d/*.conf; 
    include /etc/nginx/sites-enabled/*; 
} 

Virtual host file:

server { 
     listen 80; 
     server_name pipeline.ourdomain.com; 
     location / {
         rewrite ^ https://$server_name$request_uri permanent;
     } 
} 

server { 
     listen 443; 
     server_name pipeline.ourdomain.com; 
     ssl on; 
     ssl_protocols SSLv3 TLSv1; 
     ssl_ciphers ALL:-ADH:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP; 
     ssl_session_cache shared:SSL:10m; 
     ssl_certificate  /etc/ssl/certs/ourdomain.com.combined.crt; 
     ssl_certificate_key /etc/ssl/private/ourdomain.com.key; 

     root /var/www/; 
     location /static/ { 
       alias /var/www/production/pipeline/public/; 
     } 

     location / {
      proxy_pass_header Server; 
      proxy_set_header Host $http_host; 
      proxy_redirect off; 
      proxy_set_header X-Real-IP $remote_addr; 
      proxy_set_header X-Scheme $scheme; 
      proxy_set_header X-Forwarded-Protocol https; 
      proxy_connect_timeout 240; 
      proxy_read_timeout 280; 
      proxy_pass http://localhost:8000/; 
     } 
     error_page 500 502 503 504 /static/50x.html; 



} 

Gunicorn command:

#!/bin/bash 
set -e 
LOGFILE=/var/log/gunicorn/ea_pipeline.log 
LOGDIR=$(dirname $LOGFILE) 

SETTINGS=production_settings 
# user/group to run as 
USER=ubuntu 
GROUP=ubuntu 
DJANGO_PATH=$(dirname $(readlink -f $0))/../ 
cd $DJANGO_PATH 
echo $(pwd) 
. ../env/bin/activate 
test -d $LOGDIR || mkdir -p $LOGDIR 
exec ../env/bin/gunicorn_django \ 
    --user=$USER --group=$GROUP --log-level=debug \ 
    --preload \ 
    --workers=4 \ 
    --timeout=90 \ 
    --settings=$SETTINGS \ 
    --limit-request-line=8190 \ 
    --limit-request-field_size 0 \ 
    --pythonpath=$DJANGO_PATH \ 
    --log-file=$LOGFILE production_settings.py 2>>$LOGFILE 

Sample access log:

67.134.170.194 - - [24/Aug/2012:00:28:17 +0000] "GET /api/v1/storage/ HTTP/1.1" 401 5 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic" 
67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/storage/ HTTP/1.1" 200 326 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic" 
67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1" 502 18 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic" 
67.134.170.194 - - [24/Aug/2012:00:29:41 +0000] "GET /api/v1/storage/ HTTP/1.1" 502 18 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic" 
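
To see how often the 502s occur relative to the 401/200 digest-auth pairs, the access log can be tallied by status code. The following is a minimal sketch, assuming the log uses nginx's default combined format as in the sample above; the function name `tally_statuses` is made up for illustration, and the sample lines are copied from the log with the user-agent string abbreviated.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it
# in nginx's default "combined" access log format.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def tally_statuses(lines):
    """Count responses per HTTP status code (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts


# Lines taken from the sample access log, user agent shortened.
sample = [
    '67.134.170.194 - - [24/Aug/2012:00:28:17 +0000] "GET /api/v1/storage/ HTTP/1.1" 401 5 "-" "python-requests/0.13.8"',
    '67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/storage/ HTTP/1.1" 200 326 "-" "python-requests/0.13.8"',
    '67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1" 502 18 "-" "python-requests/0.13.8"',
    '67.134.170.194 - - [24/Aug/2012:00:29:41 +0000] "GET /api/v1/storage/ HTTP/1.1" 502 18 "-" "python-requests/0.13.8"',
]

counts = tally_statuses(sample)
for status, n in sorted(counts.items()):
    print(status, n)
```

On a real system the same function could be fed an open file handle for /var/log/nginx/access.log instead of the in-memory list.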

Nginx error log:

2012/08/24 00:28:18 [error] 16490#0: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 67.134.170.194, server: pipeline.ourdomain.com, request: "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1", upstream: "http://127.0.0.1:8000/api/v1/customer/?client_id=lamb_01", host: "pipeline.ourdomain.com" 
2012/08/24 00:29:41 [error] 16490#0: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 67.134.170.194, server: pipeline.ourdomain.com, request: "GET /api/v1/storage/ HTTP/1.1", upstream: "http://127.0.0.1:8000/api/v1/storage/", host: "pipeline.ourdomain.com" 

Sample of the Gunicorn log:

2012-08-24 17:03:13 [8716] [INFO] Starting gunicorn 0.14.3 
2012-08-24 17:03:13 [8716] [DEBUG] Arbiter booted 
2012-08-24 17:03:13 [8716] [INFO] Listening at: http://127.0.0.1:8000 (8716) 
2012-08-24 17:03:13 [8716] [INFO] Using worker: sync 
2012-08-24 17:03:13 [8735] [INFO] Booting worker with pid: 8735 
2012-08-24 17:03:13 [8736] [INFO] Booting worker with pid: 8736 
2012-08-24 17:03:13 [8737] [INFO] Booting worker with pid: 8737 
2012-08-24 17:03:13 [8738] [INFO] Booting worker with pid: 8738 
2012-08-24 17:03:21 [8738] [DEBUG] GET /api/v1/storage/ 
Assertion failed: ok (mailbox.cpp:84) 
2012-08-24 17:03:21 [8738] [INFO] Parent changed, shutting down: <Worker 8738> 
2012-08-24 17:03:21 [8738] [INFO] Worker exiting (pid: 8738) 
Error in sys.exitfunc: 
2012-08-24 17:03:21 [8737] [DEBUG] GET /api/v1/storage/ 
2012-08-24 17:03:22 [8838] [INFO] Starting gunicorn 0.14.3 
2012-08-24 17:03:22 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000) 
2012-08-24 17:03:22 [8838] [ERROR] Retrying in 1 second. 
2012-08-24 17:03:22 [8737] [INFO] Parent changed, shutting down: <Worker 8737> 
2012-08-24 17:03:22 [8737] [INFO] Worker exiting (pid: 8737) 
Error in sys.exitfunc: 
2012-08-24 17:03:22 [8736] [DEBUG] GET /api/v1/customer/ 
2012-08-24 17:03:23 [8736] [INFO] Parent changed, shutting down: <Worker 8736> 
2012-08-24 17:03:23 [8736] [INFO] Worker exiting (pid: 8736) 
Error in sys.exitfunc: 
2012-08-24 17:03:23 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000) 
2012-08-24 17:03:23 [8838] [ERROR] Retrying in 1 second. 
2012-08-24 17:03:24 [8735] [DEBUG] GET /api/v1/upload_action/ 
2012-08-24 17:03:24 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000) 
2012-08-24 17:03:24 [8838] [ERROR] Retrying in 1 second. 
2012-08-24 17:03:24 [8735] [INFO] Parent changed, shutting down: <Worker 8735> 
2012-08-24 17:03:24 [8735] [INFO] Worker exiting (pid: 8735) 
Error in sys.exitfunc: 
2012-08-24 17:03:25 [8838] [DEBUG] Arbiter booted 
2012-08-24 17:03:25 [8838] [INFO] Listening at: http://127.0.0.1:8000 (8838) 
2012-08-24 17:03:25 [8838] [INFO] Using worker: sync 
2012-08-24 17:03:25 [8907] [INFO] Booting worker with pid: 8907 
2012-08-24 17:03:25 [8908] [INFO] Booting worker with pid: 8908 
2012-08-24 17:03:25 [8909] [INFO] Booting worker with pid: 8909 
2012-08-24 17:03:25 [8910] [INFO] Booting worker with pid: 8910 
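
In the log above, a second "Starting gunicorn" line appears while workers of the first master report "Parent changed", which suggests the original master process went away and a new one was started in its place. A rough sketch for spotting this pattern in a longer log, assuming the same line layout as the sample (date, time, pid, level, message); `summarize` is a hypothetical helper, not part of Gunicorn:

```python
import re

# Layout assumed from the sample: "<date> <time> [<pid>] [<LEVEL>] <message>"
LOG_RE = re.compile(r"^\S+ \S+ \[(?P<pid>\d+)\] \[(?P<level>[A-Z]+)\] (?P<msg>.*)$")


def summarize(lines):
    """Return (pids of masters that started, pids of workers whose parent changed)."""
    master_pids = []
    orphaned_workers = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue  # e.g. "Assertion failed: ..." or "Error in sys.exitfunc:"
        pid = int(match.group("pid"))
        msg = match.group("msg")
        if msg.startswith("Starting gunicorn"):
            master_pids.append(pid)
        elif msg.startswith("Parent changed"):
            orphaned_workers.append(pid)
    return master_pids, orphaned_workers


# A few lines copied from the sample log.
sample = [
    "2012-08-24 17:03:13 [8716] [INFO] Starting gunicorn 0.14.3",
    "2012-08-24 17:03:21 [8738] [DEBUG] GET /api/v1/storage/",
    "Assertion failed: ok (mailbox.cpp:84)",
    "2012-08-24 17:03:21 [8738] [INFO] Parent changed, shutting down: <Worker 8738>",
    "2012-08-24 17:03:22 [8838] [INFO] Starting gunicorn 0.14.3",
    "2012-08-24 17:03:22 [8737] [INFO] Parent changed, shutting down: <Worker 8737>",
]

masters, orphans = summarize(sample)
print("masters started:", masters)
print("workers that lost their parent:", orphans)
```

More than one master pid in a short window, together with orphaned workers, lines up with the "Connection refused" entries in the nginx error log: for a moment nothing is accepting connections on 127.0.0.1:8000.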

What is in the nginx error log? – VBart

@VBart Nginx is logging to access.log – AndrewJesaitis

But your config has 'error_log /var/log/nginx/error.log;'. Also, the nginx error log format is different from what you have shown here. – VBart

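This is a very old post, but I had exactly the same problem with an NGinx + Gunicorn + Flask setup. I was also getting a 502 with the same logs roughly once every 300 requests. Changing the gunicorn worker type to an asynchronous one solved it for me (I chose gthread). I hope this answer helps someone.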

How to change the setting: http://docs.gunicorn.org/en/stable/settings.html#worker-class

How to choose your worker type: http://docs.gunicorn.org/en/latest/design.html#choosing-a-worker-type

A good explanation of why: How many concurrent requests does a single Flask process receive?
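
For reference, the worker-class change described in this answer could look roughly like the following Gunicorn config file. This is only a sketch: the file name, the thread count, and the other values are assumptions chosen to mirror the question's command line, and the gthread worker needs a considerably newer Gunicorn than the 0.14.3 shown in the question's log.

```python
# gunicorn.conf.py (hypothetical file name; values are illustrative)

# Same address the question's nginx config proxies to.
bind = "127.0.0.1:8000"

# Number of worker processes, as in the question's --workers=4.
workers = 4

# Use the threaded worker instead of the default "sync" worker.
worker_class = "gthread"

# Threads per worker; only meaningful with the gthread worker.
# The value 2 is an arbitrary starting point, not a recommendation.
threads = 2

# Worker timeout in seconds, as in the question's --timeout=90.
timeout = 90
```

Such a file would be passed with the `-c` option, for example `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject.wsgi:application` is a placeholder for the actual WSGI entry point.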