Caching proxy server returns 404 with www.google.com

Problem description:

I have a homework assignment that involves implementing a caching proxy server for web pages in Python. Here is my implementation:

from socket import * 
import sys 

def main(): 
    #Create a server socket, bind it to a port and start listening 
    tcpSerSock = socket(AF_INET, SOCK_STREAM) #Initializing socket 
    tcpSerSock.bind(("", 8030)) #Binding socket to port 
    tcpSerSock.listen(5) #Listening for page requests 
    while True: 
     #Start receiving data from the client 
     print 'Ready to serve...' 
     tcpCliSock, addr = tcpSerSock.accept() 
     print 'Received a connection from:', addr 
     message = tcpCliSock.recv(1024) 
     print message 

     #Extract the filename from the given message 
     filename = "" 
     try: 
      filename = message.split()[1].partition("/")[2].replace("/", "") 
     except: 
      continue 
     fileExist = False 

     try: #Check whether the file exists in the cache 
      f = open(filename, "r") 
      outputdata = f.readlines() 
      fileExist = True 
      #ProxyServer finds a cache hit and generates a response message 
      tcpCliSock.send("HTTP/1.0 200 OK\r\n") 
      tcpCliSock.send("Content-Type:text/html\r\n") 
      for data in outputdata: 
       tcpCliSock.send(data) 
      print 'Read from cache' 
     except IOError: #Error handling for file not found in cache 
      if fileExist == False: 

       c = socket(AF_INET, SOCK_STREAM) #Create a socket on the proxyserver 

       try: 
        srv = getaddrinfo(filename, 80) 
        c.connect((filename, 80)) #https://docs.python.org/2/library/socket.html 
        # Create a temporary file on this socket and ask port 80 for 
        # the file requested by the client 
        fileobj = c.makefile('r', 0) 
        fileobj.write("GET " + "http://" + filename + " HTTP/1.0\r\n") 
        # Read the response into buffer 
        buffr = fileobj.readlines() 
        # Create a new file in the cache for the requested file. 
        # Also send the response in the buffer to client socket and the 
        # corresponding file in the cache 
        tmpFile = open(filename,"wb") 
        for data in buffr: 
         tmpFile.write(data) 
         tcpCliSock.send(data) 
       except: 
        print "Illegal request" 
      else: #File not found 
       print "404: File Not Found" 
     tcpCliSock.close() #Close the client and the server sockets 

main() 

I configured my browser to use my proxy server, like this:

[screenshot: browser proxy settings]

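My problem is that when I run it, no matter which web page I try to access, it returns a 404 error on the initial connection and then a connection reset error on subsequent connections. I don't know why, so any help would be greatly appreciated. Thanks!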


Is there a reason why you are using a string as a boolean? Why not 'fileExist = False'? – jsfan

Your code has quite a few problems.

Your URL parser is rather fragile. Instead of the line

filename = message.split()[1].partition("/")[2].replace("/", "") 

I would use

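import re
# Match only the request line; no trailing `$`, because header lines follow it
parsed_url = re.match(r'GET\s+http://(([^/\s]+)(\S*))\s+HTTP/1\.\d', message)
local_path = parsed_url.group(3) or "/"  # path to request from the origin server
host_name = parsed_url.group(2)  # host to connect to
filename = parsed_url.group(1)  # host plus path, used as the cache key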

If you hit an exception there, you should raise an error, because it is a request your proxy does not understand (for example, a POST).
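As a rough sketch of that idea, the parsing and the rejection could be wrapped up as below. The names `parse_request_line` and `error_response` are made up for illustration, the pattern is anchored to the request line only, and replying with 501 is my assumption about what "raise an error" could look like on the wire.

```python
import re

# Hypothetical helper pattern: matches an absolute-URI GET request line only.
REQUEST_LINE = re.compile(r'GET\s+http://(([^/\s]+)(\S*))\s+HTTP/1\.\d')

def parse_request_line(message):
    """Return (host, path, cache_key), or None for unsupported requests."""
    match = REQUEST_LINE.match(message)
    if match is None:
        return None  # e.g. POST, CONNECT, or a malformed request line
    cache_key, host, path = match.group(1), match.group(2), match.group(3)
    return host, path or "/", cache_key

def error_response(code, reason):
    """Build a minimal HTTP/1.0 error reply with an empty body."""
    return "HTTP/1.0 {0} {1}\r\nContent-Length: 0\r\n\r\n".format(code, reason)
```

In the accept loop you would call `parse_request_line(message)` and, if it returns None, send something like `error_response(501, "Not Implemented")` to the client before closing the socket. On Python 3 the strings would have to be decoded and encoded around the socket calls.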

When you assemble the request to the target server, you then use

# HTTP header lines end with CRLF; an empty line ends the request
fileobj.write("GET {object} HTTP/1.0\r\n".format(object=local_path))
fileobj.write("Host: {host}\r\n\r\n".format(host=host_name))
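If you prefer to build the whole request as one string before sending it, something along these lines might work. `build_upstream_request` is a hypothetical helper, not part of the original code; it joins the lines with CRLF and terminates the header block with an empty line.

```python
def build_upstream_request(host_name, local_path, extra_headers=()):
    """Assemble an HTTP/1.0 GET request for the origin server as one string."""
    lines = [
        "GET {0} HTTP/1.0".format(local_path or "/"),  # origin-form path
        "Host: {0}".format(host_name),
    ]
    lines.extend(extra_headers)  # optional "Name: value" header lines
    # CRLF between lines, plus an empty line to end the header block
    return "\r\n".join(lines) + "\r\n\r\n"
```

The result can then be passed to a single `sendall` call on the upstream socket (encoded to bytes first on Python 3).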

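You should also include some of the header lines from the original request, because they can make a big difference to the content that is returned.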
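One possible way to pick those header lines out of the client's request is sketched here. Both the helper name `pick_forward_headers` and the list of header names are assumptions of mine; which headers are worth forwarding depends on the sites you test against.

```python
# Assumed selection of headers to pass on; adjust to taste.
FORWARDED = ("user-agent", "accept", "accept-language", "cookie")

def pick_forward_headers(message, wanted=FORWARDED):
    """Return the header lines of `message` whose names are in `wanted`."""
    head = message.split("\r\n\r\n", 1)[0]  # drop any request body
    kept = []
    for line in head.split("\r\n")[1:]:  # skip the request line itself
        name, sep, _ = line.partition(":")
        if sep and name.strip().lower() in wanted:
            kept.append(line)
    return kept
```

The returned lines can be written to the upstream connection right after the Host line, before the empty line that ends the request.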

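Also, you currently cache the entire response including all of its header lines, so you should not add your own when serving from the cache.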
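A minimal sketch of the cache-hit path under that assumption (the cached file already holds the status line, headers, and body exactly as received) could look like this. `serve_from_cache` is a hypothetical helper, and `send` stands for something like the client socket's `sendall`.

```python
def serve_from_cache(cache_path, send, chunk_size=4096):
    """Replay a cached raw HTTP response without adding any headers."""
    with open(cache_path, "rb") as cached:
        while True:
            chunk = cached.read(chunk_size)
            if not chunk:
                break  # end of file
            send(chunk)  # bytes go out exactly as they were stored
```

Reading in binary mode avoids any newline translation, so the client sees the same bytes the origin server sent.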

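In any case, what you have does not work, because there is no guarantee that you will get a 200 and text/html content. You should check the response code and only cache the response if you actually got a 200.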
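For illustration, the status check could be done roughly as follows. The helper names `status_code` and `should_cache` are made up, and the sketch assumes the raw response is held as a byte string (a plain `str` on Python 2).

```python
def status_code(response):
    """Extract the numeric status from a raw response, or None if malformed."""
    # The status line looks like: HTTP/1.0 200 OK
    parts = response.split(b"\r\n", 1)[0].split(None, 2)
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    return int(parts[1])

def should_cache(response):
    """Only plain 200 responses are written to the cache."""
    return status_code(response) == 200
```

With this in place, the proxy would always forward the response to the client but write it to the cache file only when `should_cache` returns True.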