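Reading URLs from a file in Python?

Question:

Hey guys, I want to read URLs from a file and print whether each URL exists/is reachable or not. I don't know why this code isn't working (I'm reading the URLs from a .txt file):

The error I get is:

name 'in_file' is not defined

Code: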

from urllib.request import urlopen 

def is_reachable(url): 
    if urlopen(url): 
     return True 
    else: 
     return False 

in_file_name = input("Enter file name: ") 
try: 
    in_file = open(in_file_name, "r") 
except: 
    print("Cannot open " + in_file) 

line = in_file.readline().replace(" ", "") 
print(line) 

counter = 0 
while line != "": 
    if is_reachable(line) == True: 
        counter += 1 
        print("The URL on line ", counter, "is unreachable!") 
        line = in_file.readline()

What error? –

NameError: name 'in_file' is not defined – Kris

'line = in_file.readline()' is only called when is_reachable returns 'True', because of the indentation. Why not use a 'with open() as in_file: for line in in_file:' block? –
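A minimal sketch of the 'with open() as in_file' pattern suggested in the comment above. The helper name read_urls is hypothetical, and skipping blank lines is an assumption about what the input file looks like, not something the original code does.

```python
def read_urls(path):
    """Return the non-empty, stripped lines of a text file as a list."""
    urls = []
    # The with-block closes the file even if an error occurs while reading.
    with open(path, "r") as in_file:
        # Iterating over the file yields one line at a time, so there is
        # no readline() call that can end up at the wrong indentation.
        for line in in_file:
            url = line.strip()  # drop the trailing newline and stray spaces
            if url:  # skip blank lines (assumption: they are not URLs)
                urls.append(url)
    return urls
```

Calling read_urls("urls.txt") (a placeholder file name) gives a list that can be passed to enumerate(..., start=1). Because blank lines are skipped, the resulting numbers count URLs rather than physical lines in the file.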

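Answer

There should be an else before printing "unreachable", or a 'not' check so that only unreachable URLs are printed. Right now you print that the URL is unreachable even when it is reachable.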

counter = 0 
while line != "": 
    counter += 1 
    if not is_reachable(line): 
     print("The URL on line ", counter, "is unreachable!") 
    line = in_file.readline()

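There are a few other problems with your program:
1. If the file cannot be read, your program still carries on.
2. You use a counter variable and maintain it by hand. You can easily use enumerate instead.

A better approach: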

from urllib.request import urlopen
import sys

def is_reachable(url):
    # urlopen raises an exception (rather than returning False) on failure
    try:
        urlopen(url)
        return True
    except Exception:
        return False

in_file_name = input("Enter file name: ")
lines = []
try:
    with open(in_file_name, 'r') as f:
        lines = f.read().splitlines()
except OSError:
    print("Cannot open " + in_file_name)
    sys.exit(1)

# start=1 so the reported number matches the line number in the file
for counter, line in enumerate(lines, start=1):
    if is_reachable(line):
        print("The URL on line ", counter, "is reachable!")
    else:
        print("The URL on line ", counter, "is unreachable!")

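Since they use 'counter' to show the line number, they need to increment it in every case. Maybe 'if is_reachable(line) is not True:' then print the error, and increment the counter when reading the next line? –

@SimonFraser Thanks for pointing that out. –

@VikashSingh I tried your code, but I don't know why it can't open the file – Kris

Answer

If the file can't be opened, you should exit the script. As your code is currently written, if the file can't be opened it prints a message and then tries to run the rest of the code anyway.

A quick fix: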

import sys

in_file_name = input("Enter file name: ")
try:
    in_file = open(in_file_name, "r")
except OSError:
    # use in_file_name here: in_file is never assigned if open() fails
    print("Cannot open " + in_file_name)
    sys.exit(1)

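Also, your output is wrong. If urlopen returns a truthy value, you should print that the URL is REACHABLE, but your code says it is UNREACHABLE.

Finally, in is_reachable you need to handle a possible exception, in case there is a problem resolving the URL you are trying to open: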

import urllib.error
from urllib.request import urlopen

def is_reachable(url):
    try:
        urlopen(url)
        return True
    except urllib.error.URLError:
        return False
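As a rough sketch of the same idea with a few more failure modes covered: the timeout parameter and the choice to also catch ValueError and OSError are assumptions added here for illustration, not part of the answer above.

```python
from urllib.request import urlopen


def is_reachable(url, timeout=5):
    """Return True if urlopen succeeds for url, False otherwise."""
    try:
        # The timeout (in seconds) keeps one dead host from stalling the loop.
        # The with-block closes the response once the check is done.
        with urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers URLError and HTTPError (both derive from it),
        # plus lower-level socket errors such as timeouts.
        # ValueError is raised for strings that are not valid URLs,
        # for example a line with no scheme.
        return False
```

With this version a malformed line in the input file is reported as unreachable instead of stopping the script with a traceback.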

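That was wrong, but I was experimenting. Thanks for telling me, I'll try it – Kris

Makes sense. Actually, I think the biggest problem you're likely to hit is the last one: you need to wrap the call to urlopen in try/except, otherwise the function raises an exception instead of just returning False when the URL is unreachable for any reason. – rumdrums

Answer

You have an error in your code: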

except: 
    print('yadaya ' + in_file_name) # you have used in_file

I haven't tested this, but it should work:

from urllib.request import urlopen  # Python 3; urllib2 exists only in Python 2

# fetch once and compare the numeric status code on both sides
code = urlopen('http://google.com').getcode()
if 200 <= code < 400:
    print('Yes the URL exists and works.')

You will have to do a bit more work to handle redirects.
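A possible way to turn the status-code check above into a reusable function, as a sketch only. The names url_status and status_ok are hypothetical, and treating every 2xx/3xx code as "works" simply mirrors the condition in the snippet above.

```python
from urllib.request import urlopen
from urllib.error import HTTPError


def status_ok(code):
    """True for 2xx and 3xx status codes, False for anything else."""
    return code is not None and 200 <= code < 400


def url_status(url, timeout=5):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return err.code
    except (OSError, ValueError):
        # No usable response: bad URL, DNS failure, refused connection, timeout.
        return None
```

Note that urlopen follows redirects by default, so the code returned here is normally that of the final response rather than the 3xx of an intermediate hop.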
