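TypeError: list indices must be integers, not Tag - Python
Question:
I'm trying to do some data mining. I have all the URLs in an array, but as soon as I try to access them in the scraper, I get this error:
TypeError: list indices must be integers, not Tag
Here is the full code of my scraper: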
s = sched.scheduler(time.time, time.sleep)
def myScraper(sc):
    csv_f = csv.reader(f)
    quote_page = []
    for row in csv_f:
        quote_page.append(url+row[0])
    i=1
    for var in quote_page:
        num_dat = []
        txt_dat = []
        num_dat2 = []
        txt_dat2 = []
        s.enter(5,1,myScraper, (sc,))
        sleep(5)
        print(quote_page[i])
        page = urlopen(quote_page[i])
        i = i+1
        soup = BeautifulSoup(page, 'html.parser')
        data_store = []
        for tr in soup.find_all('tr'): # find table rows
            tds = tr.find_all('td', attrs={'class': 'fieldData'}) # find all table cells
            for i in tds: # returns all cells from html rows
                if i != []: # pops out empty cells from returned data
                    data_store.append(i.text)
                    #print(i.text)
                    #print("\n")
        data_store2 = []
        for tr in soup.find_all('tr'):
            tds2 = tr.find_all('td', attrs={'class': 'improvementsFieldData'})
            for i in tds2:
                if i != []:
                    data_store2.append(i.text)
        for j in data_store:
            if ',' in j and ' ' not in j:
                lft_dec = j[:j.index(',')].replace('$', '')
                rght_dec = j[j.index(','):].replace(',', '') # drop the decimal
                num_dat.append(float(lft_dec+rght_dec)) # convert to numerical data
            else:
                txt_dat.append(j)
        for j in data_store2:
            if ',' in j and ' ' not in j:
                lft_dec = j[:j.index(',')].replace('$', '')
                rght_dec = j[j.index(','):].replace(',', '').replace('Sq. Ft', '') # drop the decimal and Sq
                num_dat2.append(float(lft_dec+rght_dec)) # convert to numerical data
            elif ('Sq. Ft' and ',') in j:
                sqft_dat_befcm = j[:j.index(',')].replace(',', '')
                sqft_dat_afcm = j[j.index(','):].replace(' ', '').replace('Sq.Ft', '').replace(',', '')
                num_dat2.append(float(sqft_dat_befcm+sqft_dat_afcm))
            else:
                txt_dat2.append(j)
        print(num_dat)
        print(txt_dat)
        print(num_dat2)
        print(txt_dat2)
s.enter(5, 1, myScraper, (s,))
s.run()
f.close
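Basically, my goal for this program is: given the URLs, open the first one in the array and scrape it, then wait for a set interval and repeat until the array is exhausted.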
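One way to sketch that goal with sched is to queue one event per URL, each a fixed interval after the previous one, rather than re-entering the whole function from inside the loop. This is only a sketch under assumptions: scrape_one, schedule_all, and the example URLs are hypothetical names, the fetch-and-parse step is replaced by a print so it runs offline, and the demo uses a 1-second interval where the code above uses 5.

```python
import sched
import time


def scrape_one(page_url, results):
    """Placeholder for the real fetch-and-parse step (hypothetical helper)."""
    print("scraping", page_url)
    results.append(page_url)


def schedule_all(scheduler, urls, interval, results):
    """Queue one scrape per URL, spaced `interval` seconds apart."""
    for idx, page_url in enumerate(urls):
        # Event idx fires idx * interval seconds after scheduling.
        scheduler.enter(idx * interval, 1, scrape_one, (page_url, results))


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.com/b"]  # made-up URLs
    done = []
    s = sched.scheduler(time.time, time.sleep)
    schedule_all(s, urls, interval=1, results=done)
    s.run()  # blocks until every queued event has fired
```

Because each URL gets its own event, the scheduler queue empties once the last page is scraped and s.run() returns on its own.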
Edit: Sorry, this is my first time posting here. Here is the full stack trace:
Traceback (most recent call last):
  File "C:\Users\Ahmad\Desktop\HouseProject\AhmadsScraper.py", line 85, in <module>
    s.run()
  File "C:\Users\Ahmad\Anaconda2\lib\sched.py", line 117, in run
    action(*argument)
  File "C:\Users\Ahmad\Desktop\HouseProject\AhmadsScraper.py", line 32, in myScraper
    print(quote_page[i])
TypeError: list indices must be integers, not Tag
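Answer
The problem is that you are using the same variable i both as a counter and as an inner loop variable. The inner loops (for i in tds, for i in tds2) rebind i to a Tag, so on the next pass quote_page[i] is indexed with a Tag instead of an integer. How about using enumerate instead?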
for idx, var in enumerate(quote_page):
    ...
    print(quote_page[idx])
    page = urlopen(quote_page[idx])
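To see why the index turns into a Tag, here is a minimal sketch that reproduces the name clash without any network access. FakeTag is a hypothetical stand-in for a BeautifulSoup Tag, and the URLs and cell values are made up. Note that Python 3 words the message slightly differently ("list indices must be integers or slices, not ...").

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag (illustration only)."""

    def __init__(self, text):
        self.text = text


# Made-up data standing in for quote_page and tr.find_all('td', ...).
quote_page = ["http://example.com/a", "http://example.com/b"]
cells = [FakeTag("1,000"), FakeTag("2,000")]


def buggy(pages):
    """Reuses `i` as both the page counter and the inner loop variable."""
    i = 0
    seen = []
    for _ in pages:
        seen.append(pages[i])  # second pass: i is a FakeTag, so TypeError
        i = i + 1
        for i in cells:        # rebinds i to each cell, clobbering the counter
            pass
    return seen


def fixed(pages):
    """Uses enumerate for the index and a distinct name for the inner loop."""
    seen = []
    texts = []
    for idx, page_url in enumerate(pages):
        seen.append((idx, page_url))
        for cell in cells:
            texts.append(cell.text)
    return seen, texts


try:
    buggy(quote_page)
except TypeError as exc:
    print("buggy:", exc)

print("fixed:", fixed(quote_page))
```

Since the index is only used to look up the URL, iterating with for page_url in quote_page would drop the index altogether. The original counter also starts at 1, which skips the first URL; enumerate starts at 0.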
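Dude, thank you so much! Haha, I can't believe I missed that; it makes sense now! I just fixed it and it works! Thanks again! – Matherz
Could you provide the full traceback, so people can see which line throws the error?
Yes! Thanks for the reply! – Matherz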