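Python Crawler in Practice, Part 6: Scraping iAsk (爱问知识人) Questions and Saving Them to a Database
Hi everyone. In this post we'll scrape questions from iAsk (爱问知识人) and save the questions and their answers to a database. The topics covered are: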
- Using urllib and handling its exceptions
- Basic use of Beautiful Soup
- Basic use of MySQLdb
- Basic use of regular expressions
Environment setup
Before we start, we need to set up the environment. I'm using Python 2.7, and two extra libraries need to be installed: Beautiful Soup and MySQLdb.
After downloading each library, install it with the following command:

    python setup.py install

With the environment ready, we can happily get on with writing the crawler.
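Overall approach
First, pick any category page, for example 外语学习 (Foreign Language Learning) on iAsk. Opening it shows a list of questions.
From this page we need the total number of pages and the links to every question on each page.
Next, we iterate over all the questions, fetch each detail page, and extract the question, the question content, the answerer, the answer time, and the answer content.
Finally, we store all of this in the database.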
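The first step above (total page count plus question links) can be sketched on a toy listing page. This is a minimal Python 3 sketch, not the crawler from this post: the sample HTML, the `/q/<id>.html` link shape, the `total` class name, and the function names are all assumptions made up for illustration, since the real iAsk markup is not shown here.

```python
import re

# Hypothetical listing-page HTML; the real iAsk markup is not shown in this post.
SAMPLE_LISTING = """
<ul class="questions">
  <li><a href="/q/113010.html">First question</a></li>
  <li><a href="/q/113011.html">Second question</a></li>
</ul>
<div class="pager">Page 1 of <span class="total">12</span></div>
"""

# Assumed patterns that match the sample markup above.
TOTAL_PAGES = re.compile(r'<span class="total">(\d+)</span>')
QUESTION_LINK = re.compile(r'<a href="(/q/\d+\.html)">')


def get_total_pages(html):
    """Return the total page count, or None if the pager is missing."""
    match = TOTAL_PAGES.search(html)
    return int(match.group(1)) if match else None


def get_question_links(html):
    """Return every question link found on one listing page."""
    return QUESTION_LINK.findall(html)


total = get_total_pages(SAMPLE_LISTING)
links = get_question_links(SAMPLE_LISTING)
print(total, links)
```

Each link would then be fetched in turn and its detail page parsed, which is where Beautiful Soup is usually more comfortable than hand-written patterns.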
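Key points
If you've followed the earlier posts in this series, most of this will be familiar and the overall crawling approach should already be second nature, so here I'll only cover a few extra features.
1. Log output
For logging, we want to print the time and the crawl status, like this:

    [2015-08-10 03:05:20] Question 113011 has other answers: I personally think Cherry Valley (樱桃沟) is the beautiful one
    [2015-08-10 03:05:20] Saved to database; this question's ID is 113011
    [2015-08-10 03:05:20] Crawling page 2, found a question: "Baidu: there is a place where the flowers carry fragrance and the water flows and surges. What does it mean? Please help." Answer count: 1
    [2015-08-10 03:05:19] Saved to database; this question's ID is 113010
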
So we need to import the time module and write a function that returns the current time:

    import time

    # Get the current time, e.g. [2015-08-10 03:05:20]
    def getCurrentTime(self):
        return time.strftime('[%Y-%m-%d %H:%M:%S]', time.localtime(time.time()))

    # Get the current date, e.g. 2015-08-10
    def getCurrentDate(self):
        return time.strftime('%Y-%m-%d', time.localtime(time.time()))

The first function returns a full timestamp and the second returns only the date. When printing, we simply call the function at the start of the print statement.
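As a small illustration of calling the timestamp function at the start of every message, here is a Python 3 sketch of a wrapper. The `log` helper and the module-level `get_current_time` are hypothetical names for this example only; the post itself calls `self.getCurrentTime()` inline in each print statement.

```python
import time


def get_current_time():
    """Return a bracketed timestamp such as [2015-08-10 03:05:20]."""
    return time.strftime('[%Y-%m-%d %H:%M:%S]', time.localtime())


def log(*parts):
    """Print one log line (timestamp first, then the message parts) and return it."""
    line = ' '.join([get_current_time()] + [str(p) for p in parts])
    print(line)
    return line


line = log("Saved to database; this question's ID is", 113010)
```

Routing every message through one helper keeps the timestamp format in a single place, at the cost of one more function to carry around.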
Next, we redirect standard output to a log file by adding these lines at the very start of the program:

    import sys

    f_handler = open('out.log', 'w')
    sys.stdout = f_handler

After this, everything written by print statements is saved to out.log.
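One thing to keep in mind is that opening the file in `'w'` mode empties it on every run. If you would rather keep earlier runs and get the terminal back afterwards, a scoped redirect is one option. The following is a minimal Python 3 sketch using `contextlib.redirect_stdout` from the standard library, which is a different technique from the assignment above; the temporary log path is an assumption for the demo.

```python
import contextlib
import os
import tempfile

# Hypothetical log location for the demo; the post writes to out.log.
log_path = os.path.join(tempfile.mkdtemp(), 'out.log')

# 'a' appends instead of truncating, so output from earlier runs is kept.
with open(log_path, 'a') as f_handler:
    with contextlib.redirect_stdout(f_handler):
        print('crawler starting')   # goes to the file, not the terminal
        f_handler.flush()           # make the line visible to log viewers right away

print('back on the terminal')       # sys.stdout has been restored here

with open(log_path) as f:
    logged = f.read()
```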
2. Saving the page number
A crawler can hit all kinds of errors while running, and any of them can interrupt it. If we simply rerun it, it starts again from the beginning, which is clearly wasteful. So we need to save the page currently being crawled, for example to a text file. If the crawler is interrupted, we rerun it, read the page number back from the file, and carry on from there.
Here is the function for reference:

    # Main function
    def main(self):
        # Send all print output to the log file
        f_handler = open('out.log', 'w')
        sys.stdout = f_handler
        # Read the page number saved by the previous run
        page = open('page.txt', 'r')
        content = page.readline()
        start_page = int(content.strip()) - 1
        page.close()
        print self.getCurrentTime(), "Start page", start_page
        print self.getCurrentTime(), "Crawler starting; scraping iAsk questions"
        self.total_num = self.getTotalPageNum()
        print self.getCurrentTime(), "Number of listing pages found:", self.total_num
        # No saved progress: start from the last page
        if not start_page:
            start_page = self.total_num
        for x in range(1, start_page):
            current_page = start_page - x + 1
            print self.getCurrentTime(), "Fetching page", current_page
            try:
                self.getQuestions(current_page)
            except urllib2.URLError, e:
                if hasattr(e, "reason"):
                    print self.getCurrentTime(), "Failed to fetch or parse a listing page, reason:", e.reason
            except Exception, e:
                print self.getCurrentTime(), "Failed to fetch or parse a listing page, reason:", e
            # Record progress so an interrupted run can resume from here
            if current_page < start_page:
                f = open('page.txt', 'w')
                f.write(str(current_page))
                print self.getCurrentTime(), "Wrote new page number", current_page
                f.close()

This way, whatever error the crawler runs into midway, there is nothing to worry about.
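The save-and-resume idea can be isolated into two small helpers. This is a minimal Python 3 sketch under my own assumptions: `load_start_page` and `save_page` are hypothetical names, and the fallback for a missing or unreadable file is an addition that the function above does not have (it expects page.txt to exist and contain a number).

```python
import os
import tempfile


def load_start_page(path, default):
    """Read the saved page number; fall back to default if missing or invalid."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, ValueError):
        return default


def save_page(path, page_number):
    """Overwrite the checkpoint file with the page just finished."""
    with open(path, 'w') as f:
        f.write(str(page_number))


# Hypothetical checkpoint location for the demo; the post uses page.txt.
checkpoint = os.path.join(tempfile.mkdtemp(), 'page.txt')

first_run = load_start_page(checkpoint, default=50)   # no file yet, so the default is used
save_page(checkpoint, 37)
resumed = load_start_page(checkpoint, default=50)     # picks up the saved page
```

Writing the checkpoint after every page means a crash costs at most one page of repeated work.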
3. Page processing
While processing pages we may run into all kinds of odd HTML. As in the previous post, we reuse a page-processing class:

    import re

    # Class for cleaning up page markup
    class Tool:

        # Remove hyperlinked ad blocks
        removeADLink = re.compile('<div class="link_layer.*?</div>')
        # Remove img tags, runs of 1-7 spaces, and &nbsp;
        removeImg = re.compile('<img.*?>| {1,7}|&nbsp;')
        # Remove hyperlink tags
        removeAddr = re.compile('<a.*?>|</a>')
        # Replace line-breaking tags with \n
        replaceLine = re.compile('<tr>|<div>|</div>|</p>')
        # Replace table cells <td> with \t
        replaceTD = re.compile('<td>')
        # Replace single or double <br> with \n
        replaceBR = re.compile('<br><br>|<br>')
        # Remove all remaining tags
        removeExtraTag = re.compile('<.*?>')
        # Collapse runs of blank lines
        removeNoneLine = re.compile('\n+')

        def replace(self, x):
            x = re.sub(self.removeADLink, "", x)
            x = re.sub(self.removeImg, "", x)
            x = re.sub(self.removeAddr, "", x)
            x = re.sub(self.replaceLine, "\n", x)
            x = re.sub(self.replaceTD, "\t", x)
            x = re.sub(self.replaceBR, "\n", x)
            x = re.sub(self.removeExtraTag, "", x)
            x = re.sub(self.removeNoneLine, "\n", x)
            # strip() removes leading and trailing whitespace
            return x.strip()

We can take a piece of text that contains HTML, pass it through the replace method, and all the redundant markup is cleaned up.
For example, take this snippet (an excerpt from a Chinese-language post about monitoring MySQL with a shell script):

    <article class="article-content">
    <h2>前言</h2>
    <p>最近发现MySQL服务隔三差五就会挂掉,导致我的网站和爬虫都无法正常运作。自己的网站是基于MySQL,在做爬虫存取一些资料的时候也是基于MySQL,数据量一大了,MySQL它就有点受不了了,时不时会崩掉,虽然我自己有网站监控和邮件通知,但是好多时候还是需要我来手动连接我的服务器重新启动一下我的MySQL,这样简直太不友好了,所以,我就决定自己写个脚本,定时监控它,如果发现它挂掉了就重启它。</p>
    <p>好了,闲言碎语不多讲,开始我们的配置之旅。</p>
    <p>运行环境:<strong>Ubuntu Linux 14.04</strong></p>
    <h2>编写Shell脚本</h2>
    <p>首先,我们要编写一个shell脚本,脚本主要执行的逻辑如下:</p>
    <p>显示mysqld进程状态,如果判断进程未在运行,那么输出日志到文件,然后启动mysql服务,如果进程在运行,那么不执行任何操作,可以选择性输出监测结果。</p>
    <p>可能大家对于shell脚本比较陌生,在这里推荐官方的shell脚本文档来参考一下</p>
    <p><a href="http://wiki.ubuntu.org.cn/Shell%E7%BC%96%E7%A8%8B%E5%9F%BA%E7%A1%80" data-original-title="" title="">Ubuntu Shell 编程基础</a></p>
    <p>shell脚本的后缀为sh,在任何位置新建一个脚本文件,我选择在 /etc/mysql 目录下新建一个 listen.sh 文件。</p>
    <p>执行如下命令:</p>

After processing, it becomes the following. Note that the spaces are gone as well (for example UbuntuLinux14.04), because the class strips runs of spaces:

    前言
    最近发现MySQL服务隔三差五就会挂掉,导致我的网站和爬虫都无法正常运作。自己的网站是基于MySQL,在做爬虫存取一些资料的时候也是基于MySQL,数据量一大了,MySQL它就有点受不了了,时不时会崩掉,虽然我自己有网站监控和邮件通知,但是好多时候还是需要我来手动连接我的服务器重新启动一下我的MySQL,这样简直太不友好了,所以,我就决定自己写个脚本,定时监控它,如果发现它挂掉了就重启它。
    好了,闲言碎语不多讲,开始我们的配置之旅。
    运行环境:UbuntuLinux14.04
    编写Shell脚本
    首先,我们要编写一个shell脚本,脚本主要执行的逻辑如下:
    显示mysqld进程状态,如果判断进程未在运行,那么输出日志到文件,然后启动mysql服务,如果进程在运行,那么不执行任何操作,可以选择性输出监测结果。
    可能大家对于shell脚本比较陌生,在这里推荐官方的shell脚本文档来参考一下
    UbuntuShell编程基础
    shell脚本的后缀为sh,在任何位置新建一个脚本文件,我选择在/etc/mysql目录下新建一个listen.sh文件。
    执行如下命令:

With this processing in place, all the messy markup gets cleaned up.
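To see how the order of the substitutions plays out, here is a minimal Python 3 sketch that runs a sequence of patterns modeled on the Tool class over a short made-up string. The `STEPS` list, the `clean` function, and the sample HTML are hypothetical names and data for illustration; the `&nbsp;` branch is an assumption about what the last alternative of the image pattern is meant to match.

```python
import re

# Patterns modeled on the Tool class, applied in the same order.
STEPS = [
    (re.compile(r'<div class="link_layer.*?</div>'), ''),   # ad blocks
    (re.compile(r'<img.*?>| {1,7}|&nbsp;'), ''),            # images, runs of spaces, &nbsp; (assumed)
    (re.compile(r'<a.*?>|</a>'), ''),                       # hyperlink tags
    (re.compile(r'<tr>|<div>|</div>|</p>'), '\n'),          # block ends become newlines
    (re.compile(r'<td>'), '\t'),                            # table cells become tabs
    (re.compile(r'<br><br>|<br>'), '\n'),                   # line breaks
    (re.compile(r'<.*?>'), ''),                             # any tag still left
    (re.compile(r'\n+'), '\n'),                             # collapse blank lines
]


def clean(html):
    """Apply every substitution in order, then trim the ends."""
    for pattern, replacement in STEPS:
        html = pattern.sub(replacement, html)
    return html.strip()


sample = '<p>Ubuntu <strong>Linux</strong> 14.04</p><p><a href="#">Shell</a><br>basics</p>'
cleaned = clean(sample)
print(repr(cleaned))  # prints 'UbuntuLinux14.04\nShell\nbasics'
```

The space-stripping rule is harmless for Chinese text, which has no spaces between words, but as the output shows it glues English words together. For space-separated languages you would probably want to drop that alternative or collapse runs of spaces to a single space instead.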
4. Saving to the database
Here we want a general-purpose method: put each piece of content to be stored into a dictionary, and when it is time to insert, build the matching SQL statement automatically and insert the data.
For example, we construct a dictionary like this:

    # Build the dictionary for the best answer
    good_ans_dict = {
        "text": good_ans[0],
        "answerer": good_ans[1],
        "date": good_ans[2],
        "is_good": str(good_ans[3]),
        "question_id": str(insert_id)
    }

The method that builds the SQL statement and inserts the row looks like this:

    # Insert one row
    def insertData(self, table, my_dict):
        try:
            self.db.set_character_set('utf8')
            cols = ', '.join(my_dict.keys())
            # One %s placeholder per value: the driver quotes and escapes the
            # values, so quotes in scraped text cannot break the statement
            placeholders = ', '.join(['%s'] * len(my_dict))
            sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, placeholders)
            try:
                result = self.cur.execute(sql, tuple(my_dict.values()))
                insert_id = self.db.insert_id()
                self.db.commit()
                # Check whether the insert succeeded
                if result:
                    return insert_id
                else:
                    return 0
            except MySQLdb.Error, e:
                # Roll back on error
                self.db.rollback()
                # Primary key must be unique, so the row cannot be inserted
                if "key 'PRIMARY'" in e.args[1]:
                    print self.getCurrentTime(), "Row already exists; nothing inserted"
                else:
                    print self.getCurrentTime(), "Insert failed, reason %d: %s" % (e.args[0], e.args[1])
        except MySQLdb.Error, e:
            print self.getCurrentTime(), "Database error, reason %d: %s" % (e.args[0], e.args[1])

Here we only need to pass in the dictionary; the method builds the SQL statement from its keys and values and performs the insert.
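To try the dictionary-to-INSERT idea without a MySQL server, here is a minimal Python 3 sketch that uses the standard library's `sqlite3` as a stand-in. This is a swapped-in database, not the post's setup: `build_insert` is a hypothetical helper, the sample row is made up, and `?` is SQLite's placeholder style (MySQLdb uses `%s`).

```python
import sqlite3


def build_insert(table, row, placeholder='?'):
    """Return (sql, params) for inserting one dict as one row.

    Table and column names are interpolated into the statement, so they
    must come from trusted code, never from scraped text. The values are
    passed separately and escaped by the database driver.
    """
    cols = list(row.keys())
    sql = 'INSERT INTO %s (%s) VALUES (%s)' % (
        table, ', '.join(cols), ', '.join([placeholder] * len(cols)))
    return sql, [row[c] for c in cols]


# In-memory SQLite stands in for MySQL so the sketch runs anywhere.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE iask_answers (id INTEGER PRIMARY KEY, text TEXT, '
           'answerer TEXT, date TEXT, is_good INTEGER, question_id INTEGER)')

answer = {
    'text': 'He said "hello", then left',   # quotes like these break hand-built SQL strings
    'answerer': 'someone',
    'date': '2015-08-10',
    'is_good': 1,
    'question_id': 113011,
}
sql, params = build_insert('iask_answers', answer)
cur = db.execute(sql, params)
db.commit()
insert_id = cur.lastrowid
stored = db.execute('SELECT text FROM iask_answers WHERE id = ?',
                    (insert_id,)).fetchone()[0]
```

Keeping the values out of the SQL string matters for a crawler in particular, because answer text scraped from the web routinely contains quotes and other characters that would otherwise end the statement early.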
5. Reading the log with PHP
We've sent the crawler's output to a log file, so how do we view it? It's simple, and there are two ways to do it.
Method one:
Use PHP to print the whole log in reverse order (newest first):

    <html>
        <head>
            <meta charset="utf-8">
            <meta http-equiv="refresh" content="5">
        </head>
        <body>
            <?php
                $fp = file("out.log");
                if ($fp) {
                    for ($i = count($fp) - 1; $i >= 0; $i--)
                        echo $fp[$i]."<br>";
                }
            ?>
        </body>
    </html>

This method shows all of the log output, but if the log grows too large, PHP reports that it has run out of memory and nothing is displayed. That is why we have a second method, which uses the Linux tail command to print only the last lines of the file (100 in the code here).
Method two:

    <html>
        <head>
            <meta charset="utf-8">
            <meta http-equiv="refresh" content="5">
        </head>
        <body>
            <?php
                $ph = popen('tail -n 100 out.log', 'r');
                while ($r = fgets($ph)) {
                    echo $r."<br>";
                }
                pclose($ph);
            ?>
        </body>
    </html>

Both methods refresh the page every 5 seconds so that the latest log entries are always shown.
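If you'd rather check the log from Python than from a PHP page, the two ideas above (newest first, and only the last N lines) can be combined. This is a minimal Python 3 sketch; the `tail` function and the demo file are hypothetical and not part of the crawler.

```python
import os
import tempfile
from collections import deque


def tail(path, count):
    """Return the last `count` lines of a file, newest first."""
    with open(path) as f:
        # deque(maxlen=...) keeps only the most recent lines while the file
        # streams through it, so memory use stays bounded for large logs.
        last_lines = deque(f, maxlen=count)
    return [line.rstrip('\n') for line in reversed(last_lines)]


# Hypothetical log file for the demo.
log_path = os.path.join(tempfile.mkdtemp(), 'out.log')
with open(log_path, 'w') as f:
    for n in range(1, 6):
        f.write('line %d\n' % n)

newest = tail(log_path, 3)
print(newest)
```

Unlike method one, this never holds the whole file in memory, although it still reads the file from the beginning; for very large logs the `tail` command remains the cheaper choice.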
Source code
Enough chatter; here is the source code.

spider.py

    # -*- coding:utf-8 -*-

    import urllib
    import urllib2
    import re
    import time
    import types
    import page
    import mysql
    import sys
    from bs4 import BeautifulSoup

    class Spider:

        # Initialization
        def __init__(self):
            self.page_num = 1
            self.total_num = None
            self.page_spider = page.Page()
            self.mysql = mysql.Mysql()

        # Get the current time
        def getCurrentTime(self):
            return time.strftime('[%Y-%m-%d %H:%M:%S]', time.localtime(time.time()))

        # Get the current date
        def getCurrentDate(self):
            return time.strftime('%Y-%m-%d', time.localtime(time.time()))

        # Build the page URL from the page number

page.py

    # -*- coding:utf-8 -*-
    import urllib
    import urllib2
    import re
    import time
    import types
    import tool
    from bs4 import BeautifulSoup

    # Fetch and parse a single question and its answers
    class Page:

        def __init__(self):
            self.tool = tool.Tool()

        # Get the current date
        def getCurrentDate(self):
            return time.strftime('%Y-%m-%d', time.localtime(time.time()))

        # Get the current time
        def getCurrentTime(self):
            return time.strftime('[%Y-%m-%d %H:%M:%S]', time.localtime(time.time()))

        # Fetch the page source for the given URL
        def getPageByURL(self, url):
            try:
                request = urllib2.Request(url)
                response = urllib2.urlopen(request)
                return response.read().decode("utf-8")
            except urllib2.URLError, e:

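For readers on Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`, the same fetch-with-error-handling pattern can be sketched as follows. This is not the author's code: `get_page_by_url`, the 10-second timeout, and the choice to return `None` on failure are all assumptions for illustration. The demo uses a local `file://` URL so that it runs without network access.

```python
import os
import pathlib
import tempfile
import urllib.error
import urllib.request


def get_page_by_url(url, encoding='utf-8'):
    """Return the decoded page body, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode(encoding)
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too.
        print('fetch failed:', getattr(e, 'reason', e))
        return None


# Demo against local file:// URLs so the sketch needs no network.
demo_path = os.path.join(tempfile.mkdtemp(), 'page.html')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('<p>hello</p>')

ok = get_page_by_url(pathlib.Path(demo_path).as_uri())
missing = get_page_by_url(pathlib.Path(demo_path + '.missing').as_uri())
```

Returning `None` lets the caller skip a bad page and move on, which suits a long-running crawl better than letting one failed request stop everything.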
tool.py

    #-*- coding:utf-8 -*-
    import re

    # Class for cleaning up page markup
    class Tool:

        # Remove hyperlinked ad blocks
        removeADLink = re.compile('<div class="link_layer.*?</div>')
        # Remove img tags, runs of 1-7 spaces, and &nbsp;
        removeImg = re.compile('<img.*?>| {1,7}|&nbsp;')
        # Remove hyperlink tags
        removeAddr = re.compile('<a.*?>|</a>')
        # Replace line-breaking tags with \n
        replaceLine = re.compile('<tr>|<div>|</div>|</p>')
        # Replace table cells <td> with \t
        replaceTD = re.compile('<td>')
        # Replace single or double <br> with \n
        replaceBR = re.compile('<br><br>|<br>')
        # Remove all remaining tags
        removeExtraTag = re.compile('<.*?>')

mysql.py

    # -*- coding:utf-8 -*-

    import MySQLdb
    import time

    class Mysql:

        # Get the current time
        def getCurrentTime(self):
            return time.strftime('[%Y-%m-%d %H:%M:%S]', time.localtime(time.time()))

        # Initialize the database connection
        def __init__(self):
            try:
                self.db = MySQLdb.connect('ip', 'username', 'password', 'db_name')
                self.cur = self.db.cursor()
            except MySQLdb.Error, e:
                print self.getCurrentTime(), "Error connecting to the database, reason %d: %s" % (e.args[0], e.args[1])

        # Insert one row
        def insertData(self, table, my_dict):
            try:
                self.db.set_character_set('utf8')
                cols = ', '.join(my_dict.keys())
                # One %s placeholder per value: the driver quotes and escapes the
                # values, so quotes in scraped text cannot break the statement
                placeholders = ', '.join(['%s'] * len(my_dict))
                sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, placeholders)
                try:
                    result = self.cur.execute(sql, tuple(my_dict.values()))
                    insert_id = self.db.insert_id()
                    self.db.commit()
                    # Check whether the insert succeeded
                    if result:
                        return insert_id
                    else:
                        return 0
                except MySQLdb.Error, e:
                    # Roll back on error
                    self.db.rollback()
                    # Primary key must be unique, so the row cannot be inserted
                    if "key 'PRIMARY'" in e.args[1]:
                        print self.getCurrentTime(), "Row already exists; nothing inserted"
                    else:
                        print self.getCurrentTime(), "Insert failed, reason %d: %s" % (e.args[0], e.args[1])
            except MySQLdb.Error, e:
                print self.getCurrentTime(), "Database error, reason %d: %s" % (e.args[0], e.args[1])

The SQL for creating the database tables is as follows:

    CREATE TABLE IF NOT EXISTS `iask_answers` (
      `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
      `text` text NOT NULL COMMENT 'Answer content',
      `question_id` int(18) NOT NULL COMMENT 'Question ID',
      `answerer` varchar(255) NOT NULL COMMENT 'Answerer',
      `date` varchar(255) NOT NULL COMMENT 'Answer time',
      `is_good` int(11) NOT NULL COMMENT 'Whether this is the best answer',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE IF NOT EXISTS `iask_questions` (
      `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Question ID',
      `text` text NOT NULL COMMENT 'Question content',
      `questioner` varchar(255) NOT NULL COMMENT 'Questioner',
      `date` date NOT NULL COMMENT 'Question time',
      `ans_num` int(11) NOT NULL COMMENT 'Number of answers',
      `url` varchar(255) NOT NULL COMMENT 'Question link',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

To run the crawler, execute the following command:

    nohup python spider.py &

The code isn't polished and is meant only for study and reference. If you have any questions, feel free to leave a comment.
Viewing the results
Put the PHP file and the log file in the same directory, then open the PHP file in a browser to see the log output.
Go ahead and give it a try!