从多个表中提取第一行并添加一列(Python)

问题描述:

我试图从Investing.com生成最新货币报价的列表。从多个表中提取第一行并添加一列(Python)

我有以下代码:

head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36", 
     "X-Requested-With": "XMLHttpRequest"} 

ISO_Code=[] 
Latest=[] 
for item in ISO_CURR_ID.ISO_Code[:4]: 
    url = 'http://www.investing.com/currencies/usd-'+item+'-historical-data' 
    r = requests.get(url, headers=head) 
    soup = BeautifulSoup(r.content, 'html.parser') 
    try: 
     CurrHistoricRange = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0] 
     Item='USD/'+item 
     ISO_Code.append(np.array(Item)) 
#  Latest.append(np.array(CurrHistoricRange[:1])) 
     Latest.append(CurrHistoricRange[:1]) 
    except: 
     pass 

其中ISO_CURR_ID.ISO_Code是:

In [69]:ISO_CURR_ID.ISO_Code[:4] 
Out[69]: 
0 EUR 
1 GBP 
2 JPY 
3 CHF 

我需要的最终形式是一张类似的表格

ISO_Code Date  Price Open High Low Change % 
0 EUR Jun 21, 2016, 0.8877, 0.8833, 0.8893, 0.881, -0.14% 

但我m having problems to undestand how to merge those first rows without repeating column names. So I m如果我使用

Final=pd.DataFrame(dict(ISO_Code = ISO_Code, Latest_Quotes = Latest)) 


Final 
Out[71]: 
    ISO_Code          Latest_Quotes 
0 USD/EUR    Date Price Open High Low... 
1 USD/GBP    Date Price Open High  Lo... 
2 USD/JPY    Date Price Open High Low... 
3 USD/CHF    Date Price Open High  Low... 
+0

你可以发布所需的输出(5-7行应该是足够了)? – MaxU

+0

期望的outup在问题 – Pavel

我认为这是完成你正在尝试做

head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36", 
     "X-Requested-With": "XMLHttpRequest"} 

latest_data=[] 

for item in ISO_CURR_ID.ISO_Code: 
    url = 'http://www.investing.com/currencies/usd-'+item+'-historical-data' 
    r = requests.get(url, headers=head) 
    soup = BeautifulSoup(r.content, 'html.parser') 
    try: 
     CurrHistoricRange = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0] 
     Item='USD/'+item 
     data = CurrHistoricRange.iloc[0].to_dict() 
     data["ISO_Code"] = Item 
     latest_data.append(data) 
    except Exception as e: 
     print(e) 

def getDf(latest_list, order = ["ISO_Code", "Date", "Price", "Open", "High", "Low", "Change %"]): 
    return pd.DataFrame(latest_list, columns=order) 

getDf(latest_data) 

输出一个更清洁的方式:

ISO_Code Date  Price Open High Low  Change % 
0 USD/EUR  Jun 21, 2016 0.8882 0.8833 0.8893 0.8810 0.55% 
1 USD/GBP  Jun 21, 2016 0.6822 0.6815 0.6829 0.6766 0.10% 
2 USD/JPY  Jun 21, 2016 104.75 103.82 104.82 103.60 0.88% 
3 USD/CHF  Jun 21, 2016 0.9613 0.9620 0.9623 0.9572 -0.07% 
+0

谢谢!工作很棒! – Pavel

我会建议你使用pandas.Panel的,类似pandas_datareader:

import requests 
from bs4 import BeautifulSoup 
import pandas as pd 

head = { 
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36", 
    "X-Requested-With": "XMLHttpRequest" 
} 

ISO_Code=[] 
Latest=[] 
URL = 'http://www.investing.com/currencies/usd-{}-historical-data' 
dfs = {} 

curr_ser = pd.Series(['EUR','GBP','JPY','CHF']) 


#for item in ISO_CURR_ID.ISO_Code[:4]: 
for item in curr_ser: 
    url = URL.format(item) 
    r = requests.get(url, headers=head) 
    soup = BeautifulSoup(r.content, 'html.parser') 
    try: 
     Item='USD/'+item 
     dfs[Item] = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0] 
     #CurrHistoricRange = pd.read_html(r.content,attrs = {'id': 'curr_table'}, flavor="bs4")[0] 
     #ISO_Code.append(np.array(Item)) 
     #Latest.append(np.array(CurrHistoricRange[:1])) 
     #Latest.append(CurrHistoricRange[:1]) 
    except: 
     pass 

# create Panel out of dictionary of DataFrames 
p = pd.Panel(dfs) 

# slice first row from all DFs 
t = p[:,0,:] 

print(t) 
print(t.T) 

产量:

   USD/CHF  USD/EUR  USD/GBP  USD/JPY 
Date  Jun 21, 2016 Jun 21, 2016 Jun 21, 2016 Jun 21, 2016 
Price   0.9618  0.8887  0.6828  104.97 
Open    0.962  0.8833  0.6815  103.82 
High   0.9623  0.8893  0.6829  104.97 
Low    0.9572   0.881  0.6766   103.6 
Change %  -0.02%   0.61%   0.19%   1.09% 

       Date Price Open High  Low Change % 
USD/CHF Jun 21, 2016 0.9618 0.962 0.9623 0.9572 -0.02% 
USD/EUR Jun 21, 2016 0.8887 0.8833 0.8893 0.881 0.61% 
USD/GBP Jun 21, 2016 0.6828 0.6815 0.6829 0.6766 0.19% 
USD/JPY Jun 21, 2016 104.97 103.82 104.97 103.6 1.09% 

如果我们排序DF的索引(按日期)像这样:

dfs[Item] = pd.read_html(r.content, 
          attrs = {'id': 'curr_table'}, 
          flavor="bs4", 
          parse_dates=['Date'], 
          index_col=[0] 
       )[0].sort_index() 

# create Panel out of dictionary of DataFrames 
p = pd.Panel(dfs) 

现在我们可以做很多有趣的事情:

In [18]: p.axes 
Out[18]: 
[Index(['USD/CHF', 'USD/EUR', 'USD/GBP', 'USD/JPY'], dtype='object'), 
DatetimeIndex(['2016-05-23', '2016-05-24', '2016-05-25', '2016-05-26', '2016-05-27', '2016-05-30', '2016-05-31', '2016-06-01', '2016-06-02', '20 
16-06-03', '2016-06-06', '2016-06-07', '2016-06-08', 
       '2016-06-09', '2016-06-10', '2016-06-13', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-17', '2016-06-19', '2016-06-20', '20 
16-06-21'], 
       dtype='datetime64[ns]', name='Date', freq=None), 
Index(['Price', 'Open', 'High', 'Low', 'Change %'], dtype='object')] 

In [19]: p.keys() 
Out[19]: Index(['USD/CHF', 'USD/EUR', 'USD/GBP', 'USD/JPY'], dtype='object') 

In [22]: p.to_frame().head(10) 
Out[22]: 
        USD/CHF USD/EUR USD/GBP USD/JPY 
Date  minor 
2016-05-23 Price  0.9896 0.8913 0.6904 109.23 
      Open  0.9905 0.8913 0.6893 110.08 
      High  0.9924 0.8942 0.6925 110.25 
      Low  0.9879 0.8893 0.6872 109.08 
      Change % -0.06% 0.03% 0.12% -0.84% 
2016-05-24 Price  0.9933 0.8976 0.6833 109.99 
      Open  0.9892 0.891 0.6903 109.22 
      High  0.9946 0.8983 0.6911 110.12 
      Low  0.9882 0.8906 0.6827 109.14 
      Change % 0.37% 0.71% -1.03% 0.70% 

索引由货币对按日期排列

In [25]: p['USD/EUR', '2016-06-10':'2016-06-15', :] 
Out[25]: 
      Price Open High  Low Change % 
Date 
2016-06-10 0.8889 0.8835 0.8893 0.8825 0.59% 
2016-06-13 0.8855 0.8885 0.8903 0.8846 -0.38% 
2016-06-14 0.8922 0.8856 0.8939 0.8846 0.76% 
2016-06-15 0.8881 0.892 0.8939 0.8848 -0.46% 

指数货币对

In [26]: p['USD/EUR', :, :] 
Out[26]: 
      Price Open High  Low Change % 
Date 
2016-05-23 0.8913 0.8913 0.8942 0.8893 0.03% 
2016-05-24 0.8976 0.891 0.8983 0.8906 0.71% 
2016-05-25 0.8964 0.8974 0.8986 0.8953 -0.13% 
2016-05-26 0.8933 0.8963 0.8975 0.8913 -0.35% 
2016-05-27 0.8997 0.8931 0.9003 0.8926 0.72% 
2016-05-30 0.8971 0.8995 0.9012 0.8969 -0.29% 
2016-05-31 0.8983 0.8975 0.8993 0.8949 0.13% 
2016-06-01 0.8938 0.8981  0.9 0.8929 -0.50% 
2016-06-02 0.8968 0.8937 0.8974 0.8911 0.34% 
2016-06-03 0.8798 0.8968 0.8981 0.8787 -1.90% 
2016-06-06 0.8807 0.8807 0.8831 0.8777 0.10% 
2016-06-07 0.8804 0.8805 0.8821 0.8785 -0.03% 
2016-06-08 0.8777 0.8803 0.8812 0.8762 -0.31% 
2016-06-09 0.8837 0.877 0.8847 0.8758 0.68% 
2016-06-10 0.8889 0.8835 0.8893 0.8825 0.59% 
2016-06-13 0.8855 0.8885 0.8903 0.8846 -0.38% 
2016-06-14 0.8922 0.8856 0.8939 0.8846 0.76% 
2016-06-15 0.8881 0.892 0.8939 0.8848 -0.46% 
2016-06-16 0.8908 0.8879 0.8986 0.8851 0.30% 
2016-06-17 0.8868 0.8907 0.8914 0.885 -0.45% 
2016-06-19 0.8813 0.8822 0.8841 0.8811 -0.63% 
2016-06-20 0.8833 0.8861 0.8864 0.8783 0.23% 
2016-06-21 0.8891 0.8833 0.8893 0.881 0.66% 

指数按日期

In [28]: p[:, '2016-06-20', :] 
Out[28]: 
     USD/CHF USD/EUR USD/GBP USD/JPY 
Price  0.962 0.8833 0.6815 103.84 
Open  0.9599 0.8861 0.6857 104.63 
High  0.9633 0.8864 0.6881 104.84 
Low  0.9576 0.8783 0.6794 103.78 
Change % 0.22% 0.23% -0.61% -0.75% 

In [29]: p[:, :, 'Change %'] 
Out[29]: 
      USD/CHF USD/EUR USD/GBP USD/JPY 
Date 
2016-05-23 -0.06% 0.03% 0.12% -0.84% 
2016-05-24 0.37% 0.71% -1.03% 0.70% 
2016-05-25 -0.20% -0.13% -0.42% 0.18% 
2016-05-26 -0.20% -0.35% 0.18% -0.38% 
2016-05-27 0.55% 0.72% 0.31% 0.42% 
2016-05-30 -0.25% -0.29% -0.07% 0.82% 
2016-05-31 0.14% 0.13% 1.10% -0.38% 
2016-06-01 -0.55% -0.50% 0.42% -1.07% 
2016-06-02 0.23% 0.34% -0.04% -0.61% 
2016-06-03 -1.45% -1.90% -0.66% -2.14% 
2016-06-06 -0.56% 0.10% 0.55% 0.97% 
2016-06-07 -0.55% -0.03% -0.71% -0.19% 
2016-06-08 -0.62% -0.31% 0.28% -0.35% 
2016-06-09 0.55% 0.68% 0.30% 0.10% 
2016-06-10 0.02% 0.59% 1.42% -0.10% 
2016-06-13 -0.03% -0.38% -0.11% -0.68% 
2016-06-14 -0.11% 0.76% 1.11% -0.13% 
2016-06-15 -0.21% -0.46% -0.64% -0.08% 
2016-06-16 0.40% 0.30% 0.03% -1.67% 
2016-06-17 -0.54% -0.45% -1.08% -0.12% 
2016-06-19 0.00% -0.63% -1.55% 0.48% 
2016-06-20 0.22% 0.23% -0.61% -0.75% 
2016-06-21 0.02% 0.66% 0.35% 0.98% 

指数由两轴

In [30]: p[:, '2016-06-10':'2016-06-15', 'Change %'] 
Out[30]: 
      USD/CHF USD/EUR USD/GBP USD/JPY 
Date 
2016-06-10 0.02% 0.59% 1.42% -0.10% 
2016-06-13 -0.03% -0.38% -0.11% -0.68% 
2016-06-14 -0.11% 0.76% 1.11% -0.13% 
2016-06-15 -0.21% -0.46% -0.64% -0.08%