HTML parsing in VBA
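I am trying to update fund sizes from the Morningstar website. My earlier attempts at IE automation did not succeed, so I switched to an XMLHTTP request (which is also much faster). Now I cannot print the right lines from the document I get back from the site. I want the code to give me the third "td" tag inside the "tr" tag whose first "td" tag is called "Fund Size (Mil)". So the code loops through the headings of all "td" tags, and if it finds "{line heading}" = "Fund Size (Mil)" it jumps into action. Here is the problem: I don't know how to refer to those headings. I tried assigning each "td" tag to a variable (there are 3 "td" tags inside a "tr" tag, so I had the variables row1, row2, row3, one per "td" tag), but when I do that I get run-time error 438 (Object doesn't support this property or method) on the line
Debug.Print TDElements.getElementsByTagName("tr")(0).Cells(0).innerHTML
Also, when I Debug.Print TDElement.innerHTML, I don't see the "td" tags I need. When I use .innerText instead, I see all the "td" tags.
Now the questions:
1) How can I refer to the headings directly? (See the commented-out lines inside the For Each loop below.)
2) Why don't I see all the td tags with .innerHTML, when I do see them with .innerText?
URL: http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW
Excel 2010, IE version 11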
Sub XMLhttpRequestTest2()
    'Microsoft XML, v 6.0
    'Microsoft HTML object library, used in parsing HTML
    Dim myurl As String
    Dim TDElement As Object
    Dim TDElements As IHTMLElementCollection
    Dim IE As MSXML2.XMLHTTP60
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLBody As MSHTML.HTMLBody

    Set IE = New MSXML2.XMLHTTP60
    Set HTMLDoc = New MSHTML.HTMLDocument
    Set HTMLBody = HTMLDoc.body

    myurl = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
    IE.Open "GET", myurl, False
    IE.send
    HTMLBody.innerHTML = IE.responseText

    Set TDElements = HTMLDoc.getElementsByTagName("td")
    For Each TDElement In TDElements
        Debug.Print TDElement.innerText '.innerText/.innerHTML. Can't see the fund size with .innerHTML??
        ' If "{line heading}" = "Fund Size (Mil)" Then 'How can I refer to headings in the html document?
        ' Worksheets("Sheet3").Range("B3") = Split("{line text}", ";")(1) 'reference to line text?
    Next
End Sub
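h2so4's answer fixed the problem above. What follows is an extension of the original question.
Now TDElement is assigned several lines of text (I changed h2so4's value of 10 to 3, so the code shows the next 3 lines once it finds the fund string). How can I parse this further? At the moment the line
Worksheets("helper").Cells(x, 6).Value = Split(TDElement.innerText, " ")(1)
returns the value I need (769.28), but it would help in the future if I really understood what is happening here.
So, just as an exercise, how would I get all 3 lines printed in their own cells? The output would then be: Fund Size (Mil), 31/08/2017, 769.28 in cells (x,6), (x,7), (x,8). When I try to apply a function such as Split or Left to TDElement, the function only targets the last line, not the lines above it. However, when I Debug.Print TDElement.innerText (or .innerHTML), I see the other lines too. So how can I "access" the lines above the last one?
Output of Debug.Print TDElement.innerText:
Fund Size (Mil)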
31/08/2017
EUR 769.28
The code below will get you the "Fund Size" lines.
Sub XMLhttpRequestTest2()
    'Microsoft XML, v 6.0
    'Microsoft HTML object library, used in parsing HTML
    Dim myurl As String
    Dim TDElement As Object
    Dim TDElements As IHTMLElementCollection
    Dim IE As MSXML2.XMLHTTP60
    Dim Flag As Boolean
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLBody As MSHTML.HTMLBody
    Dim k As Long

    Set IE = New MSXML2.XMLHTTP60
    Set HTMLDoc = New MSHTML.HTMLDocument
    Set HTMLBody = HTMLDoc.body

    myurl = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
    IE.Open "GET", myurl, False
    IE.send
    HTMLDoc.body.innerHTML = IE.responseText

    Flag = False
    k = 0
    Set TDElements = HTMLDoc.getElementsByTagName("td")
    For Each TDElement In TDElements
        If InStr(TDElement.innerText, "Fund Size") <> 0 Or Flag Then
            'if fundsize string is found, display the next 10 lines
            Debug.Print ":" & TDElement.innerText '.innerText/.innerHTML. Can't see the fund size with .innerHTML??
            ' If "{line heading}" = "Fund Size (Mil)" Then 'How can I refer to headings in the html document?
            ' Worksheets("Sheet3").Range("B3") = Split("{line text}", ";")(1) 'reference to line text?
            k = k + 1
            If k < 10 Then Flag = True Else Flag = False
        End If
    Next
End Sub
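As a complement to the loop over "td" tags, the question's original goal (the third "td" of the "tr" whose first "td" holds the heading) can be sketched by walking the rows instead. This is only a sketch under assumptions: ThirdCellOfRow is a hypothetical helper name, the page is assumed to put the heading and the value in the same "tr", and the nested-table check is a guess at the page layout, not something confirmed here. It also hints at the error 438: TDElements is a collection, and a collection has no getElementsByTagName method, whereas a document or a single element does.

```vba
' Sketch only. Needs references to "Microsoft XML, v6.0" and
' "Microsoft HTML Object Library". ThirdCellOfRow is a hypothetical name.
Function ThirdCellOfRow(ByVal url As String, ByVal heading As String) As String
    Dim req As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim row As Object

    Set req = New MSXML2.XMLHTTP60
    req.Open "GET", url, False
    req.send

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = req.responseText

    For Each row In doc.getElementsByTagName("tr")
        ' Only rows with at least three cells can have a third "td".
        If row.Cells.Length >= 3 Then
            ' Assumption: skip outer rows whose first cell merely wraps a nested table.
            If row.Cells(0).getElementsByTagName("table").Length = 0 Then
                If InStr(1, row.Cells(0).innerText, heading, vbTextCompare) > 0 Then
                    ThirdCellOfRow = Trim$(row.Cells(2).innerText)
                    Exit Function
                End If
            End If
        End If
    Next row
    ' Falls through with an empty string when no row matches.
End Function
```

A call such as Debug.Print ThirdCellOfRow(myurl, "Fund Size") would then print the text of the third cell, if the page is laid out as assumed.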
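In answer to your extension: how to split the result depends on how the page is designed. Here is one possible solution that puts your data into 3 different cells.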
Sub XMLhttpRequestTest2()
    'Microsoft XML, v 6.0
    'Microsoft HTML object library, used in parsing HTML
    Dim myurl As String
    Dim TDElement As Object
    Dim TDElements As IHTMLElementCollection
    Dim IE As MSXML2.XMLHTTP60
    Dim Flag As Boolean
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLBody As MSHTML.HTMLBody
    Dim k As Long
    Dim text As Variant

    Set IE = New MSXML2.XMLHTTP60
    Set HTMLDoc = New MSHTML.HTMLDocument
    Set HTMLBody = HTMLDoc.body

    myurl = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
    IE.Open "GET", myurl, False
    IE.send
    HTMLDoc.body.innerHTML = IE.responseText

    Flag = False
    k = 0
    Set TDElements = HTMLDoc.getElementsByTagName("td")
    For Each TDElement In TDElements
        If InStr(TDElement.innerText, "Fund Size") <> 0 Or Flag Then
            'if fundsize string is found, process this td and the next 2
            text = Split(TDElement.innerText, vbLf)
            'Split returns an empty array (UBound = -1) for an empty td, so check before reading text(0)
            If UBound(text) >= 0 Then
                If text(0) <> "" Then
                    'write each line of this td into its own cell, starting at column k + 2
                    Worksheets("Sheet3").Cells(3, k + 2).Resize(, UBound(text) + 1) = text
                    ' If "{line heading}" = "Fund Size (Mil)" Then 'How can I refer to headings in the html document?
                    ' Worksheets("Sheet3").Range("B3") = Split("{line text}", ";")(1) 'reference to line text?
                End If
            End If
            k = k + 1
            If k < 3 Then Flag = True Else Flag = False
        End If
    Next
End Sub
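The Split step can be tried on its own, without the web request. In the loop above, TDElement is reassigned on every pass, so at any moment it holds the text of one "td" only; a line written to a fixed cell inside the loop would be overwritten on each pass, which may be why only the last value seemed reachable. The sketch below uses a hypothetical helper, SplitLines, and a sample string modelled on the Debug.Print output in the question. It assumes innerText may separate lines with vbCrLf rather than vbLf alone, in which case splitting on vbLf would leave a trailing vbCr on each piece.

```vba
' Hypothetical helper: split text into lines whatever the line-break style.
Function SplitLines(ByVal s As String) As Variant
    s = Replace(s, vbCrLf, vbLf)   ' normalise Windows line breaks
    s = Replace(s, vbCr, vbLf)     ' normalise any stray carriage returns
    SplitLines = Split(s, vbLf)
End Function

Sub DemoSplitLines()
    Dim parts As Variant
    Dim i As Long

    ' Sample text only; the real value would come from TDElement.innerText.
    parts = SplitLines("Fund Size (Mil)" & vbCrLf & "31/08/2017")

    For i = LBound(parts) To UBound(parts)
        Debug.Print i & ": " & parts(i)
    Next i
End Sub
```

Each element of parts can then go to its own cell, for example Cells(x, 6 + i), while a value such as "EUR 769.28" from a separate "td" can be split again on the space.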
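Thanks a lot! This works. I modified your code by declaring a new integer variable n and setting it to 3 instead of 10 (only the lines I need). However, I would like to know why I can only parse the last line the code returns. Please see the extension of the original question above. – Samppa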