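Trying to parse HTML in Swift 4 using only the standard library
I am trying to parse some HTML and pull out every link that follows any occurrence of the string:
market_listing_row_link" href="
in order to collect a list of item URLs using only the Swift 4 standard library.
I think what I need is a for loop that keeps checking characters against a condition. Once it has found the complete string, it starts reading the item URL that follows into an array until it reaches a double quote, then stops, and repeats this process until the end of the file. In C, which I am slightly familiar with, we have access to a function (I think it is fgetc) that does this while advancing the file's position indicator. Is there a similar approach in Swift?
My code so far only finds the first occurrence of the string, and there are 10 that I need to find.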
import Foundation
extension String {
    func slice(from: String, to: String) -> String? {
        return (range(of: from)?.upperBound).flatMap { substringFrom in
            (range(of: to, range: substringFrom..<endIndex)?.lowerBound).map { substringTo in
                String(self[substringFrom..<substringTo])
            }
        }
    }
}
let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try String(contentsOf: itemListURL, encoding: .utf8)
let itemURL = URL(string: itemListHTML.slice(from: "market_listing_row_link\" href=\"", to: "\"")!)!
print(itemURL)
// Prints the current first URL found matching: http://steamcommunity.com/market/listings/252490/Wyrm%20Chest
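You can use a regular expression to find every occurrence of a string between two specific strings (check this SO answer), and use the extension method ranges(of:) from this answer to get all ranges matched by the regex. You only need to pass the option .regularExpression to that method.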
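import Foundation

extension String {
    // Returns the ranges of every match of `string`, scanning left to right.
    func ranges(of string: String, options: CompareOptions = .literal) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var start = startIndex
        while let range = range(of: string, options: options, range: start..<endIndex) {
            result.append(range)
            // Step past the match; for an empty match, advance one character to avoid looping forever.
            start = range.lowerBound < range.upperBound ? range.upperBound : index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
    // Returns every substring found between the literal delimiters `from` and `to`.
    func slices(from: String, to: String) -> [Substring] {
        // Escape both delimiters so any regex metacharacters in them are matched literally.
        let pattern = "(?<=" + NSRegularExpression.escapedPattern(for: from) + ").*?(?=" + NSRegularExpression.escapedPattern(for: to) + ")"
        return ranges(of: pattern, options: .regularExpression)
            .map { self[$0] }
    }
}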
Playground testing
let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try! String(contentsOf: itemListURL, encoding: .utf8)
let result = itemListHTML.slices(from: "market_listing_row_link\" href=\"", to: "\"")
result.forEach({print($0)})
Result
http://steamcommunity.com/market/listings/252490/Night%20Howler%20AK47
http://steamcommunity.com/market/listings/252490/Hellcat%20SAR
http://steamcommunity.com/market/listings/252490/Metal
http://steamcommunity.com/market/listings/252490/Volcanic%20Stone%20Hatchet
http://steamcommunity.com/market/listings/252490/Box
http://steamcommunity.com/market/listings/252490/High%20Quality%20Bag
http://steamcommunity.com/market/listings/252490/Utilizer%20Pants
http://steamcommunity.com/market/listings/252490/Lizard%20Skull
http://steamcommunity.com/market/listings/252490/Frost%20Wolf
http://steamcommunity.com/market/listings/252490/Cloth
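For comparison, the loop described in the question (search for the marker, read up to the closing quote, then resume from there) can also be written without a regular expression. The following is only a sketch: `collectSlices` and `sampleHTML` are hypothetical names, and the sample markup is a made-up stand-in for the downloaded page. A `String.Index` cursor plays roughly the role of C's file position indicator.

```swift
import Foundation

/// Collects every substring that sits between `prefix` and the next `terminator`.
func collectSlices(in text: String, after prefix: String, upTo terminator: String) -> [String] {
    var results: [String] = []
    var cursor = text.startIndex  // where the next search begins

    // Find the next prefix at or after the cursor, then the terminator that follows it.
    while let prefixRange = text.range(of: prefix, range: cursor..<text.endIndex),
          let endRange = text.range(of: terminator, range: prefixRange.upperBound..<text.endIndex) {
        results.append(String(text[prefixRange.upperBound..<endRange.lowerBound]))
        cursor = endRange.upperBound  // resume just past the terminator
    }
    return results
}

// Hypothetical sample markup, not the real Steam page.
let sampleHTML = """
<a class="market_listing_row_link" href="http://example.com/a" id="r0">
<a class="market_listing_row_link" href="http://example.com/b" id="r1">
"""
let links = collectSlices(in: sampleHTML, after: "market_listing_row_link\" href=\"", upTo: "\"")
print(links)
```

The loop stops as soon as either the marker or the closing quote can no longer be found, so a truncated final entry is dropped rather than returned half-read.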
Don't forget to fetch the HTML data for your URL asynchronously using URLSession's dataTask –
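A minimal sketch of what that asynchronous fetch might look like. `fetchHTML` is a hypothetical helper name, not a Foundation API, and the UTF-8 decoding is an assumption carried over from the code above. The completion handler runs on a background queue, so any UI work would need to hop back to the main queue.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives in this module on Linux
#endif

/// Downloads `url` without blocking the caller and hands the decoded HTML
/// (or nil on failure) to `completion`. Hypothetical helper for illustration.
func fetchHTML(from url: URL, completion: @escaping (String?) -> Void) {
    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        guard error == nil, let data = data else {
            completion(nil)
            return
        }
        completion(String(data: data, encoding: .utf8))
    }
    task.resume()  // data tasks are created suspended
}

// Usage (not executed here, since it needs network access):
// fetchHTML(from: itemListURL) { html in
//     guard let html = html else { return }
//     html.slices(from: "market_listing_row_link\" href=\"", to: "\"").forEach { print($0) }
// }
```

In a playground, the page has to be kept alive until the callback fires, for example with `PlaygroundPage.current.needsIndefiniteExecution = true`.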
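This is perfect! Thank you! – ANoobSwiftly
I'm posting this as a comment rather than an answer because it doesn't directly answer your question. Have you considered using [XMLParser](https://developer.apple.com/documentation/foundation/xmlparser)? Real XML parsing is usually preferable to regular expressions when it comes to HTML; see, for example, [the famous Stack Overflow answer](https://*.com/questions/1732348/regex-match-open-tags-except -xhtml-self-contained-tags/1732454#1732454). –
@AlanKantz HTML is not XML, unless it happens to actually be XHTML. – rmaddy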
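To make the XMLParser suggestion concrete, here is a rough sketch that only applies under the caveat just raised: it assumes the input is well-formed XHTML, which a real page often is not. `LinkCollector` and the sample markup are hypothetical and for illustration only.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Collects the href of every <a> element whose class is market_listing_row_link.
final class LinkCollector: NSObject, XMLParserDelegate {
    var links: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        guard elementName == "a",
              attributeDict["class"] == "market_listing_row_link",
              let href = attributeDict["href"] else { return }
        links.append(href)
    }
}

// Hypothetical, well-formed sample markup (not the real Steam page).
let xhtml = """
<div>
  <a class="market_listing_row_link" href="http://example.com/a">A</a>
  <a class="other" href="http://example.com/skip">skip</a>
  <a class="market_listing_row_link" href="http://example.com/b">B</a>
</div>
"""
let collector = LinkCollector()
let parser = XMLParser(data: xhtml.data(using: .utf8)!)
parser.delegate = collector
let ok = parser.parse()  // false if the markup is not well-formed XML
print(collector.links)
```

If `parse()` returns false on the real page, that is a sign the markup is not valid XML and this approach does not apply.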
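@AlanKantz Forget that it's HTML. I want to search a string of characters for a meaningless sequence of characters, read the characters that follow that sequence into a string variable up to a certain character, and then continue searching for the next occurrence of the sequence to repeat the process. – ANoobSwiftly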