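How to extract search results (links) from Google Search in Java
Problem description:
I want to filter all the website links out of a Google search. When I search for something, I want to get all the website links that Google shows us.
First I want to read the complete HTML content. After that I want to filter out all the relevant URLs. For example: if I enter the phrase "buy shoes" into Google, I want to get links like "www.amazon.in/Shoes" and so on.
When I run my program, I only get a few URLs, and only Google-based ones such as "google.de/intl/de/options/".
PS: I checked the page source for the same query ("buy+shoes") and noticed that Chrome delivers more content than Firefox. My feeling is that I only get results for a few sites because Java connects the way Firefox does, doesn't it? How can I get all the links that Google displays?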
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class findEveryUrl {

    public static void main(String[] args) throws IOException {
        // Base URL; the keyword is appended after "#q="
        String gInput = "https://www.google.de/#q=";
        // setKeyWord() asks you to enter the keyword on the console
        String fullUrl = gInput + setKeyWord();
        // fullUrl is used for the InputStream; "www." is the string used for splitting
        findAllSubs(fullUrl, "www.");
        //System.out.println("given url: " + fullUrl);
    }

    /**
     * Reads the page at urlString and prints every piece obtained by splitting on splitphrase.
     *
     * @param urlString has to be the full URL.
     * @param splitphrase is the String which is used for splitting.
     */
    static void findAllSubs(String urlString, String splitphrase) {
        try {
            URL url = new URL(urlString);
            URLConnection yc = url.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    yc.getInputStream()));
            String inputLine;
            String array[];
            while ((inputLine = in.readLine()) != null) {
                // Appends the following line, so two lines are consumed per iteration
                inputLine += in.readLine();
                array = inputLine.split(splitphrase);
                arrayToConsol(array);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * setKeyWord() asks you for the search keyword for the Google query.
     *
     * @return the keyword that you typed into the console
     */
    public static String setKeyWord() {
        BufferedReader console = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Enter a KeyWord: ");
        String keyWord = null;
        try {
            keyWord = console.readLine();
        } catch (IOException e) {
            // shouldn't happen
            e.printStackTrace();
        }
        return keyWord;
    }

    /** Prints each array element on its own line. */
    public static void arrayToConsol(String[] array) {
        for (String item : array) {
            System.out.println(item);
        }
    }

    /** Prints the raw response for the given URL line by line. */
    public static void searchQueryToConsole(String url) throws IOException {
        URL googleSearch = new URL(url);
        URLConnection yc = googleSearch.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Answer
Here is a simple and easy solution:
http://www.programcreek.com/2012/05/call-google-search-api-in-java-program/
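One thing worth checking in the question's code, whichever approach is used: the keyword is appended after "#q=". Everything after "#" is a URL fragment, which is handled on the client side and is not part of the HTTP request, so the server only ever sees a request for "/". That would fit the symptom of getting only Google's own links. A minimal sketch of building the URL with a real query string instead, assuming the "/search?q=" path and a hypothetical helper name buildSearchUrl:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

class SearchUrlSketch {

    // Hypothetical helper: puts the keyword into the query string, URL-encoded.
    // The "/search?q=" path is an assumption, not taken from the question.
    static String buildSearchUrl(String keyword) {
        try {
            return "https://www.google.de/search?q=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The URL from the question: the keyword sits in the fragment, not in the query
        URI original = URI.create("https://www.google.de/#q=buy+shoes");
        System.out.println("query:    " + original.getQuery());
        System.out.println("fragment: " + original.getFragment());

        // The keyword as a proper query parameter
        System.out.println(buildSearchUrl("buy shoes"));
    }
}
```

If the response still differs from what a browser shows, the request headers may play a role: URLConnection sends a default User-Agent of the form "Java/<version>", and it can be changed with setRequestProperty("User-Agent", ...) before reading the stream.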
But if you want to parse other pages and use CSS selectors to find elements, JSoup is a great library for that.
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
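For comparison, here is a rough sketch of the filtering step from the question using only the standard library: collect the absolute href values from an HTML string and drop those that point back to the search engine's own domain. The class and method names are made up for illustration, the sample HTML is invented, and a regex is far more brittle than a real HTML parser such as JSoup, so treat this as a sketch rather than a finished solution.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractorSketch {

    // Captures the value of href="..." or href='...' when it is an absolute http(s) URL
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns all absolute links in html whose host is outside ownDomain
    static List<String> extractLinks(String html, String ownDomain) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1);
            if (isExternal(link, ownDomain)) {
                links.add(link);
            }
        }
        return links;
    }

    // True if the link's host is neither ownDomain itself nor one of its subdomains
    static boolean isExternal(String link, String ownDomain) {
        try {
            String host = URI.create(link).getHost();
            return host != null
                    && !host.equals(ownDomain)
                    && !host.endsWith("." + ownDomain);
        } catch (IllegalArgumentException e) {
            return false; // skip values that are not valid URIs
        }
    }

    public static void main(String[] args) {
        // Invented sample markup; a real results page will look different
        String html = "<a href=\"https://www.google.de/intl/de/options/\">Options</a>"
                + "<a href=\"https://www.amazon.in/Shoes\">Shoes</a>"
                + "<a href='http://example.com/buy'>Buy</a>";
        for (String link : extractLinks(html, "google.de")) {
            System.out.println(link);
        }
    }
}
```

Comparing hosts instead of splitting the page on "www." keeps links without a "www." prefix and avoids cutting URLs in the middle.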
Thanks Daredesm, for your quick reply =) – 2014-10-05 20:10:08