1 java也可以爬取第三方网站的数据;
注: 1 ip限制【防爬】
2 header参数referer
3 伪装hearder ua
就源引 一个第三方代理网站试试
{
Random r = new Random();
String[] ua = {"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36 OPR/37.0.2178.32",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586",
"Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0)",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 BIDUBrowser/8.3 Safari/537.36",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36 Core/1.47.277.400 QQBrowser/9.4.7658.400",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 UBrowser/5.6.12150.8 Safari/537.36",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36 TheWorld 7",
"Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/60.0"};
int i = r.nextInt(14);
logger.info("检测中------ {}:{}",ip,port );
Map<String,String> map = new HashMap<String,String>();
map.put("waybillNo","DD1838768852");
try {
total ++ ;
long a = System.currentTimeMillis();
//爬取的目标网站,url记得换下。。。!!! 代理ip网站
Document doc = Jsoup.connect("http://xxxx.com/dayProxy/ip/314639.html")
.timeout(5000)
//.proxy(ip, port)
.data(map)
.ignoreContentType(true)
.userAgent(ua[i])
.header("referer","http://xxxx.com/dayProxy.html")//这个来源记得换..
.post();
System.out.println(ip+":"+port+"访问时间:"+(System.currentTimeMillis() -a) + " 访问结果: "+doc.text());
suc ++ ;
} catch (IOException e) {
e.printStackTrace();
fail ++ ;
}finally {
if (total == count ) {
System.out.println("总次数:"+total);
System.out.println("成功次数:"+suc);
System.out.println("失败次数:"+fail);
}
}
}
这样通过org.jsoup.nodes.Document解析返回的数据, 解析出ip 和端口,
然后 上面的同样代码只要
.proxy(ip, port)
放开这句 填入对应的ip port即可开启代理访问模式 ,
可以过滤90%的反防;
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。
如需转载请保留出处:https://bianchenghao.cn/38515.html