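Boblee holds a master's degree in artificial intelligence and is a Python enthusiast. He uses Python to study artificial intelligence, swarm intelligence, blockchain, and related technologies, and to build front ends, back ends, and web scrapers.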
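1. Background
Today I want to get past the firewall and scrape Google search interest data. Find a computer that can reach Google, open Google Trends, and search for "区块链" (blockchain). Google conveniently offers a download option for the results.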
2. Scraping the data
Press F12 to open the developer tools and locate the request that supplies the chart data.
The request URL turns out to be: https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480&req=%7B%22time%22:%222019-04-25+2020-04-25%22,%22resolution%22:%22WEEK%22,%22locale%22:%22zh-CN%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22%E5%8C%BA%E5%9D%97%E9%93%BE%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXqWN2yAvi1CbzrzgWceV9bctbP5a5ByT&tz=-480
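The `req` parameter is much easier to read once it is decoded. Here is a minimal sketch using only the standard library. It takes the URL above, splits out the query string, and parses `req` as JSON. The helper name `decode_widget_url` is made up for this illustration. Note that `+` in a query string decodes to a space, so the time range comes back as two dates separated by a space.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The widgetdata URL captured in the developer tools (copied from the article).
WIDGET_URL = (
    "https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480"
    "&req=%7B%22time%22:%222019-04-25+2020-04-25%22,%22resolution%22:%22WEEK%22,"
    "%22locale%22:%22zh-CN%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%7D,"
    "%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,"
    "%22value%22:%22%E5%8C%BA%E5%9D%97%E9%93%BE%22%7D%5D%7D%7D%5D,"
    "%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,"
    "%22category%22:0%7D%7D"
    "&token=APP6_UEAAAAAXqWN2yAvi1CbzrzgWceV9bctbP5a5ByT&tz=-480"
)


def decode_widget_url(url):
    """Return (query parameters, decoded req payload) for a Trends API URL.

    Hypothetical helper, not part of any library.
    """
    params = parse_qs(urlsplit(url).query)   # percent-decodes and maps '+' to ' '
    payload = json.loads(params["req"][0])   # req is a JSON document
    return params, payload


params, payload = decode_widget_url(WIDGET_URL)
# Non-ASCII characters are printed as \u escapes by default.
print(json.dumps(payload, indent=2))
print("token:", params["token"][0])
```

Seen this way, the request has only three moving parts: the time range, the keyword, and the token.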
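Most of the parameters are self-explanatory, but where does the token come from?
A global search for the token value in the developer tools
turns up a second request: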
https://trends.google.com/trends/api/explore?hl=zh-CN&tz=-480&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22%E5%8C%BA%E5%9D%97%E9%93%BE%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-480
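Rather than hand-editing the percent-encoded string, the explore URL can be generated from a plain dictionary. The sketch below is one way to do that with the standard library. The function names `encode_req` and `build_explore_url` are invented for this example. The URL above repeats `tz=-480` at the end; this sketch assumes the repeat is redundant and leaves it out, which the article does not confirm.

```python
import json
from urllib.parse import quote_plus

EXPLORE_ENDPOINT = "https://trends.google.com/trends/api/explore"


def encode_req(payload):
    """Serialize a payload the way it appears in the browser's request."""
    # Compact JSON (no spaces after ',' or ':'), keeping non-ASCII text as-is.
    compact = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    # Leave ':' and ',' unescaped and turn spaces into '+'.
    return quote_plus(compact, safe=":,")


def build_explore_url(keyword, timeframe="today 12-m", hl="zh-CN", tz=-480):
    """Hypothetical helper: build the explore URL for a single keyword."""
    payload = {
        "comparisonItem": [{"keyword": keyword, "geo": "", "time": timeframe}],
        "category": 0,
        "property": "",
    }
    return f"{EXPLORE_ENDPOINT}?hl={hl}&tz={tz}&req={encode_req(payload)}"


url = build_explore_url("区块链")
print(url)
```

Changing the keyword or the time frame then only means passing a different argument.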
So the plan is clear: request the token first, then use it to request the data.
3. Code
1. Get the token.
# -*- coding: utf-8 -*-
import time
import json
import requests


def get_token():
    try:
        # Load the explore page first so Google issues a fresh NID cookie.
        rs = requests.get('https://trends.google.com/trends/explore?q=blockchain')
        # print(rs.cookies.get_dict()['NID'])
        # Headers copied from the browser request shown in the developer tools.
        headers = {
            'authority': 'trends.google.com',
            'method': 'GET',
            'path': '/trends/api/explore?hl=zh-CN&tz=-480&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22blockchain%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-480',
            'scheme': 'https',
            'accept': 'application/json, text/plain, */*',
            # 'br' is left out: requests only decodes Brotli when a brotli package is installed.
            'accept-encoding': 'gzip, deflate',
            'referer': 'https://trends.google.com/trends/explore?q=blockchain',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
            'cookie': '__utmc=10102256; __utmz=10102256.1578533764.7.6.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __utma=10102256.1870595916.1574039570.1578533764.1578534245.8; __utmt=1; __utmb=10102256.7.9.1578548386183; CONSENT=YES+NL.zh-CN+V14; ANID=AHWqTUmMKNQcriAUhD0KV5fhDVWYVzGRm6ITNdfFwaDGvMtTx7Cyo4zUq8eCkbG9; NID=' + rs.cookies.get_dict()['NID'] + '; 1P_JAR=2020-1-9-5',
            'x-client-data': 'CLO1yQEIjLbJAQiltskBCMS2yQEIqZ3KAQioo8oBCLGnygEI4qjKAQjxqcoBCMuuygEI97TKAQ=='
        }
        s = requests.Session()
        data = s.get('https://trends.google.com/trends/api/explore?hl=zh-CN&tz=-480&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22blockchain%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-480', headers=headers)
        data = data.text
        # Skip whatever precedes the first '{', then read the first widget's token.
        token = json.loads(data[data.find('{'):])['widgets'][0]['token']
        return {'token': token}
    except Exception as e:
        # Any failure (network error, missing cookie, bad JSON) is reported as a string.
        return {'error': str(e)}
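The token lookup can be tried offline on a canned response. In the sketch below, `SAMPLE_BODY` is a made-up, heavily shortened response, and the widget `id` values, the `)]}'` prefix, and the `request` field are assumptions about the response layout rather than something the article shows. The helper names are hypothetical too. Selecting the widget by `id` instead of by position is a small precaution in case the widget order ever changes.

```python
import json

# Hypothetical, shortened response body. Real responses carry many more fields.
SAMPLE_BODY = ")]}'\n" + json.dumps({
    "widgets": [
        {"id": "TIMESERIES", "token": "token-for-timeseries",
         "request": {"resolution": "WEEK"}},
        {"id": "GEO_MAP", "token": "token-for-geo-map",
         "request": {"resolution": "COUNTRY"}},
    ]
})


def parse_prefixed_json(body):
    """Parse a body whose JSON object is preceded by non-JSON characters."""
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response body")
    return json.loads(body[start:])


def find_widget(payload, widget_id):
    """Return the first widget whose 'id' matches, or raise KeyError."""
    for widget in payload.get("widgets", []):
        if widget.get("id") == widget_id:
            return widget
    raise KeyError(f"widget {widget_id!r} not present in response")


widget = find_widget(parse_prefixed_json(SAMPLE_BODY), "TIMESERIES")
print(widget["token"])
```

If the real response does include a `request` object next to each token, passing that object along as the `req` parameter of the data call may be more robust than rebuilding the query by hand, but that is untested here.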
2. Get the data.
def get_data(token):
    # Query window: the 12 months ending today.
    time_stop = time.strftime("%Y-%m-%d")
    year, month, day = time_stop.split('-')
    # Caveat: this naive "year - 1" yields an invalid date when run on February 29.
    time_from = str(int(year) - 1) + '-' + month + '-' + day
    url = 'https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480&req=%7B%22time%22:%22' + time_from + '+' + time_stop + '%22,%22resolution%22:%22WEEK%22,%22locale%22:%22zh-CN%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22blockchain%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=' + str(token) + '&tz=-480'
    # print(url)
    rsp = requests.get(url)
    rsp = rsp.text
    # Drop the non-JSON prefix: keep everything from the "default" key onward.
    data = '{' + rsp[rsp.find('"default"'):]
    data_time = []
    data_value = []
    data = json.loads(data)['default']['timelineData']
    for temp in data:
        time_ = int(temp['time'])
        time_local = time.localtime(time_)
        # Format the timestamp as day/month/year hour:minute:second (local time).
        data_time.append(time.strftime("%d/%m/%Y %H:%M:%S", time_local))
        # 'value' is a list; strip the brackets to keep just the number(s).
        data_value.append(str(temp['value']).replace('[', '').replace(']', ''))
    return data_time, data_value
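Two details of `get_data` can be isolated and checked without touching the network: computing the one-year window, and turning `timelineData` into rows. The sketch below is one possible take on both. All function names and the two sample data points are invented for illustration. It formats dates in UTC rather than local time so the output does not depend on the machine's time zone, which is a deliberate departure from the code above.

```python
import datetime as dt


def one_year_before(day):
    """Return the same calendar day one year earlier."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        # February 29 has no counterpart in the previous year; use February 28.
        return day.replace(year=day.year - 1, day=28)


def time_range(end):
    """Build the 'start end' string used in the req payload's time field."""
    return f"{one_year_before(end).isoformat()} {end.isoformat()}"


def parse_timeline(timeline):
    """Convert timelineData items into (YYYY-MM-DD, value) tuples."""
    rows = []
    for point in timeline:
        stamp = dt.datetime.fromtimestamp(int(point["time"]), tz=dt.timezone.utc)
        # A single-keyword query is assumed, so only the first value is kept.
        rows.append((stamp.strftime("%Y-%m-%d"), point["value"][0]))
    return rows


# Made-up sample in the shape the code above reads ('time' and 'value' keys).
SAMPLE_TIMELINE = [
    {"time": "1556409600", "value": [57]},
    {"time": "1557014400", "value": [61]},
]

print(time_range(dt.date(2020, 4, 25)))
for row in parse_timeline(SAMPLE_TIMELINE):
    print(row)
```

The rows are plain tuples, so they can go straight into `csv.writer(...).writerows(rows)` if a file is wanted.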
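4. Conclusion
This article showed how to scrape Google Trends data automatically. The setup is a little demanding: you need a machine that can reach Google, and the proxy should be set to global mode so that requests sent from Python can reach Google Trends as well.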
If you reproduce this article, please credit the source: https://bianchenghao.cn/26168.html