Scraping Google Trends with Python


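Boblee holds a master's degree in artificial intelligence and is a keen Python user. He uses Python to study artificial intelligence, swarm intelligence, blockchain and related technologies, and to build front ends, back ends and web scrapers.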

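1. Background

Today I am going to get past the firewall and scrape Google search interest data. Find a computer that can reach Google, open Google Trends, and search for "区块链" (blockchain). Conveniently, Google offers a download option for the data.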


2. Data scraping

Press F12 to open the browser's developer tools and find the request that supplies the chart data.


The URL turns out to be: https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480&req=%7B%22time%22:%222019-04-25+2020-04-25%22,%22resolution%22:%22WEEK%22,%22locale%22:%22zh-CN%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22%E5%8C%BA%E5%9D%97%E9%93%BE%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXqWN2yAvi1CbzrzgWceV9bctbP5a5ByT&tz=-480
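To see what the request actually asks for, it can help to decode the percent-encoded `req` parameter back into JSON. The following is a minimal sketch using only the standard library; the helper name `decode_req` and the constant `SAMPLE_URL` are illustrative and not part of the original article, and the sample URL is the one above with the token left out.

```python
import json
from urllib.parse import urlsplit, parse_qs


def decode_req(url):
    """Return the JSON object carried in the URL's `req` query parameter."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs percent-decodes each value and turns '+' into a space.
    return json.loads(query["req"][0])


# The widgetdata URL from the article, without its token (illustrative constant).
SAMPLE_URL = (
    "https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480"
    "&req=%7B%22time%22:%222019-04-25+2020-04-25%22,%22resolution%22:%22WEEK%22,"
    "%22locale%22:%22zh-CN%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%7D,"
    "%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,"
    "%22value%22:%22%E5%8C%BA%E5%9D%97%E9%93%BE%22%7D%5D%7D%7D%5D,"
    "%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,"
    "%22category%22:0%7D%7D"
)

req = decode_req(SAMPLE_URL)
print(json.dumps(req, ensure_ascii=False, indent=2))
```

Decoded this way, the request reads as a date range, a weekly resolution, and a single keyword, which leaves the token as the only part that cannot be built by hand.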

Everything else is straightforward, but what is this token?

Search all captured requests for the token value.

That turns up a new link:

https://trends.google.com/trends/api/explore?hl=zh-CN&tz=-480&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22%E5%8C%BA%E5%9D%97%E9%93%BE%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-480
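The explore URL can also be assembled for any keyword rather than copied from the browser. This is a rough sketch under the assumption that the endpoint accepts a fully percent-encoded `req` value (the browser leaves `:` and `,` unencoded, while `urlencode` encodes them). The function name `build_explore_url` and its parameters are hypothetical.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_API = "https://trends.google.com/trends/api/explore"


def build_explore_url(keyword, timeframe="today 12-m", hl="zh-CN", tz=-480):
    """Build an explore URL with the same req structure as the captured link."""
    req = {
        "comparisonItem": [{"keyword": keyword, "geo": "", "time": timeframe}],
        "category": 0,
        "property": "",
    }
    params = {
        "hl": hl,
        "tz": tz,
        # Compact separators keep the JSON free of spaces, as in the captured URL.
        "req": json.dumps(req, separators=(",", ":"), ensure_ascii=False),
    }
    return EXPLORE_API + "?" + urlencode(params)


url = build_explore_url("区块链")
# Round-trip check: decoding the URL should give back the same request object.
decoded = json.loads(parse_qs(urlsplit(url).query)["req"][0])
print(url)
```

Building the URL from a dictionary avoids hand-editing percent-encoded strings when the keyword or time range changes.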

So the approach is clear: first obtain the token, then fetch the data.

3. Code implementation

1. Get the token.

# -*- coding: utf-8 -*-
import time
import json
import requests


def get_token():
    """Return {'token': ...} for the first widget, or {'error': ...} on failure."""
    try:
        # Load the public explore page first to receive a fresh NID cookie.
        rs = requests.get('https://trends.google.com/trends/explore?q=blockchain')
        # print(rs.cookies.get_dict()['NID'])
        # Headers copied from the browser request seen in the developer tools.
        headers = {
            'authority': 'trends.google.com',
            'method': 'GET',
            'path': '/trends/api/explore?hl=zh-CN&tz=-480&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22blockchain%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-480',
            'scheme': 'https',
            'accept': 'application/json, text/plain, */*',
            # 'br' is left out: requests can only decode Brotli when a brotli package is installed.
            'accept-encoding': 'gzip, deflate',
            'referer': 'https://trends.google.com/trends/explore?q=blockchain',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36',
            'cookie': '__utmc=10102256; __utmz=10102256.1578533764.7.6.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __utma=10102256.1870595916.1574039570.1578533764.1578534245.8; __utmt=1; __utmb=10102256.7.9.1578548386183; CONSENT=YES+NL.zh-CN+V14; ANID=AHWqTUmMKNQcriAUhD0KV5fhDVWYVzGRm6ITNdfFwaDGvMtTx7Cyo4zUq8eCkbG9; NID=' + rs.cookies.get_dict()['NID'] + '; 1P_JAR=2020-1-9-5',
            'x-client-data': 'CLO1yQEIjLbJAQiltskBCMS2yQEIqZ3KAQioo8oBCLGnygEI4qjKAQjxqcoBCMuuygEI97TKAQ==',
        }
        s = requests.Session()
        data = s.get('https://trends.google.com/trends/api/explore?hl=zh-CN&tz=-480&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22blockchain%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-480', headers=headers)
        data = data.text
        # The body begins with a non-JSON prefix, so parse from the first '{'.
        token = json.loads(data[data.find('{'):])['widgets'][0]['token']
        return {'token': token}
    except Exception as e:
        return {'error': str(e)}
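The token extraction step can be tried offline by separating the parsing from the network call. The sketch below assumes, as the code above does, that the body is JSON preceded by a short non-JSON prefix and that the token sits in the first entry of `widgets`. The helper name `extract_token` and the sample body are made up for illustration; a real response contains many more fields.

```python
import json


def extract_token(body, widget_index=0):
    """Parse an explore response body and return the token of one widget."""
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response body")
    payload = json.loads(body[start:])
    return payload["widgets"][widget_index]["token"]


# Made-up body for illustration only (not a real Google Trends response).
SAMPLE_BODY = ")]}'\n" + json.dumps({"widgets": [{"token": "EXAMPLE_TOKEN"}]})

token = extract_token(SAMPLE_BODY)
print(token)
```

Raising a `ValueError` when no JSON is found makes a blocked or empty response easier to tell apart from a change in the payload layout.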

2. Get the data.

def get_data(token):
    """Fetch one year of weekly interest for 'blockchain'; return (times, values)."""
    time_stop = str(time.strftime("%Y-%m-%d"))
    time_from_ = time_stop.split('-')
    # Start date: the same month and day, one year earlier.
    time_from = str(int(time_from_[0]) - 1) + '-' + time_from_[1] + '-' + time_from_[2]
    url = 'https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480&req=%7B%22time%22:%22' + time_from + '+' + time_stop + '%22,%22resolution%22:%22WEEK%22,%22locale%22:%22zh-CN%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22blockchain%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=' + str(token) + '&tz=-480'
    # print(url)
    rsp = requests.get(url)
    rsp = str(rsp.text)
    # Drop the non-JSON prefix by rebuilding the object from the "default" key.
    data = '{' + rsp[rsp.find('"default"'):]
    data_time = []
    data_value = []
    data = json.loads(data)['default']['timelineData']
    for temp in data:
        time_ = int(temp['time'])
        time_local = time.localtime(time_)
        # Format the Unix timestamp as dd/mm/YYYY HH:MM:SS in local time.
        data_time.append(time.strftime("%d/%m/%Y %H:%M:%S", time_local))
        # Each value arrives as a one-element list; store it as a bare string.
        data_value.append(str(temp['value']).replace('[', '').replace(']', ''))
    return data_time, data_value
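Two parts of this step can be checked without network access: computing the one-year window and turning the timeline entries into rows. The sketch below is one possible variant, not the author's code. It uses `datetime` so that 29 February falls back to 28 February in the previous year, and it reports dates in UTC rather than local time. The names `one_year_window` and `parse_timeline` and the sample body are hypothetical, and the sample assumes the same `default` / `timelineData` layout the code above relies on.

```python
import json
from datetime import date, datetime, timezone


def one_year_window(today=None):
    """Return (start, stop) as YYYY-MM-DD strings covering the past year."""
    today = today or date.today()
    try:
        start = today.replace(year=today.year - 1)
    except ValueError:
        # 29 February has no counterpart in the previous year.
        start = today.replace(year=today.year - 1, day=28)
    return start.isoformat(), today.isoformat()


def parse_timeline(body):
    """Return a list of (YYYY-MM-DD, value) tuples from a widgetdata body."""
    payload = json.loads(body[body.find("{"):])
    rows = []
    for point in payload["default"]["timelineData"]:
        day = datetime.fromtimestamp(int(point["time"]), tz=timezone.utc).date()
        rows.append((day.isoformat(), point["value"][0]))
    return rows


# Made-up body for illustration only (not real Google Trends data).
SAMPLE_BODY = ")]}',\n" + json.dumps(
    {"default": {"timelineData": [{"time": "1556150400", "value": [42]}]}}
)

print(one_year_window(date(2020, 4, 25)))
print(parse_timeline(SAMPLE_BODY))
```

Keeping the values as numbers instead of strings makes later plotting or averaging simpler, at the cost of differing from the string lists returned by `get_data`.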


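4. Conclusion

This article automated the scraping of Google Trends data. The setup is somewhat demanding: you need a computer that can reach Google, with the proxy set to global mode so that requests made from Python can also reach Google Trends.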


