Qunar Travel Data Analysis

Programming Basics • 2024-12-10 12:40 • 62 views

This post collects, cleans, and visualizes travel data from Qunar (去哪儿) to reveal travel trends.


Getting the data

(Scraping, part 1)

import requests
from bs4 import BeautifulSoup

# Walk through every page of the travelogue list and save each travelogue ID to url.txt.
url = 'http://travel.qunar.com/travelbook/list.htm?page={}&order=hot_heat&avgPrice=1_2'

# Request headers; the cookie value can be copied from the browser.
# The HTTP header is named "cookie" (a header named "cookies" is ignored by the server).
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.360',
    'cookie': 'JSESSIONID=5E9DCED1A95B8643B49DF; QN1=00002b80306c204d8c38c41b; QN300=s%3Dbaidu; QN99=2793; QN205=s%3Dbaidu; QN277=s%3Dbaidu; QunarGlobal=10.86.213.148_-3ad026b5_b8f_-44df|99; QN601=64fd2a8e533e94d422ac3da458ee6e88; _i=RBTKSueZDCmVnmnwlQKbrHgrodMx; QN269=D32536A056A711EA8A2FFA163E642F8B; QN48=f-3a3c-496c-9370-e033bd32cbcc; fid=ae39c42c-66b4-4e2d-880f-fb3f1bfe72d0; QN49=; csrfToken=51sGhnGXCSQTDKWcdAWIeIrhZLG86cka; QN163=0; Hm_lvt_c56a2baadeafc=,,,; viewdist=-1; uld=1--1-|1--1-|1--1-|1--1-; _vi=6vK5Gry4UmXDT70IFohKyFF8R8Mu0SvtUfxawwaKYRTq9NKud1iKUt8qkTLGH74E80hXLLVOFPYqRGy52OuTFnhpWvBXWEbkOJaDGaX_5L6CnyiQPPOYb2lFVxrJXsVd-W4NGHRzYtRQ5cJmiAbasK8kbNgDDhkJVTC9YrY6Rfi2; viewbook=||||; QN267=c32674; Hm_lpvt_c56a2baadeafc=; QN271=c8712b13-2065-4aa7-a70b-e6156f6fc216',
    'referer': 'http://travel.qunar.com/travelbook/list.htm?page=1&order=hot_heat&avgPrice=1',
}

count = 1
# The with-block closes url.txt once all pages are done.
with open('url.txt', 'w') as fb:
    # 200 pages in total
    for i in range(1, 201):
        url_ = url.format(i)
        try:
            response = requests.get(url=url_, headers=headers)
            response.encoding = 'utf-8'
            html = response.text
            soup = BeautifulSoup(html, 'lxml')
            # Each travelogue on the page is an <li class="list_item">.
            all_url = soup.find_all('li', attrs={'class': 'list_item'})
            print('Scraping page %s' % count)
            for each in all_url:
                # The travelogue ID is stored in the data-bookid attribute of the <h2>.
                each_url = each.find('h2')['data-bookid']
                fb.write(each_url)
                fb.write('\n')
            count += 1
        except Exception as e:
            print(e)
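The loop above issues 200 requests back to back, and a single failed request simply skips that page. The following is a minimal sketch of how a randomized pause and a simple retry could be layered on top. The helpers `fetch_with_retry` and `page_urls`, along with the default retry count and delay range, are hypothetical and not part of the original post; `fetch` stands for any callable that returns the page text or raises.

```python
import random
import time


def fetch_with_retry(fetch, url, retries=3, delay_range=(1.0, 3.0), sleep=time.sleep):
    """Call fetch(url), pausing a random interval before each attempt.

    fetch       -- hypothetical callable that returns the page text or raises
    retries     -- maximum number of attempts, at least 1 (assumed value)
    delay_range -- (min, max) seconds to wait before each attempt (assumed values)
    sleep       -- injectable so the pause can be skipped in tests
    """
    last_error = None
    for attempt in range(1, retries + 1):
        # Wait a random amount of time so requests are not sent in a tight loop.
        sleep(random.uniform(*delay_range))
        try:
            return fetch(url)
        except Exception as e:
            last_error = e
            print('attempt %s for %s failed: %s' % (attempt, url, e))
    # Every attempt failed: re-raise the last error seen.
    raise last_error


def page_urls(template, first=1, last=200):
    """Expand a paged URL template for pages first..last inclusive."""
    return [template.format(i) for i in range(first, last + 1)]
```

With the script above, `fetch` could be something like `lambda u: requests.get(u, headers=headers).text`. Passing `sleep=lambda s: None` skips the pauses when trying the helper out.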


Getting the data

(Scraping, part 2)

import requests
from bs4 import BeautifulSoup
import re
import time
import csv
import random

# Read the travelogue IDs saved by part 1.
url_list = []
with open('url.txt', 'r') as f:
    for i in f.readlines():
        i = i.strip()
        url_list.append(i)

# Build the full URL of each travelogue page.
the_url_list = []
for i in range(len(url_list)):
    url = 'http://travel.qunar.com/youji/'
    the_url = url + str(url_list[i])
    the_url_list.append(the_url)

last_list = []

def spider():
    # The HTTP header is named "cookie" (a header named "cookies" is ignored by the server).
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.360',
        'cookie': 'QN1=00002b80306c204d8c38c41b; QN300=s%3Dbaidu; QN99=2793; QN205=s%3Dbaidu; QN277=s%3Dbaidu; QunarGlobal=10.86.213.148_-3ad026b5_b8f_-44df|99; QN601=64fd2a8e533e94d422ac3da458ee6e88; _i=RBTKSueZDCmVnmnwlQKbrHgrodMx; QN269=D32536A056A711EA8A2FFA163E642F8B; QN48=f-3a3c-496c-9370-e033bd32cbcc; fid=ae39c42c-66b4-4e2d-880f-fb3f1bfe72d0; QN49=; csrfToken=51sGhnGXCSQTDKWcdAWIeIrhZLG86cka; QN163=0; Hm_lvt_c56a2baadeafc=,,,; viewdist=-1; uld=1--1-|1--1-|1--1-|1--1-; viewbook=||||; QN267=d93fcee; _vi=vofWa8tPffFKNx9MM0ASbMfYySr3IenWr5QF22SjnOoPp1MKGe8_-VroXhkC0UNdM0WdUnvQpqebgva9VacpIkJ3f5lUEBz5uyCzG-xVsC-sIV-jEVDWJNDB2vODycKN36DnmUGS5tvy8EEhfq_soX6JF1OEwVFXk2zow0YZQ2Dr; Hm_lpvt_c56a2baadeafc=; QN271=fc8dd4bc-3fe6-4690-9823-e27d28e9718c',
        'Host': 'travel.qunar.com',
    }
    count = 1
    for i in range(len(the_url_list)):
        try:
            print('Scraping page %s' % count)
            response = requests.get(url=the_url_list[i], headers=headers)
            response.encoding = 'utf-8'
            html = response.text
            soup = BeautifulSoup(html, 'lxml')
            # Breadcrumb text, split on '>' to get the location and the title.
            information = soup.find('p', attrs={'class': 'b_crumb_cont'}).text.strip().replace(' ', '')
            info = information.split('>')
            if len(info) > 2:
                location = info[1].replace('\xa0', '').replace('旅游攻略', '')
                introduction = info[2].replace('\xa0', '')
            else:
                location = info[0].replace('\xa0', '')
                introduction = info[1].replace('\xa0', '')
            # The foreword list holds the departure date, trip length and cost per person.
            other_information = soup.find('ul', attrs={'class': 'foreword_list'})
            when = other_information.find('li', attrs={'class': 'f_item when'})
            time1 = when.find('p', attrs={'class': 'txt'}).text.replace('出发日期', '').strip()
            howlong = other_information.find('li', attrs={'class': 'f_item howlong'})
            day = howlong.find('p', attrs={'class': 'txt'}).text.replace('天数', '').replace('/', '').replace('天', '').strip()
            howmuch = other_information.find('li', attrs={'class': 'f_item howmuch'})
            money = howmuch.find('p', attrs={'class': 'txt'}).text.replace('人均费用', '').replace('/', '').replace('', '').strip()
            who = other_information.find('li',
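The chained `.replace()` calls above strip labels such as `天数` and `人均费用` from the field text, and the breadcrumb branch raises `IndexError` when the text contains no `>`. The sketch below shows the same cleanup as two small standalone functions. The helper names `first_number` and `parse_breadcrumb` are hypothetical, and the sample strings are assumed shapes of the page text, not values taken from the site.

```python
import re


def first_number(text):
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


def parse_breadcrumb(text):
    """Split breadcrumb text on '>' into (location, introduction).

    Mirrors the branching in the listing above, but returns None for a
    missing part instead of raising IndexError.
    """
    # Remove non-breaking spaces and ordinary spaces from every part.
    parts = [p.replace('\xa0', '').replace(' ', '').strip() for p in text.split('>')]
    if len(parts) > 2:
        return parts[1].replace('旅游攻略', ''), parts[2]
    if len(parts) == 2:
        return parts[0], parts[1]
    return (parts[0] or None), None


# Assumed sample inputs, for illustration only.
print(first_number('天数/5天'))
print(parse_breadcrumb('旅行\xa0>\xa0丽江旅游攻略\xa0>\xa0丽江五日游'))
```

Converting the day count and cost to integers at this stage means later cleaning and plotting steps do not have to parse strings again.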



If you repost this article, please keep the source: https://bianchenghao.cn/bian-cheng-ji-chu/82713.html