Python Basics: The Standard Library and Commonly Used Third-Party Libraries

The Python Standard Library

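Name	Purpose
datetime	Provides both simple and complex ways of manipulating dates and times
zlib	Compression and decompression in the zlib format (compatible with gzip); the related standard modules gzip, bz2, zipfile and tarfile directly support other common archiving and compression formats
random	Tools for generating random numbers and making random selections
math	Access to the underlying C library functions for floating-point math
sys	Interpreter-related variables and functions; utility scripts often need command-line arguments, which are stored as a list in the module's argv variable
glob	Provides a function for building file lists from directory wildcard searches
os	Functions for interacting with the operating system
urllib	Opening and reading URLs, for example fetching a web page's source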
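The table names random, math, glob and os without showing them in use. The following is a minimal sketch of one typical call from each; the temporary directory and the file names a.txt, b.txt and c.log are made up for illustration.

```python
import glob
import math
import os
import random
import tempfile

# random: seeding makes the sequence reproducible between runs
random.seed(42)
dice = random.randint(1, 6)                         # an integer from 1 to 6 inclusive
fruit = random.choice(['apple', 'pear', 'banana'])  # one element picked at random
lottery = random.sample(range(1, 50), 6)            # 6 distinct numbers from 1..49

# math: floating-point functions backed by the C math library
hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)  # 5.0
minus_one = math.cos(math.pi)            # -1.0, up to floating-point rounding

# os + glob: create some hypothetical files in a temporary directory,
# then list them with os.listdir and filter them with a glob wildcard
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.txt', 'b.txt', 'c.log'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('demo')
    all_files = sorted(os.listdir(tmp))
    txt_files = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(tmp, '*.txt')))

print(dice, fruit, lottery)
print(hypotenuse, minus_one)
print(all_files, txt_files)
```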

Commonly Used Third-Party Python Libraries

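Name	Purpose	Reference
Scrapy	A widely used framework for web crawling and scraping	Setting Up a Scrapy Environment for Python Web Crawling (简言, CSDN blog)
Requests	HTTP library	Commonly used in Python for API testing and for scraping data
Pillow	A fork of PIL (the Python Imaging Library); suited to anyone working with images	Image Processing in Python: Using the PIL Library (简言, CSDN blog)
matplotlib	A library for plotting data; very useful for data scientists and analysts
OpenCV	A library commonly used for image recognition; often used when practicing face recognition	What OpenCV Does and How to Install It (简言, CSDN blog)
pytesseract	Recognizing text in images, i.e. OCR (a wrapper around the Tesseract OCR engine)	OCR in Python: pytesseract (简言, CSDN blog)
jira	Working with Jira: querying Jira information and performing operations on it	Querying Jira Issue Information with Python (简言, CSDN blog)
python-jenkins	Working with Jenkins	Operating Jenkins and Batch Deployment with Python (简言, CSDN blog)
python-gitlab	Querying GitLab information	Working with Git Using gitpython and python-gitlab (简言, CSDN blog)
wxPython	A GUI (graphical user interface) toolkit for Python
Twisted	An event-driven networking engine; an important tool for network application developers
SymPy	Symbolic mathematics: algebraic evaluation, differentiation, expansion, complex numbers, and more
SQLAlchemy	A database toolkit and object-relational mapper (ORM)
SciPy	Algorithms and mathematical tools for scientific computing in Python
Scapy	A library for packet sniffing and analysis
pywin32	Methods and classes for interacting with Windows from Python
PyQt	A Python GUI toolkit; the second choice after wxPython when building user interfaces for Python scripts
PyGTK	Another Python GUI library
Pyglet	A library for 3D animation and game development
Pygame	Works well for developing 2D games
NumPy	Array computing and many advanced mathematical functions for Python
nose	A Python testing framework (in maintenance mode; new projects generally use pytest or unittest)
nltk	Natural Language Toolkit, for natural language processing
IPython	An enhanced interactive Python shell with tab completion, history, shell features, and much more
BeautifulSoup	A library for parsing HTML and XML; very friendly to beginners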

Standard Library Usage Examples

datetime:

Provides methods for handling dates and times.

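from datetime import date  # import the date class from the datetime module

now = date.today()  # today's date
print(now)
birthday = date(1987, 12, 3)
print(birthday)
age = now - birthday  # "age" = today's date minus the birthday; the result is a timedelta in days
print(age)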

Output (this run was on 2019-05-04; date.today() returns the current date, so your values will differ):

2019-05-04
1987-12-03
11475 days, 0:00:00
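Subtracting two date objects gives a datetime.timedelta, which counts days rather than years, so 11475 days is not directly an age. A minimal sketch of turning it into whole years; age_in_years is a hypothetical helper, and today is fixed to the date in the output above so the result is reproducible.

```python
from datetime import date


def age_in_years(birthday, today):
    """Hypothetical helper: whole years elapsed between birthday and today."""
    years = today.year - birthday.year
    # subtract one if this year's birthday has not happened yet
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


birthday = date(1987, 12, 3)
today = date(2019, 5, 4)      # fixed to the date of the run shown above
delta = today - birthday      # a datetime.timedelta

print(delta.days)                     # 11475
print(age_in_years(birthday, today))  # 31
print(today.strftime('%Y/%m/%d'))     # 2019/05/04
```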

zlib:

Provides compression and decompression.

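import zlib

m = b'This is a test compress'  # zlib works on bytes, not str
print(m)
m1 = len(m)  # length of the original data
print(m1)
t = zlib.compress(m)  # t holds the compressed data
t1 = len(t)  # length of the compressed data
print(t)
print(t1)
s = zlib.decompress(t)  # s holds the decompressed data
print(s)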

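Output (for such a short input the compressed result is actually longer than the original: zlib adds a 2-byte header and a 4-byte Adler-32 checksum, and 23 bytes of mostly non-repeating text leave little to compress):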

b'This is a test compress'
23
b'x\x9c\x0b\xc9\xc8,V\x00\xa2D\x85\x92\xd4\xe2\x12\x85\xe4\xfc\xdc\x82\xa2\xd4\xe2b\x00ah\x08\x82'
29
b'This is a test compress'
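Whether compression pays off depends on how redundant the data is. A minimal sketch comparing the short message above with a made-up, highly repetitive input; the exact compressed sizes depend on the zlib version, so only the direction of the change is meant to be read from it.

```python
import zlib

samples = {
    'short': b'This is a test compress',
    'repetitive': b'abc' * 1000,  # 3000 bytes of highly redundant, made-up data
}

sizes = {}
for label, data in samples.items():
    packed = zlib.compress(data)
    assert zlib.decompress(packed) == data  # lossless round trip
    sizes[label] = (len(data), len(packed))
    print(label, len(data), '->', len(packed))
```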

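Compressing strings with Python's zlib

Functions

Strings: zlib.compress compresses a string (a bytes object in Python 3) and zlib.decompress decompresses it.
Data streams: use compressobj to compress and decompressobj to decompress.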

Example

>>> import zlib
>>> s = b'slfsjdalfkasflkkdkaleeeeeeeeeeeeeeeeeeeeeeeeeeeelaaalkllfksaklfasdll  kkkkkk123'
>>> zlib_s = zlib.compress(s)
>>> zlib_s
b'x\x9c}\xca\xb1\r\xc0 \x10\x04\xc1Vh\xc1\xb8\xa2\x93\x9e\x0f|\x9b]\xff\x92\x11\x050\xf1\x84\xceW\xa2\xad4vY\xac\x0b$a\xf6\x8fL+\x05c\xf8x\xe6\xfb\x03\xf7\x97\x1e\xd1'
>>> len(s)
79
>>> len(zlib_s)
55
>>> ss = zlib.decompress(zlib_s)
>>> ss
b'slfsjdalfkasflkkdkaleeeeeeeeeeeeeeeeeeeeeeeeeeeelaaalkllfksaklfasdll  kkkkkk123'
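The compressobj/decompressobj stream interface can also be shown without files. A minimal sketch in which compress_chunks and decompress_chunks are hypothetical helper names, and the input is split into arbitrary, made-up chunk sizes to imitate data arriving piece by piece.

```python
import zlib


def compress_chunks(chunks, level=-1):
    """Hypothetical helper: compress an iterable of byte chunks as one stream."""
    compressor = zlib.compressobj(level)
    out = [compressor.compress(chunk) for chunk in chunks]
    out.append(compressor.flush())  # emit whatever is still buffered
    return b''.join(out)


def decompress_chunks(chunks):
    """Hypothetical helper: decompress a stream delivered in arbitrary pieces."""
    decompressor = zlib.decompressobj()
    out = [decompressor.decompress(chunk) for chunk in chunks]
    out.append(decompressor.flush())
    return b''.join(out)


data = b'slfsjdalfkasflkk' * 50
chunks = [data[i:i + 64] for i in range(0, len(data), 64)]
packed = compress_chunks(chunks)

# the compressed stream can itself be fed back in pieces of any size
packed_chunks = [packed[i:i + 10] for i in range(0, len(packed), 10)]
restored = decompress_chunks(packed_chunks)
print(len(data), len(packed), restored == data)
```

Because compressobj uses the zlib format by default, its output can also be read back with a single zlib.decompress call.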

Compressing and decompressing files

import zlib

CHUNK_SIZE = 1024  # read and write the files 1 KiB at a time


def compress(src_path, dst_path, level=9):
    """Compress src_path into dst_path without loading the whole file into memory."""
    compressor = zlib.compressobj(level)
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        data = src.read(CHUNK_SIZE)
        while data:
            dst.write(compressor.compress(data))
            data = src.read(CHUNK_SIZE)
        # flush() returns whatever is still buffered inside the compressor
        dst.write(compressor.flush())


def decompress(src_path, dst_path):
    """Decompress a file produced by compress() into dst_path."""
    decompressor = zlib.decompressobj()
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        data = src.read(CHUNK_SIZE)
        while data:
            dst.write(decompressor.decompress(data))
            data = src.read(CHUNK_SIZE)
        dst.write(decompressor.flush())


if __name__ == "__main__":
    compress("1.txt", "1.zlib.txt")
    decompress("1.zlib.txt", "2.txt")
    print("done~")

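Note: compressobj returns a compression object, used to compress data streams that cannot be read into memory all at once. level sets the compression level and ranges from -1 to 9: 1 is the fastest but compresses least, 9 is the slowest but compresses most, and 0 means no compression. The default, -1 (Z_DEFAULT_COMPRESSION), currently corresponds to level 6, a compromise between speed and compression ratio.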
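The effect of level can be checked directly. A minimal sketch on made-up, repetitive input; the exact sizes depend on the data and the zlib version, but level 0 only stores the data (so its output is slightly larger than the input), while the other levels trade speed for smaller output.

```python
import zlib

data = b'Python zlib compression level example. ' * 200

sizes = {}
for level in (0, 1, 6, 9, -1):
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data  # every level is lossless
    sizes[level] = len(packed)
    print('level', level, '->', len(packed), 'bytes')
```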

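sys:

Provides access to interpreter variables such as the command-line arguments (sys.argv). sys.path, the list of directories Python searches when importing modules and packages, is often printed to see where the installed libraries live.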

import sys

a = sys.path  # a is the module search path, a list of directory strings
print(a)

Output (the exact paths depend on your machine and environment):

['/Users/alice/PycharmProjects/untitled', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nose-1.3.7-py2.7.egg', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tornado-5.0.2-py2.7-macosx-10.13-intel.egg', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/backports_abc-0.5-py2.7.egg', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/singledispatch-3.4.0.3-py2.7.egg', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/futures-3.2.0-py2.7.egg', '/Users/alice/PycharmProjects/untitled', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python37.zip', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload', '/Users/alice/venv/untitled/lib/python3.7/site-packages', '/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages', '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python', '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/PyObjC']
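The sys paragraph also mentions command-line arguments. A minimal sketch of reading them from sys.argv; greet.py and main are hypothetical names, and main accepts an optional list so it can be exercised without a real command line.

```python
import sys


def main(argv=None):
    """Hypothetical entry point: return one greeting line per argument."""
    # sys.argv[0] is the script path; the user's arguments start at index 1
    args = sys.argv[1:] if argv is None else argv
    if not args:
        return ['usage: python greet.py NAME [NAME ...]']
    return ['Hello, ' + name for name in args]


if __name__ == '__main__':
    for line in main():
        print(line)
```

Saved as greet.py, running python greet.py alice bob would print Hello, alice and Hello, bob on separate lines.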

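urllib:

Approach:

Use urllib to fetch the page source.
Use open to open a local file and write the data to it.

Read the file back and print its content (the title and links are extracted with regular expressions).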
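Extracting the title and links with regular expressions works for simple pages but is fragile (upper-case attribute names, single-quoted values, and so on). As an alternative for the last step, here is a minimal sketch that uses the standard library's html.parser instead of re; TitleLinkParser is a hypothetical class name and sample_html is made-up markup standing in for the downloaded page.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Hypothetical parser: collect the <title> text and all href values."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased, quotes already stripped
        if tag == 'title':
            self._in_title = True
        for name, value in attrs:
            if name == 'href' and value is not None:
                self.hrefs.append(value)

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


sample_html = """<html><head><title>alice_tl blog</title>
<link rel="stylesheet" href="/style.css"></head>
<body><a href='https://example.com/a'>A</a> <a HREF="/b">B</a></body></html>"""

parser = TitleLinkParser()
parser.feed(sample_html)
parser.close()
print('title =', parser.title)
print('Allhref =', parser.hrefs)
```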

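import re
import urllib.request

url = 'https://blog.csdn.net/alice_tl'

# Part 1: fetch the page source (as bytes)
with urllib.request.urlopen(url) as wp:
    file_content = wp.read()
# assumes the page is UTF-8 encoded; undecodable bytes are replaced
print(file_content.decode('utf-8', errors='replace'))

# Part 2: save the page content to a file
with open('alice.txt', 'wb') as fp:  # open a file in binary mode
    fp.write(file_content)           # write the data; "with" closes the file

# Part 3: read the file back and print parts of it using regular expressions
with open('alice.txt', 'rb') as fp:
    content = fp.read().decode('utf-8', errors='replace')

match = re.search(r'<title>(.*?)</title>', content, re.S)
title = match.group(1) if match else ''
print('title = ', title + '\n')

href_pattern = r'href="(.*?)"'
hrefs = re.findall(href_pattern, content, re.S)  # list of every value matching the pattern

print('Allhref = ', hrefs)

for h in hrefs:
    print(h)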

The resulting content of alice.txt is as follows: