Python data-merging function merge() [easy to understand]


The merge() function in pandas works much like JOIN in SQL. Its parameters, with their default values, are:

merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
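The suffixes parameter only matters when both frames carry a non-key column with the same name. The sketch below uses two hypothetical frames, left and right, that share a "value" column, to show the default ('_x', '_y') renaming and a custom pair of suffixes.

```python
import pandas as pd

# Hypothetical frames: both have a non-key column called "value"
left = pd.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'value': [10, 20]})

# The default suffixes=('_x', '_y') disambiguate the overlapping column
default = pd.merge(left, right, on='key')
print(default.columns.tolist())  # ['key', 'value_x', 'value_y']

# Custom suffixes make the origin of each column explicit
custom = pd.merge(left, right, on='key', suffixes=('_left', '_right'))
print(custom.columns.tolist())  # ['key', 'value_left', 'value_right']
```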

1. The left and right join keys have the same name

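import pandas as pd
from IPython.display import display  # available in Jupyter; outside a notebook use print() instead

df1 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'b'], 'value1': range(5)})
df2 = pd.DataFrame({'key': ['a', 'c', 'c', 'c', 'c'], 'value2': range(5)})
display(df1, df2, pd.merge(df1, df2))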

df1

  key  value1
0   a       0
1   b       1
2   a       2
3   b       3
4   b       4

df2

  key  value2
0   a       0
1   c       1
2   c       2
3   c       3
4   c       4

pd.merge(df1, df2)  # joins on the column name shared by df1 and df2 (key); the default how='inner' makes this equivalent to pd.merge(df1, df2, on='key', how='inner')

  key  value1  value2
0   a       0       0
1   a       2       0
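To make the defaults described in the comment above concrete, this sketch builds the same df1 and df2 and checks that the bare call, the fully spelled-out call, and the DataFrame.merge method form all produce the same frame.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'b'], 'value1': range(5)})
df2 = pd.DataFrame({'key': ['a', 'c', 'c', 'c', 'c'], 'value2': range(5)})

implicit = pd.merge(df1, df2)                         # join column inferred from the shared name "key"
explicit = pd.merge(df1, df2, on='key', how='inner')  # same call with the defaults written out
method = df1.merge(df2, on='key')                     # DataFrame.merge method form

print(implicit.equals(explicit), implicit.equals(method))
```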

pd.merge(df1, df2, how='outer')  # full outer join: takes the union of the keys from both frames

  key  value1  value2
0   a     0.0     0.0
1   a     2.0     0.0
2   b     1.0     NaN
3   b     3.0     NaN
4   b     4.0     NaN
5   c     NaN     1.0
6   c     NaN     2.0
7   c     NaN     3.0
8   c     NaN     4.0
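When reading an outer join it helps to know which side each row came from. A minimal sketch using the indicator parameter from the signature: indicator=True adds a categorical "_merge" column whose values are left_only, right_only or both.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'b'], 'value1': range(5)})
df2 = pd.DataFrame({'key': ['a', 'c', 'c', 'c', 'c'], 'value2': range(5)})

# indicator=True appends a "_merge" column describing the origin of each row
outer = pd.merge(df1, df2, how='outer', indicator=True)
print(outer[['key', '_merge']])

# 'a' matched on both sides, 'b' exists only in df1, 'c' only in df2
counts = outer['_merge'].value_counts()
print(counts)
```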

pd.merge(df1, df2, how='left')  # left join: keep every row of the left frame and only the matching rows of the right; missing values are filled with NaN

  key  value1  value2
0   a       0     0.0
1   b       1     NaN
2   a       2     0.0
3   b       3     NaN
4   b       4     NaN
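A common use of the left join is finding the left rows that have no partner on the right (an "anti-join"). The sketch below does this with indicator=True and, for comparison, with a plain isin() filter; both keep the 'b' rows of df1.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'b'], 'value1': range(5)})
df2 = pd.DataFrame({'key': ['a', 'c', 'c', 'c', 'c'], 'value2': range(5)})

# Left join with an origin column, then keep the rows that found no match in df2
marked = pd.merge(df1, df2, how='left', indicator=True)
anti = marked.loc[marked['_merge'] == 'left_only', ['key', 'value1']]
print(anti)

# Equivalent filter without a merge: keys of df1 that never appear in df2
anti2 = df1[~df1['key'].isin(df2['key'])]
print(anti2)
```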

pd.merge(df1, df2, how='right')  # right join: keep every row of the right frame and only the matching rows of the left; missing values are filled with NaN

  key  value1  value2
0   a     0.0       0
1   a     2.0       0
2   c     NaN       1
3   c     NaN       2
4   c     NaN       3
5   c     NaN       4
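A right join is the mirror image of a left join. This sketch checks that pd.merge(df1, df2, how='right') contains the same rows as a left join with the frames swapped; both results are sorted before comparing so the check does not rely on row order, and the swapped result's columns are reordered to match.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'b'], 'value1': range(5)})
df2 = pd.DataFrame({'key': ['a', 'c', 'c', 'c', 'c'], 'value2': range(5)})

cols = ['key', 'value1', 'value2']
right_join = pd.merge(df1, df2, how='right')
# Swap the frames, use a left join, then restore the original column order
swapped = pd.merge(df2, df1, how='left')[cols]

# Sort both results so only the set of rows is compared, not their order
a = right_join.sort_values(cols).reset_index(drop=True)
b = swapped.sort_values(cols).reset_index(drop=True)
pd.testing.assert_frame_equal(a, b)  # raises AssertionError if the rows differ
print(a)
```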

2. The left and right join keys have different names

If the join-key columns have different names in the two DataFrames, specify them with left_on and right_on.

df3=pd.DataFrame({'lkey':['a','b','a','b','b'],'data1':range(5)})
df4=pd.DataFrame({'rkey':['a','c','c','c','c'],'data2':range(5)})

df3

  lkey  data1
0    a      0
1    b      1
2    a      2
3    b      3
4    b      4

df4

  rkey  data2
0    a      0
1    c      1
2    c      2
3    c      3
4    c      4

pd.merge(df3, df4, left_on='lkey', right_on='rkey')  # inner join (default how='inner')

  lkey  data1 rkey  data2
0    a      0    a      0
1    a      2    a      0
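With left_on/right_on both key columns are kept, and in an inner join they always hold the same values. A minimal sketch of two ways to end up with a single key column: drop the redundant rkey after merging, or rename it beforehand and join with on=.

```python
import pandas as pd

df3 = pd.DataFrame({'lkey': ['a', 'b', 'a', 'b', 'b'], 'data1': range(5)})
df4 = pd.DataFrame({'rkey': ['a', 'c', 'c', 'c', 'c'], 'data2': range(5)})

# lkey == rkey on every row of an inner join, so rkey can be dropped
inner = pd.merge(df3, df4, left_on='lkey', right_on='rkey').drop(columns='rkey')
print(inner)

# Alternative: rename the right key first, then join on the shared name
renamed = pd.merge(df3, df4.rename(columns={'rkey': 'lkey'}), on='lkey')
print(renamed.equals(inner))
```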

pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer')  # full outer join

  lkey  data1 rkey  data2
0    a    0.0    a    0.0
1    a    2.0    a    0.0
2    b    1.0  NaN    NaN
3    b    3.0  NaN    NaN
4    b    4.0  NaN    NaN
5  NaN    NaN    c    1.0
6  NaN    NaN    c    2.0
7  NaN    NaN    c    3.0
8  NaN    NaN    c    4.0
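In the outer join above, lkey is NaN for rows that exist only in df4 and rkey is NaN for rows that exist only in df3. A sketch of coalescing them into one key column (the name "key" is just an illustrative choice):

```python
import pandas as pd

df3 = pd.DataFrame({'lkey': ['a', 'b', 'a', 'b', 'b'], 'data1': range(5)})
df4 = pd.DataFrame({'rkey': ['a', 'c', 'c', 'c', 'c'], 'data2': range(5)})

outer = pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer')

# Take lkey where present, otherwise fall back to rkey
outer['key'] = outer['lkey'].fillna(outer['rkey'])
result = outer.drop(columns=['lkey', 'rkey'])[['key', 'data1', 'data2']]
print(result)
```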

pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='left')  # left join

  lkey  data1 rkey  data2
0    a      0    a    0.0
1    b      1  NaN    NaN
2    a      2    a    0.0
3    b      3  NaN    NaN
4    b      4  NaN    NaN
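A left join only keeps the left frame's row count when every right key is unique. The validate parameter from the signature can enforce that: validate='many_to_one' raises pandas.errors.MergeError if the right keys contain duplicates. Since rkey repeats 'c' in df4, this sketch shows the check failing, then passing after deduplicating the right frame (keeping the first row per key is an illustrative choice).

```python
import pandas as pd

df3 = pd.DataFrame({'lkey': ['a', 'b', 'a', 'b', 'b'], 'data1': range(5)})
df4 = pd.DataFrame({'rkey': ['a', 'c', 'c', 'c', 'c'], 'data2': range(5)})

raised = False
try:
    # many_to_one requires unique keys in the right frame; rkey repeats 'c'
    pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='left', validate='many_to_one')
except pd.errors.MergeError as exc:
    raised = True
    print('MergeError:', exc)

# Keep one row per right key, after which the check passes
lookup = df4.drop_duplicates(subset='rkey')
checked = pd.merge(df3, lookup, left_on='lkey', right_on='rkey', how='left', validate='many_to_one')
print(len(checked) == len(df3))
```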

pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='right')  # right join

  lkey  data1 rkey  data2
0    a    0.0    a      0
1    a    2.0    a      0
2  NaN    NaN    c      1
3  NaN    NaN    c      2
4  NaN    NaN    c      3
5  NaN    NaN    c      4

3. Using the index as the join key

import numpy as np
import pandas as pd

df5 = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=['v1', 'v2', 'v3', 'v4'])
df6 = pd.DataFrame(np.arange(12, 24, 1).reshape(3, 4), index=list('abd'), columns=['v5', 'v6', 'v7', 'v8'])

df5

   v1  v2  v3  v4
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11

df6

   v5  v6  v7  v8
a  12  13  14  15
b  16  17  18  19
d  20  21  22  23

pd.merge(df5, df6, left_index=True, right_index=True)  # join on the row index; default how='inner', so only the shared labels a and b are kept

   v1  v2  v3  v4  v5  v6  v7  v8
a   0   1   2   3  12  13  14  15
b   4   5   6   7  16  17  18  19
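For index-on-index joins, DataFrame.join is a shorthand for the same operation. Note that join defaults to how='left' while merge defaults to how='inner', so this sketch passes how='inner' to compare like with like, then shows the default left join keeping row c with NaN in v5..v8.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=['v1', 'v2', 'v3', 'v4'])
df6 = pd.DataFrame(np.arange(12, 24).reshape(3, 4), index=list('abd'), columns=['v5', 'v6', 'v7', 'v8'])

by_merge = pd.merge(df5, df6, left_index=True, right_index=True)
# DataFrame.join joins on the index by default; request 'inner' to match merge's default
by_join = df5.join(df6, how='inner')
print(by_merge.equals(by_join))

# With join's default how='left', row c is kept and the df6 columns become NaN
left_joined = df5.join(df6)
print(left_joined)
```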
If you repost this article, please credit the source: https://bianchenghao.cn/hz/125136.html