Pandas: Exercise Set 1

Contents

* Exercise 1 (using the Jupyter Notebook tool)
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#jupyter_notebook__1>
* Step 1. Import the required modules
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_1__2>
* Step 2. The given raw dataset
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_2__8>
* Step 3. Create a DataFrame from the raw dataset and assign it to the variable army
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_3_DataFramearmy_22>
* Step 4. Set a column as the index: use the origin field as the index
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_4_origin_36>
* Step 5. Print all values of the veterans column
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_5_veterans_57>
* Step 6. Print all data in the 'veterans' and 'deaths' columns
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_6__veterans__deaths__77>
* Step 7. Print the column index
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_7__94>
* Step 8. Select all rows whose regiment value is not "Dragoons"
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_8__regiments_Dragoons_103>
* Step 9. Select rows 3 to 7 and columns 3 to 6
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_9__3__7__3__6__119>
* Exercise 2: Analysis of student alcohol consumption data
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#_133>
* Step 1. Import the required modules
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_1__135>
* Step 2. Load the data and assign it to the variable df
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_2_df_141>
* Step 3. Contiguous slice (select the columns from school through guardian, inclusive)
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_3_schoolguardian_156>
* Step 5. Capitalize the first letter of every value in the Mjob and Fjob columns
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_5__Mjob__Fjob_174>
* Step 6. Create a function named majority that returns a Boolean based on the age column, and add the result as a new column named legal_drinker (an age over 17 counts as legal drinking age)
<https://blog.csdn.net/wsp_1138886114/article/details/80768986#Step_6majorityage_legal_drinker_17_201>

Exercise 1 (using the Jupyter Notebook tool)

Step 1. Import the required modules

    import pandas as pd
    import numpy as np
    from pandas import Series, DataFrame

Step 2. The given raw dataset

    # Create an example dataframe about a fictional army
    raw_data = {
        'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks',
                     'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons',
                     'Scouts', 'Scouts', 'Scouts', 'Scouts'],
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st',
                    '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
        'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
        'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
        'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
        'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
        'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
        'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
        'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa',
                   'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia'],
    }

Step 3. Create a DataFrame from the raw dataset and assign it to the variable army

    army = DataFrame(raw_data)
    army.head()
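Output:

       armored  battles  company  deaths  deserters      origin  readiness    regiment  size  veterans
    0        1        5      1st     523          4     Arizona          1  Nighthawks  1045         1
    1        0       42      1st      52         24  California          2  Nighthawks   957         5
    2        1        2      2nd      25         31       Texas          3  Nighthawks  1099        62
    3        1        2      2nd     616          2     Florida          3  Nighthawks  1400        26
    4        0        4      1st      43          3       Maine          2    Dragoons  1592        73

Note: the columns appear in alphabetical order because older pandas versions sorted the dict keys. Since pandas 0.23 on Python 3.6+, the columns keep the insertion order of raw_data instead, so the column order and any position-based selection will differ from the output shown in this exercise.

Step 4. Set a column as the index: use the origin field as the index

    army1 = army.set_index(["origin"])
    army1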
Output:

                armored  battles  company  deaths  deserters  readiness    regiment  size  veterans
    origin
    Arizona           1        5      1st     523          4          1  Nighthawks  1045         1
    California        0       42      1st      52         24          2  Nighthawks   957         5
    Texas             1        2      2nd      25         31          3  Nighthawks  1099        62
    Florida           1        2      2nd     616          2          3  Nighthawks  1400        26
    Maine             0        4      1st      43          3          2    Dragoons  1592        73
    Iowa              1        7      1st     234          4          1    Dragoons  1006        37
    Alaska            0        8      2nd     523         24          2    Dragoons   987       949
    Washington        1        3      2nd      62         31          3    Dragoons   849        48
    Oregon            0        4      1st      62          2          2      Scouts   973        48
    Wyoming           0        7      1st      73          3          1      Scouts  1005       435
    Louisana          1        8      2nd      37          2          2      Scouts  1099        63
    Georgia           1        9      2nd      35          3          3      Scouts  1523       345

Step 5. Print all values of the veterans column

    army1["veterans"]

Output:

    origin
    Arizona         1
    California      5
    Texas          62
    Florida        26
    Maine          73
    Iowa           37
    Alaska        949
    Washington     48
    Oregon         48
    Wyoming       435
    Louisana       63
    Georgia       345
    Name: veterans, dtype: int64

Step 6. Print all data in the 'veterans' and 'deaths' columns

The original post omits the code for this step. Passing a list of column names produces the output shown:

    army1[["veterans", "deaths"]]
Output:

                veterans  deaths
    origin
    Arizona            1     523
    California         5      52
    Texas             62      25
    Florida           26     616
    Maine             73      43
    Iowa              37     234
    Alaska           949     523
    Washington        48      62
    Oregon            48      62
    Wyoming          435      73
    Louisana          63      37
    Georgia          345      35

Step 7. Print the column index

    army1.columns

Output:

    Index(['armored', 'battles', 'company', 'deaths', 'deserters', 'readiness',
           'regiment', 'size', 'veterans'],
          dtype='object')

Step 8. Select all rows whose regiment value is not "Dragoons"

    army1.loc[army1["regiment"] != "Dragoons"]
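Output:

                armored  battles  company  deaths  deserters  readiness    regiment  size  veterans
    origin
    Arizona           1        5      1st     523          4          1  Nighthawks  1045         1
    California        0       42      1st      52         24          2  Nighthawks   957         5
    Texas             1        2      2nd      25         31          3  Nighthawks  1099        62
    Florida           1        2      2nd     616          2          3  Nighthawks  1400        26
    Oregon            0        4      1st      62          2          2      Scouts   973        48
    Wyoming           0        7      1st      73          3          1      Scouts  1005       435
    Louisana          1        8      2nd      37          2          2      Scouts  1099        63
    Georgia           1        9      2nd      35          3          3      Scouts  1523       345

Step 9. Select rows 3 to 7 and columns 3 to 6

The original call, army1.iloc[2:6,[2,6]], returns only rows 3 to 6 and only the 3rd and 7th columns (company and regiment). Because iloc is zero-based and excludes the end position, the slice that matches the task is:

    army1.iloc[2:7, 2:6]

Output, using the column order of army1 shown in Step 4:

             company  deaths  deserters  readiness
    origin
    Texas        2nd      25         31          3
    Florida      2nd     616          2          3
    Maine        1st      43          3          2
    Iowa         1st     234          4          1
    Alaska       2nd     523         24          2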
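
The index and selection calls used in Exercise 1 can be tried on a much smaller frame. The sketch below is only an illustration: the frame `toy` and all of its values are invented for this example and are not part of the army dataset. The column order is fixed explicitly with `columns=` so that the position-based slice behaves the same on any pandas version.

```python
import pandas as pd

# Hypothetical frame, invented for illustration only.
toy = pd.DataFrame(
    {
        "origin": ["A", "B", "C", "D"],
        "regiment": ["Scouts", "Dragoons", "Scouts", "Dragoons"],
        "deaths": [10, 20, 30, 40],
        "veterans": [1, 2, 3, 4],
    },
    columns=["origin", "regiment", "deaths", "veterans"],
).set_index("origin")

# A single label returns a Series; a list of labels returns a DataFrame.
one_col = toy["veterans"]
two_cols = toy[["veterans", "deaths"]]

# A boolean mask keeps only the rows where the condition is True.
not_dragoons = toy.loc[toy["regiment"] != "Dragoons"]

# iloc slices are zero-based and half-open: 1:3 selects positions 1 and 2.
block = toy.iloc[1:3, 0:2]

print(type(one_col).__name__, type(two_cols).__name__)
print(list(not_dragoons.index))
print(block)
```

In one-based terms, "rows 3 to 7, columns 3 to 6" therefore corresponds to the zero-based slice `iloc[2:7, 2:6]`.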
Exercise 2: Analysis of student alcohol consumption data

Dataset download: https://download.csdn.net/download/wsp_1138886114/10563032

Step 1. Import the required modules

    import pandas as pd
    import numpy as np
    from pandas import Series, DataFrame

Step 2. Load the data and assign it to the variable df

    df = pd.read_csv("./datasets/Student_Alcohol.csv")
    df.head()
Output:

       school  sex  age  address  famsize  Pstatus  Medu  Fedu     Mjob      Fjob  ...  absences  G1  G2  G3
    0      GP    F   18        U      GT3        A     4     4  at_home   teacher  ...         6   5   6   6
    1      GP    F   17        U      GT3        T     1     1  at_home     other  ...         4   5   5   6
    2      GP    F   15        U      LE3        T     1     1  at_home     other  ...        10   7   8  10
    3      GP    F   15        U      GT3        T     4     2   health  services  ...         2  15  14  15
    4      GP    F   16        U      GT3        T     3     3    other     other  ...         4   6  10  10

The full DataFrame has 395 rows × 33 columns.
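
Since the CSV file itself is not reproduced here, the following sketch reads a tiny in-memory CSV instead, just to show the same loading pattern. The text in `csv_text` is invented; only the three column names are borrowed from the output above.

```python
import io

import pandas as pd

# Invented sample rows; the real data lives in ./datasets/Student_Alcohol.csv.
csv_text = """school,sex,age
GP,F,18
GP,F,17
MS,M,21
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # only the first two rows
print(sample.shape)    # (rows, columns) of the whole frame
print(sample.dtypes)   # age is parsed as an integer column
```

`head()` limits only what is displayed, while `shape` always reports the size of the whole frame.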
Step 3. Contiguous slice (select the columns from school through guardian, inclusive)

    df.iloc[:, 0:12]
Output:

         school  sex  age  address  famsize  Pstatus  Medu  Fedu      Mjob      Fjob  reason  guardian
    0        GP    F   18        U      GT3        A     4     4   at_home   teacher  course    mother
    1        GP    F   17        U      GT3        T     1     1   at_home     other  course    father
    2        GP    F   15        U      LE3        T     1     1   at_home     other   other    mother
    3        GP    F   15        U      GT3        T     4     2    health  services    home    mother
    4        GP    F   16        U      GT3        T     3     3     other     other    home    father
    ..      ...  ...  ...      ...      ...      ...   ...   ...       ...       ...     ...       ...
    391      MS    M   17        U      LE3        T     3     1  services  services  course    mother
    392      MS    M   21        R      GT3        T     1     1     other     other  course     other
    393      MS    M   18        R      LE3        T     3     2  services     other  course    mother
    394      MS    M   19        U      LE3        T     1     1     other   at_home  course    father

    395 rows × 12 columns
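
The same contiguous range can also be requested by column label rather than by position, which matches the school:guardian wording of the task more directly. A minimal sketch on an invented frame (`demo` and its values are hypothetical; only the column names come from the dataset) shows that `loc` label slices include the end label, while `iloc` position slices exclude the end position.

```python
import pandas as pd

# Hypothetical two-row frame, invented for illustration only.
demo = pd.DataFrame(
    [["GP", "F", 18, "mother", 6], ["MS", "M", 21, "other", 3]],
    columns=["school", "sex", "age", "guardian", "absences"],
)

# Label slice: both "school" and "guardian" are included.
by_label = demo.loc[:, "school":"guardian"]

# Position slice: columns 0, 1, 2, 3 (position 4 is excluded).
by_position = demo.iloc[:, 0:4]

print(list(by_label.columns))
print(by_label.equals(by_position))
```

Selecting by label keeps working if columns are later reordered or inserted before the range, whereas hard-coded positions would silently pick different columns.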
Step 5. Capitalize the first letter of every value in the Mjob and Fjob columns

    data2 = df.iloc[:, [8, 9]]        # select the "Mjob" and "Fjob" columns
    data21 = Series(data2["Mjob"])    # take each of the two columns as a Series
    data22 = Series(data2["Fjob"])
    df["Mjob"] = data21.map(lambda x: x.capitalize())  # capitalize every value in "Mjob"
    df["Fjob"] = data22.map(lambda x: x.capitalize())  # capitalize every value in "Fjob"
    df                                # inspect the result
Output:

         school  sex  age  address  famsize  Pstatus  Medu  Fedu      Mjob      Fjob  ...  absences  G1  G2  G3
    0        GP    F   18        U      GT3        A     4     4   At_home   Teacher  ...         6   5   6   6
    1        GP    F   17        U      GT3        T     1     1   At_home     Other  ...         4   5   5   6
    2        GP    F   15        U      LE3        T     1     1   At_home     Other  ...        10   7   8  10
    3        GP    F   15        U      GT3        T     4     2    Health  Services  ...         2  15  14  15
    4        GP    F   16        U      GT3        T     3     3     Other     Other  ...         4   6  10  10
    5        GP    M   16        U      LE3        T     4     3  Services     Other  ...        10  15  15  15
    ..      ...  ...  ...      ...      ...      ...   ...   ...       ...       ...  ...       ...  ..  ..  ..
    390      MS    M   20        U      LE3        A     2     2  Services  Services  ...        11   9   9   9
    391      MS    M   17        U      LE3        T     3     1  Services  Services  ...         3  14  16  16
    392      MS    M   21        R      GT3        T     1     1     Other     Other  ...         3  10   8   7
    393      MS    M   18        R      LE3        T     3     2  Services     Other  ...         0  11  12  10
    394      MS    M   19        U      LE3        T     1     1     Other   At_home  ...         5   8   9   9

    395 rows × 33 columns
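
An alternative to mapping a lambda over each column is the vectorized string accessor. The sketch below uses a small made-up frame `jobs` (the values are invented; only the column names come from the dataset). `Series.str.capitalize` upper-cases the first character and lower-cases the rest, the same behavior as Python's built-in `str.capitalize`.

```python
import pandas as pd

# Hypothetical frame, invented for illustration only.
jobs = pd.DataFrame(
    {"Mjob": ["at_home", "health"], "Fjob": ["teacher", "services"]}
)

# Apply the vectorized string method to each column in turn.
for col in ["Mjob", "Fjob"]:
    jobs[col] = jobs[col].str.capitalize()

print(jobs)
```

One practical difference: the `.str` methods pass missing values through as missing, whereas `map(lambda x: x.capitalize())` raises an error on a NaN entry.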
Step 6. Create a function named majority that returns a Boolean based on the age column, and add the result as a new column named legal_drinker (an age over 17 counts as legal drinking age)

The original lambda returned a one-element list holding a text label ("legal" or "not legal" in Chinese) instead of a Boolean. The version below returns True or False, as the task requires:

    def majority(age):
        # an age over 17 counts as legal drinking age
        return age > 17

    df["legal_drinker"] = df["age"].map(majority)
    df

Output:

         school  sex  age  address  famsize  Pstatus  Medu  Fedu      Mjob      Fjob  ...  G1  G2  G3  legal_drinker
    0        GP    F   18        U      GT3        A     4     4   At_home   Teacher  ...   5   6   6           True
    1        GP    F   17        U      GT3        T     1     1   At_home     Other  ...   5   5   6          False
    2        GP    F   15        U      LE3        T     1     1   At_home     Other  ...   7   8  10          False
    3        GP    F   15        U      GT3        T     4     2    Health  Services  ...  15  14  15          False
    4        GP    F   16        U      GT3        T     3     3     Other     Other  ...   6  10  10          False
    ..      ...  ...  ...      ...      ...      ...   ...   ...       ...       ...  ...  ..  ..  ..            ...
    391      MS    M   17        U      LE3        T     3     1  Services  Services  ...  14  16  16          False
    392      MS    M   21        R      GT3        T     1     1     Other     Other  ...  10   8   7           True
    393      MS    M   18        R      LE3        T     3     2  Services     Other  ...  11  12  10           True
    394      MS    M   19        U      LE3        T     1     1     Other   At_home  ...   8   9   9           True

    395 rows × 34 columns
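
The age test can also be applied to the whole column at once, without calling a function per row. The sketch below is a hedged illustration on an invented frame `ages`; it defines a Boolean `majority` function as the task describes and checks that the vectorized comparison produces the same column.

```python
import pandas as pd

# Hypothetical ages, invented for illustration only.
ages = pd.DataFrame({"age": [18, 17, 15, 21]})


def majority(age):
    """Return True when the age is over 17."""
    return age > 17


# Element-wise: call majority once per value.
ages["legal_drinker"] = ages["age"].map(majority)

# Vectorized: compare the whole column in a single operation.
ages["legal_vectorized"] = ages["age"] > 17

print(ages)
```

Both columns hold plain Booleans, so they can be used directly as a row filter, for example `ages[ages["legal_drinker"]]`.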
