104.2.3 Manipulating datasets in Python

Subsetting the datasets.

Link to the previous post: https://course.dvanalyticsmds.com/104-2-2-practice-working-with-datasets-in-python/

In this post, we will see how to create subsets of an imported dataset by selecting rows and columns.

Subsetting the data

  • Dataset: “./World Bank Data/GDP.csv”
In [30]:
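import pandas as pd
# The path is relative to the working directory (forward slashes also work on
# Windows); adjust it if read_csv raises FileNotFoundError.
# The file is read as ISO-8859-1 (Latin-1); it contains non-ASCII country
# names such as "São Tomé and Principe".
gdp = pd.read_csv("datasets/World Bank Data/GDP.csv", encoding="ISO-8859-1")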

gdp.shape
Out[30]:
(194, 4)
In [29]:
gdp.columns.values
Out[29]:
array(['Country_code', 'Rank', 'Country', 'GDP'], dtype=object)
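If GDP.csv is not at hand, the same read can be tried on a small in-memory stand-in. This is a minimal sketch under stated assumptions, not the course's own code: it copies the four rows from the gdp2 output in this post, encodes them as ISO-8859-1 bytes (assumed to match how the original file is stored), and builds the file path with pathlib so the separator is correct on any operating system. The folder layout is assumed to be the same as in the post.

```python
import io
from pathlib import Path

import pandas as pd

# Portable form of the path used above; assumes the same folder layout.
# Opening it needs the real file, so the path is only built here.
gdp_path = Path("datasets") / "World Bank Data" / "GDP.csv"

# In-memory stand-in for GDP.csv: the four rows from the gdp2 output in this
# post, encoded as ISO-8859-1 bytes (assumed to match the original file).
csv_bytes = (
    "Country_code,Rank,Country,GDP\n"
    "JPN,3,Japan,4601461\n"
    "RUS,10,Russian Federation,1860598\n"
    "IDN,16,Indonesia,888538\n"
    "NOR,26,Norway,499817\n"
).encode("ISO-8859-1")

sample = pd.read_csv(io.BytesIO(csv_bytes), encoding="ISO-8859-1")

print(gdp_path.name)         # GDP.csv
print(sample.shape)          # (4, 4)
print(list(sample.columns))  # ['Country_code', 'Rank', 'Country', 'GDP']
```

Once the real file is in place, pd.read_csv(gdp_path, encoding="ISO-8859-1") reads it the same way, since read_csv accepts Path objects.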
  • New dataset with selected rows
In [32]:
gdp1 = gdp.head(10)              # first 10 rows
gdp2 = gdp.iloc[[2, 9, 15, 25]]  # rows at positions 2, 9, 15 and 25
print(gdp2)
   Country_code  Rank             Country      GDP
2           JPN     3               Japan  4601461
9           RUS    10  Russian Federation  1860598
15          IDN    16           Indonesia   888538
25          NOR    26              Norway   499817
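head() and iloc[] select rows by position, not by index label. The sketch below uses the first ten Country/GDP rows from this post's outputs as a stand-in for gdp to show the difference with loc[], which looks rows up by label. On the original gdp the two agree only because its index is the default 0 to n-1; once a subset is taken, the old labels are kept.

```python
import pandas as pd

# Stand-in for gdp: the first ten Country/GDP rows shown in this post.
sample = pd.DataFrame({
    "Country": ["United States", "China", "Japan", "Germany", "United Kingdom",
                "France", "Brazil", "Italy", "India", "Russian Federation"],
    "GDP": [17419000, 10354832, 4601461, 3868291, 2988893,
            2829192, 2346076, 2141161, 2048517, 1860598],
})

top3 = sample.head(3)          # first three rows
picked = sample.iloc[[9, 2]]   # positions 9 and 2, in that order

# picked keeps the original index labels [9, 2], so position and label differ.
print(picked.iloc[0]["Country"])  # Russian Federation (first position)
print(picked.loc[2, "Country"])   # Japan (index label 2)
```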
  • New dataset by keeping selected columns
In [33]:
gdp3 = gdp[["Country", "Rank"]]  # keep only the Country and Rank columns
gdp3
Out[33]:
                            Country  Rank
0                     United States     1
1                             China     2
2                             Japan     3
3                           Germany     4
4                    United Kingdom     5
5                            France     6
6                            Brazil     7
7                             Italy     8
8                             India     9
9                Russian Federation    10
10                           Canada    11
11                        Australia    12
12                      Korea, Rep.    13
13                            Spain    14
14                           Mexico    15
15                        Indonesia    16
16                      Netherlands    17
17                           Turkey    18
18                     Saudi Arabia    19
19                      Switzerland    20
20                           Sweden    21
21                          Nigeria    22
22                           Poland    23
23                        Argentina    24
24                          Belgium    25
25                           Norway    26
26                          Austria    27
27               Iran, Islamic Rep.    28
28                         Thailand    29
29             United Arab Emirates    30
...                             ...   ...
164                        Maldives   165
165                  Faeroe Islands   166
166                         Lesotho   167
167                         Liberia   168
168                          Bhutan   169
169                      Cabo Verde   170
170        Central African Republic   171
171                          Belize   172
172                        Djibouti   173
173                      Seychelles   174
174                     Timor-Leste   175
175                       St. Lucia   176
176             Antigua and Barbuda   177
177                 Solomon Islands   178
178                   Guinea-Bissau   179
179                         Grenada   180
180                     Gambia, The   181
181             St. Kitts and Nevis   182
182                         Vanuatu   183
183                           Samoa   184
184  St. Vincent and the Grenadines   185
185                         Comoros   186
186                        Dominica   187
187                           Tonga   188
188           São Tomé and Principe   189
189           Micronesia, Fed. Sts.   190
190                           Palau   191
191                Marshall Islands   192
192                        Kiribati   193
193                          Tuvalu   194

194 rows × 2 columns
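The double brackets in gdp[["Country", "Rank"]] matter: the inner brackets are a Python list of column names, and indexing with a list returns a DataFrame. Indexing with a single name returns a one-dimensional Series instead. A minimal check, using a few rows copied from this post's outputs as a stand-in for gdp:

```python
import pandas as pd

# Stand-in for gdp: three rows copied from the outputs in this post.
sample = pd.DataFrame({
    "Country": ["United States", "China", "Japan"],
    "Rank": [1, 2, 3],
    "GDP": [17419000, 10354832, 4601461],
})

as_frame = sample[["Country", "Rank"]]  # list of names -> DataFrame
as_series = sample["Country"]           # single name -> Series
one_col_frame = sample[["Country"]]     # one-item list -> still a DataFrame

print(type(as_frame).__name__, as_frame.shape)            # DataFrame (3, 2)
print(type(as_series).__name__, as_series.shape)          # Series (3,)
print(type(one_col_frame).__name__, one_col_frame.shape)  # DataFrame (3, 1)
```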

  • New dataset with selected rows and columns
In [34]:
gdp4 = gdp[["Country", "GDP"]][0:10]  # two columns, then the rows at positions 0-9
gdp4
Out[34]:
              Country       GDP
0       United States  17419000
1               China  10354832
2               Japan   4601461
3             Germany   3868291
4      United Kingdom   2988893
5              France   2829192
6              Brazil   2346076
7               Italy   2141161
8               India   2048517
9  Russian Federation   1860598
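The chained form gdp[["Country", "GDP"]][0:10] first keeps two columns and then slices rows by position. The same subset can be written in one indexing step with iloc (positions) or loc (labels). Note that a loc slice includes its end label, so loc[0:9] covers the same ten rows as iloc[0:10]. This sketch assumes gdp has the default integer index and uses the first twelve rows shown in this post as a stand-in.

```python
import pandas as pd

# Stand-in for gdp (without Country_code): the first twelve rows in this post.
sample = pd.DataFrame({
    "Rank": list(range(1, 13)),
    "Country": ["United States", "China", "Japan", "Germany", "United Kingdom",
                "France", "Brazil", "Italy", "India", "Russian Federation",
                "Canada", "Australia"],
    "GDP": [17419000, 10354832, 4601461, 3868291, 2988893, 2829192,
            2346076, 2141161, 2048517, 1860598, 1785387, 1454675],
})

chained = sample[["Country", "GDP"]][0:10]      # form used in the post
by_position = sample.iloc[0:10, [1, 2]]         # positions; end excluded
by_label = sample.loc[0:9, ["Country", "GDP"]]  # labels; end included

print(chained.equals(by_position), chained.equals(by_label))  # True True
```

The single-step forms avoid chained indexing, which matters once you start assigning values to a subset.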

  • New dataset with selected rows, excluding some columns

In [35]:
gdp5 = gdp.drop(["Country_code"], axis=1)[0:12]  # drop Country_code, keep the first 12 rows
gdp5
Out[35]:
    Rank             Country       GDP
0      1       United States  17419000
1      2               China  10354832
2      3               Japan   4601461
3      4             Germany   3868291
4      5      United Kingdom   2988893
5      6              France   2829192
6      7              Brazil   2346076
7      8               Italy   2141161
8      9               India   2048517
9     10  Russian Federation   1860598
10    11              Canada   1785387
11    12           Australia   1454675
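drop(["Country_code"], axis=1) can also be written with the columns= keyword, which states the intent more directly. Either way, drop returns a new DataFrame and leaves the original unchanged unless inplace=True is passed. A minimal sketch, using the four rows from the gdp2 output in this post as a stand-in for gdp:

```python
import pandas as pd

# Stand-in for gdp: the four rows shown in the gdp2 output of this post.
sample = pd.DataFrame({
    "Country_code": ["JPN", "RUS", "IDN", "NOR"],
    "Rank": [3, 10, 16, 26],
    "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
    "GDP": [4601461, 1860598, 888538, 499817],
})

with_axis = sample.drop(["Country_code"], axis=1)     # form used in the post
with_keyword = sample.drop(columns=["Country_code"])  # equivalent keyword form

print(list(with_keyword.columns))      # ['Rank', 'Country', 'GDP']
print("Country_code" in sample.columns)  # True: the original is untouched
print(with_axis.equals(with_keyword))  # True
```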

The next post is a practice session on manipulating datasets in Python.
Link to the next post: https://course.dvanalyticsmds.com/104-2-4-practice-manipulating-dataset-in-python/

DV Analytics

DV Data & Analytics is a leading data science and cyber security training and consulting firm led by industry experts. We aim to train and prepare candidates for the most in-demand data science job opportunities in India and abroad.

Bangalore Center

DV Data & Analytics Bangalore Private Limited
#52, 2nd Floor,
Malleshpalya Maruthinagar Bengaluru.
Bangalore 560075
India
(+91) 9019 030 033 (+91) 8095 881 188
Email: info@dvanalyticsmds.com

Bhubaneswar Center

DV Data & Analytics Private Limited Bhubaneswar
Plot No. A/7,
Adjacent to Maharaja Cine Complex, Bhoinagar, Acharya Vihar
Bhubaneswar 751022
(+91) 8095 881 188 (+91) 8249 430 414
Email: info@dvanalyticsmds.com

© 2020. All Rights Reserved.