• No products in the cart.

104.2.6 Sorting the data in python

Sorting the dataset using pandas in python

Link to the previous post : https://course.dvanalyticsmds.com/104-2-5-subsetting-data-with-variable-filter-condition-in-python/

In the previous post we created subsets of data by condition filtering, in this post we will create the new subsets by sorting one or more column values.

Sorting the data

We will use Online retail dataset.

In [10]:
Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.head(5)
Out[10]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/2010 8:26 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 12/1/2010 8:26 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 12/1/2010 8:26 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/2010 8:26 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 12/1/2010 8:26 3.39 17850.0 United Kingdom
In [78]:
Online_Retail.columns.values
Out[78]:
array(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'], dtype=object)
  • Its ascending by default
In [11]:
Online_Retail_sort=Online_Retail.sort('UnitPrice')
Online_Retail_sort.head(5)
Out[11]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
299984 A563187 B Adjust bad debt 1 8/12/2011 14:52 -11062.06 NaN United Kingdom
299983 A563186 B Adjust bad debt 1 8/12/2011 14:51 -11062.06 NaN United Kingdom
40984 539750 22652 TRAVEL SEWING KIT 1 12/21/2010 15:40 0.00 NaN United Kingdom
52217 540696 84562A NaN 1 1/11/2011 9:14 0.00 NaN United Kingdom
52262 540699 POST NaN 1000 1/11/2011 9:32 0.00 NaN United Kingdom
  • we can use ascending=False for descending order
In [12]:
Online_Retail_sort=Online_Retail.sort('UnitPrice',ascending=False)
Online_Retail_sort.head(5)

Out[12]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
222681 C556445 M Manual -1 6/10/2011 15:31 38970.00 15098.0 United Kingdom
524602 C580605 AMAZONFEE AMAZON FEE -1 12/5/2011 11:36 17836.46 NaN United Kingdom
43702 C540117 AMAZONFEE AMAZON FEE -1 1/5/2011 9:55 16888.02 NaN United Kingdom
43703 C540118 AMAZONFEE AMAZON FEE -1 1/5/2011 9:57 16453.71 NaN United Kingdom
15017 537632 AMAZONFEE AMAZON FEE 1 12/7/2010 15:08 13541.33 NaN United Kingdom

Practice : Sorting the data

  • We will use AutoDataset for this practice which we imported in our previous posts.
  • Sort the dataset based on length.
  • Sort the dataset based on length descending.
In [83]:
auto_data.columns.values
Out[83]:
array([' symboling', ' normalized-losses', ' make', ' fuel-type',
       ' aspiration', ' num-of-doors', ' body-style', ' drive-wheels',
       ' engine-location', ' wheel-base', ' length', ' width', ' height',
       ' curb-weight', ' engine-type', ' num-of-cylinders', ' engine-size',
       ' fuel-system', ' bore', ' stroke', ' compression-ratio',
       ' horsepower', ' peak-rpm', ' city-mpg', ' highway-mpg', ' price',
       'area'], dtype=object)
In [84]:
auto_data_sort=auto_data.sort( ' length')
auto_data_sort.head(5)

Out[84]:
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg highway-mpg price area
18 2 121 chevrolet gas std two hatchback fwd front 88.4 2bbl 2.91 3.03 9.5 48 5100 47 53 5151.0 452643.156
30 2 137 honda gas std two hatchback fwd front 86.6 1bbl 2.91 3.41 9.6 58 4800 49 54 6479.0 469388.952
31 2 137 honda gas std two hatchback fwd front 86.6 1bbl 2.91 3.41 9.2 76 6000 31 38 6855.0 469388.952
34 1 101 honda gas std two hatchback fwd front 93.7 1bbl 2.91 3.41 9.2 76 6000 30 34 7129.0 504960.000
33 1 101 honda gas std two hatchback fwd front 93.7 1bbl 2.91 3.41 9.2 76 6000 30 34 6529.0 504960.000

5 rows × 27 columns

In [85]:
auto_data_sort1=auto_data.sort( ' length',ascending=False)
auto_data_sort1.head(5)

Out[85]:
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg highway-mpg price area
73 0 ? mercedes-benz gas std four sedan rwd front 120.9 mpfi 3.8 3.35 8.0 184 4500 14 16 40960.0 846007.659
70 -1 93 mercedes-benz diesel turbo four sedan rwd front 115.6 idi 3.58 3.64 21.5 123 4350 22 25 31600.0 817837.446
71 -1 ? mercedes-benz gas std four sedan rwd front 115.6 mpfi 3.46 3.1 8.3 155 4750 16 18 34184.0 820742.730
48 0 ? jaguar gas std four sedan rwd front 113.0 mpfi 3.63 4.17 8.1 176 4750 15 19 35550.0 733506.048
47 0 145 jaguar gas std four sedan rwd front 113.0 mpfi 3.63 4.17 8.1 176 4750 15 19 32250.0 733506.048

5 rows × 27 columns

In next post we will learn how to identify and remove duplicates in python.

Link to the next post : https://course.dvanalyticsmds.com/104-2-7-identifying-and-removing-duplicate-values-from-dataset-in-python/

DV Analytics

DV Data & Analytics is a leading data science,  Cyber Security training and consulting firm, led by industry experts. We are aiming to train and prepare resources to acquire the most in-demand data science job opportunities in India and abroad.

Bangalore Center

DV Data & Analytics Bangalore Private Limited
#52, 2nd Floor:
Malleshpalya Maruthinagar Bengaluru.
Bangalore 560075
India
(+91) 9019 030 033 (+91) 8095 881 188
Email: info@dvanalyticsmds.com

Bhubneshwar Center

DV Data & Analytics Private Limited Bhubaneswar
Plot No A/7 :
Adjacent to Maharaja Cine Complex, Bhoinagar, Acharya Vihar
Bhubaneswar 751022
(+91) 8095 881 188 (+91) 8249 430 414
Email: info@dvanalyticsmds.com

top
© 2020. All Rights Reserved.