Link to the previous post : https://course.dvanalyticsmds.com/104-2-5-subsetting-data-with-variable-filter-condition-in-python/

In the previous post we created subsets of data by condition filtering, in this post we will create the new subsets by sorting one or more column values.

Sorting the data

We will use Online retail dataset.

In [10]:

Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.head(5)

Out[10]:

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
0	536365	85123A	WHITE HANGING HEART T-LIGHT HOLDER	6	12/1/2010 8:26	2.55	17850.0	United Kingdom
1	536365	71053	WHITE METAL LANTERN	6	12/1/2010 8:26	3.39	17850.0	United Kingdom
2	536365	84406B	CREAM CUPID HEARTS COAT HANGER	8	12/1/2010 8:26	2.75	17850.0	United Kingdom
3	536365	84029G	KNITTED UNION FLAG HOT WATER BOTTLE	6	12/1/2010 8:26	3.39	17850.0	United Kingdom
4	536365	84029E	RED WOOLLY HOTTIE WHITE HEART.	6	12/1/2010 8:26	3.39	17850.0	United Kingdom

In [78]:

Online_Retail.columns.values

Out[78]:

array(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'], dtype=object)

Its ascending by default

In [11]:

Online_Retail_sort=Online_Retail.sort('UnitPrice')
Online_Retail_sort.head(5)

Out[11]:

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
299984	A563187	B	Adjust bad debt	1	8/12/2011 14:52	-11062.06	NaN	United Kingdom
299983	A563186	B	Adjust bad debt	1	8/12/2011 14:51	-11062.06	NaN	United Kingdom
40984	539750	22652	TRAVEL SEWING KIT	1	12/21/2010 15:40	0.00	NaN	United Kingdom
52217	540696	84562A	NaN	1	1/11/2011 9:14	0.00	NaN	United Kingdom
52262	540699	POST	NaN	1000	1/11/2011 9:32	0.00	NaN	United Kingdom

we can use ascending=False for descending order

In [12]:

Online_Retail_sort=Online_Retail.sort('UnitPrice',ascending=False)
Online_Retail_sort.head(5)

Out[12]:

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country
222681	C556445	M	Manual	-1	6/10/2011 15:31	38970.00	15098.0	United Kingdom
524602	C580605	AMAZONFEE	AMAZON FEE	-1	12/5/2011 11:36	17836.46	NaN	United Kingdom
43702	C540117	AMAZONFEE	AMAZON FEE	-1	1/5/2011 9:55	16888.02	NaN	United Kingdom
43703	C540118	AMAZONFEE	AMAZON FEE	-1	1/5/2011 9:57	16453.71	NaN	United Kingdom
15017	537632	AMAZONFEE	AMAZON FEE	1	12/7/2010 15:08	13541.33	NaN	United Kingdom

Practice : Sorting the data

We will use AutoDataset for this practice which we imported in our previous posts.
Sort the dataset based on length.
Sort the dataset based on length descending.

In [83]:

auto_data.columns.values

Out[83]:

array([' symboling', ' normalized-losses', ' make', ' fuel-type',
       ' aspiration', ' num-of-doors', ' body-style', ' drive-wheels',
       ' engine-location', ' wheel-base', ' length', ' width', ' height',
       ' curb-weight', ' engine-type', ' num-of-cylinders', ' engine-size',
       ' fuel-system', ' bore', ' stroke', ' compression-ratio',
       ' horsepower', ' peak-rpm', ' city-mpg', ' highway-mpg', ' price',
       'area'], dtype=object)

In [84]:

auto_data_sort=auto_data.sort( ' length')
auto_data_sort.head(5)

Out[84]:

	symboling	normalized-losses	make	fuel-type	aspiration	num-of-doors	body-style	drive-wheels	engine-location	wheel-base	…	fuel-system	bore	stroke	compression-ratio	horsepower	peak-rpm	city-mpg	highway-mpg	price	area
18	2	121	chevrolet	gas	std	two	hatchback	fwd	front	88.4	…	2bbl	2.91	3.03	9.5	48	5100	47	53	5151.0	452643.156
30	2	137	honda	gas	std	two	hatchback	fwd	front	86.6	…	1bbl	2.91	3.41	9.6	58	4800	49	54	6479.0	469388.952
31	2	137	honda	gas	std	two	hatchback	fwd	front	86.6	…	1bbl	2.91	3.41	9.2	76	6000	31	38	6855.0	469388.952
34	1	101	honda	gas	std	two	hatchback	fwd	front	93.7	…	1bbl	2.91	3.41	9.2	76	6000	30	34	7129.0	504960.000
33	1	101	honda	gas	std	two	hatchback	fwd	front	93.7	…	1bbl	2.91	3.41	9.2	76	6000	30	34	6529.0	504960.000

5 rows × 27 columns

In [85]:

auto_data_sort1=auto_data.sort( ' length',ascending=False)
auto_data_sort1.head(5)

Out[85]:

symboling	normalized-losses	make	fuel-type	aspiration	num-of-doors	body-style	drive-wheels	engine-location	wheel-base	…	fuel-system	bore	stroke	compression-ratio	horsepower	peak-rpm	city-mpg	highway-mpg	price	area
73	0	?	mercedes-benz	gas	std	four	sedan	rwd	front	120.9	…	mpfi	3.8	3.35	8.0	184	4500	14	16	40960.0	846007.659
70	-1	93	mercedes-benz	diesel	turbo	four	sedan	rwd	front	115.6	…	idi	3.58	3.64	21.5	123	4350	22	25	31600.0	817837.446
71	-1	?	mercedes-benz	gas	std	four	sedan	rwd	front	115.6	…	mpfi	3.46	3.1	8.3	155	4750	16	18	34184.0	820742.730
48	0	?	jaguar	gas	std	four	sedan	rwd	front	113.0	…	mpfi	3.63	4.17	8.1	176	4750	15	19	35550.0	733506.048
47	0	145	jaguar	gas	std	four	sedan	rwd	front	113.0	…	mpfi	3.63	4.17	8.1	176	4750	15	19	32250.0	733506.048

5 rows × 27 columns

In next post we will learn how to identify and remove duplicates in python.

Link to the next post : https://course.dvanalyticsmds.com/104-2-7-identifying-and-removing-duplicate-values-from-dataset-in-python/

11th October 2018

104.2.6 Sorting the data in python

Sorting the dataset using pandas in python

Sorting the data

Practice : Sorting the data

Dv Analytics

Dv Analytics

Dv Analytics

104.2.6 Sorting the data in python

Sorting the dataset using pandas in python

Sorting the data

Practice : Sorting the data

Related Courses

Agentic AI

Dv Analytics

Excel with AI

Dv Analytics

Deep Learning

Dv Analytics