
103.2.5 Sorting of Data

In a previous post, we saw Calculated Fields in R.

Sorting is a fundamental step in data analysis. A user might want to sort names in alphabetical order, or sort income data in descending order to find the highest taxpayers. For example, a bank that wants to find the highest taxpayers in a city can sort its data by Income in descending order and then select the top rows of the sorted list. In R, the built-in function order() is used for this: it returns the row positions that would put a variable in ascending (or descending) order, and those positions are then used to index the data frame. The syntax for sorting a data frame is

>Newdata <- Olddata[order(variables),]
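To see what order() actually returns, here is a minimal sketch on a small made-up data frame (the `incomes` data and its values are hypothetical, not part of the Online Retail data). order() gives back row positions rather than sorted values; indexing the rows with those positions, and leaving the slot after the comma empty so every column is kept, produces the sorted data frame.

```r
# Hypothetical toy data, used only to illustrate how order() works
incomes <- data.frame(
  Name   = c("Asha", "Ravi", "Meera", "John"),
  Income = c(52000, 87000, 31000, 87000)
)

# order() returns row positions, not the sorted values themselves
idx <- order(incomes$Income)
print(idx)

# Use those positions as row indices; the empty slot after the comma keeps all columns
incomes_sorted <- incomes[idx, ]
print(incomes_sorted)

# For a single vector, sort() returns the sorted values directly
print(sort(incomes$Income))
```

The sorted data frame keeps the original row names, which is why the Online Retail output below shows labels such as 299984 instead of 1, 2, 3. Rows with equal Income (Ravi and John) stay in their original relative order.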

Consider the following examples

  • Sorting the data in ascending order
>Online.Retail_sort<-Online_Retail[order(Online_Retail$UnitPrice),]
>head(Online.Retail_sort)


##        InvoiceNo StockCode     Description Quantity     InvoiceDate
## 299984   A563186         B Adjust bad debt        1 8/12/2011 14:51
## 299985   A563187         B Adjust bad debt        1 8/12/2011 14:52
## 623       536414     22139                       56 12/1/2010 11:52
## 1971      536545     21134                        1 12/1/2010 14:32
## 1972      536546     22145                        1 12/1/2010 14:33
## 1973      536547     37509                        1 12/1/2010 14:33
##        UnitPrice CustomerID        Country Quantity_indicator Price_Class
## 299984 -11062.06         NA United Kingdom                Low         Low
## 299985 -11062.06         NA United Kingdom                Low         Low
## 623         0.00         NA United Kingdom                Low         Low
## 1971        0.00         NA United Kingdom                Low         Low
## 1972        0.00         NA United Kingdom                Low         Low
## 1973        0.00         NA United Kingdom                Low         Low

The code above sorts the rows of Online_Retail by UnitPrice in ascending order and stores the sorted data set in Online.Retail_sort, as shown in the output. To sort in descending order, add a minus sign in front of the variable to be sorted; this works for numeric variables such as UnitPrice.

>Online.Retail_sort1<-Online_Retail[order(-Online_Retail$UnitPrice),]
>head(Online.Retail_sort1)


##        InvoiceNo StockCode Description Quantity     InvoiceDate UnitPrice
## 222682   C556445         M      Manual       -1 6/10/2011 15:31  38970.00
## 524603   C580605 AMAZONFEE  AMAZON FEE       -1 12/5/2011 11:36  17836.46
## 43703    C540117 AMAZONFEE  AMAZON FEE       -1   1/5/2011 9:55  16888.02
## 43704    C540118 AMAZONFEE  AMAZON FEE       -1   1/5/2011 9:57  16453.71
## 15017    C537630 AMAZONFEE  AMAZON FEE       -1 12/7/2010 15:04  13541.33
## 15018     537632 AMAZONFEE  AMAZON FEE        1 12/7/2010 15:08  13541.33
##        CustomerID        Country Quantity_indicator Price_Class
## 222682      15098 United Kingdom                Low        High
## 524603         NA United Kingdom                Low        High
## 43703          NA United Kingdom                Low        High
## 43704          NA United Kingdom                Low        High
## 15017          NA United Kingdom                Low        High
## 15018          NA United Kingdom                Low        High

The code above sorts the data by UnitPrice in descending order and stores the result in the new data frame Online.Retail_sort1.
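The bank example from the introduction (sort by Income in descending order, then keep the top taxpayers) can be sketched as follows. The `taxpayers` data frame and its values are hypothetical. The sketch also shows the `decreasing = TRUE` argument of order(), which is useful when the column is not numeric and the minus sign cannot be used.

```r
# Hypothetical toy data, standing in for a bank's list of taxpayers
taxpayers <- data.frame(
  Name   = c("Asha", "Ravi", "Meera", "John", "Lena"),
  Income = c(52000, 87000, 31000, 120000, 64000)
)

# Descending order with a minus sign (works for numeric columns only)
by_minus <- taxpayers[order(-taxpayers$Income), ]

# The decreasing argument of order() gives the same rows here
by_flag <- taxpayers[order(taxpayers$Income, decreasing = TRUE), ]

# The top 3 taxpayers: sort in descending order, then keep the first rows
top3 <- head(by_flag, 3)
print(top3)

# A minus sign cannot be applied to a character column:
# order(-taxpayers$Name) stops with "invalid argument to unary operator",
# so decreasing = TRUE is used instead
names_desc <- taxpayers[order(taxpayers$Name, decreasing = TRUE), ]
print(names_desc)
```

head() takes the first n rows of the already sorted data, so the choice of n is the only thing that changes when the bank wants the top 10 or top 100 instead.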

Sorting based on multiple variables

A data set can also be sorted on two variables at once.

In the code below, Country is sorted in ascending (alphabetical) order and, within each country, Quantity is sorted in descending (numeric) order. The result is stored in Online.Retail_sort2.

>Online.Retail_sort2<-Online_Retail[order(Online_Retail$Country, -Online_Retail$Quantity),]
>head(Online.Retail_sort2)

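How the second key breaks ties in the first can be seen on a small hypothetical `sales` data frame (the column names mirror Country and Quantity, but the values are made up). The sketch also shows an alternative to the minus sign: with `method = "radix"`, order() accepts one `decreasing` value per sort key, which also works when the descending key is not numeric.

```r
# Hypothetical toy data with repeated countries
sales <- data.frame(
  Country  = c("France", "Germany", "France", "Germany", "France"),
  Quantity = c(5, 12, 20, 3, 8)
)

# Country ascending, then Quantity descending within each country
sorted_mixed <- sales[order(sales$Country, -sales$Quantity), ]
print(sorted_mixed)

# Same ordering without the minus sign: with method = "radix",
# decreasing can be given one value per key (FALSE for Country, TRUE for Quantity)
sorted_radix <- sales[order(sales$Country, sales$Quantity,
                            decreasing = c(FALSE, TRUE),
                            method = "radix"), ]
print(sorted_radix)
```

One caveat with this design choice: the radix method compares character data in the C locale, so text with mixed upper- and lower-case letters may come out in a different order than with the default method.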
In the next post, we will learn about An Example of Sorting the Data.

DV Analytics

DV Data & Analytics is a leading data science, Cyber Security training and consulting firm, led by industry experts. We aim to train and prepare candidates for the most in-demand data science job opportunities in India and abroad.

© 2020. All Rights Reserved.