
103.2.4 Calculated Fields in R

Making data analysis easier.

In the previous post we saw Sub Setting Example 2.

A calculated (or derived) field is another important concept in data analysis. The rows and columns of the raw data are not always sufficient for the analysis; we may also need to perform operations on them and create new fields. A new field can be added to an existing dataset with the $ operator.

Creating Calculated Fields in R

Consider the following syntax, where a new field sum is added (attached) to the dataset. Its content is the sum of two other variables in the dataset, x1 and x2.

 >dataset$sum <- dataset$x1 + dataset$x2

The average of two existing variables can be stored in a new variable in the dataset, as shown below:

 >dataset$mean <- (dataset$x1 + dataset$x2)/2
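The two assignments above can be tried on a small data frame. This is a minimal sketch; the values in x1 and x2 are made up purely for illustration.

```r
# Hypothetical toy data; x1 and x2 stand in for any two numeric columns
dataset <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# Add calculated fields with the $ operator
dataset$sum  <- dataset$x1 + dataset$x2
dataset$mean <- (dataset$x1 + dataset$x2) / 2

print(dataset)
```

Each new field has one value per row, computed from the x1 and x2 values in that row.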

However, assigning fields this way is time-consuming, as the user has to type the name of the dataset again and again. This can be avoided by attaching the dataset with attach(), after which its columns can be referred to by name alone. The new field must still be assigned with dataset$, because the attached copy is not updated when the dataset changes.

Refer to the code below:

      >attach(dataset)
      >dataset$sum <- x1 + x2
      >dataset
      >dataset$mean <- (x1 + x2)/2
      >detach(dataset)
      >dataset
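As an alternative to attach(), base R also provides with() and transform(), which avoid repeating the dataset name without changing the search path. The sketch below uses the same made-up toy data frame as an assumption; it is one possible approach, not the method used in this post.

```r
# Hypothetical toy data
dataset <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# with() evaluates an expression using the columns of the data frame
dataset$sum <- with(dataset, x1 + x2)

# transform() returns a copy of the data frame with the new column added
dataset <- transform(dataset, mean = (x1 + x2) / 2)

print(names(dataset))
```

Because nothing is attached, there is no detach() step to forget afterwards.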

Now let us do the following:

  • Get an idea of the size of each car in the Auto data.
  • Find the volume (length * width * height) of each car and store it in a new field. The code below names this field area.
    >dim(auto_data)
    >auto_data$area<-(auto_data$length)*(auto_data$width)*(auto_data$height)
    >names(auto_data)
    >dim(auto_data)

 

## [1] 205  26
##  [1] "symboling"         "normalized.losses" "make"             
##  [4] "fuel.type"         "aspiration"        "num.of.doors"     
##  [7] "body.style"        "drive.wheels"      "engine.location"  
## [10] "wheel.base"        "length"            "width"            
## [13] "height"            "curb.weight"       "engine.type"      
## [16] "num.of.cylinders"  "engine.size"       "fuel.system"      
## [19] "bore"              "stroke"            "compression.ratio"
## [22] "horsepower"        "peak.rpm"          "city.mpg"         
## [25] "highway.mpg"       "price"             "area"
## [1] 205  27

As we can see, before the new variable area was created in auto_data, the dataset had 205 rows and 26 columns; after creating it, the dataset has 205 rows and 27 columns. Hence the new variable ‘area’ has been created. Note that, despite its name, it holds the volume of the car.

Next, let us create a new variable by reducing the balance by 20%.

>bank$balance_new<-bank$balance*0.8
>names(bank)
>summary(bank$balance)
>summary(bank$balance_new)
##  [1] "Cust_num"    "age"         "job"         "marital"     "education"  
##  [6] "default"     "balance"     "housing"     "loan"        "contact"    
## [11] "day"         "month"       "duration"    "campaign"    "pdays"      
## [16] "previous"    "poutcome"    "y"           "balance_new"

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -8019      72     448    1362    1428  102100

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -6415.0    57.6   358.4  1090.0  1142.0 81700.0
  • A new variable balance_new is created, which holds the balance reduced by 20% (balance*0.8).
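The same step can be reproduced on a few made-up balances. This is a sketch under the assumption of a one-column toy data frame; the real bank dataset has many more fields.

```r
# Hypothetical toy data; the real bank dataset has many more columns
bank <- data.frame(balance = c(-100, 0, 500, 2000))

# New field holding the balance reduced by 20%
bank$balance_new <- bank$balance * 0.8

# Each summary statistic of balance_new is 0.8 times that of balance
print(summary(bank$balance))
print(summary(bank$balance_new))
```

Comparing the two summaries is a quick way to confirm that the calculated field was built as intended.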

If-Else statements

  • An If-Else statement is used when we have to place a condition on which code is run.
  • It can be written as:

if(condition)
{
Syntax1
}else
{
Syntax2
}

Consider the following example

>first_element <- c(5,6,7,8,9)
>if(first_element[] > 9)
{
  print("condition pass")
}else
{
  print("condition fail")
}

 

[1] "condition fail"
Warning message:
In if (first_element[] > 9) { :
 the condition has length > 1 and only the first element will be used

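The above code prints “condition fail”, but not because none of the elements of first_element is greater than 9: if tests a single value, so only the first element (5) is compared with 9, as the warning says. Since R 4.2.0 a condition of length greater than one is an error rather than a warning. Also note that else must be written on the same line as the closing ‘}’, as in the above example, and not on the next line.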
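To test a whole vector inside if, the comparison can be collapsed to a single value with any() or all(). The following is a minimal sketch using the same first_element vector; the variable names result and all_above_three are introduced here only for illustration.

```r
first_element <- c(5, 6, 7, 8, 9)

# any() collapses the element-wise comparison to a single TRUE/FALSE
if (any(first_element > 9)) {
  result <- "condition pass"
} else {
  result <- "condition fail"
}
print(result)

# all() is TRUE only when every element satisfies the condition
all_above_three <- all(first_element > 3)
print(all_above_three)
```

Written this way, the condition has length one, so the code runs without a warning or an error.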

If-then-Else Statement

If then Else works in a similar way to If Else, but it is a function, ifelse(), that tests every element of a vector and returns a vector of results. The syntax for If then Else is as shown below:

    Newvar<-ifelse( Condition, True Value, False Value)

In the arguments we give the condition, then the value to return when the condition is true, and then the value to return when it is false.
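A small sketch of this syntax on a made-up vector (scores and grade are hypothetical names, not part of the datasets used in this post):

```r
# Hypothetical scores used only for illustration
scores <- c(35, 72, 50, 18)

# ifelse() tests every element and returns a vector of the same length
grade <- ifelse(scores >= 50, "pass", "fail")
print(grade)
```

Here grade has one entry per score: "fail", "pass", "pass", "fail".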

Consider the following examples:

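First we check whether there are any missing values in horsepower.

>sum(is.na(auto_data$horsepower))

If there are any missing values, replace them with -1 using the ‘If-then-Else’ condition. In this dataset missing values are recorded as “?”, which is what the condition below tests for.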

>auto_data$horsepower_new<-ifelse(auto_data$horsepower=="?",-1,  auto_data$horsepower)
>auto_data$horsepower_new

 

##   [1]  7  7 22  4 10  6  6  6 17 25  3  3 13 13 13 30 30 30 36 46 46 44 44
##  [24]  4 44 44 44  4 55 20 40 49 41 49 49 49 49 54 54 54 54  3  2 50 46 46
##  [47] 56 29 29 34 44 44 44 44 44  3  3  3 16 52 52 52 52 43 52 12 47 14 14
##  [70] 14 14 23 23 31 31 28 44 44 44  4 11 55 20 20 20 55 55 11 11 45 38 45
##  [93] 45 45 45 45 45 45 45 60 60 21 21 21 25 32 25 60 59 60 59 59 59 59 59
## [116] 60 59 18 44  4 44 44 44 55 20 19 33 33 33 35 -1 -1  6  6  6  6 25 25
## [139] 45 48 48 51 51 58 51  7 51 58 51  7 42 42 42 42 42 42 46 46 39 39 46
## [162] 46 46 46 46  8  8 11 11 11 11 11 11 57 48 57 57 57 26 26 24 24 37 53
## [185] 37 53 53 44  2 56 56  6 44 55  9  9  9  9 27 27  9 25 15  5  9
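The values printed above appear to be internal factor level codes rather than actual horsepower figures, which is what ifelse() returns when the column has been read as a factor. The sketch below reproduces the effect on a made-up factor and shows one way to keep the real values by converting through character first; the sample values are assumptions for illustration only.

```r
# Hypothetical stand-in for a horsepower column that was read as a factor
horsepower <- factor(c("111", "154", "?", "102"))

# ifelse() on a factor returns the internal level codes, not the labels
codes <- ifelse(horsepower == "?", -1, horsepower)
print(codes)

# Converting through character first keeps the actual values
hp_chr <- as.character(horsepower)
horsepower_new <- as.numeric(ifelse(hp_chr == "?", "-1", hp_chr))
print(horsepower_new)
```

The exact level codes depend on how the levels are sorted, but horsepower_new holds 111, 154, -1 and 102.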

Replace the missing peak.rpm values with -1 using If then Else:

>auto_data$peak_rpm_new<-ifelse(auto_data$peak.rpm=="?",-1,auto_data$peak.rpm)
>auto_data$peak_rpm_new
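Another option is to tell R at import time that “?” means missing, using the na.strings argument of read.csv(). The sketch below reads a made-up three-row extract from a text string, since the path of the real file is not given here; csv_text and auto_sample are hypothetical names.

```r
# Hypothetical three-row extract; "?" marks a missing value as in the source data
csv_text <- "horsepower,peak.rpm
111,5000
?,?
102,5500"

# na.strings tells read.csv() which strings to read as NA
auto_sample <- read.csv(text = csv_text, na.strings = "?")

# The columns are now numeric, so is.na() finds the missing values
missing_hp <- sum(is.na(auto_sample$horsepower))
print(missing_hp)

# Replace the NA values with -1
auto_sample$horsepower_new <- ifelse(is.na(auto_sample$horsepower), -1,
                                     auto_sample$horsepower)
print(auto_sample$horsepower_new)
```

With the columns read as numbers, the is.na() check counts the missing entries and ifelse() works on the actual values.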

In the next post we will learn about Sorting of Data.
