
103.2.3.b Sub Setting-An Example

Learn through practice

In the previous section we saw Sub Setting Example 1.

Here we import the automobile dataset and then perform several subsetting operations on it.

  1. Import: “./Automobile Data Set/AutoDataset.csv”
  2. Create a new dataset exclusively for Toyota cars.
  3. Create a new dataset for all cars with city.mpg greater than 30 and engine.size less than 120.
  4. Create a new dataset containing only sedan cars. Keep only four variables (make, body.style, fuel.type, price) in the final dataset.
  5. Create a new dataset containing only Audi, BMW, or Porsche cars. Drop two variables (price and normalized.losses) from the resulting dataset.

Solutions

1. Import: “./Automobile Data Set/AutoDataset.csv”

# Adjust the path to wherever AutoDataset.csv is stored on your machine
auto_data <- read.csv("C:\\Amrita\\Datavedi\\Automobile Data Set\\AutoDataset.csv")
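After importing, it helps to check the column names before writing any subsetting conditions, because read.csv converts characters that are not valid in R names (such as hyphens or spaces) to dots. The sketch below illustrates this with a small inline sample; `csv_text` and `sample_data` are hypothetical names, and the three rows only stand in for the real file, whose exact header format is not shown in this post.

```r
# Hypothetical three-row sample standing in for AutoDataset.csv
csv_text <- "make,body-style,city-mpg,engine-size,price
toyota,hatchback,35,92,5348
audi,sedan,24,109,13950
bmw,sedan,23,108,16430"

# read.csv() can read from a string through the text argument
sample_data <- read.csv(text = csv_text)

# Hyphens in the header are replaced by dots (check.names = TRUE is the default)
names(sample_data)

# str() shows the type R assigned to each column
str(sample_data)
```

Running names() and str() on the real auto_data in the same way shows which column names to use in the conditions that follow.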

2. Create a new dataset exclusively for Toyota cars.

toyota_data <- subset(auto_data, make == "toyota")
head(toyota_data)
##     symboling normalized.losses   make fuel.type aspiration num.of.doors
## 151         1                87 toyota       gas        std          two
## 152         1                87 toyota       gas        std          two
## 153         1                74 toyota       gas        std         four
## 154         0                77 toyota       gas        std         four
## 155         0                81 toyota       gas        std         four
## 156         0                91 toyota       gas        std         four
##     body.style drive.wheels engine.location wheel.base length width height
## 151  hatchback          fwd           front       95.7  158.7  63.6   54.5
## 152  hatchback          fwd           front       95.7  158.7  63.6   54.5
## 153  hatchback          fwd           front       95.7  158.7  63.6   54.5
## 154      wagon          fwd           front       95.7  169.7  63.6   59.1
## 155      wagon          4wd           front       95.7  169.7  63.6   59.1
## 156      wagon          4wd           front       95.7  169.7  63.6   59.1
##     curb.weight engine.type num.of.cylinders engine.size fuel.system bore
## 151        1985         ohc             four          92        2bbl 3.05
## 152        2040         ohc             four          92        2bbl 3.05
## 153        2015         ohc             four          92        2bbl 3.05
## 154        2280         ohc             four          92        2bbl 3.05
## 155        2290         ohc             four          92        2bbl 3.05
## 156        3110         ohc             four          92        2bbl 3.05
##     stroke compression.ratio horsepower peak.rpm city.mpg highway.mpg
## 151   3.03                 9         62     4800       35          39
## 152   3.03                 9         62     4800       31          38
## 153   3.03                 9         62     4800       31          38
## 154   3.03                 9         62     4800       31          37
## 155   3.03                 9         62     4800       27          32
## 156   3.03                 9         62     4800       27          32
##     price
## 151  5348
## 152  6338
## 153  6488
## 154  6918
## 155  7898
## 156  8778
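The same row filter can also be written with bracket indexing. The sketch below compares the two on a hypothetical mini dataset (`mini` and its values are illustrative, not taken from AutoDataset.csv) and shows one difference worth knowing: subset() silently drops rows where the condition is NA, while plain logical indexing returns an all-NA row for them.

```r
# Hypothetical mini dataset with one missing make
mini <- data.frame(
  make  = c("toyota", "audi", NA, "toyota"),
  price = c(5348, 13950, 7000, 6338),
  stringsAsFactors = FALSE
)

# subset() treats an NA condition as FALSE
by_subset <- subset(mini, make == "toyota")

# Logical indexing keeps a row of NAs where the condition is NA
by_bracket <- mini[mini$make == "toyota", ]

# which() returns only the positions that are TRUE, so the NA row disappears
by_which <- mini[which(mini$make == "toyota"), ]

nrow(by_subset)   # 2
nrow(by_bracket)  # 3
nrow(by_which)    # 2
```

For interactive work subset() is usually the more convenient form; the bracket form is shown only to make clear what subset() does behind the scenes.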

 


3. Create a new dataset for all cars with city.mpg greater than 30 and engine.size less than 120.

auto_data1 <- subset(auto_data, (city.mpg > 30) & (engine.size < 120))
head(auto_data1)

##    symboling normalized.losses      make fuel.type aspiration num.of.doors
## 19         2               121 chevrolet       gas        std          two
## 20         1                98 chevrolet       gas        std          two
## 21         0                81 chevrolet       gas        std         four
## 22         1               118     dodge       gas        std          two
## 23         1               118     dodge       gas        std          two
## 25         1               148     dodge       gas        std         four
##    body.style drive.wheels engine.location wheel.base length width height
## 19  hatchback          fwd           front       88.4  141.1  60.3   53.2
## 20  hatchback          fwd           front       94.5  155.9  63.6   52.0
## 21      sedan          fwd           front       94.5  158.8  63.6   52.0
## 22  hatchback          fwd           front       93.7  157.3  63.8   50.8
## 23  hatchback          fwd           front       93.7  157.3  63.8   50.8
## 25  hatchback          fwd           front       93.7  157.3  63.8   50.6
##    curb.weight engine.type num.of.cylinders engine.size fuel.system bore
## 19        1488           l            three          61        2bbl 2.91
## 20        1874         ohc             four          90        2bbl 3.03
## 21        1909         ohc             four          90        2bbl 3.03
## 22        1876         ohc             four          90        2bbl 2.97
## 23        1876         ohc             four          90        2bbl 2.97
## 25        1967         ohc             four          90        2bbl 2.97
##    stroke compression.ratio horsepower peak.rpm city.mpg highway.mpg price
## 19   3.03              9.50         48     5100       47          53  5151
## 20   3.11              9.60         70     5400       38          43  6295
## 21   3.11              9.60         70     5400       38          43  6575
## 22   3.23              9.41         68     5500       37          41  5572
## 23   3.23              9.40         68     5500       31          38  6377
## 25   3.23              9.40         68     5500       31          38  622
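To see how the combined condition works, it can be broken into its two parts. The following is a minimal sketch on a hypothetical four-row data frame (`mini` and its values are made up for illustration). Each comparison yields one TRUE/FALSE per row, and `&` combines them element by element; note that `&&` is meant for single values and is not the right operator here.

```r
# Hypothetical mini dataset
mini <- data.frame(
  make        = c("chevrolet", "dodge", "audi", "toyota"),
  city.mpg    = c(47, 31, 24, 35),
  engine.size = c(61, 90, 109, 122),
  stringsAsFactors = FALSE
)

# One logical value per row for each condition
cond_mpg    <- mini$city.mpg > 30       # TRUE  TRUE FALSE  TRUE
cond_engine <- mini$engine.size < 120   # TRUE  TRUE  TRUE FALSE

# Element-wise AND: a row is kept only if both conditions hold
both <- cond_mpg & cond_engine          # TRUE  TRUE FALSE FALSE

# The same filter written the way it is used in this post
small_efficient <- subset(mini, (city.mpg > 30) & (engine.size < 120))

sum(both)              # number of rows that satisfy both conditions
small_efficient$make   # "chevrolet" "dodge"
```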

4. Create a new dataset containing only sedan cars. Keep only four variables (make, body.style, fuel.type, price) in the final dataset.

auto_data2 <- subset(auto_data, body.style == "sedan", select = c(make, body.style, fuel.type, price))
head(auto_data2)
##    make body.style fuel.type price
## 4  audi      sedan       gas 13950
## 5  audi      sedan       gas 17450
## 6  audi      sedan       gas 15250
## 7  audi      sedan       gas 17710
## 9  audi      sedan       gas 23875
## 11  bmw      sedan       gas 16430
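The select argument can be written in more than one way. As a rough sketch on a hypothetical data frame (`mini` and its values are illustrative only), the example below shows the form used above, the equivalent bracket indexing with column names as strings, and a column range with `:`, which picks every column between two names.

```r
# Hypothetical mini dataset
mini <- data.frame(
  make       = c("audi", "bmw", "toyota", "audi"),
  body.style = c("sedan", "sedan", "hatchback", "wagon"),
  fuel.type  = c("gas", "gas", "gas", "gas"),
  horsepower = c(102, 101, 62, 110),
  price      = c(13950, 16430, 5348, 18920),
  stringsAsFactors = FALSE
)

# Form used in this post: unquoted column names inside select
sedans_a <- subset(mini, body.style == "sedan",
                   select = c(make, body.style, fuel.type, price))

# Equivalent bracket indexing: rows first, then quoted column names
sedans_b <- mini[mini$body.style == "sedan",
                 c("make", "body.style", "fuel.type", "price")]

# A range of adjacent columns, from make through fuel.type
sedans_c <- subset(mini, body.style == "sedan", select = make:fuel.type)

names(sedans_a)
names(sedans_c)
```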



5. Create a new dataset containing only Audi, BMW, or Porsche cars. Drop two variables (price and normalized.losses) from the resulting dataset.

auto_data3 <- subset(auto_data, (make == "audi") | (make == "bmw") | (make == "porsche"), select = c(-price, -normalized.losses)) 
head(auto_data3)
##   symboling make fuel.type aspiration num.of.doors body.style drive.wheels
## 4         2 audi       gas        std         four      sedan          fwd
## 5         2 audi       gas        std         four      sedan          4wd
## 6         2 audi       gas        std          two      sedan          fwd
## 7         1 audi       gas        std         four      sedan          fwd
## 8         1 audi       gas        std         four      wagon          fwd
## 9         1 audi       gas      turbo         four      sedan          fwd
##   engine.location wheel.base length width height curb.weight engine.type
## 4           front       99.8  176.6  66.2   54.3        2337         ohc
## 5           front       99.4  176.6  66.4   54.3        2824         ohc
## 6           front       99.8  177.3  66.3   53.1        2507         ohc
## 7           front      105.8  192.7  71.4   55.7        2844         ohc
## 8           front      105.8  192.7  71.4   55.7        2954         ohc
## 9           front      105.8  192.7  71.4   55.9        3086         ohc
##   num.of.cylinders engine.size fuel.system bore stroke compression.ratio
## 4             four         109        mpfi 3.19    3.4              10.0
## 5             five         136        mpfi 3.19    3.4               8.0
## 6             five         136        mpfi 3.19    3.4               8.5
## 7             five         136        mpfi 3.19    3.4               8.5
## 8             five         136        mpfi 3.19    3.4               8.5
## 9             five         131        mpfi 3.13    3.4               8.3
##   horsepower peak.rpm city.mpg highway.mpg
## 4        102     5500       24          30
## 5        115     5500       18          22
## 6        110     5500       19          25
## 7        110     5500       19          25
## 8        110     5500       19          25
## 9        140     5500       17          20
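When the list of makes grows, chaining `|` conditions becomes hard to read. One alternative, sketched here on a hypothetical data frame (`mini`, `wanted`, and `premium` are illustrative names and the values are made up), is the `%in%` operator, which tests each row's make against a vector of allowed values. The negative select is written once around the whole vector of columns to drop.

```r
# Hypothetical mini dataset
mini <- data.frame(
  make              = c("audi", "bmw", "porsche", "toyota", "dodge"),
  normalized.losses = c(164, 192, 186, 87, 118),
  city.mpg          = c(24, 23, 19, 35, 37),
  price             = c(13950, 16430, 22018, 5348, 5572),
  stringsAsFactors = FALSE
)

# Makes to keep
wanted <- c("audi", "bmw", "porsche")

# %in% replaces (make == "audi") | (make == "bmw") | (make == "porsche");
# -c(...) drops the listed columns and keeps everything else
premium <- subset(mini, make %in% wanted,
                  select = -c(price, normalized.losses))

premium$make     # "audi" "bmw" "porsche"
names(premium)   # "make" "city.mpg"
```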

With these two examples we have learned much more about subsetting in R.

In the next post we will see Calculated Fields in R.

DV Analytics

© 2020. All Rights Reserved.