• No products in the cart.

103.2.7 Merge and Join

In previous post we saw  An Example of Handling Duplicates

In R, we have merge() function to join the data sets into one. Syntax for merge function is:

x <- merge(datasets1, dataset2, by="primary_key")

The function above merges the dataset1 and dataset2, and saves the merged data into the variable x. We need to mention the  type of join otherwise the function does inner join by default.

With a single primary key, the syntax is

 Newdata <- merge(dataone, datatwo, by = "primary_key")With composite keys or more than one key

With composite keys or more than one key we can join two datasets. The syntax is:

    Newdata <- merge(dataone, datatwo ,by=c("primary_key1","primary_key2")
  • Joining the data can happen in one of the four ways:
    • Inner Join
    • Outer Join
    • Left Outer Join
    • Right Outer JoinThe value given for the parameter “all” or a “all.x” will decide the type of join.

      Inner Join

         Newdata <- merge(dataone, datatwo ,by="primary_key", all=FALSE)

                   Outer Join

         Newdata <- merge(dataone, datatwo ,by="primary_key", all=TRUE)

                  Left Outer Join

        Newdata <- merge(dataone, datatwo ,by="primary_key", all.x=TRUE)

                 Right Outer Join

        Newdata <- merge(dataone, datatwo ,by="primary_key", all.y=TRUE)

Below is an example:

>orders <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\TV Commercial Slots Analysis\\orders.csv")
>slots <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\TV Commercial Slots Analysis\\slots.csv")

 

Inner Join

Merge by using single key “Unique_id”

>Order_Slot_Common <- merge(orders, slots, by="Unique_id", all=FALSE)
>head(Order_Slot_Common)

 

##                            Unique_id      AD_ID.x      Date     Time
## 1  SPYMYA2CBSH416510.833333333333333  SPYMYA2CBSH 1/12/2014  8:00 PM
## 2             SPYMYA60C008416520.375 SPYMYA60C008 1/13/2014  9:00 AM
## 3             SPYMYA60C008416540.375 SPYMYA60C008 1/15/2014  9:00 AM
## 4 SPYMYA60C008416550.458333333333333 SPYMYA60C008 1/16/2014 11:00 AM
## 5             SPYMYA60C008416560.375 SPYMYA60C008 1/17/2014  9:00 AM
## 6 SPYMYA60C008416570.666666666666667 SPYMYA60C008 1/18/2014  4:00 PM
##   Client.Product.Code.x Product.ID.x Network.ID.x Length.x Orders
## 1                MSDR02       SPYMYA          CBS      120      3
## 2                MSDR02       SPYMYA       TBKIDS       60      1
## 3                MSDR02       SPYMYA       TBKIDS       60      1
## 4                MSDR02       SPYMYA       TBKIDS       60      1
## 5                MSDR02       SPYMYA       TBKIDS       60      1
## 6                MSDR02       SPYMYA       TBKIDS       60      1
##        AD_ID.y  Air.Date Air.Time Time.Zone Client.Product.Code.y
## 1  SPYMYA2CBSH 1/12/2014  8:00 PM       EST                MSDR02
## 2 SPYMYA60C008 1/13/2014  9:00 AM       EST                MSDR02
## 3 SPYMYA60C008 1/15/2014  9:00 AM       EST                MSDR02
## 4 SPYMYA60C008 1/16/2014 11:00 AM       EST                MSDR02
## 5 SPYMYA60C008 1/17/2014  9:00 AM       EST                MSDR02
## 6 SPYMYA60C008 1/18/2014  4:00 PM       EST                MSDR02
##   Product.ID.y Length.y Network.ID.y  Network.Name      Network.Type
## 1       SPYMYA      120          CBS           CBS Broadcast Station
## 2       SPYMYA       60       TBKIDS The Band Kids             Cable
## 3       SPYMYA       60       TBKIDS The Band Kids             Cable
## 4       SPYMYA       60       TBKIDS The Band Kids             Cable
## 5       SPYMYA       60       TBKIDS The Band Kids             Cable
## 6       SPYMYA       60       TBKIDS The Band Kids             Cable
##   Network.DMA Network.Feed HH.Impressions HH.Ratings Spot.Cost
## 1    National                       18418      15.91      4000
## 2    Regional                          NA         NA      1000
## 3    Regional                          NA         NA      1000
## 4    Regional                          NA         NA      1000
## 5    Regional                          NA         NA      1050
## 6    Regional                          NA         NA      1000
##   Spot.Cost.Discounted
## 1               3800.0
## 2                950.0
## 3                950.0
## 4                950.0
## 5                997.5
## 6                950.0

Left Outer Join

Merge by using single key “Unique_id”

>Order_all_Slot_match<-merge(orders, slots, by="Unique_id", all.x=TRUE)
>head(Order_all_Slot_match)

OUTPUT:

##                            Unique_id      AD_ID.x      Date     Time
## 1             MYSPYR60A002416540.375 MYSPYR60A002 1/15/2014  9:00 AM
## 2 MYSPYR60A002416620.708333333333333 MYSPYR60A002 1/23/2014  5:00 PM
## 3 MYSPYR60A002416630.416666666666667 MYSPYR60A002 1/24/2014 10:00 AM
## 4             MYSPYR60A002416640.625 MYSPYR60A002 1/25/2014  3:00 PM
## 5 MYSPYR60A004416500.666666666666667 MYSPYR60A004 1/11/2014  4:00 PM
## 6 MYSPYR60A004416520.916666666666667 MYSPYR60A004 1/13/2014 10:00 PM
##   Client.Product.Code.x Product.ID.x Network.ID.x Length.x Orders AD_ID.y
## 1               MSDR02R       MYSPYR          ESQ       60      1    <NA>
## 2               MSDR02R       MYSPYR          ESQ       60      1    <NA>
## 3               MSDR02R       MYSPYR          ESQ       60      1    <NA>
## 4               MSDR02R       MYSPYR          ESQ       60      1    <NA>
## 5               MSDR02R       MYSPYR         NHLN       60      1    <NA>
## 6               MSDR02R       MYSPYR         NHLN       60      1    <NA>
##   Air.Date Air.Time Time.Zone Client.Product.Code.y Product.ID.y Length.y
## 1     <NA>     <NA>      <NA>                  <NA>         <NA>       NA
## 2     <NA>     <NA>      <NA>                  <NA>         <NA>       NA
## 3     <NA>     <NA>      <NA>                  <NA>         <NA>       NA
## 4     <NA>     <NA>      <NA>                  <NA>         <NA>       NA
## 5     <NA>     <NA>      <NA>                  <NA>         <NA>       NA
## 6     <NA>     <NA>      <NA>                  <NA>         <NA>       NA
##   Network.ID.y Network.Name Network.Type Network.DMA Network.Feed
## 1         <NA>         <NA>         <NA>        <NA>         <NA>
## 2         <NA>         <NA>         <NA>        <NA>         <NA>
## 3         <NA>         <NA>         <NA>        <NA>         <NA>
## 4         <NA>         <NA>         <NA>        <NA>         <NA>
## 5         <NA>         <NA>         <NA>        <NA>         <NA>
## 6         <NA>         <NA>         <NA>        <NA>         <NA>
##   HH.Impressions HH.Ratings Spot.Cost Spot.Cost.Discounted
## 1             NA         NA        NA                   NA
## 2             NA         NA        NA                   NA
## 3             NA         NA        NA                   NA
## 4             NA         NA        NA                   NA
## 5             NA         NA        NA                   NA
## 6             NA         NA        NA                   NA

 

Right Outer Join

Merge by using single key “Unique_id”

>Slot_all_order_match<-merge(orders, slots, by=”Unique_id”, all.y=TRUE)
>head(Slot_all_order_match)

 

##                            Unique_id      AD_ID.x      Date     Time
## 1  SPYMYA2CBSH416510.833333333333333  SPYMYA2CBSH 1/12/2014  8:00 PM
## 2             SPYMYA60C008416520.375 SPYMYA60C008 1/13/2014  9:00 AM
## 3             SPYMYA60C008416540.375 SPYMYA60C008 1/15/2014  9:00 AM
## 4 SPYMYA60C008416550.458333333333333 SPYMYA60C008 1/16/2014 11:00 AM
## 5             SPYMYA60C008416560.375 SPYMYA60C008 1/17/2014  9:00 AM
## 6 SPYMYA60C008416570.666666666666667 SPYMYA60C008 1/18/2014  4:00 PM
##   Client.Product.Code.x Product.ID.x Network.ID.x Length.x Orders
## 1                MSDR02       SPYMYA          CBS      120      3
## 2                MSDR02       SPYMYA       TBKIDS       60      1
## 3                MSDR02       SPYMYA       TBKIDS       60      1
## 4                MSDR02       SPYMYA       TBKIDS       60      1
## 5                MSDR02       SPYMYA       TBKIDS       60      1
## 6                MSDR02       SPYMYA       TBKIDS       60      1
##        AD_ID.y  Air.Date Air.Time Time.Zone Client.Product.Code.y
## 1  SPYMYA2CBSH 1/12/2014  8:00 PM       EST                MSDR02
## 2 SPYMYA60C008 1/13/2014  9:00 AM       EST                MSDR02
## 3 SPYMYA60C008 1/15/2014  9:00 AM       EST                MSDR02
## 4 SPYMYA60C008 1/16/2014 11:00 AM       EST                MSDR02
## 5 SPYMYA60C008 1/17/2014  9:00 AM       EST                MSDR02
## 6 SPYMYA60C008 1/18/2014  4:00 PM       EST                MSDR02
##   Product.ID.y Length.y Network.ID.y  Network.Name      Network.Type
## 1       SPYMYA      120          CBS           CBS Broadcast Station
## 2       SPYMYA       60       TBKIDS The Band Kids             Cable
## 3       SPYMYA       60       TBKIDS The Band Kids             Cable
## 4       SPYMYA       60       TBKIDS The Band Kids             Cable
## 5       SPYMYA       60       TBKIDS The Band Kids             Cable
## 6       SPYMYA       60       TBKIDS The Band Kids             Cable
##   Network.DMA Network.Feed HH.Impressions HH.Ratings Spot.Cost
## 1    National                       18418      15.91      4000
## 2    Regional                          NA         NA      1000
## 3    Regional                          NA         NA      1000
## 4    Regional                          NA         NA      1000
## 5    Regional                          NA         NA      1050
## 6    Regional                          NA         NA      1000
##   Spot.Cost.Discounted
## 1               3800.0
## 2                950.0
## 3                950.0
## 4                950.0
## 5                997.5
## 6                950.0

Outer Join

Merge by using single key “Unique_id”

>Order_Slots_union<- merge(orders, slots, by="Unique_id", all=TRUE)
>head(Order_Slots_union)

 

Order_Slots_union<- merge(orders, slots, by="Unique_id", all=TRUE)
head(Order_Slots_union)

In the next post we will see An Example of Merge and Join.

11th October 2018

DV Analytics

DV Data & Analytics is a leading data science,  Cyber Security training and consulting firm, led by industry experts. We are aiming to train and prepare resources to acquire the most in-demand data science job opportunities in India and abroad.

Bangalore Center

DV Data & Analytics Bangalore Private Limited
#52, 2nd Floor:
Malleshpalya Maruthinagar Bengaluru.
Bangalore 560075
India
(+91) 9019 030 033 (+91) 8095 881 188
Email: info@dvanalyticsmds.com

Bhubneshwar Center

DV Data & Analytics Private Limited Bhubaneswar
Plot No A/7 :
Adjacent to Maharaja Cine Complex, Bhoinagar, Acharya Vihar
Bhubaneswar 751022
(+91) 8095 881 188 (+91) 8249 430 414
Email: info@dvanalyticsmds.com

top
© 2020. All Rights Reserved.