Before start our video lession please download the SQL file.
SQL Basics
Contents
- SQL
- Databases
- Sql Basics
- RMySql
- Sql Queries
- Sql Joins
What is SQL
- Structured Query Language, pronounced as “SQL”.
- Used for Creating, Accessing, Exploring, Cleaning, and Analyzing data
- SQL is a programming language for interacting and managing the databases
- SQL is relatively easy to learn
- Very few commands in whole language
- Intuitive code structure and easy syntax
Database
What is a database
- Organized collection of data
- A collection of files storing related data or a coherent collection of data
- Database contains
- Tables
- Queries
- Schemas – Data structure
- Views – Virtual tables, it doesn’t contain any data and it is dynamically calculated
 
- A single database can contain multiple tables and queries
Example Databases
- Universities: Students database, registration, grades
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions, Payroll database
- Amazon’s products database
- Banking Accounts database, all transactions
- Airlines: reservations, schedules
Tables, columns and records
- Tables contain homogeneous data records
- Contains columns and records
- Table is the combination of rows and columns, where each row is a record or a data value. Each column gives some specific information about the record.

- The table looks like the details of some products.
DBMS
Database Management System
- If we have a database, i.e., the collection of tables or data inside it, we need a software to interact with Databases.
- We need to query the database to get a particular information about the database. Like if we have an employee database, how to get the info about an employee working at a particular location?
- To perform such operations and extract the information, we need a software to interact with the database.
- DBMS is a software, designed for the definition, creation, querying, update, and administration of databases
- All information in database cannot be accessed from the database, only privileged or authorized people can access some vital information. This is managed by the DBMS.
- A database created in a DBMS system may not be compatible with other DBMS software’s
- We can use SQL language in all DBMS
- We can also use JDBC and ODBC drivers for connecting to DBMS
- RDBMS
- Relational Database Management System. A DBMS that is based on the relational model
- The tables have an identifier called primary key
- It is similar to DBMS but its a DBMS based on relational model. The tables in this will be somehow connected. There will be the primary key, and based on primary key, the tables are connected. Hence it is called as a relational database management system.
Famous RDBMS softwares are
- MySQL
- PostgreSQL
- Microsoft SQL Server
- Oracle
- IBM DB2
DDL and DML
- SQL id used for two broad types of functions
- DDL – Data Definition Language
- Create database
- Create tables
- Access rights
- Manage Authentication
 
- DML – Data manipulation language
- Insert data
- Query the data
- Analyze the data
 
In Data Analysis course we focus on “Data Manipulation Language”.
Installing Database
LAB: Installing Database
- For writing SQL queries, we need MySQL workbench or MySQL command line client installed in our system.
- Right now we do not have a database. Before we write any queries, we have to make sure that there is a database that we can access. So lets us intall the database. This need not be done always. We can create database just once and work on it.
- Now open MySQL.
- Create a user in Mysql
- We have to create a new connection with a name let’s say, “local”. The username and password for local will be root. This is for our convenience, we can use any name for user and password.
- An sql file is shared, which will automatically create a database.
- Open the sql file in a new query tab and run it. The database “classicmodels” will be automatically created.
- In the Action Output window, under schemas we can see the database created.
- We will be writing queries with respect to this database.
Joins
The Join Scenario
- All the data is not arranged in a single table
- Relevant homogenous data is stored in corresponding tables

- Order Details data has productCode
- Products data has product details for a given productCode.
- There are many tables in the database which has some connection with some other table. The information what we need may be present in homogenous tables.
- Can we attach all the matching product details(from product table) to order details?

Right Outer Join
- Resultant table will have all the order details(the right hand side table) and the matching product details (left hand side table)
- All from right table, matching from left table .
- Matching will be done based on a unique id.
- This is called right outer join
- Matching records will be populated , otherwise fields are left blank

- This will be the resultant table. It will contain all the columns from both the tables. It will contain all the products from the right table i.e., the order details table but it will contain the description of only matching products from the product details table.
Resultant table
- Number of rows in resultant table = number of rows in the right hand side table
- Number of columns in resultant table = columns of table1 + columns of table2
- Matching should be on product id.
Code Right Outer Join
        select * 
        from classicmodels.products 
        right outer join classicmodels.orderdetails 
        on products.productCode=orderdetails.productCode;- The poductCode column will be there in both the table, based on which we will find the match.

Left Outer Join
- Same logic as right outer join
- We would like to keep all the data from left hand side table and matching data from right hand side table
- The table of importance is the left side one

Code – Left outer join
        select * 
        from classicmodels.orderdetails
        left outer join classicmodels.products
        on orderdetails.productCode=products.productCode;
Inner Join
- We have two tables
- Two tables have connecting key
- We want the details of all the matching records or common records from two tables
- We match the key, if it is there in both the tables then the resultant table will have the record, if it is not present in any one of the tables, then it will be dropped.
Inner Join and outer join difference
- Left outer join contains all the records from left dataset, unconditionally
- Right outer join contains all the records from left dataset, unconditionally
- Inner join contains only matching records

Lab – Inner Join
- Take motor_cycle_table. It was a small subset of products, created by taking productLine=Motorcycle
- Inner join motor_cycle_table table and orderDetails table to see the common records.
Code – Inner Join
        select * 
        from classicmodels.orderdetails 
        inner join classicmodels.motor_cycles_table 
        On orderdetails.productCode=motor_cycles_table.productCode;
Full Outer Join
- We have two tables.
- Table-1 has some unique data that is not there in table-2
- Table-2 has some unique data that is not there in table-1
- The common data can be extracted using inner join
- How to extract over-all available data, which is there in table-1 and table-2
- Full outer join fetches all the available information
Code-Full Outer Join
Full outer join – Part-1 : Left Outer Join
    select * from classicmodels.orderdetails left outer join classicmodels.products on              orderdetails.productCode=products.productCode;Full outer join – Part-2 : Right Outer Join
        select * from classicmodels.orderdetails right outer join classicmodels.products on products.productCode=orderdetails.productCode;Then combine these tqo queries
        select * from classicmodels.orderdetails left outer join classicmodels.products on orderdetails.productCode=products.productCode
        Union
        select * from classicmodels.orderdetails right outer join classicmodels.products on products.productCode=orderdetails.productCode;
Conclusion
- In this session we started with basics of RDBMS
- This is a very basic class on SQL
- SQL has many more commands for data handling and database management
- We focused only on DML – Data manipulation language we discussed nothing about DDL – Data Definition Language
- You may want to explore advanced functionalities of SQL while you are performing basic descriptive statistics and ad hock analysis.



