SQL Basics
Contents
- SQL
- Databases
- Sql Basics
- RMySql
- Sql Queries
- Sql Joins
What is SQL
- Structured Query Language, pronounced as “S-Q-L”
- Used for Creating, Accessing, Exploring, Cleaning, and Analyzing data
- SQL is a programming language for interacting and managing the databases
- SQL is relatively easy to learn
- Very few commands in whole language
- Intuitive code structure and easy syntax
Database
What is a database
- Organized collection of data
- A collection of files storing related data or a coherent collection of data
- Database contains
- Tables
- Queries
- Schemas – Data structure
- Views – Virtual tables. doesn’t contain any data. dynamically calculated
- A single database can contain multiple tables and queries
Example Databases
- Universities: Students database, registration, grades
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions, Payroll database
- Amazon’s products database
- Banking Accounts database, all transactions
- Airlines: reservations, schedules
Tables, columns and records
- Tables contain homogeneous data records
- Contains columns and records

DBMS
Database Management System
- A software to interact with Databases
- A software designed for the definition, creation, querying, update, and administration of databases
- A database created in a DBMS system may not be compatible with other DBMS software’s
- We can use SQL language in all DBMS
- We can also use JDBC and ODBC drivers for connecting to DBMS
- RDBMS
- Relational Database Management System. A DBMS that is based on the relational model
- The tables have an identifier called primary key
Famous RDBMS software’s are
- MySQL
- PostgreSQL
- Microsoft SQL Server
- Oracle
- IBM DB2
DDL and DML
- SQL id used for two broad types of functions
- DDL – Data Definition Language
- Create database
- Create tables
- Access rights
- Manage Authentication
- DML – Data manipulation language
- Insert data
- Query the data
- Analyze the data
- DDL – Data Definition Language
In Data Analysis course we focus on “Data Manipulation Language”
Installing Database
LAB: Installing Database
- Create a user in Mysql
- Execute the classic models sql file to create classic models database.
SQL Queries
The sql Mantra
Select * from table
- How do fetch all the values from a particular table?
Select * from table
- To select all the columns from a particular table
- Get the product table from classic models database
select * from table_name select * from classicmodels.products;

How do fetch a particular column from a particular table?
Select col_name from table
- To select particular column from a particular table
- Get the product names column from products table in classic models database
select col_name from table_name select productName from classicmodels.products;

How do fetch some specific columns from a particular table?
Select col_name1, col_name2 from table
- To select multiple column from a particular table
- Get the product names and MSRP columns from products table in classic models database
select col_name1, col_name2 from table_name select productName,MSRP from classicmodels.products;

Can you assign alias while fetching the columns?
Select col_name as alias1 from table
- To select particular column and assign alias
- Get the product names as model_name from products table in classic models database
select col_name as alias1 from table_name select productName as model_name from classicmodels.products;

Select col_name as alias1 col_name2 as alais2 from table
- To select multiple columns and assign relevant alias
- Get the product names and MSRP columns from products table in classic models database. Give the alias model_name and final_price
select col_name as alaias1 from table_name select productName as model_name,MSRP as final_price from classicmodels.products

How do you fetch data with condition?
Select * from table where condition
- To select a table with condition
- Get the product table from classic models database select only records with product line “Motorcycle”
select * from table_name WHERE colname=“value” select * from classicmodels.products WHERE productLine='Motorcycles';

- To select a table with condition
- Get the product table from classic models database select only records with MSRP more than 100
select * from table_name WHERE colname>xxx; select * from classicmodels.products WHERE MSRP>100;How do you fetch data with multiple conditions?
Select * from table WHERE condition AND
- To select a table with WHERE conditions and “AND” operator
- Get the product table from classic models database select only records with conditions product line=“Vintage Cars” and MSRP is more than 100
select * from table_name WHERE condition1 and condition2 select * from classicmodels.products WHERE productLine='Vintage Cars' and MSRP>100;

Select * from table WHERE condition OR
- To select a table with WHERE conditions and “OR” operator
- Get the product table from classic models database select only records with conditions quantityInStock>4000 OR MSRP>150
select * from table_name WHERE condition1 OR condition2 select * from classicmodels.products WHERE quantityInStock>4000 OR MSRP>150;
Select * from table WHERE condition IN
- To select a table with WHERE condition and “IN” operator
- Get the product table from classic models database select only records where product line takes values “classic cars” or “Vintage cars” or “Trucks and Busses”
select * from table_name WHERE col_name IN (val1, val2) select * from classicmodels.products WHERE productLine IN ('Classic Cars','Vintage Cars', 'Trucks and Buses');

Can we sort the data based on a column?
Select * from table ORDER BY
- To select a table and sort it based on a column
- Get the product table from classic models database make sure that the resultant table is sorted based on MSRP
select * from table_name ORDER BY col_name select * from classicmodels.products ORDER BY MSRP;

Select * from table ORDER BY descending
- To select a table and sort it based on a column in descending order
- Get the product table from classic models database make sure that the resultant table is sorted based on MSRP in descending order
select * from table_name ORDER BY col_name DESC select * from classicmodels.products ORDER BY MSRP DESC;

Can we fetch some summary statistics instead of data ?
Select Count(*) from table
- To fetch number of rows
- Get the row count of product table from classic models database
select count(*) from table_name; select count(*) from classicmodels.products;

Select Count(col_name) from table
- To fetch number of rows based on a coloumn
- Get the row count based on product code from product table in classic models database
select count(col_name) from table_name; select count(productCode) from classicmodels.products;

Select sum(col_name) from table where
- To fetch the sum of values in a column
- Get the sum of quantityInStock from product table in classic models database
select sum(col_name) from table_name; select sum(quantityInStock) from classicmodels.products;

Select avg(col_name) from table
- To fetch the average of a column
- Get the average of MSRP from product table in classic models database
select avg(col_name) from table_name; select avg(MSRP) from classicmodels.products;

Can we fetch some summary statistics grouped into segments. For example counts of east, west, north and south regions
Select count(*) from table group by col_name
- To fetch count for groups of values in a column
- Get the counts of records for each product line. Frequency of each product line
select col_name, count(*) from table_name GROUP BY col_name; select productLine, count(*) from classicmodels.products GROUP BY productLine;

Select avg(col_name) from table group by col_name
- To fetch average of a column for groups of values in a column
- Get the average MSRP for each product line.
select col_name1, avg(col_name2) from table_name GROUP BY col_name1; select productLine, avg(MSRP) from classicmodels.products GROUP BY productLine;

GROUP BY and ORDER BY
-Get the average MSRP for each product line. Order the result based on average MSRP
select productLine, avg(MSRP) as avg_price from classicmodels.products GROUP BY productLine ORDER BY avg_price

Create Table
- We need to just add one more key word
- Create a new table by selecting product line =Motorcycles. Name the new table as motor_cycles_table
create table classicmodels.motor_cycles_table as select * from classicmodels.products where productLine='Motorcycles';

Drop Table
Drop Statement
- Use drop statement
- Create a temp table by selecting MSRP>2000
- Drop the temp table
create table classicmodels.temp as select * from classicmodels.products where MSRP>2000; Drop table classicmodels.temp;
Joins
The Join Scenario
- All the data is not arranged in a single table
- Relevant homogenous data is stored in corresponding tables

- Order Details data has productCode
- Products data has product details for a given productCode.
- Can we attach all the matching product details(from product table) to order details

Right Outer Join
- Resultant table will have all the order details(the right hand side table) and the matching product details (left hand side)
- All from right table, matching from left table
- This is called right outer join
- Matching records will be populated , otherwise fields are left blank

Resultant table
- Number of rows in resultant table = number of rows in the right hand side table
- Number of columns in resultant table = columns of table1 + columns of table2
- Matching should be on product id
Code Right Outer Join
select *
from classicmodels.products
right outer join classicmodels.orderdetails
on products.productCode=orderdetails.productCode;

Left Outer Join
- Same logic as right outer join
- We would like to keep all the data from left hand side table and matching data from right hand side table
- The table of importance is the left side one

Code – Left outer join
select *
from classicmodels.orderdetails
left outer join classicmodels.products
on orderdetails.productCode=products.productCode;

Inner Join
- We have two tables
- Two tables have connecting key
- We want the details of all the matching records or common records from two tables
- We match the key, if it is there in both the tables then the resultant table will have the record, if it is not present in any one of the tables, then it will be dropped.
Inner Join and outer join difference
- Left outer join contains all the records from left dataset, unconditionally
- Right outer join contains all the records from left dataset, unconditionally
- Inner join contains only matching records

Lab – Inner Join
- Take motor_cycle_table. It was a small subset of products, created by taking productLine=Motorcycle
- Inner join motor_cycle_table table and orderDetails table to see the common records.
Code – Inner Join
select *
from classicmodels.orderdetails
inner join classicmodels.motor_cycles_table
On orderdetails.productCode=motor_cycles_table.productCode;

Full Outer Join
- We have two tables.
- Table-1 has some unique data that is not there in table-2
- Table-2 has some unique data that is not there in table-1
- The common data can be extracted using inner join
- How to extract over-all available data, which is there in table-1 and table-2
- Full outer join fetches all the available information
Code-Full Outer Join
Full outer join – Part-1 : Left Outer Join
select * from classicmodels.orderdetails left outer join classicmodels.products on orderdetails.productCode=products.productCode;
Full outer join – Part-2 : Right Outer Join
select * from classicmodels.orderdetails right outer join classicmodels.products on products.productCode=orderdetails.productCode;
Then combine these tqo queries
select * from classicmodels.orderdetails left outer join classicmodels.products on orderdetails.productCode=products.productCode
Union
select * from classicmodels.orderdetails right outer join classicmodels.products on products.productCode=orderdetails.productCode;

Conclusion
- In this session we started with basics of RDBMS
- This is a very basic class on SQL
- SQL has many more commands for data handling and database management
- We focused only on DML – Data manipulation language we discussed nothing about DDL – Data Definition Language
- You may want to explore advanced functionalities of SQL while you are performing basic descriptive statistics and ad hock analysis.


