Before we start the lesson, please download the datasets.
Introduction to Python
This document covers the basics of Python for data science. These basics will be used in the predictive modelling and machine learning projects later in the course.
Contents
- What is Python & History
- Installing Python & Python Environment
- Basic commands in Python
- Data Types and Operations
- Python packages
- Loops
- My first python program
- If-then-else statement
Introduction to Python & History
What is python
Python is a general-purpose programming language that can also be used as a scripting language. It is human readable, well documented, and easy to pick up if we have basic programming knowledge. It is open source (free). Python can be used for many different purposes; for web development, for example, we can use the Django framework. Python is a simple language and has powerful libraries available for data science and machine learning.
History
- Python was created by Guido van Rossum, long known as its "Benevolent Dictator for Life". From 2005 to 2012 he worked at Google, where he spent about half of his time developing the language.
- The first Python version was released in 1991.
- Python 2 was released in 2000.
- Python 3 was released in 2008.
- Python 3 was introduced to fix design flaws in the language and make future development easier.
- Python 3 is NOT fully backwards compatible with Python 2.
- Python 2 was frozen and supported only until 2020. Many good features from Python 3 were back-ported to it.
Which one to use? Python 2 or Python 3
- Python 2 is NOT the same as Python 3. Many of the differences are small, but they matter.
- There are some incompatibilities: code written for Python 2 may not always run in Python 3, and vice versa.
- All important packages like NumPy, SciPy and Matplotlib are available for both Python 2 and Python 3.
- We are going to use Python 3 in our course.
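As a quick check before going further, the sketch below prints the version of the interpreter that is running and shows one well-known difference between the two versions: how the division operator behaves. The variable names are only illustrative.

```python
import sys

# Major and minor version of the running interpreter, e.g. 3 and 11
major, minor = sys.version_info[:2]
print("Running Python", major, minor)

# In Python 3, / is true division and // is floor division.
# (In Python 2, 7/2 on two integers gave 3.)
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)
```

If the first number printed is 2, the examples in this course may not run as written.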
Installing Python & Python IDEs
- Python offers many options to write and execute a program.
- There are mainly 3 ways to execute Python code:
- Text editors or the command line interface
- IPython Notebook (an interactive environment that combines code and report-style output in one file)
- Any Python IDE
- We will use the Spyder IDE in our course, which is part of the Anaconda distribution.
- The Anaconda distribution has all the required software built in. We just need to download and install it.
Installing Python, Anaconda
- Visit https://www.continuum.io/downloads and select the compatible version of Anaconda with Python 3.
- On Windows, download the installer and run the .exe file.
- It automatically installs
- Python
- IPython
- Jupyter notebook
- Spyder IDE
- It comes with all the necessary packages pre-installed.
- Spyder is what we need for our coding.
Spyder- Python IDEs
- Spyder (formerly Pydee) is an open source cross-platform IDE for the Python language.
- Editor with syntax highlighting and code completion.
- Has an interactive console to execute and check the output of the code.
- Testing and debugging is relatively easy.
- Best IDE if you are coming from an RStudio or MATLAB background.
- Spyder also provides an object inspector that executes in the context of the console. Any objects created in the console can be examined in the inspector.
In the Spyder environment, we have three parts:
1. Editor
2. Variable explorer
3. Console
Spyder-Editor
- This is where you write the code.
- To execute the code, select and hit Ctrl+Enter.
- You can load old code files.
- Code written in this editor is saved in .py format.
- You can hit the tab button to see the autofill options on objects and function names.
- You will be spending most of your time in the editor.
Spyder-Console
- This is where the code is executed when you hit Ctrl+Enter in the editor.
- Helps us in code testing and debugging.
- Helps us to avoid errors in the source code at the development phase itself.
- It is usual practice to write a chunk of code in the editor, execute it, and see whether it works.
- You can toggle between the Python console and the IPython console.
- Here, we have two types of consoles:
- Python console
If we want to run code line by line, we prefer the Python console.
- IPython console
Use this to run a complete script in a single go.
Spyder – Variable explorer
- Shows all the variables that are created in the current session.
- Helps in visually checking that the objects we created exist.
- Shows a quick summary of each object: type, size, length, sample values, etc.
- We can run the code and watch the objects being created; we can also validate the data type and size of each object.
Basic Commands in Python
Before you code
- Before we start executing commands, we should know that Python is case-sensitive.
- Example:
- Sales_data is not the same as sales_data
Basic Commands
Try these basic commands (addition, multiplication, division) and print the results in the Spyder console.
571+95
19*17
print(57+39)
print(19*17)
print("Dv Analytics")
#Division example
34/56
Basics-What an error looks like?
If you use Print with an uppercase P, it doesn't work and you will see a basic error.
Print(600+900)
#NameError: used Print() instead of print()
576-'96'
#TypeError: a string cannot be subtracted from a number
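To see both errors without stopping the script, the two lines can be wrapped in try/except, as in the minimal sketch below. The errors list is only a helper introduced here to collect the error names.

```python
# Each risky line is wrapped in try/except so the script keeps running
# and we can read the error type and message.
errors = []

try:
    Print(600 + 900)      # wrong case: the built-in is print, not Print
except NameError as err:
    errors.append(type(err).__name__)
    print("NameError:", err)

try:
    576 - '96'            # a number minus a string is not defined
except TypeError as err:
    errors.append(type(err).__name__)
    print("TypeError:", err)

print(576 - int('96'))    # converting the string to a number first works
```

The last line shows the usual fix for the second error: convert the string to a number before doing arithmetic.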
LAB: Basic Commands
- Calculate the values below
- 973*75
- 22/7
- Print the string “my python file”
973*75
22/7
print("my python file")
Assigning and Naming convention
Assignment operator
In Python, we create a variable by assigning a value to a name, and the name has to follow a few rules, as in any other programming language. In Python, "=" is the assignment operator and is used to assign values. Here are a few commands showing how it works:
income=12000
income
x=20
x
y=30
z=x*y
z
del x #deletes the variable
Printing
name="Jack"
name
print(name)
book_name="Practical business analytics \n using SAS"
book_name
print(book_name)
Naming convention
- As in other languages, a variable name in Python has to follow certain rules.
- The basic rules are as follows:
- A name must start with a letter (A-Z or a-z) or an underscore "_"; it cannot start with a digit.
- The rest of the name can contain letters, digits (0-9), and/or underscores "_". Dots and other symbols are not allowed.
#Doesn't work
1x=20
#works
x1=20
x1
#Doesn't work
x.1=20
x.1
#works
x_1=20
x_1
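The rules above can also be checked from code. The sketch below uses the built-in string method isidentifier() and the standard keyword module; the candidate names are taken from the examples above, plus two extra ones ("_total" and "for") added for illustration.

```python
import keyword

candidates = ["1x", "x1", "x.1", "x_1", "_total", "for"]

for name in candidates:
    # isidentifier() checks the letter/digit/underscore rules;
    # iskeyword() flags reserved words such as "for", which cannot be names.
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", "valid" if usable else "not valid")

valid_names = [n for n in candidates
               if n.isidentifier() and not keyword.iskeyword(n)]
```

Reserved words such as for, if and def follow the character rules but still cannot be used as variable names.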
Type of Objects
- Object refers to any entity in a python program.
- Python has some standard built-in object types:
- Numbers
- Strings
- Lists
- Tuples
- Dictionaries
- A good knowledge of these basic objects is essential to feel comfortable in Python programming.
Numbers
- A convenient feature of Python is that we can assign an integer or a float value to a variable directly; we don't have to declare the type explicitly.
age=30
age
weight=102.88
weight
x=17
x**2 #Square of x
Check the variable types for age and weight in the variable explorer:
- In the example, "age" is 30, so Python automatically treats it as an integer; "weight" is 102.88, so it is treated as a float.
- To see the type of a variable, use the type() function.
- Run type(age) to see the type of the variable. Note the lowercase t: Type(age) would fail because Python is case-sensitive.
- We can do the same for the remaining variables.
type(age)
type(weight)
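A short sketch of how integers and floats behave together, reusing the age and weight values from above; the other variable names are introduced here only for illustration.

```python
age = 30
weight = 102.88

print(type(age).__name__)      # int
print(type(weight).__name__)   # float

# Mixing an int and a float gives a float
total = age + weight
# / always returns a float, // keeps integers as integers
half = age / 2
half_floor = age // 2
# Explicit conversion between the two types
weight_int = int(weight)       # drops the decimal part
age_float = float(age)
print(total, half, half_floor, weight_int, age_float)
```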
Strings
- Strings are sequences of characters.
- Strings are among the most commonly used types in Python, and there are many built-in string functions and methods.
- Define Strings
name="Sheldon"
msg="Dv Analytics Data Science Classes"
- Accessing strings
print(name[0])
print(name[1])
- Slicing with [start:end] returns a substring from position start up to, but not including, position end.
print(msg[0:9])
- Length of string
len(msg)
#len() is used to get the length of the string
print(msg[10:len(msg)])
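The sketch below collects a few more ways to index and slice a string, and a few of the built-in string methods mentioned above. The sample string and variable names are made up for this example.

```python
course = "Data Science"

first_word = course[0:4]      # characters at positions 0, 1, 2, 3
second_word = course[5:]      # from position 5 to the end
last_char = course[-1]        # negative indexes count from the end
print(first_word, second_word, last_char, len(course))

# Strings also come with built-in methods
print(course.upper())
print(course.replace("Data", "Computer"))
words = course.split(" ")     # split on spaces into a list of words
print(words)
```

Leaving out the end of a slice, as in course[5:], is a shorter way of writing course[5:len(course)].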
Repeating strings
- Displaying a string multiple times
msg="Site under Construction"
msg*10
msg*50
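- There is a difference between print and just displaying a variable: print interprets \n as a line break, while displaying the variable shows the \n characters as they are.
message="Data Science on R and Data Science on Python \n"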
message*10
print(message*10)
String Concatenation
- If you want to join two different strings, use string concatenation.
- The basic way to do this is the plus (+) operator, which joins the strings.
- Examples are given below:
msg1="Site under Construction "
msg2=msg1+"Go to home page \n"
#this combines msg1 and the new text into msg2
print(msg2)
print(msg2*10)
#here msg2 will be printed 10 times
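Besides the + operator, Python offers other common ways to build a string from pieces. The sketch below compares three of them; the names and values are illustrative only.

```python
name = "Sheldon"
age = 25

# + only joins strings, so numbers must be converted with str() first
with_plus = name + " is " + str(age) + " years old"

# An f-string (Python 3.6+) inserts the values directly
with_fstring = f"{name} is {age} years old"

# join() glues a list of strings together with a separator
with_join = " ".join([name, "is", str(age), "years", "old"])

print(with_plus)
print(with_plus == with_fstring == with_join)
```

All three produce the same text; f-strings are usually the easiest to read when numbers are involved.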
List
- A list is a sequential data type that can hold many kinds of values.
- A list is a hybrid datatype: a sequence of related data.
- Similar to an array, but the elements need not all be of the same type.
- Creating a list
mylist1=['Sheldon','Male', 25]
- Accessing list elements
mylist1[0] #Python indexing starts from 0
mylist1[1]
mylist1[2]
- Combining two lists
mylist2=['L.A','No 173', "CR108877"]
final_list=mylist1+mylist2
final_list
- Here we joined both lists with mylist1+mylist2; final_list contains the elements of mylist1 followed by those of mylist2.
- Updating list elements
- We can update list elements. Here, we are updating the third element (index 2) to 35.
final_list[2]=35
final_list
- Length of list
len(final_list) returns the length of the list. Note that the function is len(), not length().
len(final_list)
- Deleting an element in list
If we need to delete an element, use del final_list[5]. Here it deletes the last element of final_list (index 5), which we can confirm by displaying the list again.
del final_list[5]
final_list
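Lists also have built-in methods for adding and removing elements in place. The sketch below shows four of them on a small list; the list name and the value 'Cooper' are made up for this example.

```python
people = ['Sheldon', 'Male', 25]

people.append('L.A')          # add one element at the end
people.insert(1, 'Cooper')    # insert at position 1, shifting the rest right
people.remove('Male')         # remove the first element equal to 'Male'
last = people.pop()           # remove and return the last element

print(people)
print(last)
print(len(people))
```

Unlike mylist1+mylist2, which builds a new list, these methods change the existing list.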
Tuples
Another datatype is the tuple. Tuples are very similar to lists: they are also sequential datatypes and can hold different types of values, such as strings and integers. The main difference between lists and tuples is that tuples cannot be updated, i.e., they cannot be changed once created.
- Also a sequence data type.
- Created using parentheses. Lists are created using square brackets.
- Tuples can't be updated.
my_tuple=('Mark','Male', 55)
my_tuple
my_tuple[1]
my_tuple[2]
my_tuple[0]*10
- Here we use parentheses () to create the tuple.
- Difference between tuples and lists
my_list=['Sheldon','Male', 25]
my_tuple=('Mark','M', 55)
my_list[2]=30
my_list
my_tuple[2]=40 #Doesn't work, tuples can't be updated (TypeError)
- The major difference between tuples and lists is that lists can be updated, while tuples cannot be updated or changed.
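The sketch below shows what happens when we try to update a tuple, and two things tuples are commonly used for: unpacking into variables and building a new tuple from an old one. The changed flag and new_tuple are names introduced only for this example.

```python
my_tuple = ('Mark', 'Male', 55)

try:
    my_tuple[2] = 40          # tuples do not support item assignment
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)

# Unpacking assigns each element to its own variable
name, gender, age = my_tuple

# To "change" a tuple, build a new one from slices of the old one
new_tuple = my_tuple[:2] + (40,)
print(name, gender, age, new_tuple)
```

Note the trailing comma in (40,): it is what makes a one-element tuple rather than a number in parentheses.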
Dictionaries
- Every dictionary entry has two parts:
- Key
- Value
- Unlike the lists and tuples above, which are sequential data types, dictionaries are accessed by key rather than by position.
- Dictionaries are collections of key-value pairs.
- Each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces.
- Keys are unique within a dictionary.
city={0:"LA", 1:"PA" , 2:"FL"}
city
city[0]
city[1]
city[2]
- In a dictionary, keys play the role of indexes. We define our own preferred keys.
- Make sure to give the right key while accessing the elements of a dictionary.
names={1:"David", 6:"Bill", 9:"Jim"}
names
names[0] #Doesn't work, because 0 is not a key in this dictionary (KeyError)
names[1]
names[2] #Doesn't work either, 2 is not a key
names[6]
names[9]
- In key-value pairs, the key need not always be a number.
edu={"David":"Bsc", "Bill":"Msc", "Jim":"Phd"}
edu
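edu[0] #Doesn't work, 0 is not a key
edu[1] #Doesn't work, 1 is not a key
edu[David] #Doesn't work, David without quotes is treated as an undefined variable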
edu["David"]
- Updating values in dictionary
edu
edu["David"]="MSc"
edu
- Updating keys in dictionary
- A key cannot be renamed directly. Delete the key-value pair first and then add a new one.
city={0:"LA", 1:"PA" , 2:"FL"}
#How to make the key of "LA" 6 instead of 0
del city[0]
city
city[6]="LA"
city
- Fetch all keys and all values separately
city.keys()
city.values()
edu.keys()
edu.values()
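Since a wrong key raises an error, it helps to check for a key or supply a default value. The sketch below reuses the edu dictionary from above; "Mark" is a made-up key that is deliberately missing.

```python
edu = {"David": "Bsc", "Bill": "Msc", "Jim": "Phd"}

# "in" tests whether a key exists before we use it
has_david = "David" in edu
has_mark = "Mark" in edu

# get() returns a default instead of raising KeyError
mark_degree = edu.get("Mark", "unknown")

# items() yields (key, value) pairs
for person, degree in edu.items():
    print(person, "has a", degree)

# keys() and values() return views; list() turns them into lists
key_list = list(edu.keys())
print(has_david, has_mark, mark_degree, key_list)
```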
If-Then-Else statement
If Condition
We often need to add some logic to our code, and IF and ELSE are the basic logical statements. An IF statement checks a condition. If the condition is true, the action under the IF statement is performed. If it is not true, that action is skipped.
age=60
if age<50:
    print("Group1")
print("Done with If")
IF-ELSE statement
If the IF condition is not true, the program moves to the ELSE part and runs it instead. This means that exactly one of the two parts is executed: if the condition is true we get the result of the IF part, otherwise we get the result of the ELSE part. Below is an example of the IF-ELSE statement.
age=60
if age<50:
    print("Group1")
else:
    print("Group2")
print("Done with If else")
Multiple ELSE conditions in IF
With several conditions, we use ELIF. The conditions are checked from top to bottom. In the example below, the first check is marks<30: if it is true, "fail" is printed and the rest of the chain is skipped. Otherwise the next condition is checked, and so on. As soon as one condition is true, its action runs and the program moves out of the whole IF-ELIF statement to whatever code comes next. If none of the conditions is true, the ELSE part runs.
marks=75
if(marks<30):
    print("fail")
elif(marks<60):
    print("Second Class")
elif(marks<80):
    print("First Class")
elif(marks<100):
    print("Distinction")
else:
    print("Error in Marks")
marks=20
if(marks<30):
    print("fail")
elif(marks<60):
    print("Second Class")
elif(marks<80):
    print("First Class")
elif(marks<100):
    print("Distinction")
else:
    print("Error in Marks")
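To try the same chain on several values without copying it, it can be wrapped in a function (functions are covered later in this document). This is a minimal sketch; the function name grade and the test values are chosen here for illustration, and the cut-offs are the same as in the examples above.

```python
def grade(marks):
    """Return the class for a marks value, using the cut-offs shown above."""
    if marks < 30:
        return "fail"
    elif marks < 60:
        return "Second Class"
    elif marks < 80:
        return "First Class"
    elif marks < 100:
        return "Distinction"
    else:
        return "Error in Marks"

results = [grade(m) for m in (20, 45, 75, 95, 120)]
print(results)
```

Because the checks run from top to bottom, each value lands in the first band whose upper limit it is below.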
Nested IF
A nested IF is an IF statement placed inside the action part of another IF. If the outer condition is true, the program enters its action part, where it meets the next IF condition, and it keeps going deeper as long as the conditions are true. Where a condition is false, the matching ELSE part runs instead. In the first example, x is 45. The condition x<50 is true, so "Number is less than 50" is printed. The next condition, x<40, is checked inside that block; it is not true, so the program skips that part and goes directly to the matching ELSE, which prints "Number is greater than 40". So the whole nested IF prints two lines. Run the same code with x=35 to see how the output changes.
x=45
if(x<50):
    print("Number is less than 50")
    if(x<40):
        print ("Number is less than 40")
        if(x<30):
            print("Number is less than 30")
        else:
            print("Number is greater than 30")
    else:
        print("Number is greater than 40")
else:
    print("Number is greater than 50")
x=35
if(x<50):
    print("Number is less than 50")
    if(x<40):
        print ("Number is less than 40")
        if(x<30):
            print("Number is less than 30")
        else:
            print("Number is greater than 30")
    else:
        print("Number is greater than 40")
else:
    print("Number is greater than 50")
For loop
- A for loop is an iteration statement.
- It allows a code block to be repeated a certain number of times.
- Generally, we see a for loop iterating through a list or another sequence.
Syntax:
- for (variable) in (sequence):
- (code block)
Example-1
my_num=1
for i in range(1,20):
    my_num=my_num+1
    print("my num value is", my_num)
    #no need for i=i+1, the for loop moves i to the next value by itself
Example-2
sumx starts at 0. In the first iteration x is 1, so sumx becomes 0+1 = 1. In the second iteration x is 2, so sumx becomes 1+2 = 3. The loop continues in the same way until x reaches 19, the last value in range(1,20).
sumx = 0
x=1
for x in range(1,20):
    sumx = sumx + x
    print(sumx)
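Since a for loop usually iterates through a list, here is a minimal sketch of that, followed by the running total from Example-2 checked against the built-in sum(). The cities list is sample data made up for this example.

```python
cities = ["LA", "PA", "FL"]

# Loop directly over the elements of a list
for city in cities:
    print(city)

# enumerate() gives the position and the element together
for position, city in enumerate(cities):
    print(position, city)

# The running total from Example-2, and the built-in shortcut
sumx = 0
for x in range(1, 20):        # x takes the values 1 to 19; 20 is excluded
    sumx = sumx + x
print(sumx, sum(range(1, 20)))
```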
Break Statement in FOR-loop
Sometimes we want to iterate only until a certain condition is met or a certain output is reached. To do this, we put an IF statement inside the loop; when its condition becomes true, the break statement ends the FOR loop and the program moves on with the rest of the code.
- To stop execution of a loop
- Stopping the loop in midway using a condition
sumx = 0
x=1
for x in range(1,200):
    sumx = sumx + x
    if(sumx>500):
        break
    print(sumx)
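To see exactly where the loop stops, the sketch below records the value of x at the moment of the break. The variable stopped_at is introduced here only for illustration.

```python
sumx = 0
stopped_at = None

for x in range(1, 200):
    sumx = sumx + x
    if sumx > 500:
        stopped_at = x        # remember where the loop stopped
        break

print("Stopped at x =", stopped_at, "with sum", sumx)
```

The loop ends long before x reaches 199, because the running total passes 500 first.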
Function
Sometimes in programming we need to perform the same task in different places in our code with different values and variables. Instead of writing the same code over and over, we can write a generalized code block, called a function, and call it whenever we need it.
- A function is a piece of code that takes input values and returns a result.
- A function can be used again with different values.
- Instead of rewriting the whole code, it’s much cleaner to define a function, which can then be used repeatedly.
- Functions improve the modularity of a program.
- A function that belongs to an object is called a method.
Function Syntax:
- A function has two components:
- Header (the header itself has two components):
- Function Name
- Input Parameters
- Body: (Consists of the procedure which we want the function to carry out).
- Example:
def square(a):
    c = a*a
    return c
Writing our own function
- Write a function that:
- Takes two numerical variables as inputs.
- Returns the remainder when variable 1 is divided by variable 2.
def remainder(var1, var2):
    a = var1%var2
    return a
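A function can be called in several ways. The sketch below is a variation on the remainder function with a default value for the second input; the default of 2 is an assumption made for this example, not part of the exercise.

```python
def remainder(var1, var2=2):
    """Return the remainder when var1 is divided by var2 (default 2)."""
    return var1 % var2

r1 = remainder(17, 5)            # positional arguments
r2 = remainder(17)               # var2 falls back to its default of 2
r3 = remainder(var2=4, var1=10)  # keyword arguments can come in any order

# divmod() is a built-in that returns quotient and remainder together
quotient, rem = divmod(17, 5)
print(r1, r2, r3, quotient, rem)
```

Returning the value, rather than printing it inside the function, lets the caller store the result and use it in later calculations.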
Packages
The true power of Python lies in its packages. Packages are like bundles of pre-built functions; if you come from an R background, you will already know libraries, which group functions used to perform specific tasks.
- A package is a collection of Python modules and functions, organized as properly structured code. A package may contain many sub-packages.
- Many Python functions are only available via “packages” that must be imported.
- For example, to find the value of log(10), we first need to import the math package, which has the log function in it.
#Doesn't work, math has not been imported yet
log(10)
exp(5)
sqrt(256)
#works
import math
math.log(10)
math.exp(5)
math.sqrt(256)
- To be a good data scientist on python, one needs to be very comfortable with below packages:
- Numpy
- Scipy
- Pandas
- Scikit-Learn
- Matplotlib
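There are a few common ways to write an import. The sketch below shows three of them using the standard math module; the alias m is an arbitrary choice for this example.

```python
import math                    # use as math.sqrt(...)
import math as m               # same module under a shorter alias
from math import sqrt, log     # bring selected names in directly

a = math.sqrt(256)
b = m.sqrt(256)
c = sqrt(256)
print(a, b, c)

# math.log is the natural logarithm; log10 is base 10
print(log(10), math.log10(10))
```

The alias form is the one used for the data science packages in this document, such as import numpy as np and import pandas as pd.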
Important Packages- NumPy
Numpy is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric with extensive modifications. NumPy is open-source software and has many contributors.
- NumPy is for fast operations on vectors and matrices, including mathematical, logical, shape manipulation, sorting, selecting.
- It is the foundation on which all higher level tools for scientific Python are built.
import numpy as np
income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])
print(income)
print(income[0])
expenses=income*0.65
print(expenses)
savings=income-expenses
print(savings)
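Building on the income array above, the sketch below shows a few more whole-array operations: aggregation, filtering with a condition, and element-by-element arithmetic. The variable names total, average and high are introduced only for this example.

```python
import numpy as np

income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])

# Aggregations work on the whole array at once
total = income.sum()
average = income.mean()

# A comparison gives a boolean array, which can be used to filter
high = income[income > 9000]

# Arithmetic between arrays is element by element
savings = income - income * 0.65
print(total, round(average, 2), high, savings.shape)
```

None of these operations needs an explicit for loop, which is the main reason NumPy code is short and fast.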
Important Packages – Pandas
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license. The name is derived from the term “Panel Data”, an econometrics term for multidimensional structured datasets. Pandas is a data frame package: it lets us load a data file, such as a CSV file, into a table and work on it.
- Data frames and data handling.
- Pandas has Data structures and operations for manipulating numerical tables and time series.
import pandas as pd
buyer_profile = pd.read_csv('R DatasetBuyers ProfilesTrain_data.csv')
print(buyer_profile)
buyer_profile.Age
buyer_profile.Gender
buyer_profile.Age[0]
buyer_profile.Age[0:10]
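If the course CSV file is not at hand, the same operations can be tried on a small data frame built in memory. The rows below are hypothetical sample data, not values from the course dataset; only the column names Age and Gender are taken from the example above.

```python
import pandas as pd

# Hypothetical sample rows, standing in for the course CSV file
buyers = pd.DataFrame({
    "Age": [25, 41, 33, 58],
    "Gender": ["M", "F", "F", "M"],
})

print(buyers.shape)                    # (rows, columns)
print(buyers.head(2))                  # first two rows
ages = buyers["Age"]                   # same column as buyers.Age
first_age = buyers["Age"].iloc[0]      # first value by position
over_30 = buyers[buyers["Age"] > 30]   # keep rows matching a condition
print(first_age, len(over_30), ages.mean())
```

The bracket form buyers["Age"] works for any column name, while the dot form buyers.Age only works when the name is a valid Python identifier.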
Important Packages- Scikit-Learn
Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
- Scikit-learn makes machine learning algorithms easy to use.
import sklearn as sk
import pandas as pd
air_passengers = pd.read_csv('R DatasetAirPassengersAirPassengers.csv')
air_passengers
x=air_passengers['Promotion_Budget']
x=x.values.reshape(-1,1) #a pandas column has no reshape(); convert it to a NumPy array first
y=air_passengers['Passengers']
y=y.values.reshape(-1,1)
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(x, y)
print('Coefficients: \n', reg.coef_)
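The same fitting steps can be tried without the CSV file. The sketch below uses made-up data that follows y = 3x + 5 exactly, so the fitted slope and intercept are easy to verify; it is an illustration of the workflow, not a result from the course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 3x + 5 exactly
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # 2-D: five rows, one column
y = 3 * x.ravel() + 5                          # 1-D target values

reg = LinearRegression()
reg.fit(x, y)

slope = reg.coef_[0]
intercept = reg.intercept_
prediction = reg.predict(np.array([[6]]))[0]
print(slope, intercept, prediction)
```

The reshape(-1, 1) step is needed because scikit-learn expects the inputs as a two-dimensional table with one row per observation and one column per feature.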
Important Packages- Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of matplotlib. Matplotlib produces plots similar to those in MATLAB. Python does not have a built-in library for plots, so we use matplotlib to draw any kind of plot.
import numpy as np
import matplotlib.pyplot as plt
#to show the plot inline (this line works only in IPython or a Jupyter notebook):
%matplotlib inline
X = np.random.normal(0,1,1000)
Y = np.random.normal(0,1,1000)
plt.scatter(X,Y)
plt.show()
General notes
- Variables are lost after restarting the shell.
- Reusing an object name overwrites the old object.
- Customize the color coding and syntax highlighting; it makes the code easier to read.
- Make use of the variable explorer to visually verify the variables that were created.
Conclusion
- In this session, we got a basic introduction to Python. We tried some basic commands in Python.
- In later sessions, we will see data handling and basic descriptive statistics.