Python is a general-purpose programming language that can also be used as a scripting language. It is a human-readable language and is well documented, so it is easy to pick up if we have basic programming knowledge. It is open source (free). Python can be used for many different purposes. For web development, for example, we can use the Django framework. Python is a simple language and has powerful libraries available for data science and machine learning.

History

The Python language was created by Guido van Rossum, long known in the community as its Benevolent Dictator for Life (BDFL). From 2005 to 2012 he worked at Google, where a large part of his job was developing the language.
The first Python version was released in 1991.
Python 2 was released in 2000.
Python 3 was released in 2008.
Python 3 was introduced to clean up design problems in the language and make future growth easier.
Python 3 is NOT fully backwards compatible with Python 2.
Python 2 was frozen and reached end of life in January 2020. Some Python 3 features were back-ported to Python 2.

Which one to use? Python 2 or Python 3

Python 2 is NOT the same as Python 3. Many of the changes are small, but some of them break existing code.
Because of these incompatibilities, code written for Python 2 may not run in Python 3, and vice versa.
All important packages like NumPy, SciPy and Matplotlib are available for both Python 2 and Python 3.
We are going to use Python 3 in our course.

Installing Python & Python IDEs

Python offers many ways to write and execute a program.
There are mainly 3 ways to execute Python code:
1. A text editor together with the command line interface
2. IPython Notebook (an interactive, report-style environment)
3. Any Python IDE
We will use the Spyder IDE in our course, which is part of the Anaconda distribution.
The Anaconda distribution has all the required software built in. We just need to download and install it.

Installing Python, Anaconda

Visit https://www.continuum.io/downloads and select the version of Anaconda with Python 3 that is compatible with your system.
On Windows, download the .exe file and run it to install.
It automatically installs
- Python
- IPython
- Jupyter notebook
- Spyder IDE
It comes with all the necessary packages pre-installed.
Spyder is what we need for our coding.

Spyder- Python IDEs

Spyder (formerly Pydee) is an open source cross-platform IDE for the Python language.
Editor with syntax highlighting and code completion.
Has an interactive console to execute and check the output of the code.
Testing and debugging is relatively easy.
A good choice of IDE if you are coming from an RStudio or MATLAB background.
Spyder also provides an object inspector that executes in the context of the console. Any objects created in the console can be examined in the inspector.

The Spyder environment has three main parts:

    1.    Editor 
    2.    Variable explorer 
    3.    Console

Spyder-Editor

This is where you write the code.
To execute code, select it and press Ctrl+Enter.
You can load old code files.
Code written in this editor is saved in .py format.
You can press the Tab key to see autocomplete options for object and function names.
You will be spending most of your time in the editor.

Spyder-Console

This is where the code is executed when you press Ctrl+Enter in the editor.
It helps with code testing and debugging.
It helps us catch errors in the source code during the development phase itself.
The usual practice is to write a chunk of code in the editor, execute it, and check whether it works as expected.
You can toggle between the Python console and the IPython console.

Here, we have two types of consoles:

Python console

  If we want to write and run code line by line, we prefer the Python console.

IPython console

  Used to run a whole script in a single go.

Spyder – Variable explorer

Shows all the variables that are created in the current session.
Helps in physically checking the presence of objects that are created.
Shows a quick summary of type of object, size, length, sample values, etc.
We can run the code and see the objects getting created; and also we can validate the data type and size of the object.

Basic Commands in Python

Before you code

Before we start executing Python commands, we should know that Python is case-sensitive.

Example:

  - Sales_data is not the same as sales_data

Basic Commands

Try basic commands such as addition, subtraction, multiplication and division, and print the results in the Spyder console.

In [1]:

571+95
19*17
print(57+39)
print(19*17)
print("Dv Analytics")

96
323
Dv Analytics

Use a hash (#) for comments, which are not executed as part of the code. For a multi-line block of text, start with triple quotes ('''), type your paragraph, and end with triple quotes (''').
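
As a small sketch of both styles (the variable names total and note are made up for illustration): a # comment is ignored by Python, while a triple-quoted block is really a string literal that is commonly used as a multi-line comment.

```python
# A single-line comment: Python ignores everything after the hash.
total = 57 + 39  # a comment can also follow code on the same line

'''
A triple-quoted block like this one is a string literal.
When it is not assigned to anything, it has no effect,
so it is often used as a multi-line comment.
'''

# Assigned to a variable, the same syntax gives a multi-line string.
note = '''line one
line two'''

print(total)
print(note)
```

Because the triple-quoted text is a real string, it can also be stored in a variable, as note shows.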

In [2]:

#Division example
34/56

Out[2]:

0.6071428571428571

Basics-What an error looks like?

If you try to use print with an uppercase P (Print), it does not work and you will see a basic error. Python stops at the first error, so the line 576-'96' in the cell below is never reached.

In [3]:

Print(600+900) 
#used Print() instead of print()
576-'96'

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-8cacbcbfb378> in <module>()
----> 1 Print(600+900)
      2 #used Print() instead of print()
      3 576-'96'
      4 

NameError: name 'Print' is not defined

LAB: Basic Commands

Calculate below values
- 973*75
- 22/7
Print the string “my python file”

In [4]:

973*75

Out[4]:

72975

In [5]:

22/7

Out[5]:

3.142857142857143

In [6]:

print("my python file")

my python file

Assigning and Naming convention

Assignment operator

In Python, we create a variable by assigning a value to a name, and the name must follow certain rules, just as in any other programming language. We need to consider a few things before naming or defining a new variable. In Python, “=” is the assignment operator and it is used for assigning values. The following commands show how it works:

In [7]:

income=12000
income

Out[7]:

12000

In [8]:

x=20
x

Out[8]:

20

In [9]:

y=30
z=x*y
z

Out[9]:

600

In [10]:

del x #deletes the variable

Printing

In [11]:

name="Jack"
name

Out[11]:

'Jack'

In [12]:

print(name)

Jack

Is there a difference between output of name and print(name)?

In [13]:

book_name="Practical business analytics \n using SAS"
book_name

Out[13]:

'Practical business analytics \n using SAS'

In [14]:

print(book_name)

Practical business analytics 
 using SAS

In the example above, the book name contains a backslash followed by n (\n). Typing the variable name on its own shows the value of the string as it is stored, with the \n still visible. The print function treats \n as a new line, which is why the printed text is split across two lines.
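
The difference can be reproduced in a plain script with repr(), which returns the same form the console shows when we type a variable name. This is only a minimal sketch; the variable shown is introduced here for illustration.

```python
book_name = "Practical business analytics \n using SAS"

# repr() gives the form the console shows when you type the variable name:
# quotes are included and the newline appears as the two characters \ and n.
shown = repr(book_name)
print(shown)

# print() writes the actual characters, so \n becomes a line break.
print(book_name)

# The string contains one newline character, so it splits into two lines.
print(len(book_name.splitlines()))
```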

Naming convention

In Python, just like in any other language, you need to understand how a variable name may be formed before you define a new variable.
The basic rules are as follows:
- A name must start with an upper- or lower-case letter (A-Z or a-z) or an underscore (_); it cannot start with a digit.
- The rest of the name can contain letters, digits (0-9), and/or underscores (_).

In [15]:

#Doesn't work
1x=20

  File "<ipython-input-15-fd484b8e1634>", line 2
    1x=20
     ^
SyntaxError: invalid syntax

In [16]:

#works
x1=20 
x1

Out[16]:

20

In [17]:

#Doesn't work
x.1=20 
x.1

  File "<ipython-input-17-6d3a7516bd8a>", line 2
    x.1=20
      ^
SyntaxError: invalid syntax

In [18]:

#works
x_1=20 
x_1

Out[18]:

20
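
The naming rules can also be checked from code. The sketch below uses the built-in str.isidentifier() method together with the standard keyword module; the list of candidate names is made up for illustration, and the check for reserved words such as for goes slightly beyond the rules listed above.

```python
import keyword

candidates = ["x1", "x_1", "_total", "1x", "x.1", "sales-data", "for"]

for name in candidates:
    # str.isidentifier() checks the character rules;
    # keyword.iskeyword() rules out reserved words such as "for".
    ok = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", "valid" if ok else "invalid")

valid_names = [n for n in candidates
               if n.isidentifier() and not keyword.iskeyword(n)]
```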

Type of Objects

Object refers to any entity in a python program.
Python has some standard built-in object types:
- Numbers
- Strings
- Lists
- Tuples
- Dictionaries
Having a good knowledge of these basic objects is essential to feel comfortable in Python programming.

Numbers

The good thing about Python is that you can assign any value to a variable very easily, whether it is an integer or a float; you do not have to declare the type explicitly.

In [19]:

age=30
age

Out[19]:

30

In [20]:

weight=102.88
weight

Out[20]:

102.88

In [21]:

x=17
x**2 #Square of x

Out[21]:

289

Check the variable types for age and weight in the variable explorer:

From the example, “age” is 30, so it is an integer, and “weight” is 102.88, so Python automatically interprets it as a float.
To see the type of a variable from code, use the type() function.
Run type(age) to see the type of age. Note the lowercase t: Type(age) would fail because Python is case-sensitive.
The same works for the remaining variables.

In [22]:

type(age)

Out[22]:

int

In [23]:

type(weight)

Out[23]:

float
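
As a short sketch of how Python picks the number type on its own, the example below reuses age and weight and adds a hypothetical variable total to show what happens when the two types are mixed.

```python
age = 30
weight = 102.88

print(type(age).__name__)      # int
print(type(weight).__name__)   # float

# Mixing an int and a float in arithmetic produces a float.
total = age + weight
print(type(total).__name__)

# The / operator always returns a float, even for two ints;
# // performs floor division and keeps ints as ints.
print(type(10 / 2).__name__, type(10 // 2).__name__)
```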

Strings

Strings are collections of characters.
Strings are amongst the most popular types in Python. There are a number of built-in string methods and functions.

Define Strings

In [24]:

name="Sheldon"
msg="Dv Analytics Data Science Classes"

Accessing strings

In [25]:

print(name[0])
print(name[1])

S
h

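Square brackets with a single index return one character. A range of indexes (a slice) returns a substring.

In [26]:

print(msg[0:12])

Dv Analytics

The above code prints the characters at index 0 up to, but not including, index 12.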

Length of string

In [27]:

len(msg)
#is used to get length of the string

Out[27]:

33

In [28]:

print(msg[13:len(msg)])

Data Science Classes
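
The slicing rules can be summarised in a few lines. This is a small sketch using the same msg string; the variable names on the left are made up for illustration.

```python
msg = "Dv Analytics Data Science Classes"

first_word = msg[0:2]     # indexes 0 and 1; the end index 2 is excluded
brand = msg[:12]          # omitting the start means "from the beginning"
rest = msg[13:]           # omitting the end means "to the end of the string"
last_word = msg[-7:]      # negative indexes count from the end

print(first_word)
print(brand)
print(rest)
print(last_word)
```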

Repeating strings

Displaying a string multiple times

In [29]:

msg="Site under Construction"
msg*10

Out[29]:

'Site under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under Construction'

In [30]:

msg*50

Out[30]:

'Site under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under Construction'

There is a difference between print and just displaying a variable

In [31]:

message="Data Science on R and Data Science on Python \n"
message*10

Out[31]:

'Data Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \n'

In [32]:

print(message*10)

Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python

String Concatenation

If you want to join two different strings, use string concatenation.
The basic way to do this is with the plus (+) operator, which joins the strings end to end.
Examples are given below:

In [33]:

msg1="Site under Construction "
msg2=msg1+"Go to home page \n"
print(msg2)

#this combines msg1 with the new text and stores the result in msg2

Site under Construction Go to home page

In [34]:

print(msg2*10)
#here msg2 will be printed 10 times

Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page

List

A list is a sequential data type that can hold many items, and the items can be of different kinds.
```
  - List is a hybrid datatype
  - A sequence of related data
```
Similar to array, but all the elements need not be of same type.

Creating a list

In [35]:

mylist1=['Sheldon','Male', 25]

Accessing list elements

In [36]:

mylist1[0] #Python indexing starts from 0

Out[36]:

'Sheldon'

In [37]:

mylist1[1]

Out[37]:

'Male'

In [38]:

mylist1[2]

Out[38]:

25

Appending to a list

In [39]:

mylist2=['L.A','No 173', "CR108877"]
final_list=mylist1+mylist2
final_list

Out[39]:

['Sheldon', 'Male', 25, 'L.A', 'No 173', 'CR108877']

Here we joined the two lists with mylist1+mylist2, and final_list holds the elements of mylist1 followed by the elements of mylist2.

Updating list elements

We can update the elements of a list. Here, we are updating the element at index 2 (the third element) to 35.

In [40]:

final_list[2]=35
final_list

Out[40]:

['Sheldon', 'Male', 35, 'L.A', 'No 173', 'CR108877']

Length of list

Executing len(final_list) displays the length of the list.

In [41]:

len(final_list)

Out[41]:

6

Deleting an element in list

To delete an element, use del final_list[5] and execute it. This deletes the element at index 5, which is the last element of final_list; display the list again to check.

In [42]:

del final_list[5]
final_list

Out[42]:

['Sheldon', 'Male', 35, 'L.A', 'No 173']
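
The list operations in this section can be put side by side in one short sketch. It also shows the append() method, which the cells above do not use; the value 'Physicist' is made up for illustration.

```python
mylist1 = ['Sheldon', 'Male', 25]
mylist2 = ['L.A', 'No 173', 'CR108877']

# + builds a new list and leaves both originals unchanged.
final_list = mylist1 + mylist2

# append() adds a single element to the end of an existing list, in place.
mylist1.append('Physicist')   # 'Physicist' is a made-up value for illustration

# A negative index counts from the end, so this removes the last element
# without needing to know the length of the list.
del final_list[-1]

print(mylist1)
print(final_list)
print(len(final_list))
```

Note that + creates a new list, while append() and del change the list they are applied to.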

Tuples

Another data type is the tuple. Tuples are very similar to lists: they are also sequential data types and can hold different kinds of data, such as strings and integers. The main difference between lists and tuples is that tuples cannot be updated, i.e., they cannot be changed once created.

Also sequence data types.
Created using parentheses. Lists are created using square brackets.
Tuples can’t be updated.

In [43]:

my_tuple=('Mark','Male', 55)
my_tuple

Out[43]:

('Mark', 'Male', 55)

In [44]:

my_tuple[1]

Out[44]:

'Male'

In [45]:

my_tuple[2]

Out[45]:

55

In [46]:

my_tuple[0]*10

Out[46]:

'MarkMarkMarkMarkMarkMarkMarkMarkMarkMark'

Here, in tuples, we are just using () i.e., parenthesis for creating tuples.

Difference between tuples and lists

In [47]:

my_list=['Sheldon','Male', 25]
my_tuple=('Mark','M', 55)

my_list[2]=30
my_list

Out[47]:

['Sheldon', 'Male', 30]

In [48]:

my_tuple[2]=40

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-c6d510f4ba89> in <module>()
----> 1 my_tuple[2]=40

TypeError: 'tuple' object does not support item assignment

The major difference between tuples and lists is that lists can be updated, while tuples cannot be updated or changed.
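
If a script should keep running after such an error, the failed update can be caught with try/except. The sketch below also shows one possible way to get a "changed" tuple, by building a new one; the variable updated is introduced here for illustration.

```python
my_tuple = ('Mark', 'M', 55)

try:
    my_tuple[2] = 40
except TypeError as err:
    # Item assignment is not supported, so the tuple is unchanged.
    print("Could not update:", err)

# To "change" a tuple, build a new one from the pieces you want to keep.
updated = my_tuple[:2] + (40,)
print(my_tuple)
print(updated)
```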

Dictionaries

Dictionaries have two major element types:
```
  - Key
  - Value
```
Unlike lists and tuples, dictionaries are not sequential data types: values are looked up by key, not by position.
Dictionaries are collections of key-value pairs.
Each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces.
Keys are unique within a dictionary.

In [49]:

city={0:"LA", 1:"PA" , 2:"FL"}
city

Out[49]:

{0: 'LA', 1: 'PA', 2: 'FL'}

In [50]:

city[0]

Out[50]:

'LA'

In [51]:

city[1]

Out[51]:

'PA'

In [52]:

city[2]

Out[52]:

'FL'

In a dictionary, keys play the role of indexes. We define our own preferred keys in dictionaries.
Make sure to give an existing key when accessing the elements of a dictionary.

In [53]:

names={1:"David", 6:"Bill", 9:"Jim"}
names

Out[53]:

{1: 'David', 6: 'Bill', 9: 'Jim'}

In [54]:

names[0] #Doesn't work, because no value has been assigned to the key 0

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-54-2bdf624c7c2b> in <module>()
----> 1 names[0] #Doesn't work, because no value has been assigned to the key 0

KeyError: 0

In [55]:

names[1]

Out[55]:

'David'

In [56]:

names[2]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-56-5fc2e5d7f092> in <module>()
----> 1 names[2]

KeyError: 2

In [57]:

names[6]

Out[57]:

'Bill'

In [58]:

names[9]

Out[58]:

'Jim'

In the key-value pairs, key need not be a number always.

In [59]:

edu={"David":"Bsc", "Bill":"Msc", "Jim":"Phd"}
edu

Out[59]:

{'Bill': 'Msc', 'David': 'Bsc', 'Jim': 'Phd'}

In [60]:

edu[0]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-60-097da42662a8> in <module>()
----> 1 edu[0]

KeyError: 0

In [61]:

edu[1]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-61-86fe0fd73d3a> in <module>()
----> 1 edu[1]

KeyError: 1

In [62]:

edu[David]

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-62-9c7944aa3a3d> in <module>()
----> 1 edu[David]

NameError: name 'David' is not defined

In [63]:

edu["David"]

Out[63]:

'Bsc'
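
A KeyError can be avoided by checking for the key first or by using the dictionary's get() method. This is a minimal sketch with the same edu dictionary; the name "Sheldon" is deliberately not one of its keys.

```python
edu = {"David": "Bsc", "Bill": "Msc", "Jim": "Phd"}

# "in" tests whether a key exists.
print("David" in edu)
print(0 in edu)

# get() returns None (or a default you choose) instead of raising KeyError.
print(edu.get("David"))
print(edu.get("Sheldon"))                # "Sheldon" is not a key here
print(edu.get("Sheldon", "unknown"))
```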

Updating values in dictionary

In [64]:

edu

Out[64]:

{'Bill': 'Msc', 'David': 'Bsc', 'Jim': 'Phd'}

In [65]:

edu["David"]="MSc"
edu

Out[65]:

{'Bill': 'Msc', 'David': 'MSc', 'Jim': 'Phd'}

Updating keys in dictionary

A key cannot be changed in place. Delete the key-value pair first and then add a new element with the new key.

In [66]:

city={0:"LA", 1:"PA" , 2:"FL"}
#How to make 6 as "LA"
del city[0]
city

Out[66]:

{1: 'PA', 2: 'FL'}

In [67]:

city[6]="LA"
city

Out[67]:

{1: 'PA', 2: 'FL', 6: 'LA'}
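
The delete-then-add steps can also be written in one line with the dictionary's pop() method, which removes a key and returns its value. This is just an alternative sketch of the same idea.

```python
city = {0: "LA", 1: "PA", 2: "FL"}

# pop(0) removes key 0 and returns its value ("LA"),
# which is then stored under the new key 6.
city[6] = city.pop(0)

print(city)
```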

Fetch all keys and all values separately

In [68]:

city.keys()

Out[68]:

dict_keys([1, 2, 6])

In [69]:

city.values()

Out[69]:

dict_values(['PA', 'FL', 'LA'])

In [70]:

edu.keys()

Out[70]:

dict_keys(['David', 'Bill', 'Jim'])

In [71]:

edu.values()

Out[71]:

dict_values(['MSc', 'Msc', 'Phd'])

If-Then-Else statement

If Condition

We need to add some logic to our code, and IF and ELSE are the basic logical statements. An IF statement checks a condition. If the condition is true, the action under the IF statement is performed. If it is not true, that action is skipped and execution continues with the code after the IF block.

In [72]:

age=60
if age<50:
    print("Group1")
print("Done with If")

Done with If

IF-ELSE statement

If the IF condition is not true, execution moves to the ELSE part and runs the code there. This means exactly one of the two blocks runs: if the IF condition is true we get the result of the IF block, otherwise we get the result of the ELSE block. An example of an IF-ELSE statement is given below.

In [73]:

age=60
if age<50:
    print("Group1")
else:
    print("Group2")
print("Done with If else")

Group2
Done with If else

Multiple ELSE conditions in IF

With ELIF we can check several conditions in order. The code below first checks the condition marks<30; if it is true, it prints fail and skips all the remaining branches. Otherwise it checks the next ELIF condition, and so on. As soon as one condition is true, its block runs and execution moves on to whatever comes after the whole IF-ELIF-ELSE statement. The ELSE block runs only if none of the conditions is true.

In [74]:

marks=75

if(marks<30):
    print("fail")
elif(marks<60):
    print("Second Class")
elif(marks<80):
    print("First Class")
elif(marks<100):
    print("Distinction")
else:
    print("Error in Marks")

First Class

In [75]:

marks=20

if(marks<30):
    print("fail")
elif(marks<60):
    print("Second Class")
elif(marks<80):
    print("First Class")
elif(marks<100):
    print("Distinction")
else:
    print("Error in Marks")

fail

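In the first example, marks is 75. The conditions marks<30 and marks<60 are false, so Python moves on to the next ELIF; marks<80 is true, so First Class is printed and the remaining branches are skipped. In the second example, marks is 20, so the very first condition is true: fail is printed and none of the ELIF conditions is checked.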
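
To try the same chain on several values without editing marks each time, the logic can be wrapped in a function. This is a sketch only: the function name grade is made up here, and it uses the same cut-offs as the cells above.

```python
def grade(marks):
    """Return the class for a marks value using the same cut-offs as above."""
    if marks < 30:
        return "fail"
    elif marks < 60:
        return "Second Class"
    elif marks < 80:
        return "First Class"
    elif marks < 100:
        return "Distinction"
    else:
        return "Error in Marks"

for m in [20, 45, 75, 95, 120]:
    print(m, "->", grade(m))
```

Note that with these cut-offs a value of exactly 100 falls through to the ELSE branch, because the last ELIF tests marks<100.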

Nested IF

A nested IF is an IF statement placed inside the block of another IF or ELSE. The inner condition is checked only when the outer condition is true. In the first example below, x is 45. The outer condition x<50 is true, so "Number is less than 50" is printed and the inner condition x<40 is checked. That condition is false, so the innermost IF is never reached; execution goes to the matching ELSE and prints "Number is greater than 40". In the second example, x is 35, so both x<50 and x<40 are true; x<30 is false, so the innermost ELSE prints "Number is greater than 30". Try running the same code with other values to see how it works.

In [76]:

x=45

if(x<50):
    print("Number is less than 50")
    if(x<40):
        print("Number is less than 40")
        if(x<30):
            print("Number is less than 30")
        else:
            print("Number is greater than 30")
    else:
        print("Number is greater than 40")
else:
    print("Number is greater than 50")

Number is less than 50
Number is greater than 40

In [77]:

x=35

if(x<50):
    print("Number is less than 50")
    if(x<40):
        print("Number is less than 40")
        if(x<30):
            print("Number is less than 30")
        else:
            print("Number is greater than 30")
    else:
        print("Number is greater than 40")
else:
    print("Number is greater than 50")

Number is less than 50
Number is less than 40
Number is greater than 30

For loop

A for loop is an iteration statement.
It allows a code block to be repeated a certain number of times.
Generally, a for loop iterates through a list or another sequence.

Syntax:

for (variable) in (sequence):
    (code block)

The following are the examples of the for loop.

Example-1

In [78]:

my_num=1

for i in range(1,20):
    my_num=my_num+1
    print("my num value is", my_num)
    #no need to increment i here: the for loop moves i to the next value by itself

my num value is 2
my num value is 3
my num value is 4
my num value is 5
my num value is 6
my num value is 7
my num value is 8
my num value is 9
my num value is 10
my num value is 11
my num value is 12
my num value is 13
my num value is 14
my num value is 15
my num value is 16
my num value is 17
my num value is 18
my num value is 19
my num value is 20

Example-2

sumx starts at 0. On the first iteration x is 1, so sumx becomes 0+1 = 1. On the second iteration x is 2, so sumx becomes 1+2 = 3. The loop continues in the same way until x reaches 19, printing the running total each time.

In [79]:

sumx = 0

for x in range(1,20):
    sumx = sumx + x
    print(sumx)
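
The cell above shows no output, so here is a small sketch that stores the running totals in a list (the name running_totals is made up for illustration) and compares the final value with the built-in sum().

```python
sumx = 0
running_totals = []

for x in range(1, 20):        # x takes the values 1, 2, ..., 19
    sumx = sumx + x
    running_totals.append(sumx)

print(running_totals[:5])     # first few running totals
print(sumx)                   # final total after the loop

# The built-in sum() gives the same final value in one call.
print(sum(range(1, 20)))
```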

Break Statement in FOR-loop

Sometimes we want to iterate only a certain number of times, or only until a certain condition is met or a certain output is reached. To do this, we put an IF statement inside the loop: when its condition becomes true, the break statement ends the FOR-loop and execution continues with the code after it.

break is used to stop the execution of a loop.
It stops the loop midway, based on a condition.

In [80]:

sumx = 0

for x in range(1,200):
    sumx = sumx + x
    if(sumx>500):
        break
    print(sumx)
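
To see exactly where the loop stops, the sketch below collects the values that would have been printed in a list (the name printed is made up for illustration) and inspects x and sumx after the loop has ended.

```python
sumx = 0
printed = []

for x in range(1, 200):
    sumx = sumx + x
    if sumx > 500:
        break              # leave the loop immediately
    printed.append(sumx)   # only reached while the total is 500 or less

print("last total printed:", printed[-1])
print("loop stopped at x =", x, "with sumx =", sumx)
```

The loop variable keeps its last value after break, which is why x can still be read once the loop is over.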

Function

Sometimes, in programming, we need to perform the same task in different places in our code with different values and variables. Instead of writing the same code over and over, we can write a generalized code block, called a function, and call it whenever we need it.

A function is a piece of code that takes input values and returns a result.
A function can be used again with different values.
Instead of rewriting the whole code, it’s much cleaner to define a function, which can then be used repeatedly.
Functions improve the modularity of a program.
A function that belongs to an object or class is called a method.

Function Syntax:

Function has two Components:
- Header (Again header has two components):
  - Function Name
  - Input Parameters
- Body: (Consists of the procedure which we want the function to carry out).
Example:

In [81]:

def square(a):
    c = a*a
    return c

Writing our own function

Write a function that:
- Takes two numerical variables as inputs.
- Returns the remainder when variable 1 is divided by variable 2.

In [82]:

def remainder(var1, var2):
    a = var1%var2
    return a  #return the remainder instead of only printing it
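
Defining a function does not run it; it has to be called. The sketch below uses a version of remainder that returns its result and calls it with a few example inputs chosen here for illustration.

```python
def remainder(var1, var2):
    """Return the remainder when var1 is divided by var2."""
    return var1 % var2

# Because the function returns its result, the value can be stored and reused.
r = remainder(22, 7)
print(r)

# The same function works with different inputs.
print(remainder(10, 5))

# divmod() is a built-in that returns the quotient and remainder together.
print(divmod(22, 7))
```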

Packages

Much of the true power of Python lies in its packages. Packages are like bundles of pre-built functions; if you come from an R background, you will already know libraries that provide functions for specific tasks.

A package is a collection of Python functions, organized as properly structured code. A package may contain many sub-packages.
Many Python functions are only available via “packages” that must be imported.
For example, to find the value of log(10), we first need to import the math package, which has the log function in it.

In [83]:

log(10)
exp(5)
sqrt(256)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-83-b0bfd28c096c> in <module>()
----> 1 log(10)
      2 exp(5)
      3 sqrt(256)

NameError: name 'log' is not defined

In [84]:

import math
math.log(10)

Out[84]:

2.302585092994046

In [85]:

math.exp(5)

Out[85]:

148.4131591025766

In [86]:

math.sqrt(256)

Out[86]:

16.0
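
There is more than one way to write an import. The sketch below shows three common forms using the same math package; the alias m is an arbitrary choice made for illustration.

```python
import math                      # access functions as math.<name>
from math import sqrt            # import one name directly
import math as m                 # import under a shorter alias

print(math.log(10))
print(sqrt(256))
print(m.exp(0))
```

Importing the whole package and writing math.log keeps it clear where each function comes from, which helps once several packages are in use.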

To be a good data scientist in Python, one needs to be very comfortable with the packages below:
- NumPy
- SciPy
- Pandas
- Scikit-Learn
- Matplotlib

Important Packages- NumPy

Numpy is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric with extensive modifications. NumPy is open-source software and has many contributors.

NumPy is for fast operations on vectors and matrices, including mathematical, logical, shape manipulation, sorting, selecting.
It is the foundation on which all higher level tools for scientific Python are built.

In [87]:

import numpy as np

income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])
print(income) 
print(income[0])

[ 9000  8500  9800 12000  7900  6700 10000]
9000

In [88]:

expenses=income*0.65
print(expenses)

[ 5850.  5525.  6370.  7800.  5135.  4355.  6500.]

In [89]:

savings=income-expenses
print(savings)

[ 3150.  2975.  3430.  4200.  2765.  2345.  3500.]
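
To see what NumPy is doing for us, here is a sketch of the same calculation with a plain Python list, where the element-by-element work has to be written out explicitly. It uses no NumPy at all and is meant only as a comparison.

```python
income = [9000, 8500, 9800, 12000, 7900, 6700, 10000]

# NumPy applies income*0.65 to every element at once;
# with a plain list the same result needs an explicit loop or comprehension.
expenses = [value * 0.65 for value in income]
savings = [inc - exp for inc, exp in zip(income, expenses)]

print(expenses)
print(savings)

# Note: multiplying a plain list by an integer repeats it instead.
print(len(income * 2))
```

With a NumPy array, income*0.65 and income-expenses work directly on every element, which is shorter to write and much faster on large data.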

Important Packages – Pandas

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license. The name is derived from the term “Panel Data”, an econometrics term for multidimensional structured datasets. Pandas is a data frame package: it lets us load a data file, such as a CSV file, into a data frame and work on it.

Data frames and data handling.
Pandas has Data structures and operations for manipulating numerical tables and time series.

In [90]:

import pandas as pd
buyer_profile = pd.read_csv('R Dataset/Buyers Profiles/Train_data.csv')

print(buyer_profile)

    Age  Gender Bought
0    29    Male    Yes
1    34    Male    Yes
2    13  Female    Yes
3    27  Female     No
4    10  Female     No
5    68    Male    Yes
6    15    Male    Yes
7    53    Male    Yes
8    51    Male     No
9    48  Female     No
10   63  Female     No
11   43    Male    Yes
12    8  Female     No
13   47  Female     No

In [91]:

buyer_profile.Age

Out[91]:

0     29
1     34
2     13
3     27
4     10
5     68
6     15
7     53
8     51
9     48
10    63
11    43
12     8
13    47
Name: Age, dtype: int64

In [92]:

buyer_profile.Gender

Out[92]:

0       Male
1       Male
2     Female
3     Female
4     Female
5       Male
6       Male
7       Male
8       Male
9     Female
10    Female
11      Male
12    Female
13    Female
Name: Gender, dtype: object

In [93]:

buyer_profile.Age[0]

Out[93]:

29

In [94]:

buyer_profile.Age[0:10]

Out[94]:

0    29
1    34
2    13
3    27
4    10
5    68
6    15
7    53
8    51
9    48
Name: Age, dtype: int64

Important Packages- Scikit-Learn

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Scikit-learn makes it easy to apply machine learning algorithms.

In [95]:

import sklearn as sk
import pandas as pd

air_passengers = pd.read_csv('R Dataset/AirPassengers/AirPassengers.csv')
air_passengers

Out[95]:

	Week_num	Passengers	Promotion_Budget	Service_Quality_Score	Holiday_week	Delayed_Cancelled_flight_ind	Inter_metro_flight_ratio	Bad_Weather_Ind	Technical_issues_ind
0	1	37824	517356	4.00000	NO	NO	0.70	YES	YES
1	2	43936	646086	2.67466	NO	YES	0.80	YES	YES
2	3	42896	638330	3.29473	NO	NO	0.90	NO	NO
3	4	35792	506492	3.85684	NO	NO	0.40	NO	NO
4	5	38624	609658	3.90757	NO	NO	0.87	NO	YES
5	6	35744	476084	3.83710	NO	YES	0.66	YES	NO
6	7	40752	635978	3.60259	NO	YES	0.74	YES	NO
7	8	34592	495152	3.60086	NO	YES	0.39	NO	NO
8	9	35136	429800	3.62776	NO	NO	0.61	NO	YES
9	10	43328	613326	2.98305	NO	NO	0.66	NO	NO
10	11	34960	492758	3.60089	NO	NO	0.77	NO	NO
11	12	44464	600726	2.56064	NO	YES	0.74	YES	NO
12	13	36464	456960	3.89655	NO	YES	0.39	YES	NO
13	14	44464	586096	2.47713	NO	YES	0.79	YES	NO
14	15	51888	704802	1.77422	YES	YES	0.72	YES	YES
15	16	36800	536970	3.92254	NO	NO	0.43	NO	YES
16	17	48688	742308	1.93589	NO	NO	0.90	NO	YES
17	18	37456	500234	3.99060	NO	NO	0.46	NO	NO
18	19	44800	570682	2.43241	NO	YES	0.79	YES	YES
19	20	56032	826420	1.41139	YES	YES	0.80	YES	NO
20	21	58800	761040	1.24488	YES	NO	0.69	NO	NO
21	22	57440	753466	1.36091	YES	NO	0.60	NO	NO
22	23	32752	502712	3.37428	NO	YES	0.45	YES	YES
23	24	43424	653856	2.88878	NO	YES	0.89	YES	YES
24	25	45968	706748	2.31898	NO	YES	0.62	YES	NO
25	26	38816	532602	3.85307	NO	NO	0.75	NO	YES
26	27	35168	518070	3.70671	NO	YES	0.47	YES	YES
27	28	34496	539378	3.48455	NO	YES	0.78	YES	YES
28	29	34208	414120	3.48166	NO	YES	0.38	YES	NO
29	30	44320	653338	2.58325	NO	NO	0.71	NO	YES
…	…	…	…	…	…	…	…	…	…
50	51	43728	590492	2.77882	NO	YES	0.47	YES	NO
51	52	47040	694568	2.06989	NO	YES	0.55	YES	NO
52	53	34512	493444	3.57125	NO	NO	0.74	NO	YES
53	54	57600	781718	1.35511	YES	NO	0.67	NO	YES
54	55	36064	526162	3.87218	NO	YES	0.73	NO	YES
55	56	49392	707070	1.91865	NO	NO	0.75	NO	NO
56	57	42378	545510	3.46630	NO	NO	0.62	NO	YES
57	58	38584	555170	3.99116	NO	NO	0.77	NO	NO
58	59	28700	405916	3.07021	NO	NO	0.72	NO	NO
59	60	55160	738794	1.48667	YES	YES	0.71	YES	NO
60	61	52472	666778	1.58686	YES	YES	0.90	YES	NO
61	62	54474	715498	1.52341	YES	YES	0.55	YES	NO
62	63	54222	754418	1.58647	YES	NO	0.78	YES	NO
63	64	73444	1012130	0.91298	YES	YES	0.90	YES	NO
64	65	67130	1003002	0.98050	YES	NO	0.79	NO	YES
65	66	39984	589526	3.77575	NO	NO	0.81	NO	NO
66	67	41972	550872	3.49699	NO	YES	0.68	YES	YES
67	68	43722	652680	2.84565	NO	YES	0.69	YES	NO
68	69	76972	1041796	0.87470	YES	YES	0.90	YES	NO
69	70	58156	881818	1.33013	YES	NO	0.82	NO	NO
70	71	52304	679938	1.68678	YES	NO	0.63	NO	YES
71	72	76524	1024450	0.87933	YES	YES	0.90	YES	NO
72	73	60620	844578	1.15504	YES	NO	0.90	NO	YES
73	74	32018	445424	3.23666	NO	YES	0.64	YES	YES
74	75	51814	669144	1.87321	YES	NO	0.88	NO	YES
75	76	66934	927696	1.07138	YES	YES	0.84	NO	NO
76	77	81228	1108254	0.85536	YES	YES	0.90	YES	NO
77	78	43288	638162	3.08191	NO	NO	0.62	NO	NO
78	79	43834	636636	2.75382	NO	YES	0.79	YES	YES
79	80	40852	575008	3.52768	NO	YES	0.54	YES	YES

80 rows × 9 columns

In [96]:

x=air_passengers['Promotion_Budget']
x=x.values.reshape(-1,1)  #Series has no reshape() in recent pandas; reshape the underlying NumPy array
y=air_passengers['Passengers']
y=y.values.reshape(-1,1)

In [97]:

from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(x, y)

Out[97]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [98]:

print('Coefficients: \n', reg.coef_)

Coefficients: 
 [[ 0.06952969]]
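
With a single input variable, the coefficient of a linear regression is the least-squares slope of the line through the data. As a rough sketch of what that number means, the slope can be computed by hand in plain Python. The toy data below is invented for illustration and is not taken from AirPassengers.csv.

```python
# Toy data, invented for illustration; not taken from AirPassengers.csv.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope: covariance of x and y divided by the variance of x.
numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
denominator = sum((xi - mean_x) ** 2 for xi in x)
slope = numerator / denominator
intercept = mean_y - slope * mean_x

print(slope, intercept)
```

Here every increase of 1 in x goes with an increase of 2 in y, so the slope is 2. In the same way, the coefficient printed above is the estimated change in Passengers for each extra unit of Promotion_Budget.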

Important Packages- Matplotlib

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of matplotlib. Its plots are similar in style to MATLAB plots. Python has no built-in plotting library, so we use matplotlib to draw all kinds of plots.

In [99]:

import numpy as np
import matplotlib.pyplot as plt

#to print the plot in the notebook:
%matplotlib inline

X = np.random.normal(0,1,1000)
Y = np.random.normal(0,1,1000)

plt.scatter(X,Y)

Out[99]:

<matplotlib.collections.PathCollection at 0x2912d5537b8>

General notes

Variables are lost after restarting the shell.
Reusing an existing object name overwrites the old object.
Customize the color coding and syntax highlighting; it makes the code easier to read.
Make use of the variable explorer to verify the variables that have been created.

Conclusion

In this session, we got a basic introduction to Python. We tried some basic commands in Python.
In later sessions, we will see data handling and basic descriptive statistics.
