
Handout – Introduction to Python

Before we start the lesson, please download the datasets.

 

Introduction to Python

This document covers the basics of Python for data science. These basics are the foundation for the predictive modelling and machine learning projects that we want to do later.

Contents

  • What is Python & History
  • Installing Python & Python Environment
  • Basic commands in Python
  • Data Types and Operations
  • Python packages
  • Loops
  • My first python program
  • If-then-else statement

Introduction to Python & History

What is python

Python is a general-purpose programming language that can also be used as a scripting language. It is human-readable and well documented, so it is easy to pick up with basic programming knowledge. It is open source (free). Python can be used for many purposes: for web development, for example, we can use the Django framework. Python is a simple language and has powerful libraries for data science and machine learning.

History

  • The Python language was created by Guido van Rossum (Benevolent Dictator for Life). From 2005 to 2012 he worked at Google, where he spent much of his time developing the language.
  • The first Python version was released in 1991.
  • Python 2 was released in 2000.
  • Python 3 was released in 2008.
  • Python 3 was introduced to clean up design flaws in the language that could not be fixed without breaking existing code.
  • Python 3 is NOT fully backwards compatible with Python 2.
  • Python 2 was frozen and supported until 2020. Some good features from Python 3 were back-ported to it.

Which one to use? Python 2 or Python 3

  • Python 2 is NOT the same as Python 3. There are minor changes.
  • There are some incompatibilities, code in Python 2 may not always run in Python 3 and vice-versa.
  • All important packages like NumPy, SciPy and Matplotlib are available for both Python 2 and Python 3.
  • We are going to use Python 3 in our course.

Installing Python & Python IDEs

  • Python has many options to write and execute a program.
  • There are mainly 3 ways to execute Python code:
    1. Text Editors or Command line interface
    2. IPython Notebook (an interactive environment that also works as a report file)
    3. Any Python IDE
  • We will use the Spyder IDE in our course, which is part of the Anaconda distribution.
  • The Anaconda distribution has all the required software built in. We just need to download and install it.

Installing Python, Anaconda

  • Visit https://www.continuum.io/downloads and select the compatible version of Anaconda with Python 3.
  • Download and install by running the .exe file for Windows.
  • It automatically installs
    • Python
    • Ipython
    • Jupyter notebook
    • Spyder IDE
  • It comes with all the necessary packages pre-installed.
  • Spyder is what we need for our coding.

Spyder- Python IDEs

  • Spyder (formerly Pydee) is an open source cross-platform IDE for the Python language.
  • Editor with syntax highlighting and code completion.
  • Has an interactive console to execute and check the output of the code.
  • Testing and debugging are relatively easy.
  • Best IDE if you are coming from an RStudio or MATLAB background.
  • Spyder also provides an object inspector that executes in the context of the console. Any objects created in the console can be examined in the inspector.

In the Spyder environment, we have three parts:

    1.    Editor 
    2.    Variable explorer 
    3.    Console 

Spyder-Editor

  • This is where you write the code.
  • To execute the code, select and hit Ctrl+Enter.
  • You can load old code files.
  • Code written in this editor is saved in .py format.
  • You can hit the tab button to see the autofill options on objects and function names.
  • You will be spending most of your time in the editor.

Spyder-Console

  • This is where the code will be executed when you hit Ctrl+Enter in editor.
  • Helps us in code testing and debugging.
  • Helps us to avoid errors in the source code at the development phase itself.
  • The usual practice is to write a chunk of code in the editor, then execute it and check whether it works.
  • You can toggle between Console and IPython Console.
  • There are two types of consoles:
    • Python console
        Preferred when we want to write and run code line by line.
    • IPython console
        Preferred when we want to run a whole script in a single go.

Spyder – Variable explorer

  • Shows all the variables that are created in the current session.
  • Helps in visually checking that the objects have been created.
  • Shows a quick summary of type of object, size, length, sample values, etc.
  • We can run the code and see the objects being created; we can also validate the data type and size of each object.

Basic Commands in Python

Before you code

  • Before we start executing Python commands, we should know that Python is case-sensitive.
  • Example:
      - Sales_data is not the same as sales_data

Basic Commands

Try basic commands such as addition, subtraction, multiplication and division, and print the results in the Spyder console.

In [1]:
571+95
19*17
print(57+39)
print(19*17)
print("Dv Analytics")
96
323
Dv Analytics
Use a hash (#) for comments; comments are not executed as part of the code. For a multi-line paragraph, start with three quotes ('''), type the paragraph, and end with three quotes (''').
In [2]:
#Division example
34/56
Out[2]:
0.6071428571428571
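
The two comment styles described above can be sketched as follows. This is only an illustration; the variable names total and note are assumptions made for the example.

```python
# A single-line comment: Python ignores everything after the hash.
total = 34 / 56  # a comment can also follow code on the same line

'''
A triple-quoted block like this one is really a string literal.
When it is not assigned to anything, Python evaluates it and
discards it, so it is often used as a multi-line comment.
'''

# Triple quotes can also build a string that spans several lines.
note = '''Triple quotes can also build
a string that spans several lines.'''

print(round(total, 4))
print(note.count("\n"))  # number of line breaks inside the string
```

Because a triple-quoted block is a string and not a true comment, the hash style is the safer choice for notes placed inside indented code.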

Basics-What an error looks like?

If you try to use print with an uppercase P, it does not work, because Python is case-sensitive, and you will see a basic error.

In [3]:
Print(600+900) 
#used Print() instead of print()
576-'96'
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-8cacbcbfb378> in <module>()
----> 1 Print(600+900)
      2 #used Print() instead of print()
      3 576-'96'
      4 

NameError: name 'Print' is not defined

LAB: Basic Commands

  • Calculate below values
    • 973*75
    • 22/7
  • Print the string “my python file”
In [4]:
973*75
Out[4]:
72975
In [5]:
22/7
Out[5]:
3.142857142857143
In [6]:
print("my python file")
my python file

Assigning and Naming convention

Assignment operator

In Python, we can create new variables, and, as in any other programming language, variable names must follow certain rules. We need to keep a few things in mind before naming or defining a new variable. In Python, “=” is the assignment operator: it assigns the value on the right to the name on the left. The following commands show how it works:

In [7]:
income=12000
income
Out[7]:
12000
In [8]:
x=20
x
Out[8]:
20
In [9]:
y=30
z=x*y
z
Out[9]:
600
In [10]:
del x #deletes the variable

    

Printing

In [11]:
name="Jack"
name
Out[11]:
'Jack'
In [12]:
print(name)
Jack
Is there a difference between output of name and print(name)?
In [13]:
book_name="Practical business analytics \n using SAS"
book_name
Out[13]:
'Practical business analytics \n using SAS'
In [14]:
print(book_name)
Practical business analytics 
 using SAS
In the above, the book name contains \n (a backslash followed by n). Displaying the variable shows the raw value of the string, including the \n. The print function instead interprets \n as a new line, which is why the printed output is split across two lines.
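
The difference between displaying a variable and printing it can be reproduced outside the console with the built-in functions repr() and str(). This is a small sketch; the names shown_in_console and shown_by_print are assumptions made for the example.

```python
book_name = "Practical business analytics \n using SAS"

# repr() gives the text the console shows when you type the variable name:
# quotes included, and the newline shown as the two characters \ and n.
shown_in_console = repr(book_name)

# str() gives the text that print() writes: the newline is a real line break.
shown_by_print = str(book_name)

print(shown_in_console)
print(shown_by_print)
print(len(shown_by_print.splitlines()))  # how many lines print() produces
```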

Naming convention

  • In Python, as in other languages, we need to know which variable names are allowed before defining a new variable.
  • The basic rules are as follows:
    • A name must start with a letter, upper or lower case (A-Z or a-z), or an underscore (_).
    • The rest of the name can contain letters, digits (0-9), and underscores (_).
In [15]:
#Doesn't work
1x=20
  File "<ipython-input-15-fd484b8e1634>", line 2
    1x=20
     ^
SyntaxError: invalid syntax
In [16]:
#works
x1=20 
x1
Out[16]:
20
In [17]:
#Doesn't work
x.1=20 
x.1
  File "<ipython-input-17-6d3a7516bd8a>", line 2
    x.1=20
      ^
SyntaxError: invalid syntax
In [18]:
#works
x_1=20 
x_1
Out[18]:
20
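
The naming rules can also be checked from code instead of by trial and error. The sketch below uses the built-in string method isidentifier() and the standard keyword module; the helper is_usable_name and the list of candidates are assumptions made for the example.

```python
import keyword

candidates = ["1x", "x1", "x.1", "x_1", "_total", "for"]

def is_usable_name(name):
    # str.isidentifier() checks the letters/digits/underscore rule;
    # keyword.iskeyword() rejects reserved words such as "for".
    return name.isidentifier() and not keyword.iskeyword(name)

results = {name: is_usable_name(name) for name in candidates}
for name, ok in results.items():
    print(name, "->", "works" if ok else "doesn't work")
```

Reserved words such as for and if follow the character rules but still cannot be used as variable names, which is why the helper checks both.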

Type of Objects

  • Object refers to any entity in a python program.
  • Python has some standard built-in object types:
    • Numbers
    • Strings
    • Lists
    • Tuples
    • Dictionaries
  • Having a good knowledge of these basic objects is essential to feel comfortable in Python programming.

Numbers

  • The good thing about Python is that we can assign any value to a variable very easily, whether it is an integer or a float; we do not have to declare the type explicitly.
In [19]:
age=30
age
Out[19]:
30
In [20]:
weight=102.88
weight
Out[20]:
102.88
In [21]:
x=17
x**2 #Square of x
Out[21]:
289

Check the variable types for age and weight in variable explorer:

  • From the example, “age” is 30, so Python automatically interprets it as an integer; “weight” is 102.88, so it is interpreted as a float.
  • To see the type of a variable from code, use the type() function.
  • Type type(age) and run it; the output shows the type of the variable.
  • We can do the same for the remaining variables.
In [22]:
type(age)
Out[22]:
int
In [23]:
type(weight)
Out[23]:
float
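
As a short sketch of how Python handles number types, the example below mixes an integer and a float and converts between the two. The names total, weight_as_int and age_as_float are assumptions made for the example.

```python
age = 30
weight = 102.88

# Python picks the type from the value that is assigned.
print(type(age).__name__)
print(type(weight).__name__)

# Mixing an int and a float in arithmetic gives a float.
total = age + weight
print(type(total).__name__)

# Explicit conversion between the two types.
weight_as_int = int(weight)   # drops the fractional part
age_as_float = float(age)
print(weight_as_int, age_as_float)
```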

Strings

  • Strings are collections of characters.
  • Strings are amongst the most popular types in Python. There are a number of built-in string methods and functions.
  • Define Strings
In [24]:
name="Sheldon"
msg="Dv Analytics Data Science Classes"
  • Accessing strings
In [25]:
print(name[0])
print(name[1])
S
h
  • Slicing with [start:end] extracts a substring.
In [26]:
print(msg[0:12])
Dv Analytics
The above code prints the characters at positions 0 to 11; the end position 12 is not included.
  • Length of string
In [27]:
len(msg)
#is used to get length of the string
Out[27]:
33
In [28]:
print(msg[13:len(msg)])
Data Science Classes
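
The slicing rules can be tried on any string. The sketch below uses a hypothetical string, course, and shows that the start position is included, the end position is excluded, and negative positions count from the end.

```python
course = "Data Science Classes"   # hypothetical example string

first_word = course[0:4]    # positions 0, 1, 2, 3
rest = course[5:]           # from position 5 to the end
last_word = course[-7:]     # the last 7 characters

print(first_word)
print(rest)
print(last_word)
print(len(course))
```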

Repeating strings

  • Displaying a string multiple times
In [29]:
msg="Site under Construction"
msg*10
Out[29]:
'Site under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under Construction'
In [30]:
msg*50
Out[30]:
'Site under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under ConstructionSite under Construction'
  • There is a difference between print and just displaying a variable
In [31]:
message="Data Science on R and Data Science on Python \n"
message*10
Out[31]:
'Data Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \nData Science on R and Data Science on Python \n'
In [32]:
print(message*10)
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 
Data Science on R and Data Science on Python 

String Concatenation

  • If we want to join two different strings, we use string concatenation.
  • The basic way to do this is the plus operator (+), which joins the strings.
  • Examples are given below:
In [33]:
msg1="Site under Construction "
msg2=msg1+"Go to home page \n"
print(msg2)

#this combines msg1 and the new text into msg2
Site under Construction Go to home page 

In [34]:
print(msg2*10)
#here msg2 will be printed 10 times
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
Site under Construction Go to home page 
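
Besides the plus operator, strings can be joined with the built-in join() method, and a number must be converted with str() before it can be concatenated. The sketch below is illustrative; the names plus_version, join_version, visits and labelled are assumptions.

```python
msg1 = "Site under Construction"
msg2 = "Go to home page"

# The plus operator joins strings exactly as written, so add the space yourself.
plus_version = msg1 + " " + msg2

# join() puts the separator between every element of a list of strings.
join_version = " ".join([msg1, msg2])

# A number must be converted with str() before it can be concatenated.
visits = 3
labelled = msg1 + " (visit " + str(visits) + ")"

print(plus_version)
print(join_version)
print(labelled)
```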

List

  • A list is a sequential data type that can hold many kinds of values.
      - List is a hybrid datatype
      - A sequence of related data
  • Similar to an array, but the elements need not all be of the same type.
  • Creating a list
In [35]:
mylist1=['Sheldon','Male', 25]
  • Accessing list elements
In [36]:
mylist1[0] #Python indexing starts from 0
Out[36]:
'Sheldon'
In [37]:
mylist1[1]
Out[37]:
'Male'
In [38]:
mylist1[2]
Out[38]:
25
  • Appending to a list
In [39]:
mylist2=['L.A','No 173', "CR108877"]
final_list=mylist1+mylist2
final_list
Out[39]:
['Sheldon', 'Male', 25, 'L.A', 'No 173', 'CR108877']
  • Here we joined both lists with mylist1+mylist2; final_list contains the elements of mylist1 followed by those of mylist2.
  • Updating list elements
  • We can update the lists. Here, we are updating the third element (index 2) to 35.
In [40]:
final_list[2]=35
final_list
Out[40]:
['Sheldon', 'Male', 35, 'L.A', 'No 173', 'CR108877']
  • Length of list

len(final_list) – If we execute this command, the length of the list will be displayed.

In [41]:
len(final_list)
Out[41]:
6
  • Deleting an element in list

If we need to delete an element, we use del final_list[5] and execute it. It deletes the element at index 5, which is the last element in final_list; this can be checked by displaying the list.

In [42]:
del final_list[5]
final_list
Out[42]:
['Sheldon', 'Male', 35, 'L.A', 'No 173']
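
Lists can also be changed in place with their built-in methods. The sketch below uses append(), extend() and pop() on a hypothetical list named profile; it is meant as an illustration and is not part of the course code.

```python
profile = ['Sheldon', 'Male', 25]

profile.append('L.A')        # add one element at the end, in place
profile.extend(['No 173'])   # add every element of another list, in place
profile[2] = 35              # update the element at index 2
removed = profile.pop()      # remove and return the last element

print(profile)
print(removed)
print(len(profile))
```

Unlike mylist1+mylist2, which builds a new list, append() and extend() modify the existing list.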

Tuples

Another datatype is the tuple. Tuples are very similar to lists: they are also sequential datatypes and can hold different types of values, such as strings and integers. The only difference between lists and tuples is that tuples cannot be updated, i.e., they cannot be changed after creation.

  • Also sequence data types.
  • Created using parentheses. Lists were created using square brackets.
  • Tuples can’t be updated.
In [43]:
my_tuple=('Mark','Male', 55)
my_tuple
Out[43]:
('Mark', 'Male', 55)
In [44]:
my_tuple[1]
Out[44]:
'Male'
In [45]:
my_tuple[2]
Out[45]:
55
In [46]:
my_tuple[0]*10
Out[46]:
'MarkMarkMarkMarkMarkMarkMarkMarkMarkMark'
  • Here, in tuples, we are just using () i.e., parenthesis for creating tuples.
  • Difference between tuples and lists
In [47]:
my_list=['Sheldon','Male', 25]
my_tuple=('Mark','M', 55)

my_list[2]=30
my_list
Out[47]:
['Sheldon', 'Male', 30]
In [48]:
my_tuple[2]=40
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-c6d510f4ba89> in <module>()
----> 1 my_tuple[2]=40

TypeError: 'tuple' object does not support item assignment
  • The major difference between tuples and list is lists can be updated and tuples cannot be updated or changed.
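
If a program must keep running after a failed tuple update, the error can be caught with try/except. To get a changed copy, we can convert the tuple to a list and back. This is a minimal sketch; the names outcome, as_list and new_tuple are assumptions made for the example.

```python
my_tuple = ('Mark', 'M', 55)

# Trying to update a tuple raises TypeError; catch it so the script continues.
try:
    my_tuple[2] = 40
    outcome = "updated"
except TypeError:
    outcome = "tuples cannot be updated"

# To get a changed version, build a new tuple from a list copy.
as_list = list(my_tuple)
as_list[2] = 40
new_tuple = tuple(as_list)

print(outcome)
print(my_tuple)    # the original tuple is unchanged
print(new_tuple)
```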

Dictionaries

  • Dictionaries have two major element types:
      - Key
      - Value
  • Unlike lists and tuples, dictionaries are not sequential data types: values are looked up by key, not by position.
  • Dictionaries are collections of key-value pairs.
  • Each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces.
  • Keys are unique within a dictionary.
In [49]:
city={0:"LA", 1:"PA" , 2:"FL"}
city
Out[49]:
{0: 'LA', 1: 'PA', 2: 'FL'}
In [50]:
city[0]
Out[50]:
'LA'
In [51]:
city[1]
Out[51]:
'PA'
In [52]:
city[2]
Out[52]:
'FL'
  • In dictionary, keys are similar to indexes. We define our own preferred indexes in dictionaries.
  • Make sure that we give the right key index while accessing the elements in dictionary.
In [53]:
names={1:"David", 6:"Bill", 9:"Jim"}
names
Out[53]:
{1: 'David', 6: 'Bill', 9: 'Jim'}
In [54]:
names[0] #Doesn't work, because the dictionary has no key 0
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-54-2bdf624c7c2b> in <module>()
----> 1 names[0] #Doesn't work, because the dictionary has no key 0

KeyError: 0
In [55]:
names[1]
Out[55]:
'David'
In [56]:
names[2]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-56-5fc2e5d7f092> in <module>()
----> 1 names[2]

KeyError: 2
In [57]:
names[6]
Out[57]:
'Bill'
In [58]:
names[9]
Out[58]:
'Jim'
  • In the key-value pairs, key need not be a number always.
In [59]:
edu={"David":"Bsc", "Bill":"Msc", "Jim":"Phd"}
edu
Out[59]:
{'Bill': 'Msc', 'David': 'Bsc', 'Jim': 'Phd'}
In [60]:
edu[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-60-097da42662a8> in <module>()
----> 1 edu[0]

KeyError: 0
In [61]:
edu[1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-61-86fe0fd73d3a> in <module>()
----> 1 edu[1]

KeyError: 1
In [62]:
edu[David]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-62-9c7944aa3a3d> in <module>()
----> 1 edu[David]

NameError: name 'David' is not defined
In [63]:
edu["David"]
Out[63]:
'Bsc'
  • Updating values in dictionary
In [64]:
edu
Out[64]:
{'Bill': 'Msc', 'David': 'Bsc', 'Jim': 'Phd'}
In [65]:
edu["David"]="MSc"
edu
Out[65]:
{'Bill': 'Msc', 'David': 'MSc', 'Jim': 'Phd'}
  • Updating keys in dictionary
  • Delete the key and value element first and then add new element.
In [66]:
city={0:"LA", 1:"PA" , 2:"FL"}
#How to make 6 as "LA"
del city[0]
city
Out[66]:
{1: 'PA', 2: 'FL'}
In [67]:
city[6]="LA"
city
Out[67]:
{1: 'PA', 2: 'FL', 6: 'LA'}
  • Fetch all keys and all values separately
In [68]:
city.keys()
Out[68]:
dict_keys([1, 2, 6])
In [69]:
city.values()
Out[69]:
dict_values(['PA', 'FL', 'LA'])
In [70]:
edu.keys()
Out[70]:
dict_keys(['David', 'Bill', 'Jim'])
In [71]:
edu.values()
Out[71]:
dict_values(['MSc', 'Msc', 'Phd'])
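
A KeyError can be avoided by testing for the key first or by using the built-in get() method, and a key can be renamed in one step with pop(). The sketch below is illustrative; the names has_zero, fallback and pairs are assumptions.

```python
names = {1: "David", 6: "Bill", 9: "Jim"}

# "in" tests whether a key exists, without raising KeyError.
has_zero = 0 in names

# get() returns a default value instead of raising KeyError.
fallback = names.get(0, "unknown")

# pop() removes a key and returns its value, so a key can be renamed in one line.
names[2] = names.pop(1)

# items() gives the key-value pairs together.
pairs = list(names.items())

print(has_zero, fallback)
print(names)
print(pairs)
```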

If-Then-Else statement

If Condition

We need to add some degree of logic to our code. IF and ELSE are the basic logical statements. An IF statement checks a condition: if the condition is true, the action under the IF statement is performed. If it is not true, the action is skipped.

In [72]:
age=60
if age<50:
    print("Group1")
print("Done with If")
Done with If

IF-ELSE statement

If the IF condition is not true, execution moves to the ELSE part and runs it instead. This means that exactly one of the two branches runs: if the condition is true, we get the result of the IF part; otherwise, we get the result of the ELSE part. Below is an example of the IF-ELSE statement.

In [73]:
age=60
if age<50:
    print("Group1")
else:
    print("Group2")
print("Done with If else")
Group2
Done with If else

Multiple ELSE conditions in IF

With several conditions, Python checks them from top to bottom. In the code below, it first checks marks<30; if that is true, it prints fail and skips the rest of the chain. Otherwise it checks the next ELIF condition, and so on. As soon as one condition is true, its action runs, and execution continues with whatever code comes after the whole IF-ELIF-ELSE statement.

In [74]:
marks=75

if(marks<30):
    print("fail")
elif(marks<60):
    print("Second Class")
elif(marks<80):
    print("First Class")
elif(marks<100):
    print("Distinction")
else:
    print("Error in Marks")
First Class
In [75]:
marks=20

if(marks<30):
    print("fail")
elif(marks<60):
    print("Second Class")
elif(marks<80):
    print("First Class")
elif(marks<100):
    print("Distinction")
else:
    print("Error in Marks")
fail
In the first example, marks is 75: the conditions marks<30 and marks<60 are false, so the check moves on until marks<80 is true and First Class is printed. In the second example, marks is 20, so the first condition marks<30 is true, fail is printed, and the remaining ELIF conditions are not checked.
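
To try several values without copying the whole chain each time, the same conditions can be wrapped in a function. This is a sketch that reuses the thresholds from the examples above; the function name grade and the test values are assumptions.

```python
def grade(marks):
    # Conditions are checked from top to bottom; the first true one wins.
    if marks < 30:
        return "fail"
    elif marks < 60:
        return "Second Class"
    elif marks < 80:
        return "First Class"
    elif marks < 100:
        return "Distinction"
    else:
        return "Error in Marks"

results = [grade(m) for m in (20, 45, 75, 95, 120)]
print(results)
```

With these thresholds, a value of exactly 30 falls into Second Class, because marks<30 is false for it.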

Nested IF

A nested IF is an IF statement placed inside the action part of another IF or ELSE. If the outer condition is true, execution enters its action part, where further conditions can be checked; we can keep nesting conditions in this way. In the first example below, x is 45. The outer condition x<50 is true, so it prints that the number is less than 50. The next IF, which sits inside that block, checks x<40; this is not true, so its block is skipped and execution goes directly to the matching ELSE, which prints that the number is greater than 40. The whole nested IF therefore prints that the number is less than 50 and greater than 40. We can run the same code with other values to see how it works.

In [76]:
x=45

if(x<50):
    print("Number is less than 50")
    if(x<40):
        print("Number is less than 40")
        if(x<30):
            print("Number is less than 30")
        else:
            print("Number is greater than 30")
    else:
        print("Number is greater than 40")
else:
    print("Number is greater than 50")
Number is less than 50
Number is greater than 40
In [77]:
x=35

if(x<50):
    print("Number is less than 50")
    if(x<40):
        print("Number is less than 40")
        if(x<30):
            print("Number is less than 30")
        else:
            print("Number is greater than 30")
    else:
        print("Number is greater than 40")
else:
    print("Number is greater than 50")
Number is less than 50
Number is less than 40
Number is greater than 30
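
One way to trace which branches of a nested IF are entered is to collect the messages in a list instead of printing them. The sketch below uses a shortened, two-level version of the example; the function name describe and the message texts are assumptions.

```python
def describe(x):
    # Collect one message per branch that is entered.
    messages = []
    if x < 50:
        messages.append("less than 50")
        # This inner check only runs when the outer condition is true.
        if x < 40:
            messages.append("less than 40")
        else:
            messages.append("not less than 40")
    else:
        messages.append("not less than 50")
    return messages

print(describe(45))
print(describe(35))
print(describe(60))
```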

For loop

  • A for loop is an iteration statement.
  • It allows a code block to be repeated a certain number of times.
  • Generally, a for loop iterates through a list or another sequence.

Syntax:

  • for (variable) in (sequence):
    • (code block)
The following are examples of the for loop.

Example-1

In [78]:
my_num=1

for i in range(1,20):
    #range(1,20) yields 1 to 19, so the block runs 19 times
    my_num=my_num+1
    print("my num value is", my_num)
my num value is 2
my num value is 3
my num value is 4
my num value is 5
my num value is 6
my num value is 7
my num value is 8
my num value is 9
my num value is 10
my num value is 11
my num value is 12
my num value is 13
my num value is 14
my num value is 15
my num value is 16
my num value is 17
my num value is 18
my num value is 19
my num value is 20

Example-2

sumx starts at 0. In the first iteration x is 1, so sumx becomes 0+1 = 1 and 1 is printed. In the second iteration x is 2, so sumx becomes 1+2 = 3. Each further iteration adds the next value of x in the same way, so sumx holds the running total.

In [79]:
sumx = 0 
x=1

for x in range(1,20): 
     sumx = sumx + x
     print(sumx)
1
3
6
10
15
21
28
36
45
55
66
78
91
105
120
136
153
171
190
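
Two points about for loops are worth checking by hand: range(1, 20) stops at 19, and a loop can walk through any list directly. The sketch below shows both; the names numbers, running_total, words and lengths are assumptions made for the example.

```python
numbers = list(range(1, 20))   # 1, 2, ..., 19 (20 is not included)

# Add up the values one by one, as in Example-2.
running_total = 0
for x in numbers:
    running_total = running_total + x

# A for loop can also iterate over the elements of a list directly.
words = ["Python", "for", "data"]
lengths = []
for w in words:
    lengths.append(len(w))

print(len(numbers), running_total)
print(lengths)
```

The built-in sum(numbers) gives the same total as the loop.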

Break Statement in FOR-loop

Sometimes we want to iterate only a certain number of times, or only until a certain condition is met or a certain output is produced. To do this, we put an IF statement inside the loop: when its condition is true, the break statement ends the FOR-loop and execution continues with the code after the loop.

  • To stop execution of a loop
  • Stopping the loop in midway using a condition
In [80]:
sumx = 0 
x=1

for x in range(1,200): 
     sumx = sumx + x
     if(sumx>500):
         break
     print(sumx)
1
3
6
10
15
21
28
36
45
55
66
78
91
105
120
136
153
171
190
210
231
253
276
300
325
351
378
406
435
465
496
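
To see exactly where the loop stops, we can record the value of x at the moment of the break. This sketch repeats the example above without the printing; the variable stopped_at is an assumption added for illustration.

```python
sumx = 0
stopped_at = None   # will hold the value of x at which the loop was broken

for x in range(1, 200):
    sumx = sumx + x
    if sumx > 500:
        stopped_at = x
        break   # leave the loop immediately; the remaining values are skipped

print(stopped_at, sumx)
```

The last value printed in the original example is 496 because the break happens before the print statement in the iteration where sumx first exceeds 500.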

Function

Sometimes in programming we need to perform the same task in different places in our code with different values and variables. Instead of writing the same code over and over, we can write a generalized code block, called a function, and call it whenever we need it.

  • A function is a piece of code, which takes input values and returns a result.
  • Function can be used again with different values.
  • Instead of rewriting the whole code, it’s much cleaner to define a function, which can then be used repeatedly.
  • Function improves modularity of a program.
  • A function that belongs to an object is known as a method.

Function Syntax:

  • Function has two Components:
    • Header (Again header has two components):
      • Function Name
      • Input Parameters
    • Body: (Consists of the procedure which we want the function to carry out).
  • Example:
In [81]:
def square(a):
      c = a*a
      return c

Writing our own function

  • Write a function that:
    • Takes two numerical variables as inputs.
    • Returns the remainder when variable 1 is divided by variable 2.
In [82]:
def remainder(var1, var2):
        a = var1%var2
        return a
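
A function only becomes useful when it is called. The sketch below defines the two functions again so that it runs on its own, then calls them with sample values; the numbers 17 and 5 are arbitrary, and divmod() is a Python built-in shown for comparison.

```python
def square(a):
    return a * a

def remainder(var1, var2):
    return var1 % var2

# Call the functions with different values and keep the results.
r = remainder(17, 5)
combined = square(remainder(17, 5))   # a returned value can feed another function

# The built-in divmod() returns the quotient and the remainder together.
quotient, rem = divmod(17, 5)

print(r, combined, quotient, rem)
```

Returning the value with return, instead of printing it, is what allows the result to be stored or passed on to another function.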

Packages

The true power of Python lies in its packages. A package is like a bundle of pre-built functions; if you come from an R background, you will know libraries, which bundle functions in the same way and are used to perform specific tasks.

  • A package is a collection of Python functions: properly structured and compiled code. A package may contain many sub-packages.
  • Many python functions are only available via “packages” that must be imported.
  • For example, to find the value of log(10), we first need to import the math package, which has the log function in it.
In [83]:
log(10)
exp(5)
sqrt(256)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-83-b0bfd28c096c> in <module>()
----> 1 log(10)
      2 exp(5)
      3 sqrt(256)

NameError: name 'log' is not defined
In [84]:
import math
math.log(10)
Out[84]:
2.302585092994046
In [85]:
math.exp(5)
Out[85]:
148.4131591025766
In [86]:
math.sqrt(256)
Out[86]:
16.0
  • To be a good data scientist on python, one needs to be very comfortable with below packages:
    • Numpy
    • Scipy
    • Pandas
    • Scikit-Learn
    • Matplotlib
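
There are a few common ways to write an import, and they differ only in how the functions are named afterwards. The sketch below shows three of them with the standard math package; the alias m is an arbitrary choice.

```python
import math                  # use as math.sqrt(...)
import math as m             # same package under a shorter alias
from math import sqrt, log   # import selected functions by name

a = math.sqrt(256)
b = m.sqrt(256)
c = sqrt(256)

natural = log(10)            # math.log is the natural logarithm
base10 = math.log10(10)      # log10 is the base-10 logarithm

print(a, b, c)
print(natural, base10)
```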

Important Packages- NumPy

Numpy is an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric with extensive modifications. NumPy is open-source software and has many contributors.

  • NumPy is for fast operations on vectors and matrices, including mathematical, logical, shape manipulation, sorting, selecting.
  • It is the foundation on which all higher level tools for scientific Python are built.
In [87]:
import numpy as np

income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])
print(income) 
print(income[0])
[ 9000  8500  9800 12000  7900  6700 10000]
9000
In [88]:
expenses=income*0.65
print(expenses)
[ 5850.  5525.  6370.  7800.  5135.  4355.  6500.]
In [89]:
savings=income-expenses
print(savings)
[ 3150.  2975.  3430.  4200.  2765.  2345.  3500.]
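
NumPy arrays also support summaries and condition-based selection without writing a loop. This is a small sketch on the same income figures; the names total_income, high_income and average_savings are assumptions, and the 9000 cut-off is arbitrary.

```python
import numpy as np

income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])
savings = income - income * 0.65

# Summaries work on the whole array at once.
total_income = income.sum()
average_savings = savings.mean()

# A condition produces a True/False mask that selects matching elements.
high_income = income[income > 9000]

print(total_income)
print(average_savings)
print(high_income)
```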

Important Packages – Pandas

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license. The name is derived from the term “Panel Data”, an econometrics term for multidimensional structured datasets. Pandas is a data frame package: it lets us load a data file, such as a CSV file, into a data frame and work on it.

  • Data frames and data handling.
  • Pandas has Data structures and operations for manipulating numerical tables and time series.
In [90]:
import pandas as pd
buyer_profile = pd.read_csv('R Dataset/Buyers Profiles/Train_data.csv')

print(buyer_profile)
    Age  Gender Bought
0    29    Male    Yes
1    34    Male    Yes
2    13  Female    Yes
3    27  Female     No
4    10  Female     No
5    68    Male    Yes
6    15    Male    Yes
7    53    Male    Yes
8    51    Male     No
9    48  Female     No
10   63  Female     No
11   43    Male    Yes
12    8  Female     No
13   47  Female     No
In [91]:
buyer_profile.Age
Out[91]:
0     29
1     34
2     13
3     27
4     10
5     68
6     15
7     53
8     51
9     48
10    63
11    43
12     8
13    47
Name: Age, dtype: int64
In [92]:
buyer_profile.Gender
Out[92]:
0       Male
1       Male
2     Female
3     Female
4     Female
5       Male
6       Male
7       Male
8       Male
9     Female
10    Female
11      Male
12    Female
13    Female
Name: Gender, dtype: object
In [93]:
buyer_profile.Age[0]
Out[93]:
29
In [94]:
buyer_profile.Age[0:10]
Out[94]:
0    29
1    34
2    13
3    27
4    10
5    68
6    15
7    53
8    51
9    48
Name: Age, dtype: int64

Important Packages- Scikit-Learn

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

  • Scikit-learn makes machine learning algorithms easy to use.
In [95]:
import sklearn as sk
import pandas as pd

air_passengers = pd.read_csv('R Dataset/AirPassengers/AirPassengers.csv')
air_passengers
Out[95]:
Week_num Passengers Promotion_Budget Service_Quality_Score Holiday_week Delayed_Cancelled_flight_ind Inter_metro_flight_ratio Bad_Weather_Ind Technical_issues_ind
0 1 37824 517356 4.00000 NO NO 0.70 YES YES
1 2 43936 646086 2.67466 NO YES 0.80 YES YES
2 3 42896 638330 3.29473 NO NO 0.90 NO NO
3 4 35792 506492 3.85684 NO NO 0.40 NO NO
4 5 38624 609658 3.90757 NO NO 0.87 NO YES
5 6 35744 476084 3.83710 NO YES 0.66 YES NO
6 7 40752 635978 3.60259 NO YES 0.74 YES NO
7 8 34592 495152 3.60086 NO YES 0.39 NO NO
8 9 35136 429800 3.62776 NO NO 0.61 NO YES
9 10 43328 613326 2.98305 NO NO 0.66 NO NO
10 11 34960 492758 3.60089 NO NO 0.77 NO NO
11 12 44464 600726 2.56064 NO YES 0.74 YES NO
12 13 36464 456960 3.89655 NO YES 0.39 YES NO
13 14 44464 586096 2.47713 NO YES 0.79 YES NO
14 15 51888 704802 1.77422 YES YES 0.72 YES YES
15 16 36800 536970 3.92254 NO NO 0.43 NO YES
16 17 48688 742308 1.93589 NO NO 0.90 NO YES
17 18 37456 500234 3.99060 NO NO 0.46 NO NO
18 19 44800 570682 2.43241 NO YES 0.79 YES YES
19 20 56032 826420 1.41139 YES YES 0.80 YES NO
20 21 58800 761040 1.24488 YES NO 0.69 NO NO
21 22 57440 753466 1.36091 YES NO 0.60 NO NO
22 23 32752 502712 3.37428 NO YES 0.45 YES YES
23 24 43424 653856 2.88878 NO YES 0.89 YES YES
24 25 45968 706748 2.31898 NO YES 0.62 YES NO
25 26 38816 532602 3.85307 NO NO 0.75 NO YES
26 27 35168 518070 3.70671 NO YES 0.47 YES YES
27 28 34496 539378 3.48455 NO YES 0.78 YES YES
28 29 34208 414120 3.48166 NO YES 0.38 YES NO
29 30 44320 653338 2.58325 NO NO 0.71 NO YES
... ... ... ... ... ... ... ... ... ...
50 51 43728 590492 2.77882 NO YES 0.47 YES NO
51 52 47040 694568 2.06989 NO YES 0.55 YES NO
52 53 34512 493444 3.57125 NO NO 0.74 NO YES
53 54 57600 781718 1.35511 YES NO 0.67 NO YES
54 55 36064 526162 3.87218 NO YES 0.73 NO YES
55 56 49392 707070 1.91865 NO NO 0.75 NO NO
56 57 42378 545510 3.46630 NO NO 0.62 NO YES
57 58 38584 555170 3.99116 NO NO 0.77 NO NO
58 59 28700 405916 3.07021 NO NO 0.72 NO NO
59 60 55160 738794 1.48667 YES YES 0.71 YES NO
60 61 52472 666778 1.58686 YES YES 0.90 YES NO
61 62 54474 715498 1.52341 YES YES 0.55 YES NO
62 63 54222 754418 1.58647 YES NO 0.78 YES NO
63 64 73444 1012130 0.91298 YES YES 0.90 YES NO
64 65 67130 1003002 0.98050 YES NO 0.79 NO YES
65 66 39984 589526 3.77575 NO NO 0.81 NO NO
66 67 41972 550872 3.49699 NO YES 0.68 YES YES
67 68 43722 652680 2.84565 NO YES 0.69 YES NO
68 69 76972 1041796 0.87470 YES YES 0.90 YES NO
69 70 58156 881818 1.33013 YES NO 0.82 NO NO
70 71 52304 679938 1.68678 YES NO 0.63 NO YES
71 72 76524 1024450 0.87933 YES YES 0.90 YES NO
72 73 60620 844578 1.15504 YES NO 0.90 NO YES
73 74 32018 445424 3.23666 NO YES 0.64 YES YES
74 75 51814 669144 1.87321 YES NO 0.88 NO YES
75 76 66934 927696 1.07138 YES YES 0.84 NO NO
76 77 81228 1108254 0.85536 YES YES 0.90 YES NO
77 78 43288 638162 3.08191 NO NO 0.62 NO NO
78 79 43834 636636 2.75382 NO YES 0.79 YES YES
79 80 40852 575008 3.52768 NO YES 0.54 YES YES

80 rows × 9 columns

In [96]:
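# Series.reshape is not available in current pandas, so reshape the underlying NumPy arrays.
# scikit-learn expects the predictors as a 2-D array with one column per feature.
x = air_passengers['Promotion_Budget'].values.reshape(-1, 1)
y = air_passengers['Passengers'].values.reshape(-1, 1)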
In [97]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(x, y)
Out[97]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [98]:
print('Coefficients: \n', reg.coef_)
Coefficients: 
 [[ 0.06952969]]
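Once a model is fitted, it can also report its intercept, predict new values and score its own fit. The sketch below is a minimal illustration that uses only the first ten rows of the table above, typed in by hand, so its slope will differ from the coefficient obtained on all 80 rows. The budget of 550000 is a hypothetical value chosen for the example.

```python
import numpy as np
from sklearn import linear_model

# First ten rows of the AirPassengers table shown above.
promotion_budget = np.array([517356, 646086, 638330, 506492, 609658,
                             476084, 635978, 495152, 429800, 613326])
passengers = np.array([37824, 43936, 42896, 35792, 38624,
                       35744, 40752, 34592, 35136, 43328])

# Predictors must be 2-D (rows x features); the target can stay 1-D.
x = promotion_budget.reshape(-1, 1)
y = passengers

reg = linear_model.LinearRegression()
reg.fit(x, y)

slope = reg.coef_[0]        # extra passengers per extra unit of budget
intercept = reg.intercept_  # predicted passengers at a budget of zero
print('Slope:', slope)
print('Intercept:', intercept)

# Predict passengers for a hypothetical promotion budget.
new_budget = np.array([[550000]])
predicted = reg.predict(new_budget)
print('Predicted passengers:', predicted[0])

# R-squared on the training data: the share of variance explained by the line.
r_squared = reg.score(x, y)
print('R-squared:', r_squared)
```

The prediction is simply intercept + slope × budget, which is the straight line that LinearRegression fits.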

Important Packages- Matplotlib

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of matplotlib. Its plots look and behave much like MATLAB plots, but Matplotlib is a Python library, not a MATLAB one. Python has no built-in plotting library, so we use Matplotlib to draw most kinds of plots.

In [99]:
import numpy as np
import matplotlib.pyplot as plt

# to display the plot inside the notebook:
%matplotlib inline

# 1000 random points drawn from a standard normal distribution for each axis
X = np.random.normal(0, 1, 1000)
Y = np.random.normal(0, 1, 1000)

plt.scatter(X, Y)
Out[99]:
<matplotlib.collections.PathCollection at 0x2912d5537b8>
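The object-oriented API mentioned above can be sketched as follows. This is a minimal example, not the only way to plot: it takes the first ten rows of the AirPassengers table (typed in by hand), draws them on an explicit Axes object and labels the axes. The output file name is a hypothetical choice for the example; inside a notebook with %matplotlib inline the figure is displayed directly.

```python
import matplotlib.pyplot as plt

# First ten rows of the AirPassengers table shown earlier.
promotion_budget = [517356, 646086, 638330, 506492, 609658,
                    476084, 635978, 495152, 429800, 613326]
passengers = [37824, 43936, 42896, 35792, 38624,
              35744, 40752, 34592, 35136, 43328]

# Create a Figure and a single Axes, then draw on the Axes explicitly.
fig, ax = plt.subplots()
ax.scatter(promotion_budget, passengers)

# Label the plot so it can be read on its own.
ax.set_xlabel("Promotion_Budget")
ax.set_ylabel("Passengers")
ax.set_title("Passengers vs. promotion budget (first ten weeks)")

# Save the figure to a file (hypothetical file name).
fig.savefig("passengers_vs_budget.png")
```

Working with fig and ax directly makes it clear which plot each call changes, which helps once a figure contains more than one chart.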

General notes

  • Variables are lost when the shell is restarted.
  • Reusing an existing object name overwrites the old object.
  • Customize the color coding and syntax highlighting in your editor; it makes code easier to read.
  • Use the variable explorer to check the variables you have created.
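The note about reusing an object name can be seen in a few lines. The variable name and values below are made up for the illustration.

```python
budget = 517356                  # budget refers to an integer
print(type(budget).__name__)     # prints: int

budget = "not set"               # same name, new object: the integer is replaced
print(type(budget).__name__)     # prints: str
```

Python raises no warning when a name is reused, so it is worth checking the variable explorer when a result looks unexpected.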

Conclusion

  • In this session, we got a basic introduction to Python and tried some basic commands.
  • In later sessions, we will cover data handling and basic descriptive statistics.

 
