Appendix A — A Primer on Python

Learning Objectives of the Appendix

At the End of the Appendix, Students should be Able to -

  • Gain an Understanding about Python

  • Gain an Understanding about the Data Types and Data Structures in Python

  • Gain an Understanding about Arrays in Numpy, Indexing and Slicing of Arrays, and Operations of Arrays

  • Gain an Understanding about the for Loop, the map Function, and User Defined Functions in Python

A.1 What is Python?

     According to www.python.org “Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.” It further explains - “Python’s simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse.”

A.2 Data Types in Python

     Data comes in different types. When working with data, we need to know its type because each type supports different operations. Python has six basic data types: int, float, complex, bool, str, and bytes. The most commonly used are int, float, str, and bool. We use the type() function to find the type of a value.

x = "hello world"
type(x)
str
x = 25
type(x)
int
x = 25.34
type(x)
float
x = True
type(x)
bool
x = 7j
type(x)
complex
x = b"Hello World"
type(x)
bytes
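
To illustrate why the type of a value matters, the short sketch below applies the same + operator to values of different types. The variable names are illustrative only and are not used elsewhere in this appendix.

```python
# The + operator behaves differently depending on the data type
int_sum = 25 + 5          # numeric addition
str_sum = "25" + "5"      # string concatenation
float_sum = 25 + 5.0      # mixing int and float gives a float

print(int_sum, type(int_sum))      # 30 <class 'int'>
print(str_sum, type(str_sum))      # 255 <class 'str'>
print(float_sum, type(float_sum))  # 30.0 <class 'float'>

# Mixing str and int with + is not allowed and raises a TypeError
try:
    "25" + 5
except TypeError as err:
    print("TypeError:", err)
```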

A.3 Data Structures in Python

     A data structure is a collection of data organized so that it can be accessed and modified efficiently. Data structures let us store collections of data, relate the items to one another, and perform operations on them. Data structures in python can be broadly classified into two groups: built-in data structures and user-defined data structures. Figure A.1 shows the data structures in python. Because the built-in data structures are the most widely used, we elaborate on them below.

Figure A.1: Data Structure in Python

A.4 Built-in Data Structure

A.4.1 List

     A list is used to store a collection of ordered¹ data items. Lists are created using square brackets ([]) or the list() function. Lists can hold different types of data, including integers (int), floats (float), strings (str), and even other lists. We can use the len() function to find the number of elements in a list. Moreover, lists are mutable, meaning that their contents can be changed after the list has been created.

colors = ['red', 'blue', 'green']
print(colors)
['red', 'blue', 'green']
len(colors)
3
a = [1, 'apple', 3.14, [5, 6]]
print(a)
[1, 'apple', 3.14, [5, 6]]
b = list((1, 'apple', 3.14, [5, 6]))
print(b)
[1, 'apple', 3.14, [5, 6]]

A.4.1.1 Creating a List with Repeated Elements

A list with repeated elements can be created using the multiplication operator.

x = [2] * 5
y = [0] * 7

print(x)
print(y)
[2, 2, 2, 2, 2]
[0, 0, 0, 0, 0, 0, 0]

A.4.1.2 Accessing List Elements

     Indexing can be used to access the elements in a list. Python indexes start at 0. Therefore, a[0] will access the first element in the list a. A negative index counts from the end of the list, so a[-1] accesses the last element. Figure A.2 shows the indexes of the list colors.

Figure A.2: Index of List Elements
colors[0]
'red'
colors[-1]
'green'

A.4.1.3 Adding Elements to the List

We can add elements to a list using three methods: append(), insert(), and extend().

# Initialize an empty list
m = []

# Adding 50 to the end of the list
m.append(50)
print("After append(50):", m)

# Inserting 40 at index 0
m.insert(0, 40)
print("After insert(0, 40):", m)

# Adding multiple elements [60, 70, 80] at the end
m.extend([60, 70, 80])
print("After extend([60,70,80]):", m)
After append(50): [50]
After insert(0, 40): [40, 50]
After extend([60,70,80]): [40, 50, 60, 70, 80]

A.4.1.4 Updating Elements in the List

We can change the value of an element by accessing it using its index.

p = [10, 20, 30, 40, 50]
# Change the second element
p[1] = 25 

print(p)  
[10, 25, 30, 40, 50]

A.4.1.5 Removing Elements from the List

We can remove elements from a list using two methods, remove() and pop(), or the del statement.

a = [10, 20, 30, 40, 50]

# Removes the first occurrence of 30
a.remove(30)  
print("After remove(30):", a)

# Removes the element at index 1 (20)
popped_val = a.pop(1)  
print("Popped element:", popped_val)
print("After pop(1):", a) 

# Deletes the first element (10)
del a[0]  
print("After del a[0]:", a)
After remove(30): [10, 20, 40, 50]
Popped element: 20
After pop(1): [10, 40, 50]
After del a[0]: [40, 50]

A.4.2 Dictionary

     The dictionary data structure in python stores data in key:value pairs. Unlike a list, which uses square brackets ([]), a dictionary uses curly brackets ({}). Like lists, dictionaries are mutable. A dictionary item is accessed by its key. We can use the len() function to find the number of items in a dictionary and type() to confirm its type.

my_car = {
  "brand": "Ford",
  "model": "Escape",
  "year": 2017
}
print(my_car)
{'brand': 'Ford', 'model': 'Escape', 'year': 2017}
print(my_car['model'])
Escape

The values in dictionary items can be of any data type.

car_features = {
  "brand": "Ford", # string 
  "electric": False, # boolean 
  "year": 1964, # integer
  "colors": ["red", "white", "blue"] # list of string
}

The dict() function can also be used to construct a dictionary.

my_friends = dict(
    name = ["John", "Smith", "Mark"], 
    age = [36, 45, 49], 
    country = ["Norway", "Sweden", "Finland"]
)
print(my_friends)
{'name': ['John', 'Smith', 'Mark'], 'age': [36, 45, 49], 'country': ['Norway', 'Sweden', 'Finland']}

Some built-in dictionary methods² are:

  • dict.clear() - removes all the elements from the dictionary
employee = {
    'name': ["John", "Jessica", "Zack"], 
    'age': [18, 19, 20]
}
print(employee)
{'name': ['John', 'Jessica', 'Zack'], 'age': [18, 19, 20]}
employee.clear()
print(employee)
{}
  • dict.copy() - returns a shallow copy of the dictionary

  • dict.get(key, default=None) - returns the value of the specified key, or default if the key is not in the dictionary

  • dict.items() - returns a view of the dictionary’s (key, value) pairs

  • dict.keys() - returns a view of the dictionary’s keys

  • dict.values() - returns a view of the dictionary’s values
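
The methods listed above can be tried on a small dictionary. The sketch below is illustrative only; the dictionary contents and the "salary" key are made up for the example.

```python
employee = {"name": "John", "age": 18}

# copy() returns a shallow copy; changing the copy leaves the original intact
employee_copy = employee.copy()
employee_copy["age"] = 19
print(employee["age"])              # 18

# get() returns the value for a key, or a default when the key is absent
print(employee.get("name"))         # John
print(employee.get("salary"))       # None
print(employee.get("salary", 0))    # 0

# keys(), values(), and items() return view objects; list() converts them
print(list(employee.keys()))        # ['name', 'age']
print(list(employee.values()))      # ['John', 18]
print(list(employee.items()))       # [('name', 'John'), ('age', 18)]
```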

A.4.3 Tuple

     In python, a tuple is very similar to a list, with one key difference: a list is mutable, but a tuple is not. Once a tuple is created, we cannot add, remove, or change its elements. A tuple is created using parentheses (()) or the tuple() function. We can access the elements of a tuple by indexing, as we did for lists.

my_tuple = ('10', '20', '30', 'hello', 'world')
my_tuple
('10', '20', '30', 'hello', 'world')
my_tuple[3]
'hello'

There are different operations that can be performed on tuples. Some of them include:

  • Concatenation - the plus operator (+) joins two tuples into a new tuple.

  • Nesting - a nested tuple is a tuple placed inside another tuple.

  • Repetition - the multiplication operator (*) repeats the elements of a tuple a given number of times.

second_tuple = ('10', '20', 'SIU', "SOA", "Carbondale")
second_tuple*3
('10',
 '20',
 'SIU',
 'SOA',
 'Carbondale',
 '10',
 '20',
 'SIU',
 'SOA',
 'Carbondale',
 '10',
 '20',
 'SIU',
 'SOA',
 'Carbondale')
  • Slicing - extracting a smaller tuple from a given tuple using a range of indexes.
second_tuple[1:]
second_tuple[2:4]
second_tuple[::-1]
('Carbondale', 'SOA', 'SIU', '20', '10')
  • Finding the Length - using the len() function, we can find the total number of elements in a tuple.

  • Different data types in tuples - a tuple can hold heterogeneous data.

  • Lists to tuples - using the tuple() function, we can convert a list into a tuple.
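
Several of the operations listed above have no example yet, so a minimal sketch follows. The tuple names (first, second, and so on) are chosen for illustration only.

```python
first = (10, 20, 30)
second = ("hello", "world")

combined = first + second        # concatenation
nested = (first, second)         # nesting: a tuple that contains tuples
from_list = tuple([1, 2, 3])     # a list converted to a tuple

print(combined)        # (10, 20, 30, 'hello', 'world')
print(nested)          # ((10, 20, 30), ('hello', 'world'))
print(len(combined))   # 5
print(from_list)       # (1, 2, 3)

# Tuples are immutable, so item assignment raises a TypeError
try:
    first[0] = 99
except TypeError as err:
    print("TypeError:", err)
```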

A.4.4 Set

     A set in python is an unordered, unindexed collection of items. Existing items cannot be changed in place, but new items can be added to the set and old items can be removed from it. Another important characteristic of a set is that it has no duplicate elements. Curly brackets ({}) are used to create a set (an empty set must be created with set(), because {} creates an empty dictionary). The difference() method or the minus operator (-) returns the elements of one set that are not in another.

new_set = {'Hello', 'World', "World"}
new_set 
{'Hello', 'World'}
type(new_set)
set
new_set[0] = "Hi" # Raises TypeError: 'set' object does not support item assignment
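
Adding items, removing items, and taking a set difference can be sketched as follows. The set names and values are illustrative. Because sets are unordered, the example compares results instead of relying on a printed order.

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5}

set_a.add(10)      # add a new item
set_a.remove(1)    # remove an existing item (raises KeyError if absent)
print(set_a == {2, 3, 4, 10})          # True

# Elements that are in set_a but not in set_b
diff_method = set_a.difference(set_b)
diff_operator = set_a - set_b
print(diff_method == {2, 10})          # True
print(diff_method == diff_operator)    # True
```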

A.5 What is Numpy?

A.5.1 Installing and Importing Numpy

     Numpy is a python library and one of the most essential libraries for data science because almost all libraries in the python PyData ecosystem use numpy. Therefore, understanding numpy is important. Moreover, operations on numpy arrays are fast because the core of the library is implemented in C. In this section, we will learn some useful numpy functions.

     Before we start using the numpy module (library), we need to install it. We can run one of the following commands in a terminal to install numpy.

pip install numpy 
# OR
conda install numpy 

     Once numpy is installed, we need to load (import) the library by running the following code:

import numpy as np

A.5.2 Some Useful numpy Functions for Array

A.5.2.1 Array Creation

     An array is a multidimensional data structure that holds a collection of items of the same type (homogeneous). Arrays are powerful for mathematical and scientific computation. There are many functions in numpy to create arrays. Some of those functions are described below.

     * np.array() is used to create an array in numpy. The array object is also called ndarray. You can pass a list or tuple to the np.array() function. You can create arrays with zero, one, two, three, or more dimensions. The ndim attribute gives the number of dimensions of an array.

years = [2025, 2024, 2023, 2022, 2021, 2020, 2019] # named years because the name list would shadow the built-in list()
np.array(years)
array = np.array(years)
array.ndim # 1 dimensional array

np.array((20)).ndim # zero dimensional array. 

array2 = np.array([[1,3,5,7], [2,4,6,8]])
array2
array2.ndim # 2 dimensional array
2

     * np.arange(start, stop, step)³ creates an array with values starting from start up to, but not including, stop, increasing by step.

np.arange(10,21,1)
np.arange(21)
np.arange(10,21)
np.arange(-1,1)
np.arange(-1,1,0.001)
np.arange(10,30,3)
array([10, 13, 16, 19, 22, 25, 28])

     * np.linspace(start, stop, n) creates an array of n evenly spaced numbers from start to stop, with both endpoints included.

np.linspace(10,21,10)
np.linspace(1,100, 10)
array([  1.,  12.,  23.,  34.,  45.,  56.,  67.,  78.,  89., 100.])

     * np.zeros() is used to create an array filled with zeros. The shape can be given as an integer or as a tuple or list of integers.

np.zeros(5)
np.zeros((5,5))
np.zeros([5,5])
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

     * np.ones() is used to create an array filled with ones. The shape can be given as an integer or as a tuple or list of integers.

np.ones(5)
np.ones((5,5))
np.ones([5,5])
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

     * np.eye() is used to create an identity matrix. The same can be done by using np.identity() function.

np.eye(5)
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

     * np.random.rand() generates an array of random numbers drawn from a uniform distribution over the interval from 0 (included) to 1 (excluded).

np.random.rand(10) # one dimensional 
np.random.rand(3,2) # two dimensional 
array([[0.02645752, 0.94033618],
       [0.83656222, 0.59655895],
       [0.32579328, 0.80937829]])

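     * np.random.randn() generates an array of random numbers from a standard normal distribution (mean 0, standard deviation 1). Unlike np.random.rand(), the values are not limited to the range 0 to 1 and can be negative.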

np.random.randn(10) # one dimensional 
np.random.randn(3,4) # two dimensional 
array([[ 0.30210099,  0.51770939,  1.82834208,  0.68299169],
       [-0.20429609,  0.48092033,  1.25141883,  1.63325492],
       [ 0.3242677 ,  0.78609986, -0.59844598, -0.78931083]])

     * np.random.randint() generates random integers from low (included) up to high (excluded).

np.random.randint(low=0, high=10, size = 5)
array([7, 5, 7, 3, 5], dtype=int32)

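     * np.reshape(), or the equivalent reshape() method of an array, changes the shape (rows and columns) without changing the data in the array. The new shape must hold the same number of elements as the original.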

array = np.random.randn(3,4) # two dimensional 
array
array.reshape(6,2)
array([[ 0.84150961, -0.99492664],
       [-2.06698101, -1.22421433],
       [-1.60569132, -1.75631627],
       [ 0.35674724, -0.47488292],
       [ 0.17706789, -1.85904999],
       [ 2.01366597, -0.65283419]])

     Some other useful numpy functions and attributes include np.shape(), np.transpose(), and the dtype attribute of an array. Useful functions related to linear algebra include np.linalg.inv() to compute the inverse of a matrix, np.linalg.det() to compute the determinant of a matrix, np.linalg.eig() to compute the eigenvalues and eigenvectors of a matrix, and np.linalg.solve() to solve a system of linear equations.
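
The functions named above have no example in this section, so a small sketch follows. The matrix A and vector b are made-up values. For brevity the sketch uses the array attributes shape, dtype, and T (the transpose).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

print(A.shape)    # (2, 2)
print(A.dtype)    # float64
print(A.T)        # transpose (A is symmetric, so A.T equals A)

A_inv = np.linalg.inv(A)                      # inverse of A
det_A = np.linalg.det(A)                      # 2*3 - 1*1 = 5, up to rounding
eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvalues and eigenvectors

# Solve the linear system A @ x = b
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))                  # True
print(np.allclose(A @ A_inv, np.eye(2)))      # True
```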

A.5.2.2 Array Indexing and Selection

     Both indexing and slicing of arrays are important skills to learn. Indexing refers to obtaining individual elements from an array, while slicing refers to obtaining a sequence of elements from the array. We use array[index] to index an array and array[start:end] to slice it; the slice includes the element at start but excludes the element at end.

# One Dimensional Array
array = np.random.randn(10)
array
array[2]
array[1:3]
array[:5]
array[5:]
array[-3:]
array[:-3]
array[array>0.50] # conditional indexing 
array([1.44293469, 1.16446499])
# Two Dimensional Array 
array2D = np.random.randn(8,5)
array2D
array2D[1]
array2D[1][2] # double brackets 
array2D[1,2] # single bracket (preferred method)
array2D[2:,]
array2D[2:,3:]
array2D[2:3]
array2D[2:4] # Only rows 
array2D[:,3:]
array([[-2.2122436 ,  0.06178527],
       [ 1.54939543, -0.25975965],
       [-0.74136396, -1.60482342],
       [-1.33514363, -0.7696434 ],
       [-2.34376853,  0.1489806 ],
       [ 0.62060885,  0.08980056],
       [ 0.63882864,  1.51053534],
       [-0.24009317, -1.36122267]])
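
Because the arrays above are random, their outputs change on every run. The sketch below repeats a few of the same indexing patterns on a small fixed array (named grid here for illustration), so each result can be checked by hand.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[1, 2])       # 6, the element in row 1, column 2
print(grid[1])          # [4 5 6 7], the whole of row 1
print(grid[:, 3])       # [ 3  7 11], the whole of column 3
print(grid[1:, 2:])     # rows 1 onward and columns 2 onward
print(grid[grid > 8])   # [ 9 10 11], conditional indexing
```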

A.5.2.3 Array Operations

     Array operations apply a mathematical operation to the array as a whole. A single expression such as array1 + array2 is carried out element by element, so we do not need to write a loop over the individual elements of the array.

array1 = np.arange(85,96)
array2 = np.arange(35,46)
array1 + array2
array1 - array2
array1*array2
array1 / array2
array([2.42857143, 2.38888889, 2.35135135, 2.31578947, 2.28205128,
       2.25      , 2.2195122 , 2.19047619, 2.1627907 , 2.13636364,
       2.11111111])
array1.mean()
array2.std()
np.mean(array1)
np.min(array2)
np.max(array2)
np.sqrt(array1)
np.sum(array1)
np.log(np.sum(array1))
np.float64(6.897704943128636)
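
To make the element-by-element behavior concrete, the sketch below compares one array expression with an equivalent explicit loop. The arrays prices and quantities are made-up values for illustration.

```python
import numpy as np

prices = np.array([10, 20, 30])
quantities = np.array([1, 2, 3])

# Array operation: one expression multiplies the matching elements
vectorized = prices * quantities

# The same result written as an explicit loop over the elements
looped = []
for i in range(len(prices)):
    looped.append(prices[i] * quantities[i])

print(vectorized)                            # [10 40 90]
print(np.array_equal(vectorized, looped))    # True

# Aggregations such as sum() reduce the whole array to a single value
print(vectorized.sum())                      # 140
```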

A.6 Functions in Python

     In python, there are some functions that we use very frequently. In this section, we will discuss some of those functions.

A.6.1 for Loop

     A for loop in python iterates over an iterable sequence such as a list, tuple, string, or range and executes a block of code for each element in the sequence. Strictly speaking, for is a statement rather than a function. for loops help to handle repetitive tasks efficiently. The syntax of a for loop is:

for element in sequence:
  # Expected code to execute on each element of the sequence

     A basic example of a for loop is:

analytics_students = ['Ashley', 'Elijah', 'John', 'Jack', 'Adams']
for student in analytics_students:
  print(student)
Ashley
Elijah
John
Jack
Adams

     Other examples of a for loop are:

even_numbers = [2,4,6,8,10]
for number in even_numbers:
  square = number**2
  print(square)
4
16
36
64
100
even_numbers = [2,4,6,8,10]
squares = []
for number in even_numbers:
  square = number**2
  squares.append(square) # reuse the value computed above

print(squares)
[4, 16, 36, 64, 100]

A.6.2 map() Function

     The map() function, like a for loop, applies a function to each item in an iterable (such as a list, tuple, or string). The syntax for map() is map(function, iterable). It returns a map object (an iterator), so we loop over it or wrap it in list() to see the results. Below is an example of the map() function:

var = [16, 17, 18, 19, 20]
var_log = map(lambda x: np.log(x), var)

for x in var_log:
  print(x)
2.772588722239781
2.833213344056216
2.8903717578961645
2.9444389791664403
2.995732273553991

     The map() function is useful for simple calculations; for more complex transformations that need several steps, a for loop is often easier to write and read.
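
As a small comparison, the sketch below computes the same logarithms once with map() and once with a for loop. The names logs_map and logs_loop are illustrative.

```python
import numpy as np

var = [16, 17, 18, 19, 20]

# map() returns an iterator; list() collects its results
logs_map = list(map(np.log, var))

# The same calculation written as a for loop
logs_loop = []
for x in var:
    logs_loop.append(np.log(x))

print(logs_map == logs_loop)    # True
```

Note that a map object can be iterated only once, which is why the results are collected into a list before they are compared.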

A.6.3 User Defined Function (Named Function)

     In addition to predefined functions from different python modules, users can define their own functions, which are sometimes called named functions. The syntax for defining a user defined function in python is:

def deduct_num(a, b):
  """
  The function will deduct two numbers
  """
  result = a - b
  return result

     In the above example, a user defined function is created. The function name is deduct_num, and it is created using the def keyword. So, when we need to create a user defined function, we start with the def keyword followed by the name of the function. The a and b are the function’s parameters; the values passed in when the function is called are its arguments, although the two terms are often used interchangeably.

     The triple quote """ """ is used to create a docstring, which describes what the function does. The return statement returns a value to the caller.

deduct_num(15, 100)
-85

     Another example of a user defined function is:

def welcome(name):
  """
  The function greets the person
  """
  print(f"Welcome {name}! How are you doing?")
welcome("John")
Welcome John! How are you doing?

A.6.4 Anonymous (Lambda) Function

     An anonymous function is a function without a name. It is also called a lambda function in python. The syntax for a lambda function is lambda arguments: expression. Below is an example of a lambda function:

sqr = lambda x: x**2
sqr(5)
25

     A lambda function can take many arguments (parameters), but its body is limited to a single expression.
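
A lambda with more than one argument, and a lambda passed directly to map(), might look like the following sketch. The names add, numbers, and doubled are illustrative.

```python
# A lambda with two arguments and a single expression
add = lambda a, b: a + b
print(add(3, 4))      # 7

# Lambdas are often passed directly to other functions such as map()
numbers = [1, 2, 3, 4]
doubled = list(map(lambda n: n * 2, numbers))
print(doubled)        # [2, 4, 6, 8]
```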

A.7 Conclusions

Exercises

  1. Create an array of integers from 10 to 50.

  2. Create an array of all even integers from 10 to 50.

  3. Create an array of 10 threes (use either np.full() or np.ones() or np.repeat()).

  4. Create a 3 by 3 matrix with values ranging from 10 to 18.

  5. Create a 5 by 5 identity matrix.

  6. Use numpy to generate a random number between 0 and 1.

  7. Use numpy to generate an array of 25 random numbers sampled from a standard normal distribution.

  8. Create a matrix like below -

array([[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ],
       [0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 ],
       [0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 ],
       [0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 ],
       [0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 ],
       [0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6 ],
       [0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7 ],
       [0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ],
       [0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 ],
       [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  ]])
  9. Create an array of 50 linearly spaced points between 0 and 1.

  10. Create an array of 20 linearly spaced points between 0 and 1.

mat = np.arange(1,26).reshape(5,5)
mat
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
  11. Produce the following matrix from the mat matrix above.
array([[12, 13, 14, 15],
       [17, 18, 19, 20],
       [22, 23, 24, 25]])
  12. Produce the following (value) 20 from mat matrix - np.int64(20)

  13. Produce the following matrix from mat.

array([[ 2],
       [ 7],
       [12]])
  14. Produce the following matrix from mat - array([21, 22, 23, 24, 25])

  15. Produce the following matrix from mat

array([[16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
  16. Get the sum of all values in the mat matrix

  17. Get the standard deviation of the values in mat matrix

  18. Get the sum of all columns in the mat matrix

  19. Get the sum of all rows in the mat matrix

  20. Get the determinant and eigenvalues and eigenvectors of the matrix mat.


  1. When we say that lists are ordered, it means that the items have a defined order, and that order will not change. If you add new items to a list with append(), the new items will be placed at the end of the list.

  2. In python, a method is a function that belongs to an object and is called with dot notation, for example employee.clear().

  3. The built-in range(start, stop, step) function creates a sequence of integers starting from start and stopping before stop. The default step in range() is 1.