Statistics Module in Python

 

What can this module do?


"statistics" is a standard-library module that provides the most common statistical functions for numeric data, such as the mean, median, variance, and standard deviation. Its NormalDist class can also generate random samples and compute probabilities for normally distributed variables.



Importing the module


Import all the public names (functions and classes) from the module:

from statistics import *



Statistical methods


Let's create a sample list:

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]     # a list for testing
N = len(myList)                                   # length of that list


The mean

It returns the mean of the values (the average value, so it adds them up and then divides the sum by how many there are)

mu = mean(myList)        # 15
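As a quick check, the same value can be reproduced by hand. This is only a sketch; the name manual_mean is illustrative and not part of the module.

```python
from statistics import mean

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Add the values up and divide by how many there are.
manual_mean = sum(myList) / len(myList)

# Both approaches give 15 for this list.
print(manual_mean, mean(myList))
```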


The median

It returns the median of the values (the median is the middle value once the values are sorted numerically)

Sorted, the list is 13, 13, 13, 13, 14, 14, 16, 18, 21, and the middle (fifth) value is 14

mid = median(myList)     # 14
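The sort-then-pick-the-middle step can be sketched directly. The names ordered and middle are illustrative; the last line shows that for an even number of values median() averages the two middle ones.

```python
from statistics import median

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Sort the values, then take the element in the middle position.
ordered = sorted(myList)
middle = ordered[len(ordered) // 2]   # odd length, so there is a single middle value

print(middle, median(myList))         # both are 14

# With an even number of values, the two middle values are averaged.
even_median = median([1, 2, 3, 4])    # (2 + 3) / 2 = 2.5
```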


The mode

It returns the mode of the values (the mode is the value that occurs more often than any other; if several values tie, Python 3.8+ returns the first one encountered)

mod = mode(myList)     # 13


The multimode

It returns a list of the most frequently occurring values in the order they were first encountered in the data. If there are multiple modes, it will return more than one result.

single_mode = multimode(myList)       # [13]
mul_mode = multimode([13, 18, 13, 14, 13, 16, 14, 21, 13, 14, 14])
# [13, 14]
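To see why two values come back, it can help to count the occurrences by hand. The sketch below uses collections.Counter; the names counts, top and manual_modes are illustrative.

```python
from collections import Counter
from statistics import mode, multimode

data = [13, 18, 13, 14, 13, 16, 14, 21, 13, 14, 14]

# Count how often each value occurs (Counter keeps first-seen order).
counts = Counter(data)
top = max(counts.values())            # 13 and 14 both occur 4 times

# Keep every value that reaches the highest count.
manual_modes = [value for value, count in counts.items() if count == top]

print(manual_modes)                   # [13, 14]
print(multimode(data))                # [13, 14]
print(mode(data))                     # 13, the first of the tied values
```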


The quantile

It returns the cut points that divide the data into n intervals of equal probability. By default n is 4, so the result is the three quartiles

quantiles(myList)        # [13.0, 14.0, 17.0]
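The call above relies on the defaults. As a hedged illustration, the sketch below passes the optional n and method parameters explicitly; the variable names are illustrative.

```python
from statistics import quantiles

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Default: n=4 (quartiles) with method='exclusive'.
quartiles = quantiles(myList)                          # [13.0, 14.0, 17.0]

# method='inclusive' treats the data as the whole population,
# so the lowest and highest values are the 0% and 100% points.
inclusive = quantiles(myList, n=4, method='inclusive') # [13.0, 14.0, 16.0]

# n=10 returns the 9 cut points between deciles.
deciles = quantiles(myList, n=10)
print(len(deciles))                                    # 9
```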


The population variance

It returns the population variance, which measures how spread out the values in a data set are: it is the average squared distance of each value from the mean, so it tells you how much a single variable varies.

sigma^2 = (1/N) * sum((xi - u)^2)

where (u) is the mean, (N) is the number of data points, and (xi) is the current data point.

Sn = pvariance(myList)      # 7.111111111111111
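The definition can be checked with a few lines of plain Python. This is a minimal sketch; squared_deviations and manual_pvariance are illustrative names.

```python
from statistics import mean, pvariance

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]
mu = mean(myList)                     # 15
N = len(myList)                       # 9

# Squared distance of every data point from the mean.
squared_deviations = [(x - mu) ** 2 for x in myList]

# Population variance: average of the squared deviations.
manual_pvariance = sum(squared_deviations) / N   # 64 / 9

print(manual_pvariance, pvariance(myList))
```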


The sample variance

It returns the sample variance. It divides by (N - 1) instead of (N), which gives an unbiased estimate of the population variance when the data is only a sample

Sn1 = variance(myList)      # 8
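The only difference from the population version is the divisor, as this short sketch shows (sum_sq and manual_variance are illustrative names).

```python
from statistics import mean, pvariance, variance

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]
mu = mean(myList)
N = len(myList)

# Same sum of squared deviations as for the population variance.
sum_sq = sum((x - mu) ** 2 for x in myList)      # 64

# Sample variance divides by N - 1 rather than N.
manual_variance = sum_sq / (N - 1)               # 64 / 8

print(manual_variance, variance(myList))         # both equal 8
print(variance(myList) > pvariance(myList))      # True: smaller divisor, larger result
```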


The population standard deviation

It returns the population standard deviation, which measures the dispersion of a data set relative to its mean. It is the square root of the population variance.

sigma = pstdev(myList)        # 2.6666666666666665


The sample standard deviation

It returns the sample standard deviation, which is the square root of the sample variance

Sx = stdev(myList)      # 2.8284271247461903
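Both standard deviations are simply the square roots of the matching variances. A small sketch to confirm this (the names population_sd and sample_sd are illustrative):

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

myList = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Square root of the population variance (64 / 9).
population_sd = sqrt(pvariance(myList))

# Square root of the sample variance (8).
sample_sd = sqrt(variance(myList))

print(population_sd, pstdev(myList))
print(sample_sd, stdev(myList))
```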



Normal distribution class


An object of the NormalDist class represents a normal distribution. We can use it to generate samples and compute probabilities


Create the object

Create an object that has the defaults (mu=0) and (sigma=1)

obj = NormalDist() 


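Set and read its properties

The properties of a NormalDist object are read-only, so the mean and the standard deviation are set when the object is created:

obj = NormalDist(mu=5, sigma=5)

obj.mean             # 5.0, the mean of that object
obj.median           # 5.0, the median (equal to the mean for a normal distribution)
obj.mode             # 5.0, the mode (also equal to the mean)
obj.stdev            # 5.0, the standard deviation of that object
obj.variance         # 25.0, the variance (the standard deviation squared)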


Generating samples

obj2 = NormalDist.from_data([1, 2, 3, 4, 5, 6])
# class method: returns a new normal distribution whose mean and
# standard deviation are estimated from the list

obj.samples(20)
# generates 20 random samples given the mean and standard deviation of that object 


Some probabilities

obj.pdf(10) 
# using a probability density function (pdf), compute the relative likelihood 
# that a random variable X will be near the given value

obj.cdf(10) 
# using a cumulative distribution function (cdf), compute the probability 
# that a random variable X will be less than or equal to the given value
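A typical use of cdf() is to get the probability of landing inside an interval by subtracting two values. The sketch below assumes a distribution with mu=5 and sigma=5; the variable names are illustrative, and inv_cdf() is the inverse of cdf().

```python
from statistics import NormalDist

dist = NormalDist(mu=5, sigma=5)

# Half of the probability mass lies at or below the mean.
p_below_mean = dist.cdf(5)                        # 0.5

# Probability of falling within one standard deviation of the mean (0 to 10).
p_within_one_sigma = dist.cdf(10) - dist.cdf(0)   # about 0.6827

# inv_cdf() goes the other way: from a probability back to a value.
x_at_half = dist.inv_cdf(0.5)                     # the mean, 5.0

print(p_below_mean, p_within_one_sigma, x_at_half)
```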



