Statistics Module in Python
What can this module do?
"statistics" is a module in Python's standard library for performing the most common statistical calculations on numeric data. It provides functions such as mean, median, variance, and standard deviation. Through its NormalDist class, it can also model normally distributed random variables and generate samples for probability calculations.
Importing the module
Import all the public names (functions and classes) from the module:
from statistics import *
Statistical methods
Let's create a sample list:
myList = [13, 18, 13, 14, 13, 16, 14, 21, 13] # a list for testing
N = len(myList) # length of that list
The mean
It returns the mean of the values (the average value: it adds them up, then divides the sum by how many there are)
mu = mean(myList) # 15
The median
It returns the median of the values (the middle value once they are sorted numerically).
Here the sorted values are 13, 13, 13, 13, 14, 14, 16, 18, 21, and the middle one is 14.
mid = median(myList) # 14
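The list above has an odd number of values, so there is a single middle value. As a minimal sketch of the even-length case (the list `even_data` is made up for illustration), `median` averages the two middle values, while `median_low` and `median_high` return one of them unchanged:

```python
from statistics import median, median_low, median_high

# Hypothetical even-length list, already sorted, used only for illustration.
even_data = [13, 14, 16, 18]

mid_avg = median(even_data)        # average of the two middle values (14 and 16)
mid_low = median_low(even_data)    # the lower of the two middle values
mid_high = median_high(even_data)  # the higher of the two middle values

print(mid_avg)   # 15.0
print(mid_low)   # 14
print(mid_high)  # 16
```

`median_low` and `median_high` are useful when the result must be an actual member of the data set.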
The mode
It returns the mode of the values (the mode is the number that is repeated more often than any other)
mod = mode(myList) # 13
The multimode
It returns a list of the most frequently occurring values in the order they were first encountered in the data. If there are multiple modes, it will return more than one result.
single_mode = multimode(myList) # [13]
mul_mode = multimode([13, 18, 13, 14, 13, 16, 14, 21, 13, 14, 14]) # [13, 14]
The quantile
It returns the cut points that divide the data into n intervals of equal probability. By default n=4, so it returns the three quartile boundaries.
quantiles(myList) # [13.0, 14.0, 17.0]
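The result depends on two keyword arguments of `quantiles`: `n`, the number of intervals, and `method`, which is `'exclusive'` by default and can be set to `'inclusive'`. The sketch below reuses the sample list from earlier; the variable names are only illustrative.

```python
from statistics import quantiles

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Default: n=4 and method='exclusive' give three quartile cut points.
quartiles = quantiles(data)
print(quartiles)  # [13.0, 14.0, 17.0]

# n=2 splits the data into two halves, so the single cut point is the median.
halves = quantiles(data, n=2)
print(halves)  # [14.0]

# method='inclusive' treats the data as the whole population,
# including its minimum and maximum values.
quartiles_inclusive = quantiles(data, n=4, method='inclusive')
print(quartiles_inclusive)  # [13.0, 14.0, 16.0]
```

The two methods agree on the first two quartiles for this list and differ only in the upper one.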
The population variance
It returns the population variance, which measures the spread of the numbers in a data set: the average squared distance of each number from the mean. It tells you how much a single variable varies.
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
where $\mu$ is the mean, $N$ is the number of data points, and $x_i$ is the current data point
Sn = pvariance(myList) # 7.111111111111111
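The population variance is the mean of the squared deviations from the mean. A minimal sketch that computes it by hand for the sample list and compares the result with `pvariance` (the variable names are only illustrative):

```python
from math import isclose
from statistics import mean, pvariance

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mu = mean(data)  # 15
n = len(data)    # 9

# Squared distance of every data point from the mean.
squared_devs = [(x - mu) ** 2 for x in data]
print(squared_devs)  # [4, 9, 4, 1, 4, 1, 1, 36, 4]

# Average of the squared distances: 64 / 9.
manual_pvariance = sum(squared_devs) / n

print(isclose(manual_pvariance, pvariance(data)))  # True
```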
The sample variance
It returns the sample variance. It divides by (N-1) instead of (N), which gives a better estimate of the population variance when the data is only a sample of that population.
Sn1 = variance(myList) # 8
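The two variances differ only in the divisor, so rescaling the population variance by N / (N - 1) should reproduce the sample variance. A short check on the sample list (the variable names are only illustrative):

```python
from math import isclose
from statistics import pvariance, variance

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
n = len(data)

pop_var = pvariance(data)    # sum of squared deviations divided by n
sample_var = variance(data)  # same sum divided by n - 1

# Undo the division by n, then divide by n - 1 instead.
rescaled = pop_var * n / (n - 1)

print(isclose(rescaled, sample_var))  # True
print(sample_var > pop_var)           # True: the smaller divisor gives a larger value
```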
The population standard deviation
It returns the population standard deviation, which is the square root of the population variance. It measures the dispersion of a data set relative to its mean, in the same units as the data.
sigma = pstdev(myList) # 2.6666666666666665
The sample standard deviation
It returns the sample standard deviation, which is the square root of the sample variance
Sx = stdev(myList) # 2.8284271247461903
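Each standard deviation is the square root of the matching variance. A quick sketch that confirms this on the sample list (the variable names are only illustrative):

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

pop_sd = sqrt(pvariance(data))    # square root of the population variance
sample_sd = sqrt(variance(data))  # square root of the sample variance

print(isclose(pop_sd, pstdev(data)))    # True
print(isclose(sample_sd, stdev(data)))  # True
```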
The NormalDist class
NormalDist is a class that represents a normal distribution. We can use an object of this class to generate and manipulate data samples
Create the object
Create an object with the default parameters (mu=0) and (sigma=1):
obj = NormalDist()
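Set and read its properties
The properties of a NormalDist object are read-only, so assigning to them (for example obj.mean = 5) raises an AttributeError. Set the mean and the standard deviation when creating the object, then read the derived values:
obj = NormalDist(mu=5, sigma=5) # set the mean and the standard deviation
obj.mean # 5.0, the mean of that object
obj.median # 5.0, the median of that object
obj.mode # 5.0, the mode of that object
obj.stdev # 5.0, the standard deviation of that object
obj.variance # 25.0, the variance of that object (the square of the standard deviation)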
Generating samples
fitted = NormalDist.from_data([1,2,3,4,5,6]) # class method: returns a new normal distribution estimated from the list (it does not modify obj)
obj.samples(20) # generates 20 random samples given the mean and standard deviation of that object
Some probabilities
obj.pdf(10) # using the probability density function (pdf), compute the relative likelihood that a random variable X will be near the given value
obj.cdf(10) # using the cumulative distribution function (cdf), compute the probability that a random variable X will be less than or equal to the given value
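As a sketch of how `cdf` can answer interval questions, the example below assumes a distribution with mean 5 and standard deviation 5 (parameters chosen here only for illustration) and subtracts two `cdf` values to get the probability of falling within one standard deviation of the mean:

```python
from math import isclose
from statistics import NormalDist

# Assumed parameters for illustration: mean 5, standard deviation 5.
dist = NormalDist(mu=5, sigma=5)

# P(X <= 10): 10 is one standard deviation above the mean.
p_below_10 = dist.cdf(10)
print(round(p_below_10, 4))  # 0.8413

# P(0 <= X <= 10): within one standard deviation of the mean.
p_within_one_sd = dist.cdf(10) - dist.cdf(0)
print(round(p_within_one_sd, 4))  # 0.6827

# The density is highest at the mean and symmetric around it.
print(dist.pdf(5) > dist.pdf(10))          # True
print(isclose(dist.pdf(0), dist.pdf(10)))  # True
```

Note that `pdf` returns a density, not a probability, so its value is only meaningful relative to other points or when integrated over an interval.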