DEV Community

222010301052

Matplotlib Histogram in Python

A histogram is an accurate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable. Apart from numerical data, histograms can also be used to visualize the distribution of pixel values in images.
It is a kind of bar graph.
In a histogram, the bins must be adjacent and are often (but not necessarily) of equal size. Each bin has a minimum and a maximum value, and a frequency: the number of values that fall inside it. The bins are usually specified as consecutive, non-overlapping intervals of a variable.
To construct a histogram, follow these steps:

  • Bin the range of values, i.e. divide the entire range of values into a series of intervals.
  • Count how many values fall into each interval.
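The steps above can be sketched by hand with NumPy before handing the work to Matplotlib. This is a minimal illustration under stated assumptions: the ten sample values are made up, the bins are equal-width, and, like numpy.histogram, every interval is half-open except the last, which also contains the maximum value.

```python
import numpy as np

# Hypothetical sample data, made up for this illustration
data = np.array([1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9, 8.0, 9.6])
num_bins = 4

# Bin the range: split [min, max] into num_bins equal-width intervals
edges = np.linspace(data.min(), data.max(), num_bins + 1)

# Count the values per interval. Intervals are half-open [left, right),
# except the last one, which also includes its right edge (the maximum).
counts = np.zeros(num_bins, dtype=int)
for value in data:
    idx = np.searchsorted(edges, value, side="right") - 1
    idx = min(idx, num_bins - 1)  # the maximum value belongs to the last bin
    counts[idx] += 1

print("edges: ", edges)
print("counts:", counts)

# NumPy's built-in routine should agree with the manual count
np_counts, np_edges = np.histogram(data, bins=num_bins)
print("numpy: ", np_counts)
```

The loop is deliberately explicit so each step is visible; in practice np.histogram (or plt.hist) does the same work in one call.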

The matplotlib.pyplot.hist() function computes and draws the histogram of x. Its bins, range, weights, and density parameters behave as in numpy.histogram.
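Because plt.hist() and numpy.histogram() share these parameters, the counts and edges returned by plt.hist() can be checked against NumPy directly. A small sketch, assuming a seeded random generator for repeatable data and the non-interactive "Agg" backend so it runs without a display (use plt.show() when working interactively):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for repeatable data
x = rng.normal(170, 10, 250)

# hist() draws the bars and returns (counts, bin edges, drawn patches)
n, bins, patches = plt.hist(x, bins=10)

# numpy.histogram() computes counts and edges without drawing anything
counts, edges = np.histogram(x, bins=10)

print(np.array_equal(n, counts), np.allclose(bins, edges))
```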

PARAMETERS:
The following are the main parameters of hist():

x:- Array or sequence of arrays. This takes either a single array or a sequence of arrays, which are not required to have the same length.
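To illustrate the "sequence of arrays" case, here is a sketch that passes two arrays of different lengths. The data, the seed, and the labels "a" and "b" are assumptions made up for the example; one histogram is drawn per array, and both share the same bin edges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0, 1, 500)  # 500 values
b = rng.normal(2, 1, 200)  # 200 values: the lengths may differ

# Passing a list of arrays draws one histogram per array on shared bin edges
n, bins, patches = plt.hist([a, b], bins=15, label=["a", "b"])
plt.legend()

print(len(n), n[0].sum(), n[1].sum(), len(bins))
```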

bins:- Integer, sequence, or string, default: rcParams["hist.bins"] (10). If bins is an integer, it defines the number of equal-width bins in the range. If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All but the last (right-hand-most) bin are half-open. In other words, if bins is [1, 2, 3, 4], then the first bin is [1, 2) (including 1 but excluding 2) and the second is [2, 3). The last bin, however, is [3, 4], which includes 4.
If bins is a string, it is one of the binning strategies supported by numpy.histogram_bin_edges: ‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, or ‘sqrt’.
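The half-open rule for explicit edges, and the string strategies, can be seen with a tiny example. The five data values are made up so that the counts are easy to verify by eye; the second part picks the 'sturges' strategy and compares its edges with numpy.histogram_bin_edges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = [1, 2, 2, 3, 4]

# Explicit edges: bins are [1, 2), [2, 3) and the closed last bin [3, 4]
n, bins, _ = plt.hist(data, bins=[1, 2, 3, 4])
print(n)  # one value in [1, 2), the two 2s in [2, 3), 3 and 4 in [3, 4]

# A string picks one of NumPy's automatic binning strategies
rng = np.random.default_rng(2)
sample = rng.normal(size=1000)
plt.figure()
n_auto, bins_auto, _ = plt.hist(sample, bins="sturges")
print(len(bins_auto) - 1, "bins chosen by 'sturges'")
```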

range:- Tuple or None, default: None. The lower and upper range of the bins. Lower and upper outliers are ignored. If not provided, range is (x.min(), x.max()). Range has no effect if bins is a sequence.
If bins is a sequence or range is specified, autoscaling is based on the specified bin range instead of the range of x.
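A short sketch of how range restricts the binned interval and drops outliers. The four data values are made up: 100 falls outside range=(0, 10), so it is not counted, and 10 lands in the closed last bin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [0, 5, 10, 100]

# Two equal-width bins over 0..10: [0, 5) and [5, 10]; 100 lies outside and is ignored
n, bins, _ = plt.hist(data, bins=2, range=(0, 10))
print(bins)  # edges 0, 5, 10
print(n)     # counts 1 and 2; the outlier 100 is not counted
```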

density:- Bool, default: False
If True, draw and return a probability density: each bin will display the bin’s raw count divided by the total number of counts and the bin width (density = counts / (sum(counts) * np.diff(bins))), so that the area under the histogram integrates to 1 (np.sum(density * np.diff(bins)) == 1). If stacked is also True, the sum of the histograms is normalized to 1.
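The density formula above can be verified numerically: recompute counts / (sum(counts) * np.diff(bins)) from the raw counts and check that the area under the histogram is 1. The normal sample and the seed are assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 1000)

# density=True divides each raw count by (total count * bin width)
density, bins, _ = plt.hist(x, bins=20, density=True)

# Recompute the same normalization by hand from the raw counts
counts, _ = np.histogram(x, bins=bins)
manual = counts / (counts.sum() * np.diff(bins))

area = np.sum(density * np.diff(bins))  # area under the histogram
print(np.allclose(density, manual), area)
```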

cumulative:- Bool or -1, default: False. If True, then a histogram is computed where each bin gives the counts in that bin plus all bins for smaller values. The last bin gives the total number of data points. If density is also True, then the histogram is normalized such that the last bin equals 1.
If cumulative is a number less than 0 (e.g., -1), the direction of accumulation is reversed. In this case, if density is also True, then the histogram is normalized such that the first bin equals 1.
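A sketch comparing cumulative=True with cumulative=-1 on made-up data. With the edges below the plain per-bin counts are 1, 2, 3, 1, so the forward running total ends at 7 (the number of points) and the reversed one starts at 7.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4]
edges = [1, 2, 3, 4, 5]  # plain counts per bin: 1, 2, 3, 1

fig, (ax_fwd, ax_rev) = plt.subplots(1, 2)
n_fwd, _, _ = ax_fwd.hist(data, bins=edges, cumulative=True)  # running total, left to right
n_rev, _, _ = ax_rev.hist(data, bins=edges, cumulative=-1)    # running total, right to left

print(n_fwd, n_rev)
```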

histtype:- {‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’}, default: ‘bar’
The type of histogram to draw.

  • ‘bar’ is a traditional bar-type histogram. If multiple data are given, the bars are arranged side by side.
  • ‘barstacked’ is a bar-type histogram where multiple data are stacked on top of each other.
  • ‘step’ generates a lineplot that is by default unfilled.
  • ‘stepfilled’ generates a lineplot that is by default filled.

weights:- Optional. An array of weights of the same shape as x. Each value in x contributes its associated weight to the bin count instead of 1.
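The four histtype values and the weights parameter can be compared side by side. In this sketch the two datasets and their weights are hypothetical: every value of the first dataset counts as 0.5 and every value of the second as 1, so the first dataset's bins sum to 150 instead of 300. With ‘barstacked’, the returned heights of the second dataset are the stacked tops, so they sum to 150 + 300.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
data = [rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)]
# Hypothetical weights: the first dataset counts as 0.5 per value, the second as 1
weights = [np.full(300, 0.5), np.ones(300)]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
totals = {}
for ax, kind in zip(axes.flat, ["bar", "barstacked", "step", "stepfilled"]):
    n, bins, _ = ax.hist(data, bins=15, weights=weights, histtype=kind)
    ax.set_title(kind)
    # Sum of the returned bar heights for each dataset
    totals[kind] = [float(np.sum(heights)) for heights in n]

print(totals)
```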

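bottom:- Array-like, scalar, or None, default: None. The location of the bottom baseline of each bin, i.e. bins are drawn from bottom to bottom + hist(x, bins). If it is a scalar, every bin is shifted by the same amount.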
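A small sketch of a scalar bottom, using made-up data: every bar starts at y = 5, while the counts returned by hist() stay the same. The first drawn Rectangle exposes its baseline and height through get_y() and get_height().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [1, 2, 2, 3]

# Every bar starts at y = 5 instead of y = 0; the returned counts are unchanged
n, bins, patches = plt.hist(data, bins=[1, 2, 3, 4], bottom=5)

first_bar = patches[0]
print(n, first_bar.get_y(), first_bar.get_height())
```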

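align:- {‘left’, ‘mid’, ‘right’}, default: ‘mid’. Controls the horizontal alignment of the bars: ‘left’ centers them on the left bin edges, ‘mid’ centers them between the bin edges, and ‘right’ centers them on the right bin edges.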
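To see the effect of align, the sketch below draws the same made-up data with bins of width 1 three times and records where the first bar's left side ends up: half a bin to the left for ‘left’, on the first edge for ‘mid’, and half a bin to the right for ‘right’.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [0, 1, 2, 3]
edges = [0, 1, 2, 3, 4]  # four bins of width 1

fig, axes = plt.subplots(1, 3, sharey=True)
left_sides = {}
for ax, where in zip(axes, ["left", "mid", "right"]):
    _, _, patches = ax.hist(data, bins=edges, align=where, edgecolor="black")
    ax.set_title(f"align='{where}'")
    left_sides[where] = patches[0].get_x()  # x position of the first bar's left side

print(left_sides)
```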

rwidth:- Optional. The relative width of the bars as a fraction of the bin width. It is ignored if histtype is ‘step’ or ‘stepfilled’.
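A sketch of rwidth on made-up data with bins of width 1: rwidth=0.5 halves each bar, and with the default ‘mid’ alignment the narrower bar stays centered in its bin, so the first bar spans 0.25 to 0.75.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [0, 1, 1, 2, 3]
edges = [0, 1, 2, 3, 4]  # bin width 1

# rwidth=0.5 makes each bar half as wide as its bin, leaving gaps between bars
_, _, patches = plt.hist(data, bins=edges, rwidth=0.5)

first_bar = patches[0]
print(first_bar.get_width(), first_bar.get_x())
```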

log:- Optional. If True, the histogram axis is set to a log scale.
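A log scale helps when counts span several orders of magnitude. This sketch assumes exponentially distributed sample data, whose bin counts fall off quickly, and confirms that log=True switches the count axis to a logarithmic scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=10_000)  # counts fall off quickly, so a log axis helps

fig, ax = plt.subplots()
ax.hist(x, bins=30, log=True)
print(ax.get_yscale())
```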

color:- Optional. A color spec, or a sequence of color specs, one per dataset.

label:- Optional. A string, or a sequence of strings to match multiple datasets.
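The color and label parameters take one entry per dataset. In this sketch the two groups, their colors, and the labels "group A" and "group B" are made up; the labels feed the legend, and the first bar of the first dataset carries the first color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

rng = np.random.default_rng(6)
data = [rng.normal(0, 1, 500), rng.normal(3, 1, 500)]

fig, ax = plt.subplots()
# One color and one label per dataset
n, bins, patches = ax.hist(data, bins=20,
                           color=["tab:blue", "tab:orange"],
                           label=["group A", "group B"])
ax.legend()

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
first_color = patches[0][0].get_facecolor()  # RGBA of the first bar of dataset 0
print(legend_labels, np.allclose(first_color, to_rgba("tab:blue")))
```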

normed:- Deprecated in favor of density and no longer accepted by current Matplotlib releases; use density instead.

Example for Histogram:
Say you ask 250 people for their height; you might end up with a histogram like this:
You can read from the histogram that there are approximately:
2 people from 140 to 145cm
5 people from 145 to 150cm
15 people from 151 to 156cm
31 people from 157 to 162cm
46 people from 163 to 168cm
53 people from 168 to 173cm
45 people from 173 to 178cm
28 people from 179 to 184cm
21 people from 185 to 190cm
4 people from 190 to 195cm.
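Because these counts are already binned, they can be drawn directly with plt.bar() rather than plt.hist(). This sketch copies the ranges and approximate counts exactly as listed above (including their uneven boundaries) and checks that they add up to the 250 people in the survey; width=1.0 keeps the bars adjacent like histogram bins.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Approximate counts read from the height histogram above (ranges copied as listed)
ranges = ["140-145", "145-150", "151-156", "157-162", "163-168",
          "168-173", "173-178", "179-184", "185-190", "190-195"]
counts = [2, 5, 15, 31, 46, 53, 45, 28, 21, 4]

total = sum(counts)
print(total)

fig, ax = plt.subplots()
ax.bar(ranges, counts, width=1.0, edgecolor="black")  # adjacent bars, like histogram bins
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Number of people")
ax.tick_params(axis="x", labelrotation=45)
```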
In Matplotlib, we use the hist() function to create histograms. The hist() function takes an array of numbers as an argument and creates a histogram from it.
For simplicity, we use NumPy to randomly generate an array with 250 values, where the values concentrate around 170 and the standard deviation is 10.
Example: 1
```python
# A normal data distribution
import numpy as np

x = np.random.normal(170, 10, 250)
print(x)
```
The hist() function will read the array and produce a histogram:

Example: 2
```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
```

Below is a minimal Matplotlib histogram that sets the number of bins and the bar color:

Example: 3
```python
import matplotlib.pyplot as plt

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18, 49, 50, 100]
num_bins = 5
n, bins, patches = plt.hist(x, num_bins, facecolor='blue', alpha=0.5)
plt.show()
```

Many things can be added to a histogram, such as a fit line and axis labels. The code below creates a more advanced histogram.

Example: 4

```python
import numpy as np
import matplotlib.pyplot as plt

# example data
mu = 100  # mean of distribution
sigma = 15  # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 20

# the histogram of the data
# (density=True replaces the normed=1 argument, which current Matplotlib no longer accepts)
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='blue', alpha=0.5)

# add a 'best fit' line
# (matplotlib.mlab.normpdf no longer exists, so the normal PDF is computed with NumPy)
y = (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * ((bins - mu) / sigma) ** 2)
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()
```

Example: 5

```python
# Implementation of matplotlib function
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10**7)
mu = 121
sigma = 21
x = mu + sigma * np.random.randn(1000)

num_bins = 100

n, bins, patches = plt.hist(x, num_bins,
                            density=True,
                            color='green',
                            alpha=0.7)

# normal probability density function with the same mu and sigma
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
     np.exp(-0.5 * (1 / sigma * (bins - mu))**2))

plt.plot(bins, y, '--', color='black')

plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

plt.title('matplotlib.pyplot.hist() function Example\n\n',
          fontweight="bold")

plt.show()
```

Example: 6

```python
# Implementation of matplotlib function
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10**7)
n_bins = 20
x = np.random.randn(10000, 3)  # 3 datasets, one per column

colors = ['green', 'blue', 'lime']

plt.hist(x, n_bins, density=True,
         histtype='bar',
         color=colors,
         label=colors)

plt.legend(prop={'size': 10})

plt.title('matplotlib.pyplot.hist() function Example\n\n',
          fontweight="bold")

plt.show()
```
