Machine learning models are complex collections of many variables, and they must be trained to find good values for those variables. This also means that these “weights” have to be set to initial values. One option is to start with all the weights as zeros and go from there. However, this causes issues algorithmically: when every weight starts at the same value, the weights in a layer typically receive identical gradient updates, so they cannot learn to do different things. Instead, we often set these weights to random values. After that point, the model learns and adjusts.
TensorFlow has a number of built-in methods for generating random numbers. These include distributions we are all familiar with, like “uniform”, and others that you have probably heard of, like “normal” distributions. Uniform distributions are like those you get when you roll a die - there is a set of values, and they are all equally likely. Normal distributions are the standard taught in statistics classes, where the values near the mean are the most likely, with a “bell-shaped” curve around it. Others are included as well, as we will see.
For this lesson, we are going to create a basic helper function that simply runs a single TensorFlow variable. This little function can be quite useful! It creates a session, initialises variables and runs it for us. It’s limited to a single variable though, so may not be useful for larger programs.
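A minimal sketch of such a helper might look like the following. It assumes a TensorFlow installation where the session API is available under tf.compat.v1, and the name run_variable is only illustrative, not necessarily the one used elsewhere in these lessons.

```python
import tensorflow as tf

# Assumption: the graph/session API is reached through tf.compat.v1.
tf1 = tf.compat.v1
tf1.disable_eager_execution()


def run_variable(variable):
    """Evaluate a single tensor or variable in a fresh session."""
    with tf1.Session() as sess:
        # Initialise any variables defined so far, then run the requested one.
        sess.run(tf1.global_variables_initializer())
        return sess.run(variable)


# Usage: the result comes back as a NumPy array.
example = tf.Variable([1.0, 2.0, 3.0])
print(run_variable(example))
```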
Hopefully this is all familiar to you by now. If not, have another look at our lesson on Variables to get started.
Let’s start with a basic distribution, the uniform distribution.
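As a sketch, and assuming the session API under tf.compat.v1 again, a small tensor of uniform random values could be created and evaluated like this. The variable names and the seed are arbitrary choices for illustration.

```python
import tensorflow as tf

# Assumption: the graph/session API is reached through tf.compat.v1.
tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Values are drawn uniformly between 0 and 1 by default; the seed is arbitrary.
my_distribution = tf.random.uniform((6, 4), seed=42)

with tf1.Session() as sess:
    uniform = sess.run(my_distribution)

print(uniform.shape)
print(uniform)
```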
This gives us a 6 by 4 tensor (see Broadcasting for more information) full of random values. To visualise this, we can use a histogram:
Note: if you are using Jupyter Notebooks, use %matplotlib inline instead and remove the explicit call that displays the plot.
The resulting graph shows the picture, although it isn’t perfectly clear yet…
This histogram shows that the possible values are between 0 and 1. Every value should be equally likely, but it doesn’t really look that way. The reason for this is that we have only chosen a small number of values. If we increase the size of the array, it becomes much more uniform.
That’s more uniform!
A uniform distribution can be quite useful for initialising weights in machine learning models, if you don’t have any other information to go by. It is also a “bounded” distribution, whereby it has a set minimum and maximum value, and the random values cannot fall outside that range. To change the range, for instance to 0 and 10, you multiply by the range and add the minimum. There is an exercise on this at the end of the lesson.
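The "multiply by the range and add the minimum" rule can be sketched in plain Python, without TensorFlow, since the arithmetic is the same for any value drawn between 0 and 1. The function name rescale and the example range of 2 to 5 are hypothetical and chosen only for illustration.

```python
import random


def rescale(u, minimum, maximum):
    """Map a value u drawn between 0 and 1 onto the range minimum..maximum."""
    return minimum + (maximum - minimum) * u


rng = random.Random(42)  # arbitrary seed, only for repeatability
samples = [rescale(rng.random(), 2.0, 5.0) for _ in range(1000)]

# Every sample should now lie between 2 and 5.
print(min(samples), max(samples))
```

The same expression works element-wise on a whole tensor of uniform values.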
Another commonly used distribution is the normal distribution, which is also built into TensorFlow.
This distribution, by default, has a mean of 0 and a standard deviation of 1. The values are not bounded, but become highly unlikely the further from the mean you stray, with the standard deviation setting the rate of decrease in likelihood. In practice, about 68% of values fall within a “radius” of one standard deviation from the mean in each direction, about 95% fall within 2 standard deviations, and about 99.7% fall within 3.
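The share of values that fall within a given number of standard deviations can be checked empirically. The sketch below uses Python's standard random module rather than TensorFlow, purely so it runs anywhere. The helper name fraction_within, the seed, and the sample size are illustrative assumptions.

```python
import random

rng = random.Random(0)  # arbitrary seed, only for repeatability
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]


def fraction_within(values, mean, stddev, k):
    """Fraction of values lying within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mean) <= k * stddev)
    return inside / len(values)


for k in (1, 2, 3):
    print(k, round(fraction_within(samples, 0.0, 1.0, k), 3))
```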
The mean and standard deviation are parameters to the function that generates the distribution.
For example, heights can be approximately modeled as a normal distribution with a mean of around 170cm and a standard deviation of around 15cm.
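As a hedged sketch, the height example could be generated with tf.random.normal, which accepts mean and stddev arguments. It assumes the session API under tf.compat.v1. The sample size and seed are arbitrary.

```python
import tensorflow as tf

# Assumption: the graph/session API is reached through tf.compat.v1.
tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Heights in cm, using the mean and standard deviation from the text.
heights = tf.random.normal((10000,), mean=170.0, stddev=15.0, seed=42)

with tf1.Session() as sess:
    sample = sess.run(heights)

# The sample statistics should land close to the requested parameters.
print(sample.mean(), sample.std())
```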
Up to now, our histograms have been generated with matplotlib.
We can use TensorFlow to create these as well!
The histogram_fixed_width function takes a list of values (like our random values), the range, and the number of bins to compute.
It then counts how many values fall within the range of each bin, and returns the result as an array.
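To make the counting step concrete, here is a plain-Python sketch of fixed-width binning. It is not TensorFlow's implementation, and the function name fixed_width_histogram is hypothetical. Placing out-of-range values in the first or last bin follows my understanding of the documented behaviour of histogram_fixed_width, so treat that detail as an assumption.

```python
import math
import random


def fixed_width_histogram(values, value_range, nbins):
    """Count how many values fall into each of nbins equal-width bins."""
    low, high = value_range
    width = (high - low) / nbins
    counts = [0] * nbins
    for v in values:
        index = math.floor((v - low) / width)
        # Assumption: values at or beyond the edges go in the first or last bin.
        index = max(0, min(nbins - 1, index))
        counts[index] += 1
    return counts


rng = random.Random(1)  # arbitrary seed, only for repeatability
values = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
counts = fixed_width_histogram(values, (-4.0, 4.0), 8)

# One count per bin; together they account for every value.
print(counts, sum(counts))
```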
In the plt.bar call, we generate the bin values again manually, and then plot those as x values, with our histogram_bins as the heights, using a bar plot.
That’s correct, but doesn’t look right. The histogram values are there, but the widths are unusually narrow (our bins are represented by single values only). Let’s fix that:
- Use a uniform distribution to model a single die roll. Plot the result to ensure it is consistent with your expectations.
- Replace the last code block of this lesson with pure TensorFlow calls in a single graph. In other words, use TensorFlow concepts to replace the remaining non-TensorFlow calls, such as len. Only the plotting should be done without TensorFlow!