Mathematics

Sound is the result of air pressure varying over time. We can represent a sound as a graph with time on the x-axis and the amplitude of the air pressure variation on the y-axis; we call this representation a signal.

Air pressure vs time

This time-based representation of sound is convenient for recording and playing back sound, but it gives us little idea of what a signal will sound like. A frequency-based representation, however, does give us an idea of what the signal sounds like.

Any signal is the sum of a number of sine and cosine functions of different frequencies and magnitudes, i.e. a sum of functions of the form:

s(t) = m * sin(f * t), where m (the magnitude) and f (the frequency) are constants
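
As a rough illustration, a signal can be built by adding up a few such sine functions. The sketch below is not from the original text: the function name synthesize, the sample rate, and the example magnitude-frequency pairs are all assumptions chosen for illustration. It also assumes frequencies are given in hertz, so the argument of the sine is written as 2 * pi * f * t.

```python
import math

def synthesize(pairs, duration, sample_rate):
    """Return samples of the sum of m * sin(2*pi*f*t) over (m, f) pairs.

    pairs       -- list of (magnitude, frequency in Hz) tuples (illustrative)
    duration    -- length of the signal in seconds
    sample_rate -- number of samples per second (assumed, not from the text)
    """
    n_samples = int(round(duration * sample_rate))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of sample n, in seconds
        samples.append(sum(m * math.sin(2 * math.pi * f * t) for m, f in pairs))
    return samples

# A 440 Hz component with magnitude 1.0 plus an 880 Hz component with magnitude 0.5.
signal = synthesize([(1.0, 440.0), (0.5, 880.0)], duration=0.01, sample_rate=8000)
```

The resulting list is the time-based representation: one air pressure value per sample.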

We can use mathematics (specifically, the Fourier transform) to extract the magnitude-frequency pairs from a signal, and then plot them on a graph.
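
One way to extract magnitude-frequency pairs is a discrete Fourier transform. The sketch below is an assumption-laden illustration, not the method of any particular tool: it uses a direct, slow form of the transform for clarity (in practice a fast Fourier transform library would normally be used), and the function name magnitude_frequency_pairs, the sample rate, and the test signal are hypothetical.

```python
import cmath
import math

def magnitude_frequency_pairs(signal, sample_rate):
    """Return a list of (frequency in Hz, magnitude) pairs for a sampled signal.

    Uses a direct O(N^2) discrete Fourier transform over the non-negative
    frequencies. Magnitudes are scaled so that a pure sine whose frequency
    falls exactly on a bin reports its own amplitude.
    """
    n = len(signal)
    pairs = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        # The zero-frequency and Nyquist bins have no mirrored partner,
        # so they are not doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        pairs.append((k * sample_rate / n, scale * abs(coeff) / n))
    return pairs

# Illustrative signal: 50 Hz at magnitude 1.0 plus 120 Hz at magnitude 0.5.
sample_rate = 1000
signal = [1.0 * math.sin(2 * math.pi * 50 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 120 * i / sample_rate)
          for i in range(200)]

pairs = magnitude_frequency_pairs(signal, sample_rate)
# The two pairs with the largest magnitudes, strongest first.
strongest = sorted(pairs, key=lambda p: p[1], reverse=True)[:2]
```

Here the two strongest pairs correspond to the two sine components that were added together; plotting all of the pairs gives the magnitude-frequency graph.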

The frequencies vary over time, however. To see how they vary, we calculate a magnitude-frequency graph for every 22 millisecond interval. We can then plot all these graphs together into what is called a spectrogram. Each magnitude-frequency graph occupies one vertical column of pixels. A column of pixels has only a y-axis, which we assign to frequency, so we use color to represent the magnitude at each frequency/pixel. The result is a top-down view of the graph below.

3D depiction of a spectrogram
Spectrogram of Speech
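
The spectrogram construction described above can be sketched as follows. This is a simplified illustration under stated assumptions: the signal is cut into non-overlapping 22 millisecond frames with no tapering (real tools often overlap and taper the frames), a direct discrete Fourier transform is used instead of a fast one, and the names spectrogram, dft_magnitudes, and peak_frequency, as well as the test signal, are hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the discrete Fourier transform at non-negative frequencies."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, sample_rate, frame_seconds=0.022):
    """Return (freqs, columns); columns[j][k] is the magnitude at freqs[k] in frame j."""
    frame_len = int(round(frame_seconds * sample_rate))
    n_frames = len(signal) // frame_len  # any trailing partial frame is dropped
    columns = [dft_magnitudes(signal[j * frame_len:(j + 1) * frame_len])
               for j in range(n_frames)]
    freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    return freqs, columns

def peak_frequency(freqs, column):
    """Frequency of the strongest bin in one column."""
    return freqs[max(range(len(column)), key=column.__getitem__)]

# Illustrative signal: a 440 Hz tone for 0.1 s followed by a 1760 Hz tone for 0.1 s.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (440 if i < 800 else 1760) * i / sample_rate)
          for i in range(1600)]

freqs, columns = spectrogram(signal, sample_rate)
first_peak = peak_frequency(freqs, columns[0])
last_peak = peak_frequency(freqs, columns[-1])
```

Each entry of columns plays the role of one vertical column of pixels: mapping its magnitudes to colors and placing the columns side by side gives the spectrogram image. With frames this short the frequency bins are coarse, so the peak of a column only approximates the true tone frequency.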