Mathematics
Sound is the result of air pressure varying over time. We can represent a sound as a signal: a graph with time on the x-axis and the amplitude of the air-pressure variation on the y-axis.
This time-based representation is convenient for recording and playing back sound, but it gives us little idea of what the signal will sound like. A frequency-based representation, however, does give us an idea of how the signal sounds.
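In practice, any recorded signal can be expressed as the sum of a number of sine and cosine functions of different frequencies and magnitudes, i.e. a sum of terms of the form:
$s(t) = a \sin(2\pi f t) + b \cos(2\pi f t)$, where the constants $f$ (the frequency, in hertz), $a$ and $b$ differ from term to term, and the magnitude of the term is $m = \sqrt{a^2 + b^2}$.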
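To make the sum concrete, here is a minimal Python sketch that samples such a sum of sinusoids. It assumes each term has the form a·sin(2πft) + b·cos(2πft) with f in hertz, that a term's magnitude is √(a² + b²), and that the signal is stored as a list of samples taken at a fixed sample rate. The function names `synthesize` and `magnitude` and the 8000 Hz rate in the usage note are illustrative choices, not taken from the text.

```python
import math


def synthesize(components, sample_rate, duration):
    """Sample s(t) = sum of a*sin(2*pi*f*t) + b*cos(2*pi*f*t) over all components.

    components:  list of (f, a, b) tuples, with f in hertz.
    sample_rate: samples per second.
    duration:    length of the signal in seconds.
    Returns a list of amplitudes, one every 1/sample_rate seconds.
    """
    n_samples = int(round(sample_rate * duration))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for f, a, b in components:
            angle = 2 * math.pi * f * t
            value += a * math.sin(angle) + b * math.cos(angle)
        samples.append(value)
    return samples


def magnitude(a, b):
    """Magnitude sqrt(a^2 + b^2) of the term a*sin(...) + b*cos(...)."""
    return math.hypot(a, b)
```

For example, `synthesize([(440.0, 1.0, 0.0), (880.0, 0.0, 0.5)], 8000, 0.01)` returns 80 samples of a 440 Hz sine of magnitude 1 plus an 880 Hz cosine of magnitude 0.5; the first sample is 0.5 because only the cosine term is non-zero at t = 0.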
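We can use the Fourier transform to extract these magnitude-frequency pairs and then plot them on a graph. For sampled audio this is the discrete Fourier transform (DFT), which is usually computed with the fast Fourier transform (FFT) algorithm.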
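The extraction step can be sketched as a discrete Fourier transform (DFT) computed directly from its definition in O(N²) time, so the arithmetic stays visible; real code would use an FFT routine such as NumPy's `numpy.fft.rfft`. This sketch assumes real-valued samples and returns only the non-negative frequencies, whose magnitudes it scales back to the amplitude of each sinusoid. The name `magnitude_spectrum` is hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for DFT bins 0..N/2 of a real signal.

    Uses the DFT definition directly (O(N^2)). Magnitudes are scaled so that a
    sinusoid of amplitude m completing a whole number of cycles in the window
    shows up with magnitude m in its bin.
    """
    n = len(samples)
    if n == 0:
        return []
    pairs = []
    for k in range(n // 2 + 1):
        # X[k] = sum over i of x[i] * e^(-2*pi*j*k*i/n)
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        # The DC bin and (for even n) the Nyquist bin have no mirror image,
        # so they are scaled by 1/n instead of 2/n.
        edge = k == 0 or (n % 2 == 0 and k == n // 2)
        scale = 1 / n if edge else 2 / n
        pairs.append((k * sample_rate / n, abs(acc) * scale))
    return pairs
```

With 80 samples at 8000 Hz the bins are 100 Hz apart, so a 400 Hz sine of amplitude 1 plus a 1000 Hz cosine of amplitude 0.5 appears as magnitude 1.0 at 400 Hz and 0.5 at 1000 Hz, with the other bins near zero. Tones that do not complete a whole number of cycles in the window spread (leak) into neighbouring bins.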
However, the frequencies present in a sound vary over time. To see how they vary, we calculate a magnitude-frequency graph for each successive 22-millisecond interval of the signal and plot all these graphs together in what is called a spectrogram. Each magnitude-frequency graph occupies one vertical column of pixels. A column of pixels has only a y-axis, which we assign to frequency, and we use color to represent the magnitude of the frequency at each pixel. The result is a top-down view of the graph below.
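A spectrogram is then a list of spectra, one per frame. The following minimal sketch assumes consecutive, non-overlapping 22 ms frames and no window function, since the text does not say how its intervals are chosen; practical tools usually overlap frames and apply a window such as Hann, and libraries such as SciPy provide `scipy.signal.spectrogram` for this. The names `frame_magnitudes` and `spectrogram` are hypothetical, and the magnitudes are left unscaled.

```python
import cmath
import math


def frame_magnitudes(frame):
    """Unscaled magnitudes of DFT bins 0..N/2 for one frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, sample_rate, frame_ms=22.0):
    """Split the signal into consecutive, non-overlapping frames of frame_ms
    milliseconds and return one column of bin magnitudes per frame.

    columns[c][k] is the magnitude of frequency k * sample_rate / frame_len
    in frame c. Leftover samples that do not fill a whole frame are ignored.
    """
    # Frame length in samples, truncated to a whole number.
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame is shorter than one sample")
    columns = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        columns.append(frame_magnitudes(samples[start:start + frame_len]))
    return columns
```

Each inner list is one column of the spectrogram: index k corresponds to frequency k · sample_rate / frame_len, and its value would be mapped to a pixel color. At 8000 Hz, a 22 ms frame holds 176 samples, giving 89 frequency bins about 45.5 Hz apart.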