WHY 12 STEPS IN A SCALE? THE MATHEMATICS OF MUSICAL FREQUENCIES
Number Theory Project, Fall 2000, by Caleb Rossiter
There are 12 steps in the Western musical scale. They are shown concretely by the 12 frets of a guitar and the 12 keys of a piano that lie between a note and its octave. Moreover, all Western instruments, including those with variable pitches, such as horns and fretless strings, attempt to sound these same 12 steps, which comprise all the notes that can be written on a staff, including all sharps, before an octave is reached.
Why 12? Why not 5, or 13, or 34? Could the answer lie in some mathematical property of 12 that other numbers lack? According to some music theorists, this is unlikely, because musical systems are largely determined by cultural, not mathematical, factors:
"It is customary to relate the developed Western harmonic system to the overtone series, and up to a point this relation is in accord with the facts. But it leaves unexplained the selection made by art…We take for granted our diatonic scale of seven notes or our chromatic scale of twelve, but this is not natural law. It is possible, within the octave, to use a five-note scale (pentatonic) or a 43-note scale (as devised by Harry Partch). Almost any arithmetical or proportional division might be used as a basis for creating a usable scale within the octave..."
Other theorists, though, would probably assume that there is an important mathematical component to any musical relationship, such as the 12-step scale:
"Thus, all tones produced by ratios formed by 2,3, 5, their products and quotients, become known once we know the musical meaning of 2,3, and 5, which are prime numbers. Prime numbers give birth to new tone values."
This paper argues that while culture and taste play a role in the development of scales, the second view is far closer to the truth than the first. There are good mathematical explanations for the 7-note scale (as there are for a number of others), and there are compelling mathematical reasons for a 12-note scale rather than any other, especially one designed to make the 7-note scale transposable to all keys, as Western music requires. Western musicians arrived at 12 as the magic number after many centuries of experimentation and learning; they could have saved themselves a lot of time by doing the math first.
* * *
The Physics of Musical Sound
Before addressing the mathematics of the scale, it is worth explaining a bit of the physics of musical sound. A stretched string sounds the note we call an octave when it is fretted halfway along its length, that is, at the point where the ratio of the original string length to the length of the vibrating string is 2:1. The sound at 2:1 is a noticeably higher replica of the original sound, 1:1. It is called an octave because it is the 8th note of a single key's scale in Western music, while in other cultures the corresponding note is reached as the 5th, 17th, or 22nd note, and is likewise recognized as a renewal of the scale. (A better term for the note at 2:1 is the early Greek word for the phenomenon, "diapason," since at this note we appear to have run "through all" the range of a scale. However, octave is so familiar that it is used in this paper.)
Fretting a string one-third of the way along its length divides it into two segments, one twice as long as the other, which therefore sound an octave apart. The ratios of the original string length to the lengths of these segments are 3:1 and 3:2. In all known forms of music, not just Western music, these new notes are perceived as related to 1:1, forming a pleasing blend. The note at 3:2 is called the fifth of the note at 1:1 because of its place in our 7-step scale. The note at 3:2 also blends in a different pleasing way with the note at 2:1, and in Western music 2:1 is called the fourth of 3:2 because it is the fourth note of the scale counting up from 3:2 (G, A, B, C in the key of C). Its ratio to 3:2 is 2:1 divided by 3:2, or 4:3.
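The ratios in this paragraph can be checked with exact fraction arithmetic. The sketch below is a minimal Python illustration using the standard `fractions` module; the helper `freq_ratio` is a hypothetical name of ours, and it assumes the ideal-string rule that frequency is inversely proportional to vibrating length.

```python
from fractions import Fraction

def freq_ratio(length_fraction):
    """Frequency of a string segment relative to the open string (1:1).

    Assumes frequency is inversely proportional to vibrating length.
    """
    return 1 / Fraction(length_fraction)

# Fretting one-third of the way along leaves segments of 1/3 and 2/3.
short_part = freq_ratio(Fraction(1, 3))   # 3   (the 3:1 note)
long_part = freq_ratio(Fraction(2, 3))    # 3/2 (the 3:2 note, the fifth)
print(short_part / long_part)             # 2: the two segments are an octave apart

# The fourth is the interval from the fifth (3:2) up to the octave (2:1).
octave = Fraction(2)
fourth = octave / long_part
print(fourth)                             # 4/3
print(long_part * fourth)                 # 2: a fifth plus a fourth is one octave
```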
Why should these pleasing sounds correspond to ratios of whole numbers? The answer lies in the physics of sound waves, especially musical sound waves. Sound travels as a wave through the air before striking the eardrum and being perceived. As with a water wave, the medium is not carried along: the air molecules only oscillate about fixed positions, transmitting the force of the wave. Musical sound waves are harmonic, meaning that they are, or are composed of, regular "sine curves" of varying amplitude that oscillate equally on both sides of a mean line. Each note has a fundamental frequency: the number of times its main wave completes a full cycle in a given period of time. The frequencies of the different musical notes create the mathematics of music.
The frequency of a vibrating string is inversely proportional to its length (and, similarly, the pitch of a wind instrument depends inversely on the length of its vibrating air column). The frequency of an ideal stretched string is $f = \frac{1}{2l}\sqrt{p/m}$, where $f$ is vibrations per second, $l$ is the vibrating length in meters, $p$ is the stretching force on the string in newtons, and $m$ is the mass of one meter of string in kilograms. The formula makes explicit what musicians know: as the unblocked portion of the string gets longer, or the string gets heavier (thicker), the frequency decreases, and as the string is tightened, the frequency increases.
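As a quick numerical check of the formula, here is a minimal Python sketch. The function name `string_frequency` and the string's dimensions (0.65 m, 70 N, 0.6 g per meter) are illustrative assumptions, not measurements of any real instrument; the ratios printed afterward are what the formula predicts regardless of those choices.

```python
import math

def string_frequency(l, p, m):
    """Fundamental frequency (Hz) of an ideal stretched string.

    l: vibrating length in meters
    p: stretching force (tension) in newtons
    m: mass of one meter of string in kilograms
    """
    return (1 / (2 * l)) * math.sqrt(p / m)

# Hypothetical string: 0.65 m long, 70 N of tension, 0.6 g per meter.
base = string_frequency(0.65, 70.0, 0.0006)
print(f"{base:.1f} Hz")   # roughly 263 Hz

# Halving the length doubles the frequency (the octave).
print(string_frequency(0.65 / 2, 70.0, 0.0006) / base)
# Quadrupling the tension also doubles it; quadrupling the mass halves it.
print(string_frequency(0.65, 280.0, 0.0006) / base)
print(string_frequency(0.65, 70.0, 0.0024) / base)
```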
The human ear can hear sounds whose waves oscillate between about 20 and 20,000 times per second (20 to 20,000 hertz, or Hz). The fundamentals of a standard 88-key piano run from 27.5 Hz (the lowest A) to about 4,186 Hz (the highest C). By convention, "concert A" is 440 Hz, but over the last three centuries the pitch chosen for this note has ranged from 393 to 450 Hz.
An octave always has waves half the length of those of its base note. For example, the A an octave above concert A has a wavelength of about 39 centimeters, while concert A has a wavelength twice as long, about 78 centimeters. Since all sound travels through air at the same speed (about 343 meters per second at room temperature), a note with half the wavelength must have twice the frequency, or pitch. For example, the A below concert A has a frequency of 220 Hz, and the A above concert A has a frequency of 880 Hz. So the octaves are not evenly spaced in frequency, but are related by the function A(n) = 2A(n-1). Since concert A, 440 Hz, is A4 in the standard octave numbering, A1 has a frequency of 55 Hz, and A(n) = 55*(2^(n-1)); the piano's lowest key, A0, lies an octave lower still, at 27.5 Hz.
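The doubling rule and the wavelengths quoted above can be reproduced in a few lines. This Python sketch assumes a speed of sound of 343 m/s, as in the text, and uses a hypothetical helper `a_frequency(n)` numbered so that concert A is n = 4; with this numbering, n = 0 gives 27.5 Hz.

```python
SPEED_OF_SOUND = 343.0  # meters per second in air at room temperature (approximate)

def a_frequency(n):
    """Frequency in Hz of A(n), numbered so concert A is n = 4: A(n) = 55 * 2**(n-1)."""
    return 55 * 2 ** (n - 1)

def wavelength_cm(freq_hz):
    """Wavelength in centimeters of a sound of the given frequency in air."""
    return 100 * SPEED_OF_SOUND / freq_hz

for n in range(1, 8):
    f = a_frequency(n)
    print(f"A{n}: {f:>5} Hz, wavelength {wavelength_cm(f):6.1f} cm")
```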
The ratios between the notes of one octave are the same as in every other octave, because each note has twice the frequency of its counterpart in the octave below. For example, in the fourth C scale on the piano, C is 264 Hz and D is 297 Hz, a ratio of 1.125. In the fifth C scale, C is 528 Hz and D is 594 Hz, again a ratio of 1.125. Both within a single scale, then, and over the entire range of music, pitches are measured not on an arithmetic scale, where a given note is a fixed number of vibrations per second above its counterpart in the lower octave, but on a logarithmic scale, where each note's number of vibrations is a fixed multiple of its counterpart's. The scale is log (base 2), because a new unit is reached each time the vibrations per second double.
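A short Python sketch of the difference between the arithmetic and logarithmic views, using the C and D frequencies given in the text; `octaves_between` is an illustrative helper name, not a standard function.

```python
import math

# Just-intonation C and D in two adjacent octaves, as given in the text.
c4, d4 = 264, 297
c5, d5 = 528, 594

# Arithmetic differences double from one octave to the next...
print(d4 - c4, d5 - c5)        # 33 66
# ...but the ratios stay fixed.
print(d4 / c4, d5 / c5)        # 1.125 1.125

def octaves_between(f_low, f_high):
    """Distance between two pitches in octaves: log base 2 of their ratio."""
    return math.log2(f_high / f_low)

print(octaves_between(c4, c5))                             # 1.0: one unit per doubling
print(octaves_between(c4, d4) == octaves_between(c5, d5))  # True
```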
Musical sound contains more than one frequency. A note emits a dominant "fundamental frequency," but it also produces weaker "overtones" at whole-number multiples of the fundamental frequency, which give the note its richness. Different instruments emit their overtones at different amplitudes, creating the "timbre," or unique sound, of a tuba as opposed to a violin. Mathematically, the composite wave can be described by a Fourier series for the particular instrument: the sum of the fundamental and the overtones, each at its own amplitude. As the overtones rise to higher multiples they tend to lose amplitude, so the most important overtones of any fundamental are those at 2, 3, 4, and perhaps 5 times its vibrations per second: the octave, its fifth, the following octave, and its third. For example, the successive overtones of concert A (440 Hz) are:
880 Hz (the octave)
1320 Hz (the fifth above the octave)
1760 Hz (the second octave)
2200 Hz (the third above the second octave, and also a minor sixth, 8:5, below the third octave)
2640 Hz (the fifth above the second octave)
3080 Hz (a note very close to the "blues seventh," roughly halfway, in log (base 2), between the sixth and the seventh above the second octave)
3520 Hz (the third octave)
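Beyond the third octave (3520 Hz, 8 times the fundamental), the next seven overtones, at 9 through 15 times the fundamental, fall before the fourth octave (7040 Hz, 16 times). Relative to the third octave, their ratios are 1.125 (second), 1.25 (third), 1.375 (about .04 above the fourth), 1.5 (fifth), 1.625 (about .04 below the sixth), 1.75 (blues seventh), and 1.875 (seventh). Because some of these notes, notably 1.375 and 1.625, are not elements of the scale, they would start to create dissonance if heard at the same volume as the first few overtones. By this point, however, three octaves above the fundamental, the overtones are losing steam and have little amplitude.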
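The overtone arithmetic of the last two paragraphs can be verified directly. This minimal Python sketch lists the first eight harmonics of concert A, then expresses harmonics 9 through 15 as ratios to the 8th harmonic; the comparison against 4/3 and 5/3 assumes the just fourth and sixth of the diatonic scale.

```python
from fractions import Fraction

fundamental = 440
harmonics = [k * fundamental for k in range(1, 9)]
print(harmonics)
# [440, 880, 1320, 1760, 2200, 2640, 3080, 3520]

# Harmonics 9 through 15 lie between the 8th harmonic (3520 Hz) and the 16th.
# Express each one as a ratio to the 8th harmonic.
upper = [Fraction(k, 8) for k in range(9, 16)]
print([float(r) for r in upper])
# [1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875]

# The two "out of scale" overtones sit 1/24 (about .04) from the just fourth and sixth.
print(Fraction(11, 8) - Fraction(4, 3))   # 1/24
print(Fraction(13, 8) - Fraction(5, 3))   # -1/24
```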
Overtones help explain why certain combinations of notes are pleasing: particularly the fifth, the fourth, and to a lesser extent the third. Not only are the fundamentals of these notes related by frequency ratios with small numerators and denominators, but their overtones are similarly related, and several coincide exactly (the third harmonic of a note, for example, is identical to the second harmonic of its fifth). Pleasing combinations of notes thus also have pleasing combinations of their first few overtones.
A final physical phenomenon that is related to overtones and that constrains the creation of scales is called "beats." When two notes are extremely close in pitch, their vibrations combine to swell and fade repeatedly, because the two waves drift from nearly perfect alignment to poor alignment and back again. This creates an unpleasant wobble, pulse, or beat if it happens about once a second, and an unpleasant roughness if the beat is, say, six per second: too many to hear individually, but too few to escape the effect of the shifting alignment. Notes separated by a noticeably larger amount do not produce audible swelling and fading; they are heard as two distinct tones. As a result, scales should be composed of notes whose overtones will not be too close in frequency to each other.
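For two pure (sinusoidal) tones, the beat rate is the difference of their frequencies, |f1 - f2| per second. The Python sketch below applies that rule to fundamentals and to overtones; `beat_rate` is an illustrative name, and real instrument tones, which contain many overtones, beat in more complicated ways.

```python
def beat_rate(f1, f2):
    """Beats per second when two pure tones of nearby frequencies sound together."""
    return abs(f1 - f2)

print(beat_rate(440, 441))   # 1: a slow, clearly audible pulse
print(beat_rate(440, 446))   # 6: a rough flutter

# Overtones beat as well. The 3rd harmonic of A440 (1320 Hz) coincides exactly
# with the 2nd harmonic of a pure fifth above it (660 Hz), so there is no beat...
print(beat_rate(3 * 440, 2 * 660))   # 0
# ...but if the upper note is mistuned to 661 Hz, those overtones beat twice a second.
print(beat_rate(3 * 440, 2 * 661))   # 2
```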
* * *
Construction of the 12-Note Scale by Trial and Error
Pythagoras, in the sixth century B.C., and many of the Greek philosophers who followed him developed first a five-note and then a seven-note scale, taking as their notes the string-length ratios between 1/1 and 2/1 with the smallest possible numerators and denominators, such as 3/2, 4/3, and 5/4, and then dividing or multiplying these core fractions by other simple fractions. They found that the fractions with the smallest denominators had the most pleasing relationships with 1/1 and 2/1, and often with each other. Although they could not measure the actual frequencies of the component waves of pitches, and therefore had no formal theory of overtones or beats, they recognized that the way to avoid the "near coincidences" among overtones that create beats was to use simple fractions with the smallest possible denominators, which leave enough space between notes that beats do not arise among their many overtones.
Eventually, the Greeks settled on the "diatonic," or seven-note, scale, whose notes have frequency ratios to the original pitch (equivalently, ratios of the original string length to the new length) of 9/8, 5/4, 4/3, 3/2, 5/3, 15/8, and 2/1. The ratio of each step to the step below it is 9/8, 10/9, 16/15, 9/8, 10/9, 9/8, and 16/15. This is the simple Western scale immortalized by Julie Andrews, as the von Trapp family's governess, in "Do-Re-Mi" from The Sound of Music.
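The step ratios follow from the cumulative ratios by division, which a few lines of Python with exact fractions can confirm. This is a sketch for checking the arithmetic, not part of the original analysis.

```python
import math
from fractions import Fraction

# Ratio of each degree of the diatonic scale to the tonic, as listed in the text.
degrees = [Fraction(1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
           Fraction(3, 2), Fraction(5, 3), Fraction(15, 8), Fraction(2)]

# Each step is the ratio of one degree to the degree below it.
steps = [upper / lower for lower, upper in zip(degrees, degrees[1:])]
print([str(s) for s in steps])
# ['9/8', '10/9', '16/15', '9/8', '10/9', '9/8', '16/15']

# The seven steps multiply back to exactly one octave.
print(math.prod(steps))   # 2
```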
The steps in the diatonic scale are called either tones (9/8 or 10/9) or semi-tones (16/15). The tone steps are 9/8 = 1.125 or 10/9 = 1.111. The difference between the 9/8 and 10/9 tone steps,
(9/8)/(10/9) = 81/80 = 1.0125, also expressed as 1.125/1.111 = 1.0125,
is impossible for most listeners to hear, since the human ear can rarely distinguish notes that are this close in pitch. Because pitch is perceived logarithmically, the smallest difference in vibrations per second that can be reliably distinguished grows with the pitch, so typical human error is best expressed as a ratio. According to Johnston, the typical error is about 10 percent of a semi-tone step of 1.0595 (described below), or about .00595 on either side of a note, so notes within a total range of about .012, such as those differing by the 1.0125 in this case, are hard to distinguish. This explains why the semi-tone step of 16/15 = 1.0667 was generally judged to be one-half of the larger tone step, 9/8 (two semi-tones make 1.0667^2 = 1.1378, which is within this tolerance of 1.125), but slightly "off" as one-half of the smaller tone step, 10/9, since 1.111 lies outside it.
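The comparisons in this paragraph can be reproduced with exact fractions. In this minimal Python sketch the variable names are ours; it computes the 81/80 difference between the two tones and how far two 16/15 semi-tones overshoot each of them.

```python
from fractions import Fraction

major_tone = Fraction(9, 8)
minor_tone = Fraction(10, 9)
semitone = Fraction(16, 15)

# The difference between the two kinds of tone (the 81/80 step in the text).
comma = major_tone / minor_tone
print(comma, float(comma))                    # 81/80 1.0125

# Two semi-tones, compared with each kind of tone.
two_semitones = semitone ** 2
print(two_semitones)                          # 256/225 (about 1.1378)
print(float(two_semitones / major_tone))      # about 1.0114
print(float(two_semitones / minor_tone))      # 1.024
```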
There was little change in the diatonic scale until European musicians in the second millennium wanted instruments that could modulate, that is, transpose the notes and play the same tune in any key, not just the fundamental key. The result was, at first, "chaos," since doing this perfectly on one instrument would require 11 new frequency-to-octave ratios in the original and surrounding octaves, and each of the 11 new notes would in turn need many new notes to permit full modulation for its own scale, as would those new notes, in an ever-growing and never-ending demand for perfect scales.
The initial solution to the problem, "mean tone" tempering, relied on the size of the error in human perception. Adding new semi-tones halfway, logarithmically or arithmetically, across each of the five tone steps created a 12-note scale with roughly equal proportional steps throughout. The steps were not perfectly equal because, as we have seen, 16/15 is not exactly half of 10/9 or 9/8, but they were close enough not to create extreme dissonance. All of the 11 new notes fell near enough to the seven original notes and the five new semi-tones that they could be subsumed within the margin of listening error. The new arrangement did, however, create a strange effect during modulation: melodies and chords simply sounded different in different keys, as the effects of misaligned fundamentals and overtones added up in the listener's perception. A song played in C did not sound quite the same as the same song played in D, which defeated the purpose of modulation.
The final solution to the problem was to take the 12-step system that had been developed by trial and error and turn it into 12 equal, perfectly proportional steps, so that all pieces would sound alike in the 12 different keys. Each of these "equal-tempered" steps has exactly the same ratio to its predecessor, 1.05946309…, the 12th root of 2. That is, this number, roughly 1.0595, raised to the 12th power equals 2, the beginning of the next octave. Using equal steps also ended the need to keep adding notes as each new note required its own scale. In today's equal-tempered world, everybody is a little bit out of tune all the time, but nobody is ever out of tune very much. To be sure, the old "mean tone" tempering of the five new semi-tones produced a scale that was closer to true pitch than the equal-tempered system for 6 major and 3 minor keys, but for the other 6 major and 9 minor keys it produced worse approximations, some of which, unlike anything in equal temperament, were horrifically noticeable.
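A minimal Python sketch of equal temperament, assuming concert A = 440 Hz as the starting pitch: it checks that the 12th root of 2 raised to the 12th power returns the octave, and lists one equal-tempered octave. The note names are only labels for the 12 steps.

```python
import math

STEP = 2 ** (1 / 12)          # equal-tempered semi-tone, about 1.0594631
assert math.isclose(STEP ** 12, 2.0)

# One octave of equal-tempered pitches starting from concert A.
names = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A"]
for k, name in enumerate(names):
    print(f"{name:<2} {440 * STEP ** k:8.2f} Hz")

# An interval of k steps has the ratio STEP**k whatever note it starts on,
# which is what makes all 12 keys sound alike. For example, the fifth (7 steps):
fifth = STEP ** 7
print(round(fifth, 4))        # 1.4983
```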
* * *
Construction of the 12-Note Scale by Continued Fractions and Diophantine Approximation
What is the best transposable scale we can construct? That is, what number of equally proportioned steps will cover the most important notes with the least deviation in all modulations? Remarkably, the mathematical answer turns out to be the same as the answer arrived at by trial and error: 12.
Why? Let us start with the rules of scale-building. A scale must be anchored at 1/1 and 2/1, and between those notes it must have notes extremely close to the fractions 4/3 and 3/2, the two most important partners of 1/1 and 2/1. It is also important to have a note extremely close to the fraction 5/4, which, raised two octaves (to 5/1), becomes an overtone of 1/1. So we need a scale of fractional powers of 2, a scale of "m" steps in which the original note 1/1 is denoted by
2^(0/m) = 2^0 = 1,
the final note 2/1 is denoted by 2^(m/m) = 2^1 = 2,
and there are integers x, y, and z between 0 and m such that 2^(x/m) is roughly 4/3, 2^(y/m) is roughly 3/2, and 2^(z/m) is roughly 5/4, because these are the primary consonant tones and overtones of a fundamental tone.
"x/m" should therefore approximate the power to which 2 must be raised to equal 4/3, which by the definition of a logarithm is log (base 2) of 4/3; this equals log (base 2) of 4 minus log (base 2) of 3, or 2 minus log (base 2) of 3. Similarly, "y/m" should approximate log (base 2) of 3/2, which equals log (base 2) of 3 minus log (base 2) of 2, or log (base 2) of 3 minus 1. Finally, "z/m" should approximate log (base 2) of 5/4, which equals log (base 2) of 5 minus log (base 2) of 4, or log (base 2) of 5 minus 2.
The key to identifying the best step-size for "m" is therefore to find the Diophantine approximation, by the use of continued fractions, that best approximates log (base 2) of 3 while still having a denominator whose gradations can be easily perceived by the human ear. We should similarly find the best Diophantine approximation for log (base 2) of 5, and if its denominator is different, test to see if this denominator generates less total difference from true tone for the fifth, the fourth, and the third.
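The three target exponents, and the reason a single approximation of log (base 2) of 3 serves both the fourth and the fifth, can be checked numerically. This Python sketch uses only the standard `math` module; the dictionary names are illustrative.

```python
import math

# Targets: the exponents x/m, y/m, z/m should approximate these values.
targets = {"fourth": 4 / 3, "fifth": 3 / 2, "third": 5 / 4}
exponents = {name: math.log2(ratio) for name, ratio in targets.items()}
for name, e in exponents.items():
    print(f"{name}: log2 = {e:.7f}")
# fourth: log2 = 0.4150375
# fifth: log2 = 0.5849625
# third: log2 = 0.3219281

# The identities from the text: 2 - log2(3), log2(3) - 1, log2(5) - 2.
assert math.isclose(exponents["fourth"], 2 - math.log2(3))
assert math.isclose(exponents["fifth"], math.log2(3) - 1)
assert math.isclose(exponents["third"], math.log2(5) - 2)

# Since 4/3 * 3/2 = 2, the fourth's exponent is 1 minus the fifth's: a step count
# that approximates log2(3) well serves the fourth and the fifth at once.
assert math.isclose(exponents["fourth"] + exponents["fifth"], 1.0)
```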
Continued fractions are based on the Euclidean algorithm for finding greatest common divisors, a procedure also dubbed "the Pulverizer" by Aryabhata. In each iteration of the Euclidean algorithm, a number is written as a whole-number multiple of the divisor plus a remainder. For a Diophantine approximation of a real number by continued fractions, we apply the same basic procedure to a non-integer, such as 74/11, or e, and then improve the approximation of the remainder:
for fractions: after the Euclidean algorithm has produced a sequence of equations ending in a remainder of 1 or 0, we divide both sides of each equation by its divisor. This yields a chain of identities involving the inverses of the successive remainders, which lets us write the original fraction exactly as a continued fraction; and
for irrational numbers: after writing out a decimal approximation of the number, we separate its integer base, or "floor," from its decimal remainder. The original floor is the first approximation of the number, and it is improved at each iteration by adding to it a fractional approximation of the decimal remainder, found by expressing the decimal remainder as the inverse of its inverse, and then expressing the inverse, again, as a floor and its remainder. The succeeding approximations of the remainder are "nested" in the denominator, creating ever smaller adjustments to the approximation.
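Both procedures amount to the same loop: take the floor, subtract it, invert the remainder, repeat. The Python sketch below is one way to write that loop; the name `cf_terms` is ours. One practical caution, which is our addition: each inversion magnifies rounding error, so a spreadsheet working in about 15 significant digits can only be trusted for a limited number of floors. The sketch therefore uses the standard `decimal` module at 50 digits (an arbitrary choice) for log (base 2) of 3, and exact fractions for 74/11.

```python
import math
from decimal import Decimal, getcontext
from fractions import Fraction

def cf_terms(x, n):
    """Return up to n partial quotients ("floors") of the continued fraction of x.

    x may be a Fraction (exact arithmetic) or a Decimal (finite precision).
    Stops early if the remainder is exactly zero, as happens for rationals.
    """
    terms = []
    for _ in range(n):
        a = math.floor(x)          # separate the floor...
        terms.append(a)
        remainder = x - a          # ...from the remainder
        if remainder == 0:
            break
        x = 1 / remainder          # then expand the inverse of the remainder
    return terms

# A rational number terminates, just as the Euclidean algorithm does.
print(cf_terms(Fraction(74, 11), 10))   # [6, 1, 2, 1, 2]

# For log2(3), work with 50 significant digits so rounding error stays negligible.
getcontext().prec = 50
log2_3 = Decimal(3).ln() / Decimal(2).ln()
print(cf_terms(log2_3, 15))
# [1, 1, 1, 2, 2, 3, 1, 5, 2, 23, 2, 2, 1, 1, 55]
```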
Here are the results of an Excel program identifying the successive floors for log (base 2) of 3 with the Euclidean algorithm:
Value expanded (first log (base 2) of 3, then 1/previous remainder), (Floor), Remainder
1.5849625 (1) 0.5849625
1.70951129 (1) 0.70951129
1.40942084 (1) 0.40942084
2.4424746 (2) 0.4424746
2.26001675 (2) 0.26001675
3.84590604 (3) 0.84590604
1.18216439 (1) 0.18216439
5.48954709 (5) 0.48954709
2.0427044 (2) 0.0427044
23.4167898 (23) 0.41678978
2.39929105 (2) 0.39929105
2.5044388 (2) 0.5044388
1.98240104 (1) 0.98240104
1.01791423 (1) 0.01791423
55.8215318 (55) 0.82153184
1.21723828 (1) 0.21723828
4.60324035 (4) 0.60324035
1.65771405 (1) 0.65771405
1.52041756 (1) 0.52041756
1.92153392 (1) 0.92153392
1.08514726 (1) 0.08514726
11.7443597 (11) 0.74435975
1.34343643 (1) 0.34343643
2.91174701 (2) 0.91174701
Here are the Diophantine approximations of log (base 2) of 3, based on the floors identified for the inverses of the successive remainders in the preceding table. After the initial floor of 1, the floors are 1, 1, 2, 2, 3, 1, 5, 2, and so on. Writing $[a_0; a_1, \ldots, a_k]$ for $a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_k}}}$, the successive approximations of 1.5849625… are:
$[1] = 1$
$[1; 1] = 1 + \frac{1}{1} = 2$
$[1; 1, 1] = 1 + \cfrac{1}{1 + \cfrac{1}{1}} = \frac{3}{2}$
$[1; 1, 1, 2] = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{2}}} = \frac{8}{5}$
$[1; 1, 1, 2, 2] = \frac{19}{12}$
$[1; 1, 1, 2, 2, 3] = \frac{65}{41}$
$[1; 1, 1, 2, 2, 3, 1] = \frac{84}{53}$
$[1; 1, 1, 2, 2, 3, 1, 5] = \frac{485}{306}$
$[1; 1, 1, 2, 2, 3, 1, 5, 2] = \frac{1054}{665}$, and so on.
We can see that 12 is a better choice than 5 or 2 for m, the number of steps in the octave, but a worse one than 41, 53, 306, or the succeeding denominators. However, scales of 41, 53, or 306 notes are unwieldy to construct, and the differences between their adjacent notes are too fine for the human ear to distinguish reliably.
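Rather than re-evaluating ever deeper nested fractions, the approximations can be built from the floors with the standard convergent recurrence p_k = a_k p_(k-1) + p_(k-2), q_k = a_k q_(k-1) + q_(k-2). This Python sketch (the function name `convergents` is ours) takes the floors from the table and prints each approximation, its denominator (the candidate number of steps per octave), and its distance from log (base 2) of 3.

```python
import math
from fractions import Fraction

def convergents(terms):
    """Successive convergents p/q of the continued fraction [a0; a1, a2, ...].

    Uses the standard recurrence p_k = a_k*p_(k-1) + p_(k-2) and
    q_k = a_k*q_(k-1) + q_(k-2) instead of re-evaluating nested fractions.
    """
    p_prev, p = 1, terms[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result

# Floors of log2(3) from the table.
floors = [1, 1, 1, 2, 2, 3, 1, 5, 2]
for c in convergents(floors):
    error = abs(float(c) - math.log2(3))
    print(f"{str(c):>9}  steps per octave: {c.denominator:>3}  error: {error:.2e}")
```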
For example, if the number of steps in an octave were 41, each step would be 2^(1/41), or 1.017, versus the 1.0595 semi-tone step of the equal-tempered 12-note scale. According to Johnston, as noted above, even a well-trained human ear is hard-pressed to distinguish an error of 1/10th of a semi-tone, or 1.00595, so the steps of 1.017 in a 41-note octave, less than a third of a semi-tone, would be only barely distinguishable as separate notes.
For a typical note in the middle range, such as concert A = 440 Hz, the next note in a 41-step scale would be 447.5 Hz, the sort of variation that perception can subsume within a single note, whereas the next note in the 12-step scale is 466.2 Hz, a distinct variation. This step of 7.5 Hz is smaller than the 7.833 Hz difference between the equal-tempered A# that follows A = 440 Hz and one of the diatonic notes it "covers," and since we accept those two pitches as essentially the same note, we would have difficulty hearing a 7.5 Hz step as a separate degree of a scale. More importantly, the overtones of the 41 notes, played in various combinations, would pile up beats far more quickly than those of a 12-note scale, creating an unpleasant sensation.
Just how good 12 is can be seen in how close the notes of a 12-step scale are to the true fifth and fourth: the ratio of a true fifth to its seventh step (1.5/1.4983) is 1.0011, as is the ratio of its fifth step to a true fourth (1.3348/1.3333). The two errors are necessarily identical, because a true fifth and a true fourth multiply to exactly 2, as do the seventh and fifth steps of the scale. Apart from 29, itself far too many notes to be practical, no number of steps below 41 does better. For example, the closest approximation of the fifth with a 13-step scale is 1.532/1.5 = 1.0213, and of the fourth 1.3333/1.3055 = 1.0213, about 20 times the error of a 12-step scale. Looking at the next continued-fraction approximation below 12, 8/5, the closest approximations of the fifth and fourth in a five-step scale are 1.5157/1.5 = 1.0105 and 1.3333/1.3195 = 1.0105, about 10 times the error of a 12-step scale.
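The comparison can be extended to any number of steps m by rounding m times log (base 2) of the target to the nearest whole step. In this Python sketch, `closest_error` is a hypothetical helper that returns the same kind of ratio the text uses (larger over smaller). The list of m values is our choice: it includes 7 and 29, which are not among the continued-fraction denominators above but turn out to give slightly better fifths than 5 and 12 respectively.

```python
import math

FIFTH, FOURTH = 3 / 2, 4 / 3

def closest_error(m, target):
    """Ratio (at least 1) between target and the nearest step 2**(k/m) of an m-step octave."""
    k = round(m * math.log2(target))
    step = 2 ** (k / m)
    return max(step, target) / min(step, target)

for m in (5, 7, 12, 13, 29, 41, 53):
    print(f"{m:>2} steps: fifth {closest_error(m, FIFTH):.5f}, "
          f"fourth {closest_error(m, FOURTH):.5f}")
```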
While 12 is clearly the proper base for superb approximations of the fourth and fifth, the third is also important. Here are the results of an Excel program identifying the successive floors for log (base 2) of 5 with the Euclidean algorithm:
Value expanded (first log (base 2) of 5, then 1/previous remainder), (Floor), Remainder
2.32192809 (2) 0.32192809
3.10628372 (3) 0.10628372
9.40877874 (9) 0.40877874
2.4463112 (2) 0.4463112
2.24058906 (2) 0.24058906
4.15646494 (4) 0.15646494
6.39120816 (6) 0.39120816
2.55618391 (2) 0.55618391
1.79796642 (1) 0.79796642
1.25318556 (1) 0.25318556
3.94967228 (3) 0.94967228
1.05299482 (1) 0.05299482
18.8697673 (18) 0.86976726
1.14973286 (1) 0.14973286
6.67856062 (6) 0.67856062
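Here are the Diophantine approximations of log (base 2) of 5, which the preceding table shows to have an initial floor of 2 followed by floors of 3, 9, 2, 2, 4, 6, 2, and so on. In the continued-fraction notation $[a_0; a_1, \ldots, a_k] = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_k}}}$, the approximations of 2.3219… are:
$[2] = 2$
$[2; 3] = 2 + \frac{1}{3} = \frac{7}{3}$
$[2; 3, 9] = 2 + \cfrac{1}{3 + \cfrac{1}{9}} = \frac{65}{28}$
$[2; 3, 9, 2] = 2 + \cfrac{1}{3 + \cfrac{1}{9 + \cfrac{1}{2}}} = \frac{137}{59}$
$[2; 3, 9, 2, 2] = \frac{339}{146}$
and so on, the next floor being 4.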
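The same two procedures, applied to log (base 2) of 5, reproduce the floors and approximations above. This self-contained Python sketch repeats the illustrative helpers `cf_terms` and `convergents` (our names), again assuming 50-digit decimals to keep rounding error negligible, and ends with the 12-step scale's error for the third.

```python
import math
from decimal import Decimal, getcontext
from fractions import Fraction

def cf_terms(x, n):
    """First n partial quotients of the continued fraction of x (x > 0, irrational)."""
    terms = []
    for _ in range(n):
        a = math.floor(x)
        terms.append(a)
        x = 1 / (x - a)
    return terms

def convergents(terms):
    """Convergents p/q via p_k = a_k*p_(k-1) + p_(k-2), q_k = a_k*q_(k-1) + q_(k-2)."""
    p_prev, p, q_prev, q = 1, terms[0], 0, 1
    result = [Fraction(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result

getcontext().prec = 50
log2_5 = Decimal(5).ln() / Decimal(2).ln()
terms = cf_terms(log2_5, 8)
print(terms)                                    # [2, 3, 9, 2, 2, 4, 6, 2]
print([str(c) for c in convergents(terms[:5])])
# ['2', '7/3', '65/28', '137/59', '339/146']

# The third in a 12-step scale: 2**(4/12) is the same number as 2**(1/3).
print(round(2 ** (4 / 12) / 1.25, 5))          # 1.00794
```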
Here, 3, 28, 59, and 146, in ascending order, are the denominators providing successively closer approximations of the third. Clearly, 28 and 59 are not appropriate, for the same reasons 41 is not: too many notes and problematic overtones. And 12 is precisely as good a step system as 3 for the third, because the step closest to 1.25 in a 3-step scale, the first step, provides exactly the same exponent of 2 (1/3) as the step closest to 1.25 in a 12-step scale, the fourth step (4/12 = 1/3)! The error for the third in a 12-step or 3-step scale, 1.2599/1.25 = 1.00794, is larger than the errors for the fourth and the fifth (1.00113 each), but it is not a significant error. Together the three errors amount to only about .0102 (about .0034 per note), a total that no scale with fewer than 29 steps can beat.
As can be seen in the following table, none of the 17 ratios for the notes of the full diatonic range (the 19 discussed above, less the two octaves) is more than about 1.018 "off" from the nearest step of the 12-step scale, whose steps are 1.0595. This is a variation, as noted, that we can barely perceive, if at all. Only six ratios exceed 1.01, and the average amount by which each of the 17 notes is "off" is 1.0082.
(17 entries, each the ratio of a note to the closest 12-step tone, followed by their total and their average)
1.01709108
1.00462084
1.01022595
1.00226106
1.01479007
1.00793684
1.00450739
1.00115492
1.01829894
1.00566298
1.00112989
1.01593667
1.00911606
1.0033935
1.01365197
1.00224853
1.00679927
Total: 17.138826
Average: 1.00816623
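The measurement behind this table is easy to automate: find the nearest 12-step tone to a just ratio and take the larger over the smaller. The text does not list all 17 ratios, so this Python sketch applies the same measure only to the six non-octave diatonic ratios given earlier (9/8 through 15/8); it illustrates the method rather than reproducing the table, and `off_by` is a hypothetical helper name.

```python
import math
from fractions import Fraction

def off_by(ratio, steps=12):
    """Ratio (at least 1) between a just ratio and the nearest step of an equal scale."""
    k = round(steps * math.log2(ratio))
    tempered = 2 ** (k / steps)
    return max(tempered, ratio) / min(tempered, ratio)

# The six non-octave diatonic ratios from the text (not the full 17-note list).
diatonic = [Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
            Fraction(3, 2), Fraction(5, 3), Fraction(15, 8)]
errors = {str(r): off_by(float(r)) for r in diatonic}
for name, e in errors.items():
    print(f"{name:>5}: {e:.5f}")

average = sum(errors.values()) / len(errors)
print(f"average: {average:.5f}")
```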
More importantly, the two most basic notes added to the scale, the fifth and the fourth, are nearly perfect: each is only 1.00113 "off," which for all practical purposes is not "off" at all, and the next most basic note, the third, is also quite good at 1.00794. The problem was simply that "two eminently reasonable requirements" collided: simple fractions, for consonance, and equal footing, for modulation. The solution was found, by trial and error, in the number 12. However, it could have been found simply by studying the properties of that number.
* * *
Bibliography
J. M. Bowsher, Alexander Wood's The Physics of Music, Chapman and Hall, London, 1975.
Richard Franko Goldman, Harmony in Western Music, W. W. Norton, New York, 1965.
Ian Johnston, Measured Tones: The Interplay of Physics and Music, Adam Hilger, New York, 1989.
Ernst Levy, A Theory of Harmony, SUNY Press, Albany, NY, 1985.
Michael Moravcsik, Musical Sound: An Introduction to the Physics of Music, Paragon House, New York, 1987.
* * *