Beyond MIDI: The Handbook of Musical Codes

Goals of Musical Description

Eleanor Selfridge-Field


1. What Is Music Representation?

Musical codes have been used since man's earliest efforts to transcribe sounds. From the accents for tonal inflection in many of the world's languages, through the neumes representing chant in medieval monasteries, to the solfegge of musical pedagogy in recent centuries, codes for sound have always had the purpose of prescribing consistency of practice. Almost as often, the practice so defined has been a relatively local one.

1.1 Musical Codes in Common Use

    The most comprehensive coded language in regular use for representing sound is the common musical notation (CMN) of the Western world. Western musical notation is a system of symbols that is relatively, but not completely, self-consistent and relatively stable but still, like music itself, evolving. It is an open-ended system that has survived over time partly because of its flexibility and extensibility.

    The adaptability of common notation means that it is not a perfect guide to the reproduction of sound. The apparent continuity of graphic symbols over centuries does not guarantee the same degree of continuity in practice. There is no easy substitute for a knowledge of the history of musical performance in determining how the "common" notation should be converted to sound. Yet common musical notation is the cornerstone of all efforts to preserve a sense of the musical present for other performers and listeners.

1.2 Musical Representations for Computer Applications

    While musical codes are not circumscribed by computer applications, most are selective in what they represent: finger use, fretboard position, relative melodic interval sizes, thematic incipits, and so forth. Because of its power and memory, the computer offers the possibility of representing entire musical works of considerable length with as many attributes as the interested party is willing to encode.

    The lure of total representations has now been pursued for roughly three decades. Yet no one involved with the most competent of systems claims that any piece of music can be represented at the 100% level in all of its conceivable aspects. Every system makes sacrifices somewhere to optimize the clarity of its preferred features.

    Most systems are extensible, but all become cumbersome when they begin to seem like centipedes--with too little core to support a large array of extensions and too few links between extensions to provide an integrated logical foundation for understanding the music as music. Each new addition takes the representation further from the object it attempts to simulate and taxes programming effort as well.

    The establishment of the Musical Instrument Digital Interface (MIDI) gave easy access to tools for musical sound to home users, hobbyists, and millions of ordinary musicians. These users established a new constituency who could experiment with sound control in ways that previously had been possible only in research studios. MIDI is now the most prevalent representation of music, but what it represents is based on hardware control protocols for sound synthesis. Programs that support sound input for graphical output necessarily must span a gamut of representational categories. For applications in music pedagogy, musical analysis, musical style simulation, and music theory, the nature of the representation matters a great deal. Four "black circles" between two "vertical lines" do not have the same semantic meaning as four "quarter notes" between two "bar lines." MIDI key number 64 is always in the first instance interpreted with the note name E, denying it the possibility of being an F-flat or a D-double-sharp.
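A minimal Python sketch of the information loss described above, assuming the widespread convention that middle C (C4) is MIDI key 60 (some manufacturers label it C3). The helper name `midi_key` is hypothetical: three different spellings reduce to the same key number, and nothing in the number 64 records which one was written.

```python
# Semitones above C for each natural pitch letter.
LETTER_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_key(letter: str, alter: int, octave: int) -> int:
    """Return the MIDI key number of a spelled pitch.

    alter is the chromatic inflection in semitones (-1 flat, +1 sharp, +2 double sharp).
    Assumes middle C (C4) = key 60.
    """
    return 12 * (octave + 1) + LETTER_SEMITONES[letter] + alter


# E4, F-flat 4 and D-double-sharp 4 are distinct in notation...
spellings = [("E", 0, 4), ("F", -1, 4), ("D", 2, 4)]
for letter, alter, octave in spellings:
    # ...but all yield the same key number, so the spelling cannot be recovered from it.
    print(letter, alter, octave, "->", midi_key(letter, alter, octave))
```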

1.3 Reusable Codes

    Since encoding music by any system is laborious, an interest naturally arises in driving multiple applications from single data sets. The road to platform- and system-independent music-printing applications has been full of cul-de-sacs, and the road to music-analytical applications is still a dirt one, full of pot-holes. Only sound applications have thrived. MIDI succeeds as a lowest common denominator in the world of musical information. Thus much remains to be done.

2 Parameters of Musical Information

2.1 The Three Contexts: Sound, Graphics, and "Logical" Information

    Musical information may describe music in any of several contexts--the sound or phonological context, the graphical context of notation, the rational context of analytical parameters, and the semantic context of musical perception and understanding. Most codes described in Beyond MIDI are optimized for one of the first three of these domains. Nothing seems to sum up the challenges posed by the sometimes contradictory natures of these contexts so well as some remarks made recently by the noted musicologist Margaret Bent in relation to the "dilemma" of editing early music. Bent wrote:

      In one sense, music exists only in sound, but paradoxically, sound is its least stable element. But also, visual presentation may be an important or essential ingredient, even to the extent of constituting part of the structure or at least of the aesthetic. And there are other senses in which the music exists in dimensions (e.g., numerical) that are not immediately audible.(3)

    A fundamental question that users will want to consider in evaluating different systems is which of these three modes holds the dominant position. The coordination of sound and graphics is native to musicological applications, but representing both in relation to an entity at once more fundamental and more complete--the "logical" or "core" work--is the objective of highest interest. Schemes of representation that fall short of this requirement often lead to the use of a different data set for every application.

2.2 The Principal Attributes of Musical Information

    The attributes of a musical "event" are arbitrary in number. The sound phenomenon, the notated representation, and the underlying "logical core" of a musical work have some shared and some unique attributes. Some common attributes of a note are indicated in Figure 1.1.

    Certain details of each of these attributes differ according to the context in which they are considered. The phenomenon we call pitch exists in sound but not in notation. The notehead position that represents it on the page provides, in combination, three absolute elements of information: (1) it confers a pitch name by vertical staff position; (2) it implies an octave position, which may be inferred from a clef sign; (3) it allows for a chromatic inflection, which may be conferred globally in a key signature or locally by an accidental sharp, flat, or natural sign.
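The three pieces of information carried by a notehead can be sketched as a small Python function. This is an illustration under stated assumptions, not any particular code's method: staff positions are counted in diatonic steps from the bottom line (0 = bottom line, 1 = first space, negative values below the staff), only three clefs are modelled, and a local accidental applies to the single note only (the carry-through of accidentals to the end of the bar is ignored). The names are hypothetical.

```python
LETTERS = "CDEFGAB"

# Letter and octave of the pitch on the bottom line of a five-line staff.
CLEF_BOTTOM_LINE = {"treble": ("E", 4), "alto": ("F", 3), "bass": ("G", 2)}


def notehead_pitch(clef, staff_step, key_signature, accidental=None):
    """Derive (letter, alter, octave) from a notehead's vertical position.

    staff_step: diatonic steps above the bottom line (may be negative).
    key_signature: dict mapping letters to alterations, e.g. {"F": 1} for G major.
    accidental: local sharp/flat/natural as an alteration (0 = natural), or None.
    """
    bottom_letter, bottom_octave = CLEF_BOTTOM_LINE[clef]
    index = LETTERS.index(bottom_letter) + staff_step  # steps above C of bottom_octave
    letter = LETTERS[index % 7]                        # (1) pitch name from staff position
    octave = bottom_octave + index // 7                # (2) octave inferred from the clef
    # (3) chromatic inflection: a local accidental overrides the key signature.
    alter = accidental if accidental is not None else key_signature.get(letter, 0)
    return letter, alter, octave


print(notehead_pitch("treble", 8, {"F": 1}))      # top line, key signature of G major
print(notehead_pitch("treble", 8, {"F": 1}, 0))   # same position with a natural sign
print(notehead_pitch("treble", -2, {}))           # first ledger line below the staff
```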

    Figure 1.1 Some attributes of (or associated with) a single note.

    Duration exists only in sound and only in the context of other sounds, since duration is conventionally represented as a relative value. In the graphical representation of duration, there are four possible components: (1) notehead "color" (open or filled); (2) the presence or absence of a stem; (3) the presence or absence of flags or, for groups of notes, beams; (4) the presence of augmentation dots. All four aspects differentiate longer from shorter notes in a complex hierarchy of values ranging from whole notes (value = 1) to 256th notes (value = 1/256). A fifth variable in the description of duration is notehead shape: square and rectangular noteheads represent note values greater than whole notes. It is not possible to "perform" such visual elements as a flag or a stem or a beam; yet they are essential to the correct rhythmic interpretation of written scores.
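The four graphical components of duration can be combined as in the following sketch, which assumes the standard values (a whole note equals 1) and counts flags and beams together, since each halves the value of a filled, stemmed notehead. Breves and other square or rectangular noteheads are not modelled, and the function name is hypothetical.

```python
from fractions import Fraction


def duration_from_graphics(filled: bool, stem: bool, flags: int, dots: int) -> Fraction:
    """Written value, as a fraction of a whole note, from a note's visual components.

    filled: notehead color (True = filled/black, False = open/white).
    stem: whether a stem is present (distinguishes half from whole notes).
    flags: number of flags or beams (0 for quarter, 1 for eighth, ... 6 for 256th).
    dots: number of augmentation dots.
    """
    if filled:
        base = Fraction(1, 4) / 2 ** flags  # each flag or beam halves the value
    else:
        base = Fraction(1, 2) if stem else Fraction(1)
    # Each dot adds half of the previous addition: 1 dot = x1.5, 2 dots = x1.75, ...
    return base * (2 - Fraction(1, 2 ** dots))


print(duration_from_graphics(filled=False, stem=True, flags=0, dots=1))  # dotted half
print(duration_from_graphics(filled=True, stem=True, flags=6, dots=0))   # 256th note
```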

    In notation, rests represent only duration. Yet in many representation schemes their presence must be recorded as a "pitch" event to retain a pitch "marker" in a stream of events.
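A minimal sketch of why a rest must occupy a slot in an event stream, assuming a hypothetical `Event` record in which a rest is marked by a pitch of `None`: onsets are computed by accumulating durations, so dropping the rest would move every later note earlier.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional


@dataclass
class Event:
    pitch: Optional[int]  # MIDI key number, or None to mark a rest
    duration: Fraction    # written value as a fraction of a whole note


def onsets(events: List[Event]) -> List[Fraction]:
    """Onset of each event, in whole notes from the start of the stream."""
    time = Fraction(0)
    result = []
    for event in events:
        result.append(time)
        time += event.duration  # rests advance time exactly like notes
    return result


stream = [Event(67, Fraction(1, 4)), Event(None, Fraction(1, 4)), Event(64, Fraction(1, 2))]
print(onsets(stream))                                      # in 4/4, the E enters on beat 3
print(onsets([e for e in stream if e.pitch is not None]))  # without the rest it enters too early
```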

    The tempo of a live performance is ultimately established by a user. Dynamic level is relative, since "loud" (f) must be determined not only in relation to "soft" (p) but also in relation to a moderate dynamic level for the complex of performers involved and the dynamics of the room or hall. The "moderate" dynamic level of a 100-piece orchestra in a concert hall is necessarily much greater than that of a solo guitar in a small room.

    Articulation as a class of interpretive features includes staccato (truncated) and legato (smooth) performance. These kinds of articulations affect duration and often have implications for dynamics. The attack times of two consecutive eighths will be the same, whether or not they are staccato. In notation it is the theoretical eighth-note value that is represented; in notational logic the staccato dot represents the subtraction of a variable proportion of the theoretical value to achieve the actual duration. This distinction causes a significant divide between representations that take sound as primary and those that take notation as primary.
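The distinction between onset and sounding length can be sketched as follows. The proportion of the written value that a staccato note keeps is a hypothetical parameter (`staccato_keep`, here one half); as noted above, the actual proportion is variable. Onsets advance by the written value in every case, so only the sounding durations differ.

```python
from fractions import Fraction


def sounding_spans(notes, staccato_keep=Fraction(1, 2)):
    """notes: list of (written_value, is_staccato) pairs, values in whole notes.

    Returns (onset, sounding_duration) pairs. staccato_keep is an assumed
    proportion of the written value that is actually sounded.
    """
    time = Fraction(0)
    spans = []
    for written, staccato in notes:
        sounding = written * staccato_keep if staccato else written
        spans.append((time, sounding))
        time += written  # the next attack depends only on the written value
    return spans


eighth = Fraction(1, 8)
legato = sounding_spans([(eighth, False), (eighth, False)])
detached = sounding_spans([(eighth, True), (eighth, True)])
print(legato)
print(detached)
```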

    In music for more than one performer the part as a component of the whole is also a cause for considerable attention. In relation to performance it is sometimes the timbral definition that is most relevant. Much of the work of electronic music research has been concerned with simulating "natural" sounding timbres. Timbre is largely irrelevant to the notation of music. For this reason, notation-based representations may lack an essential piece of information required in sound realizations.

    Many conventions of Western notation require that one graphic element serve more than one purpose. Thus the stem that is involved in representing note duration can also be relevant, through its orientation, to the differentiation of two parts when notated on one staff. A study of the representation of musical notation induces a profound respect for its ingeniousness. Graphical features have been used very cleverly to convey aspects of sound that must be accommodated in the realization of a single event.

2.3 Implicit Information

    Many elements of musical notation are contextually determined, and a fundamental question in the representation of notation is how to prioritize contextual or implicit information in relation to absolute or explicit information.

    Musical notation is highly dependent on oral tradition for its actual interpretation. The value attached to musical performance in Western culture is bound up with the differing suppositions that musicians make in the interpretation of a written score, and "schools" of performance that descend from one famous teacher abound in the annals of concert life. More significant from the point of view of representing music is the fact that some widely accepted conventions of notation require that the underlying logic of the performance contradict the logic of the written score. Music of the Baroque era (1600-1750) abounds in examples.

2.4 Issues of Processing Order

    Software used to process musical information must make certain assumptions about the ordering of elements. The musical score is easily viewed as a two-dimensional array, but there is no natural processing order among its dimensions. Figure 1.2 represents a hypothetical score. Individual parts are represented on the horizontal axis. The simultaneous activities of all of these would be represented on the vertical plane.

    If the score involves many performers, let us say a conventional orchestra, then each system, representing the total complex, is likely to be subdivided into groups of similar timbre--strings, winds, brasses, and percussion instruments. Our model score contains only two subgroups--the top one of four voices or instruments and an underlying one.

    Figure 1.2 A hypothetical score viewed as a two-dimensional array.
    
              Subsystem 1    Part 1 |  BAR1   |  BAR2  |  BAR3  |
                             Part 2 |  BAR1   |  BAR2  |  BAR3  |
    System 1                 Part 3 |  BAR1   |  BAR2  |  BAR3  |
                             Part 4 |  BAR1   |  BAR2  |  BAR3  |
    
              Subsystem 2    Part 5 |  BAR1   |  BAR2  |  BAR3  |
    =================================================================
    System 2  Part 1  |  BAR4  |  BAR5  |  BAR6  |  BAR7  |  BAR8  |
              Part 2  |  BAR4  |  BAR5  |  BAR6  |  BAR7  |  BAR8  |
              Part 3  |  BAR4  |  BAR5  |  BAR6  |  BAR7  |  BAR8  |
              Part 4  |  BAR4  |  BAR5  |  BAR6  |  BAR7  |  BAR8  |
    
              Part 5  |  BAR4  |  BAR5  |  BAR6  |  BAR7  |  BAR8  |
    =================================================================
    System 3  Part 1  |  BAR9  |  BAR10 |  BAR11 |  BAR12 |  BAR13 |
              Part 2  |  BAR9  |  BAR10 |  BAR11 |  BAR12 |  BAR13 |
              Part 3  |  BAR9  |  BAR10 |  BAR11 |  BAR12 |  BAR13 |
              Part 4  |  BAR9  |  BAR10 |  BAR11 |  BAR12 |  BAR13 |
    
              Part 5  |  BAR9  |  BAR10 |  BAR11 |  BAR12 |  BAR13 |
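The absence of a natural processing order can be made concrete with a toy score held as a list of parts, each a list of bars. The two traversals below are hypothetical helpers: part-major order suits part extraction or voice-by-voice analysis, while bar-major order suits harmonic analysis and playback. A real score would also need the system and subsystem groupings shown in Figure 1.2, which are omitted here.

```python
def make_score(part_names, bar_count):
    """score[p][b] stands for the contents of bar b + 1 in part p."""
    return [[f"{name}:bar{b + 1}" for b in range(bar_count)] for name in part_names]


def part_major(score):
    """Read each part from beginning to end before moving to the next part."""
    return [cell for part in score for cell in part]


def bar_major(score):
    """Read every part of bar 1, then every part of bar 2, and so on."""
    bar_count = len(score[0])
    return [part[b] for b in range(bar_count) for part in score]


score = make_score(["Part 1", "Part 2", "Part 3"], 3)
print(part_major(score)[:4])
print(bar_major(score)[:4])
```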
    

    Within the two-dimensional array there may be various splittings and joinings. If the underlying part in the hypothetical example were a piano accompaniment, it would be written on the grand staff, on which treble and bass staves are combined. If the highest part were for first violins in a concerto grosso, solo and tutti cues would indicate where only the soloists would play and where all the first violinists would play together--from one part. Part reinforcement complicates sound applications as well as graphic applications, for timbral specification only confers sound quality. It does not give any sense of the relative dynamic levels of parts within a whole. In orchestral music, passages in which parts of different timbres play the same notes (e.g., violin and oboe) may be combined on one staff but they will be split into separate parts (divisi) when the content is no longer duplicated. Long passages of repeated notes on one pitch may be indicated by tremolandi signs (diagonal strokes).
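The space-saving tremolo abbreviation mentioned last can be expanded mechanically, at least in the simplest case. This sketch assumes a single-note measured tremolo on an unflagged note, where one stroke stands for eighth notes, two for sixteenths, and so on; unmeasured tremolos, two-note tremolos, and strokes on flagged notes are not handled, and the function name is hypothetical.

```python
from fractions import Fraction


def expand_measured_tremolo(pitch, written, strokes):
    """Return the (pitch, value) repetitions that a stroked note abbreviates.

    written: written value of the note in whole notes (e.g. Fraction(1, 2) for a half note).
    strokes: number of diagonal strokes through the stem (1 = eighths, 2 = sixteenths, ...).
    """
    unit = Fraction(1, 2 ** (strokes + 2))  # 1 stroke -> 1/8, 2 -> 1/16, 3 -> 1/32
    count = written / unit
    if count.denominator != 1:
        raise ValueError("written value is not a whole number of repetitions")
    return [(pitch, unit)] * int(count)


print(expand_measured_tremolo(60, Fraction(1, 2), 1))  # half note with one stroke
```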

2.5 Feature Selection and Definition

    Assumptions concerning end use determine what elements of information are to be considered essential. Notation programs must produce stems, beams, and slurs, but sound applications can function perfectly well without these elements of visual grammar. The number of visual objects encoded in one system may greatly exceed the number encoded in another. These issues have been somewhat obscured by the popularity of input systems based on sound capture. Yet a great many features that appear on the printed page are non-sounding.

2.6 Problems Rooted in the Nature of Notation

    Musical notation is not logically self-consistent. Its visual grammar is open-ended--far more so than any numerical system or formal grammar. The musical notation for a relatively complex work provides reasonably consistent results from one group of performers to another only because a vast amount of oral tradition stands behind it.

2.7 Issues of Sequence Specification

    Musical structures that are easily parsed usually include sections of material that are repeated. Varying sequences of repetition differentiate one musical form from another. Economy in music printing may substitute verbal or symbolic instructions for visual reiteration.

    Let us take a predictable case. There is common agreement about the performing order (and therefore the processing order) of the double movement type of the minuet and trio. It is represented in Table 1.1.

    The regularity of this procedure and the number of examples are both so great and so familiar that it is very tempting to assume that musical works could easily be encoded to assure an appropriate result in electronic performance.(8) It happens, however, that the number of ways in which this procedure is expressed by score markup is very large.

    Table 1.1 Differences of sectional sequence in the ordering of a score vs. the ordering of a performance of a "minuet and trio" movement.

    
    No.  Sequence of items in score    No.  Sequence of items in performance
    ========================================================================
    1    Minuet section (A), Part a    1    Minuet section (A), Part a
                                       2    Minuet section (A), Part a
    2    Minuet section (A), Part b    3    Minuet section (A), Part b
                                       4    Minuet section (A), Part b
    3    Trio section (B), Part a      5    Trio section (B), Part a
                                       6    Trio section (B), Part a
    4    Trio section (B), Part b      7    Trio section (B), Part b
                                       8    Trio section (B), Part b
                                       9    Minuet section (A), Part a
                                      10    Minuet section (A), Part b
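The unfolding shown in Table 1.1 can be expressed in a few lines once the repeat structure has been identified. This is a sketch under the assumptions of the table: every section is repeated on the first pass and the reprise of the minuet is played without repeats. The hard part in practice, recognizing which of the many printed conventions is in use, is not addressed, and the function name is hypothetical.

```python
def performance_order(score_sections, reprise):
    """score_sections: (label, repeated) pairs in score order.
    reprise: labels played once more at the end (da capo without repeats).
    """
    order = []
    for label, repeated in score_sections:
        order.append(label)
        if repeated:
            order.append(label)  # the repeat sign doubles the section
    order.extend(reprise)
    return order


score = [("Minuet (A), Part a", True), ("Minuet (A), Part b", True),
         ("Trio (B), Part a", True), ("Trio (B), Part b", True)]
performance = performance_order(score, reprise=["Minuet (A), Part a", "Minuet (A), Part b"])
for number, section in enumerate(performance, start=1):
    print(number, section)
```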
    
    

    A few years ago we conducted a survey of methods used to indicate the repeats found in such movements. We stopped counting after compiling a list of 25 different conventions.

    Many cases are less predictable. In Baroque suites, for example, there are procedurally similar designs compounded by the execution of additional movements before the reprise of the original minuet or gavotte or bourrée movement. Some other complicating factors were these:

    • Repeats do not always start with the original first bar.

    • Last-beat and first-beat rhythms do not always make up a full bar's worth of beats.

    • Multiple endings for movement sections that are sometimes but not always repeated require explicit instructions: for example, "select the first ending on the first of two executions, but the second ending when the section is executed only once."

    • In practice automatic sequencing will sometimes fail to produce the correct result for lack of recognizable cues.
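One reading of the ending rule quoted in the list above can be sketched as follows: on every pass except the last the first ending is taken, and the final pass, which on a reprise may also be the only pass, takes the second ending. The bar labels and function names are hypothetical, and other scores may call for different rules.

```python
def ending_for(pass_number, total_passes):
    """First ending on every pass but the last; the last (or only) pass takes the second."""
    return 1 if pass_number < total_passes else 2


def unfold_section(body, first_ending, second_ending, total_passes):
    """Expand a section with two endings into the bars actually played."""
    bars = []
    for n in range(1, total_passes + 1):
        bars.extend(body)
        bars.extend(first_ending if ending_for(n, total_passes) == 1 else second_ending)
    return bars


body, first, second = ["1", "2", "3"], ["4a"], ["4b"]
print(unfold_section(body, first, second, total_passes=2))  # played with its repeat
print(unfold_section(body, first, second, total_passes=1))  # e.g. on the da capo
```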

2.8 Features with Context-Dependent Interpretations

    Inevitably some elements of sound are more easily represented than others. The studies of performance practice that have occurred over recent decades have demonstrated over and over that some elements of notation that have remained fixed over long periods of time have nonetheless varied in their interpretation from century to century. Historical context may therefore impose on performers the need to modify the sound, or, in the case of electronic applications, the sound output.

    Tuning systems have varied with time and place, such that the A = 440 of modern times can be reinterpreted as A = 415 in historical performances of string music or A = 460 in brass music of Gabrieli's time or A = 392 in French music for the early oboe. Within the octave, however reconciled in hertz,(9) the tuning of the individual notes of the scale has also varied substantially over time. Timbres of specific instruments have changed subtly as the materials and mechanics of instrument manufacture have changed. The accommodation of these kinds of variables has been explored in many research environments, but easily available software that supports historical adaptation is not the norm. Synthesis hardware may provide some accommodation of tuning systems, however.(10)
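The effect of a different reference pitch can be quantified with the standard equal-temperament formula, in which each semitone is a frequency ratio of 2^(1/12) and 1200 cents make an octave. This sketch covers equal temperament only; historical temperaments such as meantone would need a table of per-note deviations, which is not modelled here, and the function names are hypothetical. It shows, for example, that A = 415 lies almost exactly one equal-tempered semitone below A = 440, and A = 392 almost exactly two.

```python
import math


def equal_tempered_freq(midi_key: int, a4: float = 440.0) -> float:
    """Frequency in hertz of a MIDI key in 12-tone equal temperament, given the pitch of A4 (key 69)."""
    return a4 * 2 ** ((midi_key - 69) / 12)


def cents(f_from: float, f_to: float) -> float:
    """Size of the interval from f_from to f_to in cents (100 per equal-tempered semitone)."""
    return 1200 * math.log2(f_to / f_from)


for a4 in (392.0, 415.0, 440.0, 460.0):
    middle_c = equal_tempered_freq(60, a4)
    print(f"A = {a4:g} Hz: middle C = {middle_c:.2f} Hz, {cents(440.0, a4):+.1f} cents from A = 440")
```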

    Also related to pitch representation are the varying practices employed by transposing instruments as they have evolved over the past three hundred years. Sound-oriented programs will invariably represent sounding pitch, while notation-oriented programs will represent written pitch. Thus in a work for modern trumpet and orchestra in C Major, the part for B-flat trumpet will be written in D Major. Here, and in numerous other instances, someone wishing to analyze music would prefer the sound representation, but the notational information is also useful documentation. In centuries past, most transposing instruments known today in only one size (and tuning) were known in multiple sizes (and tunings).
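Written and sounding pitch for a transposing instrument can be related by transposing spelled pitches, which keeps the letter names correct (shifting a MIDI number by two semitones would give the right keys but lose the spelling). A minimal sketch, assuming pitches as (letter, alteration, octave) triples and intervals as (diatonic steps, semitones); the names are hypothetical. A B-flat trumpet sounds a major second below its written pitch, so its part is written a major second above the sounding pitch.

```python
LETTERS = "CDEFGAB"
LETTER_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_SECOND = (1, 2)  # one letter step, two semitones


def semitones_above_c0(letter, alter, octave):
    return 12 * octave + LETTER_SEMITONES[letter] + alter


def transpose(pitch, interval):
    """Transpose a spelled pitch by (diatonic steps, semitones), respelling as needed."""
    letter, alter, octave = pitch
    steps, semitones = interval
    index = LETTERS.index(letter) + steps
    new_letter, new_octave = LETTERS[index % 7], octave + index // 7
    target = semitones_above_c0(letter, alter, octave) + semitones
    new_alter = target - semitones_above_c0(new_letter, 0, new_octave)
    return new_letter, new_alter, new_octave


def written_for_bflat_trumpet(sounding):
    """The B-flat trumpet part is written a major second above the sounding pitch."""
    return transpose(sounding, MAJOR_SECOND)


sounding_scale = [(letter, 0, 4) for letter in "CDEFGAB"]     # C major as heard
print([written_for_bflat_trumpet(p) for p in sounding_scale])  # D major as written
```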

    The most prevalent area of interpretive discrepancy is that of rhythm. Western notation is better suited to express binary than ternary subdivisions of the beat. When two eighth notes take the time of one quarter note they need only be presented to be correctly interpreted. When three eighth notes take the time of one quarter note, they are normally accompanied by the numeral "3" to indicate this less common grouping. There is no particular difficulty in representing these occurrences in either sound or notation data.
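The arithmetic behind the numeral "3" is simple to state: in a tuplet where `actual` notes take the time of `normal` notes of the same written value, each note sounds for written * normal / actual. A minimal sketch (the function name is hypothetical):

```python
from fractions import Fraction


def tuplet_value(written: Fraction, actual: int, normal: int) -> Fraction:
    """Sounding value of one tuplet note: `actual` notes in the time of `normal`."""
    return written * Fraction(normal, actual)


eighth = Fraction(1, 8)
plain_pair = [eighth, eighth]               # two eighths per quarter: no marking needed
triplet = [tuplet_value(eighth, 3, 2)] * 3  # marked "3": three eighths in the time of two
print(sum(plain_pair), sum(triplet), triplet[0])
```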

    Difficulties arise when the examples become more complex. For example, the Baroque notational convention of writing a dotted eighth followed by a sixteenth (an implied quadruple subdivision of the beat) to imply a triple subdivision(11) prevents one representation from serving the dual purposes of sound and notation. Well-trained performers will recognize the passages requiring reinterpretation, but the theoretical information that would need to be maintained to facilitate appropriate electronic performance is dauntingly great. A similar situation arises with notes inégales--those passages with long series of single-dotted notes that, it is generally believed, were performed as if they were double-dotted. Other contradictions between sight and intended sound arise in the realization of arpeggiated chords, the style brisé, grace notes, ornaments, basso continuo figuration, and cadenzas.
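The two reinterpretations just described can be written as transformations of a written long-short pair that keep the total length of the beat fixed. This is a sketch only: the function names are hypothetical, and deciding whether a given passage calls for either reading is the historical judgment described above, which no code here attempts. The written data is unchanged in both cases; the performed values must be stored separately, which is why one representation cannot serve both purposes.

```python
from fractions import Fraction

# Written: dotted eighth + sixteenth, an implied quadruple subdivision of a quarter-note beat.
written = [Fraction(3, 16), Fraction(1, 16)]


def assimilate_to_triplet(pair):
    """Perform a long-short pair as a 2:1 (triplet) division of the same beat."""
    beat = sum(pair)
    return [beat * Fraction(2, 3), beat * Fraction(1, 3)]


def overdot(pair):
    """Perform a single-dotted pair as if double-dotted (3:1 becomes 7:1)."""
    beat = sum(pair)
    return [beat * Fraction(7, 8), beat * Fraction(1, 8)]


print(written, assimilate_to_triplet(written), overdot(written))
```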

    The interpretation of grace notes, which occur prolifically in Western music, is also dependent on historical context. In the eighteenth century(12) grace notes were normally executed on the beat, stealing time from succeeding notes. In the nineteenth century(13) the interpretation became the reverse: the succeeding note took its full value. Time was theoretically borrowed instead from the preceding note. In both cases the written notation was the same, and in both cases the beat count was complete without the grace notes.

    The implementation of grace notes therefore diverges according to the intended application. In many notation-oriented systems of representation, grace notes have a time value of zero. In sound applications, however, grace notes must have a positive time value in order to sound. The eighteenth-century model is easy to implement, but the nineteenth-century model is very difficult to implement in sound applications: real time already spent cannot be recaptured!
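The two historical models can be compared in a short sketch. It assumes a single grace note of a fixed, hypothetical length attached to one main note, with durations in beats. In the on-beat model the grace note takes its time from the note it ornaments; in the before-beat model it takes time from the preceding note, which must therefore be shortened before the grace note is even reached. That is why a real-time renderer needs lookahead for the nineteenth-century model.

```python
from fractions import Fraction


def realize_grace(main_notes, k, grace_pitch, grace_len, model):
    """Return (onset, pitch, duration) events for main notes plus one grace note.

    main_notes: (pitch, written_duration) pairs in beats; the grace note precedes note k.
    model: "on_beat" (time taken from note k) or "before_beat" (time taken from note k - 1).
    """
    if model not in ("on_beat", "before_beat"):
        raise ValueError("unknown model")
    if model == "before_beat" and k == 0:
        raise ValueError("no preceding note to borrow time from")
    events, onset = [], Fraction(0)
    for i, (pitch, written) in enumerate(main_notes):
        start, length = onset, written
        if model == "before_beat" and i == k - 1:
            length -= grace_len  # decided before note k is reached: needs lookahead
        if i == k:
            if model == "on_beat":
                events.append((start, grace_pitch, grace_len))
                start, length = start + grace_len, length - grace_len
            else:
                events.append((start - grace_len, grace_pitch, grace_len))
        events.append((start, pitch, length))
        onset += written  # the written beat count is unaffected by the grace note
    return events


notes = [(60, Fraction(1)), (62, Fraction(1))]
for model in ("on_beat", "before_beat"):
    print(model, realize_grace(notes, 1, 64, Fraction(1, 4), model))
```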

    The implementation of arpeggiated chords, the arpeggiated style brisé,(14) and the realization of ornaments all belong to the same category of difficulties as the grace-note problem because all three are suggestive notations that necessarily take time values in actual performance that are different from those expressed in writing. All the notes of an arpeggiated chord are shown with identical rhythmic values, but each has a different attack time. The unmeasured notation of the style brisé usually aimed to produce something akin to an arpeggio, but the tones included could be scalar as well as chordal and the composite line was represented on the diagonal or horizontal rather than on the vertical plane of the arpeggio.(15)
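The divergence between written and performed timing for an arpeggiated chord can be sketched as follows. The spread between successive attacks is a hypothetical parameter (in practice a matter of the performer's discretion), and all tones are assumed to release together at the end of the written value.

```python
from fractions import Fraction


def arpeggiate(pitches, onset, written, spread, upward=True):
    """Staggered (onset, pitch, duration) events for a chord written with one value.

    spread: assumed delay between successive attacks, in whole notes.
    All tones are held to the same release point (onset + written).
    """
    release = onset + written
    ordered = sorted(pitches, reverse=not upward)
    events = []
    for i, pitch in enumerate(ordered):
        attack = onset + i * spread
        events.append((attack, pitch, release - attack))
    return events


chord = [60, 48, 55, 52]  # a C major chord written as a single quarter-note chord
for event in arpeggiate(chord, Fraction(0), Fraction(1, 4), Fraction(1, 32)):
    print(event)
```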

    The interpretation of ornaments is a heady subject on which consensus goes only so far. There is general agreement as to which tones relative to the principal note are involved in the execution of such common ornaments as trills, mordents, and turns. The amount of time each of these notes takes and how it is derived from the reduction in values of surrounding notes is subject to some of the same considerations as the interpretation of grace notes. In the domain of real-time execution, there is greater scope for rhythmic latitude in the interpretation of many ornaments. There is also widespread criticism of ornaments that sound "too mechanical." This has led some music systems developers to devise algorithms to subtly vary the sounding durations of logically equal values.
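A sketch of both halves of the problem, under stated assumptions: a trill is realized as equal alternations of principal and upper note filling the written value (starting on the upper note, a common eighteenth-century practice, is a parameter here), and a separate, hypothetical `humanize` step varies those equal durations slightly with a seeded random generator while rescaling so that their total still equals the written value. It illustrates the general idea only and is not the algorithm of any particular system.

```python
import random
from fractions import Fraction


def realize_trill(principal, upper, written, unit, start_on_upper=True):
    """Alternate two pitches in equal units that exactly fill the written value."""
    count = written / unit
    if count.denominator != 1:
        raise ValueError("written value is not a whole number of units")
    first, second = (upper, principal) if start_on_upper else (principal, upper)
    return [(first if i % 2 == 0 else second, unit) for i in range(int(count))]


def humanize(durations, amount, seed=0):
    """Vary each duration by up to +/- amount (a proportion), keeping the total unchanged."""
    rng = random.Random(seed)
    varied = [float(d) * (1 + rng.uniform(-amount, amount)) for d in durations]
    scale = float(sum(durations)) / sum(varied)
    return [v * scale for v in varied]


trill = realize_trill(74, 76, Fraction(1, 4), Fraction(1, 32))  # trill on D5 for a quarter note
print([pitch for pitch, _ in trill])
print([round(d, 4) for d in humanize([d for _, d in trill], amount=0.1)])
```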

    Inevitably there are many tradeoffs between representation and software. Some early notation programs required the encoding of stem directions, but today most stemming and beaming is done automatically by the software. Does this mean, however, that stem and beam information can be safely forgotten? Not necessarily, for stem information may denote the differentiation of rhythmically independent voices, and beam information may suggest intended phrasing or breathing or the delineation of voices in a polyphonic context.
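Why stem direction can carry information that automatic stemming would discard can be seen in a small sketch. It assumes the common single-voice rule (notes below the middle line of a five-line staff stem up, notes on or above it stem down) and the common two-voice convention (upper voice up, lower voice down); the function names are hypothetical. When two voices share a staff, the stored direction reflects the voice rather than the note's position, so regenerating stems from position alone would lose the distinction.

```python
MIDDLE_LINE = 4  # staff steps counted from the bottom line (0) of a five-line staff


def auto_stem(staff_step: int) -> str:
    """Single-voice rule: below the middle line stem up, on or above it stem down."""
    return "up" if staff_step < MIDDLE_LINE else "down"


def stem_for_voice(staff_step: int, voice=None) -> str:
    """With two voices on one staff, voice 1 stems up and voice 2 stems down."""
    if voice is None:
        return auto_stem(staff_step)
    return "up" if voice == 1 else "down"


# An upper-voice note high on the staff, and a lower-voice note low on it:
print(auto_stem(7), stem_for_voice(7, voice=1))  # position says "down", voice says "up"
print(auto_stem(1), stem_for_voice(1, voice=2))  # position says "up", voice says "down"
```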

    Short cadenzas--the passages of virtuoso display that appear just before the final cadence of sonatas and concertos--are performed at the discretion of the soloist. Maximum latitude is expressed by rhythmic values that are merely suggestive but, as written, consume many more beats than can be accommodated within the one theoretical bar in which they are shown. This presents a very great problem in the coordination of written and sounded information. Let us suppose that a cadenza containing 14 written beats is designated to occur within the time otherwise taken by one 4-beat bar. Notation software can reproduce the written bar faithfully, but an electronic performance of the written values will necessarily require the time normally taken by 3.5 bars. Thus ensuing bar numbers will diverge between the two data sets and the metrical stress (which admittedly exists only in the mind of the listener) will also be shifted.
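The arithmetic of the example above can be sketched as a mapping from score bar numbers to positions in an electronic performance, assuming a hypothetical cadenza written within bar 100 of a 4/4 movement and performed at its full written length of 14 beats. Everything after the cadenza is displaced by 10 beats, that is 2.5 bars, so the downbeat of score bar 101 falls in the middle of a performance bar. The function name is hypothetical.

```python
from fractions import Fraction


def performance_bar(score_bar, cadenza_bar, cadenza_beats, beats_per_bar=4):
    """Performance position (as a bar number) at which a score bar begins.

    A non-integer result means the score's downbeat lands mid-bar in performance.
    """
    if score_bar <= cadenza_bar:
        return Fraction(score_bar)
    extra_beats = cadenza_beats - beats_per_bar  # beats beyond the one notated bar
    return score_bar + Fraction(extra_beats, beats_per_bar)


for bar in (100, 101, 110):
    print(bar, "->", performance_bar(bar, cadenza_bar=100, cadenza_beats=14))
```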

    File organization is predominantly determined by the presumed context of use. Many notation codes and one of the proposed interchange schemes we include are page-specific, while sound and analysis codes tend to be organized by voice and movement--that is by musical rather than physical units. Programmers like to claim that conversions from a page-dominant to a part-dominant format are easy to make, but successful examples do not abound. Sound files altogether lack the spacing information required for page layout; page files contain a great deal of information that is irrelevant to sound output. These are some of the principal issues illustrated by the encoded examples shown in the chapters that follow.

3 Purposes of Beyond MIDI: The Handbook of Musical Codes

    This book has two main aims. One is to introduce the general subject of music representation, showing how intended applications influence the kinds of information that are encoded. The second aim is to present a broad range of representation schemes, illustrating a wide variety of approaches to music representation. We have grouped these approaches, somewhat arbitrarily, into ten categories and have provided at least two sample schemes within each category. In some categories dozens more might have been provided.

    In order to facilitate comparison between representation schemes, each approach is illustrated using a common set of musical examples. Since the codes themselves are generally so different in what they aim to represent, it is almost impossible to compare them when unrelated works are shown by each author. Yet this is the normal state of available documentation.

    In collecting such diverse codes within a single volume, we hope to encourage informed discussion about the interchange of data in diverse formats. In general, data interchange is more practical within single-application domains, such as printing, than between application domains, such as sound and printing, or printing and analysis.(16) Ultimately, however, we leave the questions of feasibility to our readers. Our overriding goal is simply to provide materials that may be drawn upon in forming such judgments.

    To accomplish these aims, we have attempted to organize each chapter according to a common outline. On account of the highly diverse nature of the codes, their content (ASCII, binary, hexadecimal, etc.), and their file organization, this has not always been fully possible.

    The main topics, where relevant, are a general description of the history and intended purposes of the code, a description of the representation of the primary attributes of music (pitch, duration, accentuation, ornamentation, dynamics, and timbre), a description of the file organization, some mention of existing data in this format, reference to resources for further information, and at least one encoded example.


    Footnotes

    1. Among the hymnographers, the system used by James Love (Scottish Church Music, Edinburgh, 1889) is noteworthy for its quite effective extensions to solfegge. Early efforts to describe musical contour go back at least to the work of Frances Densmore (1918) on the music of the Sioux. Among theoretical writings, the encoding scheme for rhythmic patterns developed by Moritz Hauptmann in his study of harmony and meter (1853) is especially noteworthy. The responsibility for recognizing musical paraphrase has fallen largely to music bibliographers, whose work is discussed in later portions of this publication.

    2. See especially Alexander Rossignol's "apparatus for tracing music" (1872) as described in A Dictionary of Music and Musicians, ed. Sir George Grove (London: Macmillan, 1889), IV, 769f.

    3. Margaret Bent, "Editing Early Music: The Dilemma of Translation," Early Music XXII/3 (1994), 392.

    4. Notehead shapes can also be used to differentiate percussion instruments and other variables that are independent of pitch.

    5. That is, verbal instructions without other notational expression.

    6. At the present time few sequencer programs recognize the need to "double" unison parts in the electronic performance of orchestral scores.

    7. For example, in the Finale of Beethoven's Fifth Symphony the use of tremolandi reduced the page count in one printing experiment from 80 to 50.

    8. The sections numbered 2, 4, 6, and 8 are repetitions of the sections that immediately precede them. Sections 9 and 10 are reiterations of sections 1 and 3. Such repetitions are exact in theory, but in practice multiple endings may differentiate one iteration from another.

    9. i.e., cycles per second.

    10. The Roland electronic harpsichord models C-20 and C-50 that offered the choice of five historical temperaments (equal, mean-tone, just, "Kirnberger," and "Werckmeister 3") have been withdrawn from general circulation.

    11. The dotted eighth note takes two thirds of the time of the quarter note; the sixteenth note takes the remaining one third.

    12. For example in the music of Haydn and Mozart.

    13. For example in the music of Chopin and Mendelssohn.

    14. In which non-coincident decay times are often shown in notation. However, this visual representation may only attempt to express what is true of arpeggios in general--that the sooner the note is struck, the sooner it will decay.

    15. One representation scheme that has dealt successfully with this problem of "diagonal texture" is that of the musicologist Etienne Darbellay. It is mentioned in the glossary under the software name Wolfgang.

    16. Although programmers and statisticians would assume the output of analysis programs to be fundamentally numerical, there is no defined set of features constituting an analytical input domain. Among the most popular kinds of analysis currently practiced by music theorists and analysts, Schenkerian analysis is a reductionist system dependent on notation with output expressed graphically. Set theory is reductionist in its substitution of relative pitch classes for absolute octave numbers and exact pitch names (making it well suited to MIDI input), but its focus is primarily on transformations of order and its expression is typically in numerical series or tables. Types of analysis that require enharmonic pitch information, or are concerned with aspects of rhythm, or which concentrate on the coordinated study of features of pitch and/or rhythm and/or harmony and/or form, or which differentiate one style of composition from another for the purpose of generating new works have distinctly different, and widely varied, needs. No adequate discussion of the issues that arise can be presented here. The analysis domain (if one exists) must be understood to be a generalized one which in particular instances may overlap with any of the others.