What's in a MIDI File?

2020-10-20

MIDI is a standard for communicating data about musical events, such as notes played on a keyboard. Standard MIDI files are files that store performance information that can be played back by a DAW, edited, or shared with other users.

In this article we will inspect and interpret the binary data inside a standard MIDI file, then practice writing our own MIDI file from scratch, without any libraries, using the Rust programming language.

MIDI file created in REAPER

Here is a MIDI file I created using REAPER. It contains four notes. This file is stored in binary, not a plain-text format, so we will need a special tool, a hex editor or hex dump utility, to inspect the contents of the file. xxd is a widely available hex dump tool that may already be installed on your system. Here's the output from inspecting the MIDI file from the command line with xxd midi-test.mid:


00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54  MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74  rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307  est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090  . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000  $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080  ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00                           4.../.

This output is called a hexdump. The middle section shows the content of our binary file in hexadecimal format, with 16 bytes of data per line. The left section shows the offset of the first byte on each line, also in hexadecimal, and on the right we see the ASCII values for any bytes in the file that can be read as ASCII characters. Some of these are actually ASCII strings, in our case, and some just happen to be interpretable as ASCII. For convenience, I've saved the hexdump to a text file with xxd midi-test.mid > midi-dump.txt.

Our initial goal will be to decode this nonsense, and there are two documents that are essential for this.

The main one is The Complete MIDI 1.0 Detailed Specification, available here.
This document is only available as a PDF download, and the MIDI Association asks that you register with your email address to download the document. This won't be necessary to follow along, since I'll cover the relevant concepts, but you'll want to grab it before you go on your own forays into the world of algorithmic MIDI composition. Everything we need from this 300+ page document will be in the Standard MIDI Files section, which is only 14 pages long.

Next, we need the Summary of MIDI Messages reference, available online here. This resource tells us all we need to know about MIDI messages, and pertains only to these types of messages and not other types of messages such as meta-events, as we will see.

Depending on what we're looking at in our MIDI file some data will be best understood by viewing either its binary, hexadecimal, or decimal representation, so we'll need to be very fluid with converting between any of these formats. I keep a Python REPL open to help with these conversions. Python includes handy functions for making these conversions, without needing to import anything.

# Binary, decimal, and hex representations for a MIDI Note On event

bin(0x90)         # '0b10010000'
int(0b10010000)   #  144
hex(144)          # '0x90'

Now that we've got the docs and the tools that we need to decode this file, let's get started!

Part 1: Decoding the MIDI File

The ASCII section of our hexdump gives us just a very broad outline of what's in the file. The file begins with the four characters MThd, and includes two identical 4-character strings, MTrk. We also have the string midi_test, which is the name I gave to the file when I created it. Everything else appears to be trash and it is; the ASCII representations of those bytes aren't useful for decoding their meaning.

The Standard MIDI Files 1.0 reference (we'll call it SMF from now on) specifies that every MIDI file is made of chunks of binary data. It starts with a header chunk, followed by at least one track chunk. The first four bytes of a chunk are the chunk type in ASCII: MThd for a header chunk, and MTrk for a track chunk. We can tell by looking at the ASCII section of our hexdump that our MIDI file starts with a header chunk, like it's supposed to, and includes two track chunks.
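
The chunk layout can also be checked programmatically. Here is a minimal sketch, assuming the whole file has already been read into a byte slice. It relies on the fact that the chunk type is followed by a 32-bit big-endian length of the chunk data. The function name list_chunks is my own, not something from the SMF reference.

```rust
/// Walk a Standard MIDI File buffer and list each chunk as (type, data length).
/// Returns None if a chunk header or body runs past the end of the buffer.
fn list_chunks(bytes: &[u8]) -> Option<Vec<(String, u32)>> {
    let mut chunks = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        // Each chunk starts with a 4-byte ASCII type and a 4-byte big-endian length.
        let header = bytes.get(pos..pos + 8)?;
        let kind = String::from_utf8_lossy(&header[..4]).into_owned();
        let length = u32::from_be_bytes([header[4], header[5], header[6], header[7]]);
        chunks.push((kind, length));
        // Skip over the chunk data to land on the next chunk header.
        pos += 8 + length as usize;
        if pos > bytes.len() {
            return None;
        }
    }
    Some(chunks)
}
```

For the 102-byte file in the hexdump above, this should report an MThd chunk with 6 bytes of data, followed by two MTrk chunks with 32 and 40 bytes of data.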

The Header Chunk

Alright, let's zoom in on that header chunk and see what's going on.

00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54  MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74  rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307  est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090  . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000  $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080  ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00                           4.../.

The SMF reference says our header chunk will include the following data:

MThd The chunk type, four ASCII bytes, indicating a header chunk
length 32 bits. The length of the rest of the header chunk, in bytes
format 16 bits. Possible values are 0, 1, and 2. Specifies the overall structure of the file.
ntrks 16 bits. Number of tracks
division 16 bits. Number of ticks in a single quarter note

Each two-character hexadecimal number represents 1 byte, or 8 bits. Since length is a 32-bit value, we're expecting the eight characters after 4d54 6864 (the chunk type) to represent the length. The next eight characters are 0000 0006. Since hexadecimal values lower than 10 are identical to their decimal representation, we don't need to do any conversion to know that this says we should expect 6 more bytes in the header. This checks out; we can count six more bytes (represented as two-character hexadecimal values) on this line before we get to 4d54 or MT, which is the beginning of the first track chunk.

The next 16 bits, 0001, are the format. Format 1 means one file has multiple simultaneous tracks, per the docs.

The next 16 bits, 0002, are the number of tracks, which we know is two because we can see two MTrk chunk types in the hexdump.

Finally we have the 16-bit value 03c0, for our division. This will be the number of ticks in a quarter note, but the hexadecimal representation isn't very meaningful to us. I'll use the Python REPL to convert this to a decimal value.

int(0x03c0) # 960

The SMF reference uses 96 as a reference value for the division. But REAPER created our MIDI file with a more fine-grained division value of 960 ticks per quarter note. Possibly, this difference is due to changes in computing power since the SMF spec was created in 1996.
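
To tie the header fields together, here is a small sketch that parses them from the first 14 bytes of a file. The Header struct and the parse_header and be_u16 names are my own invention for illustration; only the field names format, ntrks, and division come from the SMF reference.

```rust
/// The three 16-bit fields that follow the length in an MThd chunk.
#[derive(Debug, PartialEq)]
struct Header {
    format: u16,
    ntrks: u16,
    division: u16,
}

/// Read a big-endian u16 at offset `at`, or None if the buffer is too short.
fn be_u16(bytes: &[u8], at: usize) -> Option<u16> {
    let pair = bytes.get(at..at + 2)?;
    Some(u16::from_be_bytes([pair[0], pair[1]]))
}

/// Parse the header chunk at the start of a Standard MIDI File.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    // The chunk type must be the ASCII string "MThd".
    if bytes.get(0..4)? != &b"MThd"[..] {
        return None;
    }
    // The 32-bit length counts the bytes after the length field itself.
    let len = bytes.get(4..8)?;
    let length = u32::from_be_bytes([len[0], len[1], len[2], len[3]]);
    if length < 6 {
        return None;
    }
    Some(Header {
        format: be_u16(bytes, 8)?,
        ntrks: be_u16(bytes, 10)?,
        division: be_u16(bytes, 12)?,
    })
}
```

Fed the first line of our hexdump, this should give format 1, ntrks 2, and division 960, matching what we decoded by hand.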

Duodecimal or The Joy of Being A Multiple of Twelve

Why use 96, or 960, ticks per quarter note?

Duodecimal is the name for a base-12 number system. The cool thing about using twelve as a base is that it is a "superior highly composite number", and it is also the smallest number with all of the numbers 1 through 4 as factors. Check the Wikipedia page for Duodecimal for more info about the supreme divisibility of the number twelve.

Since 96 is a multiple of twelve (12 × 8 = 96), this means we have a lot of options for how we can cleanly subdivide a bar or a quarter note. For example, we could chop each quarter note in half to 48-tick eighth notes (96 ÷ 2 = 48), then divide each of those into 16-tick sixteenth note triplets (48 ÷ 3 = 16), then subdivide each of those by four to create 4-tick, um, like, something notes.

If we wanted half note triplets instead of quarter notes, we could take the number of ticks in a bar (given a 4/4 time signature) (96 × 4 = 384) and divide by 3 for three 128-tick notes per bar.

The reason being able to evenly divide a bar or note is important is that we have a tricky problem to solve if we want to subdivide a note into a number of notes that is not a factor of the number of ticks in that note.

For example if we wanted to divide a quarter note into sixteenth note quintuplets (96 ÷ 5 = 19.2), how would that work? A tick represents a single unit of MIDI processing work, so an event is either going to occur on a given tick or not. Non-integer values like 19.2 aren't valid. We could give one note in the quintuplet 20 ticks and each of the other four 19 ticks (20 + 19 × 4 = 96) and the listener would likely not notice the sleight of hand, but how would we automate splitting up ticks for arbitrary values?

Since our modern MIDI file is using 960 ticks per quarter note instead of 96, we have a 10-times greater degree of granularity to work with, making it less likely we'll have a metrical conundrum. Revisiting our example above, we can simply divide our 960-tick quarter note into 192-tick 16th note quintuplets (960 ÷ 5 = 192) without issue.
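
As for automating the split for arbitrary values, one possible approach (a sketch of my own, not something prescribed by the SMF reference) is to round the boundary of each note to the nearest tick, rather than rounding each note length. That way the lengths always add up to the original number of ticks, and the leftover ticks are spread across the group. The name split_ticks is hypothetical.

```rust
/// Split `total` ticks into `parts` note lengths that sum exactly to `total`.
fn split_ticks(total: u32, parts: u32) -> Vec<u32> {
    assert!(parts > 0);
    let mut lengths = Vec::with_capacity(parts as usize);
    let mut previous_boundary: u64 = 0;
    for i in 1..=parts as u64 {
        // Ideal end position of the i-th note, rounded to the nearest tick.
        let boundary = (total as u64 * i + parts as u64 / 2) / parts as u64;
        lengths.push((boundary - previous_boundary) as u32);
        previous_boundary = boundary;
    }
    lengths
}
```

With 96 ticks and 5 parts this yields lengths of 19, 19, 20, 19, and 19 ticks, and with 960 ticks it yields five notes of 192 ticks each.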

Track 1: meta-events

Next we'll highlight the relevant bytes of data from the first track chunk:

00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54  MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74  rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307  est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090  . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000  $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080  ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00                           4.../.

Track chunks contain the following data:

MTrk The chunk type indicating a track chunk
length A 32-bit representation of the length of the following track data, in bytes
MTrk event+ One or more MTrk events, which each consist of a time delta from the previous MTrk event, and an event

There are three types of events: MIDI events, meta-events, and sysex events. MIDI events are often notes. Meta-events can be info like the time signature or tempo. We won't be using sysex events in this article.

According to the SMF reference, a format 1 MIDI file consists of a header chunk, followed by a track chunk containing metadata about the time signature and tempo, followed by the track chunk for the first music track. Therefore, we expect to see only meta-events in the track chunk for track 1.

Each meta-event begins with 0xff and an additional identifier byte for the event, followed by a length and then that many bytes of extra data, depending on the event. The following table shows every meta-event in the file, and the SMF reference details many more, but not all possible, meta-events:

ff 03 len text          Sequence/Track Name
ff 2f 00                End of Track
ff 51 03 tttttt         Set Tempo
ff 58 04 nn dd cc bb    Set Time Signature

Let's walk through each MTrk event in this track chunk:

00 ff 03 09 6d 69 64 69 5f 74 65 73 74

Zero time delta from the previous event, Track Name event.

The first byte of the meta-event data is the length of the track name (0x09).
The final nine bytes represent midi_test in ASCII.

00 ff 58 04 04 02 18 08

This is a Time Signature event, after zero time delta. The two bytes 04 02 represent the numerator and denominator of the time signature, respectively. The numerator is what it looks like, and the number for the denominator specifies what power to raise the number 2 to. In other words, the denominator value dd is log2 of x, where x is the desired denominator in decimal. So 02 is 4, 03 would be 8, 04 is 16, etc. The next number, 18 (24 in decimal), is the number of MIDI clocks per metronome click, and the final number, 08, is the number of 32nd notes in a quarter note. We won't dwell on these two numbers any further.
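
The remaining bytes in this track are the Set Tempo event (00 ff 51 03 07 a1 20) and End of Track (00 ff 2f 00). The three tempo bytes are a 24-bit count of microseconds per quarter note. Here is a rough sketch of decoding both the tempo and the time signature bytes; the function names are hypothetical, and I'm assuming one beat equals one quarter note.

```rust
/// Convert the three data bytes of a Set Tempo event (microseconds per
/// quarter note, big-endian) into beats per minute.
fn tempo_to_bpm(tttttt: [u8; 3]) -> f64 {
    // Pad the 24-bit value with a leading zero byte to fill a u32.
    let us_per_quarter = u32::from_be_bytes([0, tttttt[0], tttttt[1], tttttt[2]]);
    // There are 60,000,000 microseconds in a minute.
    60_000_000.0 / us_per_quarter as f64
}

/// Convert the nn and dd bytes of a Time Signature event into
/// (numerator, denominator). Returns None if dd is too large for a u32.
fn decode_time_signature(nn: u8, dd: u8) -> Option<(u8, u32)> {
    // dd is the power of two of the denominator: 2 means 4, 3 means 8.
    let denominator = 1u32.checked_shl(dd as u32)?;
    Some((nn, denominator))
}
```

For our file, 07 a1 20 is 500,000 microseconds per quarter note, which works out to 120 beats per minute, and 04 02 decodes to 4/4.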

The time delta for every MTrk event in this track chunk is zero (0x00), meaning all meta-events are applied at the beginning of the track, and don't change.

Please note that time deltas are not guaranteed to be a single byte in length. They're actually variable-length quantities that can be between one and four bytes long, and they are damn weird. We'll get back to those soon.

Track 2: MIDI events

Finally, let's look at the data for track 2, which contains MTrk events which each consist of a time delta and a MIDI event.

00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54  MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74  rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307  est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090  . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000  $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080  ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00                           4.../.
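
To make these bytes concrete, here is a minimal decoding sketch. It assumes the track data contains only Note On (0x9n) and Note Off (0x8n) events with an explicit status byte, plus meta-events, which is true of this file but not of MIDI files in general. The names read_vlq and decode_notes are my own. A variable-length quantity stores seven bits of data per byte, with the top bit set on every byte except the last.

```rust
/// Read one variable-length quantity starting at `pos`.
/// Returns the decoded value and the position of the byte after it.
fn read_vlq(bytes: &[u8], mut pos: usize) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    // A variable-length quantity is at most four bytes long.
    for _ in 0..4 {
        let byte = *bytes.get(pos)?;
        pos += 1;
        // The low seven bits carry data.
        value = (value << 7) | (byte & 0x7f) as u32;
        // A clear top bit marks the final byte.
        if (byte & 0x80) == 0 {
            return Some((value, pos));
        }
    }
    None
}

/// Decode track data made only of note on/off events and meta-events.
/// Each result is (delta ticks, status byte, note number, velocity).
fn decode_notes(track: &[u8]) -> Option<Vec<(u32, u8, u8, u8)>> {
    let mut events = Vec::new();
    let mut pos = 0;
    while pos < track.len() {
        let (delta, next) = read_vlq(track, pos)?;
        pos = next;
        let status = *track.get(pos)?;
        match status & 0xf0 {
            // Note Off (0x8n) and Note On (0x9n) carry two data bytes.
            0x80 | 0x90 => {
                let note = *track.get(pos + 1)?;
                let velocity = *track.get(pos + 2)?;
                events.push((delta, status, note, velocity));
                pos += 3;
            }
            // Meta-event: ff, type, length, data. Stop at End of Track.
            0xf0 if status == 0xff => {
                let kind = *track.get(pos + 1)?;
                let (len, data_start) = read_vlq(track, pos + 2)?;
                if kind == 0x2f {
                    break;
                }
                pos = data_start + len as usize;
            }
            // Anything else is out of scope for this sketch.
            _ => return None,
        }
    }
    Some(events)
}
```

Run over the 40 bytes of track 2 data, this should produce eight events: four Note On events (status 0x90, velocity 0x60) for notes 0x24, 0x30, 0x2b, and 0x34, each followed 960 ticks later (the two bytes 87 40) by a matching Note Off (status 0x80). In other words, four quarter notes played one after another on channel 1.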

Part 2: Writing A MIDI File from Scratch in Rust

Now that we understand the contents of a MIDI file, we have the knowledge we need to be able to write one. I'll show this using the Rust programming language. Because we're going to do this from scratch, without using any libraries, you will be able to do this in any programming language you choose.

To start, let's formalize what we've learned about the Standard MIDI File specification in code by creating a tiny MIDI module.

mod midi {
    const NOTE_ON: u8 = 0x90;
    const NOTE_OFF: u8 = 0x80;
    const HEADER: &[u8; 4] = b"MThd";
    const TRACK: &[u8; 4] = b"MTrk";

    pub fn delta_time(mut ticks: u32) -> Vec<u8> {
        // One bit in each byte of a variable-length quantity is reserved
        // as a continuation flag, and a quantity is at most four bytes
        // long, so the maximum possible quantity is the maximum unsigned
        // 28-bit integer.
        assert!(ticks <= 0x0fffffff);
        // Use a bitmask to select only the seven least significant bits
        // of the input value.
        let mut output = vec![ticks as u8 & 0x7f];
        ticks >>= 7;
        while ticks > 0 {
            let next_quantity = ticks as u8 & 0x7f | 0x80;
            output.push(next_quantity);
            ticks >>= 7;
        }
        output.reverse();
        output
    }
    pub fn end_of_track() -> Vec<u8> {
        vec![0xff, 0x2f, 0x00]
    }
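    // Channels are numbered 1 through 16 here; the status byte stores
    // them as 0 through 15 in its low four bits.
    pub fn note_on(channel: u8, note: u8, velocity: u8) -> Vec<u8> {
        assert!((1..=16).contains(&channel));
        vec![NOTE_ON + channel - 1, note, velocity]
    }
    pub fn note_off(channel: u8, note: u8) -> Vec<u8> {
        assert!((1..=16).contains(&channel));
        vec![NOTE_OFF + channel - 1, note, 0]
    }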
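    pub fn sequence_name(name: &str) -> Vec<u8> {
        // The length is a variable-length quantity; this helper writes it
        // as a single byte, which can only hold lengths up to 127.
        assert!(name.len() <= 0x7f);
        let name_length = name.len() as u8;
        let mut output = vec![0xff, 0x03, name_length];
        output.extend(name.as_bytes());
        output
    }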
    pub fn time_signature(nn: u8, dd: u8, cc: u8, bb: u8) -> Vec<u8> {
        vec![0xff, 0x58, 0x04, nn, dd, cc, bb]
    }
    pub fn tempo(bpm: u16) -> Vec<u8> {
        let mut output = vec![0xff, 0x51, 0x03];
        // 60,000,000 microseconds per minute divided by beats per minute.
        // Integer math avoids f32 rounding; bpm must be greater than zero.
        let us_per_quarter: u32 = 60_000_000 / bpm as u32;
        // Include microseconds per quarter in output as a 24-bit integer.
        output.extend(&us_per_quarter.to_be_bytes()[1..4]);
        output
    }
    pub fn track_event(time_delta: u32, event: Vec<u8>) -> Vec<u8> {
        let mut t_event = delta_time(time_delta);
        t_event.extend(event);
        t_event
    }
    pub fn make_header_chunk(format: u16, track_count: u16, division: u16) -> Vec<u8> {
        // Format can be 0, 1, or 2.
        // Format 1 includes a header chunk and one or more simultaneous
        // track chunks. Division is the number of ticks per quarter note.
        // While the Standard MIDI file 1.0 specification shows using 96 ticks
        // per quarter note, REAPER exports MIDI files with a division of 960
        // ticks per quarter note.
        let mut header = Vec::new();
        let header_length: u32 = 6;
        header.extend(HEADER);
        header.extend(&header_length.to_be_bytes());
        header.extend(&format.to_be_bytes());
        header.extend(&track_count.to_be_bytes());
        header.extend(&division.to_be_bytes());
        header
    }
    pub fn make_track_chunk(track_events: Vec<u8>) -> Vec<u8> {
        let mut track = Vec::new();
        track.extend(TRACK);
        track.extend(&(track_events.len() as u32).to_be_bytes());
        track.extend(track_events);
        track
    }
}

Most of the functions in this module are just simple wrappers for the MIDI events and meta-events that exist in the MIDI specification, and follow the nomenclature given in the SMF reference. There is also the track_event function that takes a time delta and an event and combines them into an MTrk event, and make_header_chunk and make_track_chunk, which each take the data for their chunk type and automate adding the chunk type and length bytes.

By far the most complex function is delta_time, which takes a normal integer value and manipulates its bits to create a variable-length quantity (VLQ). We'll break that function down line by line in a moment, but first let's briefly discuss endianness.

Endianness

Every function in our MIDI module has Vec<u8>, a vector of bytes, as its return type. Soon, when we go to write our content to a new file, we'll use the write_all method, which expects to receive a slice of bytes (&[u8]).

You will notice that in a few places in our module, such as inside the make_header_chunk function, we start with a wider unsigned integer type, such as a u32 (a 32-bit word), and then use the to_be_bytes method to break the word into its component bytes and extend the output vector with them. to_be_bytes means "to big-endian bytes". It has companion methods, to_le_bytes and to_ne_bytes, for getting the memory representation of an integer in little-endian byte order, or in the native order of the architecture you're compiling for, respectively.

It's important to realize that the MIDI file we're writing is not machine code that will be executed by your CPU. It's just data in a file that will be interpreted by your DAW software or whatever else you feed MIDI files to. All this is to say that standard MIDI files are written in the more human-readable big-endian byte order, and this has nothing to do with the architecture you're compiling your program for.
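
As a quick illustration of the difference, here is a tiny sketch using the standard library's byte-order methods on our division value of 960 (0x03c0). The wrapper function names are hypothetical and only exist to make the example easy to call.

```rust
/// Return the big-endian and little-endian byte sequences for a 16-bit value.
fn byte_orders(value: u16) -> ([u8; 2], [u8; 2]) {
    (value.to_be_bytes(), value.to_le_bytes())
}

/// Read a 16-bit value back out of big-endian bytes, as a MIDI reader would.
fn read_be_u16(bytes: [u8; 2]) -> u16 {
    u16::from_be_bytes(bytes)
}
```

Calling byte_orders(960) gives [0x03, 0xc0] for big endian, which is the order we saw in the hexdump, and [0xc0, 0x03] for little endian.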

Bit Twiddling

The delta_time function does the unenviable work of taking ordinary numeric values and converting them bit-by-bit into the variable length quantity scheme that the MIDI spec authors created. The code is loosely based on C code provided in the SMF reference for writing to and reading from this format. I'll deconstruct this function here, in case you find it a bit dense.

The plan for converting an input value to a VLQ is to consume the bits of the input, seven bits at a time, adding a 0 or 1 to the MSB end of the seven bits to create a full byte, and continuing as long as there are non-zero bits left to consume. Here's how that's done:

Step one is to isolate the seven least significant bits of the input. This is done by performing a bitwise AND operation between the input and the value 0x7f (0b1111111), seven bits all set to 1, which acts as a bitmask.

The bitwise AND operation compares the operands at each position and yields a 1 in the output at the same position only if both compared bits are 1. Since the bitmask is seven bits long, with all bits set to one, it creates a window seven bits wide that our input can be seen through. All bits outside that window are discarded.

Supposing an input value of 960 (0b1111000000), here's what happens when we do the bitwise AND operation to read the first seven bits of the input, and push the result into the output vector:

Bitwise AND
Input      0000001111000000
Bitmask            01111111
Result             01000000

Output [0b01000000]

Because each value in the output array is an 8-bit byte, the most significant bit is naturally 0. Note that we didn't alter the input at this stage, we just read from it.

Now that we've consumed the seven least significant bits of the input, we discard them by shifting the input 7 bits to the right, mutating the input value. Because the input is an unsigned integer, this is a logical shift: zeros fill in from the left.

Right Shift by 7
Input      0000001111000000
Result             00000111

Next we'll consume the next seven bits of the input as before, with only one difference: we initially processed the least significant bits of the input, which were prefaced with a 0 implicitly. The following bytes should have a 1 at the most significant position, to specify that they are not the terminal byte in the VLQ. To set the most significant bit we'll again use a bitmask, this time with a bitwise OR operation. This operation compares the bits of both operands, adding a 1 to the output if either of the compared bits is 1.

Bitwise OR
Input      00000111
Bitmask    10000000
Result     10000111

Output [0b01000000, 0b10000111]

At this point, for our example input of 960, all non-zero bits have been consumed and the while loop will exit.

Finally we reverse the output vector, because while we were reading bits from the input from right to left, we pushed them into the output vector from left to right, so we need to put the bytes back in the correct, big-endian order.
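
A handy way to gain confidence in this kind of bit twiddling is to check the function against a few known pairs. The sketch below repeats delta_time from the module so that it stands alone, and pairs it with some values I worked out by hand using the steps above. The vlq_examples helper is hypothetical.

```rust
/// Encode `ticks` as a variable-length quantity (same logic as the module).
fn delta_time(mut ticks: u32) -> Vec<u8> {
    assert!(ticks <= 0x0fffffff);
    // The final byte of the quantity: low seven bits, top bit clear.
    let mut output = vec![(ticks as u8) & 0x7f];
    ticks >>= 7;
    while ticks > 0 {
        // Every earlier byte has its top bit set as a continuation flag.
        output.push(((ticks as u8) & 0x7f) | 0x80);
        ticks >>= 7;
    }
    output.reverse();
    output
}

/// Pairs of (value, expected encoding) for checking the encoder.
fn vlq_examples() -> Vec<(u32, Vec<u8>)> {
    vec![
        (0x00, vec![0x00]),
        (0x7f, vec![0x7f]),
        (0x80, vec![0x81, 0x00]),
        (960, vec![0x87, 0x40]),
        (0x2000, vec![0xc0, 0x00]),
        (0x3fff, vec![0xff, 0x7f]),
        (0x4000, vec![0x81, 0x80, 0x00]),
        (0x0fffffff, vec![0xff, 0xff, 0xff, 0x7f]),
    ]
}
```

Note that 960 encodes to 87 40, which is exactly the time delta REAPER wrote before each Note Off event in our hexdump.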

Writing the MIDI File

Now that we've examined all the components that make up a simple MIDI file and created a small module to help create MIDI data, I'd like to present the following example program for writing a MIDI file from scratch. This is a minimal example, and an opportunity to get creative with note choices and durations. Anything you can imagine is possible!

use std::fs::File;
use std::io::prelude::*;

use midi::*;

// midi module code can go here or at the bottom of the file

fn main() -> std::io::Result<()> {

    // Header

    let format: u16 = 1;
    let num_tracks: u16 = 2;
    let division: u16 = 960;
    let header = make_header_chunk(format, num_tracks, division);

    // Track 1 - Meta events

    let mut track_events: Vec<u8> = Vec::new();

    track_events.extend(&track_event(0, sequence_name("midi_seq1")));
    track_events.extend(&track_event(0, time_signature(4, 2, 24, 8)));
    track_events.extend(&track_event(0, tempo(120)));
    track_events.extend(&track_event(0, end_of_track()));

    let track_01 = make_track_chunk(track_events);

    // Track 2 - MIDI events

    let mut track_events: Vec<u8> = Vec::new();

    let notes = [60, 63, 67, 72, 63, 67, 72, 75, 72, 63, 67, 72, 75, 67, 72, 75];
    let sixteenth_note: u32 = division as u32 / 4;
    for note in notes.iter() {
        track_events.extend(&track_event(0, note_on(1, *note, 96)));
        track_events.extend(&track_event(sixteenth_note, note_off(1, *note)));
    }
    track_events.extend(&track_event(0, end_of_track()));

    let track_02 = make_track_chunk(track_events);

    // Write the header and both tracks to a new MIDI file

    let mut file = File::create("test.mid")?;
    file.write_all(&header)?;
    file.write_all(&track_01)?;
    file.write_all(&track_02)?;

    Ok(())
}