What's in a MIDI File?
2020-10-20
MIDI is a standard for communicating data about musical events, such as notes played on a keyboard. Standard MIDI files are files that store performance information that can be played back by a DAW, edited, or shared with other users.
In this article we will look at how to inspect and interpret the binary data inside a standard MIDI file, then we will practice writing our own MIDI file, from scratch without any libraries, using the Rust programming language.
Here is a MIDI file I created using REAPER. It contains four notes. This file
is stored in binary, not a plain-text format, so we will need to use a special
tool, a hex editor or hex dump utility, to inspect the contents of the file.
xxd is a very standard hex dump tool that may already be installed on your
system. Here's the output from inspecting the MIDI file from the command line
with xxd midi-test.mid:
00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54 MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74 rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307 est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090 . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000 $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080 ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00 4.../.
This output is called a hexdump. The middle section shows the content of
our binary file in hexadecimal format, with 16 bytes of data per line. The left
section shows the offset of each line's first byte (also in hexadecimal), and
on the right we see the ASCII values for any bytes in the file that can be read
as ASCII characters. Some of these are actually ASCII strings, in our case, and
some just happen to be interpretable as ASCII. For convenience, I've saved the
hexdump to a text file with xxd midi-test.mid > midi-dump.txt.
Our initial goal will be to decode this nonsense, and there are two documents that are essential for this.
The main one is The Complete MIDI 1.0 Detailed Specification, available
here.
This document is only available as a PDF download, and the MIDI Association
asks that you register with your email address to download the document. This
won't be necessary to follow along, since I'll cover the relevant concepts, but
you'll want to grab it before you go on your own forays into the world of
algorithmic MIDI composition. Everything we need from this 300+ page document
will be in the Standard MIDI Files section, which is only 14 pages long.
Next, we need the Summary of MIDI Messages reference, available online here. This resource tells us all we need to know about MIDI messages. It covers only MIDI messages and not other kinds of events, such as meta-events, as we will see.
Depending on what we're looking at in our MIDI file some data will be best understood by viewing either its binary, hexadecimal, or decimal representation, so we'll need to be very fluid with converting between any of these formats. I keep a Python REPL open to help with these conversions. Python includes handy functions for making these conversions, without needing to import anything.
# Binary, decimal, and hex representations for a MIDI Note On event
bin(0x90) # '0b10010000'
int(0b10010000) # 144
hex(144) # '0x90'
Now that we've got the docs and the tools that we need to decode this file, let's get started!
Part 1: Decoding the MIDI File
The ASCII section of our hexdump gives us just a very broad outline of what's
in the file. The file begins with the four characters MThd, and includes two
identical 4-character strings, MTrk. We also have the string midi_test, which
is the name I gave to the file when I created it. Everything else appears to be
trash, and it is; the ASCII representations of those bytes aren't useful for
decoding their meaning.
The Standard MIDI Files 1.0 reference (we'll call it SMF from now on) specifies
that every MIDI file is made of chunks of binary data. It starts with a header
chunk, followed by at least one track chunk. The first four bytes of a chunk
are the chunk type in ASCII: MThd for a header chunk, and MTrk for a track
chunk. We can tell by looking at the ASCII section of our hexdump that our MIDI
file starts with a header chunk, like it's supposed to, and includes two track
chunks.
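The chunk layout described above can be sketched in a few lines of Rust. This is only an illustration, not part of the SMF reference: `list_chunks` is a hypothetical helper name, and the sketch assumes the input is well formed (every chunk's declared length fits inside the buffer).

```rust
/// Walk a buffer of Standard MIDI File bytes and collect each chunk's
/// four-character type together with its declared data length.
/// `list_chunks` is a hypothetical helper used for illustration only.
fn list_chunks(bytes: &[u8]) -> Vec<(String, u32)> {
    let mut chunks = Vec::new();
    let mut pos = 0;
    // Every chunk starts with 8 bytes: 4 type bytes, then a 4-byte length.
    while pos + 8 <= bytes.len() {
        let kind = String::from_utf8_lossy(&bytes[pos..pos + 4]).into_owned();
        let length = u32::from_be_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]);
        chunks.push((kind, length));
        // Skip the type, the length, and the chunk data to reach the next chunk.
        pos += 8 + length as usize;
    }
    chunks
}
```

Given the bytes of midi-test.mid, a function like this should report one MThd chunk followed by two MTrk chunks.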
The Header Chunk
Alright, let's zoom in on that header chunk and see what's going on.
00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54 MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74 rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307 est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090 . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000 $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080 ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00 4.../.
The SMF reference says our header chunk will include the following data:
MThd | The chunk type indicating a header chunk |
length | A 32-bit representation of the length of the header chunk data that follows this field, in bytes |
format | 16 bits; possible values are 0, 1, and 2. Specifies the overall structure of the file. |
ntrks | 16 bits; number of tracks |
division | 16 bits; number of ticks in a single quarter note |
Each two-character hexadecimal number represents 1 byte, or 8 bits. Since
length is a 32-bit value, we're expecting the eight hex digits after 4d54 6864
(the chunk type) to represent the length. The next eight digits are 0000 0006.
Since the hexadecimal digits 0 through 9 mean the same thing as their decimal
counterparts, we don't need to do any conversion to know that this says we
should expect 6 more bytes in the header. This checks out; we can count six
more bytes (represented as two-character hexadecimal values) on this line
before we get to 4d54 or MT, which is the beginning of the first track chunk.
The next 16 bits, 0001, are the format. Format 1 means the file has one or
more simultaneous tracks, per the docs.
The next 16 bits, 0002, are the number of tracks, which we know is two because
we can see two MTrk chunk types in the hexdump.
Finally we have the 16-bit value 03c0, for our division. This will be the
number of ticks in a quarter note, but the hexadecimal representation isn't
very meaningful to us. I'll use the Python REPL to convert this to a decimal
value.
int(0x03c0) # 960
The SMF reference uses 96 as a reference value for the division. But REAPER
created our MIDI file with a more fine-grained division value of 960 ticks per
quarter note. Possibly, this difference is due to changes in computing power
since the SMF spec was created in 1996.
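To tie the header fields together, here is a minimal sketch of reading them back out of the first 14 bytes of a file. `Header` and `parse_header` are hypothetical names invented for this example, and the sketch assumes the division field holds ticks per quarter note, as it does in our file.

```rust
/// The three 16-bit fields that follow the header chunk's type and length.
/// `Header` and `parse_header` are hypothetical names used for illustration.
#[derive(Debug, PartialEq)]
struct Header {
    format: u16,
    ntrks: u16,
    division: u16,
}

/// Parse the 14-byte header chunk, returning `None` if the chunk type or the
/// declared length is not what the walkthrough above expects.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < 14 || &bytes[0..4] != b"MThd" {
        return None;
    }
    // All multi-byte values in the file are stored big-endian.
    let length = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    if length != 6 {
        return None;
    }
    Some(Header {
        format: u16::from_be_bytes([bytes[8], bytes[9]]),
        ntrks: u16::from_be_bytes([bytes[10], bytes[11]]),
        // Assumption: interpreted as ticks per quarter note.
        division: u16::from_be_bytes([bytes[12], bytes[13]]),
    })
}
```

For the first line of our hexdump this gives format 1, two tracks, and a division of 960.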
Duodecimal or The Joy of Being A Multiple of Twelve
Why use 96, or 960, ticks per quarter note?
Duodecimal is the name for a base-12 number system. The cool thing about using twelve as a base is that it is a "superior highly composite number"; among other things, it is the smallest number with all of the numbers 1 through 4 as factors. Check the Wikipedia page for Duodecimal for more info about the supreme divisibility of the number twelve.
Since 96 is a multiple of twelve (12 × 8 = 96), this means we have a lot of options for how we can cleanly subdivide a bar or a quarter note. For example, we could chop each quarter note in half to 48-tick eighth notes (96 ÷ 2 = 48), then divide each of those into 16-tick sixteenth note triplets (48 ÷ 3 = 16), then subdivide each of those by four to create 4-tick, um, like, something notes.
If we wanted half note triplets instead of quarter notes, we could take the number of ticks in a bar (given a 4/4 time signature) (96 × 4 = 384) and divide by 3 for three 128-tick notes per bar.
The reason being able to evenly divide a bar or note is important is that we have a tricky problem to solve if we want to subdivide a note into a number of notes that is not a factor of the number of ticks in that note.
For example if we wanted to divide a quarter note into sixteenth note quintuplets (96 ÷ 5 = 19.2), how would that work? A tick represents a single unit of MIDI processing work, so an event is either going to occur on a given tick or not. Non-integer values like 19.2 aren't valid. We could give one note in the quintuplet 20 ticks and each of the other four 19 ticks (20 + 19 × 4 = 96) and the listener would likely not notice the sleight of hand, but how would we automate splitting up ticks for arbitrary values?
Since our modern MIDI file is using 960 ticks per quarter note instead of 96, we have a 10-times greater degree of granularity to work with, making it less likely we'll have a metrical conundrum. Revisiting our example above, we can simply divide our 960-tick quarter note into 192-tick 16th note quintuplets (960 ÷ 5 = 192) without issue.
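One possible answer to the "how would we automate it" question is to compute each note's start and end boundary with integer math and take the difference, so the leftover ticks get spread out and the total always adds up. This is just a sketch of one approach, not something the SMF reference prescribes; `split_ticks` is a hypothetical helper name.

```rust
/// Split `total_ticks` into `parts` whole-tick durations that differ by at
/// most one tick and always sum to `total_ticks`.
/// `split_ticks` is a hypothetical helper, not part of any MIDI library.
fn split_ticks(total_ticks: u32, parts: u32) -> Vec<u32> {
    assert!(parts > 0, "cannot split into zero parts");
    let total = u64::from(total_ticks);
    let count = u64::from(parts);
    (0..count)
        .map(|i| {
            // Round each boundary down, then take the difference, so the
            // rounding error never accumulates across notes.
            let start = total * i / count;
            let end = total * (i + 1) / count;
            (end - start) as u32
        })
        .collect()
}
```

With 96 ticks per quarter note, `split_ticks(96, 5)` yields four 19-tick notes and one 20-tick note; with 960 ticks it yields five even 192-tick notes.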
Track 1: meta-events
Next we'll highlight the relevant bytes of data from the first track chunk:
00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54 MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74 rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307 est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090 . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000 $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080 ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00 4.../.
Track chunks contain the following data:
MTrk | The chunk type indicating a track chunk |
length | A 32-bit representation of the length of the following track data, in bytes |
MTrk event+ | One or more MTrk events, which each consist of a time delta from the previous MTrk event, and an event |
There are three types of events: MIDI events, meta-events, and sysex events. MIDI events are often notes. Meta-events can be info like the time signature or tempo. We won't be using sysex events in this article.
According to the SMF reference, a format 1 MIDI file consists of a header chunk, followed by a track chunk containing metadata about the time signature and tempo, followed by the track chunk for the first music track. Therefore, we expect to see only meta-events in the track chunk for track 1.
Each meta-event begins with 0xff and an additional identifier byte for the
event, followed by an arbitrary amount of extra data, depending on the event.
The following table shows every meta-event in our file. The SMF reference
details many more, though not all possible, meta-events:
ff 03 len text | Sequence/Track Name |
ff 2f 00 | End of Track |
ff 51 03 tttttt | Set Tempo |
ff 58 04 nn dd cc bb | Set Time Signature |
Let's walk through each MTrk event in this track chunk:
00 ff 03 09 6d 69 64 69 5f 74 65 73 74
Zero time delta from the previous event, followed by a Track Name event.
The first byte of the meta-event data is the length of the track name (0x09).
The final nine bytes represent midi_test in ASCII.
00 ff 58 04 04 02 18 08
This is a Time Signature event, after zero time delta. The first 04 after the
event identifier is the length of the event data. The two bytes 04 02 that
follow represent the numerator and denominator of the time signature,
respectively, where the numerator is what it looks like, and the number for the
denominator specifies what power to raise the number 2 to. In other words the
denominator value dd is log2 of x, where x is the desired denominator in
decimal. So 02 is 4, 03 would be 8, 04 is 16, etc. The final two bytes help to
set the metronome: 18 (24 in decimal) is the number of MIDI clocks per
metronome click, and 08 is the number of notated 32nd notes in a MIDI quarter
note. We won't dwell on these two numbers any further.
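As a small sketch of how these payloads decode, the functions below turn the Time Signature bytes and the Set Tempo bytes (the tttttt field in the table above, which holds microseconds per quarter note) into familiar values. The function names are hypothetical, and the tempo conversion assumes a beat is a quarter note.

```rust
/// Decode the payload of a Time Signature meta-event (`nn dd cc bb`) into a
/// (numerator, denominator) pair. `dd` is a power of two, so 02 means 4.
/// Hypothetical helper for illustration; assumes `dd` is less than 32.
fn decode_time_signature(payload: [u8; 4]) -> (u32, u32) {
    let [nn, dd, _cc, _bb] = payload;
    (u32::from(nn), 1u32 << dd)
}

/// Decode the 3-byte payload of a Set Tempo meta-event, which holds
/// microseconds per quarter note, into beats per minute.
/// Assumption: one beat is one quarter note.
fn decode_tempo_bpm(payload: [u8; 3]) -> f64 {
    // Pad the 24-bit big-endian value to 32 bits with a leading zero byte.
    let us_per_quarter = u32::from_be_bytes([0, payload[0], payload[1], payload[2]]);
    60_000_000.0 / f64::from(us_per_quarter)
}
```

For our file, the bytes 04 02 18 08 decode to 4/4, and the tempo bytes 07 a1 20 are 500,000 microseconds per quarter note, or 120 beats per minute.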
The time delta for every MTrk event in this track chunk is zero (0x00),
meaning all meta-events are applied at the beginning of the track, and don't
change.
Please note that time deltas are not guaranteed to be a single byte in length. They're actually variable-length quantities (VLQs) that can be between one and four bytes long, and they are damn weird. We'll get back to those soon.
Track 2: MIDI events
Finally, let's look at the data for track 2, which contains MTrk events
which each consist of a time delta and a MIDI event.
00000000: 4d54 6864 0000 0006 0001 0002 03c0 4d54 MThd..........MT
00000010: 726b 0000 0020 00ff 0309 6d69 6469 5f74 rk... ....midi_t
00000020: 6573 7400 ff58 0404 0218 0800 ff51 0307 est..X.......Q..
00000030: a120 00ff 2f00 4d54 726b 0000 0028 0090 . ../.MTrk...(..
00000040: 2460 8740 8024 0000 9030 6087 4080 3000 $`.@.$...0`.@.0.
00000050: 0090 2b60 8740 802b 0000 9034 6087 4080 ..+`.@.+...4`.@.
00000060: 3400 00ff 2f00 4.../.
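The track 2 data (the 40 bytes starting at offset 0x3e) can be read as alternating time deltas and events. Below is a rough sketch of a decoder for just this case. `NoteEvent`, `read_vlq`, and `decode_notes` are hypothetical names; the sketch assumes every event carries its own status byte (0x9n for Note On, 0x8n for Note Off, each followed by a note number and a velocity) and simply stops at the first event of any other kind, such as End of Track.

```rust
/// One decoded note event from a track chunk. Hypothetical type for
/// illustration; only Note On and Note Off are handled.
#[derive(Debug, PartialEq)]
enum NoteEvent {
    On { delta: u32, channel: u8, note: u8, velocity: u8 },
    Off { delta: u32, channel: u8, note: u8, velocity: u8 },
}

/// Read a variable-length quantity starting at `*pos`, advancing `*pos`.
fn read_vlq(data: &[u8], pos: &mut usize) -> Option<u32> {
    let mut value = 0u32;
    // A variable-length quantity is at most four bytes long.
    for _ in 0..4 {
        let byte = *data.get(*pos)?;
        *pos += 1;
        value = (value << 7) | u32::from(byte & 0x7f);
        // A clear top bit marks the final byte of the quantity.
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None
}

/// Decode Note On / Note Off events until the data runs out or an
/// unsupported event (meta-event, running status, ...) is reached.
fn decode_notes(data: &[u8]) -> Vec<NoteEvent> {
    let mut events = Vec::new();
    let mut pos = 0;
    while let Some(delta) = read_vlq(data, &mut pos) {
        // Each supported event is a status byte plus two data bytes.
        let (status, note, velocity) = match data.get(pos..pos + 3) {
            Some(&[s, n, v]) => (s, n, v),
            _ => break,
        };
        // The low nibble of the status byte is the zero-based channel.
        let channel = (status & 0x0f) + 1;
        match status & 0xf0 {
            0x90 => events.push(NoteEvent::On { delta, channel, note, velocity }),
            0x80 => events.push(NoteEvent::Off { delta, channel, note, velocity }),
            _ => break,
        }
        pos += 3;
    }
    events
}
```

Run over the track 2 bytes, this yields four Note On / Note Off pairs for notes 36, 48, 43, and 52 on channel 1. Each Note On has velocity 96, and each Note Off arrives 960 ticks (one quarter note) later, encoded as the two-byte time delta 87 40.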
Part 2: Writing A MIDI File from Scratch in Rust
Now that we understand the contents of a MIDI file, we have the knowledge we need to be able to write one. I'll show this using the Rust programming language. Because we're going to do this from scratch, without using any libraries, you will be able to do this in any programming language you choose.
To start, let's formalize what we've learned about the Standard MIDI File specification in code by creating a tiny MIDI module.
mod midi {
    const NOTE_ON: u8 = 0x90;
    const NOTE_OFF: u8 = 0x80;
    const HEADER: &[u8; 4] = b"MThd";
    const TRACK: &[u8; 4] = b"MTrk";

    pub fn delta_time(mut ticks: u32) -> Vec<u8> {
        // Since one bit is reserved in each of the (up to) four bytes of a
        // variable-length quantity, the maximum possible quantity is the
        // maximum unsigned 28-bit integer.
        assert!(ticks <= 0x0fffffff);
        // Use a bitmask to select only the seven least significant bits of
        // the input value
        let mut output = vec![ticks as u8 & 0x7f];
        ticks >>= 7;
        while ticks > 0 {
            let next_quantity = ticks as u8 & 0x7f | 0x80;
            output.push(next_quantity);
            ticks >>= 7;
        }
        output.reverse();
        output
    }

    pub fn end_of_track() -> Vec<u8> {
        vec![0xff, 0x2f, 0x00]
    }

    // Channels are numbered from 1 here; the status byte stores channel - 1
    // in its low four bits.
    pub fn note_on(channel: u8, note: u8, velocity: u8) -> Vec<u8> {
        vec![NOTE_ON + channel - 1, note, velocity]
    }

    pub fn note_off(channel: u8, note: u8) -> Vec<u8> {
        vec![NOTE_OFF + channel - 1, note, 0]
    }

    pub fn sequence_name(name: &str) -> Vec<u8> {
        let name_length = name.len() as u8;
        let mut output = vec![0xff, 0x03, name_length];
        output.extend(name.as_bytes());
        output
    }

    pub fn time_signature(nn: u8, dd: u8, cc: u8, bb: u8) -> Vec<u8> {
        vec![0xff, 0x58, 0x04, nn, dd, cc, bb]
    }

    pub fn tempo(bpm: u16) -> Vec<u8> {
        let mut output = vec![0xff, 0x51, 0x03];
        let us_per_quarter: u32 = (60.0 / bpm as f32 * 1_000_000.0) as u32;
        // Include microseconds per quarter in output as a 24-bit integer.
        output.extend(&us_per_quarter.to_be_bytes()[1..4]);
        output
    }

    pub fn track_event(time_delta: u32, event: Vec<u8>) -> Vec<u8> {
        let mut t_event = delta_time(time_delta);
        t_event.extend(event);
        t_event
    }

    pub fn make_header_chunk(format: u16, track_count: u16, division: u16) -> Vec<u8> {
        // Format can be 0, 1, or 2.
        // Format 1 includes a header chunk and one or more simultaneous
        // track chunks. Division is the number of ticks per quarter note.
        // While the Standard MIDI file 1.0 specification shows using 96 ticks
        // per quarter note, REAPER exports MIDI files with a division of 960
        // ticks per quarter note.
        let mut header = Vec::new();
        let header_length: u32 = 6;
        header.extend(HEADER);
        header.extend(&header_length.to_be_bytes());
        header.extend(&format.to_be_bytes());
        header.extend(&track_count.to_be_bytes());
        header.extend(&division.to_be_bytes());
        header
    }

    pub fn make_track_chunk(track_events: Vec<u8>) -> Vec<u8> {
        let mut track = Vec::new();
        track.extend(TRACK);
        track.extend(&(track_events.len() as u32).to_be_bytes());
        track.extend(track_events);
        track
    }
}
Most of the functions in this module are just simple wrappers for the MIDI
events and meta-events that exist in the MIDI specification, and follow the
nomenclature given in the SMF reference. There is also the track_event
function that takes a time delta and an event and combines them into an MTrk
event, and make_header_chunk and make_track_chunk, which each take the data
for their chunk type and automate adding the chunk type and length bytes.
By far the most complex function is delta_time, which takes a normal integer
value and manipulates its bits to create a variable-length quantity (VLQ).
We'll break that function down line by line in a moment, but first let's
briefly discuss endianness.
Endianness
Every function in our MIDI module has Vec<u8>, a vector of bytes, as its
return type. Soon, when we go to write our content to a new file, we'll use the
write_all method, which expects to receive a reference to a slice of bytes. You
will notice in a few places in our module, such as inside the make_header_chunk
function, we start with a different unsigned integer type, such as a u32 or
32-bit word, and then use the to_be_bytes method to break the word into its
component bytes and extend the output vector with them. to_be_bytes means "to
big endian bytes". It has companion methods, to_le_bytes and to_ne_bytes, for
getting the memory representation of an integer in little-endian byte order, or
the native order of the architecture you're compiling for, respectively. It's
important to realize that the MIDI file we're writing is not machine code that
will be executed by your CPU. It's just data in a file that will be interpreted
by your DAW software or whatever else you feed MIDI files to. All this is to
say that standard MIDI files are written in the more human-readable big-endian
byte order, and this has nothing to do with the architecture you're compiling
your program for.
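To make the difference concrete, here is a tiny sketch that returns both byte orders for the same 16-bit value. `byte_orders` is a hypothetical helper written for this example; `to_be_bytes` and `to_le_bytes` are the standard library methods discussed above.

```rust
/// Return the big-endian and little-endian byte layouts of a 16-bit value.
/// `byte_orders` is a hypothetical helper used only to compare the two.
fn byte_orders(value: u16) -> ([u8; 2], [u8; 2]) {
    (value.to_be_bytes(), value.to_le_bytes())
}
```

For our division of 960 (0x03c0), the big-endian bytes are 03 c0, matching what we saw in the hexdump, while the little-endian bytes would be c0 03.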
Bit Twiddling
The delta_time function does the unenviable work of taking ordinary numeric
values and converting them bit-by-bit into the variable-length quantity scheme
that the MIDI spec authors created. The code is loosely based on C code
provided in the SMF reference for writing to and reading from this format. I'll
deconstruct this function here, in case you find it a bit dense.
The plan for converting an input value to a VLQ is to consume the bits of the
input, seven bits at a time, adding a 0 or 1 to the MSB end of the seven bits
to create a full byte, and continuing as long as there are non-zero bits left
to consume. Here's how that's done:
Step one is to isolate the seven least significant bits of the input. This is
done by performing a bitwise AND operation between the input and the value
0x7f (0b1111111), seven bits all set to 1, which acts as a bitmask. The bitwise
AND operation compares the operands at each position and yields a 1 in the
output at the same position only if both compared bits are 1.
Since the bitmask is seven bits long, with all bits set to one, it creates a
window seven bits wide that our input can be seen through. All bits outside
that window are discarded.
Supposing an input value of 960 (0b1111000000), here's what happens when we do
the bitwise AND operation to read the first seven bits of the input, and push
the result into the output vector:
Input | 0000 | 0011 | 1100 | 0000 |
Bitmask | | | 0111 | 1111 |
Result | | | 0100 | 0000 |
Output | [0b01000000] |
Because each value in the output vector is an 8-bit byte, the most significant
bit is naturally 0. Note that we didn't alter the input at this stage; we just
read from it.
Now that we've consumed the seven least significant bits of the input, we discard them by shifting the input 7 bits to the right, mutating the input value.
Input | 0000 | 0011 | 1100 | 0000 |
Result | 0000 | 0000 | 0000 | 0111 |
Next we'll consume the next seven bits of the input as before, with only one
difference: we initially processed the least significant bits of the input,
which were prefaced with a 0 implicitly. The following bytes should have a 1
at the most significant position, to specify that they are not the terminal
byte in the VLQ. To set the most significant bit we'll again use a bitmask,
this time with a bitwise OR operation. This operation compares the bits of both
operands, adding a 1 to the output if either of the compared bits is 1.
Input | 0000 | 0111 |
Bitmask | 1000 | 0000 |
Result | 1000 | 0111 |
Output | [0b01000000, 0b10000111] |
At this point, for our example input of 960, all non-zero bits have been consumed and the while loop will exit.
Finally we reverse the output vector, because while we were reading bits from the input from right to left, we pushed them into the output vector from left to right, so we need to put the bytes back in the correct, big-endian order.
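The walkthrough above can also be written out as straight-line code for the two-byte case. This is only an illustrative sketch: `two_byte_vlq` is a hypothetical helper that handles values needing exactly two VLQ bytes (128 through 16383), while delta_time in the module above covers the general case.

```rust
/// Encode a value that needs exactly two VLQ bytes, spelling out each step
/// from the walkthrough. `two_byte_vlq` is a hypothetical helper.
fn two_byte_vlq(ticks: u32) -> [u8; 2] {
    assert!((0x80..=0x3fff).contains(&ticks));
    // Step 1: mask off the seven least significant bits. This becomes the
    // terminal byte, whose most significant bit stays 0.
    let low = (ticks & 0x7f) as u8;
    // Step 2: discard those seven bits by shifting right.
    let rest = ticks >> 7;
    // Step 3: mask the next seven bits and set the most significant bit to
    // mark that another byte follows.
    let high = (rest & 0x7f) as u8 | 0x80;
    // Step 4: emit the most significant group first.
    [high, low]
}
```

For an input of 960 this produces the bytes 87 40, the same time delta that appears before each Note Off event in our hexdump.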
Writing the MIDI File
Now that we've examined all the components that make up a simple MIDI file and created a small module to help create MIDI data, I'd like to present the following example program for writing a MIDI file from scratch. This is a minimal example, and an opportunity to get creative with note choices and durations. Anything you can imagine is possible!
use std::fs::File;
use std::io::prelude::*;

use midi::*;

// midi module code can go here or at the bottom of the file

fn main() -> std::io::Result<()> {
    // Header
    let format: u16 = 1;
    let num_tracks: u16 = 2;
    let division: u16 = 960;
    let header = make_header_chunk(format, num_tracks, division);

    // Track 1 - Meta events
    let mut track_events: Vec<u8> = Vec::new();
    track_events.extend(&track_event(0, sequence_name("midi_seq1")));
    track_events.extend(&track_event(0, time_signature(4, 2, 24, 8)));
    track_events.extend(&track_event(0, tempo(120)));
    track_events.extend(&track_event(0, end_of_track()));
    let track_01 = make_track_chunk(track_events);

    // Track 2 - MIDI events
    let mut track_events: Vec<u8> = Vec::new();
    let notes = [60, 63, 67, 72, 63, 67, 72, 75, 72, 63, 67, 72, 75, 67, 72, 75];
    let sixteenth_note: u32 = division as u32 / 4;
    for note in notes.iter() {
        track_events.extend(&track_event(0, note_on(1, *note, 96)));
        track_events.extend(&track_event(sixteenth_note, note_off(1, *note)));
    }
    track_events.extend(&track_event(0, end_of_track()));
    let track_02 = make_track_chunk(track_events);

    // Write the header and both tracks to a new MIDI file
    let mut file = File::create("test.mid")?;
    file.write_all(&header)?;
    file.write_all(&track_01)?;
    file.write_all(&track_02)?;
    Ok(())
}