Posted by Nathan (@circuitlions) on June 8, 2015 in Code, Scholarship, Technology, Videogames

I AM EXTRA: Manic Compression

How Donkey Kong’s programmers used compression to save ROM space.

The following section is one of I AM ERROR’s oldest—it appeared in the initial drafts of chapter 2 (which I wrote first) in early 2012. It also remained intact until quite late in the manuscript process—at least until mid-2014—lodged between the ‘Missing Pies’ and ‘Binary Kill’ sections. As such, it made it through numerous revisions and copyedits.

In the end, I cut the section because it felt supplemental. Though compression is an important concept in videogame design, especially on constrained systems like the NES, I covered it later in chapter 4 while discussing Super Mario Bros. And though I liked describing a specific form of compression used frequently in NES programming, run-length encoding never came up again, so it felt too technical.

I’m also sad that I lost the section title. Many of my headings are puns or references that allude to videogame media or other pop culture I like. In this case, Manic Compression hailed from a Quicksand album of the same name, released in 1995. It’s good.

Manic Compression

Today, little of our digital media is measured in kilobytes. The annotated, plain text disassembly of Donkey Kong I consulted while writing this section weighs in at 353KB, nearly fifteen times the size of the NROM’s PRG-ROM and CHR-ROM combined. In the Famicom era, every byte counted.

Compared to later titles like Mega Man 2, The Legend of Zelda, Ninja Gaiden, or Metroid, Donkey Kong’s gameplay is rudimentary. There are four non-scrolling play screens, a title screen, a small handful of sprites, limited character movement, and no persistent game states. But the aforementioned games benefited from advancements in cartridge or disk hardware that would grant swappable memory for code and graphics, on-board batteries for game saves, enhanced sound, and more. We know that Uemura and R&D2 had Donkey Kong in mind when they designed the Famicom, but its base specifications fell just shy of delivering the complete arcade experience. And despite excising an entire stage, the high score table, and a number of animations, the programmers still relied on a programming technique known as compression to minimize their use of ROM space.

The basic role of compression is to eliminate data redundancy. Fortunately, Famicom graphics are stored as symmetrical 8×8 or 8×16 pixel tiles. Owing to the limited set of pattern tiles available, many onscreen elements are repeated to form larger structures. To draw a platform, a designer might, for example, use only three tiles: two caps at either end with a long string of identical tiles in the center. These may be more complex or, in Donkey Kong’s case, far simpler—as we now know, the bottom platform of the rivets stage is a single 8×8 tile repeated horizontally across the screen.

How might a programmer create such a platform? Here’s one way:

Choose the first tile’s starting name table location (e.g., bottom left).
Pick the appropriate tile from the pattern table.
Draw the tile at that location.
Shift to the next position on the name table, eight pixels to the right.
Repeat steps 3 and 4 until the entire platform is drawn.

Step 5’s redundancy should be obvious. The programmer knows in advance what tiles will be used to draw the platform and how wide that platform will be. The data used to store the platform’s shape takes up precious ROM space. When only a single tile is involved, it is a significant waste of both CPU cycles and memory to repeat the same steps again and again. Compression is a means to manage redundancies like these.

One of the simplest compression schemes is called run-length encoding, or RLE. RLE is used in situations like the example above, when a run of elements repeats for a prescribed length. If our rivet tile’s pattern table index byte is, say, $21, and the platform is fifteen sprites long, using the five steps above we might represent the platform data in code as:

$21,$21,$21,$21,$21,$21,$21,$21,$21,$21,$21,$21,$21,$21,$21

We draw each individual tile fifteen times, using fifteen bytes of space. Using RLE, however, we could encode that run as follows:

$21,$0F,$00

The first byte is the hexadecimal reference to the rivet tile, the second byte designates the desired number of repetitions ($0F = 15), and the final byte terminates the sequence. The original fifteen-byte data block compresses to three bytes with RLE—an eighty percent reduction.

In practice, RLE requires a few more details. For instance, there must be a related routine that understands how the compression scheme is formatted so objects may be properly uncompressed. Keep in mind that RLE was not an automated routine built into Famicom software, like inserting a disc into iTunes, clicking a button to ‘Import CD,’ and having a properly encoded .mp3 come out on the other end. Coders had to ‘roll their own’ compression algorithms. But the extra overhead required to interpret the data was usually worth the effort relative to the memory saved, as Donkey Kong’s programmers proved.

We saw in chapter 1 that the PPU is synced carefully to the ebb and flow of the television’s electron beam. As the gun rakes its pattern of scanlines down the glass, it has two distinct periods of rest, the horizontal blank (HBLANK) and the vertical blank (VBLANK). HBLANK occurs after the completion of a scanline, when the gun must reset from one side of the screen to the other. The HBLANK window is minuscule, even in microprocessor terms—without tightly-timed code or additional hardware help, it is risky to issue new commands to the PPU during that time. (Updating the PPU while the scanline is actively being drawn is a Famicom development no-no, resulting in nasty onscreen glitches.) VBLANK has a wider window, since the electron gun must travel the full height of the screen. When the television enters this state, it triggers a non-maskable interrupt, or NMI. ‘Non-maskable’ means that this signal cannot be suppressed. In essence, whatever the CPU was doing at that moment is immediately interrupted and sent to the NMI handler, a special routine that executes during VBLANK. How the programmer chooses to use this time is largely up to their discretion, but the VBLANK happens without fail (barring any hardware malfunctions).

Good programming practice on the Famicom dictates that developers use those fleeting slivers of time wisely. If the game’s state has changed during the previous frame (e.g., a player presses a button), graphic updates likely need to be made—sprites move a few pixels, name table tiles swap, and so on. And VBLANK is the only time (assuming the PPU is enabled) tiles may update without risking graphical glitches. The catch is that an entire screen of graphics—960 tiles total—can not update during a single VBLANK. The margin is too narrow and the CPU is too slow. To make the most of VBLANK, the programmer must make good use of time preceding the NMI, getting their proverbial house in order before the lights come back on. That means keeping a running list of what needs to be updated and where, then waiting for the NMI to trigger. As soon as the interrupt fires, the code is ready to run its updates, shifting or replacing tiles before the PPU resumes rendering.

In Donkey Kong, there is a 64-byte background update list that records which tiles must update during a given frame. The first byte stores the current list size, which is reset to zero (#$00) at the end of each NMI. The bytes that follow describe a queue of background tiles that require updates during the next VBLANK. During normal gameplay, major and minor events take place, ranging from full stage transitions to the bonus timer decrementing, that require name table updates. When these events trigger, a code subroutine updates the current list size and then appends the relevant background tile to the update list. However, due to the list’s limited memory size—1 byte for the ‘header,’ 63 bytes of tile data—there is a hard upper limit to the number of updates per frame. Each time a new tile appends to the queue, a separate subroutine runs a check against the list limit. If the limit is exceeded, the subroutine rejects the tile. The simple check ensures that all updates fit within VBLANK.

Tile updates may also use RLE compression when needed. The VRAM write routine called during NMI requires three bytes of information to draw the appropriate tile. The first two hold the 16-bit VRAM address where the tile (or first tile in a run) will be placed. A third byte contains three bit flags: orientation, RLE encoding, and run length. The orientation bit indicates whether tiles will run horizontally or vertically from their tile origin. This flexibility is catered to Donkey Kong’s design, since the stages contain both horizontal (platforms) and vertical (ladders) strips of tiles. The second bit indicates whether RLE is ‘active’ for the given tile update. The remaining six bits store the run length, whose maximum possible value is %111111, or 63. Notice that the run length limit is identical to the update list limit. In practice, there is no background element in Donkey Kong that requires sixty-three repeated tiles, but the congruence is clearly by design.

As the bit flag above indicates, RLE is optional. There are some instances where RLE is detrimental to code compression. A single tile update, for instance, requires more information to encode the ‘compressed’ data than it does to use the data itself. Donkey Kong’s engine is impressively agile for a Famicom launch title. It uses vertical strips of RLE-encoded rivet tiles to draw the letters on the title screen, but keeps RLE inactive when it needs to update a single tile in the level indicator. It handles both cases with a single bit toggle.

To read more I AM ERROR excerpts, check out I AM EXTRA.

Tags: Donkey Kong, NES, Nintendo, platformstudies, programming