Adding Closed Captions to DVDs

Sonic Scenarist, Spruce Maestro and Apple DVD Studio Pro are the only DVD authoring tools capable of adding closed captions to the DVDs they create. Adding captions using any other authoring tool requires two additional steps: muxing the captions into the MPEG-2 video elementary file, and modifying the appropriate VTS_XX_0.IFO file. The second step can be performed easily using IfoEdit: double-click the Video line under "VTS overview - Title Set (Movie) attributes" and check one or both boxes under "CC for Line 21".

NOTE: Line 21 consists of two fields: Field 1 contains up to two closed caption streams, while Field 2 contains XDS (V-chip, VCR clock set, and various other pieces of information), ITV (Interactive TV links), and another closed caption stream. I've never seen a DVD that used Field 2, but all three DVD authoring tools support it, just in case.

The rest of this article will outline the requirements for a tool that would perform the first step of muxing closed captions into an MPEG-2 file.

Input

The tool would need to handle up to three inputs:

An MPEG-2 Video Elementary Stream File (required)
A closed caption file for Field 1 (required)
A closed caption file for Field 2 (optional)

I assume any programmer tackling this project knows far more about the structure of the first input than I do. For the other two, there are a number of proprietary formats out there for storing closed captions. Of these, the Raw Broadcast Format is the easiest to work with, while the Scenarist Closed Caption Format is the most widely used (as well as being the only accepted input for the three authoring programs that import captions), so I suppose a successful tool would have to support both.

Raw Broadcast files are in binary format. They can have any extension, but .bin is preferred. The first four bytes are ff ff ff ff. After this, bytes are associated in pairs with each frame of video. Most of the byte pairs are 80 80, the closed caption code for "do nothing".
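As a rough illustration of the layout just described, here is a minimal Python sketch that checks the ff ff ff ff header and splits the rest of the file into one byte pair per frame. The function name read_raw_broadcast is my own invention, and rejecting an odd trailing byte is an assumption rather than something the format is known to require.

```python
RAW_HEADER = b"\xff\xff\xff\xff"  # the four header bytes described above


def read_raw_broadcast(data: bytes) -> list[bytes]:
    """Return one two-byte caption word per video frame (hypothetical helper)."""
    if data[:4] != RAW_HEADER:
        raise ValueError("file does not start with ff ff ff ff")
    body = data[4:]
    if len(body) % 2:
        # Assumption: a dangling single byte means the file is damaged.
        raise ValueError("caption data is not a whole number of byte pairs")
    # Pair number i belongs to frame number i.
    return [body[i:i + 2] for i in range(0, len(body), 2)]
```

In practice the input would come from something like open(path, "rb").read(), and most of the returned pairs would be 80 80.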

Scenarist Closed Caption files are a readable shorthand of the data in the Raw Broadcast format. The file for Field 1 has the extension .scc, while the file for Field 2 has the extension .sc2 (or .scc again). In format, they are text files mostly resembling hex dumps. Here's an example:

Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f

The file is double-spaced, with data lines alternating with blank lines. The first line identifies the format and version--there was only one version, so it will always be exactly as shown. The third and subsequent alternating lines start with the timecode and are followed by the data.

The timecode is in SMPTE format, which is either hours:minutes:seconds:frames for non-dropframe timebase or hours:minutes:seconds;frames for dropframe timebase. The timebase should be the same as the video's timebase.
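To line captions up with video frames, a tool would need to turn each timecode into a frame index. The sketch below assumes a nominal 30 frames per second (NTSC) and uses the standard SMPTE drop-frame rule, which skips frame numbers 0 and 1 at the start of every minute except each tenth minute. The function name timecode_to_frame is hypothetical.

```python
import re

# hh:mm:ss:ff is non-dropframe, hh:mm:ss;ff is dropframe
_TIMECODE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})$")


def timecode_to_frame(timecode: str) -> int:
    """Convert an SMPTE timecode to a zero-based frame index (30 fps nominal)."""
    match = _TIMECODE.match(timecode)
    if match is None:
        raise ValueError(f"not an SMPTE timecode: {timecode!r}")
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    frames = int(match.group(5))
    index = ((hours * 60 + minutes) * 60 + seconds) * 30 + frames
    if match.group(4) == ";":
        # Dropframe: two frame numbers are skipped every minute,
        # except minutes 0, 10, 20, 30, 40 and 50 of each hour.
        total_minutes = hours * 60 + minutes
        index -= 2 * (total_minutes - total_minutes // 10)
    return index
```

For example, the non-dropframe timecode 01:02:53:14 from the sample file maps to frame 113204.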

The data is made up of two-byte hexadecimal words, separated from each other by spaces and from the timecode by a tab character. As with Raw Broadcast format, each word takes one frame to transmit.

The timecodes exist so that the long stretches of the word 8080 can be skipped (i.e. every gap between the end of one line's data and the next timecode consists entirely of the bytes 80 80).
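Putting the pieces of the format together, the following sketch expands a Scenarist file into a frame-indexed table and fills the gaps with 80 80 on lookup. It is only an illustration: parse_scc, word_for_frame and nondrop_frame are names I made up, the default timecode helper handles non-dropframe timecodes only (a dropframe file would need a converter that applies the SMPTE drop rule), and letting a later line overwrite an earlier one where they overlap is my assumption.

```python
NULL_WORD = b"\x80\x80"  # the "do nothing" caption code


def nondrop_frame(timecode: str) -> int:
    """hh:mm:ss:ff to frame index at 30 fps; fails on dropframe (';') timecodes."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 30 + frames


def parse_scc(text: str, to_frame=nondrop_frame) -> dict[int, bytes]:
    """Map frame index to two-byte caption word for every word in the file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "Scenarist_SCC V1.0":
        raise ValueError("missing Scenarist_SCC V1.0 header line")
    words: dict[int, bytes] = {}
    for line in lines[1:]:
        parts = line.split()  # timecode, then the hex words
        if not parts:
            continue  # the blank lines between data lines
        start = to_frame(parts[0])
        for offset, word in enumerate(parts[1:]):
            # Each word occupies one frame, starting at the timecode.
            words[start + offset] = bytes.fromhex(word)
    return words


def word_for_frame(words: dict[int, bytes], frame: int) -> bytes:
    """Frames not covered by any line carry 80 80."""
    return words.get(frame, NULL_WORD)
```

A muxer could then call word_for_frame once per frame and per field as it walks through the video.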

Output

The only output is another MPEG-2 video elementary stream file, slightly bigger than the input file. Specifically, it will be about 200 bytes bigger per second of video (roughly 1.4 MB for two hours).
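The size estimate can be checked with a little arithmetic. The sketch below assumes NTSC video at 30000/1001 frames per second, a constant 15 frames per GOP, and one caption packet per GOP made of a 4-byte user data header, a 4-byte caption header, one attribute byte and six bytes per frame, with no filler and no extra field. The function name is hypothetical.

```python
def caption_overhead_bytes(seconds: float,
                           fps: float = 30000 / 1001,
                           frames_per_gop: int = 15) -> float:
    """Estimate the bytes added by caption packets (assumes constant GOP size)."""
    packet_bytes = 4 + 4 + 1 + 6 * frames_per_gop  # 99 bytes for a 15-frame GOP
    gops = seconds * fps / frames_per_gop           # about two GOPs per second
    return packet_bytes * gops


per_second = caption_overhead_bytes(1)
two_hours = caption_overhead_bytes(2 * 60 * 60)
print(f"{per_second:.1f} bytes per second, {two_hours / 1e6:.2f} million bytes for two hours")
```

Under these assumptions the overhead comes to a little under 200 bytes per second, or a little over 1.4 million bytes for a two-hour title.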

Procedure

Here is how to mux closed captions into an MPEG-2 video elementary stream file for DVD (this is derived from examining a dozen NTSC DVDs, so hopefully I haven't missed anything):

1. Copy from MPEG input to MPEG output until an I-frame header is reached (the first picture header, 00 00 01 00, after a GOP header, 00 00 01 b8). This is where the closed captions for this GOP will be inserted.
2. Look ahead in the MPEG input file to determine the number of frames in this GOP, N (because of automatic scene detection during encoding, this value can change from GOP to GOP, but for NTSC it can never be greater than 18).
3. Output the User Data Packet header: 00 00 01 b2.
4. Output the Closed Caption header: 43 43 01 f8.
5. Calculate, then output, the Attribute byte:

   a. Start with N * 2.
   b. Add 0 for the Extra Field Flag. The other possible value is 1, but I've seen DVDs where this is never set, and Scenarist and DVDMaestro never set it. When it is set, it means that the last caption segment is followed by an additional three bytes for an extra field. My guess is that this is an artifact of analog editing equipment, where a scene (and therefore a GOP) can be ended between the two fields of a single frame. Another possibility is that it is used to force all closed caption packets to have a length evenly divisible by 4 (for a 15-frame GOP, using an N value of 14 and setting the Extra Field Flag to 1 results in a closed caption packet with a length of 96 bytes).
   c. Add the Pattern Flag, which starts as 80 (hex), then toggles between 0 and 80 every time the previous GOP's Extra Field Flag is set. Scenarist and DVDMaestro always leave this as 80.

6. Output six bytes for each frame in the GOP:

   a. Output a Field Byte of ff for Field 1 if the Pattern Flag is 80, or fe for Field 2 if the Pattern Flag is 0.
   b. Copy two bytes from the appropriate field's closed caption file. If a file has run out, use the byte pair 80 80. When no Field 2 file is supplied, DVDMaestro uses 80 80 for Field 2, while Scenarist uses 00 00. I've seen commercial DVDs using each of these substitutions.
   c. Output another Field Byte with the opposite value from before (fe or ff).
   d. Copy two bytes from the other field's closed caption file (or the 80 80 or 00 00 substitute).
   e. Loop back up to Step 6 until every frame in this GOP is accounted for.
   f. If the Extra Field Flag is set, output ff for Pattern Flag 80 or fe for Pattern Flag 0, followed by a pair of the appropriate field's data.

7. On some DVDs (the ones that use the Extra Field Flag a lot), the filler byte 00 is output repeatedly for GOPs with fewer than 15 frames in order to make the caption packet have a constant length of 96 bytes (I also saw a DVD that padded to 100 bytes). Other DVDs (and Scenarist and DVDMaestro-produced DVDs) have no filler here.
8. Loop back up to Step 1. This will output the picture packets in the GOP, then proceed to the captions for the next GOP.
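The procedure above can be sketched in Python as two small helpers: one that counts the pictures in each GOP by scanning for start codes, and one that builds the caption packet for a single GOP. Treat this as an illustration under stated assumptions, not a tested muxer. The names gop_picture_counts and build_cc_packet are hypothetical, the picture count equals the frame count only for frame-coded pictures, and the optional filler bytes are left out.

```python
GOP_START = b"\x00\x00\x01\xb8"
PICTURE_START = b"\x00\x00\x01\x00"
USER_DATA_START = b"\x00\x00\x01\xb2"
CC_HEADER = b"\x43\x43\x01\xf8"


def gop_picture_counts(stream: bytes) -> list[int]:
    """Count picture headers in each GOP (assumes one picture per frame)."""
    offsets = []
    pos = stream.find(GOP_START)
    while pos != -1:
        offsets.append(pos)
        pos = stream.find(GOP_START, pos + len(GOP_START))
    ends = offsets[1:] + [len(stream)]
    return [stream.count(PICTURE_START, start, end)
            for start, end in zip(offsets, ends)]


def build_cc_packet(frames, pattern_flag=0x80, extra_pair=None) -> bytes:
    """Build one GOP's caption packet.

    frames is a list of (field1_pair, field2_pair) two-byte values, one per
    frame. extra_pair, if given, sets the Extra Field Flag and is appended
    as the trailing three-byte segment.
    """
    n = len(frames)
    if pattern_flag not in (0x00, 0x80) or n * 2 >= 0x80:
        raise ValueError("bad pattern flag or too many frames")
    attribute = pattern_flag | (n * 2) | (1 if extra_pair is not None else 0)
    packet = bytearray(USER_DATA_START + CC_HEADER)
    packet.append(attribute)
    field1_first = pattern_flag == 0x80
    first_marker, second_marker = (0xFF, 0xFE) if field1_first else (0xFE, 0xFF)
    for field1_pair, field2_pair in frames:
        first, second = ((field1_pair, field2_pair) if field1_first
                         else (field2_pair, field1_pair))
        packet.append(first_marker)
        packet += first
        packet.append(second_marker)
        packet += second
    if extra_pair is not None:
        packet.append(first_marker)
        packet += extra_pair
    return bytes(packet)
```

For a 15-frame GOP with the Pattern Flag at 80 and no extra field, this yields a 99-byte packet whose attribute byte is 9e; with 14 frames plus an extra field it yields the 96-byte packet mentioned in the text. A real tool would write the packet at the insertion point described in Step 1 and carry the Pattern Flag from one GOP to the next.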