Sonic Scenarist, Spruce Maestro and Apple DVD Studio Pro are the only DVD authoring tools capable of adding closed captions to the DVDs they create. Adding captions using any other authoring tool requires two additional steps: muxing the captions into the MPEG-2 video elementary file, and modifying the appropriate VTS_XX_0.IFO file. The second step can be performed easily using IfoEdit--double-click the Video line in the VTS overview - Title Set (Movie) attributes and check one or both boxes under "CC for Line 21".
NOTE: Line 21 consists of two fields: Field 1 contains up to two closed caption streams, while Field 2 contains XDS (V-chip, VCR clock set, and various other pieces of information), ITV (Interactive TV links), and another closed caption stream. I've never seen a DVD that used Field 2, but all three DVD authoring tools support it, just in case.
The rest of this article will outline the requirements for a tool that would perform the first step of muxing closed captions into an MPEG-2 file.
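The tool would need to handle up to three inputs: the MPEG-2 video elementary stream file, a closed caption file for Field 1, and a closed caption file for Field 2.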
I assume any programmer tackling this project knows far more about the structure of the first input than I do. For the other two, there are a number of proprietary formats out there for storing closed captions. Of these, the Raw Broadcast Format is the easiest to work with, while the Scenarist Closed Caption Format is the most widely used (as well as being the only accepted input for the three authoring programs that import captions), so I suppose a successful tool would have to support both.
Raw Broadcast files are in binary format. They can have any extension, but .bin is preferred. The first four bytes are ff ff ff ff. After this, bytes are associated in pairs with each frame of video. Most of the byte pairs are 80 80, the closed caption code for "do nothing".
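The Raw Broadcast layout described above can be read with a few lines of Python. This is only a sketch under the assumptions stated in the text (a four-byte ff ff ff ff header, then one byte pair per frame); the names RAW_MAGIC and read_raw_broadcast are made up for illustration.

```python
import io

RAW_MAGIC = b"\xff\xff\xff\xff"  # header bytes described in the text
IDLE_PAIR = b"\x80\x80"          # the "do nothing" caption code


def read_raw_broadcast(stream):
    """Return the list of two-byte caption pairs, one per video frame.

    Hypothetical helper: assumes the file is exactly a four-byte header
    followed by byte pairs, as the article describes.
    """
    if stream.read(4) != RAW_MAGIC:
        raise ValueError("missing ff ff ff ff header")
    pairs = []
    while True:
        pair = stream.read(2)
        if len(pair) < 2:  # end of file (a trailing odd byte is ignored)
            break
        pairs.append(pair)
    return pairs


# Usage: a tiny in-memory file holding three frames, one of them not idle.
sample = io.BytesIO(RAW_MAGIC + b"\x80\x80" + b"\x94\x2c" + b"\x80\x80")
frames = read_raw_broadcast(sample)
active = [i for i, pair in enumerate(frames) if pair != IDLE_PAIR]
```

For a real file, the same function can be given the result of open(path, "rb").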
Scenarist Closed Caption files are a readable shorthand of the data in the Raw Broadcast format. The file for Field 1 has the extension .scc, while the file for Field 2 has the extension .sc2 (or .scc again). In format, they are text files mostly resembling hex dumps. Here's an example:
Scenarist_SCC V1.0

01:02:53:14	94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f

01:02:55:14	942c 942c

01:03:27:29	94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f
The file is double-spaced, with data lines alternating with blank lines. The first line identifies the format and version--there was only one version, so it will always be exactly as shown. The third and subsequent alternating lines start with the timecode and are followed by the data.
The timecode is in SMPTE format, which is either hours:minutes:seconds:frames for non-dropframe timebase or hours:minutes:seconds;frames for dropframe timebase. The timebase should be the same as the video's timebase.
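To line captions up with video frames, a tool has to turn each timecode into a frame count. The sketch below assumes an NTSC timebase of 30 nominal frames per second and the usual SMPTE dropframe rule (frame numbers 0 and 1 are skipped at the start of every minute except each tenth minute); the function name timecode_to_frame is hypothetical.

```python
import re

# hours:minutes:seconds:frames (non-dropframe) or ...;frames (dropframe)
_TIMECODE = re.compile(r"^(\d\d):(\d\d):(\d\d)([:;])(\d\d)$")


def timecode_to_frame(timecode):
    """Convert an SMPTE timecode string to a zero-based frame count.

    Assumes 30 nominal frames per second (NTSC). A ';' before the frame
    field selects dropframe counting.
    """
    match = _TIMECODE.match(timecode)
    if match is None:
        raise ValueError("not an SMPTE timecode: %r" % timecode)
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    frames = int(match.group(5))
    count = ((hours * 60 + minutes) * 60 + seconds) * 30 + frames
    if match.group(4) == ";":
        total_minutes = hours * 60 + minutes
        # Two frame numbers are skipped every minute, except every tenth.
        count -= 2 * (total_minutes - total_minutes // 10)
    return count
```

The difference between two converted timecodes gives the number of frames separating two caption lines.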
The data is made up of two-byte hexadecimal words, separated from each other by spaces and from the timecode by a tab character. As with Raw Broadcast format, each word takes one frame to transmit.
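Reading the file is mostly a matter of splitting lines. The following is a minimal sketch, not a full validator: it skips blank lines, checks the header line, and splits each data line on whitespace (which covers both the tab and the spaces). The name parse_scc is made up for illustration.

```python
SCC_HEADER = "Scenarist_SCC V1.0"


def parse_scc(text):
    """Return a list of (timecode, [two-byte words]) from SCC file text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != SCC_HEADER:
        raise ValueError("missing Scenarist_SCC V1.0 header")
    entries = []
    for line in lines[1:]:
        parts = line.split()  # timecode, then the hex words
        timecode, hex_words = parts[0], parts[1:]
        words = []
        for word in hex_words:
            if len(word) != 4:
                raise ValueError("expected a four-digit hex word: %r" % word)
            words.append(bytes.fromhex(word))
        entries.append((timecode, words))
    return entries


# Usage with a shortened version of the example file.
sample = (
    "Scenarist_SCC V1.0\n\n"
    "01:02:53:14\t94ae 94ae 9420\n\n"
    "01:02:55:14\t942c 942c\n"
)
entries = parse_scc(sample)
```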
The purpose of the timecodes is so that the long stretches of the word 8080 can be skipped (i.e. all gaps between timecodes are entirely made up of the bytes 80 80).
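Going the other way, a muxer needs one word per frame, so the skipped stretches have to be filled back in. Here is a small sketch of that expansion, assuming each caption line has already been reduced to a starting frame number and a list of words, and that the words occupy consecutive frames from that start. The function and parameter names are hypothetical.

```python
IDLE_WORD = b"\x80\x80"  # the word that fills every gap between timecodes


def expand_to_frames(entries, start_frame, total_frames):
    """Return one two-byte word per frame, filling gaps with 80 80.

    entries: iterable of (first_frame, [words]) with absolute frame numbers.
    start_frame: frame number of the first frame of the video.
    total_frames: number of frames in the video.
    """
    per_frame = [IDLE_WORD] * total_frames
    for first_frame, words in entries:
        for offset, word in enumerate(words):
            index = first_frame - start_frame + offset
            if 0 <= index < total_frames:  # ignore data outside the video
                per_frame[index] = word
    return per_frame


# Usage: two caption lines inside an eight-frame clip starting at frame 8.
clip = expand_to_frames(
    [(10, [b"\x94\x2c", b"\x94\x2c"]), (14, [b"\x94\x2f"])],
    start_frame=8,
    total_frames=8,
)
```

Written out after a ff ff ff ff header, a list like this matches the Raw Broadcast layout described earlier.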
The only output is another MPEG-2 video elementary stream file, slightly bigger than the input file. Specifically, it will be about 200 bytes bigger per second of video (1.3 MB for two hours).
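The size estimate can be checked with a little arithmetic. The sketch below assumes every GOP has 15 frames at the NTSC rate of 30000/1001 frames per second, and that each caption packet is 9 bytes of headers plus 6 bytes per frame (a marker byte and a byte pair for each of the two fields), with no extra field and no filler. Those figures are read from the packet layout described in the rest of this article; the function name is made up.

```python
def caption_overhead_bytes(duration_seconds, frames_per_gop=15,
                           fps=30000 / 1001):
    """Estimate the bytes added by caption packets over a whole video.

    Assumes one packet per GOP: 4 + 4 + 1 header bytes, then 6 bytes
    per frame (3 for Field 1 and 3 for Field 2).
    """
    packet_bytes = 4 + 4 + 1 + 6 * frames_per_gop
    gops = duration_seconds * fps / frames_per_gop
    return packet_bytes * gops


per_second = caption_overhead_bytes(1)
two_hours = caption_overhead_bytes(2 * 60 * 60)
```

Under these assumptions the overhead comes to a little under 200 bytes per second, and roughly 1.4 million bytes for two hours.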
Here is how to mux closed captions into an MPEG-2 video elementary stream file for DVD (this is derived from examining a dozen NTSC DVDs, so hopefully I haven't missed anything):
[...] 00 00 01 00, after GOP header, 00 00 01 b8). This is where the closed captions for this GOP will be inserted.

[...] N (because of automatic scene detection during encoding, this value can change from GOP to GOP, but for NTSC it can never be greater than 18).

[...] 00 00 01 b2.

[...] 43 43 01 f8.

[...] N * 2.

[...] 1, but I've seen DVDs where this is never set, and Scenarist and DVDMaestro never set it. When it is set, it means that the last caption segment is followed by an additional three bytes for an extra field. My guess is that this is an artifact of analog editing equipment, where a scene (and therefore a GOP) can be ended between the two fields of a single frame. Another possibility is that it is used to force all closed caption packets to have a length evenly divisible by 4 (for a 15-frame GOP, using an N value of 14 and setting the Extra Field Flag to 1 results in a closed caption packet with a length of 96 bytes).

[...] 80, then toggles between 0 and 80 every time the previous GOP's Extra Field Flag is set. Scenarist and DVDMaestro always leave this as 80.

[...] ff for Field 1 if the Pattern Flag is 80, or fe for Field 2 if the Pattern Flag is 0.

[...] 80 80. For the case of what to do with Field 2 when a file is not supplied, DVDMaestro uses 80 80, while Scenarist uses 00 00. I've seen commercial DVDs using each of these substitutions.

[...] fe or ff).

[...] 80 80 or 00 00).

[...] ff for Pattern Flag 80 or fe for Pattern Flag 0, followed by a pair of the appropriate field's data.

[...] 00 is output repeatedly for GOPs with fewer than 15 frames in order to make the caption packet have a constant length of 96 bytes (I also saw a DVD that padded to 100 bytes). Other DVDs (and Scenarist and DVDMaestro-produced DVDs) have no filler here.
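The per-GOP packet described above can be sketched in Python for the simplest case, the one the text attributes to Scenarist and DVDMaestro: Pattern Flag always 80, Extra Field Flag never set, and no filler. One assumption goes beyond what the text spells out: the Pattern Flag, N * 2, and the Extra Field Flag are taken to be combined into a single byte that follows the 43 43 01 f8 header. The function and constant names are made up for illustration.

```python
USER_DATA_HEADER = bytes.fromhex("000001b2")
CAPTION_HEADER = bytes.fromhex("434301f8")
PATTERN_FLAG = 0x80   # Field 1 first; Scenarist and DVDMaestro keep this
FIELD1_MARKER = b"\xff"
FIELD2_MARKER = b"\xfe"


def build_caption_packet(field1_pairs, field2_pairs=None,
                         field2_default=b"\x80\x80"):
    """Build the caption packet for one GOP.

    field1_pairs: one two-byte caption pair per frame of the GOP.
    field2_pairs: the same for Field 2, or None when no file is supplied.
    field2_default: 80 80 (DVDMaestro) or 00 00 (Scenarist) per the text.
    """
    n = len(field1_pairs)
    if n > 18:
        raise ValueError("an NTSC GOP cannot have more than 18 frames")
    if field2_pairs is None:
        field2_pairs = [field2_default] * n
    if len(field2_pairs) != n:
        raise ValueError("both fields need one pair per frame")
    # Assumption: one byte holding Pattern Flag + N * 2 + Extra Field Flag (0).
    attribute = PATTERN_FLAG | (n * 2)
    packet = bytearray(USER_DATA_HEADER + CAPTION_HEADER)
    packet.append(attribute)
    for pair1, pair2 in zip(field1_pairs, field2_pairs):
        packet += FIELD1_MARKER + pair1
        packet += FIELD2_MARKER + pair2
    return bytes(packet)


# Usage: a 15-frame GOP whose Field 1 data is all the word 942c.
packet = build_caption_packet([b"\x94\x2c"] * 15)
```

With this layout a packet is 9 + 6 * N bytes long, so N = 14 plus the three bytes of an extra field gives the 96 bytes mentioned above. Toggling the Pattern Flag, setting the Extra Field Flag, and padding with 00 are left out of the sketch.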