

go-mp4: Golang Library and CLI Tool for MP4

Kubo Ryosuke


I'm going to introduce go-mp4, a Go library for reading and writing MP4 files.

go-mp4 was developed by extending the MP4 module of ABEMA, a video streaming service in Japan.


What's MP4?

There are many standards for video and audio codecs, such as AVC, AV1, AAC, and MP3.
However, codecs alone cannot record or play back a stream with video and audio kept in sync.
For that purpose you need a container format such as MP4 or MPEG2-TS.
A container can hold video, audio, timecode, and other information.

ISO/IEC 14496-12 "ISO Base Media File Format" defines a data unit called a "box" (also called an "atom"), the tree structure of boxes, and the box types for each role.
ISO/IEC 14496-14 "MP4 file format" is an extension of it.

Structure of MP4 (ISO Base Media File Format)

The figure below shows an example hexdump of an MP4 file.
As mentioned above, it is composed of a collection of data units called boxes.
The first 4 bytes of a box hold the box size, and the next 4 bytes hold the box type.
(If the box size does not fit in 4 bytes, 0x00000001 is written in the first 4 bytes and the actual size is written in the 8 bytes that follow the box type. If the box size is 0x00000000, the box continues until EOF.)

[Figure: hexdump of an MP4 file]

Looking at the moov box, its size is 0x000004b3 (= 1203 bytes).
You can see that it contains an mvhd box and a trak box.
Furthermore, the trak box contains a tkhd box and an mdia box.
This data therefore has the tree structure shown in the following figure.

[Figure: tree structure of the boxes]

The standard describes the role of each box type, the order of box types, the fields of each box type, and more.
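To make the tree structure concrete, here is a rough sketch that walks nested boxes in a byte slice and returns an indented listing. It is an illustration only: the walk function and the containers table are hypothetical, the table covers just the three container types mentioned in this section, every box is assumed to use the plain 32-bit size, and the mvhd and tkhd payloads are dummy bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// containers lists box types whose body is a sequence of child boxes.
// Only the types mentioned in this section are included.
var containers = map[string]bool{"moov": true, "trak": true, "mdia": true}

// walk returns one line per box, indented by depth.
// It assumes every box uses the plain 32-bit size field.
func walk(data []byte, depth int) ([]string, error) {
	var lines []string
	for len(data) > 0 {
		if len(data) < 8 {
			return nil, fmt.Errorf("truncated box header")
		}
		size := int(binary.BigEndian.Uint32(data[0:4]))
		typ := string(data[4:8])
		if size < 8 || size > len(data) {
			return nil, fmt.Errorf("invalid size %d for box %q", size, typ)
		}
		indent := strings.Repeat("  ", depth)
		lines = append(lines, fmt.Sprintf("%s[%s] Size=%d", indent, typ, size))
		if containers[typ] {
			// The body of a container is itself a list of boxes.
			children, err := walk(data[8:size], depth+1)
			if err != nil {
				return nil, err
			}
			lines = append(lines, children...)
		}
		data = data[size:]
	}
	return lines, nil
}

func main() {
	// Hand-built sample: moov{mvhd, trak{tkhd, mdia}} with dummy payloads.
	data := []byte{
		0, 0, 0, 48, 'm', 'o', 'o', 'v',
		0, 0, 0, 12, 'm', 'v', 'h', 'd', 0, 0, 0, 0,
		0, 0, 0, 28, 't', 'r', 'a', 'k',
		0, 0, 0, 12, 't', 'k', 'h', 'd', 0, 0, 0, 0,
		0, 0, 0, 8, 'm', 'd', 'i', 'a',
	}
	lines, err := walk(data, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(lines, "\n"))
}
```

The parser cannot tell from the bytes alone whether a box holds children or raw fields, which is why it needs a table of known container types; this is one reason broad box-type coverage matters in a library.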

About abema/go-mp4


When you only need to rewrite or add a few boxes, in some situations this can be implemented easily without any library.
However, to read or write boxes in a deep layer, you have to seek repeatedly to reach the target boxes. Furthermore, when a box size changes, you must seek back to the size field of each enclosing parent box and correct it.
I was looking for a Golang library that supports many box types and covers various uses, but I couldn't find one.

I also often need to inspect the data in boxes.
There are many tools for that, but many of them seem to support only some box types or display only some of the fields of a box.

So we developed abema/go-mp4 to be more general-purpose and to meet these requirements.

Integration with Your Go App

You can visit the boxes in depth-first order with the ReadBoxStructure function as follows:

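_, err := mp4.ReadBoxStructure(file, func(h *mp4.ReadHandle) (interface{}, error) {
    fmt.Println("depth", len(h.Path))

    // Box Type (e.g. "mdhd", "tfdt", "mdat")
    fmt.Println("type", h.BoxInfo.Type.String())

    // Box Size
    fmt.Println("size", h.BoxInfo.Size)

    if h.BoxInfo.Type.IsSupported() {
        // Payload
        box, _, err := h.ReadPayload()
        if err != nil {
            return nil, err
        }
        fmt.Printf("payload %+v\n", box)

        // Expands children
        return h.Expand()
    }
    return nil, nil
})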

You can also extract only the boxes at a specific path using the ExtractBoxWithPayload function as follows:

// extract the tkhd boxes at the path moov->trak->tkhd
path := mp4.BoxPath{mp4.BoxTypeMoov(), mp4.BoxTypeTrak(), mp4.BoxTypeTkhd()}
boxes, err := mp4.ExtractBoxWithPayload(file, nil, path)

CLI Tool

go-mp4 also comes with a CLI tool.
For example, you can view the box tree with the mp4tool dump command.

$ mp4tool dump sample.mp4
[ftyp] Size=32 MajorBrand="isom" MinorVersion=512 CompatibleBrands=[{CompatibleBrand="isom"}, {CompatibleBrand="iso2"}, {CompatibleBrand="avc1"}, {CompatibleBrand="mp41"}]
[free] (unsupported box type) Size=8 Data=[...] (use -a option to show all)
[mdat] Size=6402 Data=[...] (use -mdat option to expand)
[moov] Size=1836 
  [mvhd] Size=108 ... (use -a option to show all)
  [trak] Size=743 
    [tkhd] Size=92 ... (use -a option to show all)
    [edts] Size=36 
      [elst] Size=28 Version=0 Flags=0x000000 EntryCount=1 Entries=[{SegmentDurationV0=1000 MediaTimeV0=2048 MediaRateInteger=1}]
    [mdia] Size=607 
      [mdhd] Size=32 Version=0 Flags=0x000000 CreationTimeV0=0 ModificationTimeV0=0 TimescaleV0=10240 DurationV0=10240 Pad=false Language="eng" PreDefined=0
      [hdlr] Size=44 Version=0 Flags=0x000000 PreDefined=0 HandlerType="vide" Name="VideoHandle"

By default, long lines are abbreviated. With the -a option you can output everything except the mdat box.
With the -mdat option you can view the mdat box as hex.

Implementation with Run-Time Reflection

You can easily add box types to go-mp4.
For example, the tfdt box is implemented as follows:

package mp4

func BoxTypeTfdt() BoxType { return StrToBoxType("tfdt") }

func init() {
  AddBoxDef(&Tfdt{}, 0, 1)
}

type Tfdt struct {
  FullBox                      `mp4:"extend"`
  BaseMediaDecodeTimeV0 uint32 `mp4:"size=32,ver=0"`
  BaseMediaDecodeTimeV1 uint64 `mp4:"size=64,ver=1"`
}

// GetType returns the BoxType
func (*Tfdt) GetType() BoxType {
  return BoxTypeTfdt()
}

It consists of just a struct and a few small functions.
Notably, the version number of the tfdt box decides whether BaseMediaDecodeTime is 32 bits or 64 bits wide.
The mp4:"size=32,ver=0" and mp4:"size=64,ver=1" tags specify that.
In most cases you can add a new box type without special logic.

A disadvantage is that the struct tags are evaluated at run time using reflection.
However, the processing cost is relatively insignificant for many applications.
(A code generator would resolve this completely, and would also make it easier to support other languages.)


I have introduced our Golang library for MP4.

At our company, we are developing more software related to video streaming.
If possible, I would like to make that software open source too.
