UTK

From Niotso Wiki
MicroTalk
Type of format: Audio
Filename extension: .utk
Magic number: UTM0
Byte order: Little endian
First appeared: Beasts and Bumpkins (1997)
Type ID: 0x1B6B9806
EA MicroTalk (also UTalk or UTK) is a linear-predictive speech codec used in various games by Electronic Arts. The earliest known game to use it is Beasts and Bumpkins (1997). The codec has a bandwidth of 11.025 kHz (sampling rate 22.05 kHz) and a frame size of about 20 ms (432 samples), and it supports only mono audio. It is typically encoded at 32 kbit/s.
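
The frame figures quoted above can be checked with a little arithmetic. The following is a minimal sketch; the macro and function names are illustrative and are not part of the decoder on this page or of any EA tool.

```c
/* Constants taken from the description above. */
#define UTK_SAMPLE_RATE   22050   /* Hz */
#define UTK_FRAME_SAMPLES 432     /* samples per frame */

/* Duration of one frame in milliseconds. */
static double utk_frame_ms(void)
{
    return 1000.0 * UTK_FRAME_SAMPLES / UTK_SAMPLE_RATE;
}

/* Average number of coded bits per frame at a given bit rate (bits/s).
** The codec is variable-length, so this is only a long-run average. */
static double utk_bits_per_frame(double bitrate)
{
    return bitrate * UTK_FRAME_SAMPLES / UTK_SAMPLE_RATE;
}
```

At 22.05 kHz a 432-sample frame lasts roughly 19.6 ms, so a 32 kbit/s stream averages about 627 bits per frame.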

The codec was reverse-engineered by trass3r on August 05, 2010 and was later analyzed by Andrew D'Addesio on January 12, 2012.

MicroTalk has been seen in several containers, depending on the game:

  • PT/.m10 (Beasts and Bumpkins)
  • Maxis UTK (The Sims Online, SimCity 4)
  • SCxl (FIFA 2001 PS2, FIFA 2002 PS2)

All code and text content on this page is public domain and licensed under the UNLICENSE.

For decoders for all of the above games, see https://github.com/daddesio/utkencode.

Variants

There are a few variants of MicroTalk:

  • MicroTalk 10:1 (codec_id=4 in EA SCxl) and MicroTalk 5:1 (codec_id=22) are really the same codec; the encoder simply chooses different compression parameters (reduced_bw is enabled in 10:1 and disabled in 5:1). MicroTalk 10:1 is typically encoded at 32 kbit/s; 5:1 is typically encoded at 64 kbit/s.
  • MicroTalk Revision 3 (revision=3 in SCxl) is a variant which supports raw PCM samples. (See Revision 3.)
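
As a rough sanity check on the "10:1" and "5:1" names, the sketch below compares the typical coded bit rates with uncompressed PCM. It assumes the ratios are measured against 16-bit mono PCM at 22050 Hz, which this page does not state explicitly; the function names are illustrative.

```c
/* Bit rate of uncompressed 16-bit mono PCM at 22050 Hz, in bits/s.
** Assumption: this is the reference the variant names are measured against. */
static double pcm_bitrate(void)
{
    return 22050.0 * 16.0;
}

/* Compression ratio relative to that PCM rate for a coded bit rate in bits/s. */
static double utk_compression_ratio(double coded_bitrate)
{
    return pcm_bitrate() / coded_bitrate;
}
```

Under that assumption, 32 kbit/s works out to roughly 11:1 and 64 kbit/s to roughly 5.5:1, so the variant names are best read as nominal figures.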

UTK file header

The simplest container is Maxis UTK. The UTK file header is little-endian and is very similar to the Maxis XA file header, with a few modifications.

struct UTKHeader
{
    char  sID[4];
    DWORD dwOutSize;
    DWORD dwWfxSize;
    /* WAVEFORMATEX */
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
    WORD  wBitsPerSample;
    WORD  cbSize;
    WORD  padding;
};
  • sID - A 4-byte string identifier equal to "UTM0".
  • dwOutSize - The size in bytes of the decompressed audio, not including the WAV header; equal to num_samples*2.
  • dwWfxSize - The size in bytes of the WAVEFORMATEX structure to follow; must be 20 (0x14).
  • wFormatTag - The decoded audio format; set to WAVE_FORMAT_PCM (0x0001).
  • nChannels - Number of channels in the decoded audio; must be 1.
  • nSamplesPerSec - Sampling rate of the decoded audio; this is always 22050.
  • nAvgBytesPerSec - Byte rate of the decoded audio; equal to nSamplesPerSec*nChannels*wBitsPerSample/8 or nSamplesPerSec*nBlockAlign.
  • nBlockAlign - Number of bytes per sampling interval in the decoded audio; equal to nChannels*wBitsPerSample/8.
  • wBitsPerSample - Bits per sample in the decoded audio (8, 16, 24, 32, etc.); this is always 16.
  • cbSize - The size in bytes of extra format information following the WAVEFORMATEX structure; must be 0.
  • padding - Two bytes of padding; this is always 0.
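
The field list above implies a 32-byte header (4 + 4 + 4 bytes, then the 20 bytes counted by dwWfxSize). The following is a minimal sketch of parsing and validating it from a memory buffer; UTKHeaderInfo, utk_parse_header and the get_* helpers are hypothetical names, and the checks simply mirror the constraints listed above.

```c
#include <stdint.h>
#include <string.h>

#define UTK_HEADER_SIZE 32  /* 12 bytes + 18-byte WAVEFORMATEX + 2 bytes padding */

typedef struct {
    uint32_t out_size;           /* dwOutSize */
    uint16_t channels;           /* nChannels */
    uint32_t sample_rate;        /* nSamplesPerSec */
    uint32_t avg_bytes_per_sec;  /* nAvgBytesPerSec */
    uint16_t block_align;        /* nBlockAlign */
    uint16_t bits_per_sample;    /* wBitsPerSample */
} UTKHeaderInfo;

static uint16_t get_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if a field violates the constraints above. */
static int utk_parse_header(const uint8_t *buf, size_t size, UTKHeaderInfo *out)
{
    if (size < UTK_HEADER_SIZE || memcmp(buf, "UTM0", 4) != 0)
        return -1;

    out->out_size = get_u32le(buf + 4);
    if (get_u32le(buf + 8) != 20)   /* dwWfxSize */
        return -1;
    if (get_u16le(buf + 12) != 1)   /* wFormatTag: WAVE_FORMAT_PCM */
        return -1;

    out->channels = get_u16le(buf + 14);
    out->sample_rate = get_u32le(buf + 16);
    out->avg_bytes_per_sec = get_u32le(buf + 20);
    out->block_align = get_u16le(buf + 24);
    out->bits_per_sample = get_u16le(buf + 26);

    if (get_u16le(buf + 28) != 0)   /* cbSize */
        return -1;
    if (out->channels != 1 || out->bits_per_sample != 16)
        return -1;
    if (out->block_align != out->channels * out->bits_per_sample / 8)
        return -1;
    if (out->avg_bytes_per_sec != out->sample_rate * out->block_align)
        return -1;

    return 0;
}
```

The compressed bitstream starts immediately after these 32 bytes, so a caller would hand buf + UTK_HEADER_SIZE to the decoder.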

Decoding

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
 
/* Note: This struct assumes a member alignment of 4 bytes.
** This matters when pitch_lag > 216 on the first subframe of any given frame,
** causing the adaptive codebook pointer to underflow as far as fixed_gains.
** (Does that happen in any real-world samples?) */
typedef struct UTKContext {
    FILE *fp;
    unsigned int bits_value;
    int bits_count;
    int reduced_bw;
    int multipulse_thresh;
    float fixed_gains[64];
    float rc[12];
    float synth_history[12];
    float adapt_cb[324];
    float decompressed_frame[432];
} UTKContext;
 
enum {
    MDL_NORMAL = 0,
    MDL_LARGEPULSE = 1
};
 
static const float utk_rc_table[64] = {
    0.0f,
    -.99677598476409912109375f, -.99032700061798095703125f, -.983879029750823974609375f, -.977430999279022216796875f,
    -.970982015132904052734375f, -.964533984661102294921875f, -.958085000514984130859375f, -.9516370296478271484375f,
    -.930754005908966064453125f, -.904959976673126220703125f, -.879167020320892333984375f, -.853372991085052490234375f,
    -.827579021453857421875f, -.801786005496978759765625f, -.775991976261138916015625f, -.75019800662994384765625f,
    -.724404990673065185546875f, -.6986110210418701171875f, -.6706349849700927734375f, -.61904799938201904296875f,
    -.567460000514984130859375f, -.515873014926910400390625f, -.4642859995365142822265625f, -.4126980006694793701171875f,
    -.361110985279083251953125f, -.309523999691009521484375f, -.257937014102935791015625f, -.20634900033473968505859375f,
    -.1547619998455047607421875f, -.10317499935626983642578125f, -.05158700048923492431640625f,
    0.0f,
    +.05158700048923492431640625f, +.10317499935626983642578125f, +.1547619998455047607421875f, +.20634900033473968505859375f,
    +.257937014102935791015625f, +.309523999691009521484375f, +.361110985279083251953125f, +.4126980006694793701171875f,
    +.4642859995365142822265625f, +.515873014926910400390625f, +.567460000514984130859375f, +.61904799938201904296875f,
    +.6706349849700927734375f, +.6986110210418701171875f, +.724404990673065185546875f, +.75019800662994384765625f,
    +.775991976261138916015625f, +.801786005496978759765625f, +.827579021453857421875f, +.853372991085052490234375f,
    +.879167020320892333984375f, +.904959976673126220703125f, +.930754005908966064453125f, +.9516370296478271484375f,
    +.958085000514984130859375f, +.964533984661102294921875f, +.970982015132904052734375f, +.977430999279022216796875f,
    +.983879029750823974609375f, +.99032700061798095703125f, +.99677598476409912109375f
};
 
static const uint8_t utk_codebooks[2][256] = {
    { /* normal model */
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 17,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5, 21,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 18,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5, 25,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 17,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5, 22,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 18,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5,  0,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 17,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5, 21,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 18,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5, 26,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 17,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5, 22,
        4,  6,  5,  9,  4,  6,  5, 13,  4,  6,  5, 10,  4,  6,  5, 18,
        4,  6,  5,  9,  4,  6,  5, 14,  4,  6,  5, 10,  4,  6,  5,  2
    }, { /* large-pulse model */
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 23,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8, 27,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 24,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8,  1,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 23,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8, 28,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 24,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8,  3,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 23,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8, 27,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 24,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8,  1,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 23,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8, 28,
        4, 11,  7, 15,  4, 12,  8, 19,  4, 11,  7, 16,  4, 12,  8, 24,
        4, 11,  7, 15,  4, 12,  8, 20,  4, 11,  7, 16,  4, 12,  8,  3
    }
};
 
static const struct {
    int next_model;
    int code_size;
    float pulse_value;
} utk_commands[29] = {
    {MDL_LARGEPULSE, 8,  0.0f},
    {MDL_LARGEPULSE, 7,  0.0f},
    {MDL_NORMAL,     8,  0.0f},
    {MDL_NORMAL,     7,  0.0f},
    {MDL_NORMAL,     2,  0.0f},
    {MDL_NORMAL,     2, -1.0f},
    {MDL_NORMAL,     2, +1.0f},
    {MDL_NORMAL,     3, -1.0f},
    {MDL_NORMAL,     3, +1.0f},
    {MDL_LARGEPULSE, 4, -2.0f},
    {MDL_LARGEPULSE, 4, +2.0f},
    {MDL_LARGEPULSE, 3, -2.0f},
    {MDL_LARGEPULSE, 3, +2.0f},
    {MDL_LARGEPULSE, 5, -3.0f},
    {MDL_LARGEPULSE, 5, +3.0f},
    {MDL_LARGEPULSE, 4, -3.0f},
    {MDL_LARGEPULSE, 4, +3.0f},
    {MDL_LARGEPULSE, 6, -4.0f},
    {MDL_LARGEPULSE, 6, +4.0f},
    {MDL_LARGEPULSE, 5, -4.0f},
    {MDL_LARGEPULSE, 5, +4.0f},
    {MDL_LARGEPULSE, 7, -5.0f},
    {MDL_LARGEPULSE, 7, +5.0f},
    {MDL_LARGEPULSE, 6, -5.0f},
    {MDL_LARGEPULSE, 6, +5.0f},
    {MDL_LARGEPULSE, 8, -6.0f},
    {MDL_LARGEPULSE, 8, +6.0f},
    {MDL_LARGEPULSE, 7, -6.0f},
    {MDL_LARGEPULSE, 7, +6.0f}
};
 
static int utk_read_bits(UTKContext *ctx, int count)
{
    int ret = ctx->bits_value & ((1 << count) - 1);
    ctx->bits_value >>= count;
    ctx->bits_count -= count;
 
    if (ctx->bits_count < 8) {
        /* read another byte */
        uint8_t x;
        if (fread(&x, 1, 1, ctx->fp) != 1)
            x = 0;
        ctx->bits_value |= x << ctx->bits_count;
        ctx->bits_count += 8;
    }
 
    return ret;
}
 
static void utk_init(FILE *fp, UTKContext *ctx)
{
    int i;
    uint8_t x;
    float multiplier;
 
    memset(ctx, 0, sizeof(*ctx));
 
    ctx->fp = fp;
 
    if (fread(&x, 1, 1, ctx->fp) != 1)
        x = 0;
    ctx->bits_value = x;
    ctx->bits_count = 8;
 
    ctx->reduced_bw = utk_read_bits(ctx, 1);
    ctx->multipulse_thresh = 32 - utk_read_bits(ctx, 4);
    ctx->fixed_gains[0] = 8.0f * (1 + utk_read_bits(ctx, 4));
    multiplier = 1.04f + utk_read_bits(ctx, 6)*0.001f;
 
    for (i = 1; i < 64; i++)
        ctx->fixed_gains[i] = ctx->fixed_gains[i-1] * multiplier;
}
 
static void utk_decode_excitation(UTKContext *ctx, int use_multipulse, float *out, int stride)
{
    int i;
 
    if (use_multipulse) {
        /* multi-pulse model: n pulses are coded explicitly; the rest are zero */
        int model, cmd;
        model = 0;
        i = 0;
        while (i < 108) {
            cmd = utk_codebooks[model][ctx->bits_value & 0xff];
            model = utk_commands[cmd].next_model;
            utk_read_bits(ctx, utk_commands[cmd].code_size);
 
            if (cmd > 3) {
                /* insert a pulse with magnitude <= 6.0f */
                out[i] = utk_commands[cmd].pulse_value;
                i += stride;
            } else if (cmd > 1) {
                /* insert between 7 and 70 zeros */
                int count = 7 + utk_read_bits(ctx, 6);
                if (i + count * stride > 108)
                    count = (108 - i)/stride;
 
                while (count > 0) {
                    out[i] = 0.0f;
                    i += stride;
                    count--;
                }
            } else {
                /* insert a pulse with magnitude >= 7.0f */
                int x = 7;
 
                while (utk_read_bits(ctx, 1))
                    x++;
 
                if (!utk_read_bits(ctx, 1))
                    x *= -1;
 
                out[i] = (float)x;
                i += stride;
            }
        }
    } else {
        /* RELP model: entire residual (excitation) signal is coded explicitly */
        i = 0;
        while (i < 108) {
            if (!utk_read_bits(ctx, 1))
                out[i] = 0.0f;
            else if (!utk_read_bits(ctx, 1))
                out[i] = -2.0f;
            else
                out[i] = 2.0f;
 
            i += stride;
        }
    }
}
 
static void rc_to_lpc(const float *rc, float *lpc)
{
    int i, j;
    float tmp1[12];
    float tmp2[12];
 
    for (i = 10; i >= 0; i--)
        tmp2[1+i] = rc[i];
 
    tmp2[0] = 1.0f;
 
    for (i = 0; i < 12; i++) {
        float x = -tmp2[11] * rc[11];
 
        for (j = 10; j >= 0; j--) {
            x -= tmp2[j] * rc[j];
            tmp2[j+1] = x * rc[j] + tmp2[j];
        }
 
        tmp1[i] = tmp2[0] = x;
 
        for (j = 0; j < i; j++)
            x -= tmp1[i-1-j] * lpc[j];
 
        lpc[i] = x;
    }
}
 
static void utk_lp_synthesis_filter(UTKContext *ctx, int offset, int num_blocks)
{
    int i, j, k;
    float lpc[12];
    float *ptr = &ctx->decompressed_frame[offset];
 
    rc_to_lpc(ctx->rc, lpc);
 
    for (i = 0; i < num_blocks; i++) {
        for (j = 0; j < 12; j++) {
            float x = *ptr;
 
            for (k = 0; k < j; k++)
                x += lpc[k] * ctx->synth_history[k-j+12];
            for (; k < 12; k++)
                x += lpc[k] * ctx->synth_history[k-j];
 
            ctx->synth_history[11-j] = x;
            *ptr++ = x;
        }
    }
}
 
static void utk_decode_frame(UTKContext *ctx)
{
    int i, j;
    int use_multipulse = 0;
    float excitation[5+108+5];
    float rc_delta[12];
 
    memset(&excitation[0], 0, 5*sizeof(float));
    memset(&excitation[5+108], 0, 5*sizeof(float));
 
    /* read the reflection coefficients */
    for (i = 0; i < 12; i++) {
        int idx;
        if (i == 0) {
            idx = utk_read_bits(ctx, 6);
            if (idx < ctx->multipulse_thresh)
                use_multipulse = 1;
        } else if (i < 4) {
            idx = utk_read_bits(ctx, 6);
        } else {
            idx = 16 + utk_read_bits(ctx, 5);
        }
 
        rc_delta[i] = (utk_rc_table[idx] - ctx->rc[i])*0.25f;
    }
 
    /* decode four subframes */
    for (i = 0; i < 4; i++) {
        int pitch_lag = utk_read_bits(ctx, 8);
        float pitch_gain = (float)utk_read_bits(ctx, 4)/15.0f;
        float fixed_gain = ctx->fixed_gains[utk_read_bits(ctx, 6)];
 
        if (!ctx->reduced_bw) {
            utk_decode_excitation(ctx, use_multipulse, &excitation[5], 1);
        } else {
            /* residual (excitation) signal is encoded at reduced bandwidth */
            int align = utk_read_bits(ctx, 1);
            int zero = utk_read_bits(ctx, 1);
 
            utk_decode_excitation(ctx, use_multipulse, &excitation[5+align], 2);
 
            if (zero) {
                /* fill the remaining samples with zero
                ** (spectrum is duplicated into high frequencies) */
                for (j = 0; j < 54; j++)
                    excitation[5+(1-align)+2*j] = 0.0f;
            } else {
                /* interpolate the remaining samples
                ** (spectrum is low-pass filtered) */
                float *ptr = &excitation[5+(1-align)];
                for (j = 0; j < 108; j += 2)
                    ptr[j] =   ptr[j-5] * 0.01803267933428287506103515625f
                             - ptr[j-3] * 0.114591561257839202880859375f
                             + ptr[j-1] * 0.597385942935943603515625f
                             + ptr[j+1] * 0.597385942935943603515625f
                             - ptr[j+3] * 0.114591561257839202880859375f
                             + ptr[j+5] * 0.01803267933428287506103515625f;
 
                /* scale by 0.5f to give the sinc impulse response unit energy */
                fixed_gain *= 0.5f;
            }
        }
 
        for (j = 0; j < 108; j++)
            ctx->decompressed_frame[108*i+j] =   fixed_gain * excitation[5+j]
                                               + pitch_gain * ctx->adapt_cb[108*i+216-pitch_lag+j];
    }
 
    for (i = 0; i < 324; i++)
        ctx->adapt_cb[i] = ctx->decompressed_frame[108+i];
 
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 12; j++)
            ctx->rc[j] += rc_delta[j];
 
        utk_lp_synthesis_filter(ctx, 12*i, i < 3 ? 1 : 33);
    }
}
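
As utk_init above shows, the bitstream opens with 15 bits of stream parameters, read least-significant bit first: reduced_bw (1 bit), the multipulse threshold (4 bits), the base fixed gain (4 bits) and the gain multiplier (6 bits). The sketch below extracts the same fields from the first two bytes held in memory; UTKParams, get_bits and utk_parse_params are hypothetical names used only for illustration.

```c
#include <stdint.h>

typedef struct {
    int   reduced_bw;         /* 1 = excitation coded at reduced bandwidth */
    int   multipulse_thresh;  /* 32 minus the 4-bit field */
    float base_gain;          /* fixed_gains[0] */
    float multiplier;         /* ratio between successive fixed gains */
} UTKParams;

/* Extract `count` bits starting at bit position *pos (LSB first). */
static unsigned get_bits(uint32_t word, int *pos, int count)
{
    unsigned v = (unsigned)(word >> *pos) & ((1u << count) - 1u);
    *pos += count;
    return v;
}

/* b0 and b1 are the first two bytes of the compressed stream. */
static UTKParams utk_parse_params(uint8_t b0, uint8_t b1)
{
    uint32_t word = (uint32_t)b0 | ((uint32_t)b1 << 8);
    int pos = 0;
    UTKParams p;

    p.reduced_bw = (int)get_bits(word, &pos, 1);
    p.multipulse_thresh = 32 - (int)get_bits(word, &pos, 4);
    p.base_gain = 8.0f * (float)(1u + get_bits(word, &pos, 4));
    p.multiplier = 1.04f + (float)get_bits(word, &pos, 6) * 0.001f;

    return p;
}
```

For example, the bytes 0x71 0x14 give reduced_bw = 1, multipulse_thresh = 24, base_gain = 32.0 and a multiplier of about 1.05. The remaining bit of the second byte already belongs to the first frame.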

Sample program

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h> /* strcmp, strerror, memset */
#include <errno.h>  /* errno */
 
#define MAKE_U32(a,b,c,d) ((a)|((b)<<8)|((c)<<16)|((d)<<24))
#define ROUND(x) ((x) >= 0.0f ? ((x)+0.5f) : ((x)-0.5f))
#define MIN(x,y) ((x)<(y)?(x):(y))
#define MAX(x,y) ((x)>(y)?(x):(y))
#define CLAMP(x,min,max) MIN(MAX(x,min),max)
 
static void read_bytes(FILE *fp, uint8_t *dest, size_t size);
static void write_bytes(FILE *fp, const uint8_t *dest, size_t size);
static uint32_t read_u32(FILE *fp);
static uint16_t read_u16(FILE *fp);
static void write_u32(FILE *fp, uint32_t x);
static void write_u16(FILE *fp, uint16_t x);
 
int main(int argc, char *argv[])
{
    const char *infile, *outfile;
    UTKContext ctx;
    uint32_t sID;
    uint32_t dwOutSize;
    uint32_t dwWfxSize;
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
    uint32_t num_samples;
    FILE *infp, *outfp;
    int force = 0;
    int error = 0;
    int i;
 
    if (argc == 4 && !strcmp(argv[1], "-f")) {
        force = 1;
        argv++, argc--;
    }
 
    if (argc != 3) {
        printf("Usage: utkdecode [-f] infile outfile\n");
        printf("Decode Maxis UTK to wav.\n");
        return EXIT_FAILURE;
    }
 
    infile = argv[1];
    outfile = argv[2];
 
    infp = fopen(infile, "rb");
    if (!infp) {
        fprintf(stderr, "error: failed to open '%s' for reading: %s\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }
 
    if (!force && fopen(outfile, "rb")) {
        fprintf(stderr, "error: '%s' already exists\n", argv[2]);
        return EXIT_FAILURE;
    }
 
    outfp = fopen(outfile, "wb");
    if (!outfp) {
        fprintf(stderr, "error: failed to create '%s': %s\n", argv[2], strerror(errno));
        return EXIT_FAILURE;
    }
 
    sID = read_u32(infp);
    dwOutSize = read_u32(infp);
    dwWfxSize = read_u32(infp);
    wFormatTag = read_u16(infp);
    nChannels = read_u16(infp);
    nSamplesPerSec = read_u32(infp);
    nAvgBytesPerSec = read_u32(infp);
    nBlockAlign = read_u16(infp);
    wBitsPerSample = read_u16(infp);
    cbSize = read_u16(infp);
    read_u16(infp); /* padding */
 
    if (sID != MAKE_U32('U','T','M','0')) {
        fprintf(stderr, "error: not a valid UTK file (expected UTM0 signature)\n");
        return EXIT_FAILURE;
    } else if ((dwOutSize & 0x01) != 0 || dwOutSize >= 0x01000000) {
        fprintf(stderr, "error: invalid dwOutSize %u\n", (unsigned)dwOutSize);
        return EXIT_FAILURE;
    } else if (dwWfxSize != 20) {
        fprintf(stderr, "error: invalid dwWfxSize %u (expected 20)\n", (unsigned)dwWfxSize);
        return EXIT_FAILURE;
    } else if (wFormatTag != 1) {
        fprintf(stderr, "error: invalid wFormatTag %u (expected 1)\n", (unsigned)wFormatTag);
        return EXIT_FAILURE;
    }
 
    if (nChannels != 1) {
        fprintf(stderr, "error: invalid nChannels %u (only mono is supported)\n", (unsigned)nChannels);
        error = 1;
    }
    if (nSamplesPerSec < 8000 || nSamplesPerSec > 192000) {
        fprintf(stderr, "error: invalid nSamplesPerSec %u\n", (unsigned)nSamplesPerSec);
        error = 1;
    }
    if (nAvgBytesPerSec != nSamplesPerSec * nBlockAlign) {
        fprintf(stderr, "error: invalid nAvgBytesPerSec %u (expected nSamplesPerSec * nBlockAlign)\n", (unsigned)nAvgBytesPerSec);
        error = 1;
    }
    if (nBlockAlign != 2) {
        fprintf(stderr, "error: invalid nBlockAlign %u (expected 2)\n", (unsigned)nBlockAlign);
        error = 1;
    }
    if (wBitsPerSample != 16) {
        fprintf(stderr, "error: invalid wBitsPerSample %u (expected 16)\n", (unsigned)wBitsPerSample);
        error = 1;
    }
    if (cbSize != 0) {
        fprintf(stderr, "error: invalid cbSize %u (expected 0)\n", (unsigned)cbSize);
        error = 1;
    }
    if (error)
        return EXIT_FAILURE;
 
    num_samples = dwOutSize/2;
 
    utk_init(infp, &ctx);
 
    write_u32(outfp, MAKE_U32('R','I','F','F'));
    write_u32(outfp, 36 + num_samples*2);
    write_u32(outfp, MAKE_U32('W','A','V','E'));
    write_u32(outfp, MAKE_U32('f','m','t',' '));
    write_u32(outfp, 16);
    write_u16(outfp, wFormatTag);
    write_u16(outfp, nChannels);
    write_u32(outfp, nSamplesPerSec);
    write_u32(outfp, nAvgBytesPerSec);
    write_u16(outfp, nBlockAlign);
    write_u16(outfp, wBitsPerSample);
    write_u32(outfp, MAKE_U32('d','a','t','a'));
    write_u32(outfp, num_samples*2);
 
    while (num_samples > 0) {
        int count = MIN(num_samples, 432);
 
        if (feof(infp)) {
            fprintf(stderr, "error: unexpected end of file in '%s'\n", infile);
            return EXIT_FAILURE;
        }
 
        utk_decode_frame(&ctx);
 
        for (i = 0; i < count; i++) {
            int x = ROUND(ctx.decompressed_frame[i]);
            write_u16(outfp, (int16_t)CLAMP(x, -32768, 32767));
        }
 
        num_samples -= count;
    }
 
    if (fclose(outfp) != 0) {
        fprintf(stderr, "error: failed to close '%s': %s\n", outfile, strerror(errno));
        return EXIT_FAILURE;
    }
 
    fclose(infp);
 
    return EXIT_SUCCESS;
}
 
static void read_bytes(FILE *fp, uint8_t *dest, size_t size)
{
    size_t bytes_copied;
 
    if (!size)
        return;
 
    bytes_copied = fread(dest, 1, size, fp);
    if (bytes_copied < size) {
        if (ferror(fp)) {
            fprintf(stderr, "error: fread failed: %s\n", strerror(errno));
            exit(EXIT_FAILURE);
        }
 
        /* We have reached EOF, so return zeros. */
        memset(dest+bytes_copied, 0, size-bytes_copied);
    }
}
 
static void write_bytes(FILE *fp, const uint8_t *dest, size_t size)
{
    if (!size)
        return;
 
    if (fwrite(dest, 1, size, fp) != size) {
        fprintf(stderr, "error: fwrite failed: %s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }
}
 
static uint32_t read_u32(FILE *fp)
{
    uint8_t dest[4];
    read_bytes(fp, dest, sizeof(dest));
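    /* cast before shifting so that a high byte >= 0x80 cannot overflow a signed int */
    return (uint32_t)dest[0] | ((uint32_t)dest[1] << 8) | ((uint32_t)dest[2] << 16) | ((uint32_t)dest[3] << 24);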
}
 
static uint16_t read_u16(FILE *fp)
{
    uint8_t dest[2];
    read_bytes(fp, dest, sizeof(dest));
    return dest[0] | (dest[1] << 8);
}
 
static void write_u32(FILE *fp, uint32_t x)
{
    uint8_t dest[4];
    dest[0] = (uint8_t)x;
    dest[1] = (uint8_t)(x>>8);
    dest[2] = (uint8_t)(x>>16);
    dest[3] = (uint8_t)(x>>24);
    write_bytes(fp, dest, sizeof(dest));
}
 
static void write_u16(FILE *fp, uint16_t x)
{
    uint8_t dest[2];
    dest[0] = (uint8_t)x;
    dest[1] = (uint8_t)(x>>8);
    write_bytes(fp, dest, sizeof(dest));
}

Huffman tables

Normal model (Table 0):

Huffman code    | Output command | Meaning                               | Next model
0 0             | 4              | Insert one 0.0f                       | MDL_NORMAL
0 1             | 5              | Insert one -1.0f                      | MDL_NORMAL
1 0             | 6              | Insert one +1.0f                      | MDL_NORMAL
1 1 0 0         | 9              | Insert one -2.0f                      | MDL_LARGEPULSE
1 1 0 1         | 10             | Insert one +2.0f                      | MDL_LARGEPULSE
1 1 1 0 0       | 13             | Insert one -3.0f                      | MDL_LARGEPULSE
1 1 1 0 1       | 14             | Insert one +3.0f                      | MDL_LARGEPULSE
1 1 1 1 0 0     | 17             | Insert one -4.0f                      | MDL_LARGEPULSE
1 1 1 1 0 1     | 18             | Insert one +4.0f                      | MDL_LARGEPULSE
1 1 1 1 1 0 0   | 21             | Insert one -5.0f                      | MDL_LARGEPULSE
1 1 1 1 1 0 1   | 22             | Insert one +5.0f                      | MDL_LARGEPULSE
1 1 1 1 1 1 0 0 | 25             | Insert one -6.0f                      | MDL_LARGEPULSE
1 1 1 1 1 1 0 1 | 26             | Insert one +6.0f                      | MDL_LARGEPULSE
1 1 1 1 1 1 1 0 | 0              | Insert a pulse with magnitude >= 7.0f | MDL_LARGEPULSE
1 1 1 1 1 1 1 1 | 2              | Insert between 7 and 70 0.0fs         | MDL_NORMAL

Large-pulse model (Table 1):

Huffman code  | Output command | Meaning                               | Next model
0 0           | 4              | Insert one 0.0f                       | MDL_NORMAL
0 1 0         | 7              | Insert one -1.0f                      | MDL_NORMAL
0 1 1         | 8              | Insert one +1.0f                      | MDL_NORMAL
1 0 0         | 11             | Insert one -2.0f                      | MDL_LARGEPULSE
1 0 1         | 12             | Insert one +2.0f                      | MDL_LARGEPULSE
1 1 0 0       | 15             | Insert one -3.0f                      | MDL_LARGEPULSE
1 1 0 1       | 16             | Insert one +3.0f                      | MDL_LARGEPULSE
1 1 1 0 0     | 19             | Insert one -4.0f                      | MDL_LARGEPULSE
1 1 1 0 1     | 20             | Insert one +4.0f                      | MDL_LARGEPULSE
1 1 1 1 0 0   | 23             | Insert one -5.0f                      | MDL_LARGEPULSE
1 1 1 1 0 1   | 24             | Insert one +5.0f                      | MDL_LARGEPULSE
1 1 1 1 1 0 0 | 27             | Insert one -6.0f                      | MDL_LARGEPULSE
1 1 1 1 1 0 1 | 28             | Insert one +6.0f                      | MDL_LARGEPULSE
1 1 1 1 1 1 0 | 1              | Insert a pulse with magnitude >= 7.0f | MDL_LARGEPULSE
1 1 1 1 1 1 1 | 3              | Insert between 7 and 70 0.0fs         | MDL_NORMAL
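
The 256-entry utk_codebooks arrays in the decoder appear to be these prefix codes expanded into direct lookup tables, indexed by the next 8 bits of the stream with the first bit read in the least significant position. The sketch below rebuilds the normal-model table from the code list to illustrate the relationship; HuffCode, normal_codes and build_lookup are hypothetical names, and the bit strings are written in the order the bits are read.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *bits;  /* code bits in the order they are read from the stream */
    uint8_t cmd;       /* output command */
} HuffCode;

/* Normal model (Table 0), transcribed from the table above. */
static const HuffCode normal_codes[15] = {
    {"00", 4}, {"01", 5}, {"10", 6},
    {"1100", 9}, {"1101", 10},
    {"11100", 13}, {"11101", 14},
    {"111100", 17}, {"111101", 18},
    {"1111100", 21}, {"1111101", 22},
    {"11111100", 25}, {"11111101", 26},
    {"11111110", 0}, {"11111111", 2}
};

/* Expand prefix codes (each at most 8 bits) into a table indexed by the
** next 8 stream bits. Unassigned entries are left as 0xff. */
static void build_lookup(const HuffCode *codes, int num_codes, uint8_t table[256])
{
    int i, k;
    unsigned hi;

    memset(table, 0xff, 256);

    for (i = 0; i < num_codes; i++) {
        int len = (int)strlen(codes[i].bits);
        unsigned value = 0;

        /* the k-th bit read becomes bit k of the index */
        for (k = 0; k < len; k++)
            if (codes[i].bits[k] == '1')
                value |= 1u << k;

        /* every setting of the unused high bits maps to the same command */
        for (hi = 0; hi < (1u << (8 - len)); hi++)
            table[value | (hi << len)] = codes[i].cmd;
    }
}
```

This is why the decoder can index the table with bits_value & 0xff and only afterwards consume code_size bits: the lookup peeks at 8 bits, and the command tells it how many of them belonged to the code.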

Revision 3

MicroTalk Revision 3 does not appear to be used in any known game, but such files can be created with the Electronic Arts Sound eXchange tool sx.exe (sx -mt5_blk input.dat -=output.dat).

The only difference is that each audio frame may optionally contain raw 16-bit PCM data, which overwrites n samples at a given offset in the frame.

The decoder logic changes from this:

while (position < num_samples) {
    utk_decode_frame();
}

to this:

while (position < num_samples) {
    raw_data_present = (read_byte() == 0xee);
    utk_decode_frame();
    reset_bit_reader();
    if (raw_data_present) {
        offset = read_i16_be(); // 16-bit big-endian signed integer
        count = read_i16_be();
        if (offset < 0 || offset > 432)
            die("offset out of range");
        if (count < 0 || count > 432 - offset)
            die("count out of range");

        for (i = 0; i < count; i++)
            decompressed_frame[offset+i] = read_i16_be();
    }
}

sx.exe reads the offset and count fields as 16-bit signed integers and does not do any bounds checking (see 004274D1 in sx.exe v3.01.01). This means a specially crafted MicroTalk Rev. 3 file can crash sx.exe. For completeness, we have added the bounds checking to the above pseudocode.
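
The bounds-checked overlay step from the pseudocode above can be sketched in C as follows. This covers only the raw PCM block itself, taken from a byte buffer that starts at the offset field; locating that block within the stream follows the pseudocode. The function name utk_apply_raw_pcm and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define UTK_FRAME_SAMPLES 432

/* Read a signed 16-bit big-endian integer. */
static int read_i16_be(const uint8_t *p)
{
    int v = (p[0] << 8) | p[1];
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Overlay raw PCM samples from `data` onto the decoded frame.
** Returns the number of bytes consumed, or -1 if the block is malformed. */
static long utk_apply_raw_pcm(float frame[UTK_FRAME_SAMPLES],
                              const uint8_t *data, size_t size)
{
    int offset, count, i;

    if (size < 4)
        return -1;

    offset = read_i16_be(data);
    count = read_i16_be(data + 2);

    /* the checks that sx.exe omits */
    if (offset < 0 || offset > UTK_FRAME_SAMPLES)
        return -1;
    if (count < 0 || count > UTK_FRAME_SAMPLES - offset)
        return -1;

    /* also make sure the buffer really holds `count` samples */
    if (size - 4 < (size_t)count * 2)
        return -1;

    for (i = 0; i < count; i++)
        frame[offset + i] = (float)read_i16_be(data + 4 + 2 * i);

    return 4 + 2L * count;
}
```

Returning the consumed byte count lets a caller advance its read position before the next frame; a negative return signals that the file should be rejected rather than decoded further.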