RefPack

RefPack is an LZ77/LZSS compression format created by Frank Barchard of EA Canada for the Gimex library, which is used by many older EA games. In The Sims Online, RefPack bitstreams are held in Stream files. In Need For Speed II SE, the first game to use RefPack, RefPack data is found in Gimex (GMX) files, and in Command & Conquer 3 it is found in BIG files. Other games may keep RefPack-compressed data in standalone files with no container.

KUDr is credited with reverse-engineering RefPack for Command & Conquer 3.

The parameters of RefPack are occasionally tweaked on a per-game basis (as observed in SimCity 4). The Sims Online uses the values defined in EA's default implementation of RefPack, and it is these values that this article describes.

Functions
In The Sims Online New & Improved Trial version, the RefPack decompression code is duplicated verbatim across many of the game DLLs. The copy used for decompressing tuning.dat and censor.dat is located at TSOServiceClientD_base+0x724fd.

Following is a verbatim translation of TSOServiceClientD_base+0x724fd from x86 assembly into C.

Here is a safe reimplementation of the function:

Bitstream specification
In this article, bytes are rendered in big-endian bit order, that is, from the most significant bit of the first byte to the least significant bit of the first byte, followed by the most significant bit of the second byte, and so on.

Header
The RefPack header begins with a 1-byte Flags field, followed by a 1-byte magic number equal to 0xFB:

The large-files flag ("L") specifies that the decompressed size field and (if applicable) the compressed size field are 4-byte fields; if this flag is unset, both of these fields are 3-byte fields. The purpose of the unknown flag ("U") has not been determined. The compressed-size-present flag ("C") specifies that the compressed size field is present.

Note that The Sims Online only recognizes the compressed-size-present flag; the large-files flag and the unknown flag were not introduced until SimCity 4 and Command & Conquer 3.

Afterwards, if the compressed-size-present flag is both set and recognized by the decoder, the compressed size field follows in big-endian format. It is unknown if this field starts counting from the beginning of the RefPack bitstream or from after this field, because no file in the game specifies this field, and the game appears to always ignore this field.

Afterwards, the decompressed size field follows in big-endian format.
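The header layout described above can be sketched in C as follows. This is a minimal illustration, not the game's code. The flag bit positions (0x80 for "L", 0x01 for "C") are assumptions taken from the publicly released RefPack.cpp, since the text above names the flags but not their bits. The refpack_header struct, the refpack_parse_header function and its recognize_large_files parameter are hypothetical names introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions (from the public RefPack.cpp, not from this article). */
#define REFPACK_FLAG_LARGE_FILES     0x80u  /* "L" */
#define REFPACK_FLAG_COMPRESSED_SIZE 0x01u  /* "C" */
#define REFPACK_MAGIC                0xFBu

/* Hypothetical container for the parsed header fields. */
typedef struct {
    uint8_t  flags;
    int      has_compressed_size;
    uint32_t compressed_size;    /* meaningful only if has_compressed_size */
    uint32_t decompressed_size;
    size_t   header_length;      /* number of bytes the header occupies */
} refpack_header;

/* Read an n-byte big-endian unsigned integer (n is 3 or 4 here). */
static uint32_t read_be(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the input is truncated or the magic is wrong.
   Passing recognize_large_files = 0 mimics a decoder that only knows "C". */
int refpack_parse_header(const uint8_t *in, size_t in_len,
                         int recognize_large_files, refpack_header *out)
{
    if (in_len < 2 || in[1] != REFPACK_MAGIC)
        return -1;

    /* Size fields are 3 bytes wide unless "L" is set and recognized. */
    size_t width = 3;
    if (recognize_large_files && (in[0] & REFPACK_FLAG_LARGE_FILES))
        width = 4;

    size_t pos = 2;
    out->flags = in[0];
    out->has_compressed_size = (in[0] & REFPACK_FLAG_COMPRESSED_SIZE) != 0;
    out->compressed_size = 0;

    /* Optional compressed size field, big-endian. */
    if (out->has_compressed_size) {
        if (in_len - pos < width)
            return -1;
        out->compressed_size = read_be(in + pos, width);
        pos += width;
    }

    /* Decompressed size field, big-endian. */
    if (in_len - pos < width)
        return -1;
    out->decompressed_size = read_be(in + pos, width);
    pos += width;

    out->header_length = pos;
    return 0;
}
```

The first command of the bitstream would then start at offset header_length. Because the meaning of the compressed size field is unknown, the sketch only stores it and does not use it for bounds checking.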

Commands
Following the header is a series of commands, each containing a 1- to 4-byte opcode followed by its proceeding data.

Each opcode specifies three parameters:
 * Proceeding data length - The length of the data that follows the opcode and which is to be appended to the output buffer; this data is called the proceeding data.
 * Referenced data length - The length of the data which is to be copied one byte at a time from earlier in the output buffer and appended to the output buffer directly after the proceeding data; this data is called the referenced data.
 * Referenced data distance - The distance to the beginning of the referenced data from the current position in the output buffer, after the proceeding data.

As in other LZ77-family algorithms, the referenced data's source pointer may run past the initial value of the destination pointer. This happens when Referenced data length is longer than Referenced data distance. This is considered legal: in this case, the decoder shall copy one byte at a time, or, equivalently, the decoder shall copy and paste the first Referenced data distance bytes of the referenced data repeatedly until the correct length is met.

The Referenced data distance field of any command must not be greater than the total number of bytes in the output buffer at the time of the reference copy. (If this rule is violated, the decoder in The Sims Online will break.)
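As a rough sketch of how a decoder might apply one already-parsed command under the rules above, the following C function appends the proceeding data, checks the reference distance, and then copies the referenced data one byte at a time so that overlapping references repeat correctly. The refpack_command struct and the refpack_apply function are hypothetical names for illustration. Returning an error on a bad distance is a design choice of this sketch, not the behavior of the decoder in The Sims Online.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three parameters of one command. */
typedef struct {
    size_t proceeding_length;   /* literal bytes following the opcode  */
    size_t reference_length;    /* bytes copied from earlier output    */
    size_t reference_distance;  /* how far back the copy source starts */
} refpack_command;

/* Apply one command.
   in      - points at the proceeding data (just past the opcode)
   in_len  - number of bytes available at in
   out     - output buffer of out_cap bytes
   out_pos - bytes written so far (at most out_cap); updated on success
   Returns 0 on success, -1 if the command is invalid or would overflow. */
int refpack_apply(const refpack_command *cmd,
                  const uint8_t *in, size_t in_len,
                  uint8_t *out, size_t out_cap, size_t *out_pos)
{
    size_t pos = *out_pos;

    /* Step 1: append the proceeding data. */
    if (cmd->proceeding_length > in_len ||
        cmd->proceeding_length > out_cap - pos)
        return -1;
    for (size_t i = 0; i < cmd->proceeding_length; i++)
        out[pos++] = in[i];

    /* Step 2: append the referenced data. */
    if (cmd->reference_length > 0) {
        /* The distance is measured after the proceeding data and must
           not reach back past the start of the output buffer. */
        if (cmd->reference_distance == 0 || cmd->reference_distance > pos)
            return -1;
        if (cmd->reference_length > out_cap - pos)
            return -1;

        const size_t src = pos - cmd->reference_distance;
        /* Byte-by-byte on purpose: when length > distance the source
           overlaps the destination and must see freshly written bytes.
           memcpy would be undefined here and memmove would not repeat. */
        for (size_t i = 0; i < cmd->reference_length; i++)
            out[pos + i] = out[src + i];
        pos += cmd->reference_length;
    }

    *out_pos = pos;
    return 0;
}
```

For example, with an empty output buffer, proceeding data "ab", a reference length of 5 and a reference distance of 2, the output becomes "abababa": the two-byte pattern is repeated until five referenced bytes have been written.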

Additionally, the stop command terminates the decoder after the command's proceeding data has been copied. The stop command must appear exactly once after all ordinary (non-stop) commands. (If the stop command does not appear, the decoder in The Sims Online will break.)

SimCity 4 uses a modified definition for the 4-byte opcode; it appears that this is the only game to use a modified version of RefPack.

1-byte command
The 1-byte command follows one of two definitions: the ordinary 1-byte command and the stop command.

If the numerical value of the opcode byte is less than 0xFC, then the command is the ordinary 1-byte command and follows this format:

Otherwise, if the numerical value of the opcode is greater than or equal to 0xFC, then the command is the stop command and follows this format:
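The distinction between the two 1-byte commands can be sketched in C as below. The 0xFC threshold comes from the text above. The rest of the bit layout is an assumption based on the publicly released RefPack.cpp: the top three bits of a 1-byte opcode are set, the ordinary command stores its length in the low five bits as ((opcode & 0x1F) + 1) * 4, and the stop command stores its length in the low two bits. Neither command references earlier data. The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    REFPACK_NOT_ONE_BYTE,  /* opcode starts a 2-, 3- or 4-byte command */
    REFPACK_ORDINARY,      /* ordinary 1-byte command */
    REFPACK_STOP           /* stop command */
} refpack_one_byte_kind;

/* Classify an opcode byte and, for 1-byte commands, report the
   proceeding data length. The referenced data length is 0 for both. */
refpack_one_byte_kind refpack_classify_one_byte(uint8_t opcode,
                                                size_t *proceeding_length)
{
    /* Assumed: 1-byte opcodes have their top three bits set (0xE0..0xFF). */
    if ((opcode & 0xE0u) != 0xE0u)
        return REFPACK_NOT_ONE_BYTE;

    if (opcode < 0xFCu) {
        /* Ordinary command: 4 to 112 literal bytes, in multiples of 4. */
        *proceeding_length = (size_t)((opcode & 0x1Fu) + 1u) << 2;
        return REFPACK_ORDINARY;
    }

    /* Stop command: 0 to 3 trailing literal bytes, then decoding ends. */
    *proceeding_length = opcode & 0x03u;
    return REFPACK_STOP;
}
```

Under these assumptions, opcode 0xE0 yields 4 proceeding bytes, 0xFB yields 112, and 0xFC through 0xFF are stop commands carrying 0 to 3 proceeding bytes.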

Naming notes
The original RefPack.cpp written by Frank Barchard was released by WCNews: http://download.wcnews.com/files/documents/sourcecode/shadowforce/transfer/asommers/mfcapp_src/engine/compress/RefPack.cpp

In The Sims Online, this file was split into refencode.cpp and refdecode.cpp, according to TSOGimexD.dll, although these files have not been leaked.