Difference between revisions of "BIN (File Format)"

From MK8
(Unknown header field is header size)
(Update article after revisiting format with current research)
Line 1: Line 1:
'''BIN''' is a file format used in [[Mario Kart 8]] to store binary lookup tables. They consist of several sections as defined in the file / main header, in which there are groups containing specific elements of data of a given type. They are found in the [[Filesystem/content/common/mush]] directory.
+
'''BIN''' is a file format used in [[Mario Kart 8]] to store binary lookup tables, like item probabilities at distances to the lead racer, the distances for this themselves, course information, audio information, controller mappings (possibly), kart body / tire / glider settings, engine statistics or (in Mario Kart 8 Deluxe) software racer skill (AI) and user interface configuration.
  
The format is similar to the one used in '''Mario Kart 7''', though it strips a lot of textual data describing the entries of the groups, which makes giving meaning to the values of the elements harder, though many elements are simply evolved or even ported from Mario Kart 7.
+
BIN files are found in the [[Filesystem/content/common/mush]] directory.
  
They are used to lookup typical tabular data like the item probabilities at distances to the lead racer, the distances for this themselves, course information, audio information, controller mappings (possibly), kart body / tire / glider settings or engine statistics.
+
== Format ==
  
= Format =
+
Each BIN file consists of several sections as defined in the file header. The sections have a unique identifier, which the game uses to determine how to parse the data available after the section header.
  
Mario Kart 8 (Wii U) stores the files in big endian, Mario Kart 8 Deluxe (Switch) in little endian.
+
Mario Kart 8 (Wii U) stores the files in big endian, Mario Kart 8 Deluxe in little endian.
  
The generic file format used in both games only differs in the section header, with Deluxe storing additional 4 bytes of unknown purpose. Of course, the element counts in specific sections also differ to store new / different driver and kart statistics, but this information is not known or required at the base format layer.
+
The format is similar to the one used in '''Mario Kart 7''', though it strips a lot of textual data describing the entries of the groups, which makes giving meaning to the values of the elements harder, though many elements are simply evolved or even ported from Mario Kart 7.
  
 
In the following, C# data types are used to describe the data.
 
In the following, C# data types are used to describe the data.
  
== Header ==
+
=== File Header ===
  
 
Each BIN file starts with a main header which provides information about the available sections in the file.
 
Each BIN file starts with a main header which provides information about the available sections in the file.
 
{| class="wikitable"
 
{| class="wikitable"
! Offset
+
! Offset !! Size !! Type !! Description
! Size
+
! Type
+
! Description
+
 
|-
 
|-
| 0x00
+
| 0x00 || 4 || uint || '''File identifier'''. Takes the form of a 4 character ASCII string acronym of the file's purpose.
| 4
+
| uint
+
| '''File identifier'''. Takes the form of a 4 character long ASCII string acronym of the file's purpose.
+
 
|-
 
|-
| 0x04
+
| 0x04 || 4 || int || '''File size''' in bytes. Sometimes slightly off from the real file size, unclear why this is the case.
| 4
+
| int
+
| '''File size''' in bytes. Sometimes slightly off from the real file size, unclear why this is the case.
+
 
|-
 
|-
| 0x08
+
| 0x08 || 2 || short || '''Number of sections''' following the header.
| 2
+
| short
+
| '''Number of sections''' following the header.
+
 
|-
 
|-
| 0x0A
+
| 0x0A || 2 || short || '''Header size'''. Required to compute absolute offsets to the sections (s. below).
| 2
+
| short
+
| '''Header size'''. Required to compute offsets to sections (s. below).
+
 
|-
 
|-
| 0x0C
+
| 0x0C || 4 || int || '''Version number'''. Always 1000. Can be used to determine endianness.
| 4
+
| int
+
| '''Version number'''. Always 1000.
+
 
|-
 
|-
| 0x10
+
| 0x10 || 4 * Number of sections || int[numberOfSections] || '''Offsets to the sections''', relative to the end of the header.
| 4 * Number of sections
+
| int[numberOfSections]
+
| '''Offsets to the sections''', relative to the end of the header.
+
This means that the first offset is always 0, as the first section always starts directly after the header.
+
 
|}
 
|}
  
== Section ==
+
=== Section ===
 +
 
 +
A section begins with a section header which stores a unique identifier and some parameters required to parse the following section data. The start of each section is aligned by 4 bytes.
  
Each section header at the offsets given in the main header describes the number of groups of data in the section, the number of elements in each group, and the type of the elements.
 
 
{| class="wikitable"
 
{| class="wikitable"
! Offset
+
! Offset !! Size !! Type !! Description
! Size
+
! Type
+
! Description
+
 
|-
 
|-
| 0x00
+
| 0x00 || 4 || uint || '''Section identifier'''. Takes the form of a 4 character ASCII string acronym of the section's purpose.
| 4
+
| uint
+
| '''Section identifier'''. Takes the form of a 4 character long ASCII string acronym of the section's purpose.
+
 
|-
 
|-
| 0x04
+
| 0x04 || 2 || short || '''Count''' parameter. Typically used to specify the number of elements in the data arrays (s. next parameter).
| 2
+
| short
+
| '''Element count'''. The number of elements in a group.
+
 
|-
 
|-
| 0x06
+
| 0x06 || 2 || short || '''Repeat''' parameter. Typically used to specify the number of data arrays in this section.
| 2
+
| short
+
| '''Group count'''. The number of groups in the section.
+
 
|-
 
|-
| 0x08
 
| 4
 
| int
 
| '''Type ID'''. Depending on this value, the format of the section data is different.
 
 
|- bgcolor="#AAAAFF"
 
|- bgcolor="#AAAAFF"
| colspan="4" align="center" | '''if Mario Kart 8 Deluxe'''
+
| colspan="4" align="center" | '''if Mario Kart 8'''
 
|- bgcolor="#DDDDFF"
 
|- bgcolor="#DDDDFF"
| 0x0C
+
| 0x08 || 4 || int || '''Additional''' parameter. Can be used to assume the type of data following, but this is often not correct.
| 4
+
|-
| int
+
|- bgcolor="#FFAAFF"
| {{Unknown|Unknown value, might just be a longer type ID.}}
+
| colspan="4" align="center" | '''if Mario Kart 8 Deluxe'''
 +
|- bgcolor="#FFDDFF"
 +
| 0x08 || 4 || int || '''Unknown''' parameter.
 +
|- bgcolor="#FFDDFF"
 +
| 0x0C || 4 || int || '''Additional''' parameter. Can be used to assume the type of data following, but this is often not correct.
 
|}
 
|}
  
Groups are simply arrays of the given number of elements. Depending on the type ID provided in the section header, the format (and thus size) of the elements differs. The type IDs seem to be the same as the ones used in Mario Kart 7.
+
=== Section Data ===
  
The start and end of each section is aligned by 4 bytes, which is important for those sections containing strings.
+
Depending on the ''Section identifier'', the game knows how to interpret the data that follows in the section, with the help of the common parameters given in the section header. It can be assumed that the data is copied directly into console memory and then cast to structures. Thus, it is required to know which ''Section identifier'' maps to which structure.
  
=== Dword Array (Type 0) ===
+
However, even without knowing these structures, several generic section data formats are seen, which are described in the following. This list is not exhaustive.
  
The group elements are arrays with 4-byte numerical data types (either integer or floats, or both mixed) of a specific length. The game knows the data type of each element and array length by switching on the section identifier (it can be assumed it directly reads those into structures). Without this knowledge, the length of the element arrays can still be computed with the following formula:
+
==== 3-dimensional Dword Array ====
  
elementLength = sectionSizeWithoutHeader / (groupCount * elementCount) / sizeof(int)
+
The section data consists of a 3-dimensional array of 4-byte integer or float values. The length of each dimension is given in the section header, except for the last dimension, which has to be computed manually if the resulting structure is not known:
  
 
{| class="wikitable"
 
{| class="wikitable"
! Offset
+
! Dimension !! Length
! Size
+
! Type
+
! Description
+
 
|-
 
|-
| 0x00
+
| 0 || '''Repeat''' parameter given in section header.
| 4 * elementCount * elementLength
+
|-
| int[elementCount][elementLength] or
+
| 1 || '''Count''' parameter given in section header.
float[elementCount][elementLength]
+
|-
| '''Integer or float arrays'''.
+
| 2 || section data size / '''Repeat''' / '''Count''' / sizeof(int)
 
|}
 
|}
  
=== String Array (Type 160) ===
+
==== 2-dimensional String Array ====
 +
 
 +
The data in this section are arrays of offsets pointing to 0-terminated strings following them. The number of string arrays is given in the '''Repeat''' and the length of each array in the '''Count''' parameter of the section header. Each array is stored as follows:
  
The group elements are an array of offsets pointing to elements of the following array of 0-terminated strings.
 
 
{| class="wikitable"
 
{| class="wikitable"
! Offset
+
! Size !! Type !! Description
! Size
+
! Type
+
! Description
+
 
|-
 
|-
| 0x00
+
| 4 * elementCount || int[elementCount] || '''String offsets''', relative to the end of the last offset. E.g., the first offset is always 0.
| 4 * elementCount
+
| int[elementCount]
+
| '''String offsets''', relative to the end of the last offset. E.g., the first offset is always 0.
+
 
|-
 
|-
| -
+
| - || string[elementCount] || '''Strings''', 0-terminated, pointed to by the preceding offsets.
| -
+
| string[elementCount]
+
| '''Strings''', 0-terminated, pointed to by the preceding offsets.
+
 
|}
 
|}
  
=== Other Types ===
+
==== 2-dimensional Indexed String Array ====
  
There are several other types holding other values (most ending with a list of strings) which have not been covered here yet.
+
The data in this section are arrays of increasing indices and offsets pointing to 0-terminated strings following them. The number of string arrays is given in the '''Repeat''' and the length of each array in the '''Count''' parameter of the section header. Each array is stored as follows:
 +
 
 +
{| class="wikitable"
 +
! Size !! Type !! Description
 +
|-
 +
| 8 * elementCount || Entry[elementCount] || '''String entries''', each stored as follows:
 +
|- bgcolor="#DDDDDD"
 +
| 1 || Byte || '''Index''' of this string. Starts at 0 and increased by 1.
 +
|- bgcolor="#DDDDDD"
 +
| 3 || - || '''Padding'''.
 +
|- bgcolor="#DDDDDD"
 +
| 4 || int || '''String offset''', relative to the end of the last offset. E.g., the first offset is always 0.
 +
|-
 +
| - || string[elementCount] || '''Strings''', 0-terminated, pointed to by the preceding offsets.
 +
|}
  
= Tools =
+
== Tools ==
  
 
The following tools can operate on BIN files:
 
The following tools can operate on BIN files:

Revision as of 14:00, 13 September 2017

BIN is a file format used in Mario Kart 8 to store binary lookup tables, such as item probabilities at given distances to the lead racer, those distances themselves, course information, audio information, (possibly) controller mappings, kart body / tire / glider settings, engine statistics or (in Mario Kart 8 Deluxe) software racer skill (AI) and user interface configuration.

BIN files are found in the Filesystem/content/common/mush directory.

Format

Each BIN file consists of several sections as defined in the file header. Each section has a unique identifier, which the game uses to determine how to parse the data that follows the section header.

Mario Kart 8 (Wii U) stores the files in big endian, while Mario Kart 8 Deluxe (Switch) stores them in little endian.

The format is similar to the one used in Mario Kart 7, but it strips much of the textual data describing the entries of the groups. This makes it harder to give meaning to the values of the elements, although many elements have simply evolved or were even ported directly from Mario Kart 7.

In the following, C# data types are used to describe the data.

File Header

Each BIN file starts with a main header which provides information about the available sections in the file.

{| class="wikitable"
! Offset !! Size !! Type !! Description
|-
| 0x00 || 4 || uint || '''File identifier'''. A 4-character ASCII acronym of the file's purpose.
|-
| 0x04 || 4 || int || '''File size''' in bytes. Sometimes slightly off from the real file size; the reason is unclear.
|-
| 0x08 || 2 || short || '''Number of sections''' following the header.
|-
| 0x0A || 2 || short || '''Header size'''. Required to compute absolute offsets to the sections (see below).
|-
| 0x0C || 4 || int || '''Version number'''. Always 1000. Can be used to determine endianness.
|-
| 0x10 || 4 * Number of sections || int[numberOfSections] || '''Offsets to the sections''', relative to the end of the header.
|}
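
The header layout above can be read with a few lines of Python. The following is only a minimal sketch, not a reference implementation: the function name <code>parse_file_header</code> and the sample bytes are made up for illustration, and the file identifier is kept as raw bytes because its character order under little endian is not described here. Endianness is detected by checking which byte order yields the version number 1000.

```python
import struct

FIXED_HEADER_SIZE = 0x10  # bytes before the section offset table


def parse_file_header(data: bytes) -> dict:
    """Parse the main BIN header following the table above (sketch)."""
    # The version field at 0x0C is always 1000, so whichever byte order
    # yields 1000 is taken as the byte order of the file.
    for endian in (">", "<"):
        if struct.unpack_from(endian + "i", data, 0x0C)[0] == 1000:
            break
    else:
        raise ValueError("version field is not 1000 in either byte order")

    file_size, section_count, header_size, version = struct.unpack_from(
        endian + "ihhi", data, 0x04)
    relative = struct.unpack_from(
        f"{endian}{section_count}i", data, FIXED_HEADER_SIZE)
    return {
        "identifier": data[0:4],  # raw bytes (assumption, see text above)
        "endian": endian,
        "file_size": file_size,
        "section_count": section_count,
        "header_size": header_size,
        # Offsets are relative to the end of the header, so add its size.
        "section_offsets": [header_size + off for off in relative],
    }


# Synthetic header, not taken from a real game file: identifier "TEST",
# two sections, header size 0x18, relative section offsets 0 and 28.
sample = struct.pack(">4sihhi2i", b"TEST", 64, 2, 0x18, 1000, 0, 28)
header = parse_file_header(sample)
print(header["endian"], header["section_offsets"])
```

Returning absolute offsets keeps later section parsing independent of the header size.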

Section

A section begins with a section header which stores a unique identifier and some parameters required to parse the section data that follows. The start of each section is aligned to 4 bytes.

{| class="wikitable"
! Offset !! Size !! Type !! Description
|-
| 0x00 || 4 || uint || '''Section identifier'''. A 4-character ASCII acronym of the section's purpose.
|-
| 0x04 || 2 || short || '''Count''' parameter. Typically used to specify the number of elements in the data arrays (see next parameter).
|-
| 0x06 || 2 || short || '''Repeat''' parameter. Typically used to specify the number of data arrays in this section.
|- bgcolor="#AAAAFF"
| colspan="4" align="center" | '''if Mario Kart 8'''
|- bgcolor="#DDDDFF"
| 0x08 || 4 || int || '''Additional''' parameter. Can hint at the type of the data that follows, but this guess is often incorrect.
|- bgcolor="#FFAAFF"
| colspan="4" align="center" | '''if Mario Kart 8 Deluxe'''
|- bgcolor="#FFDDFF"
| 0x08 || 4 || int || '''Unknown''' parameter.
|- bgcolor="#FFDDFF"
| 0x0C || 4 || int || '''Additional''' parameter. Can hint at the type of the data that follows, but this guess is often incorrect.
|}
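
As an illustration of the two header variants, the sketch below reads a section header at a given absolute offset. The function name <code>parse_section_header</code>, the <code>deluxe</code> flag and the sample bytes are hypothetical. How a tool decides which variant applies is left to the caller; one plausible choice, given the endianness note above, is to treat little endian files as Deluxe.

```python
import struct


def parse_section_header(data: bytes, offset: int, endian: str,
                         deluxe: bool) -> dict:
    """Read one section header; `deluxe` selects the 16-byte variant."""
    identifier = data[offset:offset + 4]  # raw bytes, order not interpreted
    count, repeat = struct.unpack_from(endian + "hh", data, offset + 0x04)
    if deluxe:
        # Deluxe stores an unknown int before the additional parameter.
        unknown, additional = struct.unpack_from(
            endian + "ii", data, offset + 0x08)
        header_size = 0x10
    else:
        unknown = None
        (additional,) = struct.unpack_from(endian + "i", data, offset + 0x08)
        header_size = 0x0C
    return {
        "identifier": identifier,
        "count": count,
        "repeat": repeat,
        "unknown": unknown,
        "additional": additional,
        # Position of the first byte of section data.
        "data_offset": offset + header_size,
    }


# Synthetic headers with a made-up identifier "ABCD", Count 3, Repeat 2.
wii_u = struct.pack(">4shhi", b"ABCD", 3, 2, 0)
switch = struct.pack("<4shhii", b"ABCD", 3, 2, 7, 0)

section_u = parse_section_header(wii_u, 0, ">", deluxe=False)
section_dx = parse_section_header(switch, 0, "<", deluxe=True)
print(section_u["data_offset"], section_dx["data_offset"])
```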

Section Data

Depending on the Section identifier, the game knows how to interpret the data that follows in the section, with the help of the common parameters given in the section header. It can be assumed that the data is copied directly into console memory and then cast to structures. Fully decoding a section therefore requires knowing which structure each Section identifier maps to.

However, even without knowing these structures, several generic section data formats recur; they are described below. This list is not exhaustive.

3-dimensional Dword Array

The section data consists of a 3-dimensional array of 4-byte integer or float values. The length of each dimension is given in the section header, except for the last dimension, which has to be computed manually if the resulting structure is not known:

{| class="wikitable"
! Dimension !! Length
|-
| 0 || '''Repeat''' parameter given in section header.
|-
| 1 || '''Count''' parameter given in section header.
|-
| 2 || section data size / '''Repeat''' / '''Count''' / sizeof(int)
|}
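
The computation of the last dimension can be sketched as follows. This assumes the caller has already isolated the section data bytes (without the section header) and knows whether the values should be read as integers or floats. The name <code>read_dword_array</code> and the sample values are illustrative only.

```python
import struct


def read_dword_array(section_data: bytes, repeat: int, count: int,
                     endian: str = ">", as_float: bool = False) -> list:
    """Split section data into a [repeat][count][length] nested list."""
    if repeat <= 0 or count <= 0:
        raise ValueError("Repeat and Count must be positive")
    # Last dimension: section data size / Repeat / Count / sizeof(int).
    length = len(section_data) // repeat // count // 4
    code = "f" if as_float else "i"
    values = iter(struct.unpack_from(
        f"{endian}{repeat * count * length}{code}", section_data, 0))
    # Values are consumed in file order, innermost dimension first.
    return [[[next(values) for _ in range(length)]
             for _ in range(count)]
            for _ in range(repeat)]


# Synthetic data: 2 arrays of 3 elements with 4 values each (24 ints).
sample_data = struct.pack(">24i", *range(24))
table = read_dword_array(sample_data, repeat=2, count=3)
print(len(table), len(table[0]), len(table[0][0]))
print(table[1][2])
```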

2-dimensional String Array

The data in this section are arrays of offsets pointing to 0-terminated strings following them. The number of string arrays is given in the Repeat parameter and the length of each array (elementCount) in the Count parameter of the section header. Each array is stored as follows:

{| class="wikitable"
! Size !! Type !! Description
|-
| 4 * elementCount || int[elementCount] || '''String offsets''', relative to the end of the offset array, so the first offset is always 0.
|-
| - || string[elementCount] || '''Strings''', 0-terminated, pointed to by the preceding offsets.
|}
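
A single string array of this kind could be decoded as sketched below. The function name <code>read_string_array</code>, the ASCII decoding and the sample strings are assumptions for illustration. How consecutive arrays follow each other when Repeat is greater than 1 is not covered here, so the sketch reads only one array.

```python
import struct


def read_string_array(data: bytes, start: int, count: int,
                      endian: str = ">") -> list:
    """Read one array of `count` offsets and the strings they point to."""
    offsets = struct.unpack_from(f"{endian}{count}i", data, start)
    # Offsets are relative to the end of the offset array.
    base = start + 4 * count
    strings = []
    for offset in offsets:
        begin = base + offset
        end = data.index(b"\x00", begin)  # position of the 0 terminator
        # ASCII is an assumption; the encoding is not stated in the text.
        strings.append(data[begin:end].decode("ascii"))
    return strings


# Synthetic array with three made-up strings.
blob = b"mario\x00luigi\x00peach\x00"
sample_array = struct.pack(">3i", 0, 6, 12) + blob
names = read_string_array(sample_array, start=0, count=3)
print(names)
```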

2-dimensional Indexed String Array

The data in this section are arrays of entries, each pairing an increasing index with an offset pointing to a 0-terminated string following the entries. The number of string arrays is given in the Repeat parameter and the length of each array (elementCount) in the Count parameter of the section header. Each array is stored as follows:

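{| class="wikitable"
! Size !! Type !! Description
|-
| 8 * elementCount || Entry[elementCount] || '''String entries''', each stored as follows:
|- bgcolor="#DDDDDD"
| 1 || Byte || '''Index''' of this string. Starts at 0 and increases by 1 with each entry.
|- bgcolor="#DDDDDD"
| 3 || - || '''Padding'''.
|- bgcolor="#DDDDDD"
| 4 || int || '''String offset''', relative to the end of the last entry, so the first offset is always 0.
|-
| - || string[elementCount] || '''Strings''', 0-terminated, pointed to by the preceding offsets.
|}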
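
The indexed variant differs only in its 8-byte entries. The sketch below follows the entry layout exactly as listed (index byte first, then three padding bytes, then the offset). The name <code>read_indexed_string_array</code>, the ASCII decoding and the sample strings are again assumptions, and only a single array is read.

```python
import struct


def read_indexed_string_array(data: bytes, start: int, count: int,
                              endian: str = ">") -> list:
    """Read `count` (index, string) pairs from one indexed string array."""
    # Offsets are relative to the end of the entry array.
    base = start + 8 * count
    result = []
    for i in range(count):
        # B = index byte, 3x = three padding bytes, i = string offset.
        index, offset = struct.unpack_from(
            endian + "B3xi", data, start + 8 * i)
        begin = base + offset
        end = data.index(b"\x00", begin)
        # ASCII is an assumption; the encoding is not stated in the text.
        result.append((index, data[begin:end].decode("ascii")))
    return result


# Synthetic array with two made-up strings.
entries = struct.pack(">B3xiB3xi", 0, 0, 1, 5)
sample_indexed = entries + b"kart\x00bike\x00"
pairs = read_indexed_string_array(sample_indexed, start=0, count=2)
print(pairs)
```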

Tools

The following tools can operate on BIN files: