

Simis Jinx Binary File Format

This is the second of the 2nd-level (inner) formats for the Simis file format used by Microsoft Train Simulator.

Unlike the text format, the binary format has a simple binary header - but it should look rather familiar. It has the same 16 characters as the text header, but this time they're encoded as single bytes, and one character is notably different: the 8th character is "b" for binary rather than "t" for text. (If a single-byte text format existed, this would be the only difference between the two headers.)

 00000010   4A 49 4E 58  30 .. .. 62  5F 5F 5F 5F  5F 5F 0D 0A   JINX0..b______..

Just like in the text format, the 2 missing bytes in the middle are the two characters (a letter then a number) that identify the 3rd-level format used. We will get to those next time.
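As a rough sketch of how the header could be checked in code (the function name is made up for illustration, and treating "JINX0" and the "______" plus CR LF tail as fixed is an assumption based only on the bytes shown above):

```python
def parse_jinx_header(header: bytes):
    """Split a 16-byte single-byte Jinx header into (format_id, kind).

    Hypothetical helper: the fixed parts checked here are taken from
    the hex dump in this post, not from any official specification.
    """
    if len(header) != 16:
        raise ValueError("expected exactly 16 bytes")
    if header[0:5] != b"JINX0":
        raise ValueError("missing JINX0 signature")
    if header[8:16] != b"______\r\n":
        raise ValueError("unexpected header padding")
    format_id = header[5:7].decode("ascii")  # letter then number, e.g. "i0"
    kind = header[7:8].decode("ascii")       # "b" = binary, "t" = text
    return format_id, kind


# Header bytes from the capview.iom example later in this post.
example = bytes.fromhex("4A 49 4E 58 30 69 30 62 5F 5F 5F 5F 5F 5F 0D 0A")
print(parse_jinx_header(example))
```

For the example bytes this yields the 3rd-level identifier "i0" and the kind "b".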

Note: If the 1st-level format is using compression, this header is the first item within the compressed stream; for simplicity, all my examples will be for uncompressed files.

Now follows the actual data...

As I started to explain last time, Simis Jinx files are basic trees; the binary format is nothing more than an alternative representation of the same data. It is, however, more of a challenge to read and write correctly - something the 3rd-level formats will help deal with.

Each node in the tree has an 8 byte header, consisting of a 4 byte unsigned integer identifying the node's type and a 4 byte unsigned integer specifying the length of the contents. The contents start with an optional name for the node (1 byte for the length plus UTF16-LE characters), followed by the node's child values and nodes. The 1 byte name length is counted as part of the contents.
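A minimal sketch of reading one node header, assuming little-endian integers (which matches the example bytes in this post) and assuming the name length counts characters rather than bytes (a guess, by analogy with string values); the function name is hypothetical:

```python
import struct


def read_node_header(buf: bytes, offset: int = 0):
    """Return (type, length, name, children_start, contents_end).

    Assumptions: little-endian fields; the 1-byte name length counts
    UTF-16 characters, so the name occupies 2 * name_len bytes.
    """
    node_type, length = struct.unpack_from("<II", buf, offset)
    contents_start = offset + 8            # contents follow the 8 byte header
    name_len = buf[contents_start]         # 0 means the node has no name
    children_start = contents_start + 1 + 2 * name_len
    name = buf[contents_start + 1:children_start].decode("utf-16-le")
    contents_end = contents_start + length  # length covers name and children
    return node_type, length, name, children_start, contents_end


# Type 99, 612 bytes of contents, no name.
print(read_node_header(bytes.fromhex("63 00 00 00 64 02 00 00 00")))
```

Knowing `contents_end` up front is what lets a reader tell where a node's children stop, since nothing in the data marks the end of a node.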

There are a number of common types of value included:

  • Unsigned integer (4 bytes).
  • Signed integer (4 bytes).
  • Floating-point number (4 bytes).
  • String (2 bytes for length plus UTF16-LE characters).
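The four value types above could be read with helpers along these lines. This is only a sketch: the function names are invented, little-endian byte order and IEEE 754 single-precision floats are assumptions, and the string length is taken to count characters (as the example further down suggests).

```python
import struct


def read_uint(buf: bytes, offset: int):
    """4 byte unsigned integer; returns (value, next_offset)."""
    return struct.unpack_from("<I", buf, offset)[0], offset + 4


def read_int(buf: bytes, offset: int):
    """4 byte signed integer; returns (value, next_offset)."""
    return struct.unpack_from("<i", buf, offset)[0], offset + 4


def read_float(buf: bytes, offset: int):
    """4 byte float (assumed IEEE 754 single); returns (value, next_offset)."""
    return struct.unpack_from("<f", buf, offset)[0], offset + 4


def read_string(buf: bytes, offset: int):
    """2 byte character count, then that many UTF-16LE characters."""
    (count,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + 2 * count  # two bytes per character
    return buf[start:end].decode("utf-16-le"), end
```

Each helper returns the decoded value together with the offset of the next unread byte, so calls can be chained while walking a node's contents.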

Let's have a look at an example, GLOBAL\capview.iom, but remember that to correctly parse this I am using the 3rd-level format:

 00000000   53 49 4D 49  53 41 40 40  40 40 40 40  40 40 40 40   SIMISA@@@@@@@@@@

The standard 1st-level header...

 00000010   4A 49 4E 58  30 69 30 62  5F 5F 5F 5F  5F 5F 0D 0A   JINX0i0b______..

2nd-level header indicating a binary version of 3rd-level format 'i0'.

 00000020   63 00 00 00  64 02 00 00                             c...d...

Node type is 99 (0x63), contents size is 612 bytes (0x264).

 00000020                             00                                 .

Node has no name.

 00000020                                01 00 00  00 00 00 00            .......
 00000030   00                                                   .

Two unsigned integer values: 1 and 0.

 00000030      64 00 00  00 35 00 00  00 00                       d...5....

Node type is 100 (0x64), contents size is 53 bytes (0x35) and there's no name.

 00000030                                   CB 00  01 00                   ....

Unsigned integer value: 65,739 (0x100CB).
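The byte order matters here: the least significant byte comes first (little-endian). A quick check of the arithmetic:

```python
raw = bytes.fromhex("CB 00 01 00")

# Little-endian: the first byte is the least significant.
value = int.from_bytes(raw, "little")

# The same calculation written out by hand.
by_hand = 0xCB + (0x00 << 8) + (0x01 << 16) + (0x00 << 24)

print(value, hex(value))  # 65739 0x100cb
```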

 00000030                                                15 00                 ..
 00000040   6E 00 75 00  64 00 67 00  65 00 5F 00  63 00 61 00   n.u.d.g.e._.c.a.
 00000050   62 00 63 00  6F 00 6E 00  74 00 72 00  6F 00 6C 00   b.c.o.n.t.r.o.l.
 00000060   5F 00 6C 00  65 00 66 00  74 00                      _.l.e.f.t.

String value: length of 21 (0x15) plus 21 UTF16-LE characters "nudge_cabcontrol_left".
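Note that the length prefix counts characters, not bytes: 21 characters take 42 bytes. Decoding the bytes from the dump directly shows this:

```python
raw = bytes.fromhex(
    "15 00 "
    "6E 00 75 00 64 00 67 00 65 00 5F 00 63 00 61 00 "
    "62 00 63 00 6F 00 6E 00 74 00 72 00 6F 00 6C 00 "
    "5F 00 6C 00 65 00 66 00 74 00"
)

count = int.from_bytes(raw[:2], "little")         # 2 byte length prefix
text = raw[2:2 + 2 * count].decode("utf-16-le")   # two bytes per character

print(count, text, len(raw))  # 21 nudge_cabcontrol_left 44
```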

 00000060                                   00 00  00 00                   ....

Unsigned integer value: 0.

The 1 byte for the empty name, 8 bytes for the two unsigned integer values and 44 bytes for the string add up to 53 bytes of contents - that means we have reached the end of this type 100 node.
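The same bookkeeping can be done with offsets. The start offset 0x31 is read off the hex dump above; the variable names are just for illustration:

```python
name_bytes = 1                 # the single 00 byte meaning "no name"
uint_bytes = 2 * 4             # two unsigned integer values
string_bytes = 2 + 21 * 2      # length prefix plus 21 UTF-16LE characters

total = name_bytes + uint_bytes + string_bytes

header_start = 0x31                  # where this node's 8 byte header begins
contents_start = header_start + 8    # first byte after the header
contents_end = contents_start + total

print(total, hex(contents_end))  # 53 0x6e
```

Offset 0x6E is where the next node's header appears in the dump.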

 00000060                                                64 00                 d.
 00000070   00 00 37 00  00 00 00                                ..7....

Node type is 100 (0x64), contents size is 55 bytes (0x37) and there's no name.

Writing the above in text format would give:

<node type 99> (
    1
    0
    <node type 100> (
        65739
        nudge_cabcontrol_left
        0
    )
    <node type 100> (
...

It's clear that we're missing the node type names found in the text files, and there's no indication of whether a given 4 bytes are the start of a new node, an integer or a float. I found that heuristics can work for this some of the time, but in the end this is what the 3rd-level format is for.
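To make that concrete, here is a sketch that decodes the first type 100 node from the dump using a hand-written field list. The `FIELDS` table is a hypothetical stand-in for what a 3rd-level format would supply; it is not the real 'i0' definition, and little-endian byte order is assumed throughout.

```python
import struct

# Bytes 0x31..0x6D of the dump: the first node of type 100.
NODE = bytes.fromhex(
    "64 00 00 00 35 00 00 00 00 "
    "CB 00 01 00 "
    "15 00 "
    "6E 00 75 00 64 00 67 00 65 00 5F 00 63 00 61 00 "
    "62 00 63 00 6F 00 6E 00 74 00 72 00 6F 00 6C 00 "
    "5F 00 6C 00 65 00 66 00 74 00 "
    "00 00 00 00"
)

# Hypothetical field list: which value kinds a node type contains, in order.
FIELDS = {100: ("uint", "string", "uint")}


def decode_node(buf: bytes, offset: int = 0):
    """Decode one childless node using FIELDS; returns (type, values)."""
    node_type, length = struct.unpack_from("<II", buf, offset)
    pos = offset + 8
    end = pos + length
    name_len = buf[pos]
    pos += 1 + 2 * name_len  # skip the name (assumes length counts characters)
    values = []
    for kind in FIELDS[node_type]:
        if kind == "uint":
            values.append(struct.unpack_from("<I", buf, pos)[0])
            pos += 4
        elif kind == "string":
            (count,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            values.append(buf[pos:pos + 2 * count].decode("utf-16-le"))
            pos += 2 * count
        else:
            raise ValueError("unknown field kind: " + kind)
    if pos != end:
        raise ValueError("field list does not match node length")
    return node_type, values


print(decode_node(NODE))
```

Without the field list there is no way to know that the third group of bytes is a string rather than, say, a small integer followed by more data; the final length check only confirms that a guess is self-consistent.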

Other 2nd-level formats

I've covered the two main Simis 2nd-level formats, but there are others; most notably, the texture files (.ace) are wrapped in a 1st-level Simis header with their own format inside. I won't be covering these other formats soon, as there are already tools that can handle these files sufficiently for Microsoft Train Simulator's needs, and the 3rd-level Simis formats are more interesting anyway.

Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 11:07PM on Sunday, 09 May, 2010

Powered by the Content Parser System, copyright 2002 - 2022 James G. Ross.