Silver's Simple Site - Weblog - Tags - Format

Simis File Format

Microsoft Train Simulator was made by Kuju Entertainment, formed by a management buyout of the Simis game studio in 1998, according to Wikipedia. Many of the files that form part of the game are in one of a number of formats bearing the "SIMISA" signature. For the purposes of discussing the file formats, I will only label the outer-most format the "Simis file format"; inner formats will be identified separately, later.

The Simis file format is identified a 16 byte or longer signature and header, with the remainder of the file being an inner format/data. The formats can all be identified by reading the first 8 bytes.

Uncompressed Simis File Format

 00000000   53 49 4D 49  53 41 40 40  40 40 40 40  40 40 40 40   SIMISA@@@@@@@@@@

Uncompressed files are simple; 16 fixed bytes ("SIMISA@@@@@@@@@@"), followed by the inner format/data.

Compressed Simis File Format

 00000000   53 49 4D 49  53 41 40 46  .. .. .. ..  40 40 40 40   SIMISA@F....@@@@
00000010   78 9C                                                x?

Compressed files are slightly more interesting; 8 fixed bytes ("SIMISA@F"), followed by a 4 byte unsigned integer containing the uncompressed size of the remaining data, and another 4 fixed bytes ("@@@@"). The data itself is compressed using zlib's DEFLATE algorithm, which is identified by the two bytes following the 16 bytes header; all subsequent data is the raw DEFLATE stream.

Unicode Text Simis File Format

 00000000   FF FE 53 00  49 00 4D 00  49 00 53 00  41 00 40 00   ÿþS.I.M.I.S.A.@.
00000010   40 00 40 00  40 00 40 00  40 00 40 00  40 00 40 00   @.@.@.@.@.@.@.@.
00000020   40 00                                                @.

This format is a variation on uncompressed, but for a specific type of inner format: Unicode text encoded with UTF16-LE. In this situation, two things are done to the header:

  • It is encoded as UTF16-LE characters like the inner format/data.
  • A UTF16-LE byte order mark is pre-pended.

Inner Formats

Whether the Simis file is compressed or not does not affect the inner formats allowed; any format may be compressed or uncompressed with no differences beyond that compression and the header indicating as much. It is unclear, however, how the Unicode text format would be compressed so, as I have not found any examples within Microsoft Train Simulator, I am considering them distinct at this level.

Permalink | Author: | Tags: Format, Games, Kuju, Microsoft, Simis, Train Simulator | Posted: 12:53AM on Friday, 09 April, 2010 | Comments: 0

Simis Jinx Unicode Text File Format

This is one of the 2nd-level (inner) formats for the Simis file format used by Microsoft Train Simulator.

As I mentioned last time, the 1st-level (outermost) format has an adjusted header when working with this 2nd-level format; in particular, the file starts with a UTF16-LE byte order mark and the header itself is Unicode text encoded as UTF-16LE. Here it is again:

 00000000   FF FE 53 00  49 00 4D 00  49 00 53 00  41 00 40 00   ÿþS.I.M.I.S.A.@.
00000010   40 00 40 00  40 00 40 00  40 00 40 00  40 00 40 00   @.@.@.@.@.@.@.@.
00000020   40 00                                                @.

The 2nd-level format identifies itself with its own header, unsurprisingly, starting with the text "JINX0", two values indicating the 3rd-level (inner inner) format, the text "t" indicating the Unicode text variety of this format, some padding (6 underscores) and a newline.

 00000020         4A 00  49 00 4E 00  58 00 30 00  .. .. .. ..     J.I.N.X.0.....
00000030   74 00 5F 00  5F 00 5F 00  5F 00 5F 00  5F 00 0D 00   t._._._._._._...
00000040   0A 00                                                ..

Those 4 missing bytes are the two characters (a letter then a number) that identify the 3rd-level format used. As text, the header of these files looks like the following line:


Now follows the actual data...

Simis Jinx files all store data in a tree-like structure, where each tree node has a type and optional name, and is intermixed with values. In other words, each node's children can be both nodes and values, including a mixture. The structure is marked out by parentheses ("(" and ")") for blocks, with values being raw or quoted strings (generally speaking, any value with no whitespace and no significant symbols - quotes, parentheses - can be left unquoted). Let's get straight to an example from Microsoft Train Simulator, the file GLOBAL\gui.txt:


io_dev ( KEYB 0
    io_map ( T                 "sounddialog"        ALL_UP SHIFT_DOWN )
    io_map ( ESCAPE            "escape"             ALL_UP )
    io_map ( F1                "Help"               ALL_UP )

The header has identified this as containing 3rd-level format "I0"; for now, though, let's focus on the 2nd-level format. There is a single root node of type "io_dev" (and it has no name) which contains:

  • The value "KEYB".
  • The value "0".
  • Three nodes of type "io_map", each containing a selection of values but no further nodes.

This is just a simple example, but the format is pretty easy to read (although a little tricky to parse correctly); the 3rd-level format actually defines which node types and what nesting of them is allowed and which values should be where.

Permalink | Author: | Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 04:48PM on Thursday, 22 April, 2010 | Comments: 0

Simis Jinx Binary File Format

This is the second of the 2nd-level (inner) formats for the Simis file format used by Microsoft Train Simulator.

Unlike the text format, the binary format has a simple binary header - but it should look rather familiar. The binary header has the same 16 characters as the text header, but this time they're encoded as single bytes and one of the characters is notably different: the 8th character is "b" for binary, rather than "t" for text (if there existed a single-byte text format this would have been the only difference with this format in the headers).

 00000010   4A 49 4E 58  30 .. .. 62  5F 5F 5F 5F  5F 5F 0D 0A   JINX0..b______..

Just like in the text format, the 2 missing bytes in the middle are the two characters (a letter then a number) that identify the 3rd-level format used. We will get to those next time.

Note: If the 1st-level format is using compression, this header is the first item within the compressed stream; for simplicity, all my examples will be for uncompressed files.

Now follows the actual data...

As I started talking about last time, Simis Jinx files are basic trees; the binary format is nothing more than an alternative representation of the same data. It is, however, more of a challenge to read and write correctly - something the 3rd-level formats will help deal with.

Each node in the tree has an 8 byte header, consisting of a 4 byte unsigned integer identifying the node's type and a 4 byte unsigned integer specifying the length of the contents. The contents consist of an optional name - 1 byte for length plus UTF16-LE characters - for the enclosing node and the child values and nodes.

There are a number of common types of value included:

  • Unsigned integer (4 bytes).
  • Signed integer (4 bytes).
  • Floating-point number (4 bytes).
  • String (2 bytes for length plus UTF16-LE characters).

Let's have a look at an example, GLOBAL\capview.iom, but remember that to correctly parse this I am using the 3rd-level format:

 00000000   53 49 4D 49  53 41 40 40  40 40 40 40  40 40 40 40   SIMISA@@@@@@@@@@

The standard 1st-level header...

 00000010   4A 49 4E 58  30 69 30 62  5F 5F 5F 5F  5F 5F 0D 0A   JINX0i0b______..

2nd-level header indicating a binary version of 3rd-level formal 'i0'.

 00000020   63 00 00 00  64 02 00 00                             c...d...

Node type is 99 (0x63), contents size is 612 bytes (0x264).

 00000020                             00                                 .

Node has no name.

 00000020                                01 00 00  00 00 00 00            .......
00000030   00                                                   .

Two unsigned integer values: 1 and 0.

 00000030      64 00 00  00 35 00 00  00 00                       d...5....

Node type is 100 (0x64), contents size is 53 bytes (0x35) and there's no name.

 00000030                                   CB 00  01 00                   ....

Unsigned integer value: 65,739 (0x100CB).

 00000030                                                15 00                 ..
00000040   6E 00 75 00  64 00 67 00  65 00 5F 00  63 00 61 00   n.u.d.g.e._.c.a.
00000050   62 00 63 00  6F 00 6E 00  74 00 72 00  6F 00 6C 00   b.c.o.n.t.r.o.l.
00000060   5F 00 6C 00  65 00 66 00  74 00                      _.l.e.f.t.

String value: length of 21 (0x15) plus 21 UTF16-LE characters "nudge_cabcontrol_left".

 00000060                                   00 00  00 00                   ....

Unsigned integer value: 0.

The 1 byte for no name, 8 bytes for two unsigned integer values plus 44 bytes for the string mean the total contents are up to 53 bytes - that means it is the end of this node type 100.

 00000060                                                64 00                 d.
00000070   00 00 37 00  00 00 00                                ..7....

Node type is 100 (0x64), contents size is 55 bytes (0x37) and there's no name.

Writing the above in text format would give:

 <node type 99> (
    <node type 100> (
    <node type 100> (

It's clear that we're missing the node type names found in the text files, and there's no indication whether some 4 bytes are a new node, an integer or float - some heuristics can work for this some of the time, I found, but in the end this is what the 3rd-level format is for.

Other 2nd-level formats

I've covered the two main Simis 2nd-level formats, but there are others; most notably, the texture files (.ace) are wrapped in a 1st-level Simis header with their own format inside. I won't be covering these other formats soon, as there are already tools that can handle these files sufficiently for Microsoft Train Simulator's needs, and the 3rd-level Simis formats are more interesting anyway.

Permalink | Author: | Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 11:07PM on Sunday, 09 May, 2010 | Comments: 0

Simis Jinx 3rd Level File Formats

The Simis file format with the 2nd-level Unicode text and binary Jinx formats are a pretty generic set of formats; they contain an arbitrarily nested tree structure with strings, integer and floating point numbers at any level. To actually interpret and describe the contents, a 3rd level of formats is needed.

As mentioned in both Simis Jinx Unicode Text File Format and Simis Jinx Binary File Format, this 3rd level of formats is identified by a letter and a number - and there are quite a lot of them. To actually define these formats in a useful way, though, we need to use another format - Backus-Naur Form (BNF). The exact format I've used is a variant of the standard Backus-Naur Form derived from the BNF files that shipped with Microsoft Train Simulator itself (in the UTILS\FFEDIT directory).

Train Simulator Backus-Naur Form

The BNF files are text; new lines have no significance; any of ASCII, UTF-8 and UTF-16 character encodings can be used, provided a byte order mark is included to identify UTF-8 and UTF-16. The files are made up of a number of definitions and productions - in any order - and a special termination marker.

Definitions specify a shared or standalone expression. Any other expression can reference it and has their reference expanded to the expression on the right-hand side of the equals ("=").

Productions specify, through the expression on the right of the arrow ("==>"), what is allowed/expected inside the block identified by the name on the left.

The expressions in both definitions and productions contain a space-separated list of items, each of which can be:

  • A string literal, e.g. "Activity".
  • A pre-defined data type, e.g. :sint. Available data types:
    • uint
    • sint
    • dword
    • string

    Data types can additionally be named, by including a comma and identifier after the type, e.g. :sint,TileX.

  • Another production or definition, e.g. :Tr_Activity.

There are three operators allowed within expressions:

  • Square brackets, denoting an optional section, e.g. [:Description].
  • Curly brackets, denoting a repeatable section (1 or more times), e.g. {:UiD :SidingItem}.
  • Pipe symbol, denoting a choice between sections, e.g. :Engine|:Wagon.

The choice operator (pipe) binds tighter than whitespace. Therefore, the expression :foo :bar|:baz means "foo followed by either bar or baz".

The end of an expression is denoted by a period (".").

Comments can be placed anywhere whitespace is allowed and use the common multi-line comment syntax of "/*" to start and "*/" to finish.

Termination of the BNF is indicated by the identifier "EOF". Everything after this is completely ignored.

3rd-level Format BNFs

Here's the current route car spawn.bnf as an example:

/* File format information */
FILE                          = :uint,Count [{:CarSpawnerItem}] .
FILE_NAME                     = "Route Car Spawn" .
FILE_EXT                      = "carspawn.dat" .
FILE_TYPE                     = "v" .
FILE_TYPE_VER                 = "1" .

/* Base types */
CarSpawnerItem                ==> :string :uint .

/* Format types */

EOF                           /* End of file */

All BNFs for the tools are required to have the five definitions shown above, so that the various programs can use them. FILE_TYPE and FILE_TYPE_VER are the letter and number (both as strings) used in all Simis Jinx files. FILE_EXT is either a file extension (e.g. "act") or a filename (e.g. "carspawn.dat") which selects which files can contain this format. FILE_NAME is a name suitable for displaying to the user. FILE is an expression representing the root of the file - the base of all parsing.

Binary Block Type Names

While the BNFs define what is allowed where, there is still one remaining problem for the Simis Jinx Binary format - each block type is identified by a number, not a string. For this, we can turn to some other files included with the original Train Simulator - the files in UTILS\FFEDIT.

  • sidn.txt defines a few base IDs, including "core" and "train" (0 and 4 respectively).
  • coreids.tok contains a list of all core "tokens" - i.e. block type names - in order of the numerical value.
  • appids.tok is a C header which includes forms.hdr and loadstr.hdr with a token defined before and after each inclusion.
  • forms.hdr and loadstr.hdr contain lists of all MSTS tokens in numerical order.

To construct the 32bit unsigned number used in the Simis Jinx Binary file format, the base ID and the token ID (from its position) are combined with the base forming the most significant 16bits and the token the least significant 16bits. E.g. the 7th "train" token would be 0x00040007.


Together with the BNFs, the number-block type name mapping completes the picture for loading and saving Simis Jinx files. However, as the BNFs are of my own construction, they are necessarily incomplete and possibly still inaccurate in some areas. This has improved a lot over the past few months, and will continue to do so, providing a good, solid and generic reading and writing capability for most Simis Jinx files.

Permalink | Author: | Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 11:55PM on Sunday, 23 May, 2010 | Modified: 12:02AM on Monday, 24 May, 2010 | Comments: 0

Powered by the Content Parser System, copyright 2002 - 2022 James G. Ross.