Silver's Simple Site - Weblog - 2010 - April


Simis File Format

According to Wikipedia, Microsoft Train Simulator was made by Kuju Entertainment, which was formed in 1998 by a management buyout of the Simis game studio. Many of the game's files are in one of a number of formats bearing the "SIMISA" signature. For the purposes of discussing the file formats, I will use the name "Simis file format" only for the outermost format; the inner formats will be identified separately, later.

A Simis file is identified by a signature and header of 16 bytes or more, and the remainder of the file is the inner format/data. All of the variants can be told apart by reading the first 8 bytes.
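As a minimal sketch in Python (chosen here only for illustration), the three variants described below can be told apart with a lookup on those first 8 bytes. `SIMIS_SIGNATURES` and `identify_simis` are hypothetical names, and the table assumes these three variants are the only ones:

```python
# Hypothetical lookup table: the first 8 bytes of each outer Simis variant
# described in this post. Assumes these three are the only variants.
SIMIS_SIGNATURES = {
    b"SIMISA@@": "uncompressed",
    b"SIMISA@F": "compressed",
    b"\xff\xfeS\x00I\x00M\x00": "unicode",  # BOM + "SIM" in UTF-16LE
}


def identify_simis(data: bytes) -> str:
    """Return the outer Simis variant of `data`, judged by its first 8 bytes."""
    prefix = bytes(data[:8])
    if prefix not in SIMIS_SIGNATURES:
        raise ValueError(f"unrecognised Simis signature: {prefix!r}")
    return SIMIS_SIGNATURES[prefix]
```

For example, `identify_simis(b"SIMISA@@@@@@@@@@")` returns `"uncompressed"`.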

Uncompressed Simis File Format

 00000000   53 49 4D 49  53 41 40 40  40 40 40 40  40 40 40 40   SIMISA@@@@@@@@@@

Uncompressed files are simple: 16 fixed bytes ("SIMISA@@@@@@@@@@") followed by the inner format/data.
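Extracting the inner data from an uncompressed file is then just a matter of checking and skipping those 16 bytes. A sketch, with hypothetical names:

```python
# The 16 fixed bytes that open an uncompressed Simis file.
UNCOMPRESSED_HEADER = b"SIMISA@@@@@@@@@@"


def uncompressed_payload(data: bytes) -> bytes:
    """Check the fixed header and return the inner format/data unchanged."""
    if not data.startswith(UNCOMPRESSED_HEADER):
        raise ValueError("not an uncompressed Simis file")
    return data[len(UNCOMPRESSED_HEADER):]
```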

Compressed Simis File Format

 00000000   53 49 4D 49  53 41 40 46  .. .. .. ..  40 40 40 40   SIMISA@F....@@@@
 00000010   78 9C                                                xœ

Compressed files are slightly more interesting: 8 fixed bytes ("SIMISA@F"), followed by a 4-byte unsigned integer containing the uncompressed size of the remaining data, and another 4 fixed bytes ("@@@@"). The data itself is compressed with DEFLATE in the zlib format: the two bytes following the 16-byte header (78 9C above) are the zlib header identifying DEFLATE, and all subsequent data is the raw DEFLATE stream.
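A decoder can be sketched with Python's standard `zlib` module. `decompress_simis` is a hypothetical name; the little-endian size field is an assumption (the byte order isn't stated above), the two-byte check follows the zlib header rules of RFC 1950, and the final size check simply trusts the description of the size field:

```python
import struct
import zlib


def decompress_simis(data: bytes) -> bytes:
    """Return the inner format/data of a compressed ("SIMISA@F") Simis file."""
    if len(data) < 18 or data[:8] != b"SIMISA@F" or data[12:16] != b"@@@@":
        raise ValueError("not a compressed Simis file")
    # Assumption: the 4-byte size is little-endian (the byte order is not stated).
    (expected_size,) = struct.unpack_from("<I", data, 8)

    # Bytes 16-17 (78 9C in the dump) are a zlib header: the low nibble of the
    # first byte is 8 for DEFLATE, the pair read as a big-endian 16-bit number
    # must be divisible by 31, and a preset dictionary (bit 5 of the second
    # byte) is not handled by this sketch.
    cmf, flg = data[16], data[17]
    if (cmf & 0x0F) != 8 or ((cmf << 8) | flg) % 31 != 0 or flg & 0x20:
        raise ValueError("bytes 16-17 are not a supported zlib header")

    # Inflate everything after those two bytes as a raw DEFLATE stream
    # (negative wbits); any trailing checksum ends up in unused_data.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    payload = inflater.decompress(data[18:]) + inflater.flush()
    if len(payload) != expected_size:
        raise ValueError(f"size field says {expected_size}, got {len(payload)}")
    return payload
```

Using a decompression object rather than `zlib.decompress` means the sketch neither requires nor chokes on a checksum after the DEFLATE data.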

Unicode Text Simis File Format

 00000000   FF FE 53 00  49 00 4D 00  49 00 53 00  41 00 40 00   ÿþS.I.M.I.S.A.@.
 00000010   40 00 40 00  40 00 40 00  40 00 40 00  40 00 40 00   @.@.@.@.@.@.@.@.
 00000020   40 00                                                @.

This format is a variation on the uncompressed format, used for one specific type of inner format: Unicode text encoded as UTF-16LE. In this case, two things are done to the header:

  • It is encoded as UTF-16LE characters, like the inner format/data.
  • A UTF-16LE byte order mark (FF FE) is prepended.
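A minimal sketch of that header in Python, with hypothetical names: encoding the 16-character header as UTF-16LE and prepending the byte order mark gives the 34 bytes (offsets 0x00 to 0x21) shown in the dump above, and everything after them is decoded as UTF-16LE text.

```python
# UTF-16LE byte order mark, followed by the usual 16-character header
# encoded as UTF-16LE: 2 + 16 * 2 = 34 bytes in total.
BOM = b"\xff\xfe"
UNICODE_HEADER = BOM + "SIMISA@@@@@@@@@@".encode("utf-16-le")


def unicode_payload(data: bytes) -> str:
    """Check the 34-byte header and decode the rest as UTF-16LE text."""
    if not data.startswith(UNICODE_HEADER):
        raise ValueError("not a Unicode text Simis file")
    return data[len(UNICODE_HEADER):].decode("utf-16-le")
```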

Inner Formats

Compression does not affect which inner formats are allowed: any inner format may be stored compressed or uncompressed, with no difference beyond the compression itself and the header indicating it. It is unclear, however, how the Unicode text format would be compressed, and since I have not found any examples within Microsoft Train Simulator, I am treating it as a distinct variant at this level.

Tags: Format, Games, Kuju, Microsoft, Simis, Train Simulator | Posted: 12:53AM on Friday, 09 April, 2010


Simis Jinx Unicode Text File Format

This is one of the 2nd-level (inner) formats for the Simis file format used by Microsoft Train Simulator.

As I mentioned last time, the 1st-level (outermost) format uses an adjusted header with this 2nd-level format: the file starts with a UTF-16LE byte order mark, and the header itself is Unicode text encoded as UTF-16LE. Here it is again:

 00000000   FF FE 53 00  49 00 4D 00  49 00 53 00  41 00 40 00   ÿþS.I.M.I.S.A.@.
 00000010   40 00 40 00  40 00 40 00  40 00 40 00  40 00 40 00   @.@.@.@.@.@.@.@.
 00000020   40 00                                                @.

Unsurprisingly, the 2nd-level format identifies itself with its own header: the text "JINX0", two characters identifying the 3rd-level (inner inner) format, the text "t" indicating the Unicode text variety of this format, some padding (6 underscores) and a newline (CR LF).

 00000020         4A 00  49 00 4E 00  58 00 30 00  .. .. .. ..     J.I.N.X.0.....
 00000030   74 00 5F 00  5F 00 5F 00  5F 00 5F 00  5F 00 0D 00   t._._._._._._...
 00000040   0A 00                                                ..

The 4 bytes shown as dots are the two characters (a letter then a digit) that identify the 3rd-level format used. As text, the header of these files looks like the following line:

 SIMISA@@@@@@@@@@JINX0..t______
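Since the whole header is text once the byte order mark is removed, both levels of header can be checked with one regular expression. This is a sketch under assumptions taken from the description above: the pattern, the names `HEADER_RE` and `read_unicode_jinx`, and the restriction of the format identifier to a letter followed by a digit are illustrative assumptions rather than a documented specification.

```python
import re

BOM = b"\xff\xfe"  # UTF-16LE byte order mark
# Assumed pattern: the 16-character outer header, "JINX0", a two-character
# 3rd-level format id (letter then digit), "t", six underscores, then CR LF.
HEADER_RE = re.compile(r"SIMISA@{10}JINX0([A-Za-z][0-9])t_{6}\r\n")


def read_unicode_jinx(data: bytes) -> tuple[str, str]:
    """Return (3rd-level format id, body text) for a Unicode text Jinx file."""
    if not data.startswith(BOM):
        raise ValueError("missing UTF-16LE byte order mark")
    text = data[len(BOM):].decode("utf-16-le")
    match = HEADER_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised header: {text[:32]!r}")
    return match.group(1), text[match.end():]
```

The body starts at byte offset 0x42, right after the CR LF at 0x3E to 0x41 in the dump.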

Now follows the actual data...

All Simis Jinx files store data in a tree-like structure: each tree node has a type and an optional name, and nodes are intermixed with values. In other words, a node's children can be nodes, values, or a mixture of both. The structure is marked out by parentheses ("(" and ")") for blocks, with values being raw or quoted strings (generally speaking, any value with no whitespace and no significant symbols, such as quotes or parentheses, can be left unquoted). Let's get straight to an example from Microsoft Train Simulator, the file GLOBAL\gui.txt:

 SIMISA@@@@@@@@@@JINX0I0t______

io_dev ( KEYB 0
    io_map ( T                 "sounddialog"        ALL_UP SHIFT_DOWN )
    io_map ( ESCAPE            "escape"             ALL_UP )
    io_map ( F1                "Help"               ALL_UP )
)

The header has identified this as containing 3rd-level format "I0"; for now, though, let's focus on the 2nd-level format. There is a single root node of type "io_dev" (and it has no name) which contains:

  • The value "KEYB".
  • The value "0".
  • Three nodes of type "io_map", each containing a selection of values but no further nodes.
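To make the structure concrete, here is a small stack-based parser sketch in Python. It assumes that the value immediately before a "(" is the node's type and that quoted strings contain no escape sequences, and it does not attempt to handle node names, since how they are written is not described here; `Node`, `TOKEN_RE` and `parse_jinx` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# A token is a double-quoted string, a parenthesis, or a run of characters
# without whitespace, quotes or parentheses. Escapes inside quotes are not
# handled (an assumption).
TOKEN_RE = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')


@dataclass
class Node:
    type: str
    children: list = field(default_factory=list)  # str values and Nodes, in order


def parse_jinx(text: str) -> list:
    """Parse Jinx body text into a list of top-level values and Nodes."""
    top: list = []
    stack = [top]  # children lists of the currently open blocks
    pos = 0
    for match in TOKEN_RE.finditer(text):
        # Anything skipped between tokens must be whitespace (catches stray quotes).
        if text[pos:match.start()].strip():
            raise ValueError(f"unexpected text at offset {pos}")
        pos = match.end()
        quoted, paren, bare = match.groups()
        current = stack[-1]
        if paren == "(":
            # The value just before "(" becomes the new node's type.
            if not current or not isinstance(current[-1], str):
                raise ValueError(f"'(' without a node type at offset {match.start()}")
            node = Node(current.pop())
            current.append(node)
            stack.append(node.children)
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at offset {match.start()}")
            stack.pop()
        else:
            current.append(quoted if quoted is not None else bare)
    if text[pos:].strip():
        raise ValueError(f"unexpected text at offset {pos}")
    if len(stack) != 1:
        raise ValueError("unclosed '('")
    return top
```

Applied to the body of GLOBAL\gui.txt above (everything after the header line), it returns a single `io_dev` node whose children are "KEYB", "0" and the three `io_map` nodes. It drops the distinction between quoted and unquoted values, which a fuller reader may need to keep.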

This is only a simple example, but the format is pretty easy to read (although a little tricky to parse correctly). It is the 3rd-level format that defines which node types are allowed, how they may be nested, and which values go where.

Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 04:48PM on Thursday, 22 April, 2010

Powered by the Content Parser System, copyright 2002 - 2024 James G. Ross.