Silver's Simple Site - Weblog - 2010

Simis File Format

Microsoft Train Simulator was made by Kuju Entertainment, formed by a management buyout of the Simis game studio in 1998, according to Wikipedia. Many of the files that form part of the game are in one of a number of formats bearing the "SIMISA" signature. For the purposes of discussing the file formats, I will only label the outer-most format the "Simis file format"; inner formats will be identified separately, later.

The Simis file format is identified by a 16-byte or longer signature and header, with the remainder of the file being an inner format/data. The formats can all be identified by reading the first 8 bytes.

Uncompressed Simis File Format

 00000000   53 49 4D 49  53 41 40 40  40 40 40 40  40 40 40 40   SIMISA@@@@@@@@@@

Uncompressed files are simple; 16 fixed bytes ("SIMISA@@@@@@@@@@"), followed by the inner format/data.

Compressed Simis File Format

 00000000   53 49 4D 49  53 41 40 46  .. .. .. ..  40 40 40 40   SIMISA@F....@@@@
 00000010   78 9C                                                xœ

Compressed files are slightly more interesting: 8 fixed bytes ("SIMISA@F"), followed by a 4-byte unsigned integer containing the uncompressed size of the remaining data, and another 4 fixed bytes ("@@@@"). The data itself is compressed with zlib: the two bytes following the 16-byte header (78 9C) are the zlib stream header, and the DEFLATE-compressed data follows them.
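As a rough illustration, the compressed header could be read with a few lines of Python. This is only a sketch under assumptions: the function name is made up for this example, the size field is assumed to be little-endian, and Python's zlib module stands in for whatever the game itself uses.

```python
import struct
import zlib


def read_compressed_simis(data: bytes):
    """Split a compressed Simis file into (declared size, inner data)."""
    if data[:8] != b"SIMISA@F" or data[12:16] != b"@@@@":
        raise ValueError("not a compressed Simis file")
    # Assumption: the 4-byte size field is little-endian.
    (declared_size,) = struct.unpack_from("<I", data, 8)
    # Everything after the 16-byte header is handed to zlib, which
    # consumes the two-byte stream header (78 9C) itself.
    inner = zlib.decompressobj().decompress(data[16:])
    return declared_size, inner
```

Comparing the declared size with len(inner) is a cheap sanity check that the header and the stream belong together.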

Unicode Text Simis File Format

 00000000   FF FE 53 00  49 00 4D 00  49 00 53 00  41 00 40 00   ÿþS.I.M.I.S.A.@.
 00000010   40 00 40 00  40 00 40 00  40 00 40 00  40 00 40 00   @.@.@.@.@.@.@.@.
 00000020   40 00                                                @.

This format is a variation on uncompressed, but for a specific type of inner format: Unicode text encoded with UTF16-LE. In this situation, two things are done to the header:

  • It is encoded as UTF16-LE characters like the inner format/data.
  • A UTF16-LE byte order mark is pre-pended.
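Telling the three outer formats apart from the first 8 bytes might look like the following Python sketch. The function name and the returned labels are invented for this example; only the byte patterns come from the dumps above.

```python
import codecs


def classify_simis(first_bytes: bytes) -> str:
    """Name the outer Simis format from the first 8 bytes of a file."""
    head = first_bytes[:8]
    # FF FE followed by "SIM" as UTF-16LE fills exactly 8 bytes.
    if head == codecs.BOM_UTF16_LE + "SIM".encode("utf-16-le"):
        return "unicode text"
    if head == b"SIMISA@F":
        return "compressed"
    if head == b"SIMISA@@":
        return "uncompressed"
    return "unknown"
```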

Inner Formats

Whether the Simis file is compressed or not does not affect the inner formats allowed; any format may be compressed or uncompressed with no differences beyond that compression and the header indicating as much. It is unclear, however, how the Unicode text format would be compressed so, as I have not found any examples within Microsoft Train Simulator, I am considering them distinct at this level.

Permalink | Author: | Tags: Format, Games, Kuju, Microsoft, Simis, Train Simulator | Posted: 12:53AM on Friday, 09 April, 2010 | Comments: 0

Simis Jinx Unicode Text File Format

This is one of the 2nd-level (inner) formats for the Simis file format used by Microsoft Train Simulator.

As I mentioned last time, the 1st-level (outermost) format has an adjusted header when working with this 2nd-level format; in particular, the file starts with a UTF16-LE byte order mark and the header itself is Unicode text encoded as UTF-16LE. Here it is again:

 00000000   FF FE 53 00  49 00 4D 00  49 00 53 00  41 00 40 00   ÿþS.I.M.I.S.A.@.
 00000010   40 00 40 00  40 00 40 00  40 00 40 00  40 00 40 00   @.@.@.@.@.@.@.@.
 00000020   40 00                                                @.

The 2nd-level format identifies itself with its own header, unsurprisingly, starting with the text "JINX0", two values indicating the 3rd-level (inner inner) format, the text "t" indicating the Unicode text variety of this format, some padding (6 underscores) and a newline.

 00000020         4A 00  49 00 4E 00  58 00 30 00  .. .. .. ..     J.I.N.X.0.....
 00000030   74 00 5F 00  5F 00 5F 00  5F 00 5F 00  5F 00 0D 00   t._._._._._._...
 00000040   0A 00                                                ..

Those 4 missing bytes are the two characters (a letter then a number) that identify the 3rd-level format used. As text, the header of these files looks like the following line (with ".." standing in for those two characters):

 SIMISA@@@@@@@@@@JINX0..t______

Now follows the actual data...

Simis Jinx files all store data in a tree-like structure, where each tree node has a type and optional name, and is intermixed with values. In other words, each node's children can be both nodes and values, including a mixture. The structure is marked out by parentheses ("(" and ")") for blocks, with values being raw or quoted strings (generally speaking, any value with no whitespace and no significant symbols - quotes, parentheses - can be left unquoted). Let's get straight to an example from Microsoft Train Simulator, the file GLOBAL\gui.txt:


io_dev ( KEYB 0
    io_map ( T                 "sounddialog"        ALL_UP SHIFT_DOWN )
    io_map ( ESCAPE            "escape"             ALL_UP )
    io_map ( F1                "Help"               ALL_UP )
)

The header has identified this as containing 3rd-level format "I0"; for now, though, let's focus on the 2nd-level format. There is a single root node of type "io_dev" (and it has no name) which contains:

  • The value "KEYB".
  • The value "0".
  • Three nodes of type "io_map", each containing a selection of values but no further nodes.

This is just a simple example, but the format is pretty easy to read (although a little tricky to parse correctly); the 3rd-level format actually defines which node types and what nesting of them is allowed and which values should be where.
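To give a feel for the parsing, here is a small Python sketch of a reader for this tree syntax. It is an illustration under assumptions rather than a complete parser: the names Node and parse_jinx_text are made up, a bare word directly followed by "(" is taken to be a node type, node names and escape sequences inside quotes are not handled, and the header line is expected to have been removed already.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    # Child Node objects and str values, in file order.
    children: list = field(default_factory=list)


# A quoted string, a parenthesis, or a run of other non-space characters.
TOKEN = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')


def parse_jinx_text(text: str) -> list:
    """Return the top-level nodes and values found in `text`."""
    tokens = []
    for match in TOKEN.finditer(text):
        quoted, paren, bare = match.groups()
        if quoted is not None:
            tokens.append(("value", quoted))
        elif paren:
            tokens.append(("paren", paren))
        else:
            tokens.append(("bare", bare))

    root = Node("<file>")
    stack = [root]
    i = 0
    while i < len(tokens):
        token = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if token[0] == "bare" and following == ("paren", "("):
            # A bare word followed by "(" opens a new node.
            node = Node(token[1])
            stack[-1].children.append(node)
            stack.append(node)
            i += 2
        elif token == ("paren", ")"):
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            stack.pop()
            i += 1
        elif token == ("paren", "("):
            raise ValueError("'(' without a node type before it")
        else:
            # Anything else is a value of the node currently open.
            stack[-1].children.append(token[1])
            i += 1
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return root.children
```

Given a complete io_dev block, this returns a single Node whose children are the strings "KEYB" and "0" followed by the io_map nodes; a block that is never closed raises ValueError. Quoted and unquoted values both come back as plain strings, so the distinction is lost in this sketch.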

Permalink | Author: | Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 04:48PM on Thursday, 22 April, 2010 | Comments: 0

Simis Jinx Binary File Format

This is the second of the 2nd-level (inner) formats for the Simis file format used by Microsoft Train Simulator.

Unlike the text format, the binary format has a simple binary header - but it should look rather familiar. The binary header has the same 16 characters as the text header, but this time they're encoded as single bytes and one of the characters is notably different: the 8th character is "b" for binary, rather than "t" for text (if there existed a single-byte text format this would have been the only difference with this format in the headers).

 00000010   4A 49 4E 58  30 .. .. 62  5F 5F 5F 5F  5F 5F 0D 0A   JINX0..b______..

Just like in the text format, the 2 missing bytes in the middle are the two characters (a letter then a number) that identify the 3rd-level format used. We will get to those next time.

Note: If the 1st-level format is using compression, this header is the first item within the compressed stream; for simplicity, all my examples will be for uncompressed files.
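Because the text and binary headers share the same 16 characters, one small routine can cover both once the bytes have been decoded (single bytes for binary, UTF16-LE for text). The Python sketch below is illustrative only: the function name is invented, and the pattern assumes the format identifier is exactly one ASCII letter followed by one digit.

```python
import re

# "JINX0", letter + digit, "b" or "t", six underscores, CR LF.
JINX_HEADER = re.compile(r"JINX0([A-Za-z][0-9])([bt])______\r\n")


def parse_jinx_header(header: str):
    """Return (3rd-level format id, "binary" or "text") for a 16-character header."""
    match = JINX_HEADER.fullmatch(header)
    if match is None:
        raise ValueError("not a Simis Jinx header")
    format_id, kind = match.groups()
    return format_id, "binary" if kind == "b" else "text"
```

For an uncompressed binary file this would be called as parse_jinx_header(data[16:32].decode("ascii")).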

Now follows the actual data...

As I started talking about last time, Simis Jinx files are basic trees; the binary format is nothing more than an alternative representation of the same data. It is, however, more of a challenge to read and write correctly - something the 3rd-level formats will help deal with.

Each node in the tree has an 8 byte header, consisting of a 4 byte unsigned integer identifying the node's type and a 4 byte unsigned integer specifying the length of the contents. The contents consist of an optional name - 1 byte for length plus UTF16-LE characters - for the enclosing node and the child values and nodes.

There are a number of common types of value included:

  • Unsigned integer (4 bytes).
  • Signed integer (4 bytes).
  • Floating-point number (4 bytes).
  • String (2 bytes for length plus UTF16-LE characters).

Let's have a look at an example, GLOBAL\capview.iom, but remember that to correctly parse this I am using the 3rd-level format:

 00000000   53 49 4D 49  53 41 40 40  40 40 40 40  40 40 40 40   SIMISA@@@@@@@@@@

The standard 1st-level header...

 00000010   4A 49 4E 58  30 69 30 62  5F 5F 5F 5F  5F 5F 0D 0A   JINX0i0b______..

2nd-level header indicating a binary version of 3rd-level format 'i0'.

 00000020   63 00 00 00  64 02 00 00                             c...d...

Node type is 99 (0x63), contents size is 612 bytes (0x264).

 00000020                             00                                 .

Node has no name.

 00000020                                01 00 00  00 00 00 00            .......
 00000030   00                                                   .

Two unsigned integer values: 1 and 0.

 00000030      64 00 00  00 35 00 00  00 00                       d...5....

Node type is 100 (0x64), contents size is 53 bytes (0x35) and there's no name.

 00000030                                   CB 00  01 00                   ....

Unsigned integer value: 65,739 (0x100CB).

 00000030                                                15 00                 ..
 00000040   6E 00 75 00  64 00 67 00  65 00 5F 00  63 00 61 00   n.u.d.g.e._.c.a.
 00000050   62 00 63 00  6F 00 6E 00  74 00 72 00  6F 00 6C 00   b.c.o.n.t.r.o.l.
 00000060   5F 00 6C 00  65 00 66 00  74 00                      _.l.e.f.t.

String value: length of 21 (0x15) plus 21 UTF16-LE characters "nudge_cabcontrol_left".

 00000060                                   00 00  00 00                   ....

Unsigned integer value: 0.

The 1 byte for no name, 8 bytes for two unsigned integer values plus 44 bytes for the string mean the total contents are up to 53 bytes - that means it is the end of this node type 100.

 00000060                                                64 00                 d.
 00000070   00 00 37 00  00 00 00                                ..7....

Node type is 100 (0x64), contents size is 55 bytes (0x37) and there's no name.

Writing the above in text format would give:

 <node type 99> ( 1 0
    <node type 100> ( 65739 "nudge_cabcontrol_left" 0 )
    <node type 100> (

It's clear that we're missing the node type names found in the text files, and there's no indication whether some 4 bytes are a new node, an integer or float - some heuristics can work for this some of the time, I found, but in the end this is what the 3rd-level format is for.
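The byte-level walkthrough above can be reproduced with a handful of helper functions. This Python sketch makes several assumptions: the helper names are invented, all integers are little-endian (as the dump suggests, with 63 00 00 00 read as 99), lengths count 16-bit characters rather than bytes (21 characters occupied 42 bytes in the dump), and floats are IEEE 754 single precision. Each helper returns the decoded value together with the position of the next unread byte; knowing which helper to call at each point is still the job of the 3rd-level format.

```python
import struct


def read_node_header(buf: bytes, pos: int):
    """Return (node type, contents size, next position)."""
    node_type, size = struct.unpack_from("<II", buf, pos)
    return node_type, size, pos + 8


def read_name(buf: bytes, pos: int):
    """Optional node name: 1 length byte plus UTF-16LE characters."""
    length = buf[pos]
    end = pos + 1 + 2 * length
    return buf[pos + 1:end].decode("utf-16-le"), end


def read_uint(buf: bytes, pos: int):
    return struct.unpack_from("<I", buf, pos)[0], pos + 4


def read_sint(buf: bytes, pos: int):
    return struct.unpack_from("<i", buf, pos)[0], pos + 4


def read_float(buf: bytes, pos: int):
    return struct.unpack_from("<f", buf, pos)[0], pos + 4


def read_string(buf: bytes, pos: int):
    """String value: 2 length bytes plus UTF-16LE characters."""
    (length,) = struct.unpack_from("<H", buf, pos)
    end = pos + 2 + 2 * length
    return buf[pos + 2:end].decode("utf-16-le"), end


# The first type 100 node from the dump, rebuilt byte for byte.
node = (bytes.fromhex("64000000 35000000 00 CB000100 1500")
        + "nudge_cabcontrol_left".encode("utf-16-le")
        + bytes(4))
node_type, size, pos = read_node_header(node, 0)
end = pos + size
name, pos = read_name(node, pos)
first, pos = read_uint(node, pos)
text, pos = read_string(node, pos)
last, pos = read_uint(node, pos)
print(node_type, size, repr(name), first, text, last, pos == end)
```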

Other 2nd-level formats

I've covered the two main Simis 2nd-level formats, but there are others; most notably, the texture files (.ace) are wrapped in a 1st-level Simis header with their own format inside. I won't be covering these other formats soon, as there are already tools that can handle these files sufficiently for Microsoft Train Simulator's needs, and the 3rd-level Simis formats are more interesting anyway.

Permalink | Author: | Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 11:07PM on Sunday, 09 May, 2010 | Comments: 0

Simis Jinx 3rd Level File Formats

The Simis file format with the 2nd-level Unicode text and binary Jinx formats are a pretty generic set of formats; they contain an arbitrarily nested tree structure with strings, integer and floating point numbers at any level. To actually interpret and describe the contents, a 3rd level of formats is needed.

As mentioned in both Simis Jinx Unicode Text File Format and Simis Jinx Binary File Format, this 3rd level of formats is identified by a letter and a number - and there are quite a lot of them. To actually define these formats in a useful way, though, we need to use another format - Backus-Naur Form (BNF). The exact format I've used is a variant of the standard Backus-Naur Form derived from the BNF files that shipped with Microsoft Train Simulator itself (in the UTILS\FFEDIT directory).

Train Simulator Backus-Naur Form

The BNF files are text; new lines have no significance; any of ASCII, UTF-8 and UTF-16 character encodings can be used, provided a byte order mark is included to identify UTF-8 and UTF-16. The files are made up of a number of definitions and productions - in any order - and a special termination marker.

Definitions specify a shared or standalone expression. Any other expression can reference a definition, and the reference is expanded to the expression on the right-hand side of the equals ("=").

Productions specify, through the expression on the right of the arrow ("==>"), what is allowed/expected inside the block identified by the name on the left.

The expressions in both definitions and productions contain a space-separated list of items, each of which can be:

  • A string literal, e.g. "Activity".
  • A pre-defined data type, e.g. :sint. Available data types:
    • uint
    • sint
    • dword
    • string

    Data types can additionally be named, by including a comma and identifier after the type, e.g. :sint,TileX.

  • Another production or definition, e.g. :Tr_Activity.

There are three operators allowed within expressions:

  • Square brackets, denoting an optional section, e.g. [:Description].
  • Curly brackets, denoting a repeatable section (1 or more times), e.g. {:UiD :SidingItem}.
  • Pipe symbol, denoting a choice between sections, e.g. :Engine|:Wagon.

The choice operator (pipe) binds tighter than whitespace. Therefore, the expression :foo :bar|:baz means "foo followed by either bar or baz".

The end of an expression is denoted by a period (".").

Comments can be placed anywhere whitespace is allowed and use the common multi-line comment syntax of "/*" to start and "*/" to finish.

Termination of the BNF is indicated by the identifier "EOF". Everything after this is completely ignored.

3rd-level Format BNFs

Here's the current route car spawn.bnf as an example:

/* File format information */
FILE                          = :uint,Count [{:CarSpawnerItem}] .
FILE_NAME                     = "Route Car Spawn" .
FILE_EXT                      = "carspawn.dat" .
FILE_TYPE                     = "v" .
FILE_TYPE_VER                 = "1" .

/* Base types */
CarSpawnerItem                ==> :string :uint .

/* Format types */

EOF                           /* End of file */

All BNFs for the tools are required to have the five definitions shown above, so that the various programs can use them. FILE_TYPE and FILE_TYPE_VER are the letter and number (both as strings) used in all Simis Jinx files. FILE_EXT is either a file extension (e.g. "act") or a filename (e.g. "carspawn.dat") which selects which files can contain this format. FILE_NAME is a name suitable for displaying to the user. FILE is an expression representing the root of the file - the base of all parsing.
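A first pass over such a file only needs to strip comments, stop at EOF and sort each statement into definitions or productions. The Python sketch below does just that; it is not a reconstruction of how the original tools or Simis Editor read BNFs. The names are invented, expressions are kept as flat token lists, and operator precedence (such as the pipe binding tighter than whitespace) is left for a later stage.

```python
import re

TOKEN = re.compile(r'''
    (?P<comment> /\*.*?\*/ )
  | (?P<string>  "[^"]*" )
  | (?P<arrow>   ==> )
  | (?P<equals>  = )
  | (?P<op>      [\[\]{}|.] )
  | (?P<ref>     :\w+(?:,\w+)? )
  | (?P<name>    \w+ )
''', re.VERBOSE | re.DOTALL)


def parse_bnf(text: str):
    """Return (definitions, productions) as dicts of name -> token list."""
    tokens = [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)
              if m.lastgroup != "comment"]
    definitions, productions = {}, {}
    i = 0
    # Everything from the EOF identifier onwards is ignored.
    while i < len(tokens) and tokens[i] != ("name", "EOF"):
        if i + 1 >= len(tokens) or tokens[i][0] != "name":
            raise ValueError(f"unexpected token {tokens[i][1]!r}")
        name, separator = tokens[i][1], tokens[i + 1][0]
        if separator == "equals":
            target = definitions
        elif separator == "arrow":
            target = productions
        else:
            raise ValueError(f"expected '=' or '==>' after {name!r}")
        i += 2
        expression = []
        # Collect tokens up to the period that ends the expression.
        while i < len(tokens) and tokens[i] != ("op", "."):
            expression.append(tokens[i][1])
            i += 1
        if i == len(tokens):
            raise ValueError(f"expression for {name!r} is missing its final '.'")
        target[name] = expression
        i += 1
    return definitions, productions
```

Run over the car spawn BNF above, this yields the five required definitions plus one production, CarSpawnerItem, whose expression is the two tokens :string and :uint. Matching quoted strings as single tokens is what keeps the period in "carspawn.dat" from ending the expression early.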

Binary Block Type Names

While the BNFs define what is allowed where, there is still one remaining problem for the Simis Jinx Binary format - each block type is identified by a number, not a string. For this, we can turn to some other files included with the original Train Simulator - the files in UTILS\FFEDIT.

  • sidn.txt defines a few base IDs, including "core" and "train" (0 and 4 respectively).
  • coreids.tok contains a list of all core "tokens" - i.e. block type names - in order of the numerical value.
  • appids.tok is a C header which includes forms.hdr and loadstr.hdr with a token defined before and after each inclusion.
  • forms.hdr and loadstr.hdr contain lists of all MSTS tokens in numerical order.

To construct the 32-bit unsigned number used in the Simis Jinx Binary file format, the base ID and the token ID (from its position) are combined, with the base forming the most significant 16 bits and the token the least significant 16 bits. E.g. the 7th "train" token would be 0x00040007.
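In code, combining and splitting the two halves is a shift and a mask. A minimal Python sketch, with function names invented for this example:

```python
def make_block_id(base_id: int, token_id: int) -> int:
    """Pack a base ID and a token ID into one 32-bit block type number."""
    if not (0 <= base_id <= 0xFFFF and 0 <= token_id <= 0xFFFF):
        raise ValueError("base and token IDs must each fit in 16 bits")
    return (base_id << 16) | token_id


def split_block_id(block_id: int):
    """Return (base ID, token ID) for a 32-bit block type number."""
    return block_id >> 16, block_id & 0xFFFF
```

With the "train" base of 4 and a token ID of 7, make_block_id gives 0x00040007, matching the example above.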


Together with the BNFs, the number-block type name mapping completes the picture for loading and saving Simis Jinx files. However, as the BNFs are of my own construction, they are necessarily incomplete and possibly still inaccurate in some areas. This has improved a lot over the past few months, and will continue to do so, providing a good, solid and generic reading and writing capability for most Simis Jinx files.

Permalink | Author: | Tags: Format, Games, Microsoft, Simis, Train Simulator | Posted: 11:55PM on Sunday, 23 May, 2010 | Modified: 12:02AM on Monday, 24 May, 2010 | Comments: 0

Simis Editor - Feedback class

Feedback makes everything better, eventually. Getting or sending feedback is, however, not always simple or usable; users need to be able to bang out simple comments easily, with no forms to fill in, whilst still providing proper context and technical information if the feedback is the result of the application malfunctioning. Feedback should also be anonymous if the user wishes. The Feedback class in the next release of Simis Editor is attempting to do this; here I'm going to outline its user-facing functionality and the back-end implementation.

Entry Points

There are two different ways the feedback process can be started:

  • From the user: a "Send Feedback..." menu item under "Help".
  • From the application: anywhere in the application that catches exceptions.

While both routes show the same dialog, the latter case collects a load more contextual information to go with the report - most obviously, the exception, but it can also take anything the catch code wants to include.

Instantiation Code

The Feedback class is really simple to use, for both cases:

    try {
        new Feedback().PromptAndSend(ownerForm);
    } catch (SomeException e) {
        new Feedback(e, "sending feedback").PromptAndSend(ownerForm);
    }

The ownerForm is used for showing the dialog modally. The class switches mode based on the arguments: none means "user feedback", Exception (exception) and String (operation) mean "application failure"; there is also a third mode where the caller provides the feedback type, operation and an IDictionary<string, string> of details.

User Dialog

The dialog is mostly the same for the two cases; the biggest change is the "faces" and introductory text. For user feedback, the introduction just explains when to include your e-mail address, as it is entirely optional.

In the application failure case, this dialog is the first thing the user sees when an operation fails, so it must explain that something's gone wrong and then why you should send the feedback at all.

As the purpose of the feedback dialog is to collect as many reports as possible, it tries to keep as many users as possible happy to send them by letting the user view all the data collected for sending. As shown below, this includes the full exception details (obviously) as well as some general system information. It also includes a user ID, which is randomly generated the first time the application intends to send feedback and which is not shared between applications (i.e. two applications that a user has installed that use this feedback system will each send a different user ID).

If the user is happy to send the report and clicks the button, an XML document is constructed, serialised and POSTed to the feedback server. The user is then given a message showing the success or failure of the feedback as a clear completion of the process.

Feedback Format

The feedback is sent as XML to make handling the data as easy as possible. This is an example of an application failure report, but user feedback reports are basically the same - just without the <details>.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<report version="1.0" uid="ipejGfrUIt5gAZ3Y" time="2010-05-31T22:13:56.4276545+01:00" type="ApplicationFailure" email="">
   <os version="6.1.7600.0">Microsoft Windows NT 6.1.7600.0</os>
   <processor cores="4" />
   <clr bits="64" version="2.0.50727.4927" />
   <application version="">Simis Editor</application>
   <source file="C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\Simis Editor\Editor.cs" line="185" column="5">SimisEditor.Editor.OpenFile</source>
   <details>C:\Program Files (x86)\Microsoft Games\Train Simulator\ROUTES\JAPAN2\carspawn.dat

> From 0x00000122 - data preceding failure:
>   wnerItem( "Jp1van.s" 6 )
>   CarSpawnerItem( "Jp1van2.s" 6 )
>   )
> From 0x000001A2 - data following failure:
> > BNF has completed.
> >
> > Available states: .
> > Current rule: <none>.
> > Current state:
> >
> >    at Jgr.Grammar.BnfState.LeaveBlock() in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.Grammar\BNF.cs:line 175
> >    at Jgr.IO.Parser.SimisReader.ReadToken() in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.IO.Parser\SimisReader.cs:line 181
>    at Jgr.IO.Parser.SimisReader.ReadToken() in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.IO.Parser\SimisReader.cs:line 196
>    at Jgr.IO.Parser.SimisFile.ReadStream(Stream stream, SimisFormat& simisFormat, SimisStreamFormat& streamFormat, Boolean& streamCompressed, SimisTreeNode& tree) in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.IO.Parser\SimisFile.cs:line 74
>    at Jgr.IO.Parser.SimisFile..ctor(String fileName, SimisProvider simisProvider) in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.IO.Parser\SimisFile.cs:line 32

  at Jgr.IO.Parser.SimisFile..ctor(String fileName, SimisProvider simisProvider) in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.IO.Parser\SimisFile.cs:line 37
  at Jgr.IO.Parser.MutableSimisFile.Read() in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\JGR.IO.Parser\MutableSimisFile.cs:line 28
  at SimisEditor.Editor.OpenFile(String filename) in C:\Users\James\Documents\Visual Studio 2008\Projects\JGR MSTS Editor\Simis Editor\Editor.cs:line 185</details>
</report>

One thing which this does not show is "attachments" - where the code calling the Feedback class specifies arbitrary extra data to include; these are sent as additional details but each with a name: <details name="extra stuff">...</details>.

Permalink | Author: | Tags: Editor, Feedback, Simis, XML | Posted: 10:42PM on Monday, 31 May, 2010 | Comments: 0

Simis Editor v0.4

I've just released the latest version of my Microsoft Train Simulator tools: Simis Editor v0.4 with the usual documentation. Some highlights for this release:

  • Open and Save dialogs support full filename filters from BNFs (e.g. "tsection.dat") in addition to extension filters.
  • Support for adding new blocks to the tree via context menu with 4 groups of operations:
    • Insert previous siblings.
    • Insert next siblings.
    • Insert before children.
    • Insert after children.
  • Problems loading *.bnf files and loading or saving Simis files are all offered for reporting online (via the Feedback class).
  • Added a status bar and help text for menu items.

Permalink | Author: | Tags: Editor, Simis, Train Simulator | Posted: 10:42PM on Sunday, 06 June, 2010 | Comments: 0

Media Foundation, Matroska and MP3

I have a Matroska (.mkv) file with the following tracks (data streams):

Tracks : 2
Track 1 : Video
  - Codec : (V_MPEG4/ISO/AVC)
Track 2 : Audio
  - Codec : MPEG Audio 1, 2, 2.5 Layer III (A_MPEG/L3)

Nothing particularly special there; I have the following relevant DirectX Media Objects (DMOs), DirectShow and Media Foundation codecs installed:

  • Haali Media Splitter: a DirectShow splitter filter for Matroska files, among other container formats (analogous to the AVI Splitter for .avi and others).
  • ffdshow: a DirectShow decoder filter for just about anything, including MPEG-4 Video and MPEG-1 Audio Layer 3 (MP3).
  • Windows 7's in-box DMO decoder filters for MPEG-4 Video and MP3. These can be used by both DirectShow and Media Foundation.

Question: What happens if this Matroska file is played in Windows Media Player or Windows Media Center?

Answer: No video, and the audio stutters a lot.

Question: Why?

Answer: Both will try to use Media Foundation first and DirectShow second. As Media Foundation has no preferred splitter for Matroska files (either in-box or that I've installed), it hunts for a supported transform (similar to DirectShow's filters) with which to play the file; the MP3 transform duly indicates that it can play the file.

I believe this is because the MP3 decoder ignores the data at the start of the file which it doesn't understand (to allow for ID3 tags) and then picks up the first frame of the audio stream inside the file. The stuttering is most likely it attempting to play back the video frames of data (the two data streams are interleaved within the container).

The solution: Amazingly simple; the only thing that matters is that they're trying to use Media Foundation first, so set one registry key to indicate .mkv files prefer to be handled by DirectShow and it works great in both players.

   Runtime    REG_DWORD    0x7


Permalink | Author: | Tags: DirectShow, MP3, Matroska, Media Center, Media Foundation, Microsoft, WMP, Windows | Posted: 10:27PM on Sunday, 13 June, 2010 | Comments: 0

Simis Editor v0.5

Download Simis Editor v0.5 and read the documentation. Release highlights:

  • New format support: Cameras, GUI, GUI Bitmaps, GUI Screens, Route Forests, Route Gantry Sets, Route Speed Post Sets, Route Tile Definition, Route Tile Definition Low, Route Track Types, Signals.
  • Updated format support: Activity, Route Reference, Route Train Path, Shape, Train Consist, Train Engine, Train Wagon, World.
  • Bug fixes and general improvements to underlying libraries.
  • Thanks to Jeffrey Kraus-Yao for many of the format additions and updates.

Permalink | Author: | Tags: Editor, Simis, Train Simulator | Posted: 10:38PM on Sunday, 19 September, 2010 | Comments: 0

Powered by the Content Parser System, copyright 2002 - 2022 James G. Ross.