In my previous article about d2i, I did not describe the thought process behind how I reverse engineered the file and worked out its structure. I thought this could interest a few people.

This is the thought process behind this previous article:

About d2i files

Having worked on emulation of the game Dofus as well as on bots for it, I knew that information such as weapon names, dialogues, maps, music, etc. was stored in local files.

I knew that d2i files were responsible for the NPC dialogues, so I decided to try to understand them. You can follow this tutorial with the same file I used, which can be found here:

Pattern Recognition

Since we are working with a video game, the data must be stored in a way that is easy to read, otherwise there would be no point in having it locally. After all, the purpose of local game resources is to speed up loading times.

My first approach was to open the file in Atom; with a bit of luck it would be something readable like JSON. It turned out that part of the file was readable. Lucky for us, that meant there was no encryption on the data.

Readable d2i lines

However, some of the lines were completely unreadable.

Unreadable d2i lines

My next step was to open it in a hex editor, since the jumbled text suggested it was not a plain text format.

Upon opening the file in a hex editor, something caught my eye straight away!

There was a pattern, or at least a visible one! I did not know what it represented or how to interpret it, but at least there was hope.

Visible pattern on the right panel

We can clearly see some repetition on the right panel, and if we look at the hex data we can see that a lot of the 2-byte groups end in 00 or 00001.

Making sense of the Hex data

I first decided to look at the beginning of the file and noticed that the first string began after a group of 4 bytes followed by a group of 2 bytes.

first line of the file

These 6 bytes seem very “lucky”, as they could easily be three 2-byte integers, or one 4-byte integer followed by one 2-byte integer.

While still looking at the beginning of the file, I noticed something else that was interesting. The first string in the file is written twice (red), and in the group of 4 bytes at the start of each copy, the first 2 bytes are unreadable (blue).

String Start Pattern

0022 converted to an integer gives us 34, which happens to be exactly the length of the string. Looking further, we can see that every string starts with a 2-byte integer that specifies its length.
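To make this concrete, here is a minimal Python sketch of reading one such length-prefixed string. The helper name `read_string`, the sample text and the UTF-8 decoding are my assumptions for illustration; only the 2-byte big-endian length prefix comes from the observation above.

```python
import struct

def read_string(data, offset):
    """Read one length-prefixed string; return it and the offset just past it."""
    # 2-byte big-endian unsigned length, e.g. 00 22 -> 34
    (length,) = struct.unpack_from(">H", data, offset)
    start = offset + 2
    raw = data[start:start + length]
    # UTF-8 is an assumption, not something established in this article.
    return raw.decode("utf-8"), start + length

# Synthetic sample, not taken from a real d2i file.
text = "Hello, Dofus!"
encoded = text.encode("utf-8")
sample = struct.pack(">H", len(encoded)) + encoded

value, next_offset = read_string(sample, 0)
print(value, next_offset)  # Hello, Dofus! 15
```

Returning the offset just past the string makes it easy to read the next string straight after, since the strings sit one after another in the listing.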

If we read the first 4 bytes as an int, we get the number 26343905 (0191F9E1).

If we select all the bytes up to the end of the string listing, we get the same number. So the first 4 bytes of a d2i file represent the size of the string data.

If we look at the next 4 bytes as an int, they give 1957787 (001DDF9B). This leads us to the beginning of another listing of strings.
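The two conversions above can be checked in a couple of lines of Python. This is only a worked check of the arithmetic, reading the bytes as big-endian integers (most significant byte first), which is the order the hex editor shows them in.

```python
# The two 4-byte groups quoted above, as shown in the hex editor.
first = int.from_bytes(bytes.fromhex("0191F9E1"), byteorder="big")
second = int.from_bytes(bytes.fromhex("001DDF9B"), byteorder="big")

print(first)   # 26343905
print(second)  # 1957787
```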

Hex pointers . . . Wait what???

Yes, now we get to the boring (or interesting) stuff. If we go back to the end of our first string listing, we notice a pattern in the hex data. The first 4-byte group is such a perfect value: 00000001

If we look closer we see a pattern emerge:

Emerging id pattern

We see here a value that gets incremented, on the same principle as an ID. This would make sense, since these files are data storage for the game.

We also notice a 01 repeating after each ID. Between it and the next ID we have another 8 bytes, which we can easily see are two more numbers: 4 (00000004) and 40 (00000028).

From there I just followed my gut. For ID 01 we get the number 4, which is the position of the first string in our file: the position is 4 once we count the first 4 bytes, which hold the size of the data. As for the second number, 40, if we look at the 40th byte from the start we see the same string as the first one, but without capital letters or accents.
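The two numbers also agree with the sizes found earlier. Here is a small worked check in Python, assuming the second copy of the string starts immediately after the first one (the constant names are mine, for illustration only):

```python
HEADER_SIZE = 4             # the 4-byte size field at the start of the file
LENGTH_PREFIX_SIZE = 2      # the 2-byte length in front of every string
first_string_length = 0x22  # 34, read from the 0022 prefix

first_string_offset = HEADER_SIZE
# Assumption: the accent-free copy directly follows the first string.
second_string_offset = first_string_offset + LENGTH_PREFIX_SIZE + first_string_length

print(first_string_offset, second_string_offset)  # 4 40
```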

From this I deduced the following format, which is explained in my previous blog post:

ID (Orange), diacritic exists? (cyan), string pointer (brown), diacritic pointer (blue)
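The layout above can be sketched as a small Python reader. This is not the reader from my previous post, just a minimal illustration on a synthetic file. It assumes that the 4 bytes right after the string data give the size in bytes of the entry list, that the diacritic pointer is only present when the flag is 1, and that strings are UTF-8. The function names and sample strings are made up for the example.

```python
import struct

def read_string(data, offset):
    # 2-byte big-endian length, then the string bytes (UTF-8 assumed).
    (length,) = struct.unpack_from(">H", data, offset)
    return data[offset + 2:offset + 2 + length].decode("utf-8")

def read_entries(data):
    """Yield (id, text, accent-free text or None) for each entry."""
    (index_offset,) = struct.unpack_from(">I", data, 0)
    # Assumption: this int is the byte size of the entry list.
    (index_size,) = struct.unpack_from(">I", data, index_offset)
    pos = index_offset + 4
    end = pos + index_size
    while pos < end:
        # ID (4 bytes), diacritic flag (1 byte), string pointer (4 bytes)
        entry_id, has_diacritic, text_ptr = struct.unpack_from(">IBI", data, pos)
        pos += 9
        plain = None
        if has_diacritic:
            # Assumption: the 4-byte diacritic pointer exists only when the flag is set.
            (plain_ptr,) = struct.unpack_from(">I", data, pos)
            pos += 4
            plain = read_string(data, plain_ptr)
        yield entry_id, read_string(data, text_ptr), plain

def pack_string(text):
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw

# Synthetic file: strings start at offsets 4, 12 and 18.
strings = pack_string("Épée") + pack_string("epee") + pack_string("Bow")
index = struct.pack(">IBII", 1, 1, 4, 12) + struct.pack(">IBI", 2, 0, 18)
data = struct.pack(">I", 4 + len(strings)) + strings + struct.pack(">I", len(index)) + index

for entry in read_entries(data):
    print(entry)
```

Reading through pointers rather than walking the strings in order means a single ID can be looked up without touching the rest of the string data.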

I then checked whether the second ID corresponded to the second string, and it did. From there I wrote a simple parser, which then became a fully fledged reader:


This was a very fun and entertaining challenge, which forced me to use logic as well as to understand my environment and follow my gut. It was important to keep in mind that the file came from a game, and thus that reading time was as important as accessibility.