Reverse Engineering Dota 2 Resource Handles

Intro

Dota 2 is a well-known MOBA that was released by Valve in 2013 and has been one of the most popular games on their publishing platform, Steam, ever since ^1.

Due to its huge annual tournament prize pools and the ease of access to game replays and player stats, multiple sites dedicated to tracking statistics in various forms have been created over time, such as DatDota and Open Dota.

This post is about the process of figuring out how to correlate certain Dota 2 replay-packets with the actual in-game information, further improving the data available to those sites.

Dota 2’s replay format

When looking at in-game replays, Dota 2 or otherwise, information is mostly kept at a minimum. In fact, games that are completely deterministic (e.g. chess) can simply store a list of inputs (or moves). Playing back all inputs in the correct order should produce the same game every time.

Less deterministic games take snapshots of the current gamestate at fixed intervals and interpolate between them. The more snapshots recorded in the replay, the more information you have to work with. This is very similar to how most video encoders work, storing “Key Frames” and only tracking the changes in-between.

Dota 2 Replays (also called DEMOs) are a mixture of both. The demo-files contain a record of every important packet sent over the network. The majority of useful information is kept in binary packets that contain the entities (things that are being rendered on screen), stringtables (dictionary of number -> strings and vice versa), as well as the Combatlog (history of all combat related events).

The most popular parsers to convert the .dem files into something we can work with are Clarity and Manta. What they both have in common is that they don’t include any way to track skills and projectiles, which is the part we will be deciphering.

Sunstrike

[Image: Sunstrike GIF]

In the GIF above, Invoker is using a skill called Sunstrike which does damage to any unit within its impact radius. At the moment, all the stats you can gather from replays about this skill are mostly meta-data based.

One could, for example, track the number of Sunstrikes cast by counting the number of times the skill went on cooldown. The cooldown is an entity attribute and available via the ability entity. Combining this information with data from the Combatlog, in this case the heroes that were hit by a Sunstrike, will give you an accurate hit-to-miss ratio for a specific player.

What’s currently missing are any attributes related to the Sunstrike itself, such as where it was cast or by how much it missed a certain unit. This is definitely valuable information as it may help players and analysts determine how well a hero is being played.
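
As a rough sketch of that bookkeeping, assume a parser has already produced a list of cooldown samples for the ability and a list of Combatlog hits. The structs and function names below are hypothetical and not part of Clarity or Manta, and treating all hits on the same tick as one cast is a simplifying assumption.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified views of data a replay parser would hand us.
struct CooldownSample { int tick; float remaining; };  // remaining cooldown of the ability
struct CombatlogHit   { int tick; int targetHero; };   // one hero damaged by the ability

// A cast is counted whenever the cooldown goes from zero to a positive value.
std::size_t countCasts(const std::vector<CooldownSample>& samples) {
    std::size_t casts = 0;
    float previous = 0.0f;
    for (const CooldownSample& s : samples) {
        if (previous <= 0.0f && s.remaining > 0.0f) ++casts;
        previous = s.remaining;
    }
    return casts;
}

// Assumption: every hero hit by one cast is logged on the same tick,
// so the number of distinct ticks equals the number of casts that hit.
std::size_t countHittingCasts(const std::vector<CombatlogHit>& hits) {
    std::set<int> ticks;
    for (const CombatlogHit& h : hits) ticks.insert(h.tick);
    return ticks.size();
}

// Fraction of casts that hit at least one hero (0 if nothing was cast).
double hitRatio(const std::vector<CooldownSample>& samples,
                const std::vector<CombatlogHit>& hits) {
    const std::size_t casts = countCasts(samples);
    if (casts == 0) return 0.0;
    return static_cast<double>(countHittingCasts(hits)) / static_cast<double>(casts);
}
```

This only yields a ratio; it says nothing about where a Sunstrike landed, which is the gap the rest of this post tries to close.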

Reverse-Engineering: Data location

I started off by going through the basic questions you should be asking yourself: Is the information available to me? If so, where is it likely stored? And last but not least: Can I get to it?

In this case there is obviously a way the engine detects when and where to render a Sunstrike. As to whether we can actually get that information: It depends. The engine might trigger the Sunstrike based on the current camera position and mouse location when the skill is cast. If this is done correctly and deterministically, you might encode all the information in as little as a couple of bits. Another way this might be triggered would be through a special packet that deals with spells being cast, something along the lines of attack orders. Yet another angle would be to look at the Sunstrike as a particle being rendered, which is what happens under the hood when you see the animation play.

Luckily for us, the Source Engine is very open about most of its stuff and provides tons of console commands to tweak and debug things. In this case, the “cl_particles_dumplist” command was what I was looking for. When invoked, it spews all particles (such as the Sunstrike) that are currently being rendered. Correlating the list of active particles with a couple of casts led me to believe that my assumption about the skill being mostly particle based is true.

Now that we have some general idea about what we need to look for, searching through the available data becomes a lot easier. A simple CTRL+F, “particle”, on the list of possible packets reveals the following:

message CDOTAUserMsg_ParticleManager {
	message ReleaseParticleIndex {
	}

	message CreateParticle {
		optional fixed64 particle_name_index = 1;
		optional int32 attach_type = 2;
		optional int32 entity_handle = 3;
		optional int32 entity_handle_for_modifiers = 4;
	}

	message DestroyParticle {
		optional bool destroy_immediately = 1;
	}

	message DestroyParticleInvolving {
		optional bool destroy_immediately = 1;
		optional int32 entity_handle = 3;
	}

	message UpdateParticle {
		optional int32 control_point = 1;
		optional .CMsgVector position = 2;
	}

	message UpdateParticleFwd {
		optional int32 control_point = 1;
		optional .CMsgVector forward = 2;
	}

	message UpdateParticleOrient {
		optional int32 control_point = 1;
		optional .CMsgVector forward = 2;
		optional .CMsgVector right = 3;
		optional .CMsgVector up = 4;
	}

	message UpdateParticleFallback {
		optional int32 control_point = 1;
		optional .CMsgVector position = 2;
	}

	message UpdateParticleOffset {
		optional int32 control_point = 1;
		optional .CMsgVector origin_offset = 2;
	}

	message UpdateParticleEnt {
		optional int32 control_point = 1;
		optional int32 entity_handle = 2;
		optional int32 attach_type = 3;
		optional int32 attachment = 4;
		optional .CMsgVector fallback_position = 5;
		optional bool include_wearables = 6;
	}

	message UpdateParticleSetFrozen {
		optional bool set_frozen = 1;
	}

	message UpdateParticleShouldDraw {
		optional bool should_draw = 1;
	}

	message ChangeControlPointAttachment {
		optional int32 attachment_old = 1;
		optional int32 attachment_new = 2;
		optional int32 entity_handle = 3;
	}

	message UpdateEntityPosition {
		optional int32 entity_handle = 1;
		optional .CMsgVector position = 2;
	}

	required .DOTA_PARTICLE_MESSAGE type = 1 [default = DOTA_PARTICLE_MANAGER_EVENT_CREATE];
	required uint32 index = 2;
	optional .CDOTAUserMsg_ParticleManager.ReleaseParticleIndex release_particle_index = 3;
	optional .CDOTAUserMsg_ParticleManager.CreateParticle create_particle = 4;
	optional .CDOTAUserMsg_ParticleManager.DestroyParticle destroy_particle = 5;
	optional .CDOTAUserMsg_ParticleManager.DestroyParticleInvolving destroy_particle_involving = 6;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticle update_particle = 7;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleFwd update_particle_fwd = 8;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleOrient update_particle_orient = 9;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleFallback update_particle_fallback = 10;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleOffset update_particle_offset = 11;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleEnt update_particle_ent = 12;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleShouldDraw update_particle_should_draw = 14;
	optional .CDOTAUserMsg_ParticleManager.UpdateParticleSetFrozen update_particle_set_frozen = 15;
	optional .CDOTAUserMsg_ParticleManager.ChangeControlPointAttachment change_control_point_attachment = 16;
	optional .CDOTAUserMsg_ParticleManager.UpdateEntityPosition update_entity_position = 17;
}

Finding the correct particle packet

Dumping all the CDOTAUserMsg_ParticleManager packets is just a couple lines of code in either parser. Protobuf includes a very handy “DebugString()” function that allows you to print the packet in a human-readable, JSON-like format.

Now, how do we know if our Sunstrike is in one of the packets? One thing we could do is go through the replay, look for the exact second we cast it, and match it up with the in-game time, which is stored in the “Gamerules” entity. An easier route I mostly take is to write a message in all chat, do the action I want to find in the replay, and write another message once I’m done. If our particle packet is actually recorded, it will be between those two messages.

In a local lobby game without any other players casting skills and thus creating particles, the relevant packets were easy to find (notice the entity_handle_for_modifiers attribute):
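
The chat-marker trick can be sketched as a simple filter. The `Packet` struct below is a hypothetical, flattened stand-in for whatever the parser emits, and the marker texts are whatever you typed into all chat.

```cpp
#include <string>
#include <vector>

// Hypothetical flattened record of one decoded replay packet.
struct Packet {
    int tick;          // replay tick the packet was seen on
    std::string type;  // e.g. "chat" or "particle"
    std::string text;  // chat text, or a debug dump for other packets
};

// Returns every packet of type `wanted` that appears after the chat message
// `begin` and before the chat message `end`.
std::vector<Packet> packetsBetween(const std::vector<Packet>& packets,
                                   const std::string& begin,
                                   const std::string& end,
                                   const std::string& wanted) {
    std::vector<Packet> found;
    bool inside = false;
    for (const Packet& p : packets) {
        if (p.type == "chat" && p.text == begin) { inside = true; continue; }
        if (inside && p.type == "chat" && p.text == end) break;
        if (inside && p.type == wanted) found.push_back(p);
    }
    return found;
}
```

In a quiet local lobby the resulting list should be short enough to read by hand.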

type: DOTA_PARTICLE_MANAGER_EVENT_CREATE
index: 4302
create_particle {
  particle_name_index: 12079402610762616890
  attach_type: 2
  entity_handle: 16777215
  entity_handle_for_modifiers: 4375396
}

type: DOTA_PARTICLE_MANAGER_EVENT_UPDATE
index: 4302
update_particle {
  control_point: 0
  position {
    x: 6725.3394
    y: 6081.7681
    z: 512
  }
}

At this point we can be certain that the information we are looking for is encoded in the particle manager packets.
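
To illustrate how the two packets belong together: the create message introduces an index, and later updates reuse that index to deliver a position. The sketch below uses hypothetical structs that mirror only the fields seen in the dump above, and it assumes that control point 0 carries the impact position, which matched this one observation but is not confirmed for other particles.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };

enum class ParticleEventType { Create, Update, Destroy };

// Hypothetical flattened view of one CDOTAUserMsg_ParticleManager packet.
struct ParticleEvent {
    ParticleEventType type;
    uint32_t index;              // "index" of the outer message
    uint64_t particleNameIndex;  // create_particle.particle_name_index (Create only)
    int32_t controlPoint;        // update_particle.control_point (Update only)
    Vec3 position;               // update_particle.position (Update only)
};

struct Placement { uint64_t particleNameIndex; Vec3 position; };

// Pairs each control point 0 update with the particle created under the same index.
std::vector<Placement> collectPlacements(const std::vector<ParticleEvent>& events) {
    std::unordered_map<uint32_t, uint64_t> live;  // index -> particle_name_index
    std::vector<Placement> placements;
    for (const ParticleEvent& e : events) {
        switch (e.type) {
        case ParticleEventType::Create:
            live[e.index] = e.particleNameIndex;
            break;
        case ParticleEventType::Update: {
            auto it = live.find(e.index);
            if (it != live.end() && e.controlPoint == 0)
                placements.push_back({it->second, e.position});
            break;
        }
        case ParticleEventType::Destroy:
            live.erase(e.index);
            break;
        }
    }
    return placements;
}
```

Updates for an index that was never created are ignored, which keeps the result clean when a replay dump starts mid-game.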

The mysterious packet id

But how do we actually identify which particle is our Sunstrike without looking at it manually? The only information that seems to point at it is the particle_name_index attribute of the create_particle message.

Having a fixed-length 64-bit ID for a particle seems a bit wasteful at first glance. Even if we assume that hundreds of new particle effects were added every day, it would take forever to reach a count that needs 64 bits to represent, and the space required to store that many effects would be astronomical. This, combined with the fact that the IDs of other particles originating from the same hero don’t share any obvious relationship, leads me to believe that these are either some form of UUID or a hash based on a particle identifier.

The ID is definitely a constant as different replays contained the same particle ID for the same skill, which I tested with multiple heroes and skills. As the requirements for adding a custom particle effect to e.g. a courier don’t mention UUIDs (or similar means of identification) anywhere, I assumed that we are looking at a hash.

A similar fixed-64 bit number is actually used when you look at the nModelIndex attribute of entities, which has the type StrongHandle<InfoForResourceTypeCModel>. Back when Dota 2 still used the Source 1 engine the corresponding model attribute was linked to a Stringtable, which in turn would result in the path to the model that was being rendered. The ID being directly related to the particle index (or at least being resolved in the same context) seems very likely.

At this point, I also checked whether Valve’s packaging format for additional game files (VPK) changed with the Reborn update and whether it includes the given identifier, but it turns out that it hasn’t been upgraded from Source 1.

Opening up IDA

Without having access to any of the source code, there was no way of figuring out how the ID is computed without looking at the disassembly in IDA.

The Source 2 engine isn’t packaged as a single large library but split up into multiple components. Throwing away some of the obviously unrelated libraries, such as audio codecs and bundled open source libraries, as well as discarding other unlikely candidates results in the following list:

game/dota/bin/libclient.dylib
game/core/bin/lib/libengine2.dylib
game/core/bin/lib/libfilesystem_stdio.dylib
game/core/bin/lib/libmaterialsystem2.dylib
game/core/bin/lib/libmeshsystem.dylib
game/core/bin/lib/libparticles.dylib
game/core/bin/lib/libresourcesystem.dylib

What client and engine are seems pretty obvious. The same goes for mesh, materials and particles. The reason I included all three of them in addition to resourcesystem and filesystem is that it’s not quite clear in which library the particles might be loaded / initialized. By including modules referencing resources other than particles, we might have a better shot at finding the correct library and the initialization code.

Before opening them up in IDA, I looked through all the strings embedded in the binaries, starting with the smallest library and searching for keywords like “particle” and “invoker”. Though libclient actually included the full paths of some of Invoker’s particles, the functions referencing them looked rather unrelated and, from what I could tell, dealt with the static initialization of abilities outside of the KeyValue files.

The first interesting library I looked at was libresourcesystem, which didn’t include any references to our hero or the particle-keyword, but the following strings:

...
InfoForResourceTypeIParticleSystemDefinition
InfoForResourceTypeCAnimationResource
InfoForResourceTypeCAnimationGroupResource
InfoForResourceTypeCSequenceGroupResource
InfoForResourceTypeVBitmapFontRuntimeData_t
InfoForResourceTypeIMaterial2
InfoForResourceTypeCMorphSet
InfoForResourceTypeCRenderMesh
InfoForResourceTypeCModel
...
vpcf
vanim
vagrp
vseq
vfont
vmat
vmorf
vmesh
vmdl
...

With the particle extension included, as well as references to our StrongHandle, I was pretty sure that I could file particles under the keyword resources, and that they were probably being initialized in this file.

Other interesting strings were directly related to loading the resources from the filesystem:

GenerateResourceNameFromFileName: Invalid extension specified in file name "%s"!
ERROR: Specified full path %s does not lie under the mod search path!)
ERROR: Specified path %s could not be made content-relative
FixupResourceName: Illegal full path passed in ("%s")!
ERROR: Resource name "%s" has the incorrect extension "%s" for the specified resource type (expected "%s")!
FixupResourceName: Illegal path, missing extension passed in ("%s")!

Judging from the error messages, each resource is added by its relative path and has to include the file extension. The actual path also seems to be rewritten in FixupResourceName, probably for compatibility reasons.

It was now time to see if I guessed correctly and open up said functions in IDA.

Finding the correct function

Because of the fixed-length type of the ID, I first looked for any hash functions, which turned out to be the right idea:

[Image: IDA function search for “Hash”]

By cross referencing all functions calling MurmurHash64, we end up with a neat list of functions that we can analyse further:

[Image: IDA XREF graph]

I correlated the functions from the XREF graph with the error messages I found earlier and followed all the functions being called throughout the replay:

///
/// First look at GenerateResourceNameFromFileName which contains the "Invalid extension specified" error message
///

// get the file extension and convert it to lowercase
V_GetFileExtension(X);
to_lower(X);

// if it's an absolute path, load it from the filesystem
if (V_IsAbsolutePath) {
    _V_FixupPathName();
    *g_pFullFileSystem->load;
    return;
}

// Call fixup and fixslashes
_V_FixupPathName(a2, v32, a1, 1u);
V_FixSlashes(a2, 47);

///
/// The above function was only used once in GenerateManifestFileForStringList in the following context:
///

// ...
GenerateResourceNameFromFileName(&v79, &v78, 512, v3); // Gen Resource name
v62 = ComputeResourceIdFromResourceName(0x6E616D7276LL, &v78); // Use it to generate the ID
// ...

///
/// The function ComputeResourceIdFromResourceName looks like this:
///

FixupResourceName(a2, a1, &v4, 512);
result = 0LL;
v3 = strlen(&v4);
result = MurmurHash64(&v4, v3, 0xEDABCDEF); // Last part is our hashing seed!
return result;

///
/// This only leaves FixupResourceName:
///

if ( V_IsAbsolutePath() )
    Warning("FixupResourceName: Illegal full path passed in (\"%s\")!\n", a1, v7, v8);

v9 = V_GetFileExtension(a1); // Return file extension
_V_FixupPathName(&v13, 512, a1, 1u); // This one only really calls V_RemoveDotSlashes, which removes "./" and "../"
V_FixSlashes(&v13, 47);
_V_SetExtension(&v13, v14, 512);
_V_FixupPathName(v5, v4, (const char *)v6, 1u);
V_FixSlashes(v5, 47);
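
A side note on the constant 0x6E616D7276 passed to ComputeResourceIdFromResourceName above: read as little-endian ASCII, its bytes spell "vrman". That it is the file extension for the manifest resource is my interpretation, not something the disassembly states. The helper below is a hypothetical sketch for decoding such packed constants.

```cpp
#include <cstdint>
#include <string>

// Interprets an integer as up to eight ASCII characters packed in
// little-endian order, stopping at the first zero byte.
std::string unpackAscii(uint64_t value) {
    std::string out;
    for (int i = 0; i < 8; ++i) {
        const char c = static_cast<char>(value & 0xFF);
        if (c == 0) break;
        out += c;
        value >>= 8;
    }
    return out;
}
```

Compilers often fold short string literals into immediates like this, so decoding suspicious-looking constants is a cheap check while reading decompiler output.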

At this point it was very clear that we were in fact looking at a hash, but the exact behavior of all the different FixXXXX functions wasn’t obvious from the disassembly. Searching for some of the functions via Google revealed MakeResources.cpp and strtools.h.

Both include most of the functions dealing with the sanitization process or at least enough comments to figure out what they do. With the header and the decompiled source at our disposal, writing a test implementation was just a bit of trial and error to get the path right.

Getting the particle list

To actually hash the resource path, we first have to figure out how exactly the particles are laid out. Even though Reborn uses an entirely different engine, most of the supporting formats (such as VPK) are still the same.

Said format is very easy to decode compared to e.g. Blizzard’s MPQ format. A good example including some pseudo code is on the Valve Developer Wiki. I hacked up a simple C++ implementation of their code, which took less than an hour, when I realized that most of the existing implementations were Windows only or failed to properly compile / load the VPK for whatever reason.

As it turns out, modifying the path on our end wasn’t actually necessary as it was already sanitized before being added to the archive. The only part that gave me a headache for about half an hour was the fact that we need to remove _c (which stands for ‘compiled’) from the extension. I figured it out when I searched for some of the error strings on Google and found an example path that didn’t include the _c part of the path.
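
The name that ends up being hashed can be sketched like this. Both helper names are hypothetical; the only rules taken from the text are that directory, file name and extension are joined with "/" and ".", and that a trailing _c is dropped from the extension.

```cpp
#include <string>

// Strips the trailing "_c" ("compiled") marker from a VPK extension, if present.
std::string stripCompiledSuffix(const std::string& ext) {
    const std::string suffix = "_c";
    if (ext.size() > suffix.size() &&
        ext.compare(ext.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return ext.substr(0, ext.size() - suffix.size());
    }
    return ext;
}

// Builds the resource name from the three parts stored in the VPK directory tree.
std::string resourceName(const std::string& dir,
                         const std::string& name,
                         const std::string& ext) {
    return dir + "/" + name + "." + stripCompiledSuffix(ext);
}
```

For the Sunstrike particle, the VPK entry with extension vpcf_c, directory particles/units/heroes/hero_invoker and name invoker_sun_strike turns into particles/units/heroes/hero_invoker/invoker_sun_strike.vpcf.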

The rest of it was just finding the correct Murmur implementation used (Hint: Compare the magic values) and putting it all together.
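
To make the hint concrete: the Murmur variants can be told apart by the multiplication constants that show up in the disassembly. The lookup below is a small sketch with a hypothetical function name; the constants are the ones from the public reference implementations as far as I know them, and the list is not exhaustive.

```cpp
#include <cstdint>
#include <string>

// Maps a multiplication constant found in a disassembled hash routine
// to the Murmur variant whose reference implementation uses it.
std::string identifyMurmurConstant(uint64_t constant) {
    switch (constant) {
    case 0x5bd1e995ULL:
        return "MurmurHash2 (32-bit multiplier, also used by MurmurHash64B)";
    case 0xc6a4a7935bd1e995ULL:
        return "MurmurHash64A";
    case 0xcc9e2d51ULL:
    case 0x1b873593ULL:
        return "MurmurHash3_x86_32";
    case 0x87c37b91114253d5ULL:
    case 0x4cf5ad432745937fULL:
        return "MurmurHash3_x64_128";
    default:
        return "unknown";
    }
}
```

A 64-bit result built from two 32-bit halves that are both multiplied by 0x5bd1e995 points at the MurmurHash64B variant.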

So without further ado, the full source code for matching up the particle IDs (and ResourceHandles, too!) with their actual strings:

/**
 * @author Robin Dietrich <me (at) invokr (dot) org>
 *
 * @par License
 *    Copyright 2016 Robin Dietrich
 *
 *    Licensed under the Apache License, Version 2.0 (the "License");
 *    you may not use this file except in compliance with the License.
 *    You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 *    Unless required by applicable law or agreed to in writing, software
 *    distributed under the License is distributed on an "AS IS" BASIS,
 *    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *    See the License for the specific language governing permissions and
 *    limitations under the License.
 */

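#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

#define MHASH_SEED 0xEDABCDEF

// Minimal assertion helper: print the message and exit if the condition fails
#define ASSERT_TRUE(cond, msg) \
    do { if (!(cond)) { std::cerr << (msg) << std::endl; std::exit(EXIT_FAILURE); } } while (0)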

#pragma pack(push, 1)
/** VPK header, see https://developer.valvesoftware.com/wiki/VPK_File_Format */
struct VPKHeader {
    // Signature, should equal 0x55aa1234
    uint32_t Signature;

    // Version, should equal 1 or 2
    uint32_t Version;

    // The size, in bytes, of the directory tree
    uint32_t TreeSize;
};

struct VPKDirectoryEntry {
    uint32_t CRC; // A 32bit CRC of the file's data.
    uint16_t PreloadBytes; // The number of bytes contained in the index file.

    // A zero based index of the archive this file's data is contained in.
    // If 0x7fff, the data follows the directory.
    uint16_t ArchiveIndex;

    // If ArchiveIndex is 0x7fff, the offset of the file data relative to the end of the directory (see the header for more details).
    // Otherwise, the offset of the data from the start of the specified archive.
    uint32_t EntryOffset;

    // If zero, the entire file is stored in the preload data.
    // Otherwise, the number of bytes stored starting at EntryOffset.
    uint32_t EntryLength;

    uint16_t Terminator;
};
#pragma pack(pop)

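/** Read string from file until we reach 0 (or the end of the file) */
std::string readString(FILE *fp) {
    std::string ret;

    // fgetc returns an int so that EOF can be told apart from a valid byte
    int c = fgetc(fp);
    while (c != 0 && c != EOF) {
        ret += static_cast<char>(c);
        c = fgetc(fp);
    }

    return ret;
}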

/** MurmurHash implementation */
uint64_t mhash64 ( const void * key, int len, uint64_t seed = MHASH_SEED) {
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    uint32_t h1 = uint32_t(seed) ^ len;
    uint32_t h2 = uint32_t(seed >> 32);

    const uint32_t * data = (const uint32_t *)key;

    while(len >= 8) {
        uint32_t k1 = *data++;
        k1 *= m; k1 ^= k1 >> r; k1 *= m;
        h1 *= m; h1 ^= k1;
        len -= 4;

        uint32_t k2 = *data++;
        k2 *= m; k2 ^= k2 >> r; k2 *= m;
        h2 *= m; h2 ^= k2;
        len -= 4;
    }

    if(len >= 4) {
        uint32_t k1 = *data++;
        k1 *= m; k1 ^= k1 >> r; k1 *= m;
        h1 *= m; h1 ^= k1;
        len -= 4;
    }

    switch(len) {
    case 3: h2 ^= ((unsigned char*)data)[2] << 16;
    case 2: h2 ^= ((unsigned char*)data)[1] << 8;
    case 1: h2 ^= ((unsigned char*)data)[0];
            h2 *= m;
    };

    h1 ^= h2 >> 18; h1 *= m;
    h2 ^= h1 >> 22; h2 *= m;
    h1 ^= h2 >> 17; h1 *= m;
    h2 ^= h1 >> 19; h2 *= m;

    uint64_t h = h1;
    h = (h << 32) | h2;
    return h;
}

std::unordered_map<uint64_t, std::string> hashes;

int main() {
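    // Open in binary mode so the directory is read byte for byte on every platform
    FILE* fp = fopen("/[..]/steamapps/common/dota 2 beta/game/dota/pak01_dir.vpk", "rb");
    if (fp == nullptr) {
        std::cerr << "Unable to open pak01_dir.vpk" << std::endl;
        return 1;
    }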

    VPKHeader header;
    fread(&header, 1, sizeof(VPKHeader), fp);

    ASSERT_TRUE(header.Signature == 0x55aa1234, "VPK signature mismatch");
    ASSERT_TRUE(header.Version == 1, "Header version mismatch");

    std::string extension;
    std::string path;
    std::string name;

    while (true) {
        extension = readString(fp);
        if (extension.empty()) break;

        while (true) {
            path = readString(fp);
            if (path.empty()) break;

            while (true) {
                name = readString(fp);
                if (name.empty()) break;

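                // Compiled resources carry a "_c" suffix (e.g. vpcf_c) that is not part of the hashed name
                std::string ext = extension;
                if (ext.size() > 2 && ext.compare(ext.size() - 2, 2, "_c") == 0) ext.resize(ext.size() - 2);
                std::string fpath = path + "/" + name + "." + ext;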
                hashes[mhash64(fpath.c_str(), fpath.size())] = fpath;

                VPKDirectoryEntry entry;
                fread(&entry, 1, sizeof(VPKDirectoryEntry), fp);

                // Preload data, if any, directly follows the entry inside the directory tree;
                // the file data itself lives outside the tree and never needs to be skipped here
                if (entry.PreloadBytes > 0) {
                    fseek(fp, entry.PreloadBytes, SEEK_CUR);
                }
            }
        }
    }

    fclose(fp);

    // resolves to: models/creeps/lane_creeps/creep_bad_ranged/lane_dire_ranged.vmdl
    uint64_t h = 15587903937176337161u;

    // resolves to: particles/units/heroes/hero_invoker/invoker_sun_strike.vpcf, tested to be correct in game (cl_particles_dumplist)
    uint64_t h2 = 12079402610762616890u;

    std::cout << "Generated " << hashes.size() << " resource hashes" << std::endl;
    std::cout << hashes[h2] << std::endl;
}

Addendum

I’ve talked with @spheenik about including all the code in one of the future versions of Clarity. He figured out an easier way of getting all the relevant strings by looking at the SpawnGroupManifest packet, so parsing the VPK isn’t strictly required anymore.

I’ll include a link to the Clarity Version from which this is available once it is implemented. Edit: Link to Clarity commit

*****
Written by Robin Dietrich on 01 September 2016, tagged as dota2, reverse-engineering