The MDL/MDX Format

BioWare’s Aurora/Odyssey engine stores 3D models in a pair of files: .mdl and .mdx. This page documents what’s inside them, how the engine consumes them, and – occasionally – why they look the way they do. Evidence throughout is drawn from Ghidra decompilation of swkotor.exe (K1 GOG build), cross-checked against hex dumps of vanilla assets and community references (kotorblender, mdledit, mdlops, pykotor, reone, xoreos).

Overview

At a glance:

| Property | Value |
| --- | --- |
| Extensions | .mdl, .mdx |
| Magic | Binary: first u32 == 0. ASCII: text (filedependancy, newmodel, …) |
| Type | Hierarchical scene graph + animation + vertex data |
| Resource type ID | 2002 (MDL), 3008 (MDX) in KEY/BIF |
| Rust reference | rakata_formats::Mdl |

A model is a tree of nodes. Each node carries a transform (position + orientation), an animation track (“controllers”), and – depending on its type – geometry, light parameters, particle-emitter configuration, a skinning skeleton, a lightsaber blade, and so on. One MDL file can carry multiple named animations that operate on that tree.

The surprising shape of the format only makes sense once you understand one design choice, so let’s start there.

The core idea: load-and-fixup

The binary MDL is not a parsed format in the usual sense. The engine does not walk a byte stream field by field, calling read_u32, read_string, read_float. Instead, it does this:

  1. Allocate a buffer exactly the size of the model data.
  2. Copy the whole file into that buffer in one memcpy.
  3. Walk the now-in-memory structure and convert relative offsets into absolute pointers.

That’s it. The “parser” is a pointer rewriter. Every Reset* function you’ll see in the engine (InputBinary::Reset, ResetMdlNode, ResetTriMeshParts, …) takes a buffer base pointer and a struct pointer, and its job is essentially struct->field += base for every relocatable pointer in the struct, recursing into children as it goes.

An analogy: think of IKEA instructions that say “screw part A into the hole next to part B” rather than giving exact millimetre coordinates. The instructions are valid anywhere you choose to assemble the furniture. The MDL blob is identical: every pointer is expressed relative to the blob’s origin, so the engine can drop the blob anywhere in memory and then do a one-time pass to convert those relative offsets to real addresses.
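
To make the fixup pass concrete, here is a minimal Rust sketch. It is not the engine's code: `read_u32` and `relocate_u32` are hypothetical helpers, the blob stays a plain byte buffer, and the "absolute address" is simulated by adding a made-up base value. Leaving a zero offset as a null pointer is also an assumption; the engine only guards some fields that way.

```rust
/// Read a little-endian u32 at `offset`, or None if it runs past the blob.
fn read_u32(blob: &[u8], offset: usize) -> Option<u32> {
    let b = blob.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Rewrite the relative offset stored at `field_offset` into `base + offset`,
/// mimicking `struct->field += base`. Assumption: zero is treated as null
/// and left untouched.
fn relocate_u32(blob: &mut [u8], field_offset: usize, base: u32) -> Option<u32> {
    let rel = read_u32(blob, field_offset)?;
    let abs = if rel == 0 { 0 } else { base.wrapping_add(rel) };
    blob[field_offset..field_offset + 4].copy_from_slice(&abs.to_le_bytes());
    Some(abs)
}
```

A reader written in safe Rust would normally skip the rewrite entirely and keep the offsets relative, bounds-checking each one as it follows it.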

This design choice ripples through everything:

  • On-disk layout matches in-memory layout exactly. If a MdlNodeTriMesh is 412 bytes in RAM, it’s 412 bytes on disk. Struct field offsets you see in a Ghidra decompilation are the file offsets.
  • Binary files are architecture-bound. This format is a snapshot of a specific compiler’s struct layout on 32-bit Windows. Field alignment, pointer size (4 bytes), endianness (little), and even padding bytes all match that ABI.
  • “Parsing” is really validation + relocation. A Rust reader doesn’t need to convert a byte stream into a Rust struct; it needs to interpret a memory image as a struct overlay, following pointers to walk the tree.
  • The engine never writes binary MDL. The shipping engine only has code to emit ASCII MDL. Binary MDL is produced exclusively by BioWare’s model compiler (a build-time tool). The runtime reads it but never round-trips it.

With that frame in place, the rest of the format falls into shape.

File structure

The 12-byte wrapper

The file begins with a tiny header:

| Offset | Type | Field | Notes |
| --- | --- | --- | --- |
| +0x00 | u32 | zero marker | Always 0. Used to tell binary from ASCII. |
| +0x04 | u32 | MDL content size | Bytes of model data that follow. |
| +0x08 | u32 | MDX file size | Size of the accompanying .mdx file. |

Input::Read at 0x004a14b0 is the dispatcher: it peeks at the first byte, and if it’s \0 the file is binary (the first u32 is always zero). Otherwise the file starts with ASCII tokens like filedependancy or newmodel, and processing hands off to a line-based interpreter.

For binary files, InputBinary::Read at 0x004a1260 does the rest:

  1. Record mdl_content_size and mdx_file_size from the wrapper.
  2. Allocate a heap buffer the size of the MDL content; memcpy the model data into it.
  3. If MDX size is non-zero, allocate a second buffer and memcpy the MDX file into it.
  4. Call Reset(mdl_buf, mdx_buf, resource_handle).

Note: the wrapper is not part of the model data. Byte 12 of the on-disk file is byte 0 of the in-memory MDL blob. All internal offsets are relative to the in-memory origin.
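
A small Rust sketch of the wrapper handling described above. The names `MdlWrapper`, `le_u32` and `split_binary_mdl` are illustrative, not the actual rakata_formats API; the function simply checks the zero marker and hands back the content blob, so that byte 12 of the file becomes byte 0 of the slice.

```rust
/// The two size fields of the 12-byte wrapper (the zero marker is implied).
#[derive(Debug, PartialEq)]
struct MdlWrapper {
    mdl_content_size: u32,
    mdx_file_size: u32,
}

fn le_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the wrapper and return it with the MDL content slice.
/// Returns None for ASCII files (non-zero marker) or truncated input.
fn split_binary_mdl(file: &[u8]) -> Option<(MdlWrapper, &[u8])> {
    if le_u32(file, 0)? != 0 {
        return None; // starts with ASCII tokens, not the zero marker
    }
    let wrapper = MdlWrapper {
        mdl_content_size: le_u32(file, 4)?,
        mdx_file_size: le_u32(file, 8)?,
    };
    // All internal offsets are relative to this slice, not to the file.
    let end = 12usize.checked_add(wrapper.mdl_content_size as usize)?;
    let content = file.get(12..end)?;
    Some((wrapper, content))
}
```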

Three kinds of pointer

Inside the MDL blob you’ll encounter three distinct flavours of “pointer”, which are worth keeping straight:

  1. MDL-relative offsets – the vast majority. Relocated to absolute pointers by Reset* functions. On re-serialization, they must be rewritten back to relative offsets.
  2. MDX-file byte offsets – used by a few fields (e.g. per-mesh mdx_data_offset at +0x144) to locate vertex data in the separate MDX file.
  3. String pointers – themselves MDL-relative, but they target a string table at the end of the blob, which is indexed by the name-offsets array at model +0xB8.

Confusingly, there are two similarly named fields on each mesh node: mdx_data_offset at +0x144 (an MDX file offset) and vert_array_offset at +0x148 (a content-relative pointer to embedded position data). Conflating these produced one of the nastier bugs in our reader’s history (see War stories below).

Model header

Once the blob is in memory, InputBinary::Reset at 0x004a1030 walks the model header. Here’s the relevant field map:

| Offset | Field | Notes |
| --- | --- | --- |
| +0x00 | ModelDestructor vptr | Populated at load time. |
| +0x04 | ModelParseField vptr | Populated at load time. |
| +0x28 | root node offset | Relocated. ResetMdlNode recurses from here. |
| +0x48 | resource handle | Populated at load time. |
| +0x4C | type byte | GetType() |
| +0x50 | classification | 0=Other, 1=Effect, 2=Tile, 4=Character, 8=Door. |
| +0x54 | ref count | |
| +0x58 | animations array ptr | Relocated; count at +0x5C. |
| +0x64 | supermodel pointer | Populated via FindModel(buf+0x88). |
| +0x68..+0x80 | bbox min/max | Vector bmin, bmax. |
| +0x80 | radius | f32, default 7.0. |
| +0x84 | animation scale | f32, default 1.0. ASCII: setanimationscale. |
| +0x88 | supermodel name | char[36], null-terminated. Drives recursive model load. |
| +0xA8 | node array (secondary) | Relocated if non-zero. |
| +0xAC | MDX vertex pool offset | Source offset into MDX data (consumed into a GL pool). |
| +0xB0 | MDX data size | Size of the vertex-pool copy. |
| +0xB8 | name offsets array ptr | Relocated; count at +0xBC. Array entries also relocated. |

Two fields deserve special mention:

  • +0x50 classification is the model’s high-level category (Character, Door, Tile, …). It’s never read during the Reset pass – it’s carried through as part of the memory-mapped blob and consulted at runtime. Cross-validated against hex dumps:

    | File | +0x50 | Category |
    | --- | --- | --- |
    | c_dewback.mdl | 0x04 | Character ✓ |
    | dor_lhr01.mdl | 0x08 | Door ✓ |
    | m01aa_01a.mdl | 0x00 | Other ✓ |
  • +0x88 supermodel name is a 32-byte (plus 4 padding) ASCII name. Loading a model with a supermodel triggers a recursive FindModel call for that name – think of supermodels as CSS-style inheritance, where animation data and bones defined on the parent are available to the child.

The node tree

The root node pointer sits at model +0x28. From there, children are reached through a standard in-memory array layout: ptr + count_used + count_allocated at node offsets +0x2C, +0x30, +0x34. This three-u32 pattern is BioWare’s CExoArrayList and shows up everywhere in the format – any time you see “12 bytes of array header”, this is what it is.

Base node layout (80 bytes)

All node types begin with the same 80-byte header:

| Offset | Size | Field | Notes |
| --- | --- | --- | --- |
| +0x00 | u16 | node_type | Flag bitmask. Drives type dispatch. |
| +0x02 | u16 | node_id | Sequential 0..N-1. |
| +0x04 | u16 | node_id_dup | Identical copy of node_id. Never read. |
| +0x06 | u16 | padding | Always zero. |
| +0x08 | u32 | name pointer | Relocated. Points into the string table. |
| +0x0C | u32 | parent pointer | Relocated if non-zero. |
| +0x10 | 12 | position | Vector{x, y, z} as 3×f32. |
| +0x1C | 16 | orientation | Quaternion{w, x, y, z} as 4×f32. |
| +0x2C | 12 | children array | CExoArrayList of MdlNode*. |
| +0x38 | 12 | controller keys array | CExoArrayList of NewController (16B each). |
| +0x44 | 12 | controller data array | CExoArrayList of float (packed key data). |

The two bytes at +0x04 are a redundant duplicate of node_id – always identical to +0x02 across 209 nodes verified across four vanilla files, zero mismatches. No known engine function reads it. Best guess: legacy field or exporter artifact. It’s preserved for round-trip fidelity but has no semantic meaning.

A few conventions worth noting:

  • Quaternion order is (w, x, y, z). Confirmed via Gob::GetOrientation at 0x004499a0 which copies fields in that order. Identity quaternion is [1.0, 0.0, 0.0, 0.0]. The Rust API uses the same convention.
  • Position and orientation are read directly from the blob. They’re not relocated – they’re inline values, not pointers.
  • Only two fields need relocation in the base header: name pointer at +0x08 and parent pointer at +0x0C.

InputBinary::ResetMdlNodeParts at 0x004a0b60 handles the base relocations and then recurses: for each entry in the children array, relocate the child pointer and call ResetMdlNode on it.
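
As an illustration, the 80-byte header can be read straight out of the content blob without any relocation. This is a sketch under the layout documented above; `ArrayHeader`, `BaseNode` and `parse_base_node` are hypothetical names, and the offsets are kept content-relative rather than turned into pointers.

```rust
/// The three-u32 CExoArrayList header: pointer, used count, allocated count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ArrayHeader { offset: u32, count: u32, capacity: u32 }

#[derive(Debug, PartialEq)]
struct BaseNode {
    node_type: u16,
    node_id: u16,
    name_offset: u32,
    parent_offset: u32,
    position: [f32; 3],
    orientation: [f32; 4], // (w, x, y, z)
    children: ArrayHeader,
    controller_keys: ArrayHeader,
    controller_data: ArrayHeader,
}

fn u16_at(b: &[u8], at: usize) -> u16 { u16::from_le_bytes([b[at], b[at + 1]]) }
fn u32_at(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}
fn f32_at(b: &[u8], at: usize) -> f32 { f32::from_bits(u32_at(b, at)) }
fn array_at(b: &[u8], at: usize) -> ArrayHeader {
    ArrayHeader { offset: u32_at(b, at), count: u32_at(b, at + 4), capacity: u32_at(b, at + 8) }
}

/// Decode the base header at `node_offset`; None if fewer than 80 bytes remain.
fn parse_base_node(blob: &[u8], node_offset: usize) -> Option<BaseNode> {
    let n = blob.get(node_offset..node_offset.checked_add(80)?)?;
    Some(BaseNode {
        node_type: u16_at(n, 0x00),
        node_id: u16_at(n, 0x02),
        name_offset: u32_at(n, 0x08),
        parent_offset: u32_at(n, 0x0C),
        position: [f32_at(n, 0x10), f32_at(n, 0x14), f32_at(n, 0x18)],
        orientation: [f32_at(n, 0x1C), f32_at(n, 0x20), f32_at(n, 0x24), f32_at(n, 0x28)],
        children: array_at(n, 0x2C),
        controller_keys: array_at(n, 0x38),
        controller_data: array_at(n, 0x44),
    })
}
```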

Type dispatch

InputBinary::ResetMdlNode at 0x004a0900 reads the node_type field and dispatches:

| node_type | Handler | Kind |
| --- | --- | --- |
| 0x0001 | ResetMdlNodeParts only | Dummy / base |
| 0x0003 | ResetLight | Light |
| 0x0005 | ResetMdlNodeParts only | Emitter |
| 0x0009 | ResetMdlNodeParts only | Camera |
| 0x0011 | ResetMdlNodeParts only | Reference |
| 0x0021 | ResetTriMesh / ResetTriMeshParts | TriMesh |
| 0x0061 | ResetSkin | Skin mesh |
| 0x00A1 | ResetAnim | AnimMesh |
| 0x0121 | ResetDangly | Dangly mesh (cloth) |
| 0x0221 | ResetAABBTree + ResetTriMeshParts | Walkmesh with AABB |
| 0x0401 | (no-op) | Trigger / unused |
| 0x0821 | ResetLightsaber | Saber mesh |

The type values are stored as a lookup table in the executable at 0x00740a18 (12 × u32).

Though the type codes are shaped like a bitmask – HEADER=0x01, LIGHT=0x02|HEADER, EMITTER=0x04|HEADER, TRIMESH=0x20|HEADER, SKIN=0x40|TRIMESH, SABER=0x800|TRIMESH, and so on – the dispatch is an exact value match, not individual bit checks. The bitmask structure is meaningful (skin is a superset of trimesh, for instance), it’s just not how the engine branches.

Size summary

Every node type has a known fixed size, both on disk and in memory:

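| Flag | Type | Total | Base | Extra | Extends |
| --- | --- | --- | --- | --- | --- |
| 0x0001 | Base | 80 | 80 | 0 | |
| 0x0003 | Light | 172 | 80 | 92 | MdlNode |
| 0x0005 | Emitter | 304 | 80 | 224 | MdlNode |
| 0x0009 | Camera | 80 | 80 | 0 | MdlNode |
| 0x0011 | Reference | 116 | 80 | 36 | MdlNode |
| 0x0021 | TriMesh | 412 | 80 | 332 | MdlNode |
| 0x0061 | Skin | 512 | 412 | 100 | TriMesh |
| 0x00A1 | AnimMesh | 468 | 412 | 56 | TriMesh |
| 0x0121 | Dangly | 440 | 412 | 28 | TriMesh |
| 0x0221 | AABB | 416 | 412 | 4 | TriMesh |
| 0x0401 | Trigger | 80 | 80 | 0 | MdlNode |
| 0x0821 | Saber | 432 | 412 | 20 | TriMesh |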

Verified via ParseNode’s operator_new(size) calls and Ghidra struct definitions. All mesh subtypes extend MdlNodeTriMesh – their extra data begins at node offset +0x19C, immediately after the TriMesh block.
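
The exact-match dispatch and the fixed sizes fit naturally into a pair of Rust functions. The sketch below is illustrative only (`NodeKind`, `classify` and `node_size` are made-up names); it mirrors the engine's behaviour of matching whole values rather than testing individual bits, so a bit pattern that merely looks plausible is rejected.

```rust
/// Node kinds keyed by the exact `node_type` value, not by individual bits.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeKind {
    Base, Light, Emitter, Camera, Reference, TriMesh,
    Skin, AnimMesh, Dangly, Aabb, Trigger, Saber,
}

const BASE_SIZE: usize = 80;
const TRIMESH_EXTRA: usize = 332;

/// Exact-match dispatch over the 12 known type values.
fn classify(node_type: u16) -> Option<NodeKind> {
    Some(match node_type {
        0x0001 => NodeKind::Base,
        0x0003 => NodeKind::Light,
        0x0005 => NodeKind::Emitter,
        0x0009 => NodeKind::Camera,
        0x0011 => NodeKind::Reference,
        0x0021 => NodeKind::TriMesh,
        0x0061 => NodeKind::Skin,
        0x00A1 => NodeKind::AnimMesh,
        0x0121 => NodeKind::Dangly,
        0x0221 => NodeKind::Aabb,
        0x0401 => NodeKind::Trigger,
        0x0821 => NodeKind::Saber,
        _ => return None,
    })
}

/// Total fixed size in bytes: base header, plus the TriMesh block for mesh
/// subtypes, plus the type-specific extra data.
fn node_size(kind: NodeKind) -> usize {
    let mesh = BASE_SIZE + TRIMESH_EXTRA; // 412 = 0x19C
    match kind {
        NodeKind::Base | NodeKind::Camera | NodeKind::Trigger => BASE_SIZE,
        NodeKind::Light => BASE_SIZE + 92,
        NodeKind::Emitter => BASE_SIZE + 224,
        NodeKind::Reference => BASE_SIZE + 36,
        NodeKind::TriMesh => mesh,
        NodeKind::Skin => mesh + 100,
        NodeKind::AnimMesh => mesh + 56,
        NodeKind::Dangly => mesh + 28,
        NodeKind::Aabb => mesh + 4,
        NodeKind::Saber => mesh + 20,
    }
}
```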

Node types in depth

The lightweight types

Camera (0x009) has no extra data. Same 80-byte footprint as the base node. ResetMdlNode dispatches to ResetMdlNodeParts only. There are no camera-specific ASCII fields either – the ASCII parser also falls through to the base handler.

Reference (0x011) carries just two fields in 36 extra bytes: a 32-byte ref_model name and a 4-byte reattachable flag. Both inline (no pointers to relocate).

Trigger (0x401) – the decompiled ResetMdlNode explicitly returns void without calling any reset function for this type. In practice it appears to be unused in shipping content.

Light (0x003)

Lights carry 92 bytes of extra data. Most of the scalar fields are straightforward (priority, shadow flag, ambient-only flag, flare radius, etc.), but lights are the most complex non-mesh type because of their array fields:

| Extra offset | Field | Layout | Runtime relocation |
| --- | --- | --- | --- |
| +0x04 | texture SafePointers | 12-byte array header | Zeroed on disk |
| +0x10 | flaresizes | CExoArrayList | ptr relocated |
| +0x1C | flarepositions | CExoArrayList | ptr relocated |
| +0x28 | flarecolorshifts | CExoArrayList | ptr relocated |
| +0x34 | texturenames | CExoArrayList<char*> (each ptr too!) | all ptrs relocated |

Lights also drive their colour, radius, shadow radius, vertical displacement, and multiplier via controllers (types 0x4C, 0x58, 0x60, 0x64, 0x8C) – these live in the base node’s controller arrays, not in the light-specific block.

Emitter (0x005)

Emitters are 304 bytes and – pleasantly – contain no relocatable pointers. Everything is inline: a fistful of floats and ints, four 32-byte name fields (update, render, blend, texture), and a 16-byte chunk_name. The full field map is in the appendix.

The most important field is update at extra offset +0x20. It’s the emitter type string, a case-sensitive selector against:

  • "Fountain" → steady particle stream (most common)
  • "Explosion" → one-shot burst
  • "Single" → single particle
  • "Lightning" → lightning-bolt effect

MdlNodeEmitter::InternalCreateInstance at 0x0049d5c0 branches on this string to instantiate the appropriate runtime emitter class.

Known engine-level footgun: controller 502 (detonate) is only valid on "Explosion" emitters. InternalCreateInstance only allocates the detonation memory for that branch, so a detonate controller on a "Fountain" emitter reads unallocated memory at runtime and crashes. This is a known flaw in mdlops-based exporters (KotorMax); rakata-lint will validate this.
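
A lint for this footgun is only a few lines. The sketch below is a guess at the shape of such a check, not rakata-lint's actual implementation; `EmitterUpdate`, `parse_update` and `check_detonate` are hypothetical names, and the `update` string is assumed to have already been decoded from its 32-byte field.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum EmitterUpdate { Fountain, Explosion, Single, Lightning }

const CONTROLLER_DETONATE: u32 = 502;

/// Case-sensitive match against the four known update strings.
fn parse_update(name: &str) -> Option<EmitterUpdate> {
    match name {
        "Fountain" => Some(EmitterUpdate::Fountain),
        "Explosion" => Some(EmitterUpdate::Explosion),
        "Single" => Some(EmitterUpdate::Single),
        "Lightning" => Some(EmitterUpdate::Lightning),
        _ => None,
    }
}

/// Reject a detonate controller on anything other than an "Explosion" emitter.
fn check_detonate(update: &str, controller_types: &[u32]) -> Result<(), String> {
    let has_detonate = controller_types.contains(&CONTROLLER_DETONATE);
    if has_detonate && parse_update(update) != Some(EmitterUpdate::Explosion) {
        return Err(format!(
            "controller {} (detonate) on \"{}\" emitter",
            CONTROLLER_DETONATE, update
        ));
    }
    Ok(())
}
```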

TriMesh (0x021)

This is the big one. 332 bytes of extra data, encoding everything you’d expect in a mesh plus many things you wouldn’t.

Inline fields

At a high level:

  • Runtime function pointers (+0x00, +0x04): written by the constructor. Zero on disk; never consumed from a file.
  • Faces array (+0x08): CExoArrayList of MaxFace (32 bytes each). See Face layout below.
  • Bounding volumes (+0x14..+0x38): bbox min, bbox max, bounding sphere (radius + centre xyz). The sphere is the one actually consumed at runtime – PartTriMesh::GetMinimumSphere hierarchically unions it with children’s spheres for culling. These sphere fields have no ASCII-parser equivalent; they’re exclusively binary-format fields written by the BioWare toolset.
  • Material (+0x3C..+0x54): diffuse RGB, ambient RGB, transparencyhint.
  • Textures (+0x58..+0x98): texture_0 (primary/diffuse) and texture_1 (secondary/lightmap), each a 32-byte null-terminated string, plus 32 bytes of padding up to +0xE8.
  • UV animation (+0xEC..+0xF8): uv_direction_x, uv_direction_y, uv_jitter, uv_jitter_speed. Gated by animate_uv (+0xE8).
  • MDX vertex layout (+0x100..+0x12F): flags bitmask plus 11 per-attribute byte offsets. Described in the next subsection.
  • Counts and flags (+0x130..+0x13B): vertex_count (u16), texture_channel_count (u16), six 1-byte booleans (light_mapped, rotate_texture, is_background_geometry, shadow, beaming, render).
  • Tail (+0x13C..+0x14B): total_surface_area, one unresolved reserved slot, mdx_data_offset, vert_array_offset.

Out of 332 bytes, 61 fields are fully confirmed through Ghidra cross-referencing, 5 are confirmed-unused, 1 is “very likely” (the always-3 indices_per_face), and exactly 1 remains unresolved (the 4 bytes at +0x140, which the constructor initializes to zero and no known function ever touches).

MDX vertex layout

The flags field at extra +0x100 is a bitmask describing what each MDX vertex record contains:

| Bit | Component | Size |
| --- | --- | --- |
| 0x01 | position | 3×f32 (12B) – always set |
| 0x02 | UV1 / tverts0 | 2×f32 (8B) |
| 0x04 | UV2 / tverts1 | 2×f32 (8B) |
| 0x08 | UV3 / tverts2 | 2×f32 (8B) |
| 0x10 | UV4 / tverts3 | 2×f32 (8B) |
| 0x20 | normal | 3×f32 (12B) – always set |
| 0x80 | tangent space | 3×3×f32 (36B) – bump-mapped meshes |

Common patterns in vanilla K1: 0x21 (pos+norm only, 24B stride), 0x23 (+UV1, 32B), 0x27 (+UV2, 40B), 0xA7 (+tangent, 76B).
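
The strides quoted above follow directly from the flag bits. A short Rust sketch (the constant and function names are ours, and it assumes no vertex colours are present, since those have no flag bit and would add to the stride separately):

```rust
const MDX_POSITION: u32 = 0x01;
const MDX_UV1: u32 = 0x02;
const MDX_UV2: u32 = 0x04;
const MDX_UV3: u32 = 0x08;
const MDX_UV4: u32 = 0x10;
const MDX_NORMAL: u32 = 0x20;
const MDX_TANGENT_SPACE: u32 = 0x80;

/// Sum the byte sizes of every component whose bit is set in `flags`.
/// Assumption: vertex colours are absent (they are not flag-driven).
fn stride_from_flags(flags: u32) -> usize {
    let components: [(u32, usize); 7] = [
        (MDX_POSITION, 12),
        (MDX_UV1, 8),
        (MDX_UV2, 8),
        (MDX_UV3, 8),
        (MDX_UV4, 8),
        (MDX_NORMAL, 12),
        (MDX_TANGENT_SPACE, 36),
    ];
    components
        .iter()
        .filter(|&&(bit, _)| flags & bit != 0)
        .map(|&(_, size)| size)
        .sum()
}
```

Feeding in the four common vanilla patterns reproduces the strides listed above: 24, 32, 40 and 76 bytes.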

Note that vertex colours have no flag bit. Their presence is signalled by the per-attribute offset slot being != -1. The 11 offset slots are:

| Slot | Extra offset | Field | Evidence |
| --- | --- | --- | --- |
| 0 | +0x104 | position | LightPartTriMesh reads 3×f32, world-transforms |
| 1 | +0x108 | normal | LightPartTriMesh reads 3×f32, rotation only |
| 2 | +0x10C | vertex color | Checked != -1, reads RGB only. Alpha unused. |
| 3 | +0x110 | UV1 | PartTriMesh reads 2×f32 |
| 4 | +0x114 | UV2 | Structural: tverts1 in InternalGenVertices |
| 5 | +0x118 | UV3 | Structural: tverts2 |
| 6 | +0x11C | UV4 | Structural: tverts3 |
| 7 | +0x120 | tangent space | Filled by CalculateTangentSpaceBasis |
| 8–10 | +0x124..+0x12C | reserved | Always -1 across 215 surveyed vanilla meshes |

Vertex colour alpha is unused (confirmed 2026-04-04). LightPartTriMesh reads only bytes [0], [1], [2] (RGB). Byte [3] is stored but never read. The rendered output hardcodes alpha to 0xFF. The fourth byte exists purely for alignment.

Important subtlety: the engine doesn’t trust any of these values on load. InternalPostProcess at 0x0043cf00 recomputes the flags, stride, per-attribute offsets, and mdx_data_offset from scratch, based on which vertex components are actually present in the node’s arrays. It also recomputes vertex normals via edge cross products, and re-derives the bounding box and sphere. The on-disk values preserve the compiler’s original output, but they’re cosmetic from the engine’s perspective.

This has a consequence for tooling: you can largely get away with wrong values in these fields as long as your mesh is otherwise valid, because the engine will fix them up at load time. But a correct writer should still populate them – community tools (kotorblender, mdledit) depend on them, and the BioWare build pipeline does too.

Skin mesh (0x061)

100 extra bytes beyond TriMesh. Skinning data (bone weights, inverse-bind-pose rotation and translation, bone-index mapping) sits here, along with several padding regions:

| Skin offset | Field | Layout | Notes |
| --- | --- | --- | --- |
| +0x00 | weights | CExoArrayList | Always zero in binary files. |
| +0x14 | bone_weight_data | ptr | Relocated if count at +0x18 > 0. |
| +0x1C | qbone_ref_inv | CExoArrayList | Inverse-bind rotations. |
| +0x28 | tbone_ref_inv | CExoArrayList | Inverse-bind translations. |
| +0x34 | bone_constant_indices | CExoArrayList | Bone-index remap. |

The weights array deserves a call-out. A 52-byte SkinVertexWeight struct exists and is fully specified by the ASCII parser – 4 bone names, 4 weights, some metadata – but in the binary path, ResetSkin never relocates its pointer, and a corpus scan of all 968 skin nodes across 2832 vanilla models found zero non-empty weights arrays. Binary models store per-vertex bone data exclusively in MDX (via dedicated bone-weight and bone-index offsets), and the weights CExoArray is just a 12-byte zero blob on disk.

AnimMesh (0x0A1)

56 extra bytes. Carries a sample_period scalar and two CExoArrayList fields (anim_verts, anim_t_verts) for time-sampled vertex animation. The remaining six fields (three pointers + three counts + some padding) are runtime-only and zero on disk. Fun fact: no community tool (kotorblender, mdledit, kotormax, reone, xoreos, pykotor) parses AnimMesh nodes – we may have the first structured reader for this type.

Also: ResetAnim is peculiar in that it processes the extra data before calling ResetTriMeshParts, the reverse of every other mesh subtype. There’s no obvious reason for this.

Dangly mesh (0x121)

The simplest mesh subtype, 28 extra bytes. Four fields: a per-vertex constraints CExoArrayList, and three inline floats (displacement, tightness, period) that parameterize the soft-body simulation. A single conditional pointer at the tail is relocated only when the TriMesh vertex count is non-zero.

Dangly meshes are BioWare’s hack for cloth and hair – rigged to the skeleton like a skin mesh, but with simulation parameters that let parts of the geometry lag and swing.

AABB walkmesh (0x221)

4 extra bytes: a single pointer to the root of an AABB tree stored inline in the MDL blob.

The AABB tree is a flattened binary tree of bounding boxes, written in DFS preorder. Each node is 40 bytes:

| Offset | Size | Field | Notes |
| --- | --- | --- | --- |
| +0x00 | 12 | box_min | 3×f32 AABB minimum corner |
| +0x0C | 12 | box_max | 3×f32 AABB maximum corner |
| +0x18 | 4 | right_child | Content-relative offset (0 = no child) |
| +0x1C | 4 | left_child | Content-relative offset (0 = no child) |
| +0x20 | 4 | face_index | i32. Leaves: ≥ 0. Internal: −1. |
| +0x24 | 4 | split_direction_flags | Axis bitmask: 1=+X, 2=+Y, 4=+Z, 8=−X, 16=−Y, 32=−Z |

Note that right_child comes before left_child in the struct – this is the actual field order, not a typo. Matches Ghidra and the mdledit/mdlops implementations.

Leaf nodes have left = 0, right = 0, face_index ≥ 0, split_direction_flags = 0. Internal nodes have both children non-zero, face_index = -1, and flags computed from the child bounding-box separation. The format is the classic spatial subdivision tree used for fast triangle lookups during pathfinding and collision queries.
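
For readers who want to walk the tree themselves, here is a hedged Rust sketch that follows the content-relative child offsets directly, with no relocation. `AabbNode`, `parse_aabb_node` and `leaf_faces` are hypothetical names, the left-before-right visiting order is our choice, and the depth cap of 64 is an arbitrary guard against cyclic offsets in malformed files.

```rust
const AABB_NODE_SIZE: usize = 40;

#[derive(Debug, PartialEq)]
struct AabbNode {
    box_min: [f32; 3],
    box_max: [f32; 3],
    right_child: u32, // note: right comes before left on disk
    left_child: u32,
    face_index: i32,
    split_direction_flags: u32,
}

fn u32_at(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}
fn f32_at(b: &[u8], at: usize) -> f32 { f32::from_bits(u32_at(b, at)) }

fn parse_aabb_node(blob: &[u8], offset: usize) -> Option<AabbNode> {
    let n = blob.get(offset..offset.checked_add(AABB_NODE_SIZE)?)?;
    Some(AabbNode {
        box_min: [f32_at(n, 0x00), f32_at(n, 0x04), f32_at(n, 0x08)],
        box_max: [f32_at(n, 0x0C), f32_at(n, 0x10), f32_at(n, 0x14)],
        right_child: u32_at(n, 0x18),
        left_child: u32_at(n, 0x1C),
        face_index: u32_at(n, 0x20) as i32,
        split_direction_flags: u32_at(n, 0x24),
    })
}

/// Append the face index of every leaf under `offset` to `out`.
/// Returns None on out-of-bounds offsets or implausible depth.
fn leaf_faces(blob: &[u8], offset: usize, depth: usize, out: &mut Vec<i32>) -> Option<()> {
    if depth > 64 {
        return None; // arbitrary cap, guards against offset cycles
    }
    let node = parse_aabb_node(blob, offset)?;
    if node.face_index >= 0 {
        out.push(node.face_index);
        return Some(());
    }
    if node.left_child != 0 {
        leaf_faces(blob, node.left_child as usize, depth + 1, out)?;
    }
    if node.right_child != 0 {
        leaf_faces(blob, node.right_child as usize, depth + 1, out)?;
    }
    Some(())
}
```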

ResetAABBTree at 0x004a0260 recurses the tree, relocating each child pointer. It manually unrolls to depth 4 before recursing (the engine’s author was clearly worried about stack depth on a modest C++ compiler).

Lightsaber (0x821)

20 extra bytes – small but architecturally notable:

| Saber offset | Field | Notes |
| --- | --- | --- |
| +0x00 | saber vert data | Relocated pointer |
| +0x04 | saber UV data | Relocated pointer |
| +0x08 | saber normal data | Relocated pointer |
| +0x0C | GL vertex pool ID | Runtime-only (set by RequestPool) |
| +0x10 | GL index pool ID | Runtime-only |

Three arrays of exactly 176 vertices each (NUM_SABER_VERTS = 176, confirmed by kotorblender): position, UV, normal. The saber blade is a fixed-topology mesh – BioWare pre-baked the geometry as a flexible band that can be animated by swinging the endpoint controllers.

Unlike Skin/Dangly/AnimMesh, the saber uses the base TriMesh gen_vertices and remove_temporary_array callbacks. Its geometry doesn’t morph dynamically at the vertex-processing level – the animation is in the controller track.

Controllers and animation

The controller header

Controllers are the keyframe-animation primitive. Each node has an array of 16-byte NewController headers (at node +0x38) plus a shared pool of float data (at +0x44). Each header describes one animatable property of that node:

| Offset | Size | Field | Notes |
| --- | --- | --- | --- |
| +0x00 | u32 | type_code | Byte offset of the target property in the Part struct. |
| +0x04 | i16 | supermodel_link | Additive-blending property offset; -1 = no blending. |
| +0x06 | u16 | row_count | Number of keyframes. |
| +0x08 | u16 | time_data_offset | Float-array index for time values. |
| +0x0A | u16 | data_offset | Float-array index for value data. |
| +0x0C | u8 | value_type_and_flags | Low nibble: 1=float, 2/4=quaternion, 3=vector. Bit 4 (0x10) = Bezier. |
| +0x0D | 3 | padding | Alignment to 16 bytes. Never read. |

The type_code is elegant: it’s literally the byte offset into the Part struct where the animated value lives. NewController::Control dereferences it as *(float*)(part_ptr + type_code). So type_code = 8 means “position” because position sits at Part+0x08; type_code = 20 means “orientation” because orientation sits at Part+0x14 (as a compressed axis-angle quaternion); and so on. This collapses what would otherwise be a switch over property IDs into direct pointer arithmetic.

The value_type_and_flags byte at +0x0C has a compound encoding that bit us hard early on:

  • Low nibble (& 0x0F) – value-type discriminator: 1=float, 2 or 4=quaternion, 3=vector. Selects the interpolation path (Lerp/Slerp/VectorLerp).
  • High nibble (& 0xF0) – flags. 0x10 signals Bezier interpolation, which triples the per-keyframe value count (each keyframe is value + in-tangent + out-tangent).
  • Special case: for orientation controllers (type code 20) with raw byte value == 2, the keyframe is a compressed quaternion packed into a single u32, not two f32 values.

The low nibble happens to coincide with the “number of floats per keyframe row” for simple cases (1, 3, 4), which is why the earlier interpretation of this byte as column_count mostly worked – until it didn’t. See the controller bug below.
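
The decoding rules for this byte are compact enough to show in full. This Rust sketch follows the three rules listed above; `RowLayout` and `decode_value_type` are illustrative names rather than the crate's real API.

```rust
const ORIENTATION: u32 = 20;
const BEZIER_FLAG: u8 = 0x10;

#[derive(Debug, PartialEq)]
struct RowLayout {
    /// Number of 32-bit slots consumed per keyframe row.
    values_per_row: usize,
    bezier: bool,
    /// True when each row is one packed u32 instead of f32 values.
    compressed_quat: bool,
}

/// Decode the raw `value_type_and_flags` byte for a given controller type.
fn decode_value_type(type_code: u32, raw: u8) -> RowLayout {
    let bezier = raw & BEZIER_FLAG != 0;
    // Special case: orientation with a raw byte of exactly 2.
    let compressed_quat = type_code == ORIENTATION && raw == 2;
    let base = (raw & 0x0F) as usize;
    let values_per_row = if compressed_quat {
        1
    } else if bezier {
        base * 3 // value + in-tangent + out-tangent
    } else {
        base
    };
    RowLayout { values_per_row, bezier, compressed_quat }
}
```

Treating the raw byte as a plain column count gives 19 for a Bezier position controller (0x13) and 2 for a compressed orientation, where the correct answers are 9 and 1.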

Self-describing rows

Because value_type_and_flags is inline in each controller header, the binary format is entirely self-describing for animation data. The reader doesn’t need a lookup table mapping type codes to column counts – it reads the flags byte and knows how many floats to consume per row.

This is useful because vanilla K1 contains controller type codes (0x68, 0x188) that aren’t documented in any community reference. Trying to parse these with a closed enum caused 517 of 2832 vanilla MDLs (18.3%) to fail. MdlControllerType is therefore a newtype struct MdlControllerType(u32) with named constants for the three universally-confirmed base types (POSITION = 8, ORIENTATION = 20, SCALE = 36) and accepts any other u32 losslessly.

Base vs type-specific controllers

Three controllers are universal – they exist on every node type:

| ASCII name | Code | Columns | Meaning |
| --- | --- | --- | --- |
| position | 8 | 3 | x, y, z |
| orientation | 20 | 4 | x, y, z, angle (compressed axis-angle) |
| scale | 36 | 1 | uniform scale factor |

Type-specific codes live at higher numbers: light controllers start at 76 (color), emitter controllers are at 88+. All three base codes also support a Bezier variant (signalled by the flag bit, not a separate type code).

The MDX file: a mystery

Now for the strangest part of the format.

The MDX file contains interleaved vertex data – positions, normals, UVs, tangent space, colours – packed into records of width given by the mesh vertex_stride field, aligned into per-mesh blocks with sentinel-float terminators separating them. It looks exactly like what you’d expect a GPU vertex buffer to look like.

And the K1 engine never reads it.

Here’s the complete trace through InputBinary::Read:

  1. Read the MDX file into a buffer (pbVar9).
  2. Call Reset(mdl_content, mdx_content, resource).
  3. Reset passes mdx_content as param_3 through a chain of function calls (ResetMdlNode, ResetTriMeshParts, …). Every downstream function has param_3 as a formal parameter.
  4. param_3 is never used. In ResetTriMeshParts, it’s literally overwritten as a loop counter on line 67.
  5. Back in InputBinary::Read, line 78: _free(pbVar9). The MDX buffer is freed.

At no point does any vertex-related code path consume MDX data. InternalGenVertices builds vertex buffers from verts_arrays, which lives in the MDL content blob. ProcessVerts recomputes normals from geometry. LightPartTriMesh reads from the GL pool populated at +0xAC of the model header – which is sourced from the MDL content, not the MDX file.

So where does the vertex data actually come from? From a parallel set of position-only arrays stored inside the MDL content blob, pointed to by vert_array_offset at mesh +0x148 (content-relative), with additional UV/colour/normal data in the MdlNodeTriMeshVertArrays structures.

The MDX file, in short, is a redundant interleaved copy of data that the K1 engine could reconstruct from the MDL alone. Most likely theories for why it exists:

  • Build-pipeline artifact. BioWare’s Aurora engine (Neverwinter Nights) may have used the MDX format directly, and the K1 pipeline inherited the file-layout convention without the consuming code path.
  • Toolset requirement. Third-party editors and the BioWare toolset itself may still parse MDX for authoring workflows.
  • ResetLite path. There’s a separate “lightweight” loader (InputBinary::ResetLite at 0x004a11b0) that may use MDX for a reduced in-memory representation – unverified.

For our purposes, this has two consequences:

  1. Engine-functional MDX is near-trivial. Any MDX file the K1 engine happily ignores is a valid MDX file. You could write all zeros and the game would run.
  2. Round-trip-accurate MDX requires the per-mesh terminator convention (described next), because community tools do read MDX, and byte-identical round-trip is a useful correctness check.

Per-mesh terminators and alignment

Empirically, vanilla MDX files are larger than sum(vertex_count × stride). Across 2832 vanilla K1 models, 2445 have MDX files with excess bytes, totalling 3,278,456 bytes corpus-wide.

The excess has structure. After each mesh’s vertex data, there’s a terminator row of exactly one stride’s worth of bytes, beginning with three sentinel floats and padded with zeros:

| Mesh type | Sentinel value | Hex (f32 LE) |
| --- | --- | --- |
| Non-skin (type & 0x40 == 0) | 10,000,000.0 | 80 96 18 4B |
| Skin (type & 0x40 != 0) | 1,000,000.0 | 00 24 74 49 |

Corpus sentinel detection: 6,973 non-skin sentinels, 6 skin sentinels, 0 unknown patterns.
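
A writer can produce the terminator row with a few lines of Rust. The sketch below assumes the layout just described (three sentinel floats, then zeros up to the stride); `terminator_row` and the constants are hypothetical names.

```rust
const NON_SKIN_SENTINEL: f32 = 10_000_000.0;
const SKIN_SENTINEL: f32 = 1_000_000.0;
const SKIN_BIT: u16 = 0x40;

/// Build one terminator row of `stride` bytes for a mesh of `node_type`.
fn terminator_row(node_type: u16, stride: usize) -> Vec<u8> {
    let sentinel = if node_type & SKIN_BIT != 0 {
        SKIN_SENTINEL
    } else {
        NON_SKIN_SENTINEL
    };
    let mut row = vec![0u8; stride];
    // Overwrite the first three f32 slots; the rest stays zero.
    for chunk in row.chunks_exact_mut(4).take(3) {
        chunk.copy_from_slice(&sentinel.to_le_bytes());
    }
    row
}
```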

Between meshes, the cursor is padded to the next 16-byte boundary. The last mesh has no trailing alignment:

cursor = 0
for each mesh in MDX order:
    cursor += vertex_count × stride   # vertex data
    cursor += stride                   # terminator row
    if not last mesh:
        cursor = (cursor + 15) & ~15   # 16-byte alignment
mdx_file_size = cursor

For stride-24 meshes, the gap between meshes is either 24 or 32 bytes depending on current alignment. For stride-32 and stride-64 meshes, it’s always exactly stride because the stride is already a multiple of 16.
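
The same cursor arithmetic in Rust, as a sketch: `expected_mdx_size` is a made-up helper that takes `(vertex_count, stride)` pairs in MDX order and returns the file size the pseudocode above predicts.

```rust
/// Predict the MDX file size from per-mesh (vertex_count, stride) pairs,
/// listed in MDX order.
fn expected_mdx_size(meshes: &[(usize, usize)]) -> usize {
    let mut cursor = 0usize;
    for (i, &(vertex_count, stride)) in meshes.iter().enumerate() {
        cursor += vertex_count * stride; // vertex data
        cursor += stride; // terminator row
        if i + 1 != meshes.len() {
            cursor = (cursor + 15) & !15; // 16-byte alignment between meshes
        }
    }
    cursor
}
```

For example, a 10-vertex stride-24 mesh followed by a 4-vertex stride-32 mesh occupies 264 bytes, pads to 272, and then adds 160 bytes for the second mesh and its terminator, giving 432.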

Mesh ordering in MDX

Non-skin meshes come first, then skin meshes. Within each group, the order is DFS-traversal-of-the-tree – mostly. About 27% of vanilla models exhibit a compiler-specific permutation that defers “second children” of paired parents until after all their siblings’ first children. This is reproducible for our own output (if we write DFS, we read DFS), but not for byte-identical round-trip of every BioWare file.

Writing in standard DFS order (non-skin first, skin second) produces semantically identical MDX data with the correct total size. 1784 of 2444 models match byte-for-byte; the remaining 660 have the non-standard compiler ordering.

What this means for mdx_data_offset

The mesh header has two adjacent u32 fields at +0x144 and +0x148:

  • +0x144 mdx_data_offset: per-mesh byte offset into the MDX file. Used by community tools to seek directly to that mesh’s vertex block. The engine also uses this after InternalPostProcess overwrites it with a GL-pool offset.
  • +0x148 vert_array_offset: content-relative pointer to the position-only vertex data embedded in the MDL content blob. Used by the engine during load. Relocated by ResetTriMeshParts via param_1->field60_0x198 = param_2 + param_1->field60_0x198 – where param_2 is the MDL content base, not the MDX base.

These two fields were conflated under a single MDX_OFFSET = 0x148 constant in our implementation for several months, which caused the reader to lose the MDX offset entirely and the writer to overwrite the content pointer with an MDX offset. Full story in War stories.

Face layout

Faces are 32-byte records (MaxFace) stored in the TriMesh faces CExoArray:

| Offset | Size | Field | Type | Notes |
| --- | --- | --- | --- | --- |
| +0x00 | 12 | plane_normal | 3×f32 | Face plane normal. |
| +0x0C | 4 | plane_distance | f32 | Plane equation: n·p = d. |
| +0x10 | 4 | surface_id | u32 | Walkability / material identifier. |
| +0x14 | 6 | adjacent | 3×u16 | Indices of adjacent faces (for AABB/pathfinding). |
| +0x1A | 6 | vertex_indices | 3×u16 | Triangle vertex indices. |

The plane normal and distance are pre-computed by the BioWare toolset. They can be re-derived from the geometry but the binary format preserves them. The adjacency graph is what makes AABB walkmesh lookups fast – each triangle points to its neighbours, enabling constant-time stepping during pathfinding.

An early version of our reader assumed 12-byte faces (just the vertex indices). This led to every 2.67th “face” being interpreted from garbage bytes belonging to the next face’s plane normal. It was masked by synthetic round-trip tests – write wrong, read wrong, match! – and only caught when vanilla-file validation found vertex indices exceeding the mesh’s vertex count.
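
A face reader with the bounds check that eventually caught this bug might look like the following. It is a sketch against the 32-byte layout above, with hypothetical names (`MaxFace` as a Rust struct, `parse_faces`), and it reports errors as plain strings for brevity.

```rust
const FACE_SIZE: usize = 32;

#[derive(Debug, PartialEq)]
struct MaxFace {
    plane_normal: [f32; 3],
    plane_distance: f32,
    surface_id: u32,
    adjacent: [u16; 3],
    vertex_indices: [u16; 3],
}

fn u16_at(b: &[u8], at: usize) -> u16 { u16::from_le_bytes([b[at], b[at + 1]]) }
fn u32_at(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}
fn f32_at(b: &[u8], at: usize) -> f32 { f32::from_bits(u32_at(b, at)) }

/// Decode a faces array and reject any vertex index >= `vertex_count`.
fn parse_faces(data: &[u8], vertex_count: u16) -> Result<Vec<MaxFace>, String> {
    if data.len() % FACE_SIZE != 0 {
        return Err(format!(
            "face data length {} is not a multiple of {}",
            data.len(),
            FACE_SIZE
        ));
    }
    let mut faces = Vec::with_capacity(data.len() / FACE_SIZE);
    for (i, rec) in data.chunks_exact(FACE_SIZE).enumerate() {
        let face = MaxFace {
            plane_normal: [f32_at(rec, 0x00), f32_at(rec, 0x04), f32_at(rec, 0x08)],
            plane_distance: f32_at(rec, 0x0C),
            surface_id: u32_at(rec, 0x10),
            adjacent: [u16_at(rec, 0x14), u16_at(rec, 0x16), u16_at(rec, 0x18)],
            vertex_indices: [u16_at(rec, 0x1A), u16_at(rec, 0x1C), u16_at(rec, 0x1E)],
        };
        // The check that synthetic round-trip tests could not provide.
        if face.vertex_indices.iter().any(|&v| v >= vertex_count) {
            return Err(format!(
                "face {} references a vertex outside 0..{}",
                i, vertex_count
            ));
        }
        faces.push(face);
    }
    Ok(faces)
}
```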

War stories and implementation history

A brief chronicle of the bugs found while building the Rust reader/writer, because the “how we know this” is often as useful as the “what we know”.

The 12-byte face bug

Described above. The MaxFace stride is 32 bytes, not 12. Caught by vertex-index bounds checking against vanilla files.

Mesh header size corrections

The whole mesh extra-header was misunderstood for a long time. A sample of the corrections, all fixed in late February 2026:

  • VERTEX_COUNT offset was 0x9E → actually 0x130
  • MDX_OFFSET was 0xB8 → actually two separate fields at 0x144 and 0x148
  • VERTEX_STRUCT_SIZE was 0xBC → actually 0xFC
  • MESH_EXTRA_SIZE was 200 bytes → actually 332 (0x14C)
  • RENDER boolean was missing entirely → added at 0x139
  • SHADOW boolean was missing entirely → added at 0x137

All of these stemmed from extrapolating offsets from partial hex dumps rather than decompiling the struct. Ghidra’s MdlNodeTriMesh struct definition settled the whole thing – once the Ghidra type was aligned, the field offsets fell out directly.

Controller column-count encoding

Our reader initially used the raw value_type_and_flags byte (at controller +0x0C) directly as a float count per row. This worked for the common case (position=3, orientation=4, scale=1) but broke in two scenarios:

  • Bezier controllers set bit 0x10, turning raw=3 (Bezier position) into a byte value of 0x13 = 19 columns, not 9.
  • Integral orientation: ORIENTATION controllers with raw byte == 2 mean “compressed quaternion packed into one u32 per row”, not “2 f32 values per row”.

The integral-orientation case was the more painful bug: a c_dewback scan showed 876 integral-orientation controllers; c_rancor had 1,212. Reading 2 floats instead of 1 consumed double the expected data, desynchronizing every subsequent controller in the data array. Every node’s animation after the first compressed-quaternion keyframe was reading from a shifted window of garbage.

Fix: decode the raw byte with & 0x0F masking plus the two special cases (Bezier multiplies by 3; integral orientation uses 1 u32 per row regardless). The raw byte is preserved in a raw_column_count field for round-trip fidelity.
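The decoding rule can be sketched as below. This is an illustration of the rule as described here, not the reader's actual code: `RowLayout`, `decode_row_layout`, and the `is_orientation` parameter are hypothetical names, and how the caller decides that a controller is an ORIENTATION controller is left out.

```rust
/// Bit set in the raw byte for Bezier-keyed controllers.
pub const BEZIER_FLAG: u8 = 0x10;

/// How one keyframe row is stored in the controller data array.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowLayout {
    /// This many f32 values per row.
    Floats(usize),
    /// A compressed quaternion packed into a single u32 per row.
    PackedQuaternion,
}

/// Decode the raw `value_type_and_flags` byte into a per-row layout.
pub fn decode_row_layout(raw: u8, is_orientation: bool) -> RowLayout {
    // Special case 1: integral orientation is one u32 per row, not two floats.
    if is_orientation && raw == 2 {
        return RowLayout::PackedQuaternion;
    }
    let base = usize::from(raw & 0x0F);
    // Special case 2: Bezier rows carry three values per base column.
    if raw & BEZIER_FLAG != 0 {
        RowLayout::Floats(base * 3)
    } else {
        RowLayout::Floats(base)
    }
}
```

For example, `decode_row_layout(0x13, false)` yields `RowLayout::Floats(9)` rather than the 19 columns a naive reading of the byte would give.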

Animation node_number at +0x02

The 80-byte node header’s first 8 bytes are type_flags (u16), node_number (u16), name_index (u16), padding (u16). Our offset map had NODE_ID = 0x04, which pointed to name_index, not node_number.

For animation nodes specifically, node_number is the engine’s key for matching animation keyframe nodes to their geometry-side skeleton bones. Writing zeros at +0x02 and stuffing the name_index at +0x04 meant every animation node had node_number = 0, so every keyframe targeted the root bone. Visually: characters froze in T-pose with no skeletal motion whatsoever.

Fix: read node_number from +0x02 explicitly, and treat the u16 at +0x04 as name_index, an index into the name table.
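As a sketch, the first 8 bytes of the node header can be decoded like this, assuming little-endian u16 fields in the order given above. `NodeHeaderPrefix` and `parse_node_prefix` are hypothetical names used only for illustration.

```rust
/// First 8 bytes of the 80-byte node header (the padding u16 is skipped).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeHeaderPrefix {
    pub type_flags: u16,  // +0x00
    pub node_number: u16, // +0x02, matches animation nodes to skeleton bones
    pub name_index: u16,  // +0x04, index into the name table
}

/// Returns None if fewer than 8 bytes are available.
pub fn parse_node_prefix(header: &[u8]) -> Option<NodeHeaderPrefix> {
    if header.len() < 8 {
        return None;
    }
    let u16_at = |off: usize| u16::from_le_bytes([header[off], header[off + 1]]);
    Some(NodeHeaderPrefix {
        type_flags: u16_at(0x00),
        node_number: u16_at(0x02),
        name_index: u16_at(0x04),
    })
}
```

Keeping the two fields as separately named struct members makes the original mistake (one constant pointing at the wrong u16) harder to repeat.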

MDX per-mesh seeking

Our MDX reader used a cumulative cursor assuming non-skin-first DFS ordering. For the ~51% of vanilla models where MDX layout doesn’t match that assumption, vertex data was assigned to the wrong mesh nodes. Self-round-trip tests couldn’t detect this – we were reading and writing the same wrong assignment, which is a consistency check for the tool’s own output, not for correctness against vanilla.

Fix: seek to info.mdx_data_offset (the +0x144 field) for each mesh, matching kotorblender and mdledit behaviour. The cumulative-cursor logic remains in the writer, which produces its own layout and backpatches the offset field; the reader trusts whatever the file says.
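A rough sketch of per-mesh seeking, under several assumptions: the `MeshInfo` struct, its field widths, and `mesh_vertex_bytes` are hypothetical, and the vertex block is taken to be exactly `vertex_struct_size * vertex_count` bytes starting at the stored offset (any trailing padding or terminator is ignored here).

```rust
/// The subset of mesh-header fields needed to locate vertex data.
/// Field widths are assumed for illustration.
pub struct MeshInfo {
    pub mdx_data_offset: u32,
    pub vertex_struct_size: u32,
    pub vertex_count: u32,
}

/// Return this mesh's vertex block from the MDX buffer by seeking to the
/// offset stored in the file, independent of where other meshes live.
/// Returns None if the block would fall outside the buffer.
pub fn mesh_vertex_bytes<'a>(mdx: &'a [u8], info: &MeshInfo) -> Option<&'a [u8]> {
    let start = info.mdx_data_offset as usize;
    let len = (info.vertex_struct_size as usize).checked_mul(info.vertex_count as usize)?;
    let end = start.checked_add(len)?;
    mdx.get(start..end)
}
```

Because each lookup depends only on that mesh's own header fields, the order in which meshes were laid out in the MDX file no longer matters to the reader.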

Name-table dead entries

220 vanilla K1 models have name tables containing entries that no node references. These turn out to be walkmesh node names (*_wok, *_pwk, *_dwk variants) from BioWare’s build pipeline, which apparently shared a single name table across the MDL and WOK outputs.

The engine only performs indexed lookups via name_index; it never iterates the full table or validates the count. Extra entries are harmless dead weight.

Decision: not preserved. Our writer builds the name table from the node tree (matching kotorblender and mdledit), producing files that are functionally identical but 20–80 bytes shorter. This is a known, benign size delta – not a parity bug.

Emitter controller code verification

All 48 emitter controller type codes were independently verified against the engine binary via Ghidra. For each, we located the __stricmp call for the ASCII field name and traced the controller type value stored on match. Every code matched mdledit’s ReturnControllerName table exactly – no additions, no omissions.

One naming correction: the engine’s canonical string for code 200 is "lightningZigzag" (camelCase Z). mdledit has "lightningzigzag" (all lowercase). Functionally identical because the engine uses __stricmp (case-insensitive), but the engine’s capitalization is now what we emit.

Corpus validation status

As of 2026-02-24: 2832/2832 (100%) structural round-trip success (parse → write → parse → compare). This was achieved after fixing three comparison issues in the test harness:

  1. NaN ≠ NaN (IEEE 754): 1559 false failures – floats containing NaN don’t equal themselves. Fixed with bitwise f32::to_bits() comparison.
  2. Parent index ordering: 135 mismatches from depth-first vs. original node ordering. The binary format preserves node ordering but our parent-index reconstruction uses DFS. Semantically equivalent, numerically different – skipped in comparison.
  3. Face NaN values: exactly one model (w_dblsbr_001) has NaN in its pre-computed plane_normal/plane_distance, because one of its faces is degenerate. Round-trips correctly once NaN-aware comparison is used.
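The NaN-aware comparison from item 1 can be sketched in a few lines. The helper names `f32_bits_eq` and `f32_slice_bits_eq` are hypothetical; only `f32::to_bits()` comes from the description above.

```rust
/// Compare two floats by bit pattern, so a NaN equals an identical NaN.
pub fn f32_bits_eq(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}

/// Bitwise comparison for whole float arrays (e.g. a plane normal).
pub fn f32_slice_bits_eq(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Note that bitwise comparison is stricter than `==` in one respect: `0.0` and `-0.0` compare unequal, which is the desired behaviour for a round-trip test that should preserve the stored bits exactly.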

Byte-level MDL/MDX equality is a separate target – 1784 of 2444 MDX files match byte-for-byte, with the remaining 660 showing the non-standard BioWare compiler traversal discussed earlier.

Appendix

Emitter field map

304 bytes total (80 base + 224 extra). Emitter-specific data:

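| Node offset | Extra offset | Field | Type |
|---|---|---|---|
| +0x50 | +0x00 | deadspace | f32 |
| +0x54 | +0x04 | blast_radius | f32 |
| +0x58 | +0x08 | blast_length | f32 |
| +0x5C | +0x0C | num_branches | i32 |
| +0x60 | +0x10 | control_pt_smoothing | i32 |
| +0x64 | +0x14 | x_grid | i32 |
| +0x68 | +0x18 | y_grid | i32 |
| +0x6C | +0x1C | spawn_type | i32 |
| +0x70 | +0x20 | update | char[32] |
| +0x90 | +0x40 | render | char[32] |
| +0xB0 | +0x60 | blend | char[32] |
| +0xD0 | +0x80 | texture | char[32] |
| +0xF0 | +0xA0 | chunk_name | char[16] |
| +0x100 | +0xB0 | two_sided_tex | i32 |
| +0x104 | +0xB4 | loop | i32 |
| +0x108 | +0xB8 | render_order | u16 |
| +0x10A | +0xBA | frame_blending | u8 |
| +0x10B | +0xBB | depth_texture_name | char[16] |
| +0x11B | +0xCB | (reserved) | 21 bytes |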
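Since several earlier bugs came from mis-measured offsets, a table like this is worth checking mechanically. The sketch below transcribes the extra-offset column and field sizes and verifies that the fields are contiguous and sum to 224 bytes. `EMITTER_EXTRA_FIELDS` and `check_layout` are hypothetical names, not part of the actual crate.

```rust
/// (field name, extra offset, size in bytes), transcribed from the table.
pub const EMITTER_EXTRA_FIELDS: &[(&str, usize, usize)] = &[
    ("deadspace", 0x00, 4),
    ("blast_radius", 0x04, 4),
    ("blast_length", 0x08, 4),
    ("num_branches", 0x0C, 4),
    ("control_pt_smoothing", 0x10, 4),
    ("x_grid", 0x14, 4),
    ("y_grid", 0x18, 4),
    ("spawn_type", 0x1C, 4),
    ("update", 0x20, 32),
    ("render", 0x40, 32),
    ("blend", 0x60, 32),
    ("texture", 0x80, 32),
    ("chunk_name", 0xA0, 16),
    ("two_sided_tex", 0xB0, 4),
    ("loop", 0xB4, 4),
    ("render_order", 0xB8, 2),
    ("frame_blending", 0xBA, 1),
    ("depth_texture_name", 0xBB, 16),
    ("(reserved)", 0xCB, 21),
];

/// Check that fields are contiguous from offset 0 and end exactly at `total`.
pub fn check_layout(fields: &[(&str, usize, usize)], total: usize) -> Result<(), String> {
    let mut cursor = 0usize;
    for &(name, offset, size) in fields {
        if offset != cursor {
            return Err(format!(
                "{}: expected offset {:#x}, table says {:#x}",
                name, cursor, offset
            ));
        }
        cursor += size;
    }
    if cursor == total {
        Ok(())
    } else {
        Err(format!("fields cover {} bytes, expected {}", cursor, total))
    }
}
```

`check_layout(EMITTER_EXTRA_FIELDS, 224)` succeeds for the table as written, and adding the 80-byte base header gives the stated 304-byte total.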

LOD naming convention

When a model has cullWithLOD set, the engine searches for LOD variants by appending suffixes to the model name:

  • <name>_x – medium LOD
  • <name>_z – far LOD

Loaded via FindModel(name + "_x") and FindModel(name + "_z") as separate Model instances linked to the primary. Not relevant to format parsing, but useful for model validation and lint rules.
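For a lint rule, the variant names can be generated the same way the engine builds them. This is a small sketch; `lod_variant_names` is a hypothetical helper, and whether a missing variant should be an error or a warning is left to the caller.

```rust
/// Names the engine would look up for a model with cullWithLOD set:
/// index 0 is the medium LOD (`_x`), index 1 the far LOD (`_z`).
pub fn lod_variant_names(name: &str) -> [String; 2] {
    [format!("{}_x", name), format!("{}_z", name)]
}
```

A validator could check each returned name against the set of available model resources and report any that are absent.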

Resource type IDs

| Format | Resource type |
|---|---|
| MDL | 2002 (0x7D2) |
| MDX | 3008 (0xBC0) |

These map to the KEY/BIF resource type system. CAuroraInterface::RequestModel at 0x0070d8d0 resolves models through a sorted requestedModelList.

Dynamic type casts

The engine exposes As* functions for type-checked downcasts. The caller counts below are static call-site counts, which serve as a rough proxy for how heavily each cast is used:

| Function | Callers |
|---|---|
| AsModel | 34 |
| AsMdlNodeTriMesh | 14 |
| AsMdlNodeEmitter | 11 |
| AsAnimation | 7 |
| AsMdlNodeLightsaber | 5 |
| AsMdlNodeSkin | 4 |
| AsMdlNodeAABB | 3 |
| AsMdlNodeDanglyMesh | 3 |
| AsMdlNodeLight | 3 |
| AsMdlNodeAnimMesh | 2 |
| AsMdlNodeCamera | 2 |
| AsMdlNodeReference | 2 |

TriMesh (14) and Emitter (11) are the most-queried node types – useful signal for prioritizing implementation completeness.

Binary MDL call graph

For reference when reading Ghidra decompilations:

NewCAurObject (0x00449cc0)
└── FindModel (0x00464110)           [by name; checks cache via BinarySearchModel]
    └── LoadModel (0x00464200)       [on cache miss]
        └── IODispatcher::ReadSync (0x004a15d0)
            └── Input::Read (0x004a14b0)          ← format dispatcher
                ├── InputBinary::Read (0x004a1260)   if first_byte == 0x00
                │   └── Reset / ResetLite                (pointer rewriting)
                │       ├── ResetMdlNode                  (per-node dispatch)
                │       │   ├── ResetMdlNodeParts         (base fields)
                │       │   ├── ResetTriMesh              (mesh subtypes)
                │       │   ├── ResetLight                (light extras)
                │       │   ├── ResetSkin, ResetAnim, ...
                │       │   └── ResetAABBTree             (recursive tree walk)
                │       └── ResetAnimation                (per-animation)
                └── FuncInterp loop                 otherwise (ASCII MDL)
    └── CreateInstanceTreeR (0x00449200)  [builds runtime Part tree from MdlNode tree]

Key Ghidra addresses

For anyone continuing this archaeology, the foundation set of function addresses in swkotor.exe (K1 GOG build):

| Function | Address |
|---|---|
| Input::Read | 0x004a14b0 |
| InputBinary::Read | 0x004a1260 |
| InputBinary::Reset | 0x004a1030 |
| InputBinary::ResetMdlNode | 0x004a0900 |
| InputBinary::ResetMdlNodeParts | 0x004a0b60 |
| InputBinary::ResetTriMeshParts | 0x004a0c00 |
| InputBinary::ResetAABBTree | 0x004a0260 |
| InputBinary::ResetLight | 0x004a05e0 |
| InputBinary::ResetSkin | 0x004a01b0 |
| InputBinary::ResetDangly | 0x004a0100 |
| InputBinary::ResetAnim | 0x004a0060 |
| InputBinary::ResetLightsaber | 0x004a0460 |
| InputBinary::ResetAnimation | 0x004a0fb0 |
| MdlNodeTriMesh::InternalPostProcess | 0x0043cf00 |
| MdlNodeTriMesh::InternalGenVertices | 0x00439df0 |
| MdlNodeTriMesh::InternalParseField | 0x004658b0 |
| MdlNodeEmitter::InternalParseField | 0x004658b0 |
| MdlNodeEmitter::InternalCreateInstance | 0x0049d5c0 |
| PartTriMesh::GetMinimumSphere | 0x00443330 |
| LightPartTriMesh | 0x0046a9e0 |
| NewController::Control | 0x00483330 |
| NewController::GetFloatValue | 0x00482bf0 |
| Model constructor | 0x0044aa70 |
| MaxTree constructor | 0x0044a900 |
| ParseNode | 0x004680e0 |
| Node type flag table | 0x00740a18 |