Rakata

Rakata is a clean-room Rust implementation of Knights of the Old Republic (KotOR) data formats and tooling. It provides a modular workspace designed for robust, type-safe, and canonical handling of Odyssey Engine game data.

This Wiki serves as the definitive reference manual for KOTOR Formats and Engine Behaviors, designed to decouple format knowledge from the underlying Rust source code.

Documentation Domains

Rakata’s documentation operates on two tiers: the Software API and the Format Specifications.

1. The Workspace (Code API)

The workspace is organized into focused crates and tools. If you are developing against Rakata and need to know the semantic layout of types, functions, and data structures, refer to the respective Rustdocs:

Libraries (crates/)

  • rakata-core: Foundational primitives (ResRef, ResourceType, ResourceId) and core utilities (encoding, filesystem, detection).
  • rakata-formats: Binary and text format readers/writers for 19 KotOR formats including GFF, ERF, RIM, KEY/BIF, MDL/MDX, TPC, TGA, and more.
  • rakata-generics: Typed wrappers around GFF-backed resources (all 13 types: UTW, UTC, UTI, etc.) with loss-aware conversion.
  • rakata-extract: Resource resolution logic, composite module handling (.mod + _s.rim + _dlg.erf), and game-wide resource access (GameResources).
  • rakata-lint: Comprehensive resource validation against engine-derived field schemas. Catches crash-causing mod errors across all formats before they hit the engine.
  • rakata-save: Save game parsing and modification logic.
  • rakata: Facade crate re-exporting the ecosystem.

Tools (tools/)

  • rakata-saveeditor: Desktop GUI application for editing save games.
  • vanilla-inspector: Corpus validation tool for testing format implementations against all vanilla game assets.

🔗 View Rakata Rustdocs

2. Format Specifications (This Wiki)

The entire formats/ specification manual effectively serves as Rakata’s formal Evidence Log. If you need to understand binary structure, historical context, or how the original swkotor.exe engine interprets byte bounds under the hood (via Ghidra-backed engine constraints), you are in the right place!

Navigate through the sidebar to explore our exhaustive, decoupled format libraries:

  • Archive Formats – Detailed overviews of encapsulated containers (BIF, KEY, ERF, RIM).
  • GFF Structure – The bedrock of KOTOR’s data, exposing the 13 distinct blueprint constraints (Creatures, Dialogues, Triggers, etc.).
  • 3D Models & Mesh – MDL/MDX structures and binary walkmesh topologies.
  • Textures & Audio – Overviews detailing graphic compression (TPC, DDS) and MP3/Miles Sound System wrappers.
  • Text & Data Formats – Localized Talk Tables (TLK), rule mappings (2DA), and hierarchical layout geometries (LYT, VIS).

Ready to dive in? Head over to the Goals & Roadmap to see where the project is heading, or look into the Architecture logic that powers the Rakata suite.

Project Roadmap

This document outlines what we’re tinkering with in rakata and where the project is heading.

Note

For day-to-day progress, bug fixes, and specific technical tasks, check out the Codeberg Issues tracker instead.

The End Goal

Right now, our libraries are mostly just good at reading and writing individual game files - like extracting a 3D model, opening a save file, or decoding audio. But the real dream for rakata is to build a full, modern KOTOR engine integration.

Eventually, it would be cool to tie all these isolated pieces together into an actual rendering pipeline. For example: dropping a vanilla model into an active window and having the engine stream the textures and background audio straight from the game data.

How We Get There

Since this is a passion project, we try to match the original game behavior down to the exact byte before building higher-level abstractions on top of it. It takes a bit longer, but it keeps us from having to constantly rewrite core parsers when we stumble into weird edge cases.

1. Laying the Foundation (Mostly Done)

Our core libraries (rakata-formats, rakata-save, etc.) can currently read, write, and safely roundtrip over 17 different KOTOR file formats. We’ve tackled a lot of the weird legacy archives (BIF), models (MDL/MDX), and raw textures (TPC), ensuring they line up with vanilla behavior.

However, the foundation is still growing! We still have a handful of outstanding data formats to map out and implement, including Pathfinding (PTH), UI Layouts (GUI), and Walkmeshes (WOK/DWK/PWK).

Additionally, formatting and bytecode support for NCS (Compiled Scripts) is actively being prioritized (see Issue #19) to allow rakata to interface natively with upcoming Rust-based community compilers and decompilers.

2. Building Real Tools (Our Active Focus)

Now that we can parse the data reliably, we are building stuff the community can actually use:

  • Mod Linter: A tool to scan modded files and point out if they break the game’s actual data constraints, catching crashes before you load them in-game.
  • Save Editor: A basic offline save editor (rakata-saveeditor) built directly on top of our stable format parsers.
  • Audio Streaming: Updating the generic audio logic (rakata-audio) so we can natively stream game music and voice lines instead of loading giant buffers into memory.
  • Drop-in Replacements: Providing modern, reliable drop-in replacements for legendary (but aging) community tools. By backing these with rakata’s strict parsing rules, we can offer faster, safer, cross-platform native tools for unpacking archives, compiling models, and building mods. (Note: While we aim to replace these tools, we will not inherit their legacy bugs or non-vanilla API quirks. When in doubt, the original game engine is our only source of truth).

3. KOTOR 2 (TSL) Support

We are strictly focusing on KOTOR 1 right now, but extending parsing support for TSL via compatibility flags is a planned enhancement for further down the line once K1 is completely stabilized.

4. The Runtime Engine (The dream but probably a few years away)

Once our standalone tools prove that our format parsers are stable, we have a pipe dream of one day weaving them together into a natively synchronized rendering loop.

Architecture Guide

This document outlines how the rakata workspace is structured and the design principles we try to stick to.

Core Principles

  1. Vanilla K1 First

    • By default, we target the original vanilla behavior of KotOR 1.
    • Compatibility for TSL or community tools is strictly opt-in behind feature flags, not the default assumption.
    • When deciding how to parse something, the original game engine is our ultimate source of truth. We use local fixtures and original game data to prove our parsers work, rather than just copying how older community tools did things.
  2. Aim for Lossless

    • We want to be able to read a file and write it back out to the exact same bytes. We’ve largely achieved this for standard archives and data formats (GFF, ERF, RIM, KEY, TLK, etc.).
    • For highly complex formats (like MDL/MDX models), there are some known divergences where achieving a byte-exact roundtrip is essentially impossible due to how the original compilers ordered geometry blocks. We track these exceptions, but the output still safely runs in-game.
    • No Lazy Pass-throughs: If a file has undocumented fields, we don’t just read them as an opaque Vec<u8> blob and blindly pass them through. Our goal is to properly reverse-engineer and map every single struct boundary. However, if we identify defined “reserved” fields in the binary layout that we haven’t cracked the meaning of yet, we will map them as properly sized reserved values so we don’t accidentally drop data the engine might rely on. (Note: explicit blank padding bytes aren’t stored in memory at all - we just recalculate those dynamically on write).
  3. Strict Text Handling

    • All text decoding goes through rakata-core::text.
    • Localized text (TLK entries, strings) uses language-aware encodings (Windows-1252, Shift-JIS, etc.) to match what the engine expects.
    • Binary strings (like node names or texture paths) use TextEncoding::Windows1252 since that’s what the engine actually uses under the hood. No silently stripping weird characters with lossless backups.

(For day-to-day coding rules around iterators, zero-cost abstractions, and memory safety, see the Idiomatic Rust section in the contributing.md guide!)

Workspace Boundaries

Note: This layout is a living target! Some of these crates (like rakata-audio and rakata-saveeditor) are currently under active development. As we tackle our near-term roadmap goals – like building out the rakata-lint validation engine – expect these existing crates to flesh out, alongside brand new sibling crates being added to the ecosystem.

The workspace is organized in a clean dependency chain. Crates can only depend on crates listed “above” them:

rakata-core          (no workspace deps)
  rakata-formats     (depends on: core)
    rakata-audio     (depends on: core, formats)
    rakata-generics  (depends on: core, formats)
    rakata-extract   (depends on: core, formats)
    rakata-lint      (depends on: formats, generics)
    rakata-save      (depends on: core, formats)
rakata               (facade: re-exports all library crates)

Library Crates (crates/)

  • rakata-core: The absolute basics (ResRef, IDs) and core utilities like file streams and text encoding.
  • rakata-formats: Our massive library of parsers and writers (GFF, ERF, BIF, MDL, TPC, etc.). This parses bytes into objects, but doesn’t know anything about how the game actually uses them.
  • rakata-audio: Audio streaming and decoding for the engine’s various sound formats (WAV, ADPCM, MP3).
  • rakata-generics: Strongly-typed Rust models for all the different GFF files (like Doors, Items, Characters).
  • rakata-extract: The logic for hunting down actual game files in the wild. It knows how to look inside ERFs, check the Override folder, and resolve files just like the engine does.
  • rakata-lint: Our rule engine for scanning modded files and checking them against vanilla schema constraints.
  • rakata-save: High-level logic for safely reading, editing, and backing up save files.
  • rakata: A handy facade crate that re-exports everything so you only need to add one dependency.

Tool Crates (tools/)

  • rakata-saveeditor: The actual desktop application for editing save files.
  • vanilla-inspector: A testing utility for validating our parsers against the actual mass of game files.

Format API Guidelines

Public API Shape

Every format parser in rakata-formats generally provides the same clean interface:

  • read_<fmt><R: Read>(reader: &mut R) -> Result<T, E>
  • read_<fmt>_from_bytes(bytes: &[u8]) -> Result<T, E>
  • write_<fmt><W: Write>(writer: &mut W, data: &T) -> Result<(), E>
  • write_<fmt>_to_vec(data: &T) -> Result<Vec<u8>, E>

Formats with more than one representation (like models, which can also be read and written as ASCII text or JSON) just use variations of these names, e.g. read_mdl_ascii().

  • Generic Traits: We strongly prefer accepting generic I/O trait bounds (Read, BufRead, Write, Seek) over concrete types. Accept the narrowest trait that covers your API’s needs so callers aren’t forced to jump through hoops.
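As a rough illustration of that four-function shape, here is a minimal sketch for a made-up format. The `Toy` struct, `ToyError` enum, and the `*_toy` function names are hypothetical and exist only for this example; they are not part of rakata-formats. The error type implements `Display` by hand so the snippet stays std-only, whereas the real crates are described as using thiserror.

```rust
use std::fmt;
use std::io::{self, Read, Write};

/// Hypothetical format: 4-byte magic "TOY " followed by a little-endian u32.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Toy {
    pub value: u32,
}

/// Hypothetical domain-specific error for the toy format.
#[derive(Debug)]
pub enum ToyError {
    Io(io::Error),
    BadMagic([u8; 4]),
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::Io(err) => write!(f, "I/O failure: {}", err),
            ToyError::BadMagic(found) => write!(f, "bad magic: {:?}", found),
        }
    }
}

impl std::error::Error for ToyError {}

impl From<io::Error> for ToyError {
    fn from(err: io::Error) -> Self {
        ToyError::Io(err)
    }
}

/// Streaming reader: only needs `Read`, the narrowest trait that works here.
pub fn read_toy<R: Read>(reader: &mut R) -> Result<Toy, ToyError> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != b"TOY " {
        return Err(ToyError::BadMagic(magic));
    }
    let mut raw = [0u8; 4];
    reader.read_exact(&mut raw)?;
    Ok(Toy { value: u32::from_le_bytes(raw) })
}

/// Slice convenience wrapper; `&[u8]` implements `Read`, so it just delegates.
pub fn read_toy_from_bytes(bytes: &[u8]) -> Result<Toy, ToyError> {
    let mut cursor = bytes;
    read_toy(&mut cursor)
}

/// Streaming writer: only needs `Write`.
pub fn write_toy<W: Write>(writer: &mut W, data: &Toy) -> Result<(), ToyError> {
    writer.write_all(b"TOY ")?;
    writer.write_all(&data.value.to_le_bytes())?;
    Ok(())
}

/// Vec convenience wrapper built on the streaming writer.
pub fn write_toy_to_vec(data: &Toy) -> Result<Vec<u8>, ToyError> {
    let mut out = Vec::new();
    write_toy(&mut out, data)?;
    Ok(out)
}
```

Keeping the `_from_bytes` and `_to_vec` variants as thin wrappers means there is only one code path per direction to test.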

Error Handling

Robust parsing means strict error boundaries:

  • Each format module must define its own domain-specific error enum (e.g., GffError, ErfError) using the thiserror crate. Do not use generic stringly-typed errors or Box<dyn Error>.
  • Low-level read failures (like sudden bounds exhaustion or bad magic numbers) should wrap our shared BinaryLayoutError.
  • Never unwrap() at an API boundary! Only fail explicitly via Result or use .expect() with a hardcoded rationale if it is impossible to fail.
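To make the wrapping rule concrete, here is a small std-only sketch. The variants given to `BinaryLayoutError` below are assumptions (the real shared type may look different), and `DemoError` and `check_header` are hypothetical names. In the actual crates the `Display` and `From` impls would presumably be generated by thiserror rather than written by hand.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the shared low-level error. The variants are assumed.
#[derive(Debug, PartialEq, Eq)]
pub enum BinaryLayoutError {
    UnexpectedEof { needed: usize, available: usize },
    BadMagic { expected: [u8; 4], found: [u8; 4] },
}

impl fmt::Display for BinaryLayoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BinaryLayoutError::UnexpectedEof { needed, available } => {
                write!(f, "needed {} bytes, only {} available", needed, available)
            }
            BinaryLayoutError::BadMagic { expected, found } => {
                write!(f, "expected magic {:?}, found {:?}", expected, found)
            }
        }
    }
}

impl Error for BinaryLayoutError {}

/// Hypothetical per-format error that wraps the shared layout error.
#[derive(Debug, PartialEq, Eq)]
pub enum DemoError {
    Layout(BinaryLayoutError),
    UnsupportedVersion([u8; 4]),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Layout(inner) => write!(f, "layout error: {}", inner),
            DemoError::UnsupportedVersion(v) => write!(f, "unsupported version {:?}", v),
        }
    }
}

impl Error for DemoError {
    // Expose the wrapped error so callers can walk the cause chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            DemoError::Layout(inner) => Some(inner),
            DemoError::UnsupportedVersion(_) => None,
        }
    }
}

impl From<BinaryLayoutError> for DemoError {
    fn from(inner: BinaryLayoutError) -> Self {
        DemoError::Layout(inner)
    }
}

/// Validates an 8-byte header: magic "DEMO" followed by version "V1.0".
pub fn check_header(bytes: &[u8]) -> Result<(), DemoError> {
    let header = bytes.get(..8).ok_or(BinaryLayoutError::UnexpectedEof {
        needed: 8,
        available: bytes.len(),
    })?;
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&header[..4]);
    let mut version = [0u8; 4];
    version.copy_from_slice(&header[4..8]);
    if &magic != b"DEMO" {
        return Err(BinaryLayoutError::BadMagic { expected: *b"DEMO", found: magic }.into());
    }
    if &version != b"V1.0" {
        return Err(DemoError::UnsupportedVersion(version));
    }
    Ok(())
}
```

Note that nothing here calls `unwrap()`: the short-input case flows through `?` and ends up as a typed `DemoError::Layout` value.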

Memory & Ownership

While we try to avoid deep cloning and heavy allocations behind the scenes, we default to owned data types when crossing public API boundaries. Unless a module is explicitly built and documented as a zero-copy “View” type, you should avoid passing nasty lifetimes into the caller’s lap.

Keeping Concerns Separated

  • Dumb Parsers: Format modules in rakata-formats are intentionally “dumb”. They solely translate between raw byte streams and Rust structs without any awareness of game architecture, filesystems, or what a “module” is.
  • Smart Extractors: All the messy environment logic – hunting down loose files, enforcing vanilla precedence rules (e.g., checking the Override folder before extracting from a BIF archive), and assembling composite files – lives safely isolated inside rakata-extract. This separation guarantees our parsers can cleanly process isolated test files just as well as they operate in a massive live-game workflow.

Tracing & Telemetry

We strongly encourage instrumenting format parsers with tracing::instrument spans to help pinpoint exactly where a badly formed file breaks during a parse. However, this telemetry must remain entirely zero-cost for consumers who don’t need it! We achieve this by wrapping public parser entry points in conditional attributes: #[cfg_attr(feature = "tracing", tracing::instrument(...))]. If a user doesn’t explicitly opt-in via their Cargo.toml, the Rust compiler strips the instrumentation entirely.
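A minimal sketch of that conditional attribute follows. The function name `read_count_from_bytes` and the `TooShort` error are hypothetical, and the `skip(bytes)` argument is just one plausible choice. When the `tracing` feature is not enabled, the `cfg_attr` line expands to nothing, so the snippet builds without the tracing crate being present at all.

```rust
/// Hypothetical error for this example only.
#[derive(Debug, PartialEq, Eq)]
pub struct TooShort;

/// Hypothetical parser entry point. The span is only attached when the
/// consuming crate turns on a `tracing` feature; otherwise this is a
/// plain function with no instrumentation.
#[cfg_attr(feature = "tracing", tracing::instrument(skip(bytes)))]
pub fn read_count_from_bytes(bytes: &[u8]) -> Result<u32, TooShort> {
    let raw = bytes.get(..4).ok_or(TooShort)?;
    Ok(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}
```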

Serialization (Serde)

Just like tracing, serde support for exporting our parsed files to JSON or YAML must be treated as a zero-cost, opt-in feature. Format structs and types should generously derive Serialize and Deserialize when the serde feature flag is enabled. This allows downstream utilities (like the Save Editor) to effortlessly convert memory layouts into text formats, while ensuring the core parsers stay extremely light for purely binary-focused applications.
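The same `cfg_attr` trick applies to derives. In this sketch the `DemoEntry` struct is hypothetical; with the `serde` feature disabled the second attribute disappears and the type carries no serde code.

```rust
/// Hypothetical format struct. Serialize/Deserialize are only derived
/// when a `serde` feature is enabled by the consumer.
#[derive(Debug, Clone, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct DemoEntry {
    pub resref: String,
    pub size: u32,
}
```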

Beyond Basic Parsing

While rakata-formats gives us the ability to parse isolated bytes, the game engine is much more complicated. Our higher-level crates exist to bridge that gap between “dumb bytes” and “actual game logic”.

Finding Files (rakata-extract)

rakata-extract handles the messy reality of finding files scattered across a massive KOTOR installation. It mirrors the vanilla engine’s lookup hierarchy in three distinct layers:

  1. Primitives: Grabbing a file out of a single archive (like unpacking a standalone ERF or BIF file).
  2. Composition: Treating related archive sets as a single “Module” (like grouping a .mod file with its matching _s.rim and _dlg.erf files so they load transparently together).
  3. Game-wide: Creating a unified GameResources tree that maps out the entire game installation.

Because we want our extraction to perfectly mirror vanilla behavior, lookups are strictly case-insensitive, and loading precedence is explicitly designed to mirror how the original game works (so a file in the Override folder automatically beats a file buried in a BIF archive).
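The two rules in that paragraph (case-insensitive names, Override beating BIF) can be sketched with an ordered list of layers. Everything below is a simplified illustration: `Source`, `Layer`, and `Resources` are hypothetical names, not the `GameResources` API, and real lookups also involve resource types and module archives.

```rust
use std::collections::HashMap;

/// Where a resource was found.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Override,
    Bif,
}

/// One lookup layer; names are normalized to lowercase on insert.
pub struct Layer {
    source: Source,
    entries: HashMap<String, Vec<u8>>,
}

impl Layer {
    pub fn new(source: Source) -> Self {
        Layer { source, entries: HashMap::new() }
    }

    pub fn insert(&mut self, name: &str, data: &[u8]) {
        self.entries.insert(name.to_ascii_lowercase(), data.to_vec());
    }
}

/// Layers ordered from highest to lowest precedence.
pub struct Resources {
    layers: Vec<Layer>,
}

impl Resources {
    pub fn new(layers: Vec<Layer>) -> Self {
        Resources { layers }
    }

    /// Returns the first match, so earlier layers shadow later ones.
    pub fn resolve(&self, name: &str) -> Option<(Source, &[u8])> {
        let key = name.to_ascii_lowercase();
        self.layers.iter().find_map(|layer| {
            layer
                .entries
                .get(&key)
                .map(|data| (layer.source, data.as_slice()))
        })
    }
}
```

Because precedence is just the order of the `Vec`, placing the Override layer first is all it takes for a modded file to shadow its vanilla counterpart.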

Strongly-Typed Data (rakata-generics)

When we parse a .utc Character file, rakata-formats just hands us a raw GFF tree of untyped labels and values. rakata-generics wraps those raw data blobs in strongly-typed Rust structs (like Character, Item, Door). This guarantees that if a developer needs to access a character’s “Strength” stat, they get a guaranteed u8 property rather than blindly guessing string handles inside a raw binary tree.
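Here is a heavily simplified sketch of that idea. `RawValue`, `RawStruct`, `TypedError`, and this `Character` are hypothetical stand-ins rather than the real rakata-formats or rakata-generics types, and the "Str" and "Tag" labels are used as assumed field names for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an untyped GFF field value.
#[derive(Debug, Clone, PartialEq)]
pub enum RawValue {
    Byte(u8),
    Int(i32),
    Text(String),
}

/// Simplified stand-in for a raw GFF struct: label -> value.
pub type RawStruct = HashMap<String, RawValue>;

#[derive(Debug, PartialEq, Eq)]
pub enum TypedError {
    Missing(&'static str),
    WrongType(&'static str),
}

/// Typed view: callers get a real `u8`, not a label lookup.
#[derive(Debug, PartialEq, Eq)]
pub struct Character {
    pub tag: String,
    pub strength: u8,
}

impl Character {
    /// Converts the raw tree once, reporting missing or mistyped fields.
    pub fn from_raw(raw: &RawStruct) -> Result<Self, TypedError> {
        let strength = match raw.get("Str") {
            Some(RawValue::Byte(v)) => *v,
            Some(_) => return Err(TypedError::WrongType("Str")),
            None => return Err(TypedError::Missing("Str")),
        };
        let tag = match raw.get("Tag") {
            Some(RawValue::Text(s)) => s.clone(),
            Some(_) => return Err(TypedError::WrongType("Tag")),
            None => return Err(TypedError::Missing("Tag")),
        };
        Ok(Character { tag, strength })
    }
}
```

The type check happens once at conversion time, so downstream code never has to guess whether a label exists or which variant it holds.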

High-Level Interaction (rakata-save & rakata-lint)

Finally, crates at the top of the stack use our extraction logic and strongly typed generic structs to actually do things. rakata-lint compares typed structs against vanilla constraints to catch modding errors, while rakata-save gracefully handles unpacking, editing, and re-compressing massive save-game directories without corrupting the player’s campaign!

Contributing Guide

Welcome to the Rakata workspace! This guide outlines how we build, how we test, and the core rules for keeping our code clean, compliant, and maintainable.

License Policy

  • License: All workspace crates use GPL-3.0-or-later.
  • Third-Party Components: New dependencies must be compatible (MIT, Apache-2.0, BSD). Add them to THIRD_PARTY_NOTICES.md before merge.

Clean Room Implementation

To ensure everything we build is 100% our own original work and we aren’t accidentally borrowing from other community tools (if you’re curious about why we’re so strict about this, check out docs/LEGAL.md):

  1. Reference Policy: Treat existing tools (like PyKotor) as behavioral references, not copy sources.
  2. No Copy-Paste: Do not copy source code blocks, large comments, or docstrings from third-party sources into Rust files.
  3. Re-Derivation: Derive implementation logic from format documentation, observed behavior (hex dumps), and black-box fixture analysis.
  4. Reverse Engineering:
    • Behavior verification via disassembly tools (e.g., Ghidra) is allowed for interoperability analysis.
    • Do not copy decompiled code into source files.
    • Record findings as paraphrased behavior notes natively within the relevant format specification under docs/src/formats/.

What belongs in the Engine Audits

The entire Rakata format specifications manual (docs/src/formats/) serves as the engine audit layer between reverse engineering and implementation. All Rust code is written strictly from these engine audits (specifically the Engine Audits & Decompilation sections embedded in each format’s blueprint), not from raw decompilation output.

  • Record: Field names, data types, default values, error conditions, and observable behavioral rules (e.g., “field X is clamped to range 0–100”, “list is sorted ascending by field Y”).
  • Do not record: Step-by-step algorithmic sequences, control flow structure, or implementation details that go beyond what is needed for interoperability. The test is: could someone implement correct behavior from this note without it dictating a specific code structure?

Format Work vs Engine Reimplementation

Right now, this workspace is exclusively focused on format parsing, linting, and modding tools - reading, writing, and validating the game’s actual data files. We are fundamentally just mapping out how the original game structures its data so we can build cool tools around it.

Building an actual game engine replacement (with gameplay logic, AI, and rendering pipelines) is a completely different beast for another day. But that’s exactly why these format blueprints are so critical: if someone wants to build an engine later, they can just use our shared engine audits to understand the data, rather than having to dig through raw decompiled binaries themselves!

Code Style & Linting

Pre-commit Hooks

We use pre-commit to keep the codebase consistently formatted without anyone having to manually police it. After cloning the repository, it’s highly recommended to set up the hooks:

pre-commit install
pre-commit install --hook-type pre-push

This registers two quick automated stages:

  • pre-commit: Formats your code via cargo fmt --all (auto-fixing it for you) and runs cargo clippy across all targets.
  • pre-push: Runs cargo test --workspace --all-features to ensure tests are green before you push.

Try to avoid skipping hooks using --no-verify. If a hook catches something, it’s usually just a helpful clippy suggestion or a quick formatting tweak!

Manual Checks

If you don’t like automated hooks and prefer running things manually from the workspace root before committing, you absolutely can:

cargo fmt --all
cargo clippy --workspace --all-targets --all-features
cargo test --workspace --all-features

Note: Passing --all-features to clippy and test is important so that they catch optional code paths like serde and tracing! We just ask that fmt and clippy run cleanly before you open a Pull Request.

Idiomatic Rust

To keep the codebase consistently safe, lean, and fast, we heavily rely on a few core Rust principles:

  • Safe Numeric Casts: To prevent silent truncation bugs, we enforce #![warn(clippy::as_conversions)]. Avoid the raw as keyword; lean on From, TryFrom, or .into(). If a lossy as cast is truly unavoidable (like an f32 down to an i32), use a scoped #[allow(clippy::as_conversions)] and drop an inline comment explaining why it’s safe.
  • No Primitive Obsession: We heavily utilize strongly-typed wrappers (like ResRef) rather than passing raw [u8; 16] or String primitives around.
  • Strict Error Handling: We explicitly forbid .unwrap() and .unwrap_unchecked() in library code. Everything must propagate cleanly via Result using typed error enums (managed via thiserror).
  • Composition over Hierarchy: We prefer lean, flat structs and trait combinators over deep, messy object-oriented class hierarchies.
  • Iterators over Loops: We prefer functional iterator chains (map, filter, fold) over maintaining manual mutable state in for loops.
  • Zero-cost Features: Optional functionality (like serde serialization or tracing telemetry) must introduce absolutely zero overhead when disabled.
  • Safe by Default: We use #![forbid(unsafe_code)] across all core parser crates to enforce strict memory safety boundaries.
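Two of these rules are easy to show in a few lines. The sketch below is illustrative only: `DemoResRef` is a hypothetical newtype (the real `ResRef` lives in rakata-core and may validate differently), the lowercase normalization is an assumption, and length is measured in UTF-8 bytes for simplicity. The 16-byte limit comes from the `[u8; 16]` mentioned above.

```rust
use std::convert::TryFrom;

/// Hypothetical resource-reference newtype: at most 16 bytes, stored lowercase.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DemoResRef(String);

#[derive(Debug, PartialEq, Eq)]
pub enum ResRefError {
    TooLong(usize),
}

impl DemoResRef {
    pub const MAX_LEN: usize = 16;

    /// Validation happens once, at construction; every later use is known-good.
    pub fn new(name: &str) -> Result<Self, ResRefError> {
        if name.len() > Self::MAX_LEN {
            return Err(ResRefError::TooLong(name.len()));
        }
        Ok(DemoResRef(name.to_ascii_lowercase()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Narrowing without `as`: out-of-range values become `None`
/// instead of being silently truncated.
pub fn narrow_to_u8(value: u32) -> Option<u8> {
    u8::try_from(value).ok()
}
```

For comparison, `256u32 as u8` would quietly produce `0`, which is exactly the class of bug the lint is meant to catch.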

Testing & Quality

Our testing approach is a Gray Box strategy: we use our hard-earned white-box knowledge of the game engine (via Ghidra audits) to build strictly validated black-box test cases for our parsers. We want to test against how the real game engine behaves, not against artificial mocks.

When adding a brand new format, please make sure your PR includes:

  • Fixture-Backed Tests: Full roundtrip coverage using synthetic test files (stored in fixtures/). We never commit real game assets; run cargo test --test gen_fixtures -- --ignored to safely generate them! Byte-exact roundtrip assertions are the gold standard for any format where the engine consumes bytes exactly as written.
  • Mutation Tests: A quick pass to verify the parser safely rejects malformed or corrupted inputs without panicking (usually wired up via corruption_matrix.rs).
  • Module Documentation: A clean rustdoc block showing the basic format layout.
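To give a feel for what a mutation pass checks, here is a small sketch against a made-up format. It is not the project's corruption_matrix.rs; `parse_demo`, `truncations`, and `byte_flips` are hypothetical helpers written for this example.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
pub enum DemoError {
    Truncated,
    BadMagic,
    TrailingData,
}

/// Hypothetical parser: magic "DEMO", u32 LE payload length, then the payload.
pub fn parse_demo(bytes: &[u8]) -> Result<Vec<u8>, DemoError> {
    let magic = bytes.get(..4).ok_or(DemoError::Truncated)?;
    if magic != &b"DEMO"[..] {
        return Err(DemoError::BadMagic);
    }
    let raw = bytes.get(4..8).ok_or(DemoError::Truncated)?;
    let count = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let count = usize::try_from(count).map_err(|_| DemoError::Truncated)?;
    // Checked arithmetic: a hostile length must not overflow or panic.
    let end = 8usize.checked_add(count).ok_or(DemoError::Truncated)?;
    let payload = bytes.get(8..end).ok_or(DemoError::Truncated)?;
    if end != bytes.len() {
        return Err(DemoError::TrailingData);
    }
    Ok(payload.to_vec())
}

/// Every strict prefix of a valid file (lengths 0 through len - 1).
pub fn truncations(valid: &[u8]) -> Vec<Vec<u8>> {
    (0..valid.len()).map(|n| valid[..n].to_vec()).collect()
}

/// One mutated copy per byte position, with all bits of that byte flipped.
pub fn byte_flips(valid: &[u8]) -> Vec<Vec<u8>> {
    (0..valid.len())
        .map(|i| {
            let mut copy = valid.to_vec();
            copy[i] ^= 0xFF;
            copy
        })
        .collect()
}
```

A test would then assert that every truncation returns `Err`, and that no byte flip panics or parses back to the original payload. Flips inside the payload itself still parse successfully in this toy format, which is why the second check compares against the original rather than demanding an error.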

The Reserved Field Rule

Game engines are weird, and sometimes they leave mysterious “padding” or “reserved” sections in their binary formats. Every struct field that corresponds to a reserved region must be:

  • Stored strictly as a named array (e.g., reserved: [u8; N]) in the format struct.
  • Read directly from the source bytes verbatim.
  • Written back verbatim during a roundtrip.

If a writer zeroes out or silently drops a reserved field you parsed, we consider that a “lossless bug” – even if the engine doesn’t explicitly seem to use those bytes. If you’re constructing a brand new file from scratch, you can safely write zeroes for reserved regions, but the struct must be capable of storing exactly what it read off disk.
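A compact sketch of the rule, using a hypothetical 12-byte header (`DemoHeader` is not a real Rakata type): the reserved bytes are stored as a named array, read verbatim, and written back verbatim, while a freshly constructed header defaults them to zero.

```rust
/// Hypothetical 12-byte header: u32 LE count followed by 8 reserved bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DemoHeader {
    pub count: u32,
    /// Meaning unknown; preserved exactly as read from disk.
    pub reserved: [u8; 8],
}

impl DemoHeader {
    pub const SIZE: usize = 12;

    /// New files may use zeroed reserved bytes.
    pub fn new(count: u32) -> Self {
        DemoHeader { count, reserved: [0; 8] }
    }

    /// Reads the header, copying the reserved region verbatim.
    pub fn read(bytes: &[u8]) -> Option<Self> {
        let raw = bytes.get(..Self::SIZE)?;
        let mut reserved = [0u8; 8];
        reserved.copy_from_slice(&raw[4..12]);
        Some(DemoHeader {
            count: u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]),
            reserved,
        })
    }

    /// Writes the header, emitting the stored reserved bytes unchanged.
    pub fn write(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(Self::SIZE);
        out.extend_from_slice(&self.count.to_le_bytes());
        out.extend_from_slice(&self.reserved);
        out
    }
}
```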

Release Process

(TODO: We haven’t cut an official production release yet! Right now we are aggressively building out the rakata-lint engine rules and expanding our format coverage. Once we publish a stable v0.3.0 to crates.io, we’ll formalize our exact release checklist, dependency license refreshes, and CI pipelines here.)

Legal & Compliance

Disclaimer: We aren’t lawyers! The following information references specific legal statutes regarding software interoperability and reverse engineering simply to clearly demonstrate our commitment to strictly lawful development.

Project Intent

Rakata is an open-source research project and software library strictly designed to build interoperability with the data formats used by Star Wars: Knights of the Old Republic (KOTOR).

  • Our Goal: We want to empower users to access, read, edit, and safely modify their own legally purchased game files on modern operating systems using open-source tools.
  • No DRM Circumvention: This project completely avoids the game executable. We do not bypass, strip, or defeat any Digital Rights Management (DRM) or software encryption. We solely parse static data files (like .rim, .bif, and .mdl files) for the pure purpose of compatibility.
  • No Pirated Assets: This repository does not contain, distribute, or host any copyrighted game assets (art, sound, proprietary code, or binaries) owned by the original rights holders. You must supply your own legally obtained copy of the game to do anything useful with this software.

This project operates under the specific “Interoperability” exceptions provided by copyright law in major jurisdictions:

🇨🇦 Canada (Jurisdiction of Maintainer)

Under the Copyright Act (R.S.C., 1985, c. C-42), this project relies on Section 30.61, which permits the reproduction of a computer program for the purpose of:

  • (a) obtaining information that is necessary to allow the computer program to be compatible with another computer program; or
  • (b) correcting errors in the computer program.

🇺🇸 United States

Under the Digital Millennium Copyright Act (DMCA), this project operates under the 17 U.S.C. § 1201(f) exception for Reverse Engineering, which states:

  • (1) … a person who has lawfully obtained the right to use a copy of a computer program may circumvent a technological measure… for the sole purpose of identifying and analyzing those elements of the program that are necessary to achieve interoperability of an independently created computer program with other programs…

🇪🇺 European Union (Host Jurisdiction - Codeberg)

Under Directive 2009/24/EC (Legal Protection of Computer Programs), this project adheres to Article 6 (Decompilation), which allows for the reproduction of code and translation of its form when:

  • (a) these acts are performed by the licensee or by another person having a right to use a copy of a program…
  • (b) the information necessary to achieve interoperability has not previously been readily available…
  • (c) these acts are confined to the parts of the original program which are necessary to achieve interoperability.

Acknowledgements

Portions of the initial file format logic were originally derived from research by the awesome PyKotor project (licensed under LGPL-3.0-or-later) and verified against original game binaries using clean-room reverse engineering techniques (via Ghidra and ret-sync).

  • This project is open-source and licensed under GPL-3.0-or-later.
  • Star Wars: Knights of the Old Republic is a trademark of its respective owners. This passion project is not affiliated with, endorsed by, or connected to BioWare, LucasArts, or Disney in any way.

Format Implementation Reference

This launchpad tracks the implementation status of KotOR file formats across our parsing libraries (rakata-formats) and our strongly-typed wrappers (rakata-generics).

Status Legend:

  • Full: Binary reader/writer implemented with roundtrip tests.
  • Generics: Strongly-typed wrappers and linting schemas implemented.
  • Partial: Basic parsing support, advanced features deferred.
  • Canonical: Validated against vanilla KotOR (K1) runtime behavior.

Archive Formats

| Format | Status | Notes |
| --- | --- | --- |
| BIF | Full | Supports variable/fixed tables. Deterministic 4-byte payload alignment. BZF compression feature-gated. |
| KEY | Full | First-match lookup semantics (native verified). Duplicate key insertions ignored. |
| ERF | Full | Supports ERF/MOD/SAV. Optional blank-block emission for MODs is explicit opt-in. |
| RIM | Full | Supports V1.0. Offset fallback handled. Tight packing. |

GFF & Blueprints

| Format | Status | Notes |
| --- | --- | --- |
| GFF Structure | Full | Core binary parity for structs/lists/fields. Localized strings supported. Stable list ordering. |
| Generics | Generics | 13 typed blueprints completed: ARE, DLG, GIT, IFO, UTC, UTD, UTE, UTI, UTM, UTP, UTS, UTT, UTW. Tied into rakata-lint. |

3D Models & Walkmeshes

| Format | Status | Notes |
| --- | --- | --- |
| MDL/MDX | Full | Binary reader/writer with full geometry, node hierarchy, controllers, and MDX vertex data. ASCII reader/writer for modder interop. In-game verified. |
| BWM / WOK | Full | V1.0 binary tables (vertices, faces, materials, etc.). Strict bounds validation. |

Texture Formats

| Format | Status | Notes |
| --- | --- | --- |
| TPC | Full | Container header/payload/footer. Canonical pixel-type mapping (DXT5 for type 4). Mip payload sizing matches native right-shift. |
| DDS | Full | Supports standard D3D headers and K1-specific CResDDS prefix (20-byte metadata). |
| TGA | Full | Reader normalizes to RGBA8888. Canonical mode rejects grayscale RLE. Lossless passthrough when source pixels are unmodified. |
| TXI | Full | ASCII format. Case-insensitive command tokens (native verified). Coordinate block support. |

Text & Data Formats

| Format | Status | Notes |
| --- | --- | --- |
| 2DA | Full | Binary V2.b. |
| TLK | Full | Strict language-aware decode/encode. Validated against test.tlk. |
| VIS | Full | ASCII format. Case-insensitive room normalization. Deterministic ordering. |
| LYT | Full | ASCII format. Strict Windows-1252 text handling. Count-driven parsing. |
| LTR | Full | V1.0 headers. 28-char probability tables. |

Audio Formats

| Format | Status | Notes |
| --- | --- | --- |
| WAV | Full | Standard RIFF + KotOR SFX/VO obfuscation wrappers. MP3-in-WAV unwrapping support. |
| LIP | Full | V1.0 header + keyframes. Deterministic writer. |
| SSF | Full | V1.1 header + 28-slot sound table. |

Missing / Deferred Formats

These formats are currently unimplemented or do not yet have strongly-typed wrappers in rakata-generics.

| Format | Status | Notes |
| --- | --- | --- |
| NCS / NSS | Deferred | NWScript Source and Compiled bytecode. NCS decompilation is slated for future work via an independent pipeline. |
| GUI | Deferred | Graphical User Interface layout blueprints (GFF). |
| JRL | Deferred | Journal and quest tracking blueprints (GFF). |
| FAC | Deferred | Faction mappings and reputations (GFF). |
| PTH | Deferred | Pathfinding graphs and navigation waypoints (GFF). |
| ITP | Deferred | Item Palette definitions (GFF). |
| BIK | Deferred | Bink Video container (proprietary video format). Unlikely to be implemented natively. |

Provenance Policy

Because this project seeks to achieve strict interoperability with a two-decade-old engine, mere “correctness” is insufficient. We guarantee canonical behavior.

  • Target: Canonical vanilla Star Wars: Knights of the Old Republic 1 (2003).
  • Engine Audits: We do not guess how the engine behaves. Code is written exclusively from observed engine evidence notes derived from clean-room reverse engineering (via Ghidra/ret-sync). Every implementation choice is documented directly inside that format’s specific page on this site.
  • Verification: Behaviors are locked via deep integration tests against synthetic fixtures. If a parser perfectly round-trips an invalid file but the game engine rejects it, it is treated as a critical bug.

Archive Formats

At the heart of the Odyssey Engine is its virtual file system. Instead of loading tens of thousands of tiny loose files straight from the local disk, the engine efficiently streams them from large, concatenated archive blobs. You can think of these formats as extremely specialized zip files used to store binary models, compiled scripts, textures, and UI data.

Note

KOTOR uses a strict two-tier architecture. BIF and KEY form the foundational registry for all base-game assets (e.g. data/models.bif is located through chitin.key, which acts as the global lookup index). ERF and RIM files, by contrast, are independent, self-contained archives used for per-module level data, stateful save games, and community mods.


Implementation Blueprints

| Format | Name | Layout & Purpose |
| --- | --- | --- |
| BIF | Binary Information File | Massive binary payload silos containing raw game assets packed end-to-end. |
| KEY | Global Index File | Master lookup table mapping precise file names directly to their internal BIF payload offset block. |
| ERF | Encapsulated Resource File | Extremely versatile package format utilized heavily for modules (.mod), stateful save games (.sav), and generic archives (.erf). |
| RIM | Resource Image | Stripped-down, fast-loading, highly compact localized module containers (often used to split up geometry models vs dynamic entity layouts). |

BIF (Binary Information File)

BIFs are essentially giant, uncompressed data silos. Because they act as the raw storage tier of the KOTOR engine, they don’t waste bytes on complex metadata or internal filenames – they are simply pure, tightly packed continuous byte arrays for game resources. They are designed to be randomly accessed extremely quickly at runtime strictly via their companion KEY index file.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .bif |
| Magic Signatures | BIFF (version V1 ) |
| Type | Archive Blob Payload |
| Rust Reference | View rakata_formats::Bif in Rustdocs |

Data Model Structure

The rakata-formats crate handles raw Bif parsing for you by reading the internal offset tables. However, developers very rarely interact with a raw Bif file on its own.

  • Unified Access: Typically, you’ll use the KeyFile API (rakata_extract::keyfile::KeyFile), which automatically ties .key index files to their .bif data payloads so you don’t have to map them yourself.
  • Seek Performance: To prevent loading 100MB+ binary files completely into memory just to read a tiny script, Rakata jumps directly to the exact file coordinate on your hard drive (via KeyFile::read_resource_by_seek), extracting only the single resource you specifically asked for!

Tip

The compressed BZF BIF variant did not exist in the original 2003 PC version of the game. It was added much later by Aspyr for their modern iOS, Android, and Nintendo Switch ports simply to save storage space on mobile devices. While our parser can read the BZF layout, it falls slightly outside our core focus on the original PC version and hasn’t been heavily tested against real mobile game files yet.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .bif archive headers mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoResFile::LoadHeader (0x0040d910) and CExoResFile::ReadResource (0x0040da20).)

Archive Initialization (CExoResFile::LoadHeader)

Mapped from 0x0040d910.

| Pipeline Event | Engine Behavior & Result |
|---|---|
| Signature Check | The engine strictly validates both the BIFF magic and the exact V1 version. It does not process any file that deviates from this signature pair. |
| Variable Table Loading | The system extracts the variable_count value from the header and reads variable_count * 16 bytes from the variable_table_offset to map the resource keys. |
| Fixed Table Bypass | The fixed_count header scalar is entirely decorative. It is not part of the active runtime read path (files with nonzero values are accepted but never mapped). |
| Direct Asset Extraction | When reading an asset out of the .bif, the engine locates the variable-table entry at byte offset (resource_id & 0x3fff) * 0x10, then calls C fseek(SEEK_SET) with the raw data_offset stored in that 16-byte entry. No alignment or structural normalization is applied; the bytes at that offset are read as-is. |

Caution

Because the engine passes the internal data_offset integer directly into a raw C fseek(SEEK_SET), any custom BIF files must meticulously guarantee byte-perfect offset tables. If the offset is even slightly misaligned, the engine will read garbage data into the stream, inevitably crashing the game.
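The lookup described above can be sketched in a few lines. This is a minimal illustration, not Rakata's implementation: the function names are hypothetical, and the position of data_offset inside the 16-byte entry (second little-endian u32, following an id field) is an assumption taken from commonly documented BIF layouts rather than from this audit.

```rust
/// Size of one variable-table entry, per the audit above.
const ENTRY_SIZE: usize = 0x10;

/// Byte offset of an entry inside the variable table,
/// mirroring `(resource_id & 0x3fff) * 0x10`.
fn entry_byte_offset(resource_id: u32) -> usize {
    (resource_id & 0x3fff) as usize * ENTRY_SIZE
}

/// Reads `data_offset` for `resource_id` from an in-memory copy of the
/// variable table. Returns `None` when the entry lies outside the table.
///
/// ASSUMPTION: the offset is the second little-endian u32 of the entry.
fn data_offset(table: &[u8], resource_id: u32) -> Option<u32> {
    let start = entry_byte_offset(resource_id);
    let b = table.get(start + 4..start + 8)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

Unlike the engine, the sketch bounds-checks the table slice, so a custom tool built this way reports a bad entry instead of seeking to an arbitrary offset.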

KEY (Global Index)

Think of the KEY file as the absolute master table of contents governing the entire game directory. Because uncompressed BIF archives are completely blind payloads that contain no internal filenames, the KEY file acts as the singular, authoritative index that tells the engine exactly which BIF holds which file, and precisely where to seek inside that BIF to find it.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .key |
| Magic Signatures | KEY (version V1 ) |
| Type | Archive Global Index |
| Rust Reference | View rakata_formats::Key in Rustdocs |

Data Model Structure

The rakata-formats crate treats the .key file as the authoritative mapping used during global engine initialization.

  • Indices Hierarchy: Internally, the format houses an array of bif_entries bounding archive paths and sizes, alongside a massive array of KeyResourceEntry structures fusing a standard ResRef string and a format TypeCode to a bit-packed numeric ResourceId.
  • Conflict Resolution: Because the game engine relies on a strict override hierarchy, multiple KEYs might accidentally declare the same resource! When constructing the active dictionary out of a KEY file (KeyFile::build_key_resource_index), Rakata explicitly utilizes or_insert() to strictly ensure only the first defined entry for a conflict is honored, perfectly mimicking the engine’s aggressive linear-scan precedence rules.
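The first-entry-wins rule can be illustrated with the standard library's map entry API. The sketch below is not the body of KeyFile::build_key_resource_index; the tuple-based entry shape and the function name are hypothetical stand-ins chosen to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical index key: (resource name, numeric type code).
type ResourceKey = (String, u16);

/// Builds an index in which the first declaration of a key wins.
fn build_first_wins_index(entries: &[(&str, u16, u32)]) -> HashMap<ResourceKey, u32> {
    let mut index = HashMap::new();
    for &(resref, type_code, resource_id) in entries {
        // `or_insert` only writes when the slot is vacant, so later
        // duplicates of the same (name, type) pair are ignored.
        index
            .entry((resref.to_string(), type_code))
            .or_insert(resource_id);
    }
    index
}
```

Using insert() instead would silently flip the precedence to last-entry-wins, which is the opposite of the linear-scan behavior described above.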

Engine Audits & Decompilation

The following information documents the KOTOR engine’s exact load sequence and field constraints for genuine .key files. All behavior was mapped natively from swkotor.exe during clean-room reverse engineering.

Key Table Registration (CExoKeyTable::AddKeyTableContents)

Mapped from 0x0040fb80.

| Action | Engine Behavior |
|---|---|
| Signature Check | Validates exactly for the KEY magic and the explicit V1 version signature. |
| Version Branching | There is absolutely zero logic handling any speculative V1.1 version branch in vanilla K1. It is currently unknown if a V1.1 KEY format actually exists in the wild, but the engine certainly wouldn’t load it. |
| Payload Mapping | Extrapolates the file location natively by tearing apart the ResourceId bitmask to locate both the target BIF file index and the internal struct array offset. |
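As a rough sketch of the Payload Mapping step, a ResourceId can be split into its two parts with a shift and a mask. The 14-bit entry mask matches the BIF audit on this page; placing the BIF index in the top 12 bits is an assumption based on commonly documented KEY layouts, and the function name is hypothetical.

```rust
/// Splits a packed resource id into (bif_index, entry_index).
///
/// ASSUMPTION: bits 20..=31 hold the index into the KEY's BIF list.
/// The 0x3fff mask for the entry index follows the BIF audit.
fn split_resource_id(resource_id: u32) -> (u32, u32) {
    let bif_index = resource_id >> 20;
    let entry_index = resource_id & 0x3fff;
    (bif_index, entry_index)
}
```

For example, split_resource_id(0x0030_0005) yields BIF index 3 and entry index 5.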

Note

The engine handles KEY table loading extremely early in the application lifecycle during CExoBase::InitObject. If a global KEY fails to mount due to malformed headers, the engine immediately aborts execution.

ERF (Encapsulated Resource File)

ERFs are the heavy lifters for standard game modules (.mod) and save game architectures (.sav). Unlike BIFs, which rely entirely on an external KEY file to resolve their resource identities, ERFs are completely self-contained entities that carry their own internal file tables, localized descriptions, and asset payloads.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .erf, .mod, .hak, .sav |
| Magic Signatures | ERF , MOD , HAK , SAV (version V1.0) |
| Type | Self-Contained Archive |
| Rust Reference | View rakata_formats::Erf in Rustdocs |

Data Model Structure

Because ERF files share the exact same structural responsibility as RIM files (acting as self-contained module wrappers), the rakata-extract crate abstracts both ERF and RIM parsing directly behind the unified Capsule struct.

  • Capsule Generalization: Standard module extraction relies entirely on calling rakata_extract::Capsule::read_from_bytes(). This actively probes and dynamically mounts either ERF or RIM boundaries identically in memory, completely hiding the underlying structural container differences from the developer API.

Engine Audits & Decompilation

The following information documents the KOTOR engine’s exact load sequence and field requirements for genuine .erf capsule variants. All behavior was mapped natively from swkotor.exe during clean-room reverse engineering.

Capsule Header Initialization (CExoEncapsulatedFile::LoadHeader)

Mapped from 0x0040e1f0.

| Action | Engine Behavior |
|---|---|
| Signature Check | Explicitly validates the header against exactly matching ERF , MOD , or HAK signatures, paired with the mandatory V1.0 version string. |
| Unchecked Saves | The engine has no validation branch for the SAV magic. If a file is loaded as a save game (param flag 1), the engine falls through the validation tree and requires the MOD magic string. A save archive carrying the SAV magic is rejected or crashes at this point. |
| Header Truncation | The loader reads the entire 160-byte header (CExoFile::Read(..., 0xa0)) but only evaluates fields in the range 0x00 through 0x1C. Within that range, offset 0x18 (Key List) is skipped, and everything beyond 0x1C is ignored during initialization. |
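The signature rules in the table above can be expressed as a small pre-flight check. This is a sketch for tool authors, not a transcription of CExoEncapsulatedFile::LoadHeader: the error type and function name are hypothetical, and rejecting buffers shorter than 160 bytes is a choice made for the sketch rather than a documented engine behavior.

```rust
#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    BadVersion,
}

/// Checks a capsule header against the rules listed in the table above.
fn check_capsule_header(header: &[u8], is_save_game: bool) -> Result<(), HeaderError> {
    // The engine reads 0xa0 bytes; the sketch requires at least that many.
    if header.len() < 0xa0 {
        return Err(HeaderError::TooShort);
    }
    let magic = &header[0..4];
    let version = &header[4..8];

    // Save games must carry the MOD magic; a SAV magic is not accepted.
    let magic_ok = if is_save_game {
        magic == b"MOD "
    } else {
        magic == b"ERF " || magic == b"MOD " || magic == b"HAK "
    };
    if !magic_ok {
        return Err(HeaderError::BadMagic);
    }
    if version != b"V1.0" {
        return Err(HeaderError::BadVersion);
    }
    Ok(())
}
```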

Tip

The 116-Byte “Dead Zone”: The block of bytes running from offset 0x2C up to (but not including) 0xA0 inside the 160-byte header is loaded into the engine’s memory and then immediately discarded. It is inert data containing old BioWare build metadata.

RIM (Resource Image)

RIM files operate as a radically leaner alternative to ERFs. They are used exclusively by the game engine for distributing absolutely essential or lightweight modules without the hefty structural metadata overhead of an ERF file. They provide rapid, self-contained loading for core engine environments.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .rim |
| Magic Signatures | RIM (version V1.0) |
| Type | Lightweight Archive |
| Rust Reference | View rakata_formats::Rim in Rustdocs |

Data Model Structure

Because RIM files act as a lightweight twin to the ERF format, the rakata-extract crate extracts them identically.

  • Capsule Generalization: Standard module extraction relies entirely on calling rakata_extract::Capsule::read_from_bytes(). The developer API makes absolutely no programmatic distinction between querying an ERF module or a RIM module—it behaves perfectly seamlessly either way.

Engine Audits & Decompilation

The following information documents the KOTOR engine’s exact load sequence and field requirements for genuine .rim capsule variants. All behavior was mapped natively from swkotor.exe during clean-room reverse engineering.

Resource Image Overrides (CExoKeyTable::AddResourceImageContents)

Mapped from 0x0040f990.

| Action | Engine Behavior |
|---|---|
| Signature Check | Validates the exact RIM magic and the V1.0 version string on load. |
| Header Evaluation | The engine reads the entry_count (offset 0x0C) and the keys_offset (offset 0x10) from the header to navigate the file structures. |

Tip

The 96-Byte “Dead Zone”: Like the ERF dead zone, RIM files carry 96 bytes of inert padding at offsets 0x18 through 0x77 inside the 120-byte header. The engine skips past it during initialization. It is safe to zero out this region when generating new synthetic fixtures.
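A minimal sketch of the header fields the engine is described as reading, plus a fixture builder that leaves the dead zone zeroed. The struct and function names are hypothetical, and little-endian byte order is an assumption (the audited binary is an x86 executable).

```rust
const RIM_HEADER_SIZE: usize = 120;

#[derive(Debug, PartialEq)]
struct RimHeader {
    entry_count: u32,
    keys_offset: u32,
}

fn read_u32_le(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Parses only the fields the audit says the engine evaluates.
fn parse_rim_header(bytes: &[u8]) -> Option<RimHeader> {
    if bytes.len() < RIM_HEADER_SIZE || &bytes[0..8] != b"RIM V1.0" {
        return None;
    }
    Some(RimHeader {
        entry_count: read_u32_le(bytes, 0x0c),
        keys_offset: read_u32_le(bytes, 0x10),
    })
}

/// Builds a synthetic header; offsets 0x18..=0x77 are left as zeros.
fn synthetic_rim_header(entry_count: u32, keys_offset: u32) -> Vec<u8> {
    let mut header = vec![0u8; RIM_HEADER_SIZE];
    header[0..8].copy_from_slice(b"RIM V1.0");
    header[0x0c..0x10].copy_from_slice(&entry_count.to_le_bytes());
    header[0x10..0x14].copy_from_slice(&keys_offset.to_le_bytes());
    header
}
```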

GFF (Generic File Format)

The Generic File Format (GFF) is BioWare’s core binary serialization format, functioning like a binary JSON object or XML tree. It holds arbitrarily nested structures, typed fields, and lists, powering UI layouts, character sheets, dialogues, and area descriptions.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .gff, .utc, .uti, .utp, .ute, .utd, .dlg, .are, .ifo, etc. |
| Magic Signature | Target type (e.g. UTC ) / V3.2 |
| Type | Generic Hierarchical Data |
| Rust Reference | View rakata_formats::Gff in Rustdocs |

Data Model Structure

The rakata-formats crate gracefully abstracts the GFF struct/field/list indexing graph into a user-friendly memory model (rakata_formats::Gff).

  • Typestate Wrapping: GFF natively supports discrete types (e.g., BYTE, SHORT, VOID, STRUCT, LIST). rakata_formats::GffValue encapsulates these identically, shielding developers from raw byte layouts and indirect arrays.
  • Data Deduplication: Unlike standard web JSON, GFF binaries limit all field labels to 16 characters and deduplicate them via a contiguous LabelTable. The rakata-formats implementation mimics this memory layout exactly during serialization, guaranteeing structurally deterministic binaries natively acceptable by the engine!

Engine Audits & Decompilation

Binary: swkotor.exe

Serialization Architecture (WriteGFFFile)

Derived from 0x00413030 / 0x004113d0.

The engine allocates the output buffer entirely in-memory and serializes exactly 7 contiguous sections in an absolutely strict order. No inter-section padding or reserved alignment bytes are inserted anywhere natively. Each section’s byte-offset is dynamically snapshotted into the 56-byte header, operating as the canonical write path utilized for save games and area extraction.

| Phasing Order | Section Component | Memory Footprint / Quirk |
|---|---|---|
| Phase 1 | Root Header | Exactly 56 bytes (0x38). |
| Phase 2 | Struct Array | 12B × struct_count |
| Phase 3 | Field Array | 12B × field_count |
| Phase 4 | Label Array | 16B × label_count |
| Phase 5 | Field Data Blob | Arbitrary bounds constraint. |
| Phase 6 | Field Indices | Dynamic array bounds. |
| Phase 7 | List Indices | Dynamic array bounds. |
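Because the sections are written back-to-back with no padding, every section offset follows from the counts and byte lengths alone. The sketch below computes them under that rule; the struct and parameter names are hypothetical, and overflow handling is omitted for brevity.

```rust
#[derive(Debug, PartialEq)]
struct GffLayout {
    struct_offset: u32,
    field_offset: u32,
    label_offset: u32,
    field_data_offset: u32,
    field_indices_offset: u32,
    list_indices_offset: u32,
    total_len: u32,
}

/// Computes section offsets for the strict 7-section write order.
fn gff_layout(
    struct_count: u32,
    field_count: u32,
    label_count: u32,
    field_data_len: u32,
    field_indices_len: u32,
    list_indices_len: u32,
) -> GffLayout {
    let struct_offset = 56; // the header is exactly 0x38 bytes
    let field_offset = struct_offset + 12 * struct_count;
    let label_offset = field_offset + 12 * field_count;
    let field_data_offset = label_offset + 16 * label_count;
    let field_indices_offset = field_data_offset + field_data_len;
    let list_indices_offset = field_indices_offset + field_indices_len;
    GffLayout {
        struct_offset,
        field_offset,
        label_offset,
        field_data_offset,
        field_indices_offset,
        list_indices_offset,
        total_len: list_indices_offset + list_indices_len,
    }
}
```

A file with 1 struct, 2 fields, 2 labels, 8 bytes of field data, and 8 bytes of field indices therefore ends at byte 140.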

Warning

Because BioWare enforces fixed 16-byte elements inside the Label arrays, any label that exceeds 16 characters is strictly truncated by the engine array bounds.


Engine Blueprints: Specialized GFF Containers

While the gff.md reference explains the layout of raw GFF nodes, the engine frequently uses GFF as a structural wrapper to serialize completely deterministic entities known as Blueprints. These blueprints operate as the strict layouts defining creatures, dialogue trees, placeables, and area parameters.

Because rakata-lint provides deep behavioral validation over these blueprints natively, we have comprehensively audited how the K1 GOG executable (swkotor.exe) maps these layouts into active memory via its Load*FromGFF functions.

The Blueprint Engine Audits

The audits listed in this section’s navigation bar are formal, decompilation-backed blueprints cataloging KOTOR’s physical constraints. They document the exact fields, load phasing, and engine rule evaluations that supersede any generic structural validity.

If a field exists in GFF but breaks the engine, our Linter rules will flag it using these documentation audits as the source of truth.

| Ext | Type | Core Function |
|---|---|---|
| .are | Area Static Blueprint | Defines overarching static world properties (weather, day/night limits, physics constraints). |
| .dlg | Dialogue | Encapsulates the conversation graph, branching logic, and cinematic execution sequences. |
| .git | Game Instance Template | The physical object manifest. Orchestrates exact placement, vector orientations, and template spawning. |
| .ifo | Module Info | Root environment metadata bridging modules together and orchestrating spawn states. |
| .utc | Creature | Instantiates NPCs, stat-blocks, and character body configurations. |
| .utd | Door | Configures transitions, linked bounds, and structural barriers. |
| .ute | Encounter | Orchestrates dynamic boundary triggers and valid enemy spawning constraints. |
| .uti | Item | Unifies structural stats across weapons, armors, and consumables. |
| .utm | Store | Limits merchant arrays and details markup/markdown behaviors. |
| .utp | Placeable | Standardizes interactive storage boxes, unusable statues, and deployable traps. |
| .uts | Sound | Configures local dynamic audio emitters and distance volume calculations. |
| .utt | Trigger | Plots physical interactive polygons tracking spatial events. |
| .utw | Waypoint | Anchors spatial float positions for navigation grids and area transitions. |

ARE Format (Area Static Blueprint)

The Area (.are) blueprint format operates as the static environmental foundation of any game module. It establishes the rigid, overarching properties of a level, orchestrating the terrain’s grass rendering definitions, dynamic sunlight and fog constraints, ambient audio scale, and the primary interior/exterior state configurations. It effectively constructs the structural ‘stage’ that dynamic entities (like creatures and doors) populate later on.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .are |
| Magic Signature | ARE / V3.2 |
| Type | Area Static Blueprint |
| Rust Reference | View rakata_generics::Are in Rustdocs |

Data Model Structure

Rakata maps the Area definition directly into the rakata_generics::Are struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .are files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSArea::LoadArea at 0x0050e190.)

The initial LoadArea dispatch branches out to parse the .are GFF, .lyt layout, .git instance tracking, and .pth bounds. The engine processes roughly 61 scalar fields, 4 scripts, 3 lists, and a nested minigame struct natively within the LoadAreaHeader subroutine.

Core Environmental Identity

| Field Category | Engine Property & Behavioral Quirk |
|---|---|
| Identity | Name (LocString), Comments (String), ID (Int) -> Standard definition strings. |
| Identity | Tag (String) -> Lowercased on load (via CExoString::LowerCase). The only tag to behave this way! |
| Scripts | OnHeartbeat, OnUserDefined, ... -> CResRef script payloads. |
| State Flags | Flags (DWord) -> Bit 0 explicitly marks an Interior environment. |
| State Flags | RestrictMode (Byte) -> Hardcoded Event: Changing this to a non-zero value during gameplay forces CSWPartyTable::UnstealthParty. |

Note

Internal Weather Truncation: If Flags (Bit 0) marks the area as an interior space, the engine zeros out all weather properties upon load, actively discarding any prior weather assignments.

Weather & Terrain Generation

| Field | Type | Engine Evaluation |
|---|---|---|
| ChanceFog | INT | Stored persistently as an integer. |
| ChanceRain, ChanceSnow, ChanceLightning, WindPower | INT | Warning: The engine explicitly truncates these INT properties to 8-bit bytes at runtime. Values over 255 silently wrap around. |
| Grass_TexName | ResRef | If empty or invalid, the engine forces a hard fallback to "grass". |
| AlphaTest | FLOAT | Defaults to 0.2 (older tools commonly assume 0.0). |
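The truncation and fallback rules above are easy to reproduce in a tool that wants to preview what the engine will actually use. The sketch below combines the byte truncation with the interior zeroing from the preceding note; the function names are hypothetical, and only the empty-name case of the grass fallback is modeled.

```rust
/// Bit 0 of `Flags` marks an interior area.
const AREA_FLAG_INTERIOR: u32 = 1;

/// Value the engine ends up with for ChanceRain / ChanceSnow /
/// ChanceLightning / WindPower, given the stored INT and the area flags.
fn runtime_weather_value(stored: i32, area_flags: u32) -> u8 {
    if area_flags & AREA_FLAG_INTERIOR != 0 {
        return 0; // interiors discard weather on load
    }
    stored as u8 // truncating cast: keeps only the low 8 bits
}

/// Texture name the engine falls back to when Grass_TexName is empty.
fn effective_grass_texture(name: &str) -> &str {
    if name.is_empty() { "grass" } else { name }
}
```

A stored chance of 300 becomes 44 at runtime (300 - 256), which is why values above 255 are worth flagging before they ship.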

Area Lighting & Sun/Moon Tracking

KOTOR handles dynamic sunlight constraints separately between Sun and Moon.

| Property Groups | Type | Engine Evaluation |
|---|---|---|
| Fog Ranges (MoonFogNear/Far, SunFogNear/Far) | FLOAT | Defaults to an immense distance of 10000.0. The engine aggressively clamps values to be ≥0.0. |
| Tints (*AmbientColor, *DiffuseColor, *FogColor) | DWORD | Processed seamlessly as standard DWORD color masks. |
| Environment Shadows (ShadowOpacity, *Shadows) | BYTE | Basic toggles and opacities orchestrating render limits. |

Map Transitions & Saving states

| Feature Category | Engine Evaluation & Triggers |
|---|---|
| Minimap Logic | Geographic vectors (MapResX, spatial coordinate structs like WorldPt1X) are only loaded if an actual Minimap TGA/TPC asset matching the level name exists on disk! |
| Parsing Type | If read, the engine parses MapPt along a dual-path logic checking if it is formally a FLOAT or INT type. |
| Zoom Bias | Area maps evaluate MapZoom to a default scaling scalar of 1, not 0! |
| Stealth Save-States | The stealth framework leverages the .are struct to snapshot .StealthXPMax and .StealthXPCurrent directly as DWORDs when parsing the layout. |

The Minigame Struct

Read via CSWMiniGame::Load (0x006723d0). If a minigame context triggers, the .are reads the nested Type (DWORD mapping 1=Swoop, 2=Turret). It injects highly specialized float properties modifying basic terrain speeds:

| Field | Injection Default / Constraint |
|---|---|
| LateralAccel | Defaults safely to 60.0. |
| MovementPerSec | Scales to 6.0 (Swoops), 90.0 (Turrets), or 0.0 otherwise! |
| Bump_Plane | Bounds are heavily clamped to 0..3. |
| Nested Arrays | The struct natively requires sub-struct Player arrays (Models, Camera, Axes) and Enemy/Obstacles lists to operate properly. |

Rakata Linter Rules

The core priority of rakata-lint is shielding users from fields that look valid in older editors but fail in the K1 engine. E.g., there are 19 distinct fields generated by standard modding tools (like DisableTransit, KOTOR 2 ForceRating, etc) that are completely evaluated as dead data by the vanilla engine.

(Seven crucial engine-read fields were previously obfuscated by strict model bindings, but now pass through source_root validation.)

Lint Diagnostics Implemented:

  1. Weather Truncation: Identifies Rain/Snow/Lightning chance above 255 before they wrap around as bytes.
  2. Context Discards: Flags interior environments that contain weather parameters the engine will inevitably zero out.
  3. Index Fallbacks: Informs the developer that an empty Grass_TexName operates identically as "grass".
  4. Behavioral Flags: Warns that area Tag edits are natively lowercased during instantiation.
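To make the four diagnostics concrete, here is a self-contained sketch of how such checks could be written. It does not reproduce rakata-lint's actual rule types or API; AreSummary, AreLint, and lint_are are hypothetical names introduced for illustration.

```rust
#[derive(Debug, PartialEq)]
enum AreLint {
    WeatherWraps { field: &'static str, value: i32 },
    InteriorWeatherDiscarded { field: &'static str },
    EmptyGrassTexture,
    TagWillBeLowercased,
}

/// Hypothetical flattened view of the fields the rules need.
struct AreSummary<'a> {
    flags: u32,
    tag: &'a str,
    grass_tex_name: &'a str,
    chances: [(&'static str, i32); 3],
}

fn lint_are(are: &AreSummary) -> Vec<AreLint> {
    let mut out = Vec::new();
    let interior = are.flags & 1 != 0;
    for &(field, value) in &are.chances {
        if interior && value != 0 {
            // The engine zeroes weather on interiors, so the value is dead.
            out.push(AreLint::InteriorWeatherDiscarded { field });
        } else if value > 255 {
            // The value would wrap when truncated to a byte.
            out.push(AreLint::WeatherWraps { field, value });
        }
    }
    if are.grass_tex_name.is_empty() {
        out.push(AreLint::EmptyGrassTexture);
    }
    if are.tag.chars().any(|c| c.is_ascii_uppercase()) {
        out.push(AreLint::TagWillBeLowercased);
    }
    out
}
```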

DLG Format (Dialogue Blueprint)

Description: The Dialogue (.dlg) format is the beating heart of KOTOR’s storytelling. It acts as the master “script” for every conversation, cutscene, and cinematic sequence. Rather than just holding localized text, it acts as a branching storyboard that tells the engine exactly what the characters should say in audio, what animations they should perform, which camera angles to use, and when to fire off scripts that impact the plot.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .dlg |
| Magic Signature | DLG / V3.2 |
| Type | Dialogue Blueprint |
| Rust Reference | View rakata_generics::Dlg in Rustdocs |

Data Model Structure

Rakata parses the raw GFF structure into the rakata_generics::Dlg struct.

  • Typestate Extraction: By extracting the loosely-typed GFF binary into a strict Rust struct, Rakata replaces unsafe dynamic string queries with compile-time guaranteed data types (such as DlgAnimation and DlgCamera models).
  • (Note: rakata-lint does not currently implement behavioral validation for .dlg formats.)

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .dlg files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSDialog::LoadDialog (0x005a2ae0), cascading through LoadDialogBase (0x005a11c0) and LoadDialogCamera (0x005a1ab0).)

The LoadDialog subroutine processes the root-level conversation configuration before iterating over the heavily nested EntryList and ReplyList. For each of those conversational nodes, it delegates parsing to LoadDialogBase (for text and scripts) and LoadDialogCamera (for viewport directions).

Additionally, StartingList provides the dialogue entry points, while the StuntList associates cutscene actor models.

Root Conversation Configuration

| Field Category | Engine Property & Type | Notable Default or Behavioral Quirk |
|---|---|---|
| Identity & Rules | CameraModel (ResRef), DelayEntry/Reply (DWord) | Standard execution behaviors. |
| Identity & Rules | Skippable (Byte) | Defaults to 1 (True). |
| Logic Hooks | EndConversation, EndConverAbort (ResRefs) | Fire when the dialogue terminates abruptly or via conclusion. |
| Hardware Interfacing | ConversationType (Int) | 0 = Cinematic, 1 = Computer, 2 = Special. Cinematic explicitly unstealths the party. |
| Hardware Interfacing | ComputerType (Byte) | Only evaluated if ConversationType is 1. Otherwise, completely dead data. |
| Equipment & Actions | UnequipItems, UnequipHItem, AnimatedCut | AnimatedCut forces a global unpauseable state within the client if non-zero. |

Shared Dialogue Node Properties (LoadDialogBase)

These fields apply to both entries (NPC spoken) and replies (Player spoken), and are parsed via LoadDialogBase.

| Field | Type | Engine Evaluation |
|---|---|---|
| Text | LocString | The spoken localized string. |
| Script, Speaker, Quest | Strings/ResRefs | Standard execution scripts and entity mapping. |
| Sound, VO_ResRef | ResRef | Sound Fallback: If Sound fails to execute, the engine will attempt to play VO_ResRef. If both fail, the bitmask SoundExists is forcibly downgraded to 0. |
| Delay | DWord | Delay Special Case: If value is 0xFFFFFFFF, the engine explicitly reads from the root DelayEntry/DelayReply field instead and modulates WaitFlags! |
| FadeType | Byte | Determines the FadeDelay and FadeLength. If set to 0 or missing, all fade configurations are zeroed inherently. |
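The Delay and FadeType rules are both "sentinel" behaviors: one value redirects to the root setting, the other discards dependent fields. A minimal sketch follows, with hypothetical function names. The WaitFlags adjustment is not modeled, and treating FadeDelay and FadeLength as f32 is an assumption, since their types are not listed above.

```rust
/// Sentinel meaning "use the root DelayEntry / DelayReply value".
const USE_ROOT_DELAY: u32 = 0xFFFF_FFFF;

/// Resolves a node's delay against the matching root-level delay.
fn effective_delay(node_delay: u32, root_delay: u32) -> u32 {
    if node_delay == USE_ROOT_DELAY { root_delay } else { node_delay }
}

/// Returns (FadeDelay, FadeLength) as the engine would keep them.
/// `fade_type` is `None` when the field is missing from the node.
fn effective_fade(fade_type: Option<u8>, delay: f32, length: f32) -> (f32, f32) {
    match fade_type {
        None | Some(0) => (0.0, 0.0), // timings are zeroed
        Some(_) => (delay, length),
    }
}
```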

Viewport Framing (LoadDialogCamera)

| Field | Type | Engine Evaluation |
|---|---|---|
| CameraID | INT | Dependent Field: Only permitted when CameraAngle = 6 (Placeable Camera). Otherwise, the engine forces the ID to -1 regardless of the static binary value. |
| CamFieldOfView | FLOAT | Aggressively validated. If the property is entirely missing or is explicitly negative, the engine forces the perspective to -1.0. |
| CamHeightOffset, TarHeightOffset | FLOAT | Standard float deltas. |
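The two dependent-field rules in this table reduce to a pair of small normalizers. The sketch below uses hypothetical function names and assumes CameraAngle is read as an unsigned integer, which the table does not state.

```rust
/// CameraAngle value that selects a placeable camera.
const CAMERA_ANGLE_PLACEABLE: u32 = 6;

/// CameraID as the engine keeps it: honored only for placeable cameras.
fn effective_camera_id(camera_angle: u32, camera_id: i32) -> i32 {
    if camera_angle == CAMERA_ANGLE_PLACEABLE { camera_id } else { -1 }
}

/// CamFieldOfView as the engine keeps it: missing or negative becomes -1.0.
fn effective_fov(fov: Option<f32>) -> f32 {
    match fov {
        Some(value) if value >= 0.0 => value,
        _ => -1.0,
    }
}
```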

Relational Data Trees

Dialogues operate as highly interconnected linked lists.

  • Entry -> Reply Links (RepliesList within an Entry Node): Maps the Index (DWORD) to the overarching .ReplyList bounds. Unique in that it exclusively parses the DisplayInactive Byte.
  • Reply -> Entry Links (EntriesList within a Reply Node): Maps the Index to the .EntryList bounds.
  • Start Indices (StartingList): Uses the exact same linkage schema as a Reply->Entry link. Validates Index against entry_count.

Warning

Corrupted Link Constraints: Index paths are strictly evaluated against the internal array bounds prior to traversing. If a node tries to link out of bounds, it immediately triggers a fatal Load Failure within the engine.
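Since an out-of-bounds Index is fatal, a static check over each link list is cheap insurance. The sketch below is a hypothetical helper, not part of rakata-lint; it reports the position of the first link whose Index falls outside the target list.

```rust
/// Returns the position of the first link whose `Index` is out of bounds
/// for a target list of `target_len` nodes, or `None` if all links are valid.
fn first_out_of_bounds(link_indices: &[u32], target_len: usize) -> Option<usize> {
    link_indices
        .iter()
        .position(|&index| index as usize >= target_len)
}
```

For an entry's RepliesList the target length is the size of ReplyList; for a reply's EntriesList or the StartingList it is the size of EntryList.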

Ancillary Configuration Lists

  • AnimList: Defines custom Participant models and their accompanying Animation (WORD) action index to loop.
  • StuntList: Dictates which StuntModel should proxy standard rendering behavior for a given Participant.

Proposed Linter Rules

The rakata-lint dialogue ruleset has not been formally implemented yet. However, the following diagnostics are heavily recommended to combat engine failure domains directly derived from these decompilation audits:

  1. Camera Angle Compliance: Detect if CameraID holds a value while CameraAngle is anything other than 6, warning that the data is ignored by the engine.
  2. Conversation Type Mismatch: Warn if a ComputerType sub-property is set, but the parent ConversationType is not explicitly flagged to 1 (Computer Dialog).
  3. Ghost Delay Flags: Warn when an entry delay is maxed (0xFFFFFFFF), but execution triggers evaluate to an instantaneous termination sequence (Warning on Sound invalidation).
  4. Fatal Bounds Checking: Statically trace every Index parameter in node link-lists to ensure they never exceed array bounds and cause an engine hard-stop.
  5. Context Zeroing: Inform the developer if fade delays are configured, but the parent FadeType is 0, causing the engine to discard the timings.

GIT Format (Game Instance Template)

Description: The Game Instance Template (.git) orchestrates the exact placement of every single entity within an environment. If the .are file is the underlying “stage”, the .git file acts as the blueprint for its “actors”–defining exactly where creatures initially spawn, where placeables sit, the physical rotation of doors, and the bounds of any active sound emitters.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .git |
| Magic Signature | GIT / V3.2 |
| Type | Instance Blueprint |
| Rust Reference | View rakata_generics::Git in Rustdocs |

Data Model Structure

Rakata parses the raw GFF structure into the rakata_generics::Git struct.

  • Typestate Extraction: By extracting the loosely-typed GFF binary into a strict Rust struct, Rakata inherently standardizes all 13 object sub-lists, creating deterministic representations of GitCreature, GitDoor, GitPlaceable, etc.
  • (Note: rakata-lint does not currently implement behavioral validation for .git formats.)

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .git files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSArea::LoadGIT at 0x0050dd80.)

The LoadGIT subroutine is a massive dispatcher. It evaluates 3 immediate root scalars before handing off evaluation to 13 distinct object-list loaders mapping entities. Crucially, the flag UseTemplates dominates this process by dictating whether these lists refer to external files or contain fully inline entity data.

Root Behavior Properties

| Field | Type | Engine Evaluation |
|---|---|---|
| UseTemplates | BYTE | Controls whether object arrays read TemplateResRef to construct entities, or fall back to inline evaluation. |
| CurrentWeather | BYTE | Standard BYTE. Reset to 0xFF on interior areas. |
| WeatherStarted | BYTE | Standard BYTE. Reset to 0 on interior areas. |

(The engine validates weather fields against the .are properties immediately during load).

Field Naming Inconsistencies

Due to legacy asset sprawl, the engine evaluates vectors explicitly according to vastly different naming conventions depending entirely on the entity class. This is hardcoded into swkotor.exe.

| Target Lists | Position Paradigm | Orientation Paradigm |
|---|---|---|
| Creatures, Items, Waypoints, Stores | XPosition, YPosition, ZPosition | XOrientation, YOrientation, ZOrientation |
| Doors, Placeables | X, Y, Z | Bearing (Float angle) |
| Area Effects | PositionX, PositionY, PositionZ | OrientationX, OrientationY, OrientationZ |

Warning

Orientation Normalization: The engine strictly evaluates 3D orientation logic. If an orientation vector (such as those in StoreList or AreaEffectList) normalizes to zero length, the engine catches the math fault and applies a hard fallback vector of (0, 1, 0).
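A writer or linter can mirror the fallback with a guarded normalization. This is a sketch with a hypothetical function name; treating a non-finite length the same as a zero length is an extra safety choice made here, not something the audit states.

```rust
/// Normalizes an orientation vector, falling back to (0, 1, 0)
/// when the vector has no usable length.
fn normalized_or_fallback(v: [f32; 3]) -> [f32; 3] {
    let len = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt();
    if len == 0.0 || !len.is_finite() {
        return [0.0, 1.0, 0.0];
    }
    [v[0] / len, v[1] / len, v[2] / len]
}
```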

Standard Instance Arrays

Standard loaders evaluate the generic ObjectId, process the localized position/orientation floats, and dispatch behavior mapping logic.

| List Name | Struct Target | Engine Triggers & Fallbacks |
|---|---|---|
| Creature List | LoadCreatures | Positions are explicitly validated defensively through ComputeSafeLocation bounds. |
| Door List | LoadDoors | Save states trigger LoadObjectState. External templates dynamically route to LoadDoorExternal. |
| WaypointList | LoadWaypoints | Completely ignores UseTemplates–it solely relies on inline data! Z-height is shifted dynamically via ComputeHeight. |
| TriggerList | LoadTriggers | Geometry properties reuse native UTT formatting. Contains unique linkage arrays: LinkedToModule, TransitionDestination, LinkedTo. |

Specialized Struct Parsings

| Engine Dispatch Target | Description & Findings |
|---|---|
| LoadSounds (0x00505560) | Discard logic: Translates GeneratedType via DWord, but physically truncates it to an 8-bit byte on save, silently discarding the upper 24 bits! |
| LoadEncounters (0x00505060) | Highly nested structural array reusing both Geometry and SpawnPointList formats natively built for UTE boundaries. |
| LoadPlaceableCameras (0x00505eb0) | Client-side only struct that reads composite GFF spatial types correctly natively! Camera Limit: If it hits 51 camera entries, the loader formally rejects it. |
| “List” (Items) (0x00504de0) | Bizarrely, the generic parent entity list List is used specifically to orchestrate Item instances! |

Singular Structs

  • AreaProperties: Orchestrates stealth behavior state tracking and dynamic audio states. It physically reads AmbientSndDayVol / AmbientSndNitVol and explicitly truncates their INT declarations into a single native runtime byte value.
  • AreaMap: Strict binary blobs evaluating rendering properties (AreaMapData). It is absolutely bypassed during fresh loads, only executed conditionally during save-game states.

Proposed Linter Rules (Rakata-Lint)

The rakata-lint engine hasn’t implemented git.rs validations yet. However, the exact engine behaviors discovered during decompilation dictate these static constraints:

  1. Weather Zeroing: If CurrentWeather or WeatherStarted are configured on an area interior, the engine forcibly resets them on load (CurrentWeather to 0xFF, WeatherStarted to 0).
  2. Camera Array Bounds: If a CameraList contains 51 or more entries, it triggers an immediate engine-level loader failure.
  3. Stealth Clamping Constraint: The engine triggers hard integer clamping on StealthXPCurrent against the StealthXPMax bounds thresholds during evaluation.
  4. Volume Sub-Type Truncation: If AmbientSndDayVol or GeneratedType exceed 255, the engine natively wraps the integer into an 8-bit byte value, resulting in immediate data wrapping/corruption.
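Three of these constraints (weather reset, camera limit, byte truncation) are simple enough to sketch directly. The types and names below are hypothetical and do not correspond to an existing rakata-lint module; the stealth clamping rule is left out because its exact evaluation is not detailed here.

```rust
#[derive(Debug, PartialEq)]
enum GitLint {
    InteriorWeatherReset,
    CameraLimit { count: usize },
    WrapsToByte { field: &'static str, value: i64 },
}

/// Hypothetical flattened view of the fields the rules need.
struct GitSummary {
    is_interior: bool,
    current_weather: u8,
    weather_started: u8,
    camera_count: usize,
    ambient_snd_day_vol: i32,
    generated_type: u32,
}

fn lint_git(git: &GitSummary) -> Vec<GitLint> {
    let mut out = Vec::new();
    // On interiors the engine resets these to 0xFF and 0 respectively.
    if git.is_interior && (git.current_weather != 0xFF || git.weather_started != 0) {
        out.push(GitLint::InteriorWeatherReset);
    }
    if git.camera_count >= 51 {
        out.push(GitLint::CameraLimit { count: git.camera_count });
    }
    let byte_fields = [
        ("AmbientSndDayVol", i64::from(git.ambient_snd_day_vol)),
        ("GeneratedType", i64::from(git.generated_type)),
    ];
    for (field, value) in byte_fields {
        if !(0..=255).contains(&value) {
            out.push(GitLint::WrapsToByte { field, value });
        }
    }
    out
}
```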

IFO Format (Module Info Blueprint)

Description: The Module Info (.ifo) is the absolute root metadata file for any environment. It dictates global module behavior, handling everything from the starting spawn location, to the local calendar and time-of-day progression, to script execution for global module events.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .ifo |
| Magic Signature | IFO / V3.2 |
| Type | Module Blueprint |
| Rust Reference | View rakata_generics::Ifo in Rustdocs |

Data Model Structure

Rakata parses the raw GFF structure into the rakata_generics::Ifo struct.

  • (Note: rakata-lint does not currently implement behavioral validation for .ifo formats.)

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .ifo files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSModule::LoadModuleStart at 0x004c9050.)

Global State Configurations

| Field | Type | Engine Evaluation |
|---|---|---|
| Mod_Entry_Area | ResRef | The primary spawning area ResRef. |
| Mod_Entry_X / Mod_Entry_Y / Mod_Entry_Z | FLOAT | Exact spawning XYZ coordinates. |
| Mod_Entry_Dir_X / Mod_Entry_Dir_Y | FLOAT | Entry Direction Fallback: If the engine cannot evaluate Mod_Entry_Dir_Y, it forces a hard fallback that leaves the entity facing east (X=1.0, Y=0.0). |
| Mod_XPScale | BYTE | Global XP multiplier scale. Defaults natively to 10. |

Time & Cycle Management

| Field | Type | Description |
| --- | --- | --- |
| Mod_DawnHour | BYTE | Hour marking dawn. |
| Mod_DuskHour | BYTE | Hour marking dusk. |
| Mod_MinPerHour | BYTE | Number of real-time gameplay minutes that make up one module hour. |

Note

Day/Night Cycle Computations: The engine continuously computes the day/night phase from Mod_DawnHour, Mod_DuskHour, and the current hour (current_hour). The result updates an internal state flag: 1=Day, 2=Night, 3=Dawn, 4=Dusk.
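
The state flag can be modelled as a small enum. This is only a sketch: the numeric values come from the note above, but the comparison order in day_phase is an assumption, since the exact comparisons the engine performs are not reproduced here. The sketch also declines to model the case where dawn is not strictly before dusk. DayPhase and day_phase are hypothetical names.

```rust
/// State flag values as listed in the note above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum DayPhase {
    Day = 1,
    Night = 2,
    Dawn = 3,
    Dusk = 4,
}

/// Assumed evaluation order (not taken from the decompilation):
/// exact dawn hour, exact dusk hour, strictly between the two, otherwise night.
/// Returns None when dawn >= dusk, which this sketch does not attempt to model.
pub fn day_phase(dawn_hour: u8, dusk_hour: u8, current_hour: u8) -> Option<DayPhase> {
    if dawn_hour >= dusk_hour {
        return None;
    }
    let phase = if current_hour == dawn_hour {
        DayPhase::Dawn
    } else if current_hour == dusk_hour {
        DayPhase::Dusk
    } else if current_hour > dawn_hour && current_hour < dusk_hour {
        DayPhase::Day
    } else {
        DayPhase::Night
    };
    Some(phase)
}
```

Casting a phase with `as u8` yields the raw flag value, for example `DayPhase::Dusk as u8` for the dusk state.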

Global Event Scripts

Each event script is stored as a ResRef string that names a compiled NSS script. The engine evaluates 15 separate global events (such as Mod_OnHeartbeat, Mod_OnModLoad, Mod_OnClientEntr, and Mod_OnPlrDeath).

  • Asymmetric I/O (Equipping): Mod_OnEquipItem is loaded during module startup (LoadModuleStart), but it is omitted entirely from the save-game serialization cycle (SaveModuleIFOStart).

Safe-State Injection (Save Games Only)

Certain blocks of data inside the .ifo are deliberately evaluated only when the engine is mounting a module directly from a loaded .sav archive block.

| Engine Target | Description |
| --- | --- |
| Player / Mod Variables | Structures like Mod_PlayerList, Mod_Tokens, VarTable, and the EventQueue are bypassed unless the module is loaded under is_save_game conditions. |
| Area Overrides | Mod_Area_list technically supports arrays (an NWN legacy), but KOTOR enforces a single active area. The secondary ObjectId within this array is only ever read inside a save-state flow. |
| Legacy Hak De-sync | “Hak Packs” are custom override archives used in Neverwinter Nights (the engine’s predecessor). While KOTOR’s save routine (SaveModuleIFOStart) blindly writes a Mod_Hak string into save games as leftover legacy behavior, the load cycle (LoadModuleStart) completely ignores it. Modders cannot use this field to hook custom archives. |

Proposed Linter Rules (Rakata-Lint)

While rakata-lint does not currently implement .ifo validation, the exact engine behaviors discovered during decompilation dictate these static constraints:

  1. Direction Fallback: If Mod_Entry_Dir_X and Mod_Entry_Dir_Y both evaluate to 0.0, the engine silently applies a fallback direction, facing the player spawn toward (1.0, 0.0).
  2. XP Dead-Scaling: Mod_XPScale defaults to 10, so an explicit value of 0 halts all XP acquisition in the module.
  3. Eternal Day/Night Bounds: If Mod_DawnHour equals Mod_DuskHour, the module is locked into perpetual daylight.
  4. Void Area Initialization: An empty Mod_Area_list faults the load cycle, because the module has no area to place the player in.
  5. Dangling NWM Structure: Setting Mod_IsNWMFile to 1 without also providing Mod_NWMResName, which is mandatory in that case, leaves the load in an unstable state.
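
As an illustration, the five proposed rules can be written as one pass over the relevant fields. This is a hedged sketch rather than rakata-lint code: IfoSummary, IfoFinding, and lint_ifo are hypothetical names, and the struct is a stand-in for whatever view of rakata_generics::Ifo a real rule would receive.

```rust
/// Hypothetical flattened view of the .ifo fields the rules above mention.
#[derive(Debug, Clone)]
pub struct IfoSummary {
    pub entry_dir: (f32, f32), // Mod_Entry_Dir_X, Mod_Entry_Dir_Y
    pub xp_scale: u8,          // Mod_XPScale
    pub dawn_hour: u8,         // Mod_DawnHour
    pub dusk_hour: u8,         // Mod_DuskHour
    pub area_count: usize,     // number of entries in Mod_Area_list
    pub is_nwm_file: bool,     // Mod_IsNWMFile
    pub nwm_res_name: String,  // Mod_NWMResName ("" when absent)
}

#[derive(Debug, PartialEq, Eq)]
pub enum IfoFinding {
    DirectionFallback,
    XpScaleZero,
    DawnEqualsDusk,
    EmptyAreaList,
    DanglingNwm,
}

pub fn lint_ifo(ifo: &IfoSummary) -> Vec<IfoFinding> {
    let mut findings = Vec::new();
    if ifo.entry_dir == (0.0, 0.0) {
        findings.push(IfoFinding::DirectionFallback);
    }
    if ifo.xp_scale == 0 {
        findings.push(IfoFinding::XpScaleZero);
    }
    if ifo.dawn_hour == ifo.dusk_hour {
        findings.push(IfoFinding::DawnEqualsDusk);
    }
    if ifo.area_count == 0 {
        findings.push(IfoFinding::EmptyAreaList);
    }
    if ifo.is_nwm_file && ifo.nwm_res_name.is_empty() {
        findings.push(IfoFinding::DanglingNwm);
    }
    findings
}
```

Each rule is independent, so a real implementation could just as well register them as separate lint passes with different severities.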

UTC Format (Creature Blueprint)

Description: The Creature (.utc) blueprint format defines the attributes, stats, and behavior of all in-scene NPCs and monsters. It covers a creature’s identity, class/level, appearance, equipment, and event scripts. Because they hold so much state, Creatures are one of the most dynamic and memory-heavy templates processed by the Odyssey Engine.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .utc |
| Magic Signature | UTC / V3.2 |
| Type | Creature Blueprint |
| Rust Reference | View rakata_generics::Utc in Rustdocs |

Data Model Structure

Rakata maps the Creature definition directly into the rakata_generics::Utc struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Creature breaks down into six main categories:

  1. Core Statistics: The basic stats that define the creature’s physical capabilities (e.g., Strength, Dexterity, base HitPoints).
  2. Identity & Graphics: Identifiers that define who the creature is and what 3D model they use (e.g., Tag, Appearance_Type, Conversation).
  3. Class & Skill Progression: The mechanics that define their level, classes, and skills (e.g., ClassList, SkillList).
  4. Combat Capabilities: The specific feats and Force powers the creature can use (e.g., FeatList, SpellList).
  5. Inventory & Equipment: The exact items the creature spawns with, including both equipped gear and inventory drops (e.g., Equip_ItemList, ItemList).
  6. Event Hooks (Scripts): The behavior scripts that run when the creature reacts to the world, such as taking damage or noticing an enemy (e.g., OnNotice, OnDamaged).
  • State Validation: rakata-lint checks the data against engine constraints to prevent fatal runtime crashes.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .utc files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSCreatureStats::ReadStatsFromGff at 0x005afce0.)

Structural Load Phasing

| Function | Size | Behavior |
| --- | --- | --- |
| ReadStatsFromGff | 7835 B | The massive initial pass that parses 57 basic creature scalars including strength, dexterity, and physical appearance. |
| LoadCreature | | Sets up how the creature physically sits in the world, handling their stealth states, collision size, and idle animations. |
| ReadScriptsFromGff | | Attaches all the custom event scripts that fire when the creature notices an enemy, takes damage, dies, or simply stands around (heartbeat). |
| ReadItemsFromGff | | Pulls all loot into memory, structuring items specifically into equipped slots, the backpack, or dropping them entirely if a creature spawns dead. |
| ReadSpellsFromGff | | Specifically extracts the list of any Force powers or combat feats the creature is allowed to use. |

Note

Zeroed Data Elements: Legacy structures referencing Tail and Wings are explicitly hardcoded to 0 during parsing and completely bypassed by the binary loader.

Core Structural Findings

The engine strictly validates parameters when loading a .utc file. Improper formatting will trigger some of KOTOR’s most notorious game crashes.

Warning

Understanding Fatal Crash Codes (0x5fX): When the game engine parses a file and hits an invalid stat, it completely aborts loading. Instead of recovering gracefully, the engine deliberately triggers a fatal crash to your desktop and returns a specific hexadecimal error code (e.g., 0x5f7 or 0x5f4). The rules below track the specific scenarios where the game will crash.

| Engine Rule | Runtime Behavior |
| --- | --- |
| Class Limits | The engine enforces a strict limit of 2 distinct class types. Listing the same class twice crashes the game (Engine Error 0x5f7). |
| Race Bounds | The engine compares Race against the row count of racialtypes.2da. Exceeding this boundary fatally crashes the map loader (Engine Error 0x5f4). |
| Saves Calculation | Pre-computed saving throws (SaveWill, SaveFortitude) in the .utc file are dead data. The engine ignores them and derives the values exclusively from willbonus and fortbonus. |
| Perception Faults | A non-PC PerceptionRange initiates a read against appearance.2da for PERCEPTIONDIST. Failing to resolve this distance fails the entire creature load (Engine Error 0x5f5). |
| Movement Fallbacks | If a unique MovementRate isn’t declared, the engine falls back to the default WalkRate parameters. |
| Hard Clamping | The engine clamps specific values on load: Gender is clamped to a maximum of 4, and GoodEvil is clamped so that it cannot exceed 100. |
| Appearance Shifting | If Appearance_Head is 0, the engine overrides it to 1 to prevent rendering bugs. |
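
A static audit can predict several of the outcomes in the table above before the file reaches the engine. The sketch below is illustrative only: UtcSummary, UtcFinding, and audit_utc are hypothetical names, the error codes are the ones quoted above, and treating a Race index at or beyond the racialtypes.2da row count as out of bounds is an assumption about how the comparison is made.

```rust
use std::collections::HashSet;

/// Hypothetical flattened view of the .utc fields discussed above.
#[derive(Debug, Clone)]
pub struct UtcSummary {
    pub classes: Vec<i32>, // class IDs from ClassList
    pub race: u32,
    pub gender: u8,
    pub good_evil: u8,
    pub appearance_head: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UtcFinding {
    /// The engine would abort with this error code.
    Fatal(u32),
    /// More than two classes are defined.
    TooManyClasses(usize),
    /// The engine would silently rewrite the value on load.
    Adjusted { field: &'static str, from: u32, to: u32 },
}

pub fn audit_utc(utc: &UtcSummary, racialtypes_rows: u32) -> Vec<UtcFinding> {
    let mut out = Vec::new();
    let mut seen = HashSet::new();
    // `insert` returns false when the class ID was already present.
    if utc.classes.iter().any(|class| !seen.insert(*class)) {
        out.push(UtcFinding::Fatal(0x5f7));
    }
    if utc.classes.len() > 2 {
        out.push(UtcFinding::TooManyClasses(utc.classes.len()));
    }
    // Assumption: valid rows are 0..racialtypes_rows.
    if utc.race >= racialtypes_rows {
        out.push(UtcFinding::Fatal(0x5f4));
    }
    if utc.gender > 4 {
        out.push(UtcFinding::Adjusted { field: "Gender", from: utc.gender as u32, to: 4 });
    }
    if utc.good_evil > 100 {
        out.push(UtcFinding::Adjusted { field: "GoodEvil", from: utc.good_evil as u32, to: 100 });
    }
    if utc.appearance_head == 0 {
        out.push(UtcFinding::Adjusted { field: "Appearance_Head", from: 0, to: 1 });
    }
    out
}
```

Separating fatal findings from silent adjustments keeps crash-level problems distinct from values the engine merely rewrites.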

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Legacy Engine Artifacts | 17 .utc fields (such as Morale, SaveWill, BlindSpot, PaletteID) present in older files are Neverwinter Nights or KOTOR 2 superset fields that the K1 engine ignores. |

Implemented Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::utc.

  1. Class Duplications: Checks if a creature is misconfigured with identical core class identifiers (preventing Game Crash Error 0x5f7).
  2. Race Bounds: Asserts the mapped Race identifier exists against the actual row bounds of the compiled racialtypes.2da map (preventing Game Crash Error 0x5f4).
  3. Class Limit: Ensures the creature never exceeds the hard-limit of two defined classes.
  4. Structure Clamping: Flags invalid scalars by actively verifying Gender (max 4) and GoodEvil (max 100) configurations, directly mirroring the binary’s hard clamp logic.
  5. Appearance Correction: Detects Appearance_Head fields left at 0, predicting the engine’s hard override to 1.
  6. Dead Field Tracking: Flags legacy or ignored values (like SaveWill and SaveFortitude) that are set in the file, since the engine never reads them.

UTD Format (Door Blueprint)

Description: The Door (.utd) blueprint defines interactive pathways on a level map. Beyond acting as physical barriers or transitions between areas, doors house lock mechanics, trap configurations, script hooks, and basic visual states (open, destroyed, jammed).

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .utd |
| Magic Signature | UTD / V3.2 |
| Type | Door Blueprint |
| Rust Reference | View rakata_generics::Utd in Rustdocs |

Data Model Structure

Rakata maps the Door definition directly into the rakata_generics::Utd struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Door breaks down into four main categories:

  1. Core Identity & Geometry: The configuration for what the door looks like, its faction, and the text displayed when targeted (e.g., Appearance, TemplateResRef, LocName).
  2. Lock & Trap Mechanics: The parameters defining whether it’s locked, what key is needed, and rules for any attached traps (e.g., Locked, KeyName, TrapType, DisarmDC).
  3. Transition Pathways: The linked destination used when a door acts as a loading zone to another area (e.g., LinkedTo, LinkedToFlags).
  4. Behavioral Hooks (Scripts): The scripts that run when a player opens, destroys, or fails to unlock the door (e.g., OnOpen, OnFailToOpen, OnMeleeAttacked).
  • Active Validation: rakata-lint enforces checks against missing keys or invalid transition references before a module ever reaches the game engine.

Engine Audits & Decompilation

The following information documents the engine’s exact load sequence and field requirements for .utd files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from the primary dispatcher CSWSDoor::LoadDoor at 0x0058a1f0.)

Structural Load Phasing

The engine processes a Door structurally by mapping its sub-fields into distinct operational constraints.

| Domain | Sub-fields Evaluated | Purpose |
| --- | --- | --- |
| Scales & State | 22 | Reads the physical health, visual appearance, and base traits determining whether the door is locked or indestructible. |
| Hooks | 15 | Attaches custom event scripts that fire when the door is opened, forced, unlocked, or trapped. |
| Mechanical | 9 | Configures the lock difficulty tiers and the specific skill hurdles required to detect and disarm any attached traps. |
| Transitions | 4 | Links the door strictly to another area (.are), turning it into a physical loading screen transition node. |

Core Structural Findings

The CSWSDoor parser natively guarantees strict state adjustments upon parsing.

| Engine Rule | Runtime Behavior |
| --- | --- |
| Appearance Truncation | The engine reads Appearance as a 32-bit integer but truncates it to a single byte ((byte)uVar5). Any ID above 255 wraps modulo 256 (for example, 256 becomes 0 and 257 becomes 1), which breaks the physical door model. |
| Static Enforcement | If the door is marked Static, the engine automatically forces plot = 1. This guarantees that static level architecture cannot be destroyed by players. |
| Portrait Fallbacks | If PortraitId is 0, the engine hardcodes it to 0x22E. If it is >= 0xFFFE, the engine ignores the integer and falls back to looking up the string Portrait resref instead. |
| Trap Hook Fallback | If the OnTrapTriggered script is left empty, set to null, or literally named "default", the engine pulls the default script from traps.2da instead. |
| HP Synchronization | CurrentHP is clamped against the door’s maximum HP to prevent overflow bugs. |
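
The adjustments in the table above can be read as a normalization step applied to the template. The sketch below mirrors that step under stated assumptions: DoorTemplate, LoadedDoor, PortraitSource, and load_door are hypothetical names, and the integer widths chosen for PortraitId and the HP fields are assumptions, not confirmed GFF types.

```rust
/// Hypothetical subset of a .utd template (field widths are assumptions).
#[derive(Debug, Clone)]
pub struct DoorTemplate {
    pub appearance: u32,
    pub is_static: bool,
    pub plot: bool,
    pub portrait_id: u16,
    pub hp: i16,
    pub current_hp: i16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PortraitSource {
    /// Use this numeric portrait ID.
    Id(u16),
    /// Ignore the integer and look up the Portrait resref string instead.
    ResRefLookup,
}

#[derive(Debug, PartialEq, Eq)]
pub struct LoadedDoor {
    pub appearance: u8,
    pub plot: bool,
    pub portrait: PortraitSource,
    pub current_hp: i16,
}

pub fn load_door(t: &DoorTemplate) -> LoadedDoor {
    let portrait = match t.portrait_id {
        0 => PortraitSource::Id(0x22E),
        id if id >= 0xFFFE => PortraitSource::ResRefLookup,
        id => PortraitSource::Id(id),
    };
    LoadedDoor {
        // `as u8` keeps the low byte only, like the (byte) cast in the decompilation.
        appearance: t.appearance as u8,
        // A static door is always treated as plot.
        plot: t.plot || t.is_static,
        portrait,
        // CurrentHP can never exceed the maximum HP.
        current_hp: t.current_hp.min(t.hp),
    }
}
```

Comparing a template against the result of such a function is one way a linter could report exactly which values the engine would change.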

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Legacy Engine Artifacts | 7 explicitly mapped template structures (like AnimationState, NotBlastable, OpenLockDiff) are Neverwinter Nights or KOTOR 2 legacy dependencies inherently ignored by the K1 parser. |

Implemented Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::utd.

  1. Truncation Faults: (Pending) Flags Appearance values over 255, which the engine would truncate to a single byte.
  2. Static Parity: Asserts that Plot is active if Static is also active.
  3. Invalid Hooks: (Pending) Scans for explicitly empty or "default" OnTrapTriggered references that invoke the traps.2da fallback.
  4. Portrait Anomalies: (Pending) Detects PortraitId mappings equal to 0 or >= 0xFFFE.
  5. HP Bounds: Ensures initialized CurrentHP safely rests at or below the standard HP total.

UTE Format (Encounter Blueprint)

Description: The Encounter (.ute) blueprint defines interactive spawn points and boundary triggers across a level map. Rather than acting merely as a spatial zone, an encounter combines difficulty scaling, creature limits, a spawn pool that the engine bubble-sorts by Challenge Rating, and explicit boundary vertices to deploy combatants when a player crosses its geometry.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .ute |
| Magic Signature | UTE / V3.2 |
| Type | Encounter Blueprint |
| Rust Reference | View rakata_generics::Ute in Rustdocs |

Data Model Structure

Rakata maps the Encounter definition directly into the rakata_generics::Ute struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

An Encounter breaks down into four main categories:

  1. Spawn Population (CreatureList): The list of creature blueprints the encounter can spawn.
  2. Difficulty & Limits: Setting how many creatures spawn at once and how difficult they should be relative to the player (e.g., MaxCreatures, DifficultyIndex).
  3. Trigger Boundaries (Geometry): The coordinates defining the physical tripwire that triggers the spawn.
  4. Behavioral Hooks (Scripts): The scripts that run when a player enters or exits the trigger, or when the spawn pool runs dry (e.g., OnEntered, OnExhausted).
  • Model Validation: rakata-lint checks the data against engine constraints to prevent fatal runtime crashes.

Engine Audits & Decompilation

The following information documents the engine’s exact load sequence and field requirements for .ute files mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from the primary dispatcher CSWSEncounter::LoadEncounter at 0x00593830.)

Structural Load Phasing

The engine processes an Encounter structurally across several chunked subroutines, each responsible for unique spatial and logic bindings.

| Function | Size | Behavior |
| --- | --- | --- |
| ReadEncounterFromGff | 3445 B | The initial pass that sets up the encounter’s identity, difficulty limits, and the spawn list. |
| ReadEncounterScriptsFromGff | 567 B | Attaches scripts that trigger when players enter, exit, or exhaust the spawn pool. |
| LoadEncounterSpawnPoints | 364 B | Reads the coordinates so the engine knows exactly where to spawn the creatures. |
| LoadEncounterGeometry | 651 B | Reads the coordinates that trace the trigger’s boundaries on the floor. |

Core Structural Findings

The engine rigorously evaluates geometric and spatial boundaries. Improper definitions break the spawn mapping algorithm.

Warning

Understanding Fatal Log Drops: While minor coordinate math errors usually just cause creatures to spawn inside walls, failing strict geometry constraints causes KOTOR to abruptly abort parsing the Encounter. Specifically, if a .ute file declares it has geometry boundaries but fails to provide the actual coordinate vertices, the engine dumps a fatal error to its trace log and refuses to spawn the encounter at all.

| Engine Rule | Runtime Behavior |
| --- | --- |
| Tag Overrides | The engine converts any Tag to all-lowercase via CSWSObject::SetTag. Any original casing is lost immediately upon load. |
| Geometry Integrity | If Geometry is explicitly defined but has 0 vertices, the engine logs a “has geometry, but no vertices” error and aborts loading the encounter entirely. |
| Geometry Synthesis | If the Geometry list is omitted from the blueprint entirely, the engine falls back and synthesizes a default 4-vertex box. |
| Difficulty Resolution | The engine prioritizes using DifficultyIndex to look up the difficulty in encdifficulty.2da. The static Difficulty field is ignored unless the 2DA table fails to resolve. |
| Bubble Sorting | Upon loading the CreatureList, the engine runs a bubble sort that re-orders the encounter’s spawn pool by ascending CR (Challenge Rating), overriding any authored order. |
| Area Instantiation | The AreaList buffer allocation size is dictated by AreaListMaxSize. If the real list exceeds this size, the buffer will silently overrun. |
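
Three of the behaviors above (tag lowercasing, the three-way geometry outcome, and the CR sort) are easy to mirror in a few lines. This is a sketch under assumptions: SpawnEntry, GeometryOutcome, and the function names are hypothetical, CR is modelled as f32, and ASCII lowercasing stands in for whatever conversion CSWSObject::SetTag actually applies.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SpawnEntry {
    pub resref: String,
    pub cr: f32, // Challenge Rating (type assumed)
}

/// Plain bubble sort by ascending CR, as the table describes.
pub fn bubble_sort_by_cr(list: &mut [SpawnEntry]) {
    let n = list.len();
    for pass in 0..n {
        let mut swapped = false;
        // After each pass the largest remaining CR has settled at the end.
        for i in 0..(n - 1 - pass) {
            if list[i].cr > list[i + 1].cr {
                list.swap(i, i + 1);
                swapped = true;
            }
        }
        if !swapped {
            break;
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum GeometryOutcome {
    /// Geometry present with this many vertices.
    Loaded(usize),
    /// Geometry list omitted: the engine synthesizes a default 4-vertex box.
    SynthesizedBox,
    /// Geometry declared with 0 vertices: the encounter load is aborted.
    Aborted,
}

pub fn resolve_geometry(geometry: Option<&[(f32, f32, f32)]>) -> GeometryOutcome {
    match geometry {
        None => GeometryOutcome::SynthesizedBox,
        Some(vertices) if vertices.is_empty() => GeometryOutcome::Aborted,
        Some(vertices) => GeometryOutcome::Loaded(vertices.len()),
    }
}

/// Assumption: ASCII lowercasing approximates the engine's tag conversion.
pub fn normalize_tag(tag: &str) -> String {
    tag.to_ascii_lowercase()
}
```

Modelling the geometry as an Option makes the difference between an omitted list and an empty one explicit, which is the distinction the engine cares about.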

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Passive Legacy Artifacts | Unused fields left over from older tools or Odyssey branches (e.g., TemplateResRef, Comment, PaletteID) are never read. The engine ignores them. |
| Superseded Legacy Fields | The static Difficulty field is an inactive legacy value as long as DifficultyIndex maps to a valid row inside encdifficulty.2da. |
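
The relationship between DifficultyIndex and the static Difficulty field can be sketched as a lookup with a fallback. In the example below the encdifficulty.2da table is reduced to a slice holding one resolved value per row, which is a simplification made for illustration; resolve_difficulty and static_difficulty_is_dead are hypothetical names.

```rust
use std::convert::TryFrom;

/// Returns the 2DA value for `difficulty_index` when that row exists,
/// otherwise falls back to the static Difficulty field.
/// `encdifficulty_values` is a simplified stand-in for encdifficulty.2da.
pub fn resolve_difficulty(
    difficulty_index: i32,
    static_difficulty: i32,
    encdifficulty_values: &[i32],
) -> i32 {
    usize::try_from(difficulty_index)
        .ok()
        .and_then(|row| encdifficulty_values.get(row).copied())
        .unwrap_or(static_difficulty)
}

/// True when the static Difficulty field would never be consulted,
/// i.e. DifficultyIndex points at an existing row.
pub fn static_difficulty_is_dead(difficulty_index: i32, row_count: usize) -> bool {
    usize::try_from(difficulty_index).map_or(false, |row| row < row_count)
}
```

A dead-field lint only needs the second function: it can warn about a stored Difficulty exactly when the index resolves.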

Proposed Linter Rules (Rakata-Lint)

These rules are documented for engine parity but are not yet implemented into rakata-lint/src/rules/.

  1. Dead Difficulty Traces: (Pending) Flags instances where a file statically defines Difficulty alongside a valid DifficultyIndex.
  2. Deficient Spawn Loops: (Pending) Warns when an Encounter evaluates as Active but initializes a completely empty CreatureList.
  3. Dead Field Evaluation: (Pending) Maps extraneous legacy engine artifacts (TemplateResRef, Comment, PaletteID) as dead fields.

UTI Format (Item Blueprint)

Description: The Item (.uti) blueprint is the central data model for all tangible loot, weapons, armor, and usable gear in the game. It defines how an item appears on characters, which custom properties or stat bonuses it applies (including through upgrades), its monetary cost, and how it behaves at runtime when dropped into the world.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .uti |
| Magic Signature | UTI / V3.2 |
| Type | Item Blueprint |
| Rust Reference | View rakata_generics::Uti in Rustdocs |

Data Model Structure

Rakata maps the Item definition directly into the rakata_generics::Uti struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

An Item breaks down into four main categories:

  1. Core Identity: The basic text strings that provide the item’s name and description, including both identified and unidentified states (e.g., TemplateResRef, LocName, Description).
  2. Economic & Charge Mechanics: The value of the item, and the number of charges left for consumable abilities (e.g., Cost, Charges).
  3. Visual Geometry (Appearance): Setting what the item looks like when dropped on the floor or equipped (e.g., ModelVariation, TextureVar).
  4. Combat & Upgrade Properties (PropertiesList): The stat buffs, damage modifiers, and abilities bound to the item, alongside slots for workbench upgrades.
  • Model Validation: rakata-lint checks the data against engine constraints to prevent fatal runtime crashes.

Engine Audits & Decompilation

The following information documents the engine’s exact load sequence and field requirements for .uti files mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from the primary dispatcher CSWSItem::LoadDataFromGff at 0x0055fcd0.)

Structural Load Phasing

The engine processes an Item across several load and save routines.

| Function | Size | Behavior |
| --- | --- | --- |
| LoadDataFromGff | | The main parser that sets what the item is, how many charges it holds, and its descriptions. |
| LoadItemPropertiesFromGff | | Reads the special properties (like energy damage or stat boosts), splitting them into ‘useable’ abilities versus permanent buffs. |
| LoadItem | | The constructor that decides whether to load the item onto a character or leave it idle in an inventory. |
| LoadFromTemplate | | A fallback used when spawning an item dynamically from a script instead of off a character. |
| SaveItem / SaveItemProperties | | The opposite pipeline that writes the item into a save game, which notoriously forces the item to always be flagged as “Identified”. |

Core Structural Findings

The engine rigorously evaluates base-item mapping constraints from 2DA arrays and aggressively overrides improperly defined models.

| Engine Rule | Runtime Behavior |
| --- | --- |
| Description Cross-Swap | If either Description or DescIdentified is missing, the engine duplicates the provided string into the missing field so item identification mechanics never crash the game. |
| Model Truncation | If an older tool incorrectly sets ModelVariation to 0, the engine bumps it to 1 on load, ensuring the item always has visible geometry instead of rendering an invisible weapon or armor piece. |
| Model & Body Variation Hooks | The engine ignores the .uti’s BodyVariation field and enforces the body_var value predefined in baseitems.2da. Additionally, TextureVar is bypassed unless the item’s base type is configured as Model Type 1. |
| Cost Generation Fallback | The Cost integer provided in the file is dead data. The engine computes economic value at runtime via GetCost(), based on the item’s properties, and ignores the stored value. |
| Identifier Enforcement | When serializing via SaveItem (when the player creates a save game), the engine unconditionally hardcodes Identified to 1. |
| Property Capabilities | Item properties are split into Active and Passive memory tables at load. The engine evaluates every PropertyName index: any ID equal to 10, 37, 46, or 53 (e.g., Cast Power, Trap) is hooked as a usable player ability, while all other values are silently applied as passive stat modifiers. |
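
The property split, the ModelVariation bump, and the description cross-fill described above can each be captured in a small pure function. This is a hedged sketch: the four property IDs come from the table, while the function names, the u16 width for PropertyName, and the use of Option<String> for a missing description are assumptions made for illustration.

```rust
/// PropertyName IDs that the table above lists as usable (active) abilities.
pub const ACTIVE_PROPERTY_IDS: [u16; 4] = [10, 37, 46, 53];

/// Splits PropertyName IDs into (active, passive), preserving order.
pub fn split_properties(property_names: &[u16]) -> (Vec<u16>, Vec<u16>) {
    property_names
        .iter()
        .copied()
        .partition(|id| ACTIVE_PROPERTY_IDS.contains(id))
}

/// ModelVariation 0 is bumped to 1 on load; other values pass through.
pub fn effective_model_variation(model_variation: u8) -> u8 {
    if model_variation == 0 { 1 } else { model_variation }
}

/// If exactly one of the two descriptions is present, copy it into the other.
pub fn cross_fill_descriptions(
    description: Option<String>,
    desc_identified: Option<String>,
) -> (Option<String>, Option<String>) {
    match (description, desc_identified) {
        (Some(d), None) => (Some(d.clone()), Some(d)),
        (None, Some(i)) => (Some(i.clone()), Some(i)),
        both => both,
    }
}
```

For example, `split_properties(&[10, 5, 46, 12])` places 10 and 46 in the active list and 5 and 12 in the passive list.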

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Superseded Legacy Fields | Static Cost or BodyVariation values are a byproduct of older file versions; they are unused overhead, because the engine derives both at runtime from item properties and 2DA data. |
| Passive Legacy Artifacts | General nodes left over from older tools (like TemplateResRef, Comment, PaletteID, and UpgradeLevel) are bypassed on load entirely. |

Linter Rules

These rules are documented for engine parity but are not yet implemented into rakata-lint/src/rules/.

  1. Dead Cost Fields: (Pending) Flags .uti files that declare an explicit Cost, which the engine treats as dead data.
  2. Model Truncation Safety: (Pending) Raises a validation error if ModelVariation is 0, since the engine bumps it to 1 at load.
  3. Dead Body Overrides: (Pending) Flags redundant definitions of BodyVariation, which the engine replaces with the body_var value from baseitems.2da.
  4. Valid Capability Bounds: (Pending) Scans all properties directly ensuring PropertyName, UpgradeType (0xFF), and UsesPerDay (0xFF) meet standard operational targets.

UTM Format (Merchant Blueprint)

Description: The Merchant (.utm) blueprint natively handles the interactive storefront data for merchants and shops. Because shops strictly behave as container interfaces that dynamically buy, sell, and map economic value onto spawned .uti items, the structure of a .utm is highly compact, primarily consisting of economic markups and inventory sorting parameters.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .utm |
| Magic Signature | UTM / V3.2 |
| Type | Merchant Blueprint |
| Rust Reference | View rakata_generics::Utm in Rustdocs |

Data Model Structure

Rakata maps the Merchant definition directly into the rakata_generics::Utm struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Merchant breaks down into three main categories:

  1. Core Identity: The basic identifiers providing the shop’s name and tag (e.g., Tag, LocName).
  2. Economic Metrics: The percentages controlling price scaling when buying or selling items, alongside basic shop rules (e.g., MarkUp, MarkDown, BuySellFlag).
  3. Store Inventory (ItemList): The list of items actively available in the shop’s stock, including rules for infinite regeneration.
  • State Validation: rakata-lint checks the data against engine constraints to ensure merchants don’t silently fail during initialization.

Engine Audits & Decompilation

Because .utm parsing is structurally straightforward, the engine avoids heavy memory allocations and maps the fields in a fast iteration.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSStore::LoadStore at 0x005c7180.)

Structural Load Phasing

| Function | Size | Behavior |
| --- | --- | --- |
| LoadStore | 1341 B | The primary parser that pulls the merchant’s basic identity, economic constraints (MarkUp/MarkDown), and buying capabilities. |
| ItemList Read | | Iterates through the list of store stock, actively pulling either explicitly saved item instances or generating them freshly from templates (InventoryRes). |
| AddItemToInventory | | Pushes the fully sorted loot stack into the physical storefront container so the player can actually interact with and purchase them. |

Core Structural Findings

| Engine Rule | Runtime Behavior |
| --- | --- |
| Cost Sorting | When building the store inventory, the engine sorts the merchant’s final stock from cheapest to most expensive by checking the cost of each item. This overrides whatever display order the file defines. |
| Dynamic Economics | The engine relies entirely on the MarkUp and MarkDown integers to control shop prices. These act as simple percentages that raise or lower the base cost of every item the merchant sells or buys. |
| Buy/Sell Bit Flags | BuySellFlag is split into basic toggles: bit 0 controls whether you are allowed to sell your gear to the merchant, and bit 1 controls whether the merchant will sell anything to you. |
| Infinite Stacking | If an item is flagged as Infinite, the engine locks that item in memory so that no matter how many times a player buys it, the shop never runs out of stock. |
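
The flag bits, percentage scaling, and cost sort above can be sketched as follows. The bit positions come from the table; everything else is an assumption for illustration, in particular the pricing formula (base cost times percent divided by 100, with 100 meaning unchanged), which is not confirmed by the audit. All names are hypothetical.

```rust
/// Bit 0: the player may sell gear to the merchant.
pub const PLAYER_CAN_SELL: u8 = 0b01;
/// Bit 1: the merchant sells items to the player.
pub const MERCHANT_SELLS: u8 = 0b10;

pub fn player_can_sell(buy_sell_flag: u8) -> bool {
    buy_sell_flag & PLAYER_CAN_SELL != 0
}

pub fn merchant_sells(buy_sell_flag: u8) -> bool {
    buy_sell_flag & MERCHANT_SELLS != 0
}

/// Assumed formula: price = base_cost * percent / 100 (negative percents clamp to 0).
pub fn scaled_price(base_cost: u32, percent: i32) -> u32 {
    let pct = percent.max(0) as u64;
    let price = base_cost as u64 * pct / 100;
    // Saturate instead of wrapping if the result does not fit in u32.
    price.min(u32::MAX as u64) as u32
}

/// Sorts (resref, cost) pairs from cheapest to most expensive.
/// `sort_by_key` is stable, so equal-cost items keep their relative order.
pub fn sort_stock_by_cost(stock: &mut Vec<(String, u32)>) {
    stock.sort_by_key(|(_, cost)| *cost);
}
```

Because the engine re-sorts the stock anyway, an editor could apply the same ordering up front so that what the author sees matches what the player sees.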

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Legacy Interface Configurations | Some older tools expose positional values like Repos_PosX or Repos_PosY inherited from other Odyssey games, but the engine completely ignores them. The game builds its shop UI dynamically when you open it, so those grid coordinates have no effect. |

Proposed Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::utm.

  1. Economic Bounding: (Pending) Ensures MarkUp and MarkDown are stored as INT types so the engine reads them correctly.
  2. Flag Enforcement: (Pending) Asserts that BuySellFlag and Infinite are stored as BYTE values so they are not misread at load.
  3. Reference Mapping: (Pending) Confirms that OnOpenStore script hooks resolve to existing files.
  4. Inventory Integrity: (Pending) Prevents broken shops by verifying that InventoryRes strings respect the 16-character limit and link to valid .uti items.

UTP Format (Placeable Blueprint)

Description: The Placeable (.utp) blueprint dictates the configuration of universally interactive scenery and containers within a map. Ranging from simple locked footlockers to rigged command consoles and explodable starship barricades, .utp structs blend physical static properties (like structural HP and lock difficulties) with heavy dynamic script bindings.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .utp |
| Magic Signature | UTP / V3.2 |
| Type | Placeable Blueprint |
| Rust Reference | View rakata_generics::Utp in Rustdocs |

Data Model Structure

Rakata maps the Placeable definition directly into the rakata_generics::Utp struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Placeable breaks down into five main categories:

  1. Core Identity & Geometry: The configuration for what the placeable looks like, its faction, and the text displayed when targeted (e.g., Appearance, TemplateResRef, LocName).
  2. Interactive State & Dialogue: Flags determining if the placeable can be clicked, if it starts a conversation/computer sequence, or if it acts as a loot container (e.g., Useable, Conversation, HasInventory).
  3. Lock & Trap Mechanics: The parameters defining whether it’s locked, what key is needed, and rules for any attached traps (e.g., Locked, KeyName, TrapType, DisarmDC).
  4. Health & Destruction: The physical integrity of the object, defining if it can be destroyed and its defensive thresholds (e.g., HP, Hardness, Static, Plot).
  5. Behavioral Hooks (Scripts): The scripts that run when a player explores, attacks, or opens the placeable (e.g., OnOpen, OnInvDisturbed, OnDamaged).
  • State Validation: rakata-lint checks the data against engine constraints to prevent runtime bugs.

Engine Audits & Decompilation

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSPlaceable::LoadPlaceable at 0x00585670.)

Because Placeables act as physical junctions for event hooking, they expose a massive suite of script triggers natively.

Structural Load Phasing

| Function | Size | Behavior |
| --- | --- | --- |
| LoadPlaceable | 5092 B | The primary physical parser evaluating 46 core metrics including health, conversation dialogues, basic trap bindings, and physical alignment states. |
| ReadScriptsFromGff | | Attaches 16 dedicated script hooks dictating behavior when the placeable is bashed, opened, unlocked, or triggered. |

Core Structural Findings

| Engine Rule | Runtime Behavior |
| --- | --- |
| Appearance Truncation | The engine reads Appearance as a 32-bit integer but truncates it to a single byte. Any ID above 255 wraps (for example, 256 becomes 0), which breaks the placeable’s model rendering. |
| Static vs. Plot Chaining | Just like Doors, if a Placeable is marked Static=1, the engine overrides all other behaviors and acts as if Plot=1 is true, making the placeable indestructible even if it has an HP value defined. |
| Default Usability Check | If the Static field is missing from the binary file, the engine derives it from the Useable flag, defaulting Static to the inverse (!Useable). |
| Ground Pile Forcing | The engine reads whatever value is stored in GroundPile, then overwrites it with 1 in memory, so configuring this field in the file has no effect. |
| Missing Door Hooks | Toolsets erroneously expose OnFailToOpen for Placeables, but the engine treats this as a Door-exclusive (.utd) script hook and ignores it here. |
| Trap Hook Fallback | If a trap bounds check fails or the OnTrapTriggered script is left blank, the engine reads the traps.2da table and pulls the default script based on the TrapType. |
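
The load-time rules above can be sketched as a small normalization pass. This is a minimal illustration, not Rakata’s API or the engine’s code: `RawPlaceable`, `LoadedPlaceable`, and `apply_load_rules` are hypothetical names, and a plain low-byte truncation of Appearance is assumed.

```rust
/// Hypothetical raw values as they might appear in a .utp file.
/// `None` means the field is absent from the file.
#[derive(Debug, Clone, Copy)]
pub struct RawPlaceable {
    pub appearance: u32,
    pub is_static: Option<bool>,
    pub plot: bool,
    pub useable: bool,
    pub ground_pile: u8,
}

/// Hypothetical view of the values held in memory after load.
#[derive(Debug, PartialEq, Eq)]
pub struct LoadedPlaceable {
    pub appearance: u8,
    pub is_static: bool,
    pub plot: bool,
    pub ground_pile: u8,
}

pub fn apply_load_rules(raw: RawPlaceable) -> LoadedPlaceable {
    // Appearance is read as 32 bits, but only the low byte is kept (assumed).
    let appearance = raw.appearance as u8;
    // A missing Static field is derived as !Useable.
    let is_static = raw.is_static.unwrap_or(!raw.useable);
    // Static=1 makes the placeable behave as if Plot=1.
    let plot = raw.plot || is_static;
    // The GroundPile value from the file is discarded and forced to 1.
    let _ignored = raw.ground_pile;
    LoadedPlaceable { appearance, is_static, plot, ground_pile: 1 }
}
```

A linter can compare the raw and normalized values to report which authored fields the engine will silently change.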

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Legacy Engine Artifacts | Placeable binaries are littered with legacy metrics from older tools or other Odyssey games (Comment, OpenLockDiff, Interruptable, Type, PaletteID). The physical KOTOR engine constructor entirely ignores these. |

Implemented Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::utp.

  1. Appearance Truncation: (Pending) Prevents rendering crashes by asserting Appearance never exceeds 255.
  2. Plot Chaining Context: (Pending) Asserts that if Static=1 is defined, Plot must explicitly match the forced reality of being indestructible.
  3. Ghost Value Detection: (Pending) Warns when GroundPile is set to any value other than 1, since the engine forces it to 1.
  4. Dead Hook Pruning: (Pending) Flags OnFailToOpen instances because Placeables physically lack the event memory map to trigger it.
  5. HP Health Ceiling: (Pending) Confirms CurrentHP is less than or equal to HP, preventing broken health state on spawn.
  6. Animation Conditional Limits: (Pending) Verifies that custom AnimationState indices are strictly guarded by Open==0 closures.

UTS Format (Sound Object Blueprint)

Description: The Sound Object (.uts) blueprint defines dynamic, positional, and ambient audio emitters placed throughout a game map. Ranging from environmental hums and randomized crowd chatter to highly localized looping sound effects, .uts files act as physical sound nodes combining strict spatial coordinates with randomized pitch, interval, and varying volume matrices.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .uts |
| Magic Signature | UTS / V3.2 |
| Type | Sound Object Blueprint |
| Rust Reference | View rakata_generics::Uts in Rustdocs |

Data Model Structure

Rakata maps the Sound Object definition directly into the rakata_generics::Uts struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Sound Object breaks down into five main categories:

  1. Audio Emitters (Sounds List): An array containing the audio files (.wav files) the engine will sequence or shuffle through.
  2. Spatial Geometry: Distance boundaries determining exactly where the sound is audible in the map (MinDistance, MaxDistance).
  3. Playback Automation: Rules for how the sound loops and strings together (Continuous, Random, Active, Looping).
  4. Algorithmic Variations: Modifiers that dynamically distort the audio file’s pitch and volume at runtime (PitchVariation, FixedVariance, VolumeVrtn).
  5. Procedural Generators: Identifiers that tell the engine if the sound represents specific background noise like crowd chatter or combat ambiance (GeneratedType).
  • State Validation: rakata-lint checks the data against engine constraints to prevent runtime bugs.

Engine Audits & Decompilation

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSSoundObject::Load at 0x005c9040.)

Sound Objects represent one of the most streamlined parsers in the engine. They completely lack script triggers and rely almost entirely on mathematically calculating randomized positional matrices and variations natively.

Structural Load Phasing

| Function | Size | Behavior |
| --- | --- | --- |
| Load | 1345 B | The primary physical parser evaluating 24 core audio metric bounds, defining spatial positioning, volume variation, pitch scales, and active looping capabilities. |
| Sounds List | | Iterates through the list of associated audio clips, actively loading sound resrefs into memory sequentially for playback. |

Core Structural Findings

| Engine Rule | Runtime Behavior |
| --- | --- |
| Generated Type Truncation | The engine reads GeneratedType as a 32-bit integer from the file, but truncates it and stores only the lowest byte in memory. Values above 255 lose their upper bits, so the stored generator type no longer matches the file. |
| Constructor Defaults | If fields are missing from the .uts binary, the engine relies on its internal C++ constructor to populate default values, rather than applying hardcoded literals at parse time. |
| Spatial Loading Context | When loaded globally via a static map (CSWSArea::LoadSounds), the engine skips reading positional coordinates from the .uts file entirely and uses the X, Y, and Z vectors defined in the area’s .git file. |
| Silent Sound Lists | When pulling the list of sounds, the engine ignores missing entries. It only pushes a sound struct into playable memory if the file actually provided a valid Sound reference string. |
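
Two of these rules are easy to model directly. The sketch below is illustrative only: `stored_generated_type` and `playable_sounds` are hypothetical helpers, and treating an empty string as an invalid Sound reference is an assumption.

```rust
/// GeneratedType is read as 32 bits, but only the lowest byte is stored.
pub fn stored_generated_type(raw: u32) -> u8 {
    raw as u8
}

/// Keeps only list entries that carry a usable Sound reference.
/// `None` models an entry with no Sound field; empty strings are
/// assumed to count as invalid as well.
pub fn playable_sounds(entries: &[Option<&str>]) -> Vec<String> {
    entries
        .iter()
        .flatten()
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}
```

For example, a GeneratedType of 0x102 would be stored as 2 under this model, which is why a lint warning on values above 255 is useful.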

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Legacy Engine Artifacts | Some older tools and legacy file revisions include values like TemplateResRef, LocName, Comment, Elevation, Priority, and PaletteID. These are artifacts from other Odyssey Engine branches (like Neverwinter Nights) and the KOTOR engine never evaluates them natively. |

Implemented Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::uts.

  1. Volume Ceiling: (Pending) Prevents rendering distortion by asserting Volume stays strictly within the standard 0-127 engine byte threshold.
  2. Float Sanity Parsing: (Pending) Confirms FixedVariance mathematically parses as a valid FLOAT, protecting the engine from invalid arithmetic operations during randomization.
  3. Audio Integrity: (Pending) Asserts that every defined Sound reference resolves precisely to a physical audio stream in the active game modules.
  4. Emitter Verification: (Pending) Structurally ensures the emitter has at least 1 actively mapped Sounds entry to prevent dead objects from polluting active map memory.
  5. Byte Truncation Warnings: (Pending) Flags when GeneratedType overflows heavily past 255, predicting the engine’s physical byte wrap.

UTT Format (Trigger Blueprint)

Description: The Trigger (.utt) blueprint defines invisible zones placed across level maps. While encounters spawn creatures, triggers operate as tripwires – firing scripts, acting as loading zones to new areas, or springing mechanical traps when a character crosses them.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .utt |
| Magic Signature | UTT / V3.2 |
| Type | Trigger Blueprint |
| Rust Reference | View rakata_generics::Utt in Rustdocs |

Data Model Structure

Rakata maps the Trigger definition directly into the rakata_generics::Utt struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Trigger breaks down into four main categories:

  1. Core Identity & Geometry: The basic identifiers and coordinate boundaries that define what the trigger is and where it sits on the ground (e.g., Tag, Geometry).
  2. Interactive State & Sub-types: Settings that determine if the trigger acts as a loading zone, a trap, or just a generic scripting boundary (e.g., Type, Cursor, HighlightHeight).
  3. Trap Mechanics: The parameters defining rules for trap visibility and skill checks required to disarm them (e.g., TrapType, TrapOneShot).
  4. Transition & Behavioral Hooks (Scripts): The event scripts that fire when a character enters, clicks, leaves, or disarms the trigger, as well as the destination area if the trigger acts as a loading zone (e.g., ScriptOnEnter, LinkedTo).
  • State Validation: rakata-lint checks the GFF structure directly against the constraints the engine expects.

Engine Audits & Decompilation

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSTrigger::LoadTrigger at 0x0058da80.)

Structural Load Phasing

| Function | Size | Behavior |
| --- | --- | --- |
| LoadTrigger | 3381 B | The main constructor. It reads the trigger’s properties, scripts, and trap rules. |
| LoadTriggerGeometry | 743 B | Reads the X, Y, and Z coordinates that draw the trigger’s boundary on the floor. |

Core Structural Findings

| Engine Rule | Runtime Behavior |
| --- | --- |
| Behavior Derived from Type | The engine determines the trigger’s behavior and UI cursor based on the Type field. Type 1 makes it a map transition zone. Type 2 makes it a trap. |
| OnClick Duplication Bug | The engine has a known bug where it copies the ScriptOnEnter value and uses it to overwrite the OnClick listener by default, unless explicitly overridden. |
| Trap Hook Fallback | If the OnTrapTriggered script is left empty, set to null, or named "default", the engine ignores it and pulls the default script from traps.2da based on the TrapType. |
| Highlight Clamping | The trigger’s HighlightHeight is ignored by the engine unless it is greater than 0.0. If it is exactly zero or negative, the engine falls back to a default rendering height of 0.1. |
| Contextual Loading | Fields like LinkedTo, LinkedToModule, AutoRemoveKey, Tag, and Faction are only loaded into memory when the Trigger is processed from a .git area layout file. |
| Dual-Path Portraits | If PortraitId is < 0xFFFE, the engine treats it as an ID to resolve the 2DA map icon. If it is >= 0xFFFE, the engine ignores the integer and uses the explicit Portrait string instead. |
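
Three of the rules above are simple decision functions. The following is a minimal sketch under stated assumptions: the function and enum names are hypothetical, PortraitId is modeled as a 16-bit value, and the "default" comparison is assumed to be exact (the source does not say whether it is case-sensitive).

```rust
/// Highlight Clamping: anything that is not strictly positive becomes 0.1.
pub fn effective_highlight_height(raw: f32) -> f32 {
    if raw > 0.0 { raw } else { 0.1 }
}

/// Dual-Path Portraits: which source the engine is described as using.
#[derive(Debug, PartialEq, Eq)]
pub enum PortraitSource<'a> {
    /// Resolve the icon through the 2DA row with this ID.
    TableRow(u16),
    /// Ignore the ID and use the explicit Portrait string.
    ResRef(&'a str),
}

pub fn portrait_source(portrait_id: u16, portrait: &str) -> PortraitSource<'_> {
    if portrait_id < 0xFFFE {
        PortraitSource::TableRow(portrait_id)
    } else {
        PortraitSource::ResRef(portrait)
    }
}

/// Trap Hook Fallback: true when the engine would pull the script
/// from traps.2da instead of using the authored value.
pub fn uses_default_trap_script(on_trap_triggered: Option<&str>) -> bool {
    match on_trap_triggered {
        None => true,
        Some(s) => s.is_empty() || s == "default",
    }
}
```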

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Legacy Engine Artifacts | As with other templates, older asset revisions include TemplateResRef, Comment, PaletteID, and PartyRequired. The engine completely ignores these. |
| Superseded Legacy Fields | Older asset revisions typically map TrapDetectDC and DisarmDC in the .utt file itself, but the engine ignores them and calculates DCs dynamically using the rules in the .2da files instead. |

Implemented Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::utt.

  1. Trap Type Verification: (Pending) Warns if a trigger has its TrapFlag set but its Type is not equal to 2. The engine will ignore its trap settings in this state.
  2. Transition Enforcement: (Pending) Flags triggers where Type==1 is set but no LinkedTo or TransitionDestination is defined.
  3. Height Bounding: (Pending) Detects configuration patterns where HighlightHeight is ≤ 0.0, triggering the mandatory engine fallback to 0.1.
  4. Default Script Identification: (Pending) Identifies empty or "default" OnTrapTriggered entries to explicitly document which default script the engine will pull from traps.2da.
  5. Geometry Safety: (Pending) Ensures that the trigger’s geometry contains at least 3 vertices to form a valid map boundary.
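
Since these rules are still pending, the following is only a sketch of how rules 1, 2, 3, and 5 might be expressed. `UttFacts`, `UttWarning`, and `lint_utt` are hypothetical names and do not reflect the actual rakata_lint API.

```rust
/// Hypothetical summary of the fields the rules above inspect.
pub struct UttFacts {
    pub trigger_type: u32,
    pub trap_flag: bool,
    pub has_linked_to: bool,
    pub has_transition_destination: bool,
    pub highlight_height: f32,
    pub vertex_count: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UttWarning {
    TrapFlagWithoutTrapType,
    TransitionWithoutTarget,
    NonPositiveHighlightHeight,
    TooFewVertices,
}

pub fn lint_utt(t: &UttFacts) -> Vec<UttWarning> {
    let mut out = Vec::new();
    // Rule 1: TrapFlag is set but Type is not 2 (trap).
    if t.trap_flag && t.trigger_type != 2 {
        out.push(UttWarning::TrapFlagWithoutTrapType);
    }
    // Rule 2: Type 1 (transition) with neither LinkedTo nor TransitionDestination.
    if t.trigger_type == 1 && !t.has_linked_to && !t.has_transition_destination {
        out.push(UttWarning::TransitionWithoutTarget);
    }
    // Rule 3: the engine would fall back to a height of 0.1.
    if !(t.highlight_height > 0.0) {
        out.push(UttWarning::NonPositiveHighlightHeight);
    }
    // Rule 5: fewer than 3 vertices cannot form a boundary.
    if t.vertex_count < 3 {
        out.push(UttWarning::TooFewVertices);
    }
    out
}
```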

UTW Format (Waypoint Blueprint)

Description: The Waypoint (.utw) blueprint defines static reference coordinates within an area map. Unlike functional triggers or physical placeables, waypoints act exclusively as invisible logic markers. They provide coordinate anchors for creature patrol routes, spawn locations, camera focal points, or visible map pins in the player’s UI.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .utw |
| Magic Signature | UTW / V3.2 |
| Type | Waypoint Blueprint |
| Rust Reference | View rakata_generics::Utw in Rustdocs |

Data Model Structure

Rakata maps the Waypoint definition directly into the rakata_generics::Utw struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.

A Waypoint breaks down into three main categories:

  1. Core Identity: The basic identifiers that define the waypoint’s name and tag used heavily by scripts (e.g., Tag, LocalizedName).
  2. Spatial Geometry: The exact map coordinates and facing orientation that creatures or cameras will reference (e.g., XPosition, XOrientation).
  3. Map Navigation Notes: The text and toggles that dictate whether the waypoint draws a physical pin on the player’s mini-map UI (e.g., HasMapNote, MapNote).
  • State Validation: rakata-lint checks the data against engine constraints to prevent runtime bugs or dead data paths.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for .utw files mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSWaypoint::LoadWaypoint at 0x005c7f30.)

Structural Load Phasing

| Function | Size | Behavior |
| --- | --- | --- |
| LoadWaypoint | 682 B | The main constructor. It loads the waypoint’s identity, map geometry, and checks for mini-map pins. |
| LoadFromTemplate | 134 B | A fallback used when dynamically spawning a waypoint from a script. |

Core Structural Findings

| Engine Rule | Runtime Behavior |
| --- | --- |
| Map Note Two-Gate Pattern | If HasMapNote is 0 or missing, the engine skips reading the map note entirely. If it is 1, it reads the strings but uses a second gate: if the MapNote string itself is missing, the entire map pin block is discarded silently. |
| Orientation Normalization | The engine computes the squared magnitude of the orientation vectors. If it is not exactly 1.0, it automatically calls Vector::Normalize() to fix the math. Non-unit vectors are tolerated but corrected instantly at load. |
| Position Override | When a waypoint is loaded from a .git area layout via LoadWaypoints, the engine re-reads the X and Y coordinates directly from the .git file, completely overriding the .utw. It also forcefully calculates the Z height based on the terrain collision mesh via ComputeHeight. |
| Dynamic Identification | Waypoints never pull an ObjectId from their own .utw file. It is always forcibly assigned by the .git list element (defaulting to 0x7f000000). |
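
The two-gate map note check and the orientation fix-up can be sketched as follows. This is an illustration with hypothetical names; it assumes any non-zero HasMapNote opens the first gate, models orientation as a 2D (X, Y) vector, and leaves a zero-length vector untouched, none of which is confirmed by the audit.

```rust
/// Map Note Two-Gate Pattern: returns the note only if both gates pass.
pub fn map_note<'a>(has_map_note: Option<u8>, note: Option<&'a str>) -> Option<&'a str> {
    // Gate 1: HasMapNote must be present and non-zero.
    if has_map_note.unwrap_or(0) == 0 {
        return None;
    }
    // Gate 2: the MapNote string itself must exist.
    note
}

/// Orientation Normalization: rescale to unit length unless the
/// squared magnitude is already exactly 1.0.
pub fn normalize_orientation(x: f32, y: f32) -> (f32, f32) {
    let mag_sq = x * x + y * y;
    // Zero-length input is left untouched here to avoid dividing by zero.
    if mag_sq == 1.0 || mag_sq == 0.0 {
        return (x, y);
    }
    let mag = mag_sq.sqrt();
    (x / mag, y / mag)
}
```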

Legacy & Ignored Data

| Finding Type | Explanation |
| --- | --- |
| Superseded Legacy Fields | Older asset revisions pad the file with fields like TemplateResRef, Appearance, PaletteID, Comment, LinkedTo, and Description. The KOTOR engine completely ignores these. |

Implemented Linter Rules (Rakata-Lint)

These static constraints are targeted for implementation under rakata_lint::rules::utw.

  1. Tag Enforcement: (Pending) Flags if Tag is completely empty, as waypoints are primarily targeted by scripts.
  2. Boolean Clamping: (Pending) Ensures HasMapNote acts properly as a BYTE constraint.
  3. Double-Gating Check: (Pending) Detects dead data patterns where MapNote or MapNoteEnabled are defined but HasMapNote is configured to 0.
  4. Orientation Warnings: (Pending) Warns if orientation vectors do not mathematically normalize to ~1.0, documenting the engine’s forced correction.

3D Geometry & Models

At the heart of the Odyssey Engine’s visual presentation is a proprietary structural design for interpreting and rendering 3D geometry. Modern formats like .glTF or .fbx bundle all visual and physical data into a single asset. KotOR, however, splits this data across several distinct files. The engine strictly decouples the node hierarchy tree, the raw vertex buffers, and the mathematical collision boundaries.

Note

If you are looking for the exact underlying raw Ghidra decompilation notes detailing the K1 Engine’s InputBinary::Read pipeline and structural layout bytes, please refer to the preserved Raw MDL Decompilation Archive.


Implementation Blueprints

This section documents the primary pillars of KOTOR geometry and their mathematical foundations, backed by swkotor.exe clean-room reverse engineering.

| Format | Name | Layout & Purpose |
| --- | --- | --- |
| MDL | Model Hierarchy | The architectural scaffold holding the model together. It defines the scene bounding volumes, spatial rotations, embedded animations, engine rendering parameters, and a deep recursive tree of typed Nodes (e.g., Lights, Bones, Emitters, Trimeshes). |
| MDX | Vertex Data | The abstract mathematical arrays defining the actual rendering payload. It directly encodes interleaved array blocks mapping exact spatial coordinates (X, Y, Z), texture UV layouts, and Lighting Normals. |
| BWM | Walkmeshes | The raw mathematical graph of AABB bounds and face intersections that serve as physics collision boxes for area environments (.wok), placeables (.pwk), and interactive doors (.dwk). |
| Math | TriMesh Derivations | Documentation explaining exactly how variables like coordinate bounds and face offsets are mathematically derived across both visual Trimeshes and collision Walkmeshes. |

MDL Format (Model Hierarchy)

The .mdl format serves as the overarching structural spine for 3D model geometry. Rather than storing literal vertex positions directly, it recursively structures a tree of generalized nodes (Bones, Trimeshes, Lights, Emitters) into a unified visual mesh. It delegates vertex geometry out, binds textures, links dynamic controllers (keyframe transformations), and maps bounding sphere matrices directly to the model’s rigid physical space.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .mdl |
| Magic Signature | Text (filedependancy) or Binary (\0 byte header) |
| Type | 3D Hierarchical Mesh |
| Rust Reference | View rakata_formats::Mdl in Rustdocs |

Data Model Structure

Rakata maps the .mdl binary tree exactly into rakata_formats::Mdl.

Because a model uses 11 distinct struct sub-types, Rakata resolves the pointer-based tree structure into a safe Rust Vec<MdlNode>. Native file pointer offsets, which KOTOR normally resolves via a raw in-memory relocation pass, are converted into safe recursive structures at parse time.

Node Sub-Types

The engine determines exact node allocations using a rigid bitflag header.

| Sub-Type | Description |
| --- | --- |
| Base | A pure structure node (Dummy) acting strictly as an invisible visual group or spatial pivot. |
| Light | Projects localized dynamic lighting, lens flares, and shading priorities. |
| Emitter | Configures particle spawning systems (fountains, single-shots, lightning, explosions). |
| Camera | An empty node serving as a static viewport anchor for dialogue cinematics. |
| Reference | An anchor point explicitly linking an external 3D model asset to a point. |
| TriMesh | A rigid standard triangle geometry boundary carrying static vertex arrays. |
| SkinMesh | A procedural mesh utilizing skeleton bone-weights and vectors to calculate organic deformations. |
| AnimMesh | A mesh carrying hardcoded, explicitly sampled vertex coordinate animation loops. |
| DanglyMesh | A sub-mesh evaluated through swinging physics constraints (displacement, tightness, period). |
| AABB | A strict spatial collision tree structurally defining an internal walkmesh barrier. |
| Saber | Allocates dynamic 3D quad arrays utilized exclusively to generate stretching lightsaber swing trails. |

Engine Audits & Decompilation

Deep Dive: For an exhaustive archive of the Ghidra decompilation notes detailing the exact byte-level layout of the binary MDL format and engine loading pipeline, refer to the MDL & MDX Deep Dive.

The following information documents the engine’s exact load sequence for genuine Binary MDL models. All behavior was mapped from natively analyzing swkotor.exe execution pipelines via Ghidra.

Loading and Wrapper Validation

Read initially via Input::Read (0x004a14b0).

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Binary vs ASCII Detection | The engine checks the exact first byte of the file. If it hits a \0 (NULL), it dispatches the asset entirely to the InputBinary track. If it hits text ("filedependancy" or "newmodel"), it loops into the FuncInterp ASCII parser track. |
| Wrapper Mapping | The Binary format evaluates the initial 12 bytes as an abstract Wrapper block defining explicit sizes for the .MDL and the associated .MDX geometry. |
| In-Memory Heap Dump | The engine allocates the sizes noted in the wrapper, runs memcpy on both the .MDL and .MDX assets blindly into memory, and then runs the recursive Reset path to relocate spatial internal pointer offsets to absolute memory addresses. |
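
The first-byte dispatch and the 12-byte wrapper can be sketched as below. This is a hedged illustration, not Rakata’s reader: `MdlKind` and `sniff_mdl` are hypothetical names, little-endian encoding is assumed, and placing the .MDL size at +0x04 is an assumption (this page only states that the .MDX size sits at wrapper + 0x08).

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq, Eq)]
pub enum MdlKind {
    /// First byte was \0: sizes come from the 12-byte wrapper.
    Binary { mdl_size: u32, mdx_size: u32 },
    /// Anything else is treated as the ASCII track.
    Ascii,
}

pub fn sniff_mdl(bytes: &[u8]) -> Option<MdlKind> {
    match *bytes.first()? {
        0 => {
            // The wrapper occupies the first 12 bytes of a binary model.
            let wrapper = bytes.get(..12)?;
            // +0x04: .MDL size (assumed offset), +0x08: .MDX size.
            let mdl_size = u32::from_le_bytes(wrapper[4..8].try_into().ok()?);
            let mdx_size = u32::from_le_bytes(wrapper[8..12].try_into().ok()?);
            Some(MdlKind::Binary { mdl_size, mdx_size })
        }
        _ => Some(MdlKind::Ascii),
    }
}
```

Returning `None` for an empty or truncated buffer keeps the caller from allocating based on sizes that were never actually read.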

Node Dispatch Architecture

Read initially via InputBinary::ResetMdlNode (0x004a0900). The engine recursively navigates downwards matching against a constant 16-bit node-type flag lookup spanning from 0x0001 (Base Node) to 0x0821 (Lightsaber).

| Mapped Property | Engine Behavior |
| --- | --- |
| Sub-node Allocation Sizes | Nodes are dynamically allocated varying byte lengths strictly based on their type-mask. A root Base node only evaluates 80 contiguous bytes, but an Emitter allocates 304, and a Skin allocates 512. |
| Parent/Child Graph Resolution | Engine structures evaluate nodes continuously downward via embedded raw pointer arrays. These arrays branch a group of distinct sub-children implicitly off their master parent. At load time, the engine must safely rewrite all relative file offsets into absolute physical memory locations, otherwise the entire hierarchy will instantly detach. |

Mapped Behavior Quirks

| Mapped Property | Ghidra Provenance & Engine Behavior |
| --- | --- |
| LOD Suffix Generation | The engine natively evaluates if the cullWithLOD property is set. If true, it explicitly triggers string concatenations for FindModel(name + "_x") and FindModel(name + "_z") sequentially to dynamically attach lower-quality auxiliary geometry instances based on viewport distance. |
| Animation Bone Binding | When building the live hierarchy tree for a rendering sequence, the engine explicitly ignores the node’s textual string name. Instead, it rigidly evaluates physical pairings against a mapped node_id integer. If the bone isn’t properly sequenced to that numeric ID array, it detaches from the runtime arrays entirely. |
| Self-Describing Keyframes | Unlike older properties that rely on rigid dictionaries, KOTOR determines how an animation was saved dynamically by reading the keyframe’s controller type integer. It applies a bitwise AND check against the type’s lowest hex digit (& 0x0F) to instantly dictate whether the loaded keyframe is a single float (like scaling), 3 floats (like an XYZ positional vector), or 4 floats (for a Slerp quaternion rotation). |
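
The keyframe width check and the LOD name construction described above can be sketched in a few lines. The names `KeyframeShape`, `keyframe_shape`, and `lod_candidates` are hypothetical, and mapping any other low-nibble value to `None` is an assumption about how a reader might handle unexpected data.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum KeyframeShape {
    /// 1 float, e.g. a scale value.
    Scalar,
    /// 3 floats, e.g. an XYZ position.
    Vector3,
    /// 4 floats, e.g. a quaternion rotation.
    Quaternion,
}

/// Masks the controller descriptor with 0x0F to find the float count.
pub fn keyframe_shape(descriptor: u32) -> Option<KeyframeShape> {
    match descriptor & 0x0F {
        1 => Some(KeyframeShape::Scalar),
        3 => Some(KeyframeShape::Vector3),
        4 => Some(KeyframeShape::Quaternion),
        _ => None,
    }
}

/// Builds the two model names looked up, in order, when cullWithLOD is set.
pub fn lod_candidates(name: &str) -> [String; 2] {
    [format!("{}_x", name), format!("{}_z", name)]
}
```

Because only the low nibble is inspected, a descriptor such as 0x13 still reports three floats; the upper bits carry other information.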

Proposed Linter Rules (Rakata-Lint)

While rakata-lint currently only evaluates GFF formats and does not yet parse .mdl models dynamically, the engine behaviors above hint at some suggested lint diagnostics:

Planned Lint Diagnostics:

  1. Skeleton / Animation Tracing: Flags animation nodes where the internal skeletal node_number binding parameter implicitly equals 0, ensuring the mesh does not hard freeze via pointing to the rigid root spine.
  2. Controller Mask Encoding: Validates that generic Controller properties properly bit-mask against the Bezier indicator (0x10) rather than reading explicitly raw quaternion values (which causes cascading loop failures through the rest of the array block).
  3. Emitter Detonation Allocation: Flags interactive Emitter nodes attempting to bind the detonate key (Controller 502) while structurally mis-identifying as "Fountain". The engine natively only maps controller 502 data to strict "Explosion" memory paths, resulting in an aggressive Access Violation engine crash otherwise.
  4. Name Graph Sanitization: Notifies developers if the node graph contains artificially un-referenced graph pointers mapped under the unified Name Table. (BioWare notoriously shipped identical shared name tables compiling .pwk and .wok models into .mdl nodes natively throughout the 2003 pipeline).

MDX Format (Vertex Data)

The .mdx format is a companion file that always pairs tightly with a .mdl model. While the .mdl file handles the complex math, skeletal hierarchy, and animation logic, the .mdx file acts as bulk storage, holding the massive lists of raw 3D coordinates (vertices) that make up the physical shape of the model.

Architecturally, the swkotor.exe engine treats these two files as a single combined asset: the .mdl dictates where and how the model moves, and the .mdx provides the points to physically draw on the screen.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .mdx |
| Magic Signature | Raw binary stream (No explicit signature block) |
| Type | Interleaved Vertex Payload Array |
| Rust Reference | View rakata_formats::Mdx in Rustdocs |

Data Model Structure

Rakata safely consumes the unindexed byte sequences into a typed geometry definition mapped within rakata_formats::Mdx.

At the raw binary level, .mdx data is strictly an interleaved buffer. Variables (like positional 3D XYZ vectors, Texture Parameter UV planes, and light-calculating Normals) are sequentially woven directly across the byte stream.

Engine Audits & Decompilation

Deep Dive: For an exhaustive archive of the Ghidra decompilation notes detailing the exact byte-level layout of the binary MDL/MDX format and engine loading pipeline, refer to the MDL & MDX Deep Dive.

The following documents the engine’s exact load sequence and structure for .mdx interleaved data pipelines mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from InputBinary::Read (0x004a1230) and InputBinary::ResetMdlNode (0x004a0900).)

Loading and Lifecycle

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Memory Wrapping | Triggered immediately alongside the .mdl. The wrapper dynamically outlines the exact byte-count of .mdx data required (wrapper + 0x08). |
| Buffer Liberation | MDX arrays are entirely stateless. Once InputBinary::ResetMdlNode computes the geometry arrays and translates the buffer directly into the OpenGL hardware render-pools during loading, the engine immediately calls free() wiping the MDX byte arrays from physical memory entirely. |

TriMesh Structural Addressing

The KOTOR Engine avoids parsing the MDX data by scanning through it block-for-block. Instead, traversing the actual MDL hierarchy drives vertex payload requests explicitly.

| Mapped Property | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Array Slicing | Every distinct TriMesh instantiated in the parent MDL tree explicitly registers an mdx_data_offset pointer (TriMesh + 0x144). This dictates exactly where the engine explicitly seeks within the interleaved .mdx payload array to fetch this mesh’s native points. |
| Node Alignment Constraints | Vanilla assets maintain extremely strict alignment formats. Meshes are dynamically sorted prior to hardware parsing: static rendering models fall to the top of the index chain, whereas dynamic procedural meshes (like character .Skin nodes) are specifically dumped sequentially to the rear of the .mdx. |
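
Offset-driven slicing of the .mdx payload might look like the sketch below. It is illustrative only: `mesh_slice` and `vertex_position` are hypothetical helpers, `stride` and `vertex_count` are assumed to come from the owning TriMesh header, and the position is assumed to sit at the start of each little-endian vertex row.

```rust
/// Returns the bytes belonging to one mesh, or None if the span
/// described by the header would run past the end of the buffer.
pub fn mesh_slice(
    mdx: &[u8],
    mdx_data_offset: usize,
    stride: usize,
    vertex_count: usize,
) -> Option<&[u8]> {
    let len = stride.checked_mul(vertex_count)?;
    let end = mdx_data_offset.checked_add(len)?;
    mdx.get(mdx_data_offset..end)
}

/// Reads the XYZ position of one vertex from an interleaved mesh slice.
pub fn vertex_position(slice: &[u8], stride: usize, index: usize) -> Option<[f32; 3]> {
    let start = index.checked_mul(stride)?;
    let raw = slice.get(start..start.checked_add(12)?)?;
    let mut out = [0.0f32; 3];
    for (i, chunk) in raw.chunks_exact(4).enumerate() {
        out[i] = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    Some(out)
}
```

Using checked arithmetic and `get` means a corrupt offset or count yields `None` rather than a panic or an out-of-bounds read.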

Note

Ghost Payload Sentinels: During memory extraction, the engine implicitly pads geometric mesh payloads out to distinct 16-byte aligned boundaries using Terminator Rows. Any mesh vertex iteration falling slightly out of stride will be explicitly back-filled with ghost/sentinel float arrays ([0.0, 0.0, 0.0]) to ensure OpenGL buffer calculations remain strictly uniform without overflowing pointer indexes during hardware streaming.

Proposed Linter Rules (Rakata-Lint)

Incorrectly calculated .mdx offset spans or payload array lengths can cause the engine to read misaligned bytes or overflow data bounds. Providing a linter rule to validate these payload alignments helps prevent geometry corruption and potential engine/GPU crashes.

While rakata-lint currently only evaluates GFF formats and does not yet parse .mdx buffers dynamically, the engine behaviors above hint at the foundational requirements for .mdx stability:

Planned Lint Diagnostics:

  1. Mesh Slice Verification: Enforces explicit iteration seeking. Validates .mdx vector boundaries by explicitly jumping pointers down the file according to individual mdx_data_offset assignments mapped on explicitly bound TriMesh headers, rather than assuming unverified sequential payload lengths.

Walkmesh (BWM / WOK)

Walkmeshes govern physical collision and pathfinding across an area. They dictate exactly where a character can stand, what slopes they can climb, and what physical materials block their path.


BWM Binary

The binary implementation of the Walkmesh is entirely designed to be dumped straight into memory. Instead of smoothly parsing the file piece-by-piece, the engine constantly jumps around the file using a complex array of offsets located at the very top.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .bwm, .wok |
| Magic Signature | None standard header block |
| Type | Memory-Mapped Collision Net |
| Rust Reference | View rakata_formats::Bwm in Rustdocs |

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and field requirements for Binary Walkmeshes mapped from swkotor.exe.

(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWCollisionMesh::LoadMeshBinary at 0x00597120.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Pointer Jumping | The engine doesn’t read the file linearly from top to bottom. Instead, it uses direct memory math (pointer arithmetic) to aggressively jump between the header and the raw data payload. |
| Offset Extraction | The beginning of the file contains exact byte locations the engine uses to orient itself: +0x08 yields the total vertex_count; +0x0C..+0x18 provides the maximum limits for faces, materials, and walk-edges; +0x18..+0x24 yields adjacency boundaries; +0x3C..+0x48 stores the direct starting addresses for the geometry data. |
| Bounding Box Offsets | The spans immediately following (+0x48..+0x6C and +0x6C..+0x84) are reserved specifically for tracking offsets that point to the Axis-Aligned Bounding Box (AABB) collision trees. |
| Ignoring the Magic ID | Magic bypass: Magic and version identifiers (BWM ) are actually ignored natively during the LoadMeshBinary process. It relies on a different system entirely to verify file signatures beforehand. |
| Read-Only Format | One-Way Flow: Vanilla KOTOR contains strictly read-only capabilities for BWM binaries. Developers removed any functionality needed to compile or save collision data dynamically! |

Tip

Orphaned Memory Gaps: The engine skips two blocks of header bytes when reading from disk: +0x24..+0x3C (24 bytes) and +0x64..+0x6C (8 bytes). For a byte-perfect roundtrip toolset, these gaps must be preserved verbatim!
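
A roundtrip-safe reader can capture the skipped spans alongside the fields it interprets. The sketch below is a hypothetical illustration (`BwmHeaderProbe` and `probe_header` are not Rakata types) and assumes the offsets are measured from the start of the file and that integers are little-endian.

```rust
use std::convert::TryInto;

/// Hypothetical subset of the header: one interpreted field plus the
/// two spans the engine never reads, kept verbatim for re-serialization.
#[derive(Debug)]
pub struct BwmHeaderProbe {
    pub vertex_count: u32,
    pub gap_a: [u8; 24], // +0x24..+0x3C
    pub gap_b: [u8; 8],  // +0x64..+0x6C
}

pub fn probe_header(bytes: &[u8]) -> Option<BwmHeaderProbe> {
    // +0x08 holds the total vertex count.
    let vertex_count = u32::from_le_bytes(bytes.get(0x08..0x0C)?.try_into().ok()?);
    // Copy the unread spans so a writer can emit them unchanged.
    let gap_a: [u8; 24] = bytes.get(0x24..0x3C)?.try_into().ok()?;
    let gap_b: [u8; 8] = bytes.get(0x64..0x6C)?.try_into().ok()?;
    Some(BwmHeaderProbe { vertex_count, gap_a, gap_b })
}
```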


BWM ASCII

For tooling purposes, BioWare engine modules support a raw ASCII readable version of the walkmesh that can be dynamically parsed at runtime at a massive performance cost.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .bwm (ASCII formatted) |
| Magic Signature | ASCII Text Directives |
| Type | Uncompiled Collision Text |

Engine Audits & Decompilation

The following documents the engine’s exact load sequence and constraints for ASCII text walkmeshes mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWRoomSurfaceMesh::LoadMeshText at 0x00582d70.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Searching for Keywords | The engine scans the text file reading line-by-line to look for the specific keywords node, verts, faces, and aabb. |
| Strict Face Formatting | Every defined face string must strictly format exactly 8 numbers separated by spaces. Interestingly, while the engine reads the adjacency input, it immediately deletes it! The engine forces adjacency math to be physically recomputed from scratch post-load to prevent geometric errors from old assets. |
| Line Length Limits | The engine will aggressively truncate or glitch if any single text line stretches beyond 256 characters (0x100 bytes). |
| Face Reordering | Using the surfacemat.2da file, the engine completely shuffles the order of the faces while loading. It essentially groups every geometry face marked “walkable” at the absolute top of the array, and pushes all non-walkable geometry straight to the bottom. |
| Fudging the Boundaries | When figuring out the Axis-Aligned Bounding Box (AABB) limits, the text loader artificially stretches the box outwards by roughly 0.01 across every axis. Due to the face reordering mentioned above, the engine also has to build a temporary remap table under the hood just to keep track of where everything moved! |
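
The face reordering and its remap table can be illustrated with a stable partition. This is a sketch with hypothetical names; whether the engine preserves relative order inside each group is an assumption, as is applying the 0.01 margin symmetrically to both corners of the box.

```rust
/// Groups walkable faces first and returns (order, remap), where
/// order[new_index] = old_index and remap[old_index] = new_index.
pub fn reorder_faces(walkable: &[bool]) -> (Vec<usize>, Vec<usize>) {
    let n = walkable.len();
    // Walkable faces keep their relative order at the front (assumed stable).
    let mut order: Vec<usize> = (0..n).filter(|&i| walkable[i]).collect();
    // Non-walkable faces follow, also in their original relative order.
    order.extend((0..n).filter(|&i| !walkable[i]));
    let mut remap = vec![0usize; n];
    for (new, &old) in order.iter().enumerate() {
        remap[old] = new;
    }
    (order, remap)
}

/// Stretches an AABB outward by `margin` on every axis.
pub fn inflate_aabb(min: [f32; 3], max: [f32; 3], margin: f32) -> ([f32; 3], [f32; 3]) {
    (min.map(|v| v - margin), max.map(|v| v + margin))
}
```

The remap table is what lets any data that refers to faces by their old index be rewritten after the shuffle.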

Warning

Because the ASCII face-reordering mechanism radically shuffles the root array indexes from walkable to unwalkable clusters via the LoadMeshText routine, it is impossible to do a clean 1-to-1 binary-to-ASCII-to-binary round trip of a KOTOR walkmesh without completely losing the original face indexing format!

TriMesh Derived & Computed Fields Reference

This document catalogs derived or computable fields specifically impacting TriMesh generation for MDL/MDX structures.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .mdl |
| Domain | Geometry Math / Model Reconstruction |
| Rust Reference | View rakata_formats::MdlNodeTriMesh in Rustdocs |

Data Model Structure

Rakata attempts to make building a TriMesh as painless as possible by handling the complex math under the hood.

  • Derived Fields: Rakata distinguishes data you must supply (like static 3D coordinates) from data that can safely be calculated on the fly (like bounding limits, spherical radii, or adjacency maps). The rakata-formats API recalculates these derived values automatically whenever you serialize the file.

Engine Audits & Decompilation

This document catalogues every field on MdlMesh and MdlFace that can be derived from geometry, documenting what each field means, how community tools handle it, and what algorithm is needed to recompute it. This is the reference for future model-editing API work.

Field Categories

  • User-authored: Provided by the modeller. Never recomputed.
  • Derivable: Can be recomputed from geometry. Tools recompute on ASCII import / model rebuild; preserve verbatim on binary roundtrip.
  • Runtime-only: Written by the engine at load time. On-disk values are meaningless stubs.

1. Internal CExoArrayList Fields (+0x98 .. +0xC8)

The five CExoArrayList slots in the TriMesh header form a coordinated GL index buffer submission system. Each stores a 12-byte header (ptr/count/alloc) in the mesh header plus a single u32 data value in the content blob.

1.1 vertex_indices (+0x98) – Dead in KotOR

What it is: A legacy engine array block. In BioWare’s older titles (like Neverwinter Nights), this block pointed to vertex index data. In KOTOR, the engine never actually looks at this field at all.

Community tools:

  • mdledit: Misidentifies as cTexture3 (12-byte string). Byte-exact preserve.
  • mdlops: Reads as raw bytes via darray struct. Byte-exact preserve.
  • PyKotor: Reads as indices_counts. Byte-exact preserve.
  • xoreos/reone: Skip entirely.

Vanilla values: Always zeros (ptr=0, count=0, alloc=0).

Rakata Processing Rule: Store as [u8; 12] for lossless preservation, or zero on write. No computation needed.

1.2 left_over_faces (+0xA4) – Dead in KotOR

What it is: Another legacy array block. In NWN, this stored “left over” face geometry. In KOTOR, the engine updates the pointer location dynamically but never actually reads the data during the OpenGL rendering loop.

Community tools:

  • mdledit: Misidentifies as cTexture4 (12-byte string). Byte-exact preserve.
  • mdlops: Reads as raw bytes via darray struct. Points to the packed u16 vertex index data (mdlops uses this as the indirection to find face indices).
  • PyKotor: Reads as indices_offsets. Byte-exact preserve.
  • xoreos: Only field it actually follows – reads the pointer to find packed u16 face vertex indices.
  • reone: Reads as indicesOffsetArrayDef. Uses first element as pointer to u16 index data.

Vanilla values: Typically non-zero. The pointer value points to the packed u16 face vertex index data. Count is 1, alloc is 1.

Rakata Processing Rule: Store the raw pointer and count variables. The pointer is content-relative and must be explicitly backpatched on write to point to the packed u16 face index data block.

1.3 vertex_indices_count (+0xB0) – Derivable

What it is: Single u32 value = total number of u16 vertex indices in the face index buffer.

Formula: face_count * 3

Community tools:

  • mdledit: Recomputes on every write (nVertIndicesCount = Faces.size() * 3).
  • mdlops: Recomputes on ASCII import.
  • PyKotor: Preserves from binary, creates empty for new models.

Rakata Processing Rule: Dynamically derive from faces.len() * 3. Never store a static value in the struct.

1.4 mdx_offsets (+0xBC) – Derivable (pointer)

What it is: Single u32 value = content-relative offset to the packed u16 face vertex index data in the MDL content blob.

Community tools:

  • mdledit: Writes placeholder, backpatches when VertIndices data is written.
  • mdlops: Same approach.
  • PyKotor: Same approach.

Rakata Processing Rule: Compute strictly at serialization time via the binary writer. Never store a static value in the struct.

1.5 index_buffer_pools / Inverted Counter (+0xC8) – Preserve or Derive

What it is: A 32-bit value. On disk, it acts exclusively as a sequence counter that numbers meshes using an “inverted” counting pattern. When the engine loads the file into memory, it discards this number and overwrites the same memory slot with an OpenGL handle.

The inverted counter formula (from mdledit asciipostprocess.cpp:1024):

mesh_counter: sequential 1-based index across all mesh nodes in DFS tree order.
              Saber meshes consume TWO increments (one per inverted counter).

Quo = mesh_counter / 100
Mod = mesh_counter % 100
inverted_counter = (2^Quo) * 100 - mesh_counter
                 + (Mod != 0 ? Quo * 100 : 0)
                 + (Quo != 0 ? 0 : -1)

Example sequence: 98, 97, 96, …, 1, 0, 100, 199, 198, …, 101, 200, …
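The formula can be checked against the example sequence with a small function. This is a sketch under stated assumptions: the function name is hypothetical, the divisions are integer divisions, and the result is returned as i64 so the intermediate subtraction cannot underflow.

```rust
/// Computes the on-disk inverted counter for a 1-based mesh counter,
/// following the formula quoted above. Integer division is assumed.
/// Note: the shift overflows for counters of 6300 and above.
pub fn inverted_counter(mesh_counter: u32) -> i64 {
    let quo = mesh_counter / 100;
    let rem = mesh_counter % 100;
    // (2^Quo) * 100 - mesh_counter
    let mut value = (1i64 << quo) * 100 - i64::from(mesh_counter);
    // + (Mod != 0 ? Quo * 100 : 0)
    if rem != 0 {
        value += i64::from(quo) * 100;
    }
    // + (Quo != 0 ? 0 : -1)
    if quo == 0 {
        value -= 1;
    }
    value
}
```

Counters 1, 99, 100, 101, 199, and 200 evaluate to 98, 0, 100, 199, 101, and 200, matching the example sequence.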

Community tools:

  • mdledit: Preserves from binary. Recomputes from formula only for ASCII import when value is missing (!nMeshInvertedCounter.Valid()).
  • mdlops: Recomputes on ASCII import using same formula.
  • PyKotor: Preserves from binary.

Rakata Processing Rule: Map as a static u32 field to perfectly preserve binary roundtripping. When natively constructing new models, dynamically compute the inverted sequence according to the formula using a DFS mesh counter.


2. Packed u16 Face Vertex Indices

What it is: A tightly packed list of u16 index triplets (exactly 6 bytes per face). Each triplet names the three vertices that form one triangle. This entire block is uploaded directly to the graphics card to render the final model.

Relationship to MdlFace: The packed u16 data is identical to MdlFace.vertex_indices for each face, laid out sequentially. It is fully redundant with the face array.

Community tools:

  • mdledit: Reads from binary into nVertIndices (3 u16 per face, stored alongside face data). Writes from face data.
  • mdlops: Reads as vertindexes darray. Writes from face data on ASCII import.
  • xoreos/reone: Read from the pointer at +0xA4 or +0xBC.

Rakata Processing Rule: Always dynamically derive identical copies directly from faces[i].vertex_indices during binary emission. Never map a redundant array inside the Rakata struct.


3. Face Fields (MdlFace, 32 bytes per face)

3.1 plane_normal ([f32; 3]) – Derivable

What it is: The geometric direction the triangle’s flat surface is facing (a unit normal vector).

Formula:

edge1 = positions[v1] - positions[v0]
edge2 = positions[v2] - positions[v0]
normal = normalize(cross(edge1, edge2))

Community tools: All tools that recompute adjacency also recompute normals.

3.2 plane_distance (f32) – Derivable

What it is: The signed plane offset d in the plane equation dot(plane_normal, p) + d = 0. With the formula below, it is the negated distance from the world origin to the face’s plane, measured along the normal vector.

Formula: plane_distance = -dot(plane_normal, positions[v0])

Note: some tools negate this differently. Verify against vanilla data.
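The two formulas in 3.1 and 3.2 can be combined into one helper. This is a minimal sketch: the function and helper names are hypothetical, and returning None for degenerate (zero-area) triangles is a choice made here, not documented engine behavior. The sign convention follows the formula above and should still be verified against vanilla data.

```rust
type Vec3 = [f32; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Returns (plane_normal, plane_distance) for a triangle, or None when the
/// triangle has zero area and no normal can be derived.
pub fn face_plane(p0: Vec3, p1: Vec3, p2: Vec3) -> Option<(Vec3, f32)> {
    let n = cross(sub(p1, p0), sub(p2, p0));
    let len = dot(n, n).sqrt();
    if len == 0.0 {
        return None; // degenerate triangle
    }
    let normal = [n[0] / len, n[1] / len, n[2] / len];
    // plane_distance = -dot(plane_normal, positions[v0])
    let distance = -dot(normal, p0);
    Some((normal, distance))
}
```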

3.3 surface_id (u32) – User-authored

What it is: Material/surface type identifier. Determines footstep sounds, walkability, etc. in walkmeshes; material properties in render meshes.

Not derivable – assigned by the modeller or inherited from the source asset.

3.4 adjacent ([u16; 3]) – Derivable

What it is: For each edge of the triangle, the index of the face sharing that edge. 0xFFFF means no adjacent face (boundary edge).

Edge-to-adjacent mapping:

  • adjacent[0]: face sharing edge (v0, v1)
  • adjacent[1]: face sharing edge (v1, v2)
  • adjacent[2]: face sharing edge (v2, v0)

Rakata Hash-Map Adjacency Algorithm:

1. Build position_key(v) = format!("{:.4e},{:.4e},{:.4e}", pos[0], pos[1], pos[2])

2. Build vertex_group: HashMap<String, Vec<usize>>
   For each vertex index i:
       vertex_group[position_key(i)].push(i)

3. Build vertex_to_faces: HashMap<usize, Vec<usize>>
   For each face f, for each vertex v in face.vertex_indices:
       vertex_to_faces[v].push(f)

4. Build face_set(vertex_index) -> HashSet<usize>:
   Collect all faces touching any vertex in the same position group:
       group = vertex_group[position_key(vertex_index)]
       union of vertex_to_faces[g] for all g in group

5. For each face f:
   For each edge (va, vb) in [(v0,v1), (v1,v2), (v2,v0)]:
       candidates = face_set(va) & face_set(vb) - {f}
       adjacent[edge] = if candidates.is_empty() { 0xFFFF }
                        else { min(candidates) }

Complexity: O(F * V_avg) where V_avg is the average number of faces per vertex group. Effectively O(F) for well-behaved meshes.

No-neighbor sentinel: 0xFFFF (u16::MAX). All tools agree except PyKotor which incorrectly uses 0 (bug – face 0 is a valid index).

Non-manifold edges: When more than 2 faces share an edge, tools differ:

  • mdledit: First match wins, logs a warning.
  • mdlops: Arbitrary (hash iteration order).
  • PyKotor: Smallest face index wins (min(candidates)).

Rakata Processing Rule: Always use min(candidates) internally so evaluation remains deterministic and aligns with PyKotor output. If non-manifold geometric edges are detected, the formatter must throw a logger warning.

Important: Vertex matching must be position-based, not index-based. Meshes commonly have duplicate vertices at the same position with different normals/UVs (hard edges, UV seams). Index-based matching would miss adjacency across these seams.
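The five steps above translate fairly directly into Rust. The following is a sketch of the described algorithm rather than Rakata's actual implementation: the function name is hypothetical, vertex indices are assumed to be in range, and face indices are assumed to fit in a u16.

```rust
use std::collections::{HashMap, HashSet};

const NO_NEIGHBOR: u16 = 0xFFFF;

/// Step 1: quantized position key, so co-located duplicate vertices match.
fn position_key(p: &[f32; 3]) -> String {
    format!("{:.4e},{:.4e},{:.4e}", p[0], p[1], p[2])
}

/// Computes per-edge adjacency for every face. Panics if a face references
/// a vertex index outside `positions`.
pub fn compute_adjacency(positions: &[[f32; 3]], faces: &[[u16; 3]]) -> Vec<[u16; 3]> {
    // Step 2: group vertex indices that share a position.
    let mut vertex_group: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, p) in positions.iter().enumerate() {
        vertex_group.entry(position_key(p)).or_default().push(i);
    }
    // Step 3: faces touching each vertex index.
    let mut vertex_to_faces: HashMap<usize, Vec<usize>> = HashMap::new();
    for (f, face) in faces.iter().enumerate() {
        for &v in face {
            vertex_to_faces.entry(v as usize).or_default().push(f);
        }
    }
    // Step 4: all faces touching any vertex co-located with `v`.
    let face_set = |v: usize| -> HashSet<usize> {
        vertex_group[&position_key(&positions[v])]
            .iter()
            .flat_map(|g| vertex_to_faces.get(g).into_iter().flatten().copied())
            .collect()
    };
    // Step 5: per edge, pick the smallest other face that touches both ends.
    faces
        .iter()
        .enumerate()
        .map(|(f, face)| {
            let mut adjacent = [NO_NEIGHBOR; 3];
            for edge in 0..3 {
                let a = face_set(face[edge] as usize);
                let b = face_set(face[(edge + 1) % 3] as usize);
                let best = a.intersection(&b).copied().filter(|&c| c != f).min();
                if let Some(neighbor) = best {
                    adjacent[edge] = neighbor as u16;
                }
            }
            adjacent
        })
        .collect()
}
```

This version rebuilds the face sets for every edge for clarity; caching them per position group would avoid the repeated work. It also does not emit the non-manifold warning required by the processing rule.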

3.5 vertex_indices ([u16; 3]) – User-authored

What it is: The three vertex indices forming this triangle.

Not derivable – defines the mesh topology.


4. Mesh Bounding Geometry – Derivable

4.1 bounding_min / bounding_max ([f32; 3])

What it is: The tightest axis-aligned box that encloses every vertex in the mesh (an Axis-Aligned Bounding Box).

Formula:

bounding_min = [min of all positions[i][0], min of [1], min of [2]]
bounding_max = [max of all positions[i][0], max of [1], max of [2]]

4.2 bsphere_center / bsphere_radius ([f32; 3], f32)

What it is: Minimum bounding sphere enclosing all vertices. Used by the engine for frustum culling (PartTriMesh::GetMinimumSphere at 0x00443330).

Engine algorithm (from Ghidra, confirmed in mdl_mdx.md):

center = average of all vertex positions (centroid)
radius = max distance from center to any vertex

This is NOT the true minimum bounding sphere (Welzl’s algorithm), but a simpler centroid-based approximation. Matches what vanilla files contain.
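Both bounding computations in this section are short enough to sketch together. The function names are hypothetical, and returning None for an empty vertex list is a choice made for this example.

```rust
type Vec3 = [f32; 3];

/// Per-axis min and max over all positions (4.1).
pub fn bounding_box(positions: &[Vec3]) -> Option<(Vec3, Vec3)> {
    let first = *positions.first()?;
    let (mut min, mut max) = (first, first);
    for p in positions {
        for axis in 0..3 {
            min[axis] = min[axis].min(p[axis]);
            max[axis] = max[axis].max(p[axis]);
        }
    }
    Some((min, max))
}

/// Centroid-based sphere (4.2): center is the average position,
/// radius is the largest distance from that center to any vertex.
pub fn centroid_sphere(positions: &[Vec3]) -> Option<(Vec3, f32)> {
    if positions.is_empty() {
        return None;
    }
    let mut center = [0.0f32; 3];
    for p in positions {
        for axis in 0..3 {
            center[axis] += p[axis];
        }
    }
    let n = positions.len() as f32;
    for c in center.iter_mut() {
        *c /= n;
    }
    let radius = positions
        .iter()
        .map(|p| {
            let d = [p[0] - center[0], p[1] - center[1], p[2] - center[2]];
            (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
        })
        .fold(0.0f32, f32::max);
    Some((center, radius))
}
```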

4.3 total_surface_area (f32)

What it is: Sum of all triangle areas in the mesh.

Formula:

For each face:
    edge1 = positions[v1] - positions[v0]
    edge2 = positions[v2] - positions[v0]
    area += 0.5 * length(cross(edge1, edge2))
total_surface_area = sum of all face areas
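The same accumulation, written as a self-contained Rust function. The name is hypothetical, and face vertex indices are assumed to be in range.

```rust
/// Sums 0.5 * |cross(edge1, edge2)| over all faces.
/// Panics if a face references a vertex index outside `positions`.
pub fn total_surface_area(positions: &[[f32; 3]], faces: &[[u16; 3]]) -> f32 {
    faces
        .iter()
        .map(|f| {
            let p0 = positions[f[0] as usize];
            let p1 = positions[f[1] as usize];
            let p2 = positions[f[2] as usize];
            let e1 = [p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]];
            let e2 = [p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2]];
            let c = [
                e1[1] * e2[2] - e1[2] * e2[1],
                e1[2] * e2[0] - e1[0] * e2[2],
                e1[0] * e2[1] - e1[1] * e2[0],
            ];
            0.5 * (c[0] * c[0] + c[1] * c[1] + c[2] * c[2]).sqrt()
        })
        .sum()
}
```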

5. AABB Tree – Derivable (complex)

What it is: A collision-detection tree (a binary hierarchy of axis-aligned bounding boxes) built over the faces of the mesh. It recursively partitions the faces into smaller and smaller boxes so the engine can quickly determine whether a player bumps into a wall, without checking collision against every single polygon.

When needed: Only for MdlNodeData::Aabb nodes (walkmesh-like collision geometry). Regular render meshes don’t have AABB trees.

Node layout: 40 bytes (see mdl_mdx.md for full struct).

Build algorithm: Recursive spatial partition:

  1. Compute AABB of all face centroids.
  2. Choose split axis (longest AABB dimension).
  3. Sort faces by centroid along split axis.
  4. Split at median into left/right subsets.
  5. Recurse on each subset until single-face leaves.
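The build steps above can be sketched as a recursive median split. This illustrates only the partitioning: the node type and function names are hypothetical, and the per-node box bounds and the 40-byte on-disk layout are deliberately omitted.

```rust
#[derive(Debug)]
pub enum AabbNode {
    Leaf { face: usize },
    Branch { axis: usize, left: Box<AabbNode>, right: Box<AabbNode> },
}

/// Builds a tree over `faces` (indices into `centroids`) by median split.
pub fn build_tree(centroids: &[[f32; 3]], mut faces: Vec<usize>) -> Option<AabbNode> {
    match faces.len() {
        0 => return None,
        1 => return Some(AabbNode::Leaf { face: faces[0] }),
        _ => {}
    }
    // Step 1: AABB of the face centroids.
    let mut min = [f32::INFINITY; 3];
    let mut max = [f32::NEG_INFINITY; 3];
    for &f in &faces {
        for a in 0..3 {
            min[a] = min[a].min(centroids[f][a]);
            max[a] = max[a].max(centroids[f][a]);
        }
    }
    // Step 2: split along the longest dimension.
    let mut axis = 0;
    for a in 1..3 {
        if max[a] - min[a] > max[axis] - min[axis] {
            axis = a;
        }
    }
    // Step 3: sort faces by centroid along that axis.
    faces.sort_by(|&a, &b| centroids[a][axis].total_cmp(&centroids[b][axis]));
    // Step 4: split at the median.
    let right = faces.split_off(faces.len() / 2);
    // Step 5: recurse until single-face leaves.
    Some(AabbNode::Branch {
        axis,
        left: Box::new(build_tree(centroids, faces)?),
        right: Box::new(build_tree(centroids, right)?),
    })
}

/// Collects leaf faces left to right (handy for inspecting the split).
pub fn leaf_order(node: &AabbNode, out: &mut Vec<usize>) {
    match node {
        AabbNode::Leaf { face } => out.push(*face),
        AabbNode::Branch { left, right, .. } => {
            leaf_order(left, out);
            leaf_order(right, out);
        }
    }
}
```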

Community tools generally don’t rebuild AABB trees from scratch – they preserve the existing tree or require external tooling to generate it.


6. Fields That Are NOT Derivable

These fields are user-authored or carried over from tooling. Rakata treats them as opaque payload and never recomputes them anywhere in the pipeline:

| Field | Source |
|---|---|
| Vertex positions, normals, UVs, tangent space | 3D modeller |
| Vertex colors | 3D modeller or material editor |
| Texture names (texture_0, texture_1) | Material assignment |
| Diffuse/ambient colors | Material properties |
| Transparency hint, light_mapped, beaming, etc. | Material flags |
| Surface ID per face | Surface type assignment |
| Vertex indices per face | Mesh topology |
| Controller keyframes | Animation data |
| Bone weights, indices, bonemap | Rigging tool |
| Emitter properties | Particle editor |

7. Tool Cross-Reference: CExoArrayList Naming

The naming across tools is wildly inconsistent:

| Offset | Engine (Ghidra) | rakata | mdledit | mdlops | PyKotor | xoreos |
|---|---|---|---|---|---|---|
| +0x98 | vertex_indices | vertex_indices_array | cTexture3 | pntr_to_vert_num | indices_counts | (skip) |
| +0xA4 | left_over_faces | left_over_faces_array | cTexture4 | pntr_to_vert_loc | indices_offsets | offOffVerts |
| +0xB0 | vertex_indices_count | vertex_indices_count_array | IndexCounterArray | array3 | counters | (skip) |
| +0xBC | mdx_offsets | mdx_offsets_array | IndexLocationArray | (backpatch only) | (not modeled) | offOffVerts |
| +0xC8 | index_buffer_pools | index_buffer_pools_array | MeshInvertedCounterArray | inv_count | (not modeled) | (skip) |

Note: mdledit’s identification of +0x98/+0xA4 as texture name slots is incorrect for KotOR. In NWN, the mesh header has 4 texture name slots (64 bytes each) at this region. KotOR reduced to 2 texture names (32 bytes each at +0x58/+0x78) and repurposed the remaining space as CExoArrayList headers. The CExoArrayLists are always empty (all zeros) in vanilla KotOR, so mdledit’s string-based read/write produces byte-identical results.


8. MDL vs BWM Adjacency Encoding

A critical distinction for anyone working with both formats:

| Property | MDL Face Adjacency | BWM Walkmesh Adjacency |
|---|---|---|
| Storage | u16 per edge | i32 per edge |
| Encoding | Plain face index | face_index * 3 + edge_index |
| No-neighbor | 0xFFFF | -1 (0xFFFFFFFF) |
| Purpose | GL rendering hints | Pathfinding / collision |

BWM’s edge-encoded adjacency tells you not just WHICH face is adjacent, but WHICH EDGE of that face connects – needed for the pathfinding walk algorithm. MDL only needs to know which face, not which edge.
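The BWM edge encoding is easy to get wrong when converting between the two formats, so a round-trip sketch may help. The function names are hypothetical; only the face_index * 3 + edge_index encoding and the -1 sentinel come from the table above.

```rust
/// Encodes a BWM adjacency entry: face_index * 3 + edge_index, or -1 for none.
/// `edge` is expected to be 0, 1, or 2.
pub fn encode_bwm_adjacency(neighbor: Option<(u32, u8)>) -> i32 {
    match neighbor {
        Some((face, edge)) => (face * 3 + u32::from(edge)) as i32,
        None => -1,
    }
}

/// Decodes a BWM adjacency entry back into (face_index, edge_index).
pub fn decode_bwm_adjacency(value: i32) -> Option<(u32, u8)> {
    if value < 0 {
        return None; // -1 sentinel: boundary edge
    }
    let value = value as u32;
    Some((value / 3, (value % 3) as u8))
}
```

Converting a decoded BWM entry to the MDL form means dropping the edge component and mapping None to 0xFFFF.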


9. Write-Order Dependencies

When writing a mesh node, fields must be emitted in a specific order because some fields are content-relative pointers that must be backpatched. The canonical order (from mdledit binarywrite.cpp) is:

  1. Face array (32 bytes per face)
  2. vertex_indices_count data (single u32: face_count * 3)
  3. Content vertex positions (12 bytes per vertex, only for MDL content blob)
  4. mdx_offsets data (single u32: placeholder, backpatched)
  5. index_buffer_pools data (single u32: inverted counter value)
  6. Packed u16 vertex indices (face_count * 3 u16 values)

After step 6, backpatch the mdx_offsets pointer to point to the start of step 6’s data.

CExoArrayList headers at +0x98..+0xC8 are written as part of the mesh extra header (332 bytes), with pointer values backpatched after the data is written.
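Steps 4 to 6 and the final backpatch can be sketched against a plain byte buffer. This is an illustration under assumptions, not the layout code of any tool: the function name is hypothetical, values are written little-endian, and offsets are taken relative to the start of the `content` buffer.

```rust
/// Writes steps 4-6 into `content` and backpatches the mdx_offsets slot.
/// Returns the position of the backpatched u32.
pub fn write_index_tail(content: &mut Vec<u8>, faces: &[[u16; 3]], inverted_counter: u32) -> usize {
    // Step 4: reserve the mdx_offsets slot with a placeholder.
    let slot = content.len();
    content.extend_from_slice(&0u32.to_le_bytes());

    // Step 5: index_buffer_pools data (the inverted counter value).
    content.extend_from_slice(&inverted_counter.to_le_bytes());

    // Step 6: packed u16 vertex indices, three per face.
    let indices_start = content.len();
    for face in faces {
        for index in face {
            content.extend_from_slice(&index.to_le_bytes());
        }
    }

    // Backpatch: point the placeholder at the start of step 6's data.
    content[slot..slot + 4].copy_from_slice(&(indices_start as u32).to_le_bytes());
    slot
}
```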

Texture Formats

KOTOR handles graphics via multiple tailored texture formats. It uses hardware-accelerated DXT compression techniques natively supported by its OpenGL backend.


Implementation Blueprints

This section details the primary texture architectures parsed natively by rakata-formats.

| Format | Name | Layout & Purpose |
|---|---|---|
| TPC | Texture Pack Compressed | A proprietary BioWare wrapper around native DXT-compressed OpenGL texture data. This is the primary format used for all base-game environment and character textures. |
| DDS | DirectDraw Surface | A proprietary BioWare variation of the standard Microsoft DDS format. Rather than utilizing standard headers, the legacy engine requires a bespoke 20-byte magic wrapper. |
| TGA | Truevision Targa | An uncompressed, lossless visual format. Used for rendering crisp UI elements, visual effects (VFX), etc. |
| TXI | Texture Extensions | Plaintext routing files that accompany primary textures. They direct the engine how to apply advanced rendering hints, such as procedural animations or bump-mapping. |

TPC (Texture Pack Compressed)

TPC is the proprietary bundled texture format created by BioWare. It contains the raw DXT-compressed texture data, pre-computed mipmaps, and potentially appended TXI configuration data all in one blob.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .tpc |
| Magic Signature | None |
| Type | Compressed Texture Pack |
| Rust Reference | View rakata_formats::Tpc in Rustdocs |

Data Model Structure

The rakata-formats crate provides a formally mapped Tpc container that completely shields you from managing pixel type bitmasks.

  • Pixel Enum Decoding: Instead of raw integer flag codes, calling known_pixel_format() instantly resolves the byte code into a robust TpcHeaderPixelFormat enumeration (e.g., Dxt1, Dxt5, Rgb, Greyscale).
  • Footer Management: Trailing TXI text is seamlessly maintained, and can be cleanly updated via .set_txi_text_strict().

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for TPC textures mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CAuroraProcessedTexture::ReadProcessedTextureHeader at 0x0070f590.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Format Byte Mapping | The single header format byte acts as a bitmask. The engine checks bit0, bit1, and bit2 to generate the internal format codes 1, 3, and 4. |
| Compression Dispatch | The runtime ignores other variants. It requires Code 3 to process 8-byte blocks (standard S3TC DXT1) or Code 4 to process 16-byte blocks (standard S3TC DXT5). |
| Mipmap Calculations | Rather than parsing explicit counts, the engine calculates mipmap storage dimensions by right-shifting the base dimensions for each level without clamping the result to 1. Because of this, very deep mip levels can compute to 0 bytes. |
| OpenGL Hardware Binding | When uploading the TPC bytes into OpenGL video memory, the engine maps Code 3 directly to OpenGL constant 0x83F0 (DXT1) and Code 4 to 0x83F3 (DXT5). There is no branching logic to support DXT3 (0x83F2) inside the vanilla engine’s parser. |
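The zero-byte mip case can be illustrated with a small sketch. The exact arithmetic the engine uses is not given here, so the tile-count formula below is an assumption chosen to show the effect; the function names are hypothetical.

```rust
/// Bytes for one DXT mip level: one block per 4x4 texel tile.
/// Assumption: tiles are counted as (dim + 3) / 4, so a 0 dimension gives 0 bytes.
pub fn dxt_level_bytes(width: u32, height: u32, block_bytes: u32) -> u32 {
    ((width + 3) / 4) * ((height + 3) / 4) * block_bytes
}

/// Dimensions of a mip level obtained by right-shifting the base size.
/// With `clamp == false` this mirrors the unclamped behavior described above.
pub fn level_dims(base_w: u32, base_h: u32, level: u32, clamp: bool) -> (u32, u32) {
    // checked_shr avoids a panic when level >= 32.
    let shift = |d: u32| d.checked_shr(level).unwrap_or(0);
    let (w, h) = (shift(base_w), shift(base_h));
    if clamp {
        (w.max(1), h.max(1))
    } else {
        (w, h)
    }
}
```

For a 4x4 DXT1 texture, level 3 shifts down to 0x0 and therefore 0 bytes, whereas a clamped calculation would still yield one 8-byte block.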

DDS (DirectDraw Surface)

The .dds extension in KOTOR does not represent a standard Microsoft DirectDraw Surface file. Instead, the engine strictly expects a proprietary format consisting of a bespoke 20-byte configuration prefix followed by raw DXT compression blocks. The vanilla parsing logic completely ignores standard 124-byte DDS magic headers.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .dds |
| Magic Signature | None (Proprietary 20-Byte Prefix) |
| Type | BioWare DirectDraw Wrapper |
| Rust Reference | View rakata_formats::Dds in Rustdocs |

Data Model Structure

rakata-formats is built to natively parse both standard Microsoft DDS architecture and KotOR’s proprietary CResDDS format transparently. When evaluating a .dds file via rakata_formats::Dds:

  • Bilateral Read Path: If the file begins with the standard Microsoft DDS magic bytes, Rakata leverages a standard pipeline to extract the payload. If those magic bytes are missing, Rakata immediately pivots and parses the data natively as a proprietary K1 CResDDS 20-byte payload.
  • Strict Serialization: Regardless of which variation is ingested from the disk, Rakata will strictly emit valid 20-byte KotOR-compliant payloads during binary serialization.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for DDS textures mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CResDDS::GetDDSAttrib at 0x00710ee0.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Prefix Stripping | The engine’s parser expects and strips a proprietary 20-byte header prepended to the DDS buffer: width (+0x00), height (+0x04), byte code (+0x08), base-size (+0x0C), and an alpha_mean FLOAT (+0x10). |
| Block Calculation | The runtime mirrors the TPC logic for memory block sizing. The block size in bytes is determined via the formula: (pixel_type == 4) * 8 + 8. Code 3 evaluates to 8-byte texture blocks, while Code 4 evaluates to 16-byte blocks. |

Tip

Reserved Gaps: The bytes spanning +0x09 to +0x0B in the header prefix are entirely ignored by the GetDDSAttrib read path. We preserve them strictly for round-trip fidelity.
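A reader for the 20-byte prefix might look like the sketch below. The struct and function names are hypothetical and are not the rakata_formats::Dds API; little-endian byte order is an assumption, and the byte code is read as a single byte because +0x09 to +0x0B are described as ignored.

```rust
#[derive(Debug, PartialEq)]
pub struct DdsPrefix {
    pub width: u32,
    pub height: u32,
    pub pixel_type: u8,
    pub reserved: [u8; 3], // +0x09..+0x0B, kept only for round-trip fidelity
    pub base_size: u32,
    pub alpha_mean: f32,
}

/// Parses the proprietary 20-byte prefix (little-endian assumed).
pub fn parse_prefix(bytes: &[u8]) -> Option<DdsPrefix> {
    if bytes.len() < 20 {
        return None;
    }
    let u32_at = |o: usize| u32::from_le_bytes([bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]]);
    Some(DdsPrefix {
        width: u32_at(0x00),
        height: u32_at(0x04),
        pixel_type: bytes[0x08],
        reserved: [bytes[0x09], bytes[0x0A], bytes[0x0B]],
        base_size: u32_at(0x0C),
        alpha_mean: f32::from_bits(u32_at(0x10)),
    })
}

/// Block size in bytes: (pixel_type == 4) * 8 + 8.
pub fn block_bytes(pixel_type: u8) -> u32 {
    u32::from(pixel_type == 4) * 8 + 8
}
```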


TGA (Truevision Targa)

TGA is the standard uncompressed image format utilized by the engine, typically reserved for UI elements, icons, or high-fidelity models that demand lossless alpha channels.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .tga |
| Magic Signature | Truevision Standard |
| Type | Uncompressed RGB/A Raster |
| Rust Reference | View rakata_formats::Tga in Rustdocs |

Data Model Structure

rakata-formats natively emulates the engine’s parsing logic. When evaluating a .tga file, Rakata ignores non-essential Truevision header flags (such as image_type and id_len) and strictly validates the payload against the engine’s natively supported pixel_depth thresholds.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for TGA textures mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from ImageReadTGAHeader at 0x0045e2e0.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Header Stripping | Function: ImageReadTGAHeader (0x0045e2e0). The native engine parser is exceptionally loose. Standard Truevision fields such as image_type (offset +0x02), image_descriptor (offset +0x11, governing the origin bit), and the id_len field are completely ignored and never validated during a read sequence. |
| Depth Validation | Function: ImageReadTGAHeader (0x0045e2e0). The sole structural validation check performed before memory allocation is that pixel_depth must equal 8, 24, or 32. Any other depth triggers an immediate failure. |
| Write Generation | Function: ImageWriteTGA. The engine’s in-memory rasterization is top-left, but its canonical on-disk .tga format is bottom-left. When saving screenshot files or extracting buffers to disk, the engine hardcodes image_type=2, id_len=0, and image_descriptor=0, and applies an ImageFlipY vertical inversion to the memory payload before writing the image to disk. |
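The loose read path can be mimicked with a header check that validates only the pixel depth. This is a sketch with hypothetical names: the offsets follow the standard 18-byte Truevision header (width at +0x0C, height at +0x0E, pixel depth at +0x10), and whether the engine reads width and height exactly this way is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub struct TgaDims {
    pub width: u16,
    pub height: u16,
    pub pixel_depth: u8,
}

/// Reads the dimensions from an 18-byte TGA header, validating only the depth.
/// image_type, id_len, and image_descriptor are intentionally not checked.
pub fn read_tga_dims(header: &[u8]) -> Result<TgaDims, String> {
    if header.len() < 18 {
        return Err("header shorter than 18 bytes".to_string());
    }
    let pixel_depth = header[0x10];
    if !matches!(pixel_depth, 8 | 24 | 32) {
        return Err(format!("unsupported pixel depth {}", pixel_depth));
    }
    Ok(TgaDims {
        width: u16::from_le_bytes([header[0x0C], header[0x0D]]),
        height: u16::from_le_bytes([header[0x0E], header[0x0F]]),
        pixel_depth,
    })
}
```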

TXI (Texture Extensions)

TXI files (or TPC appended arrays) are highly forgiving plain-text metadata blocks applied adjacent to graphical files to enforce custom mipmap, bumpmap, or animation shaders.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .txi |
| Magic Signature | None |
| Type | ASCII Configuration Strings |
| Rust Reference | View rakata_formats::Txi in Rustdocs |

Data Model Structure

rakata-formats inherently pairs TXI payload access alongside its target texture. When querying the virtual resolver, textures are natively returned as a combined TextureWithTxiResult object. This architecture guarantees that the raw graphic bytes and their exact applied TXI rule block are inextricably tracked as a coupled pair throughout the virtual environment.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for TXI text configurations mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CAurTextureBasic::ParseField at 0x00422390.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Invalid Commands | Function: CAurTextureBasic::ParseField (0x00422390). Unknown or unsupported TXI commands are safely bypassed. If the parsed string evaluation fails to match an explicit configuration branch, the subroutine immediately exits without throwing any logger alarms or terminating texture load. |
| Case Agnosticism | Function: CAurTextureBasic::ParseField (0x00422390). Field matching acts strictly case-insensitive (e.g. cMgTxi == cmgtxi). |
| Line Normalization | Function: CAurTextureBasic::ParseField (0x00422390). The native internal engine scanner searches exclusively for LF (\n) bounds. However, if the read targets an active disk file, the underlying standard C fgets call automatically handles CRLF normalization before handing strings to the regex evaluator. |
| Boolean Parsing | Function: Parse_bool (0x00463680). The native Parse_bool validation explicitly performs lowercase scans evaluating against exact variants of "true", "false", "1", or "0". |

Note

Boolean Parsing Nuance: Modding documentation often warns against specific formats or keywords (like decal). Decompilation reveals the universal behavior applied to all boolean flags:

  • Missing Space: Keys merged with their arguments (e.g. "decal1", "mipmap0") silently abort. The firstword() extractor pulls the merged string, completely failing the target evaluation list.
  • Separated Numbers: Space-separated numbers (e.g. "decal 1") are completely structurally valid. firstword() pulls "decal" and hands " 1" off to Parse_bool(). An sscanf strips the whitespace and evaluates "1" to true.
  • Argument-less Flags: Passing just a flag ("decal") triggers the branch, but Parse_bool finds no argument. It fails to match "true", "false", "1", or "0" and silently leaves the boolean unchanged from its previous value.
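The three cases above can be modeled in a few lines. This sketch reproduces only the behaviors listed in this note, with hypothetical function names; it is not a transcription of firstword() or Parse_bool.

```rust
/// Accepts "true", "false", "1", or "0" (case-insensitive) as the first token.
pub fn parse_bool(arg: &str) -> Option<bool> {
    let token = arg.split_whitespace().next()?;
    match token.to_ascii_lowercase().as_str() {
        "true" | "1" => Some(true),
        "false" | "0" => Some(false),
        _ => None,
    }
}

/// Applies one TXI line to a boolean flag named `key`.
/// Returns `current` unchanged when the key does not match or no valid
/// argument follows it.
pub fn apply_flag(line: &str, key: &str, current: bool) -> bool {
    let line = line.trim_start();
    // Split off the first word; a merged "decal1" stays a single word.
    let (first, rest) = match line.split_once(char::is_whitespace) {
        Some((first, rest)) => (first, rest),
        None => (line, ""),
    };
    if !first.eq_ignore_ascii_case(key) {
        return current; // unknown command: silently ignored
    }
    parse_bool(rest).unwrap_or(current)
}
```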

Text & Data Formats

KOTOR heavily relies on structured text and data layouts to manage everything from stat numbers to map meshes. Engine-native evidence for these varied structures (2DA, TLK, VIS, LYT, LTR) is documented below.


Implementation Blueprints

| Specification | Core Focus |
|---|---|
| 2DA (2D Array) | Binary/text relational database format managing core engine rules, constants, and stats. |
| TLK (Talk Table) | Centralized localized string dictionary managing all in-game dialogue and UI text. |
| VIS (Visibility Graph) | ASCII room graph mapping the rendering culling relationships between area geometry rooms. |
| LYT (Layout File) | ASCII configuration defining spatial positioning and linking of a module’s room geometry. |
| LTR (Letter Frequency) | Character-frequency matrices supporting the in-game random name generator algorithms. |

2DA (2D Array)

2DAs are data tables defining the engine’s core rules and constraints (such as item costs and Force powers, which the engine internally stores as spells.2da). They bridge the gap between human-readable text for modding and fast-loading binaries for the final game.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .2da |
| Magic Signature | 2DA / V2.b (Binary) or V2.0 (Text) |
| Type | Tabular Data |
| Rust Reference | View rakata_formats::TwoDa in Rustdocs |

Data Model Structure

The rakata-formats crate parses 2DAs so that binary and text formats look identical to the rest of the application. The TwoDa container lets developers simply retrieve cells using twoda.cell(row, "Label"), completely hiding the inner offset calculations and padding differences between text and binary structures.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for 2D Arrays mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from C2DA::Load2DArray at 0x004143b0.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Magic/Version Gate | The engine first checks for the "2DA " signature. It then branches down a binary parsing path for "V2.b" or a text parsing path for "V2.0". Any other version string triggers an instant load failure. |
| Binary Load (V2.b) | The parser starts with an 8-byte skip into the file (data_ptr = raw_data_ptr + 8), jumping right past the header to the starting newline character. Column headers are a tab-separated, null-terminated block. The cell offsets are then parsed as an array of u16 integers (rows × cols) in row-major order. |
| Text Load (V2.0) | The text parser strips whitespace and newlines, specifically hunting for "DEFAULT:" or "DEFAULT" blocks. When parsing individual cells, the literal text "****" is converted into an empty string "" to signal the fallback rule. Finally, it runs _strlwr on all column headers to immediately convert them to lowercase. |

Tip

Orphaned Size Field: In binary row blocks, the 2-byte cell_data_size u16 is completely bypassed. The engine skips it with +2 and performs no reading or validation.
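The magic/version gate can be expressed as a small detection function. The enum and function names are hypothetical and are not the rakata_formats::TwoDa API.

```rust
#[derive(Debug, PartialEq)]
pub enum TwoDaKind {
    Binary, // "V2.b"
    Text,   // "V2.0"
}

/// Mirrors the gate described above: "2DA " signature, then a known version.
/// Any other signature or version is rejected.
pub fn detect_2da(data: &[u8]) -> Option<TwoDaKind> {
    if data.len() < 8 || &data[0..4] != b"2DA " {
        return None;
    }
    let version = &data[4..8];
    if version == b"V2.b" {
        Some(TwoDaKind::Binary)
    } else if version == b"V2.0" {
        Some(TwoDaKind::Text)
    } else {
        None
    }
}
```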


TLK (Talk Table)

The Talk Table is a massive localized string repository. Every item description, line of dialogue, and UI text in KOTOR references an index (a StrRef) pointing into this master dictionary file.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .tlk |
| Magic Signature | TLK / V3.0 |
| Type | Localized String Bundle |
| Rust Reference | View rakata_formats::Tlk in Rustdocs |

Data Model Structure

The entire Talk Table format maps to the rakata_formats::Tlk struct. Each entry fuses the separated audio and text flags into a single TlkEntry. The struct safely handles missing text flags natively, preventing out-of-bounds string lookups if an entry contains audio parameters but no valid string text offset.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for Talk Tables mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CTlkFile::ReadHeader at 0x0041d890 and CTlkFile::AddFile.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Magic Check | Function: CTlkFile::ReadHeader (0x0041d890). The parser requires a "TLK " signature. However, strict version validation is entirely absent. The engine accepts essentially any version tag without raising a failure. |
| Size Dispatching | Function: CTlkFile::ReadHeader (0x0041d890). While the version isn’t used for rejection, it dynamically determines memory block sizing. A "V3.0" tag dictates 40 bytes (0x28) per entry, whereas any other version tag automatically falls back to 36 bytes (0x24). |
| Feminine Dialects | Function: CTlkFile::AddFile. When mounting the primary archive, the engine systematically queries the directory for a secondary <basename>F.tlk (e.g., dialogF.tlk) specifically to supply overriding feminine vocabulary strings for character-gendered text queries. |
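The magic check and size dispatch combine into a single lookup. A minimal sketch with a hypothetical function name, assuming the signature and version tag occupy the first 8 bytes of the file:

```rust
/// Returns the per-entry size implied by a TLK header, or None when the
/// "TLK " signature is missing. The version tag never causes rejection.
pub fn tlk_entry_size(header: &[u8]) -> Option<usize> {
    if header.len() < 8 || &header[0..4] != b"TLK " {
        return None;
    }
    // "V3.0" selects 40-byte entries; anything else falls back to 36 bytes.
    Some(if &header[4..8] == b"V3.0" { 0x28 } else { 0x24 })
}
```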

VIS (Visibility Graph)

VIS is an ASCII graph structure used extensively by the rendering engine to calculate occlusion culling. It plots mathematical relationships defining which room meshes are visible from any given observer room.

At a Glance

| Property | Value |
|---|---|
| Extension(s) | .vis |
| Magic Signature | None |
| Type | Room Graph |
| Rust Reference | View rakata_formats::Vis in Rustdocs |

Data Model Structure

The rakata-formats crate parses raw VIS text blocks into a strongly typed Vis structure. Rather than storing flat arrays of strings, Vis models room visibility as an adjacency list using BTreeMap<String, BTreeSet<String>>. This structural choice guarantees deterministic lookups while automatically mimicking the engine’s internal deduplication algorithms.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for Visibility graphs mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from Scene::LoadVisibility at 0x004568d0.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Text Loading | Function: Scene::LoadVisibility (0x004568d0). The .vis file is read purely as raw text. The engine extracts observer and child string pairs by looping AurResGetNextLine() over the file buffer. |
| Silent Forgiveness | Function: Scene::LoadVisibility (0x004568d0). If the parser extracts a room reference (either observer or child) that does not exist in the active area layout (which it verifies via a FindRoom call), the visibility entry is quietly dropped without crashing or generating logs. |
| Bidirectional Application | Function: Scene::SetVisibility. Calling SetVisibility(room_a, room_b, 1) maps both directions. The function inserts room_b into room_a’s visibility list, then mirrors the entry by adding room_a to room_b’s list, deduplicating as it goes. |
| Write Generation | Function: Scene::SaveVisibility. When generating a .vis file natively, the engine relies on an _sscanf block structure mapping to "%s%d" and uniformly pads a dual-space indent onto all child elements beneath observer headers. |
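The bidirectional insert and the silent drop of unknown rooms can be sketched as follows. This is a minimal illustration, not the rakata_formats::Vis API: the VisGraph type and its add_room, set_visible, and visible_from methods are hypothetical names invented for this example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a visibility graph (not the real `Vis` type).
#[derive(Debug, Default)]
pub struct VisGraph {
    rooms: BTreeMap<String, BTreeSet<String>>,
}

impl VisGraph {
    /// Registers a room that exists in the area layout.
    pub fn add_room(&mut self, name: &str) {
        self.rooms.entry(name.to_string()).or_default();
    }

    /// Records visibility in both directions. Pairs that mention an unknown
    /// room are dropped silently; the return value only reports acceptance.
    pub fn set_visible(&mut self, a: &str, b: &str) -> bool {
        if !self.rooms.contains_key(a) || !self.rooms.contains_key(b) {
            return false;
        }
        // BTreeSet::insert ignores duplicates, which gives deduplication for free.
        self.rooms.get_mut(a).unwrap().insert(b.to_string());
        self.rooms.get_mut(b).unwrap().insert(a.to_string());
        true
    }

    /// Rooms visible from `room`, in sorted order.
    pub fn visible_from(&self, room: &str) -> Vec<&str> {
        self.rooms
            .get(room)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Calling set_visible twice with the same pair leaves a single entry on each side, which is the property the adjacency-set model relies on.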

LYT (Layout File)

LYT files are ASCII configuration arrays that define the spatial 3D placement and orientation of independent room models to construct a complete area map.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .lyt |
| Magic Signature | None |
| Type | Plain Text Layout |
| Rust Reference | View rakata_formats::Lyt in Rustdocs |

Data Model Structure

The rakata-formats crate parses LYT files into the strongly-typed Lyt container. The parser segregates the raw nested lines into distinct rooms, tracks, obstacles, and doorhooks collections, natively mapping coordinate strings into engine-standard Vec3 and Quaternion structs for immediate mathematical interoperability.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for Layout configurations mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CLYT::LoadLayout at 0x005de900.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Newline Bounds | The parser expects explicit \r\n (CRLF) endings. Scanning extracts target strings using _sscanf("%[^\r\n]", ...) patterns and frequently relies on blind +2 byte pointer leaps to manually clear the terminators. |
| Preamble Skipping | All lines that appear before the beginlayout marker (such as the ubiquitous #MAXLAYOUT ASCII header) are deliberately skipped and ignored. |
| Sequential Parsing | The structure mandates rigid sequential ingestion. Data collections must appear in exactly this order: roomcount → trackcount → obstaclecount → doorhookcount → donelayout. |

Warning

Boundary Oversight: While the engine systematically verifies donelayout boundaries separating the primary collections, the underlying parse loop neglects to verify the final donelayout signature upon closing the doorhooks segment.
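A rough sketch of the preamble skip and the fixed section order, assuming each section header has the form `keyword <count>` on its own line. That header shape, the sample text, and the function names layout_body and section_counts are assumptions for illustration; this is not the rakata_formats::Lyt parser, and it is stricter than the engine in that it requires all four sections.

```rust
/// Section keywords in the order the engine expects them.
const SECTION_ORDER: [&str; 4] = ["roomcount", "trackcount", "obstaclecount", "doorhookcount"];

/// Returns the non-empty lines that follow `beginlayout`, skipping the preamble.
pub fn layout_body(text: &str) -> Option<Vec<&str>> {
    let mut lines = text.lines().map(str::trim);
    // Consume everything up to and including the marker line.
    lines.by_ref().find(|l| *l == "beginlayout")?;
    Some(lines.filter(|l| !l.is_empty()).collect())
}

/// Collects `(keyword, count)` pairs and rejects files whose sections are
/// missing or out of order. Assumes headers look like `roomcount 3`.
pub fn section_counts(text: &str) -> Result<Vec<(String, usize)>, String> {
    let body = layout_body(text).ok_or("missing beginlayout")?;
    let mut found = Vec::new();
    for line in body {
        let mut parts = line.split_whitespace();
        let Some(word) = parts.next() else { continue };
        if word == "donelayout" {
            break;
        }
        if SECTION_ORDER.contains(&word) {
            let count = parts
                .next()
                .and_then(|n| n.parse::<usize>().ok())
                .ok_or_else(|| format!("bad count for {word}"))?;
            found.push((word.to_string(), count));
        }
    }
    let order: Vec<&str> = found.iter().map(|(k, _)| k.as_str()).collect();
    if order != SECTION_ORDER {
        return Err(format!("sections missing or out of order: {order:?}"));
    }
    Ok(found)
}
```

Because str::lines accepts both \n and \r\n, this sketch is more forgiving about line endings than the engine's CRLF-dependent pointer arithmetic.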


LTR (Letter Frequency)

LTR files contain matrices defining the probabilistic sequence groupings of letters used by the engine’s random name generator.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .ltr |
| Magic Signature | LTR / V1.0 |
| Type | Naming State Matrix |
| Rust Reference | View rakata_formats::Ltr in Rustdocs |

Data Model Structure

The rakata-formats crate maps character frequency architectures directly into the strongly-typed Ltr container, safely abstracting away the fallible raw string-parsing logic for downstream implementations.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for Letter Frequency structures mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CResLTR::OnResourceServiced at 0x00712410.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Magic Validation | The native parser requires the "LTR " signature and strictly validates the "V1.0" format tag. Together with the letter_count byte at offset +0x08, these form a rigid 9-byte header. |
| Contiguous Ingestion | Payload extraction begins immediately at offset +0x09. The parser sequentially reads the chained start, middle, and end blocks that make up the procedural probability matrices. |
| Payload Bounds Check | After the reads complete, the loader verifies that the final parse offset exactly matches the total byte length of the buffer. |
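The 9-byte header check can be sketched in a few lines. The LtrHeader type and read_ltr_header function are hypothetical names for this example and are not taken from rakata-formats; the probability tables that follow the header are left unparsed.

```rust
/// Hypothetical header view: only the field the text above describes.
pub struct LtrHeader {
    pub letter_count: u8,
}

/// Validates "LTR " + "V1.0", reads letter_count at +0x08, and returns the
/// payload that starts at +0x09.
pub fn read_ltr_header(data: &[u8]) -> Result<(LtrHeader, &[u8]), &'static str> {
    if data.len() < 9 {
        return Err("buffer shorter than the 9-byte header");
    }
    if &data[0..4] != b"LTR " {
        return Err("bad signature");
    }
    if &data[4..8] != b"V1.0" {
        return Err("bad version tag");
    }
    Ok((LtrHeader { letter_count: data[8] }, &data[9..]))
}
```

A full reader would go on to consume the start, middle, and end blocks and then confirm that no payload bytes remain, mirroring the engine's final bounds check.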

Audio Formats

KOTOR handles audio via specialized implementations of the Miles Sound System, utilizing specific prefix wrappers for streaming dialogue, sound effects, and lip-syncing animations.


Implementation Blueprints

| Specification | Core Focus |
| --- | --- |
| WAV (Waveform Audio) | Modified audio streams typically utilizing a proprietary Miles Sound System prefix wrapper. |
| LIP (Lip Synching) | Timed phonetic animation sequence data mapped explicitly to character speech tracks. |
| SSF (Sound Set File) | Mapping configuration assigning specific audio events to standard creature interaction triggers (e.g., attacking or dying). |

WAV (Waveform Audio)

While standard RIFF WAV files are supported, KOTOR utilizes a multi-tiered routing structure to evaluate audio buffers dynamically based on whether the file encapsulates voice-overs (VO), ambient sound effects (SFX), or unmodified bytes.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .wav |
| Magic Signature | RIFF |
| Type | Streamed / Buffered Audio |
| Rust Reference | View rakata_formats::Wav in Rustdocs |

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for audio formats mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoSoundInternal::LoadSoundProvider at 0x005d9140.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Standard Audio (WAV) | If the payload begins with the exact "RIFF" 4-byte signature and is not classified as an MP3 track, the parser starts at offset 0 and passes the contiguous buffer to the Miles Sound System unmodified. |
| Ambient Audio (SFX) | For an SFX file, the "RIFF" signature is absent from offset 0. A proprietary prefix displaces the standard "RIFF" block exactly 470 bytes into the payload buffer (+0x1d6). The engine calculates size = file_size - 0x1d6 and extracts only the resulting sub-segment. |
| Voice Audio (VO) | For streaming voice-over tracks, the .wav wrapper also begins with a "RIFF" tag, but the check riff_size + 8 < file_size succeeds, meaning more data follows the RIFF block. The engine seeks to byte offset riff_size + 8 and pipes the remaining data as a literal .mp3 stream. |
| Delegation Hand-off | The main executable acts as a dispatch router and performs almost no chunk parsing itself. Deep RIFF chunk deserialization is deferred unconditionally to the external Miles Sound System layer. |
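The three routes can be expressed as a small classifier. This is a sketch under stated assumptions: riff_size is read as the little-endian u32 at offset 4 (the standard RIFF layout), any file that does not start with "RIFF" and is longer than the prefix is treated as SFX, and the WavRoute enum and route_wav function are hypothetical names rather than the rakata_formats::Wav API.

```rust
/// Which slice of the file would be handed to the audio layer.
#[derive(Debug, PartialEq)]
pub enum WavRoute<'a> {
    /// Plain RIFF file, passed through from offset 0.
    Standard(&'a [u8]),
    /// Prefixed SFX file, payload starts at +0x1d6.
    Sfx(&'a [u8]),
    /// RIFF stub followed by raw MP3 data at riff_size + 8.
    VoiceMp3(&'a [u8]),
}

/// Length of the proprietary SFX prefix (470 bytes).
const SFX_PREFIX_LEN: usize = 0x1d6;

pub fn route_wav(data: &[u8]) -> Option<WavRoute<'_>> {
    if data.len() >= 8 && &data[0..4] == b"RIFF" {
        let riff_size = u32::from_le_bytes([data[4], data[5], data[6], data[7]]) as usize;
        let mp3_start = riff_size.checked_add(8)?;
        if mp3_start < data.len() {
            // Data continues past the declared RIFF block: voice-over route.
            return Some(WavRoute::VoiceMp3(&data[mp3_start..]));
        }
        return Some(WavRoute::Standard(data));
    }
    if data.len() > SFX_PREFIX_LEN {
        // Assumption: a non-RIFF file is SFX; the prefix itself is not validated here.
        return Some(WavRoute::Sfx(&data[SFX_PREFIX_LEN..]));
    }
    None
}
```

The returned slice length for the SFX route equals file_size - 0x1d6, matching the size calculation in the table.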

LIP (Lip Synching)

LIP files provide keyframed facial morph data directly bound to audio streams, instructing character models how to physically animate their mouths to match speech.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .lip |
| Magic Signature | LIP V1.0 |
| Type | Facial Animation Keyframes |
| Rust Reference | View rakata_formats::Lip in Rustdocs |

Data Model Structure

The rakata-formats crate maps LIP binaries into the Lip structure. It extracts the raw 5-byte sequential keyframe array and cleanly projects it into a format that pairs each chronological float timestamp directly with its localized mouth shape.

Structural Layout

| Offset | Type | Description |
| --- | --- | --- |
| 0x00 | CHAR[8] | Signature (LIP V1.0) |
| 0x08 | FLOAT | Animation Length |
| 0x0C | DWORD | Entry Count |
| 0x10 | Struct[] | Keyframe Array (5 bytes per entry) |

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for Lip Synching animations mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CLIP::LoadLip at 0x0070c590.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Zero-Copy Loading | The engine handles LIP files as completely flat structures. Instead of parsing the variables out individually, it simply verifies the "LIP V1.0" signature and pulls the animation length and entry count directly from offsets +0x08 and +0x0C. |
| Direct Array Assignment | The keyframes are packed into identical 5-byte chunks (a 4-byte float for the timestamp, and a 1-byte integer determining the mouth shape). Because of this flat layout, the engine never loops through the data to read it. It points its internal animation pointer at file offset +0x10 and runs the animation straight off the raw file buffer. |
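Unlike the engine, a safe Rust reader has to copy the packed 5-byte records out of the buffer. A minimal sketch following the layout table above, assuming little-endian fields; LipFile, LipKeyframe, and parse_lip are hypothetical names and not the rakata_formats::Lip API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct LipKeyframe {
    pub time: f32,
    pub shape: u8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LipFile {
    pub length: f32,
    pub keyframes: Vec<LipKeyframe>,
}

pub fn parse_lip(data: &[u8]) -> Result<LipFile, &'static str> {
    if data.len() < 16 || &data[0..8] != b"LIP V1.0" {
        return Err("missing LIP V1.0 header");
    }
    // +0x08: animation length (f32), +0x0C: entry count (u32).
    let length = f32::from_le_bytes([data[8], data[9], data[10], data[11]]);
    let count = u32::from_le_bytes([data[12], data[13], data[14], data[15]]) as usize;
    let needed = count.checked_mul(5).ok_or("entry count overflows")?;
    let body = data[16..].get(..needed).ok_or("keyframe array truncated")?;
    // Each record: 4-byte float timestamp followed by a 1-byte mouth shape.
    let keyframes = body
        .chunks_exact(5)
        .map(|c| LipKeyframe {
            time: f32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            shape: c[4],
        })
        .collect();
    Ok(LipFile { length, keyframes })
}
```

The records cannot be reinterpreted as a Rust struct slice directly because a 5-byte stride leaves the float unaligned, which is why the sketch decodes field by field.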

SSF (Sound Set File)

Sound sets map generic triggers (e.g. “Battle Cry”, “Agony”, “Selected”) to concrete sound references by associating each enum slot with a string reference.

At a Glance

| Property | Value |
| --- | --- |
| Extension(s) | .ssf |
| Magic Signature | None |
| Type | Enum-String Mapping |
| Rust Reference | View rakata_formats::Ssf in Rustdocs |

Data Model Structure

The rakata-formats crate maps SSF files into the Ssf structure. It parses the raw table offset and builds a collection of 28 nullable sound reference integers mapped directly back to their standard gameplay triggers.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for Sound Set mappings mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSoundSet::GetStrres at 0x00678820.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Finding the Table | The parser reads a single 4-byte integer (DWORD) at offset +0x08. This number is a direct offset telling the game where the audio mapping table begins inside the file payload. |
| Reading the Slots | Starting at that offset, the engine reads exactly 28 contiguous integers. Each position in this span represents a hardcoded character action (e.g. slot 1 is always ‘Battle Cry’, slot 2 is always ‘Agony’). |
| Handling Blanks | Not every character has recorded audio for every trigger. An empty slot holds the sentinel value 0xFFFFFFFF (-1), which tells the engine to skip playback. |

Note

1-Indexed Triggers: When modders fire off audio events using gameplay scripts, the event identifiers are 1-indexed (1 to 28). To find the matching audio string, the engine subtracts 1 behind the scenes to index the 0-based array in memory.
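The table lookup, the empty-slot sentinel, and the 1-indexed event mapping fit in a short sketch. It assumes little-endian u32 values; read_sound_table and lookup are hypothetical names, not the rakata_formats::Ssf API.

```rust
pub const SLOT_COUNT: usize = 28;
pub const EMPTY_SLOT: u32 = 0xFFFF_FFFF;

/// Reads the table offset at +0x08, then 28 u32 slots starting at that offset.
/// Empty slots (0xFFFFFFFF) become `None`.
pub fn read_sound_table(data: &[u8]) -> Option<[Option<u32>; SLOT_COUNT]> {
    let off = data.get(8..12)?;
    let table_off = u32::from_le_bytes([off[0], off[1], off[2], off[3]]) as usize;
    let end = table_off.checked_add(SLOT_COUNT * 4)?;
    let table = data.get(table_off..end)?;
    let mut slots = [None; SLOT_COUNT];
    for (slot, raw) in slots.iter_mut().zip(table.chunks_exact(4)) {
        let value = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
        *slot = if value == EMPTY_SLOT { None } else { Some(value) };
    }
    Some(slots)
}

/// Maps a 1-indexed script event id (1..=28) onto the 0-indexed slot array.
pub fn lookup(slots: &[Option<u32>; SLOT_COUNT], event_id: usize) -> Option<u32> {
    if event_id == 0 {
        return None;
    }
    slots.get(event_id - 1).copied().flatten()
}
```

Modelling each slot as Option<u32> keeps the sentinel out of caller code; a writer would map None back to 0xFFFFFFFF.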

Resource System & Resolution

The Odyssey Engine’s resource resolution dictates exactly how the game searches for files when it needs to render a texture, load a module, or mount a script – including the exact precedence logic when multiple mods attempt to overwrite the same asset.


TXI Sidecar Lookup

Texture Extensions (TXIs) are independent ASCII text configurations used to override material instructions for specific graphics.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for TXI sidecar files mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from AurResGet at 0x0044c740.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Global Callback | When the game needs a TXI file, it always routes through a global helper calling AurResGet(name, ".txi", ..., true). Three different rendering systems use this exact same path to hunt for TXIs: CAurTextureBasic::Init, Gob::EnableRenderBumpedOut, and Material::Init. |
| Total Independence | Because AurResGet only checks the raw filename and the .txi extension, it performs a totally fresh, global search through the game’s file systems. It does not know or care where the parent texture actually came from (like a specific BIF archive). |

Note

Because it is entirely independent from the parent texture handle, swkotor.exe supports pulling a TXI from the /override folder even if the parent texture was sourced natively from a KEY/BIF package. Rakata maintains this independent sidecar lookup model natively via the rakata_extract::resolver::TextureWithTxiResult logic to guarantee resolver parity.


Key/BIF Resolution Mapping

The engine has a strict hierarchical override order when hunting for identical overlapping resource identifiers across multiple virtual disk mounts.

Engine Audits & Decompilation

The following documents the engine’s exact resource directory search order mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoKeyTable::FindKey at 0x0040ec50.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| First-Match Exit | When hunting for a file, the key table loops through standard folders in a hardcoded order. The second it finds a matching file name, FindKey returns success and completely ignores any duplicates hiding deeper in other archives. |
| Duplicate Checking | During startup, the engine’s AddKey function actually scans for duplicates. If it finds one, it ignores it, permanently locking in the file that had the higher resolution priority. |

Tip

Resolution Priority:
resource_directory (Override folder) → ERF (Pass 1) → RIMERF (Pass 2) → Fixed / Archive
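First-match resolution reduces to walking an ordered list of sources and stopping at the first hit. The sketch below is illustrative only: Source and find_first are hypothetical names, the in-memory maps stand in for real archives, and the caller is assumed to supply the sources already sorted from highest to lowest priority.

```rust
use std::collections::HashMap;

/// Hypothetical resource source: a label plus an in-memory name-to-bytes map.
pub struct Source {
    pub label: &'static str,
    pub entries: HashMap<String, Vec<u8>>,
}

/// Returns the first source (in priority order) that holds `name`,
/// ignoring any duplicates in lower-priority sources.
pub fn find_first<'a>(sources: &'a [Source], name: &str) -> Option<(&'static str, &'a [u8])> {
    sources
        .iter()
        .find_map(|s| s.entries.get(name).map(|bytes| (s.label, bytes.as_slice())))
}
```

Because each lookup is keyed only by name, a texture and its TXI sidecar resolve independently: the texture can come from a low-priority archive while the TXI is picked up from the override folder.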


Module Loading Priorities

Modules orchestrate KOTOR’s area hubs. They are layered collections of ERF/RIM files functioning as a localized state.

Composition Loading Precedence

Because KOTOR modules are often split across multiple archive files (e.g., separating rigid layouts from variable area dialog), the engine uses the following precedence when constructing a single “virtual” module. The list is ordered from highest priority to lowest.

1. <root>_dlg.erf (K2 Dialog overrides)
2. <root>_s.rim (Supplemental properties)
3. <root>_a.rim (Base Area) if present, ELSE <root>_adx.rim (Extended Area) if present, ELSE <root>.rim (Main/Vanilla)
4. <root>.mod (Single-file Mod archive)
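The precedence list above can be turned into a candidate-file resolver. This is a sketch rather than the CompositeModule implementation: module_layers is a hypothetical function, and the exists callback stands in for a filesystem check so the example stays self-contained.

```rust
/// Returns the archive file names that make up a module, highest priority first.
/// `exists` reports whether a given file name is present.
pub fn module_layers(root: &str, exists: impl Fn(&str) -> bool) -> Vec<String> {
    let mut layers = Vec::new();

    // Priorities 1 and 2: dialog overrides, then supplemental properties.
    for suffix in ["_dlg.erf", "_s.rim"] {
        let name = format!("{root}{suffix}");
        if exists(name.as_str()) {
            layers.push(name);
        }
    }

    // Priority 3: exactly one area archive, chosen by the if/else chain.
    let area = ["_a.rim", "_adx.rim", ".rim"]
        .iter()
        .map(|suffix| format!("{root}{suffix}"))
        .find(|name| exists(name.as_str()));
    if let Some(name) = area {
        layers.push(name);
    }

    // Priority 4: the single-file .mod archive.
    let single = format!("{root}.mod");
    if exists(single.as_str()) {
        layers.push(single);
    }
    layers
}
```

A merge step would then visit the returned layers in order and keep the first copy of each resource it encounters.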

Tip

Rust Integration: The rakata-extract crate replicates this exact priority order through the CompositeModule struct. When you pass a directory path to CompositeModule::load_from_directory, it automatically scans the folder and merges the _dlg, _s, _a / _adx, and base .mod files together using the engine’s strict precedence hierarchy.

Engine Audits & Decompilation

The following documents the engine’s exact load sequence for module assemblies mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoResMan::AsyncLoad at 0x004094a0.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Primary MOD Search | The game first attempts to load the highest-level package by explicitly targeting the MODULES:<root>.mod path. |
| RIM Fallback Chains | If the .mod file doesn’t exist, the system catches the failure and immediately shifts to look for the <root>_s.rim fallback. |
| Area Extension Probes | Throughout the module loading process, the engine actively probes for the <root>_a.rim and <root>_adx.rim extension files to merge in the physical area geometry. |

Save Game

While KotOR Save Games (.sav) are structurally just ERF containers under the hood, the engine employs complex party-synchronization and module loading logic to physically reconstruct the player’s session.


Data Model Structure

A standard Save ERF container packages a specific set of internal GUI and logic files that the game actively requires to reconstruct a valid player state.

savenfo.res

The overarching save metadata block, primarily responsible for the main menu UI.

  • save_name: Display string for the save file.
  • pc_name: (Optional in K1) Player character name.
  • area_name: Localized display name for the area.
  • last_module: The resref of the specific module being loaded.
  • time_played: Running game time.
  • cheat_used: Global boolean flag to mark corrupted/cheat sessions.

globalvars.res

The universal state trackers running the campaign plot.

  • Segmented explicitly into numbers and booleans.
  • Each global uses a strict Symbol Name.

partytable.res

The live snapshot of the physical adventuring group and global resources:

  • Shared credits and shared party_xp.
  • cheat_used: Independent table-specific cheat flag.
  • members: Fixed list of party members tracking who is currently active and which member is flagged is_leader.
  • journal_entries: Currently active quest plot_ids and their numeric stages.

Additional Constituents

  • Character List: Populated using a mix of sources (leader, pc, and availnpc* resources).
  • Inventory: Represents all items the player carries (inventory.res), tracking stack sizes, charges, and upgrade bitfields.
  • Doors: Transient state (locked/open attributes) extracted directly from the module’s GIT.

Tip

Rust Integration: The rakata-save crate handles this structure through the SaveEditorModel struct. You can use it to directly parse, validate, and write back these internal save components without manually managing the ERF layer.


Engine Audits & Decompilation

The following documents the engine’s exact state restoration logic mapped from swkotor.exe.

(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWPartyTable::SaveTableInfo at 0x005648c0.)

| Pipeline Event | Ghidra Provenance & Engine Behavior |
| --- | --- |
| Cheat Synchronization | The CHEATUSED flag in the savenfo header file isn’t tracked in isolation. When saving the game, the engine simply copies the raw PT_CHEAT_USED numeric flag directly from the party table to keep the UI in sync. |
| Character Loading Hierarchy | When the engine pushes a character into the game during a standard load (via LoadCharacterFromIFO), it defaults to pulling character data strictly from the active module’s IFO roster matrix. It only falls back to reading the save’s dedicated .pifo (Party Info) file if the target parameter index explicitly equals 0xffffffff. |
| Area Restoration | To rebuild the dynamic state of the room you were standing in (like which doors are open or locked), the engine restores the area’s Game Instance data by targeting the GIT resource type (0x7E7) and matching it against the module’s core resref string. |

Warning

There is zero K1 runtime string evidence for K2 (The Sith Lords) crafting or influence fields (PT_ITEM_COMPONEN, PT_INFLUENCE, UpgradeSlot*). K1-native party utilities must exclude those fields.

Engine Internals

This section contains notes and breakdowns of the Odyssey engine’s execution pipelines, case studies on community tooling bugs, and other engine-level logic or behaviors that are discovered during clean-room reverse engineering. These notes partially serve as the foundational research powering rakata-lint.

Research Notes

| Topic | Description |
| --- | --- |
| MDL & MDX Deep Dive | Deep dive into the Ghidra decompilation notes detailing the exact byte-level layout of the binary MDL/MDX format and the engine loading pipeline. |
| GFF List Corruption | Case study analyzing out-of-bounds GFF list behavior in the Odyssey engine vs. loose community tooling abstractions. |

GFF List Index Corruption

Summary

A binary GFF writer can silently corrupt list mapping if it writes list index entries in a way that allows recursive nested-list writes to interleave with the parent list’s index block.

This is a compatibility-critical issue for KOTOR data because many resources depend on stable list ordering and correct struct index mapping.

How GFF Lists Work

In binary GFF, a List field stores:

  1. A relative offset into the list_indices table.
  2. At that offset:
    1. count (u32)
    2. count struct indices (u32 each), each pointing into the struct table.

If these indices are wrong, the parser will load the wrong list structs.

Failure Mode

The bug class occurs when a writer:

  1. Starts writing a parent list.
  2. Recursively builds child structs.
  3. Appends list indices directly while recursion is still producing nested list index data.

Because nested lists also write into the same list_indices buffer, parent and child index blocks can interleave and the parent list can point at unintended structs.

Observable Symptoms

  • Struct IDs in list entries change after roundtrip.
  • Expected fields are missing from entries after roundtrip.
  • Mod compatibility breaks for list-heavy resources due to reordered/remapped entries.

Correct Writer Strategy

For each list field:

  1. Write list count.
  2. Reserve contiguous slots for all struct indices up front.
  3. Build each child struct recursively.
  4. Backfill each reserved slot with the final struct index.

This guarantees parent list index layout is stable even when nested lists write their own index blocks.
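The reserve-and-backfill steps above can be sketched as follows. This is a simplified model and not the code in rakata-formats/src/gff/writer.rs: StructNode and Writer are hypothetical types, only list fields are modelled, and offsets into list_indices are counted in u32 entries rather than bytes.

```rust
/// Hypothetical struct node that only carries list fields.
pub struct StructNode {
    pub lists: Vec<Vec<StructNode>>,
}

#[derive(Default)]
pub struct Writer {
    /// Number of structs emitted so far; the next struct gets this index.
    pub struct_count: u32,
    /// Shared list-indices buffer used by every list in the file.
    pub list_indices: Vec<u32>,
}

impl Writer {
    /// Emits one struct (and, recursively, its lists); returns its struct index.
    pub fn write_struct(&mut self, node: &StructNode) -> u32 {
        let index = self.struct_count;
        self.struct_count += 1;
        for list in &node.lists {
            self.write_list(list);
        }
        index
    }

    /// Writes one list block and returns its starting position in `list_indices`.
    pub fn write_list(&mut self, items: &[StructNode]) -> usize {
        let start = self.list_indices.len();
        // 1. Write the count.
        self.list_indices.push(items.len() as u32);
        // 2. Reserve contiguous slots before any recursion happens.
        let first_slot = self.list_indices.len();
        self.list_indices.resize(first_slot + items.len(), u32::MAX);
        for (i, child) in items.iter().enumerate() {
            // 3. Recursion may append nested blocks, but only after the reserved slots.
            let child_index = self.write_struct(child);
            // 4. Backfill the reserved slot with the final struct index.
            self.list_indices[first_slot + i] = child_index;
        }
        start
    }
}
```

For a root whose list holds two children, the first of which owns a nested one-element list, the buffer ends up as [2, 1, 3, 1, 2]: the parent block (2, 1, 3) stays contiguous and the nested block (1, 2) follows it.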

Implementation Status

In this repository:

  • rakata-formats/src/gff/writer.rs reserves list index slots and backfills them.
  • Regression tests cover:
    • synthetic list order + struct-id stability
    • UTC fixture roundtrip stability on lists like FeatList, ItemList, ClassList.
  • rakata-generics/src/utc.rs includes a no-op rebuild test to ensure typed conversion does not drift list order/IDs.

The MDL/MDX Format

BioWare’s Aurora/Odyssey engine stores 3D models in a pair of files: .mdl and .mdx. This page documents what’s inside them, how the engine consumes them, and – occasionally – why they look the way they do. Evidence throughout is drawn from Ghidra decompilation of swkotor.exe (K1 GOG build), cross-checked against hex dumps of vanilla assets and community references (kotorblender, mdledit, mdlops, pykotor, reone, xoreos).

Overview

At a glance:

| Property | Value |
| --- | --- |
| Extensions | .mdl, .mdx |
| Magic | Binary: first u32 == 0. ASCII: text (filedependancy, newmodel, …) |
| Type | Hierarchical scene graph + animation + vertex data |
| Resource type ID | 2002 (MDL), 3008 (MDX) in KEY/BIF |
| Rust reference | rakata_formats::Mdl |

A model is a tree of nodes. Each node carries a transform (position + orientation), an animation track (“controllers”), and – depending on its type – geometry, light parameters, particle-emitter configuration, a skinning skeleton, a lightsaber blade, and so on. One MDL file can carry multiple named animations that operate on that tree.

The surprising shape of the format only makes sense once you understand one design choice, so let’s start there.

The core idea: load-and-fixup

The binary MDL is not a parsed format in the usual sense. The engine does not walk a byte stream field by field, calling read_u32, read_string, read_float. Instead, it does this:

  1. Allocate a buffer exactly the size of the model data.
  2. Copy the whole file into that buffer in one memcpy.
  3. Walk the now-in-memory structure and convert relative offsets into absolute pointers.

That’s it. The “parser” is a pointer rewriter. Every Reset* function you’ll see in the engine (InputBinary::Reset, ResetMdlNode, ResetTriMeshParts, …) takes a buffer base pointer and a struct pointer, and its job is essentially struct->field += base for every relocatable pointer in the struct, recursing into children as it goes.

An analogy: think of IKEA instructions that say “screw part A into the hole next to part B” rather than giving exact millimetre coordinates. The instructions are valid anywhere you choose to assemble the furniture. The MDL blob works the same way: every pointer is expressed relative to the blob’s origin, so the engine can drop the blob anywhere in memory and then do a one-time pass to convert those relative offsets to real addresses.
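The relocation pass can be simulated in safe Rust by treating the load address as a plain number. This is a conceptual sketch: relocate and unrelocate are hypothetical helpers, the list of pointer field offsets is supplied by the caller, and skipping zero values (treating them as null) is an assumption generalised from the "relocated if non-zero" notes elsewhere on this page.

```rust
/// Adds `base` to every non-zero u32 stored at the given field offsets,
/// turning blob-relative offsets into absolute addresses.
/// Panics if a field offset lies outside the blob.
pub fn relocate(blob: &mut [u8], base: u32, pointer_fields: &[usize]) {
    for &field in pointer_fields {
        let bytes = &mut blob[field..field + 4];
        let rel = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        if rel != 0 {
            bytes.copy_from_slice(&rel.wrapping_add(base).to_le_bytes());
        }
    }
}

/// Inverse pass: subtracts `base` again, as a writer must before serializing.
pub fn unrelocate(blob: &mut [u8], base: u32, pointer_fields: &[usize]) {
    for &field in pointer_fields {
        let bytes = &mut blob[field..field + 4];
        let abs = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        if abs != 0 {
            bytes.copy_from_slice(&abs.wrapping_sub(base).to_le_bytes());
        }
    }
}
```

A Rust reader would normally skip this rewrite entirely and follow the relative offsets directly as indices into the blob.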

This design choice ripples through everything:

  • On-disk layout matches in-memory layout exactly. If a MdlNodeTriMesh is 412 bytes in RAM, it’s 412 bytes on disk. Struct field offsets you see in a Ghidra decompilation are the file offsets.
  • Binary files are architecture-bound. This format is a snapshot of a specific compiler’s struct layout on 32-bit Windows. Field alignment, pointer size (4 bytes), endianness (little), and even padding bytes all match that ABI.
  • “Parsing” is really validation + relocation. A Rust reader doesn’t need to convert a byte stream into a Rust struct; it needs to interpret a memory image as a struct overlay, following pointers to walk the tree.
  • The engine never writes binary MDL. The shipping engine only has code to emit ASCII MDL. Binary MDL is produced exclusively by BioWare’s model compiler (a build-time tool). The runtime reads it but never round-trips it.

With that frame in place, the rest of the format falls into shape.

File structure

The 12-byte wrapper

The file begins with a tiny header:

| Offset | Type | Field | Notes |
| --- | --- | --- | --- |
| +0x00 | u32 | zero marker | Always 0. Used to tell binary from ASCII. |
| +0x04 | u32 | MDL content size | Bytes of model data that follow. |
| +0x08 | u32 | MDX file size | Size of the accompanying .mdx file. |

Input::Read at 0x004a14b0 is the dispatcher: it peeks at the first byte, and if it’s \0 the file is binary (the first u32 is always zero). Otherwise the file starts with ASCII tokens like filedependancy or newmodel, and processing hands off to a line-based interpreter.

For binary files, InputBinary::Read at 0x004a1260 does the rest:

  1. Record mdl_content_size and mdx_file_size from the wrapper.
  2. Allocate a heap buffer the size of the MDL content; memcpy the model data into it.
  3. If MDX size is non-zero, allocate a second buffer and memcpy the MDX file into it.
  4. Call Reset(mdl_buf, mdx_buf, resource_handle).

Note: the wrapper is not part of the model data. Byte 12 of the on-disk file is byte 0 of the in-memory MDL blob. All internal offsets are relative to the in-memory origin.
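Splitting the wrapper from the model blob might look like the sketch below. MdlWrapper and split_wrapper are hypothetical names, not part of rakata_formats::Mdl, and the fields are assumed to be little-endian as on the 32-bit Windows target.

```rust
/// The two size fields carried by the 12-byte wrapper.
pub struct MdlWrapper {
    pub mdl_size: u32,
    pub mdx_size: u32,
}

/// Checks the zero marker and returns the wrapper plus the model blob.
/// Byte 12 of the file becomes byte 0 of the returned slice.
pub fn split_wrapper(file: &[u8]) -> Result<(MdlWrapper, &[u8]), &'static str> {
    if file.len() < 12 {
        return Err("file shorter than the 12-byte wrapper");
    }
    let word = |at: usize| u32::from_le_bytes([file[at], file[at + 1], file[at + 2], file[at + 3]]);
    if word(0) != 0 {
        return Err("not a binary MDL (first u32 is non-zero)");
    }
    let wrapper = MdlWrapper { mdl_size: word(4), mdx_size: word(8) };
    let end = 12usize
        .checked_add(wrapper.mdl_size as usize)
        .ok_or("MDL content size overflows")?;
    let blob = file.get(12..end).ok_or("MDL content truncated")?;
    Ok((wrapper, blob))
}
```

Every internal offset read later is then an index into the returned blob slice, never into the original file.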

Three kinds of pointer

Inside the MDL blob you’ll encounter three distinct flavours of “pointer”, which are worth keeping straight:

  1. MDL-relative offsets – the vast majority. Relocated to absolute pointers by Reset* functions. On re-serialization, they must be rewritten back to relative offsets.
  2. MDX-file byte offsets – used by a few fields (e.g. per-mesh mdx_data_offset at +0x144) to locate vertex data in the separate MDX file.
  3. String pointers – themselves MDL-relative, but pointing into a string table at the end of the blob, pointed to by the name-offsets array at model +0xB8.

Confusingly, there are two similarly named fields on each mesh node: mdx_data_offset at +0x144 (an MDX file offset) and vert_array_offset at +0x148 (a content-relative pointer to embedded position data). Conflating these produced one of the nastier bugs in our reader’s history (see War stories below).

Model header

Once the blob is in memory, InputBinary::Reset at 0x004a1030 walks the model header. Here’s the relevant field map:

| Offset | Field | Notes |
| --- | --- | --- |
| +0x00 | ModelDestructor vptr | Populated at load time. |
| +0x04 | ModelParseField vptr | Populated at load time. |
| +0x28 | root node offset | Relocated. ResetMdlNode recurses from here. |
| +0x48 | resource handle | Populated at load time. |
| +0x4C | type byte | `GetType() |
| +0x50 | classification | 0=Other, 1=Effect, 2=Tile, 4=Character, 8=Door. |
| +0x54 | ref count | |
| +0x58 | animations array ptr | Relocated; count at +0x5C. |
| +0x64 | supermodel pointer | Populated via FindModel(buf+0x88). |
| +0x68..+0x80 | bbox min/max | Vector bmin, bmax. |
| +0x80 | radius | f32, default 7.0. |
| +0x84 | animation scale | f32, default 1.0. ASCII: setanimationscale. |
| +0x88 | supermodel name | char[36], null-terminated. Drives recursive model load. |
| +0xA8 | node array (secondary) | Relocated if non-zero. |
| +0xAC | MDX vertex pool offset | Source offset into MDX data (consumed into a GL pool). |
| +0xB0 | MDX data size | Size of the vertex-pool copy. |
| +0xB8 | name offsets array ptr | Relocated; count at +0xBC. Array entries also relocated. |

Two fields deserve special mention:

  • +0x50 classification is the model’s high-level category (Character, Door, Tile, …). It’s never read during the Reset pass – it’s carried through as part of the memory-mapped blob and consulted at runtime. Cross-validated against hex dumps:

    | File | +0x50 | Category |
    | --- | --- | --- |
    | c_dewback.mdl | 0x04 | Character ✓ |
    | dor_lhr01.mdl | 0x08 | Door ✓ |
    | m01aa_01a.mdl | 0x00 | Other ✓ |
  • +0x88 supermodel name is a 32-byte (plus 4 padding) ASCII name. Loading a model with a supermodel triggers a recursive FindModel call for that name – think of supermodels as CSS-style inheritance, where animation data and bones defined on the parent are available to the child.

The node tree

The root node sits at model +0x28. From there, children are reached through a standard in-memory array layout: ptr + count_used + count_allocated at offsets +0x2C, +0x30, +0x34. This three-u32 pattern is BioWare’s CExoArrayList and shows up everywhere in the format – any time you see “12 bytes of array header”, this is what it is.

Base node layout (80 bytes)

All node types begin with the same 80-byte header:

| Offset | Size | Field | Notes |
| --- | --- | --- | --- |
| +0x00 | u16 | node_type | Flag bitmask. Drives type dispatch. |
| +0x02 | u16 | node_id | Sequential 0..N-1. |
| +0x04 | u16 | node_id_dup | Identical copy of node_id. Never read. |
| +0x06 | u16 | padding | Always zero. |
| +0x08 | u32 | name pointer | Relocated. Points into the string table. |
| +0x0C | u32 | parent pointer | Relocated if non-zero. |
| +0x10 | 12 | position | Vector{x, y, z} as 3×f32. |
| +0x1C | 16 | orientation | Quaternion{w, x, y, z} as 4×f32. |
| +0x2C | 12 | children array | CExoArrayList of MdlNode*. |
| +0x38 | 12 | controller keys array | CExoArrayList of NewController (16B each). |
| +0x44 | 12 | controller data array | CExoArrayList of float (packed key data). |

The two bytes at +0x04 are a redundant duplicate of node_id – always identical to +0x02 across 209 nodes verified across four vanilla files, zero mismatches. No known engine function reads it. Best guess: legacy field or exporter artifact. It’s preserved for round-trip fidelity but has no semantic meaning.

A few conventions worth noting:

  • Quaternion order is (w, x, y, z). Confirmed via Gob::GetOrientation at 0x004499a0 which copies fields in that order. Identity quaternion is [1.0, 0.0, 0.0, 0.0]. The Rust API uses the same convention.
  • Position and orientation are read directly from the blob. They’re not relocated – they’re inline values, not pointers.
  • Only two fields need relocation in the base header: name pointer at +0x08 and parent pointer at +0x0C.

InputBinary::ResetMdlNodeParts at 0x004a0b60 handles the base relocations and then recurses: for each entry in the children array, relocate the child pointer and call ResetMdlNode on it.
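Reading the 80-byte base header as a struct overlay could be sketched like this. BaseNode, ArrayHeader, and read_base_node are hypothetical names chosen for the example (not the rakata_formats::Mdl types); pointer fields are returned as the raw blob-relative offsets stored on disk, and little-endian encoding is assumed.

```rust
/// The 12-byte CExoArrayList header: pointer, used count, allocated count.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ArrayHeader {
    pub ptr: u32,
    pub used: u32,
    pub allocated: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BaseNode {
    pub node_type: u16,
    pub node_id: u16,
    pub name_ptr: u32,
    pub parent_ptr: u32,
    pub position: [f32; 3],
    /// Quaternion stored as (w, x, y, z).
    pub orientation_wxyz: [f32; 4],
    pub children: ArrayHeader,
    pub controller_keys: ArrayHeader,
    pub controller_data: ArrayHeader,
}

fn u16_at(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn u32_at(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

fn f32_at(b: &[u8], at: usize) -> f32 {
    f32::from_bits(u32_at(b, at))
}

fn array_at(b: &[u8], at: usize) -> ArrayHeader {
    ArrayHeader { ptr: u32_at(b, at), used: u32_at(b, at + 4), allocated: u32_at(b, at + 8) }
}

/// Decodes the base header at `node_off`; returns `None` if fewer than 80 bytes remain.
pub fn read_base_node(blob: &[u8], node_off: usize) -> Option<BaseNode> {
    let end = node_off.checked_add(80)?;
    let n = blob.get(node_off..end)?;
    Some(BaseNode {
        node_type: u16_at(n, 0x00),
        node_id: u16_at(n, 0x02),
        name_ptr: u32_at(n, 0x08),
        parent_ptr: u32_at(n, 0x0C),
        position: [f32_at(n, 0x10), f32_at(n, 0x14), f32_at(n, 0x18)],
        orientation_wxyz: [f32_at(n, 0x1C), f32_at(n, 0x20), f32_at(n, 0x24), f32_at(n, 0x28)],
        children: array_at(n, 0x2C),
        controller_keys: array_at(n, 0x38),
        controller_data: array_at(n, 0x44),
    })
}
```

Walking the tree then means reading children.used u32 offsets starting at children.ptr and calling read_base_node on each one.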

Type dispatch

InputBinary::ResetMdlNode at 0x004a0900 reads the node_type field and dispatches:

| node_type | Handler | Kind |
| --- | --- | --- |
| 0x0001 | ResetMdlNodeParts only | Dummy / base |
| 0x0003 | ResetLight | Light |
| 0x0005 | ResetMdlNodeParts only | Emitter |
| 0x0009 | ResetMdlNodeParts only | Camera |
| 0x0011 | ResetMdlNodeParts only | Reference |
| 0x0021 | ResetTriMesh, ResetTriMeshParts | TriMesh |
| 0x0061 | ResetSkin | Skin mesh |
| 0x00A1 | ResetAnim | AnimMesh |
| 0x0121 | ResetDangly | Dangly mesh (cloth) |
| 0x0221 | ResetAABBTree + ResetTriMeshParts | Walkmesh with AABB |
| 0x0401 | (no-op) | Trigger / unused |
| 0x0821 | ResetLightsaber | Saber mesh |

The type values are stored as a lookup table in the executable at 0x00740a18 (12 × u32).

Though the type codes are shaped like a bitmask – HEADER=0x01, LIGHT=0x02|HEADER, EMITTER=0x04|HEADER, TRIMESH=0x20|HEADER, SKIN=0x40|TRIMESH, SABER=0x800|TRIMESH, and so on – the dispatch is an exact value match, not individual bit checks. The bitmask structure is meaningful (skin is a superset of trimesh, for instance), it’s just not how the engine branches.
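The difference between the bitmask shape and the exact-match dispatch is easy to show in code. In this sketch the constants HEADER, LIGHT, EMITTER, TRIMESH, SKIN, and SABER follow the composition given above; the remaining constant names are assumptions derived from the values in the dispatch table, and node_kind is a hypothetical helper.

```rust
pub const HEADER: u16 = 0x0001;
pub const LIGHT: u16 = 0x0002 | HEADER;
pub const EMITTER: u16 = 0x0004 | HEADER;
pub const CAMERA: u16 = 0x0008 | HEADER; // name assumed
pub const REFERENCE: u16 = 0x0010 | HEADER; // name assumed
pub const TRIMESH: u16 = 0x0020 | HEADER;
pub const SKIN: u16 = 0x0040 | TRIMESH;
pub const ANIM: u16 = 0x0080 | TRIMESH; // name assumed
pub const DANGLY: u16 = 0x0100 | TRIMESH; // name assumed
pub const AABB: u16 = 0x0200 | TRIMESH; // name assumed
pub const TRIGGER: u16 = 0x0400 | HEADER; // name assumed
pub const SABER: u16 = 0x0800 | TRIMESH;

/// Exact-value dispatch: any combination not listed is rejected,
/// even if every individual bit is meaningful.
pub fn node_kind(node_type: u16) -> Option<&'static str> {
    match node_type {
        HEADER => Some("Dummy / base"),
        LIGHT => Some("Light"),
        EMITTER => Some("Emitter"),
        CAMERA => Some("Camera"),
        REFERENCE => Some("Reference"),
        TRIMESH => Some("TriMesh"),
        SKIN => Some("Skin mesh"),
        ANIM => Some("AnimMesh"),
        DANGLY => Some("Dangly mesh (cloth)"),
        AABB => Some("Walkmesh with AABB"),
        TRIGGER => Some("Trigger / unused"),
        SABER => Some("Saber mesh"),
        _ => None,
    }
}
```

A value such as 0x0023 (light bit plus trimesh bit) is built from valid flags but matches no arm, which is the behaviour an exact-match dispatcher implies.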

Size summary

Every node type has a known fixed size, both on disk and in memory:

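| Flag | Type | Total | Base | Extra | Extends |
| --- | --- | --- | --- | --- | --- |
| 0x0001 | Base | 80 | 80 | 0 | |
| 0x0003 | Light | 172 | 80 | 92 | MdlNode |
| 0x0005 | Emitter | 304 | 80 | 224 | MdlNode |
| 0x0009 | Camera | 80 | 80 | 0 | MdlNode |
| 0x0011 | Reference | 116 | 80 | 36 | MdlNode |
| 0x0021 | TriMesh | 412 | 80 | 332 | MdlNode |
| 0x0061 | Skin | 512 | 412 | 100 | TriMesh |
| 0x00A1 | AnimMesh | 468 | 412 | 56 | TriMesh |
| 0x0121 | Dangly | 440 | 412 | 28 | TriMesh |
| 0x0221 | AABB | 416 | 412 | 4 | TriMesh |
| 0x0401 | Trigger | 80 | 80 | 0 | MdlNode |
| 0x0821 | Saber | 432 | 412 | 20 | TriMesh |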

Verified via ParseNode’s operator_new(size) calls and Ghidra struct definitions. All mesh subtypes extend MdlNodeTriMesh – their extra data begins at node offset +0x19C, immediately after the TriMesh block.

Node types in depth

The lightweight types

Camera (0x009) has no extra data. Same 80-byte footprint as the base node. ResetMdlNode dispatches to ResetMdlNodeParts only. There are no camera-specific ASCII fields either – the ASCII parser also falls through to the base handler.

Reference (0x011) carries just two fields in 36 extra bytes: a 32-byte ref_model name and a 4-byte reattachable flag. Both inline (no pointers to relocate).

Trigger (0x401) – the decompiled ResetMdlNode explicitly returns void without calling any reset function for this type. In practice it appears to be unused in shipping content.

Light (0x003)

Lights carry 92 bytes of extra data. Most of the scalar fields are straightforward (priority, shadow flag, ambient-only flag, flare radius, etc.), but lights are the most complex non-mesh type because of their array fields:

| Extra offset | Field | Layout | Runtime relocation |
|---|---|---|---|
| +0x04 | texture SafePointers | 12-byte array header | Zeroed on disk |
| +0x10 | flaresizes | CExoArrayList | ptr relocated |
| +0x1C | flarepositions | CExoArrayList | ptr relocated |
| +0x28 | flarecolorshifts | CExoArrayList | ptr relocated |
| +0x34 | texturenames | CExoArrayList<char*> (each ptr too!) | all ptrs relocated |

Lights also drive their colour, radius, shadow radius, vertical displacement, and multiplier via controllers (types 0x4C, 0x58, 0x60, 0x64, 0x8C) – these live in the base node’s controller arrays, not in the light-specific block.

Emitter (0x005)

Emitters are 304 bytes and – pleasantly – contain no relocatable pointers. Everything is inline: a fistful of floats and ints, four 32-byte name fields (update, render, blend, texture), and a 16-byte chunk_name. The full field map is in the appendix.

The most important field is update at extra offset +0x20. It’s the emitter type string, a case-sensitive selector against:

  • "Fountain" → steady particle stream (most common)
  • "Explosion" → one-shot burst
  • "Single" → single particle
  • "Lightning" → lightning-bolt effect

MdlNodeEmitter::InternalCreateInstance at 0x0049d5c0 branches on this string to instantiate the appropriate runtime emitter class.

Known engine-level footgun: controller 502 (detonate) is only valid on "Explosion" emitters. InternalCreateInstance only allocates the detonation memory for that branch, so a detonate controller on a "Fountain" emitter reads unallocated memory at runtime and crashes. This is a known flaw in mdlops-based exporters (KotorMax); rakata-lint will validate this.
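A lint rule for this footgun could look roughly like the following. This is a sketch, not the rakata-lint implementation: EmitterUpdate, name_field, parse_update, and detonate_is_safe are hypothetical names, and the update field is assumed to be a NUL-padded 32-byte buffer compared case-sensitively, as described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmitterUpdate { Fountain, Explosion, Single, Lightning }

/// Controller type code for `detonate`, as stated in the text.
pub const DETONATE: u32 = 502;

/// Helper for examples: builds a NUL-padded 32-byte name field.
pub fn name_field(s: &str) -> [u8; 32] {
    let mut out = [0u8; 32];
    let n = s.len().min(31);
    out[..n].copy_from_slice(&s.as_bytes()[..n]);
    out
}

/// Case-sensitive parse of the `update` field (extra offset +0x20).
pub fn parse_update(field: &[u8; 32]) -> Option<EmitterUpdate> {
    let len = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    match &field[..len] {
        b"Fountain" => Some(EmitterUpdate::Fountain),
        b"Explosion" => Some(EmitterUpdate::Explosion),
        b"Single" => Some(EmitterUpdate::Single),
        b"Lightning" => Some(EmitterUpdate::Lightning),
        _ => None,
    }
}

/// True unless a detonate controller appears on a non-"Explosion" emitter.
pub fn detonate_is_safe(update: Option<EmitterUpdate>, controller_types: &[u32]) -> bool {
    !controller_types.contains(&DETONATE) || update == Some(EmitterUpdate::Explosion)
}
```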

TriMesh (0x021)

This is the big one. 332 bytes of extra data, encoding everything you’d expect in a mesh plus many things you wouldn’t.

Inline fields

At a high level:

  • Runtime function pointers (+0x00, +0x04): written by the constructor. Zero on disk; never consumed from a file.
  • Faces array (+0x08): CExoArrayList of MaxFace (32 bytes each). See Face layout below.
  • Bounding volumes (+0x14..+0x38): bbox min, bbox max, bounding sphere (radius + centre xyz). The sphere is the one actually consumed at runtime – PartTriMesh::GetMinimumSphere hierarchically unions it with children’s spheres for culling. These sphere fields have no ASCII-parser equivalent; they’re exclusively binary-format fields written by the BioWare toolset.
  • Material (+0x3C..+0x54): diffuse RGB, ambient RGB, transparencyhint.
  • Textures (+0x58..+0x98): texture_0 (primary/diffuse) and texture_1 (secondary/lightmap), each a 32-byte null-terminated string, plus 32 bytes of padding up to +0xE8.
  • UV animation (+0xEC..+0xF8): uv_direction_x, uv_direction_y, uv_jitter, uv_jitter_speed. Gated by animate_uv (+0xE8).
  • MDX vertex layout (+0x100..+0x12F): flags bitmask plus 11 per-attribute byte offsets. Described in the next subsection.
  • Counts and flags (+0x130..+0x13B): vertex_count (u16), texture_channel_count (u16), six 1-byte booleans (light_mapped, rotate_texture, is_background_geometry, shadow, beaming, render).
  • Tail (+0x13C..+0x14B): total_surface_area, one unresolved reserved slot, mdx_data_offset, vertex_data_ptr.

Out of 332 bytes, 61 fields are fully confirmed through Ghidra cross-referencing, 5 are confirmed-unused, 1 is “very likely” (the always-3 indices_per_face), and exactly 1 remains unresolved (the 4 bytes at +0x140, which the constructor initializes to zero and no known function ever touches).

MDX vertex layout

The flags field at extra +0x100 is a bitmask describing what each MDX vertex record contains:

| Bit | Component | Size |
|---|---|---|
| 0x01 | position | 3×f32 (12B) – always set |
| 0x02 | UV1 / tverts0 | 2×f32 (8B) |
| 0x04 | UV2 / tverts1 | 2×f32 (8B) |
| 0x08 | UV3 / tverts2 | 2×f32 (8B) |
| 0x10 | UV4 / tverts3 | 2×f32 (8B) |
| 0x20 | normal | 3×f32 (12B) – always set |
| 0x80 | tangent space | 3×3×f32 (36B) – bump-mapped meshes |

Common patterns in vanilla K1: 0x21 (pos+norm only, 24B stride), 0x23 (+UV1, 32B), 0x27 (+UV2, 40B), 0xA7 (+tangent, 76B).
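The stride figures follow directly from summing the sizes of the flagged components. A small sketch (mdx_stride is a hypothetical helper name) that reproduces the four common patterns; it only accounts for components that have a flag bit, so anything signalled purely through an offset slot is outside its scope.

```rust
/// Sums the byte sizes of every MDX component whose flag bit is set.
pub fn mdx_stride(flags: u32) -> u32 {
    // (flag bit, size in bytes), taken from the flag table.
    const COMPONENTS: [(u32, u32); 7] = [
        (0x01, 12), // position
        (0x02, 8),  // UV1
        (0x04, 8),  // UV2
        (0x08, 8),  // UV3
        (0x10, 8),  // UV4
        (0x20, 12), // normal
        (0x80, 36), // tangent space
    ];
    COMPONENTS
        .iter()
        .filter(|(bit, _)| (flags & bit) != 0)
        .map(|(_, size)| size)
        .sum()
}
```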

Note that vertex colours have no flag bit. Their presence is signalled by the per-attribute offset slot being != -1. The 11 offset slots are:

| Slot | Extra offset | Field | Evidence |
|---|---|---|---|
| 0 | +0x104 | position | LightPartTriMesh reads 3×f32, world-transforms |
| 1 | +0x108 | normal | LightPartTriMesh reads 3×f32, rotation only |
| 2 | +0x10C | vertex color | Checked != -1, reads RGB only. Alpha unused. |
| 3 | +0x110 | UV1 | PartTriMesh reads 2×f32 |
| 4 | +0x114 | UV2 | Structural: tverts1 in InternalGenVertices |
| 5 | +0x118 | UV3 | Structural: tverts2 |
| 6 | +0x11C | UV4 | Structural: tverts3 |
| 7 | +0x120 | tangent space | Filled by CalculateTangentSpaceBasis |
| 8–10 | +0x124..+0x12C | reserved | Always -1 across 215 surveyed vanilla meshes |

Vertex colour alpha is unused (confirmed 2026-04-04). LightPartTriMesh reads only bytes [0], [1], [2] (RGB). Byte [3] is stored but never read. The rendered output hardcodes alpha to 0xFF. The fourth byte exists purely for alignment.

Important subtlety: the engine doesn’t trust any of these values on load. InternalPostProcess at 0x0043cf00 recomputes the flags, stride, per-attribute offsets, and mdx_data_offset from scratch, based on which vertex components are actually present in the node’s arrays. It also recomputes vertex normals via edge cross products, and re-derives the bounding box and sphere. The on-disk values preserve the compiler’s original output, but they’re cosmetic from the engine’s perspective.

This has a consequence for tooling: you can largely get away with wrong values in these fields as long as your mesh is otherwise valid, because the engine will fix them up at load time. But a correct writer should still populate them – community tools (kotorblender, mdledit) depend on them, and the BioWare build pipeline does too.

Skin mesh (0x061)

100 extra bytes beyond TriMesh. Skinning data (bone weights, inverse-bind-pose rotation and translation, bone-index mapping) sits here, along with several padding regions:

| Skin offset | Field | Layout | Notes |
|---|---|---|---|
| +0x00 | weights | CExoArrayList | Always zero in binary files. |
| +0x14 | bone_weight_data | ptr | Relocated if count at +0x18 > 0. |
| +0x1C | qbone_ref_inv | CExoArrayList | Inverse-bind rotations. |
| +0x28 | tbone_ref_inv | CExoArrayList | Inverse-bind translations. |
| +0x34 | bone_constant_indices | CExoArrayList | Bone-index remap. |

The weights array deserves a call-out. A 52-byte SkinVertexWeight struct exists and is fully specified by the ASCII parser – 4 bone names, 4 weights, some metadata – but in the binary path, ResetSkin never relocates its pointer, and a corpus scan of all 968 skin nodes across 2832 vanilla models found zero non-empty weights arrays. Binary models store per-vertex bone data exclusively in MDX (via dedicated bone-weight and bone-index offsets), and the weights CExoArray is just a 12-byte zero blob on disk.

AnimMesh (0x0A1)

56 extra bytes. Carries a sample_period scalar and two CExoArrayList fields (anim_verts, anim_t_verts) for time-sampled vertex animation. The remaining six fields (three pointers + three counts + some padding) are runtime-only and zero on disk. Fun fact: no community tool (kotorblender, mdledit, kotormax, reone, xoreos, pykotor) parses AnimMesh nodes – we may have the first structured reader for this type.

Also: ResetAnim is peculiar in that it processes the extra data before calling ResetTriMeshParts, the reverse of every other mesh subtype. There’s no obvious reason for this.

Dangly mesh (0x121)

One of the simplest mesh subtypes, at 28 extra bytes: a per-vertex constraints CExoArrayList (12 bytes), three inline floats (displacement, tightness, period) that parameterize the soft-body simulation, and a single conditional pointer at the tail that is relocated only when the TriMesh vertex count is non-zero.

Dangly meshes are BioWare’s hack for cloth and hair – rigged to the skeleton like a skin mesh, but with simulation parameters that let parts of the geometry lag and swing.

AABB walkmesh (0x221)

4 extra bytes: a single pointer to the root of an AABB tree stored inline in the MDL blob.

The AABB tree is a flattened binary tree (a bounding-volume hierarchy, not a sorted search tree) written in DFS preorder. Each node is 40 bytes:

| Offset | Size | Field | Notes |
|---|---|---|---|
| +0x00 | 12 | box_min | 3×f32 AABB minimum corner |
| +0x0C | 12 | box_max | 3×f32 AABB maximum corner |
| +0x18 | 4 | right_child | Content-relative offset (0 = no child) |
| +0x1C | 4 | left_child | Content-relative offset (0 = no child) |
| +0x20 | 4 | face_index | i32. Leaves: ≥ 0. Internal: −1. |
| +0x24 | 4 | split_direction_flags | Axis bitmask: 1=+X, 2=+Y, 4=+Z, 8=−X, 16=−Y, 32=−Z |

Note that right_child comes before left_child in the struct – this is the actual field order, not a typo. Matches Ghidra and the mdledit/mdlops implementations.

Leaf nodes have left = 0, right = 0, face_index ≥ 0, split_direction_flags = 0. Internal nodes have both children non-zero, face_index = -1, and flags computed from the child bounding-box separation. The format is the classic spatial subdivision tree used for fast triangle lookups during pathfinding and collision queries.
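For illustration, a 40-byte node can be decoded as follows. This is a sketch under stated assumptions: little-endian byte order, and hypothetical names (AabbNode, parse, is_leaf) that are not taken from the Rakata crates. Note the right-before-left field order.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct AabbNode {
    pub box_min: [f32; 3],
    pub box_max: [f32; 3],
    pub right_child: u32, // content-relative offset, 0 = no child
    pub left_child: u32,  // content-relative offset, 0 = no child
    pub face_index: i32,
    pub split_direction_flags: u32,
}

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn f32x3_at(b: &[u8], off: usize) -> [f32; 3] {
    [0usize, 4, 8].map(|i| f32::from_bits(u32_at(b, off + i)))
}

impl AabbNode {
    pub const SIZE: usize = 40;

    /// Decodes one node from the start of `b`; None if fewer than 40 bytes.
    pub fn parse(b: &[u8]) -> Option<Self> {
        if b.len() < Self::SIZE {
            return None;
        }
        Some(Self {
            box_min: f32x3_at(b, 0x00),
            box_max: f32x3_at(b, 0x0C),
            right_child: u32_at(b, 0x18),
            left_child: u32_at(b, 0x1C),
            face_index: u32_at(b, 0x20) as i32,
            split_direction_flags: u32_at(b, 0x24),
        })
    }

    /// Leaf test following the invariants described in the text.
    pub fn is_leaf(&self) -> bool {
        self.left_child == 0 && self.right_child == 0 && self.face_index >= 0
    }
}
```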

ResetAABBTree at 0x004a0260 recurses the tree, relocating each child pointer. It manually unrolls to depth 4 before recursing (the engine’s author was clearly worried about stack depth on a modest C++ compiler).

Lightsaber (0x821)

20 extra bytes – small but architecturally notable:

| Saber offset | Field | Notes |
|---|---|---|
| +0x00 | saber vert data | Relocated pointer |
| +0x04 | saber UV data | Relocated pointer |
| +0x08 | saber normal data | Relocated pointer |
| +0x0C | GL vertex pool ID | Runtime-only (set by RequestPool) |
| +0x10 | GL index pool ID | Runtime-only |

Three arrays of exactly 176 vertices each (NUM_SABER_VERTS = 176, confirmed by kotorblender): position, UV, normal. The saber blade is a fixed-topology mesh – BioWare pre-baked the geometry as a flexible band that can be animated by swinging the endpoint controllers.

Unlike Skin/Dangly/AnimMesh, the saber uses the base TriMesh gen_vertices and remove_temporary_array callbacks. Its geometry doesn’t morph dynamically at the vertex-processing level – the animation is in the controller track.

Controllers and animation

The controller header

Controllers are the keyframe-animation primitive. Each node has an array of 16-byte NewController headers (at node +0x38) plus a shared pool of float data (at +0x44). Each header describes one animatable property of that node:

| Offset | Size | Field | Notes |
|---|---|---|---|
| +0x00 | u32 | type_code | Byte offset of the target property in the Part struct. |
| +0x04 | i16 | supermodel_link | Additive-blending property offset; -1 = no blending. |
| +0x06 | u16 | row_count | Number of keyframes. |
| +0x08 | u16 | time_data_offset | Float-array index for time values. |
| +0x0A | u16 | data_offset | Float-array index for value data. |
| +0x0C | u8 | value_type_and_flags | Low nibble: 1=float, 2/4=quaternion, 3=vector. Bit 4=0x10=Bezier. |
| +0x0D | 3 | padding | Alignment to 16 bytes. Never read. |

The type_code is elegant: it’s literally the byte offset into the Part struct where the animated value lives. NewController::Control dereferences it as *(float*)(part_ptr + type_code). So type_code = 8 means “position” because position sits at Part+0x08; type_code = 20 means “orientation” because orientation sits at Part+0x14 (as a compressed axis-angle quaternion); and so on. This collapses what would otherwise be a switch over property IDs into direct pointer arithmetic.

The value_type_and_flags byte at +0x0C has a compound encoding that bit us hard early on:

  • Low nibble (& 0x0F) – value-type discriminator: 1=float, 2 or 4=quaternion, 3=vector. Selects the interpolation path (Lerp/Slerp/VectorLerp).
  • High nibble (& 0xF0) – flags. 0x10 signals Bezier interpolation, which triples the per-keyframe value count (each keyframe is value + in-tangent + out-tangent).
  • Special case: for orientation controllers (type code 20) with raw byte value == 2, the keyframe is a compressed quaternion packed into a single u32, not two f32 values.

The low nibble happens to coincide with the “number of floats per keyframe row” for simple cases (1, 3, 4), which is why the earlier interpretation of this byte as column_count mostly worked – until it didn’t. See the controller bug below.
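A sketch of reading the 16-byte header, assuming little-endian layout. ControllerHeader and its methods are illustrative names rather than the Rakata types. The two nibbles of the byte at +0x0C are exposed separately so that neither is mistaken for a plain column count.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ControllerHeader {
    pub type_code: u32,
    pub supermodel_link: i16,
    pub row_count: u16,
    pub time_data_offset: u16,
    pub data_offset: u16,
    pub value_type_and_flags: u8,
}

impl ControllerHeader {
    pub const SIZE: usize = 16;

    /// Decodes the header; the 3 padding bytes at +0x0D are ignored.
    pub fn parse(b: &[u8; 16]) -> Self {
        let u16_at = |o: usize| u16::from_le_bytes([b[o], b[o + 1]]);
        Self {
            type_code: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            supermodel_link: u16_at(0x04) as i16,
            row_count: u16_at(0x06),
            time_data_offset: u16_at(0x08),
            data_offset: u16_at(0x0A),
            value_type_and_flags: b[0x0C],
        }
    }

    /// Low nibble: value-type discriminator (1, 2/4, or 3).
    pub fn value_type(&self) -> u8 {
        self.value_type_and_flags & 0x0F
    }

    /// Bit 0x10 of the high nibble: Bezier interpolation.
    pub fn is_bezier(&self) -> bool {
        (self.value_type_and_flags & 0x10) != 0
    }
}
```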

Self-describing rows

Because value_type_and_flags is inline in each controller header, the binary format is entirely self-describing for animation data. The reader doesn’t need a lookup table mapping type codes to column counts – it reads the flags byte and knows how many floats to consume per row.

This is useful because vanilla K1 contains controller type codes (0x68, 0x188) that aren’t documented in any community reference. Trying to parse these with a closed enum caused 517 of 2832 vanilla MDLs (18.3%) to fail. MdlControllerType is therefore a newtype struct MdlControllerType(u32) with named constants for the three universally-confirmed base types (POSITION = 8, ORIENTATION = 20, SCALE = 36) and accepts any other u32 losslessly.
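The open-newtype shape described here can be sketched as below. The type name and the three constants come from the text; the base_name helper is a hypothetical addition for illustration, and the real Rakata definition may differ in derives and methods.

```rust
/// Open set of controller type codes: any u32 round-trips losslessly.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MdlControllerType(pub u32);

impl MdlControllerType {
    pub const POSITION: Self = Self(8);
    pub const ORIENTATION: Self = Self(20);
    pub const SCALE: Self = Self(36);

    /// ASCII name for the three universal base controllers, if applicable.
    pub fn base_name(self) -> Option<&'static str> {
        match self {
            Self::POSITION => Some("position"),
            Self::ORIENTATION => Some("orientation"),
            Self::SCALE => Some("scale"),
            _ => None,
        }
    }
}
```

Unlike a closed enum, constructing MdlControllerType(0x68) cannot fail, so an undocumented code is carried through parsing and writing unchanged.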

Base vs type-specific controllers

Three controllers are universal – they exist on every node type:

| ASCII name | Code | Columns | Meaning |
|---|---|---|---|
| position | 8 | 3 | x, y, z |
| orientation | 20 | 4 | x, y, z, angle (compressed axis-angle) |
| scale | 36 | 1 | uniform scale factor |

Type-specific codes live at higher numbers: light controllers start at 76 (color), emitter controllers are at 88+. All three base codes also support a Bezier variant (signalled by the flag bit, not a separate type code).

The MDX file: a mystery

Now for the strangest part of the format.

The MDX file contains interleaved vertex data – positions, normals, UVs, tangent space, colours – packed into records of width given by the mesh vertex_stride field, aligned into per-mesh blocks with sentinel-float terminators separating them. It looks exactly like what you’d expect a GPU vertex buffer to look like.

And the K1 engine never reads it.

Here’s the complete trace through InputBinary::Read:

  1. Read the MDX file into a buffer (pbVar9).
  2. Call Reset(mdl_content, mdx_content, resource).
  3. Reset passes mdx_content as param_3 through a chain of function calls (ResetMdlNode, ResetTriMeshParts, …). Every downstream function has param_3 as a formal parameter.
  4. param_3 is never used. In ResetTriMeshParts, it’s literally overwritten as a loop counter on line 67.
  5. Back in InputBinary::Read, line 78: _free(pbVar9). The MDX buffer is freed.

At no point does any vertex-related code path consume MDX data. InternalGenVertices builds vertex buffers from verts_arrays, which lives in the MDL content blob. ProcessVerts recomputes normals from geometry. LightPartTriMesh reads from the GL pool populated at +0xAC of the model header – which is sourced from the MDL content, not the MDX file.

So where does the vertex data actually come from? From a parallel set of position-only arrays stored inside the MDL content blob, pointed to by vert_array_offset at mesh +0x148 (content-relative), with additional UV/colour/normal data in the MdlNodeTriMeshVertArrays structures.

The MDX file, in short, is a redundant interleaved copy of data that the K1 engine could reconstruct from the MDL alone. Most likely theories for why it exists:

  • Build-pipeline artifact. BioWare’s Aurora engine (Neverwinter Nights) may have used the MDX format directly, and the K1 pipeline inherited the file-layout convention without the consuming code path.
  • Toolset requirement. Third-party editors and the BioWare toolset itself may still parse MDX for authoring workflows.
  • ResetLite path. There’s a separate “lightweight” loader (InputBinary::ResetLite at 0x004a11b0) that may use MDX for a reduced in-memory representation – unverified.

For our purposes, this has two consequences:

  1. Engine-functional MDX is near-trivial. Any MDX file the K1 engine happily ignores is a valid MDX file. You could write all zeros and the game would run.
  2. Round-trip-accurate MDX requires the per-mesh terminator convention (described next), because community tools do read MDX, and byte-identical round-trip is a useful correctness check.

Per-mesh terminators and alignment

Empirically, vanilla MDX files are larger than sum(vertex_count × stride). Across 2832 vanilla K1 models, 2445 have MDX files with excess bytes, totalling 3,278,456 bytes corpus-wide.

The excess has structure. After each mesh’s vertex data, there’s a terminator row of exactly one stride’s worth of bytes, beginning with three sentinel floats and padded with zeros:

| Mesh type | Sentinel value | Hex (f32 LE) |
|---|---|---|
| Non-skin (type & 0x40 == 0) | 10,000,000.0 | 80 96 18 4B |
| Skin (type & 0x40 != 0) | 1,000,000.0 | 00 24 74 49 |

Corpus sentinel detection: 6,973 non-skin sentinels, 6 skin sentinels, 0 unknown patterns.

Between meshes, the cursor is padded to the next 16-byte boundary. The last mesh has no trailing alignment:

cursor = 0
for each mesh in MDX order:
    cursor += vertex_count × stride   # vertex data
    cursor += stride                   # terminator row
    if not last mesh:
        cursor = (cursor + 15) & ~15   # 16-byte alignment
mdx_file_size = cursor

For stride-24 meshes, the gap between meshes is either 24 or 32 bytes depending on current alignment. For stride-32 and stride-64 meshes, it’s always exactly stride because the stride is already a multiple of 16.
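The loop above translates almost line for line into Rust. The sketch below also builds a terminator row under the sentinel convention described earlier. MdxMesh, terminator_row, and mdx_file_size are hypothetical names, and the meshes are assumed to be supplied already in MDX order.

```rust
/// Per-mesh facts needed to lay out the MDX file (illustrative struct).
pub struct MdxMesh {
    pub vertex_count: usize,
    pub stride: usize,
    pub is_skin: bool,
}

/// One stride's worth of bytes: three sentinel floats, then zero padding.
pub fn terminator_row(mesh: &MdxMesh) -> Vec<u8> {
    let sentinel: f32 = if mesh.is_skin { 1_000_000.0 } else { 10_000_000.0 };
    let mut row = vec![0u8; mesh.stride];
    for chunk in row.chunks_exact_mut(4).take(3) {
        chunk.copy_from_slice(&sentinel.to_le_bytes());
    }
    row
}

/// Expected MDX file size for the given meshes.
pub fn mdx_file_size(meshes: &[MdxMesh]) -> usize {
    let mut cursor = 0usize;
    for (i, mesh) in meshes.iter().enumerate() {
        cursor += mesh.vertex_count * mesh.stride; // vertex data
        cursor += mesh.stride;                     // terminator row
        if i + 1 != meshes.len() {
            cursor = (cursor + 15) & !15;          // 16-byte alignment
        }
    }
    cursor
}
```

For example, a stride-24 mesh with two vertices occupies 72 bytes including its terminator; if another mesh follows, the cursor is rounded up to 80 before that mesh begins.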

Mesh ordering in MDX

Non-skin meshes come first, then skin meshes. Within each group, the order is DFS-traversal-of-the-tree – mostly. About 27% of vanilla models exhibit a compiler-specific permutation that defers “second children” of paired parents until after all their siblings’ first children. This is reproducible for our own output (if we write DFS, we read DFS), but not for byte-identical round-trip of every BioWare file.

Writing in standard DFS order (non-skin first, skin second) produces semantically identical MDX data with the correct total size. 1784 of 2444 models match byte-for-byte; the remaining 660 have the non-standard compiler ordering.

What this means for mdx_data_offset

The mesh header has two adjacent u32 fields at +0x144 and +0x148:

  • +0x144 mdx_data_offset: per-mesh byte offset into the MDX file. Used by community tools to seek directly to that mesh’s vertex block. The engine also uses this after InternalPostProcess overwrites it with a GL-pool offset.
  • +0x148 vert_array_offset: content-relative pointer to the position-only vertex data embedded in the MDL content blob. Used by the engine during load. Relocated by ResetTriMeshParts via param_1->field60_0x198 = param_2 + param_1->field60_0x198 – where param_2 is the MDL content base, not the MDX base.

These two fields were conflated under a single MDX_OFFSET = 0x148 constant in our implementation for several months, which caused the reader to lose the MDX offset entirely and the writer to overwrite the content pointer with an MDX offset. Full story in War stories.
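Keeping the two fields as separate named constants makes the conflation hard to repeat. A minimal sketch, assuming a little-endian slice holding the 332-byte TriMesh extra block; MeshOffsets and read_mesh_offsets are illustrative names only.

```rust
/// Offsets within the TriMesh extra block (relative to node +0x50).
pub const MDX_DATA_OFFSET: usize = 0x144;   // byte offset into the MDX file
pub const VERT_ARRAY_OFFSET: usize = 0x148; // content-relative pointer into MDL

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MeshOffsets {
    pub mdx_data_offset: u32,
    pub vert_array_offset: u32,
}

/// Reads both fields; None if the slice is too short to contain them.
pub fn read_mesh_offsets(extra: &[u8]) -> Option<MeshOffsets> {
    let u32_at = |o: usize| -> Option<u32> {
        let s = extra.get(o..o + 4)?;
        Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
    };
    Some(MeshOffsets {
        mdx_data_offset: u32_at(MDX_DATA_OFFSET)?,
        vert_array_offset: u32_at(VERT_ARRAY_OFFSET)?,
    })
}
```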

Face layout

Faces are 32-byte records (MaxFace) stored in the TriMesh faces CExoArray:

| Offset | Size | Field | Type | Notes |
|---|---|---|---|---|
| +0x00 | 12 | plane_normal | 3×f32 | Face plane normal. |
| +0x0C | 4 | plane_distance | f32 | Plane equation: n·p = d. |
| +0x10 | 4 | surface_id | u32 | Walkability / material identifier. |
| +0x14 | 6 | adjacent | 3×u16 | Indices of adjacent faces (for AABB/pathfinding). |
| +0x1A | 6 | vertex_indices | 3×u16 | Triangle vertex indices. |

The plane normal and distance are pre-computed by the BioWare toolset. They can be re-derived from the geometry but the binary format preserves them. The adjacency graph is what makes AABB walkmesh lookups fast – each triangle points to its neighbours, enabling constant-time stepping during pathfinding.

An early version of our reader assumed 12-byte faces (just the vertex indices). This led to every 2.67th “face” being interpreted from garbage bytes belonging to the next face’s plane normal. It was masked by synthetic round-trip tests – write wrong, read wrong, match! – and only caught when vanilla-file validation found vertex indices exceeding the mesh’s vertex count.
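A sketch of a 32-byte face reader with the bounds check that caught the bug. It assumes little-endian layout; MaxFace is the struct name from the text, while parse and indices_in_bounds are hypothetical method names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct MaxFace {
    pub plane_normal: [f32; 3],
    pub plane_distance: f32,
    pub surface_id: u32,
    pub adjacent: [u16; 3],
    pub vertex_indices: [u16; 3],
}

impl MaxFace {
    pub const SIZE: usize = 32;

    /// Decodes one face record from exactly 32 bytes.
    pub fn parse(b: &[u8; 32]) -> Self {
        let f32_at = |o: usize| f32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]]);
        let u16_at = |o: usize| u16::from_le_bytes([b[o], b[o + 1]]);
        Self {
            plane_normal: [f32_at(0x00), f32_at(0x04), f32_at(0x08)],
            plane_distance: f32_at(0x0C),
            surface_id: u32::from_le_bytes([b[0x10], b[0x11], b[0x12], b[0x13]]),
            adjacent: [u16_at(0x14), u16_at(0x16), u16_at(0x18)],
            vertex_indices: [u16_at(0x1A), u16_at(0x1C), u16_at(0x1E)],
        }
    }

    /// Validation hook: every vertex index must be below the mesh vertex count.
    pub fn indices_in_bounds(&self, vertex_count: u16) -> bool {
        self.vertex_indices.iter().all(|&i| i < vertex_count)
    }
}
```

Running indices_in_bounds against real files is the kind of check that a synthetic write-then-read test cannot replace, since a symmetric stride error survives the round trip.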

War stories and implementation history

A brief chronicle of the bugs found while building the Rust reader/writer, because the “how we know this” is often as useful as the “what we know”.

The 12-byte face bug

Described above. The MaxFace stride is 32 bytes, not 12. Caught by vertex-index bounds checking against vanilla files.

Mesh header size corrections

The whole mesh extra-header was misunderstood for a long time. A sample of the corrections, all fixed in late February 2026:

  • VERTEX_COUNT offset was 0x9E → actually 0x130
  • MDX_OFFSET was 0xB8 → actually two separate fields at 0x144 and 0x148
  • VERTEX_STRUCT_SIZE was 0xBC → actually 0xFC
  • MESH_EXTRA_SIZE was 200 bytes → actually 332 (0x14C)
  • RENDER boolean was missing entirely → added at 0x139
  • SHADOW boolean was missing entirely → added at 0x137

All of these stemmed from extrapolating offsets from partial hex dumps rather than decompiling the struct. Ghidra’s MdlNodeTriMesh struct definition settled the whole thing – once the Ghidra type was aligned, the field offsets fell out directly.

Controller column-count encoding

Our reader initially used the raw value_type_and_flags byte (at controller +0x0C) directly as a float count per row. This worked for the common case (position=3, orientation=4, scale=1) but broke in two scenarios:

  • Bezier controllers set bit 0x10, turning raw=3 (Bezier position) into a byte value of 0x13 = 19 columns, not 9.
  • Integral orientation: ORIENTATION controllers with raw byte == 2 mean “compressed quaternion packed into one u32 per row”, not “2 f32 values per row”.

The integral-orientation case was the more painful bug: a c_dewback scan showed 876 integral-orientation controllers; c_rancor had 1,212. Reading 2 floats instead of 1 consumed double the expected data, desynchronizing every subsequent controller in the data array. Every node’s animation after the first compressed-quaternion keyframe was reading from a shifted window of garbage.

Fix: decode the raw byte with & 0x0F masking plus the two special cases (Bezier multiplies by 3; integral orientation uses 1 u32 per row regardless). The raw byte is preserved in a raw_column_count field for round-trip fidelity.
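The decoding rule fits in a few lines. This is a sketch of the logic as described, not the actual Rakata function; values_per_row is a hypothetical name, and it counts 4-byte values (f32, or a packed u32 for compressed quaternions) per keyframe row.

```rust
/// Type code of the orientation controller.
pub const ORIENTATION: u32 = 20;

/// Number of 4-byte values per keyframe row for a controller.
pub fn values_per_row(type_code: u32, raw: u8) -> usize {
    // Integral orientation: one packed u32 per row, not two floats.
    if type_code == ORIENTATION && raw == 2 {
        return 1;
    }
    let base = (raw & 0x0F) as usize;
    // Bezier keys carry value + in-tangent + out-tangent.
    if (raw & 0x10) != 0 { base * 3 } else { base }
}
```

With this rule, a Bezier position controller (raw byte 0x13) yields 9 values per row instead of the 19 that the naive reading produced.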

Animation node_number at +0x02

The 80-byte node header’s first 8 bytes are type_flags (u16), node_number (u16), name_index (u16), padding (u16). Our offset map had NODE_ID = 0x04, which pointed to name_index, not node_number.

For animation nodes specifically, node_number is the engine’s key for matching animation keyframe nodes to their geometry-side skeleton bones. Writing zeros at +0x02 and stuffing the name_index at +0x04 meant every animation node had node_number = 0, so every keyframe targeted the root bone. Visually: characters froze in T-pose with no skeletal motion whatsoever.

Fix: read node_number from +0x02 explicitly; derive name_index from the name map at +0x04.
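A sketch of reading the first 8 bytes of the node header with each field at its own offset, assuming little-endian layout. NodePrefix and parse_node_prefix are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodePrefix {
    pub type_flags: u16,  // +0x00
    pub node_number: u16, // +0x02: key for matching animation nodes to bones
    pub name_index: u16,  // +0x04: index into the name table
}

/// Decodes the first 8 bytes of the 80-byte node header.
/// The u16 at +0x06 is padding and is ignored.
pub fn parse_node_prefix(b: &[u8; 8]) -> NodePrefix {
    let u16_at = |o: usize| u16::from_le_bytes([b[o], b[o + 1]]);
    NodePrefix {
        type_flags: u16_at(0x00),
        node_number: u16_at(0x02),
        name_index: u16_at(0x04),
    }
}
```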

MDX per-mesh seeking

Our MDX reader used a cumulative cursor assuming non-skin-first DFS ordering. For the ~51% of vanilla models where MDX layout doesn’t match that assumption, vertex data was assigned to the wrong mesh nodes. Self-round-trip tests couldn’t detect this – we were reading and writing the same wrong assignment, which is a consistency check for the tool’s own output, not for correctness against vanilla.

Fix: seek to info.mdx_data_offset (the +0x144 field) for each mesh, matching kotorblender and mdledit behaviour. The cumulative-cursor logic remains in the writer, which produces its own layout and backpatches the offset field; the reader trusts whatever the file says.

Name-table dead entries

220 vanilla K1 models have name tables containing entries that no node references. These turn out to be walkmesh node names (*_wok, *_pwk, *_dwk variants) from BioWare’s build pipeline, which apparently shared a single name table across the MDL and WOK outputs.

The engine only performs indexed lookups via name_index; it never iterates the full table or validates the count. Extra entries are harmless dead weight.

Decision: not preserved. Our writer builds the name table from the node tree (matching kotorblender and mdledit), producing files that are functionally identical but 20–80 bytes shorter. This is a known, benign size delta – not a parity bug.

Emitter controller code verification

All 48 emitter controller type codes were independently verified against the engine binary via Ghidra. For each, we located the __stricmp call for the ASCII field name and traced the controller type value stored on match. Every code matched mdledit’s ReturnControllerName table exactly – no additions, no omissions.

One naming correction: the engine’s canonical string for code 200 is "lightningZigzag" (camelCase Z). mdledit has "lightningzigzag" (all lowercase). Functionally identical because the engine uses __stricmp (case-insensitive), but the engine’s capitalization is now what we emit.

Corpus validation status

As of 2026-02-24: 2832/2832 (100%) structural round-trip success (parse → write → parse → compare). This was achieved after fixing three comparison issues in the test harness:

  1. NaN ≠ NaN (IEEE 754): 1559 false failures – floats containing NaN don’t equal themselves. Fixed with bitwise f32::to_bits() comparison.
  2. Parent index ordering: 135 mismatches from depth-first vs. original node ordering. The binary format preserves node ordering but our parent-index reconstruction uses DFS. Semantically equivalent, numerically different – skipped in comparison.
  3. Face NaN values: exactly one model (w_dblsbr_001) has NaN in its pre-computed plane_normal/plane_distance, because one of its faces is degenerate. Round-trips correctly once NaN-aware comparison is used.
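The NaN-aware comparison from item 1 can be sketched as follows (helper names are hypothetical). Comparing bit patterns treats identical NaNs as equal; as a side effect it also distinguishes 0.0 from -0.0, which is stricter than ==.

```rust
/// Bitwise equality: identical NaN payloads compare equal.
pub fn f32_bits_eq(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}

/// Element-wise bitwise equality over two float slices.
pub fn slices_bits_eq(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```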

Byte-level MDL/MDX equality is a separate target – 1784 of 2444 MDX files match byte-for-byte, with the remaining 660 showing the non-standard BioWare compiler traversal discussed earlier.

Appendix

Emitter field map

304 bytes total (80 base + 224 extra). Emitter-specific data:

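| Node offset | Extra offset | Field | Type |
|---|---|---|---|
| +0x50 | +0x00 | deadspace | f32 |
| +0x54 | +0x04 | blast_radius | f32 |
| +0x58 | +0x08 | blast_length | f32 |
| +0x5C | +0x0C | num_branches | i32 |
| +0x60 | +0x10 | control_pt_smoothing | i32 |
| +0x64 | +0x14 | x_grid | i32 |
| +0x68 | +0x18 | y_grid | i32 |
| +0x6C | +0x1C | spawn_type | i32 |
| +0x70 | +0x20 | update | char[32] |
| +0x90 | +0x40 | render | char[32] |
| +0xB0 | +0x60 | blend | char[32] |
| +0xD0 | +0x80 | texture | char[32] |
| +0xF0 | +0xA0 | chunk_name | char[16] |
| +0x100 | +0xB0 | two_sided_tex | i32 |
| +0x104 | +0xB4 | loop | i32 |
| +0x108 | +0xB8 | render_order | u16 |
| +0x10A | +0xBA | frame_blending | u8 |
| +0x10B | +0xBB | depth_texture_name | char[16] |
| +0x11B | +0xCB | (reserved) | 21 bytes |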

LOD naming convention

When a model has cullWithLOD set, the engine searches for LOD variants by appending suffixes to the model name:

  • <name>_x – medium LOD
  • <name>_z – far LOD

Loaded via FindModel(name + "_x") and FindModel(name + "_z") as separate Model instances linked to the primary. Not relevant to format parsing, but useful for model validation and lint rules.
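For a lint rule that checks whether the LOD variants exist, the candidate names can be derived like this. It is a sketch with a hypothetical helper name; the actual resource lookup is deliberately left out.

```rust
/// Returns the medium (`_x`) and far (`_z`) LOD model names for `name`.
pub fn lod_variants(name: &str) -> [String; 2] {
    [format!("{name}_x"), format!("{name}_z")]
}
```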

Resource type IDs

| Format | Resource type |
|---|---|
| MDL | 2002 (0x7D2) |
| MDX | 3008 (0xBC0) |

These map to the KEY/BIF resource type system. CAuroraInterface::RequestModel at 0x0070d8d0 resolves models through a sorted requestedModelList.

Dynamic type casts

The engine exposes As* functions for type-checked downcasts. Caller counts indicate runtime usage frequency:

| Function | Callers |
|---|---|
| AsModel | 34 |
| AsMdlNodeTriMesh | 14 |
| AsMdlNodeEmitter | 11 |
| AsAnimation | 7 |
| AsMdlNodeLightsaber | 5 |
| AsMdlNodeSkin | 4 |
| AsMdlNodeAABB | 3 |
| AsMdlNodeDanglyMesh | 3 |
| AsMdlNodeLight | 3 |
| AsMdlNodeAnimMesh | 2 |
| AsMdlNodeCamera | 2 |
| AsMdlNodeReference | 2 |

TriMesh (14) and Emitter (11) are the most-queried node types – useful signal for prioritizing implementation completeness.

Binary MDL call graph

For reference when reading Ghidra decompilations:

NewCAurObject (0x00449cc0)
└── FindModel (0x00464110)           [by name; checks cache via BinarySearchModel]
    └── LoadModel (0x00464200)       [on cache miss]
        └── IODispatcher::ReadSync (0x004a15d0)
            └── Input::Read (0x004a14b0)          ← format dispatcher
                ├── InputBinary::Read (0x004a1260)   if first_byte == 0x00
                │   └── Reset / ResetLite                (pointer rewriting)
                │       ├── ResetMdlNode                  (per-node dispatch)
                │       │   ├── ResetMdlNodeParts         (base fields)
                │       │   ├── ResetTriMesh              (mesh subtypes)
                │       │   ├── ResetLight                (light extras)
                │       │   ├── ResetSkin, ResetAnim, ...
                │       │   └── ResetAABBTree             (recursive tree walk)
                │       └── ResetAnimation                (per-animation)
                └── FuncInterp loop                 otherwise (ASCII MDL)
    └── CreateInstanceTreeR (0x00449200)  [builds runtime Part tree from MdlNode tree]

Key Ghidra addresses

For anyone continuing this archaeology, the foundation set of function addresses in swkotor.exe (K1 GOG build):

| Function | Address |
|---|---|
| Input::Read | 0x004a14b0 |
| InputBinary::Read | 0x004a1260 |
| InputBinary::Reset | 0x004a1030 |
| InputBinary::ResetMdlNode | 0x004a0900 |
| InputBinary::ResetMdlNodeParts | 0x004a0b60 |
| InputBinary::ResetTriMeshParts | 0x004a0c00 |
| InputBinary::ResetAABBTree | 0x004a0260 |
| InputBinary::ResetLight | 0x004a05e0 |
| InputBinary::ResetSkin | 0x004a01b0 |
| InputBinary::ResetDangly | 0x004a0100 |
| InputBinary::ResetAnim | 0x004a0060 |
| InputBinary::ResetLightsaber | 0x004a0460 |
| InputBinary::ResetAnimation | 0x004a0fb0 |
| MdlNodeTriMesh::InternalPostProcess | 0x0043cf00 |
| MdlNodeTriMesh::InternalGenVertices | 0x00439df0 |
| MdlNodeTriMesh::InternalParseField | 0x004658b0 |
| MdlNodeEmitter::InternalParseField | 0x004658b0 |
| MdlNodeEmitter::InternalCreateInstance | 0x0049d5c0 |
| PartTriMesh::GetMinimumSphere | 0x00443330 |
| LightPartTriMesh | 0x0046a9e0 |
| NewController::Control | 0x00483330 |
| NewController::GetFloatValue | 0x00482bf0 |
| Model constructor | 0x0044aa70 |
| MaxTree constructor | 0x0044a900 |
| ParseNode | 0x004680e0 |
| Node type flag table | 0x00740a18 |