Rakata
Rakata is a clean-room Rust implementation of Knights of the Old Republic (KotOR) data formats and tooling. It provides a modular workspace designed for robust, type-safe, and canonical handling of Odyssey Engine game data.
This Wiki serves as the definitive reference manual for KOTOR Formats and Engine Behaviors, designed to decouple format knowledge from the underlying Rust source code.
Documentation Domains
Rakata’s documentation operates on two tiers: the Software API and the Format Specifications.
1. The Workspace (Code API)
The workspace is organized into focused crates and tools. If you are developing against Rakata and need to know the semantic layout of types, functions, and data structures, refer to the respective Rustdocs:
Libraries (crates/)
- rakata-core: Foundational primitives (ResRef, ResourceType, ResourceId) and core utilities (encoding, filesystem, detection).
- rakata-formats: Binary and text format readers/writers for 19 KotOR formats including GFF, ERF, RIM, KEY/BIF, MDL/MDX, TPC, TGA, and more.
- rakata-generics: Typed wrappers around GFF-backed resources (all 13 types: UTW, UTC, UTI, etc.) with loss-aware conversion.
- rakata-extract: Resource resolution logic, composite module handling (.mod + _s.rim + _dlg.erf), and game-wide resource access (GameResources).
- rakata-lint: Comprehensive resource validation against engine-derived field schemas. Catches crash-causing mod errors across all formats before they hit the engine.
- rakata-save: Save game parsing and modification logic.
- rakata: Facade crate re-exporting the ecosystem.
Tools (tools/)
- rakata-saveeditor: Desktop GUI application for editing save games.
- vanilla-inspector: Corpus validation tool for testing format implementations against all vanilla game assets.
2. Format Specifications (This Wiki)
The entire formats/ specification manual effectively serves as Rakata’s formal Evidence Log. If you need to understand binary structure, historical context, or how the original swkotor.exe engine interprets byte bounds under the hood (via Ghidra-backed engine constraints), you are in the right place!
Navigate through the sidebar to explore our exhaustive, decoupled format libraries:
- Archive Formats – Detailed overviews of encapsulated containers (BIF, KEY, ERF, RIM).
- GFF Structure – The bedrock of KOTOR’s data, exposing the 13 distinct blueprint constraints (Creatures, Dialogues, Triggers, etc.).
- 3D Models & Mesh – MDL/MDX structures and binary walkmesh topologies.
- Textures & Audio – Overviews detailing graphic compression (TPC, DDS) and MP3/Miles Sound System wrappers.
- Text & Data Formats – Localized Talk Tables (TLK), rule mappings (2DA), and hierarchical layout geometries (LYT, VIS).
Ready to dive in? Head over to the Goals & Roadmap to see where the project is heading, or look into the Architecture logic that powers the Rakata suite.
Project Roadmap
This document outlines what we’re tinkering with in rakata and where the project is heading.
Note
For day-to-day progress, bug fixes, and specific technical tasks, check out the Codeberg Issues tracker instead.
The End Goal
Right now, our libraries are mostly just good at reading and writing individual game files - like extracting a 3D model, opening a save file, or decoding audio. But the real dream for rakata is to build a full, modern KOTOR engine integration.
Eventually, it would be cool to tie all these isolated pieces together into an actual rendering pipeline. For example: dropping a vanilla model into an active window and having the engine stream the textures and background audio straight from the game data.
How We Get There
Since this is a passion project, we try to match the original game behavior down to the exact byte before building higher-level abstractions on top of it. It takes a bit longer, but it keeps us from having to constantly rewrite core parsers when we stumble into weird edge cases.
1. Laying the Foundation (Mostly Done)
Our core libraries (rakata-formats, rakata-save, etc.) can currently read, write, and safely roundtrip over 17 different KOTOR file formats. We’ve tackled a lot of the weird legacy archives (BIF), models (MDL/MDX), and raw textures (TPC), ensuring they line up with vanilla behavior.
However, the foundation is still growing! We still have a handful of outstanding data formats to map out and implement, including Pathfinding (PTH), UI Layouts (GUI), and Walkmeshes (WOK/DWK/PWK).
Additionally, format and bytecode support for NCS (Compiled Scripts) is being actively prioritized (see Issue #19) so that rakata can interface natively with upcoming Rust-based community compilers and decompilers.
2. Building Real Tools (Our Active Focus)
Now that we can parse the data reliably, we are building stuff the community can actually use:
- Mod Linter: A tool to scan modded files and point out if they break the game’s actual data constraints, catching crashes before you load them in-game.
- Save Editor: A basic offline save editor (rakata-saveeditor) built directly on top of our stable format parsers.
- Audio Streaming: Updating the generic audio logic (rakata-audio) so we can natively stream game music and voice lines instead of loading giant buffers into memory.
- Drop-in Replacements: Providing modern, reliable drop-in replacements for legendary (but aging) community tools. By backing these with rakata’s strict parsing rules, we can offer faster, safer, cross-platform native tools for unpacking archives, compiling models, and building mods. (Note: While we aim to replace these tools, we will not inherit their legacy bugs or non-vanilla API quirks. When in doubt, the original game engine is our only source of truth).
3. KOTOR 2 (TSL) Support
We are strictly focusing on KOTOR 1 right now, but extending parsing support for TSL via compatibility flags is a planned enhancement for further down the line once K1 is completely stabilized.
4. The Runtime Engine (The dream but probably a few years away)
Once our standalone tools prove that our format parsers are perfectly stable, we have a pipedream to one day start weaving them together into a natively synchronized rendering loop.
Architecture Guide
This document outlines how the rakata workspace is structured and the design principles we try to stick to.
Core Principles
- Vanilla K1 First
  - By default, we target the original vanilla behavior of KotOR 1.
  - Compatibility for TSL or community tools is strictly opt-in behind feature flags, not the default assumption.
  - When deciding how to parse something, the original game engine is our ultimate source of truth. We use local fixtures and original game data to prove our parsers work, rather than just copying how older community tools did things.
- Aim for Lossless
  - We want to be able to read a file and write it back out to the exact same bytes. We’ve largely achieved this for standard archives and data formats (GFF, ERF, RIM, KEY, TLK, etc.).
  - For highly complex formats (like MDL/MDX models), there are some known divergences where achieving a byte-exact roundtrip is essentially impossible due to how the original compilers ordered geometry blocks. We track these exceptions, but the output still safely runs in-game.
  - No Lazy Pass-throughs: If a file has undocumented fields, we don’t just read them as an opaque Vec<u8> blob and blindly pass them through. Our goal is to properly reverse-engineer and map every single struct boundary. However, if we identify defined “reserved” fields in the binary layout that we haven’t cracked the meaning of yet, we will map them as properly sized reserved values so we don’t accidentally drop data the engine might rely on. (Note: explicit blank padding bytes aren’t stored in memory at all; we just recalculate those dynamically on write).
- Strict Text Handling
  - All text decoding goes through rakata-core::text.
  - Localized text (TLK entries, strings) uses language-aware encodings (Windows-1252, Shift-JIS, etc.) to match what the engine expects.
  - Binary strings (like node names or texture paths) use TextEncoding::Windows1252 since that’s what the engine actually uses under the hood. We do not silently strip unusual characters or fall back to lossy substitutions.
(For day-to-day coding rules around iterators, zero-cost abstractions, and memory safety, see the Idiomatic Rust section in the contributing.md guide!)
Workspace Boundaries
Note: This layout is a living target! Some of these crates (like rakata-audio and rakata-saveeditor) are currently under active development. As we tackle our near-term roadmap goals – like building out the rakata-lint validation engine – expect these existing crates to flesh out, alongside brand new sibling crates being added to the ecosystem.
The workspace is organized in a clean dependency chain. Crates can only depend on crates listed “above” them:
rakata-core (no workspace deps)
rakata-formats (depends on: core)
rakata-audio (depends on: core, formats)
rakata-generics (depends on: core, formats)
rakata-extract (depends on: core, formats)
rakata-lint (depends on: formats, generics)
rakata-save (depends on: core, formats)
rakata (facade: re-exports all library crates)
Library Crates (crates/)
- rakata-core: The absolute basics (ResRef, IDs) and core utilities like file streams and text encoding.
- rakata-formats: Our massive library of parsers and writers (GFF, ERF, BIF, MDL, TPC, etc.). This parses bytes into objects, but doesn’t know anything about how the game actually uses them.
- rakata-audio: Audio streaming and decoding for the engine’s various sound formats (WAV, ADPCM, MP3).
- rakata-generics: Strongly-typed Rust models for all the different GFF files (like Doors, Items, Characters).
- rakata-extract: The logic for hunting down actual game files in the wild. It knows how to look inside ERFs, check the Override folder, and resolve files just like the engine does.
- rakata-lint: Our rule engine for scanning modded files and checking them against vanilla schema constraints.
- rakata-save: High-level logic for safely reading, editing, and backing up save files.
- rakata: A handy facade crate that re-exports everything so you only need to add one dependency.
Tool Crates (tools/)
- rakata-saveeditor: The actual desktop application for editing save files.
- vanilla-inspector: A testing utility for validating our parsers against the actual mass of game files.
Format API Guidelines
Public API Shape
Every format parser in rakata-formats generally provides the same clean interface:
- read_<fmt><R: Read>(reader: &mut R) -> Result<T, E>
- read_<fmt>_from_bytes(bytes: &[u8]) -> Result<T, E>
- write_<fmt><W: Write>(writer: &mut W, data: &T) -> Result<(), E>
- write_<fmt>_to_vec(data: &T) -> Result<Vec<u8>, E>
Formats with multiple output modes (like exporting models to ASCII text or JSON) just use variations of these names (for example, read_mdl_ascii()).
- Generic Traits: We strongly prefer accepting generic I/O trait bounds (Read, BufRead, Write, Seek) over concrete types. Accept the narrowest trait that covers your API’s needs so callers aren’t forced to jump through hoops.
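To make the naming pattern concrete, here is a minimal sketch of the four entry points for a made-up format. The Toy type, its "TOY " magic, and ToyError are hypothetical and exist only for illustration; they are not part of rakata-formats.

```rust
use std::io::{self, Read, Write};

/// Hypothetical toy format: a 4-byte magic followed by one little-endian u32.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Toy {
    pub value: u32,
}

/// Hypothetical error type for the toy format.
#[derive(Debug)]
pub enum ToyError {
    Io(io::Error),
    BadMagic([u8; 4]),
}

impl From<io::Error> for ToyError {
    fn from(err: io::Error) -> Self {
        ToyError::Io(err)
    }
}

const MAGIC: [u8; 4] = *b"TOY ";

/// Accepts the narrowest trait that works: plain `Read`, no `Seek` or `BufRead`.
pub fn read_toy<R: Read>(reader: &mut R) -> Result<Toy, ToyError> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(ToyError::BadMagic(magic));
    }
    let mut raw = [0u8; 4];
    reader.read_exact(&mut raw)?;
    Ok(Toy { value: u32::from_le_bytes(raw) })
}

/// Convenience wrapper: `&[u8]` already implements `Read`.
pub fn read_toy_from_bytes(bytes: &[u8]) -> Result<Toy, ToyError> {
    let mut cursor = bytes;
    read_toy(&mut cursor)
}

pub fn write_toy<W: Write>(writer: &mut W, data: &Toy) -> Result<(), ToyError> {
    writer.write_all(&MAGIC)?;
    writer.write_all(&data.value.to_le_bytes())?;
    Ok(())
}

/// Convenience wrapper: `Vec<u8>` already implements `Write`.
pub fn write_toy_to_vec(data: &Toy) -> Result<Vec<u8>, ToyError> {
    let mut out = Vec::new();
    write_toy(&mut out, data)?;
    Ok(out)
}
```

Because the byte-slice and Vec variants only delegate to the generic ones, a roundtrip test can exercise all four functions with no filesystem access at all.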
Error Handling
Robust parsing means strict error boundaries:
- Each format module must define its own domain-specific error enum (e.g., GffError, ErfError) using the thiserror crate. Do not use generic stringly-typed errors or Box<dyn Error>.
- Low-level read failures (like sudden bounds exhaustion or bad magic numbers) should wrap our shared BinaryLayoutError.
- Never unwrap() at an API boundary! Fail explicitly via Result, and reserve .expect() (with a hardcoded rationale) for cases that cannot possibly fail.
Memory & Ownership
While we try to avoid deep cloning and heavy allocations behind the scenes, we default to owned data types when crossing public API boundaries. Unless a module is explicitly built and documented as a zero-copy “View” type, you should avoid passing nasty lifetimes into the caller’s lap.
Keeping Concerns Separated
- Dumb Parsers: Format modules in rakata-formats are intentionally “dumb”. They solely translate between raw byte streams and Rust structs without any awareness of game architecture, filesystems, or what a “module” is.
- Smart Extractors: All the messy environment logic – hunting down loose files, enforcing vanilla precedence rules (e.g., checking the Override folder before extracting from a BIF archive), and assembling composite files – lives safely isolated inside rakata-extract. This separation guarantees our parsers can cleanly process isolated test files just as well as they operate in a massive live-game workflow.
Tracing & Telemetry
We strongly encourage instrumenting format parsers with tracing::instrument spans to help pinpoint exactly where a badly formed file breaks during a parse. However, this telemetry must remain entirely zero-cost for consumers who don’t need it! We achieve this by wrapping public parser entry points in conditional attributes: #[cfg_attr(feature = "tracing", tracing::instrument(...))]. If a user doesn’t explicitly opt-in via their Cargo.toml, the Rust compiler strips the instrumentation entirely.
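A minimal sketch of the conditional attribute in practice. The count_entries function is a hypothetical stand-in for a parser entry point, and the Cargo wiring in the comment is an assumption rather than a copy of the Rakata manifest.

```rust
// Assumed Cargo.toml wiring (not taken from the Rakata manifest):
//   [dependencies] tracing = { version = "0.1", optional = true }
//   [features]     tracing = ["dep:tracing"]

/// Counts the 16-byte entries in a table slice (hypothetical helper).
///
/// With the `tracing` feature disabled, the attribute below expands to nothing,
/// so this compiles and runs without the `tracing` crate being present at all.
#[cfg_attr(
    feature = "tracing",
    tracing::instrument(skip(bytes), fields(len = bytes.len()))
)]
pub fn count_entries(bytes: &[u8]) -> usize {
    bytes.len() / 16
}
```

Skipping the byte slice and recording only its length keeps the span cheap even when the feature is switched on.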
Serialization (Serde)
Just like tracing, serde support for exporting our parsed files to JSON or YAML must be treated as a zero-cost, opt-in feature. Format structs and types should generously derive Serialize and Deserialize when the serde feature flag is enabled. This allows downstream utilities (like the Save Editor) to effortlessly convert memory layouts into text formats, while ensuring the core parsers stay extremely light for purely binary-focused applications.
Beyond Basic Parsing
While rakata-formats gives us the ability to parse isolated bytes, the game engine is much more complicated. Our higher-level crates exist to bridge that gap between “dumb bytes” and “actual game logic”.
Finding Files (rakata-extract)
rakata-extract handles the messy reality of finding files scattered across a massive KOTOR installation. It mirrors the vanilla engine’s lookup hierarchy in three distinct layers:
- Primitives: Grabbing a file out of a single archive (like unpacking a standalone ERF or BIF file).
- Composition: Treating related archive sets as a single “Module” (like grouping a .mod file with its matching _s.rim and _dlg.erf files so they load transparently together).
- Game-wide: Creating a unified GameResources tree that maps out the entire game installation.
To match vanilla behavior, lookups are strictly case-insensitive and load precedence follows the original game (so a file in the Override folder automatically beats a file buried in a BIF archive).
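The precedence rule can be sketched with two maps keyed on lowercased names. This is a simplified illustration: the Resolver and Source types are hypothetical, the module layer is left out, and plain file names stand in for whatever key the real GameResources uses.

```rust
use std::collections::HashMap;

/// Where a resolved resource came from (hypothetical, two layers only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Override,
    Bif,
}

/// Hypothetical resolver holding one map per layer, keyed by lowercased name.
#[derive(Default)]
pub struct Resolver {
    override_files: HashMap<String, Vec<u8>>,
    bif_files: HashMap<String, Vec<u8>>,
}

impl Resolver {
    /// Normalizes names so that lookups ignore ASCII case.
    fn key(name: &str) -> String {
        name.to_ascii_lowercase()
    }

    pub fn add(&mut self, source: Source, name: &str, data: Vec<u8>) {
        let map = match source {
            Source::Override => &mut self.override_files,
            Source::Bif => &mut self.bif_files,
        };
        map.insert(Self::key(name), data);
    }

    /// Checks the Override layer first and only then falls back to BIF data.
    pub fn resolve(&self, name: &str) -> Option<(Source, &[u8])> {
        let key = Self::key(name);
        if let Some(data) = self.override_files.get(&key) {
            return Some((Source::Override, data.as_slice()));
        }
        self.bif_files
            .get(&key)
            .map(|data| (Source::Bif, data.as_slice()))
    }
}
```

Normalizing once at insertion and once at lookup means callers never have to care about the casing used on disk.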
Strongly-Typed Data (rakata-generics)
When we parse a .utc Character file, rakata-formats just hands us a raw GFF tree of untyped labels and values. rakata-generics wraps those raw data blobs in strongly-typed Rust structs (like Character, Item, Door). This guarantees that if a developer needs to access a character’s “Strength” stat, they get a guaranteed u8 property rather than blindly guessing string handles inside a raw binary tree.
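Here is a small sketch of that idea, assuming a drastically simplified raw tree. RawValue, RawStruct, ConvertError, and this Character struct are hypothetical, and the field labels "Tag" and "Str" are used purely as illustrative keys; the real rakata-generics types carry far more fields.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily reduced model of an untyped GFF field value.
#[derive(Debug, Clone, PartialEq)]
pub enum RawValue {
    Byte(u8),
    Int(i32),
    Text(String),
}

/// An untyped struct: labels mapped to loosely typed values.
pub type RawStruct = HashMap<String, RawValue>;

/// Typed view of the two fields this sketch cares about.
#[derive(Debug, PartialEq, Eq)]
pub struct Character {
    pub tag: String,
    pub strength: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConvertError {
    Missing(&'static str),
    WrongType(&'static str),
}

impl Character {
    /// Converts the raw tree once, so later code never touches string labels.
    pub fn from_raw(raw: &RawStruct) -> Result<Self, ConvertError> {
        let tag = match raw.get("Tag") {
            Some(RawValue::Text(text)) => text.clone(),
            Some(_) => return Err(ConvertError::WrongType("Tag")),
            None => return Err(ConvertError::Missing("Tag")),
        };
        let strength = match raw.get("Str") {
            Some(RawValue::Byte(value)) => *value,
            Some(_) => return Err(ConvertError::WrongType("Str")),
            None => return Err(ConvertError::Missing("Str")),
        };
        Ok(Character { tag, strength })
    }
}
```

All label and type checking happens in one place, so a mismatch surfaces as a typed error at conversion time instead of at every call site.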
High-Level Interaction (rakata-save & rakata-lint)
Finally, crates at the top of the stack use our extraction logic and strongly typed generic structs to actually do things. rakata-lint compares typed structs against vanilla constraints to catch modding errors, while rakata-save gracefully handles unpacking, editing, and re-compressing massive save-game directories without corrupting the player’s campaign!
Contributing Guide
Welcome to the Rakata workspace! This guide outlines how we build, how we test, and the core rules for keeping our code clean, compliant, and maintainable.
License Policy
- License: All workspace crates use GPL-3.0-or-later.
- Third-Party Components: New dependencies must be compatible (MIT, Apache-2.0, BSD). Add them to THIRD_PARTY_NOTICES.md before merge.
Clean Room Implementation
To ensure everything we build is 100% our own original work and we aren’t accidentally borrowing from other community tools (if you’re curious about why we’re so strict about this, check out docs/LEGAL.md):
- Reference Policy: Treat existing tools (like PyKotor) as behavioral references, not copy sources.
- No Copy-Paste: Do not copy source code blocks, large comments, or docstrings from third-party sources into Rust files.
- Re-Derivation: Derive implementation logic from format documentation, observed behavior (hex dumps), and black-box fixture analysis.
- Reverse Engineering:
  - Behavior verification via disassembly tools (e.g., Ghidra) is allowed for interoperability analysis.
  - Do not copy decompiled code into source files.
  - Record findings as paraphrased behavior notes natively within the relevant format specification under docs/src/formats/.
What belongs in the Engine Audits
The entire Rakata format specifications manual (docs/src/formats/) serves as the engine audit layer between reverse engineering and implementation. All Rust code is written strictly from these engine audits (specifically the Engine Audits & Decompilation sections embedded in each format’s blueprint), not from raw decompilation output.
- Record: Field names, data types, default values, error conditions, and observable behavioral rules (e.g., “field X is clamped to range 0–100”, “list is sorted ascending by field Y”).
- Do not record: Step-by-step algorithmic sequences, control flow structure, or implementation details that go beyond what is needed for interoperability. The test is: could someone implement correct behavior from this note without it dictating a specific code structure?
Format Work vs Engine Reimplementation
Right now, this workspace is exclusively focused on format parsing, linting, and modding tools - reading, writing, and validating the game’s actual data files. We are fundamentally just mapping out how the original game structures its data so we can build cool tools around it.
Building an actual game engine replacement (with gameplay logic, AI, and rendering pipelines) is a completely different beast for another day. But that’s exactly why these format blueprints are so critical: if someone wants to build an engine later, they can just use our shared engine audits to understand the data, rather than having to dig through raw decompiled binaries themselves!
Code Style & Linting
Pre-commit Hooks
We use pre-commit to keep the codebase consistently formatted without anyone having to manually police it. After cloning the repository, it’s highly recommended to set up the hooks:
pre-commit install
pre-commit install --hook-type pre-push
This registers two quick automated stages:
- pre-commit: Formats your code via cargo fmt --all (auto-fixing it for you) and runs cargo clippy across all targets.
- pre-push: Runs cargo test --workspace --all-features to ensure tests are green before you push.
Try to avoid skipping hooks using --no-verify. If a hook catches something, it’s usually just a helpful clippy suggestion or a quick formatting tweak!
Manual Checks
If you don’t like automated hooks and prefer running things manually from the workspace root before committing, you absolutely can:
cargo fmt --all
cargo clippy --workspace --all-targets --all-features
cargo test --workspace --all-features
Note: Passing --all-features to clippy and test is important so they catch optional code paths like serde and tracing! We just ask that fmt and clippy run cleanly before you open a Pull Request.
Idiomatic Rust
To keep the codebase consistently safe, lean, and fast, we heavily rely on a few core Rust principles:
- Safe Numeric Casts: To prevent silent truncation bugs, we enforce #![warn(clippy::as_conversions)]. Avoid the raw as keyword; lean on From, TryFrom, or .into(). If a lossy cast is truly unavoidable (like an f32 down to an i32), use a scoped #[allow(clippy::as_conversions)] and drop an inline comment explaining why it’s safe.
- No Primitive Obsession: We heavily utilize strongly-typed wrappers (like ResRef) rather than passing raw [u8; 16] or String primitives around.
- Strict Error Handling: We explicitly forbid .unwrap() and .unwrap_unchecked() in library code. Everything must propagate cleanly via Result using typed error enums (managed via thiserror).
- Composition over Hierarchy: We prefer lean, flat structs and trait combinators over deep, messy object-oriented class hierarchies.
- Iterators over Loops: We prefer functional iterator chains (map, filter, fold) over maintaining manual mutable state in for loops.
- Zero-cost Features: Optional functionality (like serde serialization or tracing telemetry) must introduce absolutely zero overhead when disabled.
- Safe by Default: We use #![forbid(unsafe_code)] across all core parser crates to enforce strict memory safety boundaries.
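The numeric-cast rule from the list above might look like this in practice. The function names and the CountOverflow type are hypothetical; only the From/TryFrom calls and the scoped lint allowance reflect the guideline itself.

```rust
use std::convert::TryFrom;

/// Hypothetical error for a count that does not fit a 32-bit header field.
#[derive(Debug, PartialEq, Eq)]
pub struct CountOverflow(pub u64);

/// Narrowing conversion: `TryFrom` reports overflow instead of truncating.
pub fn encode_count(len: u64) -> Result<[u8; 4], CountOverflow> {
    let count = u32::try_from(len).map_err(|_| CountOverflow(len))?;
    Ok(count.to_le_bytes())
}

/// Widening conversion: `From` is infallible, so no `as` is needed.
pub fn widen(offset: u32) -> u64 {
    u64::from(offset)
}

/// Float-to-integer has no `From`/`TryFrom` impl, so the cast is allowed
/// locally. It is safe here because `as` saturates at the i32 bounds and
/// maps NaN to 0, which is the behavior this helper wants.
#[allow(clippy::as_conversions)]
pub fn quantize(value: f32) -> i32 {
    value as i32
}
```

Keeping the allowance on the smallest possible item means the lint still guards every other cast in the module.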
Testing & Quality
Our testing approach is a Gray Box strategy: we use our hard-earned white-box knowledge of the game engine (via Ghidra audits) to build strictly validated black-box test cases for our parsers. We want to test against how the real game engine behaves, not against artificial mocks.
When adding a brand new format, please make sure your PR includes:
- Fixture-Backed Tests: Full roundtrip coverage using synthetic test files (stored in fixtures/). We never commit real game assets; run cargo test --test gen_fixtures -- --ignored to safely generate them! Byte-exact roundtrip assertions are the gold standard for any format where the engine consumes bytes exactly as written.
- Mutation Tests: A quick pass to verify the parser safely rejects malformed or corrupted inputs without panicking (usually wired up via corruption_matrix.rs).
- Module Documentation: A clean rustdoc block showing the basic format layout.
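To give a feel for what a mutation pass can look like, here is a small self-contained sketch. It is not the contents of corruption_matrix.rs; the parse function and its toy layout are hypothetical, and the mutation set (every truncation plus every single-bit flip) is just one reasonable choice.

```rust
/// Hypothetical parser under test: 4-byte magic, 1-byte length, then payload.
pub fn parse(bytes: &[u8]) -> Result<Vec<u8>, &'static str> {
    let magic = bytes.get(..4).ok_or("truncated magic")?;
    if magic != b"TOY " {
        return Err("bad magic");
    }
    let len = usize::from(*bytes.get(4).ok_or("truncated length")?);
    let payload = bytes.get(5..5 + len).ok_or("truncated payload")?;
    Ok(payload.to_vec())
}

/// Builds every strict truncation and every single-bit flip of a valid input.
pub fn mutations(valid: &[u8]) -> Vec<Vec<u8>> {
    let truncations = (0..valid.len()).map(|cut| valid[..cut].to_vec());
    let bit_flips = (0..valid.len()).flat_map(|index| {
        (0..8u32).map(move |bit| {
            let mut copy = valid.to_vec();
            copy[index] ^= 1u8 << bit;
            copy
        })
    });
    truncations.chain(bit_flips).collect()
}
```

A test can then feed each mutated buffer to the parser inside std::panic::catch_unwind and assert that no case panics, whether the parser returns Ok or Err.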
The Reserved Field Rule
Game engines are weird, and sometimes they leave mysterious “padding” or “reserved” sections in their binary formats. Every struct field that corresponds to a reserved region must be:
- Stored strictly as a named array (e.g., reserved: [u8; N]) in the format struct.
- Read directly from the source bytes verbatim.
- Written back verbatim during a roundtrip.
If a writer zeroes out or silently drops a reserved field you parsed, we consider that a “lossless bug” – even if the engine doesn’t explicitly seem to use those bytes. If you’re constructing a brand new file from scratch, you can safely write zeroes for reserved regions, but the struct must be capable of storing exactly what it read off disk.
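A minimal sketch of the rule, using a hypothetical 12-byte header (one u32 count followed by 8 reserved bytes). ToyHeader is not a real Rakata type; it only shows the store, read, and write-back pattern.

```rust
/// Hypothetical header: a little-endian u32 count plus 8 reserved bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyHeader {
    pub entry_count: u32,
    /// Meaning unknown, so the bytes are kept exactly as read.
    pub reserved: [u8; 8],
}

impl ToyHeader {
    /// Building a brand new header: zeroes are fine for reserved regions.
    pub fn new(entry_count: u32) -> Self {
        Self { entry_count, reserved: [0; 8] }
    }

    /// Reads the reserved region verbatim instead of skipping over it.
    pub fn read(bytes: &[u8; 12]) -> Self {
        let mut count = [0u8; 4];
        count.copy_from_slice(&bytes[..4]);
        let mut reserved = [0u8; 8];
        reserved.copy_from_slice(&bytes[4..]);
        Self { entry_count: u32::from_le_bytes(count), reserved }
    }

    /// Writes the reserved region back verbatim, never zeroing it.
    pub fn write(&self) -> [u8; 12] {
        let mut out = [0u8; 12];
        out[..4].copy_from_slice(&self.entry_count.to_le_bytes());
        out[4..].copy_from_slice(&self.reserved);
        out
    }
}
```

With this shape, a byte-exact roundtrip assertion is a one-liner: reading a header and writing it back must return the original array.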
Release Process
(TODO: We haven’t cut an official production release yet! Right now we are aggressively building out the rakata-lint engine rules and expanding our format coverage. Once we publish a stable v0.3.0 to crates.io, we’ll formalize our exact release checklist, dependency license refreshes, and CI pipelines here.)
Legal & Compliance
Disclaimer: We aren’t lawyers! The following information references specific legal statutes regarding software interoperability and reverse engineering simply to clearly demonstrate our commitment to strictly lawful development.
Project Intent
Rakata is an open-source research project and software library strictly designed to build interoperability with the data formats used by Star Wars: Knights of the Old Republic (KOTOR).
- Our Goal: We want to empower users to access, read, edit, and safely modify their own legally purchased game files on modern operating systems using open-source tools.
- No DRM Circumvention: This project completely avoids the game executable. We do not bypass, strip, or defeat any Digital Rights Management (DRM) or software encryption. We solely parse static data files (like .rim, .bif, and .mdl files) for the pure purpose of compatibility.
- No Pirated Assets: This repository does not contain, distribute, or host any copyrighted game assets (art, sound, proprietary code, or binaries) owned by the original rights holders. You must supply your own legally obtained copy of the game to do anything useful with this software.
Legal Basis for Reverse Engineering
This project operates under the specific “Interoperability” exceptions provided by copyright law in major jurisdictions:
🇨🇦 Canada (Jurisdiction of Maintainer)
Under the Copyright Act (R.S.C., 1985, c. C-42), this project relies on Section 30.61, which permits the reproduction of a computer program for the purpose of:
- (a) obtaining information that is necessary to allow the computer program to be compatible with another computer program; or
- (b) correcting errors in the computer program.
🇺🇸 United States
Under the Digital Millennium Copyright Act (DMCA), this project operates under the 17 U.S.C. § 1201(f) exception for Reverse Engineering, which states:
- (1) … a person who has lawfully obtained the right to use a copy of a computer program may circumvent a technological measure… for the sole purpose of identifying and analyzing those elements of the program that are necessary to achieve interoperability of an independently created computer program with other programs…
🇪🇺 European Union (Host Jurisdiction - Codeberg)
Under Directive 2009/24/EC (Legal Protection of Computer Programs), this project adheres to Article 6 (Decompilation), which allows for the reproduction of code and translation of its form when:
- (a) these acts are performed by the licensee or by another person having a right to use a copy of a program…
- (b) the information necessary to achieve interoperability has not previously been readily available…
- (c) these acts are confined to the parts of the original program which are necessary to achieve interoperability.
Acknowledgements
Portions of the initial file format logic were originally derived from research by the awesome PyKotor project (licensed under LGPL-3.0-or-later) and verified against original game binaries using clean-room reverse engineering techniques (via Ghidra and ret-sync).
- This project is open-source and licensed under GPL-3.0-or-later.
- Star Wars: Knights of the Old Republic is a trademark of its respective owners. This passion project is not affiliated with, endorsed by, or connected to BioWare, LucasArts, or Disney in any way.
Format Implementation Reference
This launchpad tracks the implementation status of KotOR file formats across our parsing libraries (rakata-formats) and our strongly-typed wrappers (rakata-generics).
Status Legend:
- Full: Binary reader/writer implemented with roundtrip tests.
- Generics: Strongly-typed wrappers and linting schemas implemented.
- Partial: Basic parsing support, advanced features deferred.
- Canonical: Validated against vanilla KotOR (K1) runtime behavior.
Archive Formats
| Format | Status | Notes |
|---|---|---|
| BIF | Full | Supports variable/fixed tables. Deterministic 4-byte payload alignment. BZF compression feature-gated. |
| KEY | Full | First-match lookup semantics (native verified). Duplicate key insertions ignored. |
| ERF | Full | Supports ERF/MOD/SAV. Optional blank-block emission for MODs is explicit opt-in. |
| RIM | Full | Supports V1.0. Offset fallback handled. Tight packing. |
GFF & Blueprints
| Format | Status | Notes |
|---|---|---|
| GFF Structure | Full | Core binary parity for structs/lists/fields. Localized strings supported. Stable list ordering. |
| Generics | Generics | 13 typed blueprints completed: ARE, DLG, GIT, IFO, UTC, UTD, UTE, UTI, UTM, UTP, UTS, UTT, UTW. Tied into rakata-lint. |
3D Models & Walkmeshes
| Format | Status | Notes |
|---|---|---|
| MDL/MDX | Full | Binary reader/writer with full geometry, node hierarchy, controllers, and MDX vertex data. ASCII reader/writer for modder interop. In-game verified. |
| BWM / WOK | Full | V1.0 binary tables (vertices, faces, materials, etc.). Strict bounds validation. |
Texture Formats
| Format | Status | Notes |
|---|---|---|
| TPC | Full | Container header/payload/footer. Canonical pixel-type mapping (DXT5 for type 4). Mip payload sizing matches native right-shift. |
| DDS | Full | Supports standard D3D headers and K1-specific CResDDS prefix (20-byte metadata). |
| TGA | Full | Reader normalizes to RGBA8888. Canonical mode rejects grayscale RLE. Lossless passthrough when source pixels are unmodified. |
| TXI | Full | ASCII format. Case-insensitive command tokens (native verified). Coordinate block support. |
Text & Data Formats
| Format | Status | Notes |
|---|---|---|
| 2DA | Full | Binary V2.b. |
| TLK | Full | Strict language-aware decode/encode. Validated against test.tlk. |
| VIS | Full | ASCII format. Case-insensitive room normalization. Deterministic ordering. |
| LYT | Full | ASCII format. Strict Windows-1252 text handling. Count-driven parsing. |
| LTR | Full | V1.0 headers. 28-char probability tables. |
Audio Formats
| Format | Status | Notes |
|---|---|---|
| WAV | Full | Standard RIFF + KotOR SFX/VO obfuscation wrappers. MP3-in-WAV unwrapping support. |
| LIP | Full | V1.0 header + keyframes. Deterministic writer. |
| SSF | Full | V1.1 header + 28-slot sound table. |
Missing / Deferred Formats
These formats are currently unimplemented or do not yet have strongly-typed wrappers in rakata-generics.
| Format | Status | Notes |
|---|---|---|
| NCS / NSS | Deferred | NWScript Source and Compiled bytecode. NCS decompilation is slated for future work via an independent pipeline. |
| GUI | Deferred | Graphical User Interface layout blueprints (GFF). |
| JRL | Deferred | Journal and quest tracking blueprints (GFF). |
| FAC | Deferred | Faction mappings and reputations (GFF). |
| PTH | Deferred | Pathfinding graphs and navigation waypoints (GFF). |
| ITP | Deferred | Item Palette definitions (GFF). |
| BIK | Deferred | Bink Video container (proprietary video format). Unlikely to be implemented natively. |
Provenance Policy
Because this project seeks to achieve strict interoperability with a two-decade-old engine, mere “correctness” is insufficient. We guarantee canonical behavior.
- Target: Canonical vanilla Star Wars: Knights of the Old Republic 1 (2003).
- Engine Audits: We do not guess how the engine behaves. Code is written exclusively from observed engine evidence notes derived from clean-room reverse engineering (via Ghidra/ret-sync). Every implementation choice is documented directly inside that format’s specific page on this site.
- Verification: Behaviors are locked via deep integration tests against synthetic fixtures. If a parser perfectly round-trips an invalid file but the game engine rejects it, it is treated as a critical bug.
Archive Formats
At the heart of the Odyssey Engine is its virtual file system. Instead of loading tens of thousands of tiny loose files straight from the local disk, the engine efficiently streams them from large, concatenated archive blobs. You can think of these formats as extremely specialized zip files used to store binary models, compiled scripts, textures, and UI data.
Note
KOTOR utilizes a highly strict two-tier architecture. BIF & KEY act as the core foundational registry for all base-game assets (e.g. data/models.bif is mapped using chitin.key as the absolute global lookup index). Meanwhile, ERF & RIM files act as completely independent, self-contained archives used aggressively for loading localized module levels, stateful save games, and community mods.
Implementation Blueprints
| Format | Name | Layout & Purpose |
|---|---|---|
| BIF | Binary Information File | Massive binary payload silos containing raw game assets packed end-to-end. |
| KEY | Global Index File | Master lookup table mapping precise file names directly to their internal BIF payload offset block. |
| ERF | Encapsulated Resource File | Extremely versatile package format utilized heavily for modules (.mod), stateful save games (.sav), and generic archives (.erf). |
| RIM | Resource Image | Stripped-down, fast-loading, highly compact localized module containers (often used to split up geometry models vs dynamic entity layouts). |
BIF (Binary Information File)
BIFs are essentially giant, uncompressed data silos. Because they act as the raw storage tier of the KOTOR engine, they don’t waste bytes on complex metadata or internal filenames – they are simply pure, tightly packed continuous byte arrays for game resources. They are designed to be randomly accessed extremely quickly at runtime strictly via their companion KEY index file.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .bif |
| Magic Signatures | BIFF (version V1 ) |
| Type | Archive Blob Payload |
| Rust Reference | View rakata_formats::Bif in Rustdocs |
Data Model Structure
The rakata-formats crate handles raw Bif parsing for you by reading the internal offset tables. However, developers very rarely interact with a raw Bif file on its own.
- Unified Access: Typically, you’ll use the KeyFile API (rakata_extract::keyfile::KeyFile), which automatically ties .key index files to their .bif data payloads so you don’t have to map them yourself.
- Seek Performance: To prevent loading 100MB+ binary files completely into memory just to read a tiny script, Rakata jumps directly to the exact file coordinate on your hard drive (via KeyFile::read_resource_by_seek), extracting only the single resource you specifically asked for!
Tip
The compressed BZF BIF variant did not exist in the original 2003 PC version of the game. It was added much later by Aspyr for their modern iOS, Android, and Nintendo Switch ports simply to save storage space on mobile devices. While our parser can read the BZF layout, it falls slightly outside our core focus on the original PC version and hasn’t been heavily tested against real mobile game files yet.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .bif archive headers mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoResFile::LoadHeader (0x0040d910) and CExoResFile::ReadResource (0x0040da20).)
Archive Initialization (CExoResFile::LoadHeader)
Mapped from 0x0040d910.
| Pipeline Event | Engine Behavior & Result |
|---|---|
| Signature Check | The engine strictly validates both the BIFF magic and the exact V1 version. It does not actively process any files that deviate from this signature pair. |
| Variable Table Loading | The system extracts the variable_count value from the header and physically reads variable_count * 16 bytes from the variable_table_offset to map the resource keys. |
| Fixed Table Bypass | The fixed_count header scalar is entirely decorative. It is not part of the active runtime read path (files with nonzero values are accepted but never mapped). |
| Direct Asset Extraction | When reading a physical asset out of the .bif, the engine computes the entry’s byte offset inside the variable table as (resource_id & 0x3fff) * 0x10. It then calls a direct C fseek(SEEK_SET) using the raw data_offset extracted from that 16-byte variable table entry. No alignment or structural normalization is applied: the payload is read exactly as stored. |
Caution
Because the engine passes the internal data_offset integer directly into a raw C fseek(SEEK_SET), any custom BIF files must meticulously guarantee byte-perfect offset tables. If the offset is even slightly misaligned, the engine will read garbage data into the stream, inevitably crashing the game.
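The lookup described in the audit table can be sketched as follows. This is a minimal illustration, not Rakata's implementation: `read_variable_resource` and `read_u32` are hypothetical helpers, and the entry layout (id, data_offset, size, type as four little-endian u32 values) is an assumption drawn from the commonly documented BIFF V1 layout rather than from the text above.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Size of one variable-table entry (0x10 bytes per the audit above).
const VARIABLE_ENTRY_SIZE: u64 = 0x10;

/// Reads one little-endian u32 from the current position.
fn read_u32<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Hypothetical helper, not part of Rakata's API.
/// Assumed entry layout: id, data_offset, size, type (four LE u32 values).
fn read_variable_resource<R: Read + Seek>(
    bif: &mut R,
    variable_table_offset: u32,
    resource_id: u32,
) -> io::Result<Vec<u8>> {
    // The low 14 bits select the entry; each entry is 16 bytes wide.
    let entry_offset = u64::from(resource_id & 0x3fff) * VARIABLE_ENTRY_SIZE;
    bif.seek(SeekFrom::Start(u64::from(variable_table_offset) + entry_offset))?;

    let _id = read_u32(bif)?;
    let data_offset = read_u32(bif)?;
    let size = read_u32(bif)?;
    let _resource_type = read_u32(bif)?;

    // Like the engine, seek straight to data_offset with no alignment fixups.
    bif.seek(SeekFrom::Start(u64::from(data_offset)))?;
    let mut payload = vec![0u8; size as usize];
    bif.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because only the table entry and the payload are touched, the same function works on a `std::fs::File` or an in-memory `std::io::Cursor` without loading the whole archive.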
KEY (Global Index)
Think of the KEY file as the absolute master table of contents governing the entire game directory. Because uncompressed BIF archives are completely blind payloads that contain no internal filenames, the KEY file acts as the singular, authoritative index that tells the engine exactly which BIF holds which file, and precisely where to seek inside that BIF to find it.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .key |
| Magic Signatures | KEY (version V1 ) |
| Type | Archive Global Index |
| Rust Reference | View rakata_formats::Key in Rustdocs |
Data Model Structure
The rakata-formats crate evaluates the .key file as the holy grail mapping for global engine initialization.
- Indices Hierarchy: Internally, the format houses an array of bif_entries bounding archive paths and sizes, alongside a massive array of KeyResourceEntry structures fusing a standard ResRef string and a format TypeCode to a bit-packed numeric ResourceId.
- Conflict Resolution: Because the game engine relies on a strict override hierarchy, multiple KEYs might accidentally declare the same resource! When constructing the active dictionary out of a KEY file (KeyFile::build_key_resource_index), Rakata explicitly utilizes or_insert() to strictly ensure only the first defined entry for a conflict is honored, perfectly mimicking the engine’s aggressive linear-scan precedence rules.
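The first-entry-wins behaviour can be illustrated with the standard `HashMap` entry API. The `KeyEntry` struct and `build_index` function below are hypothetical stand-ins, not the real `KeyResourceEntry` or `KeyFile::build_key_resource_index`; lowercasing the ResRef before keying is also an assumption made for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a KEY resource entry (not Rakata's real type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct KeyEntry {
    resref: String,
    type_code: u16,
    resource_id: u32,
}

/// Builds a lookup where the first entry for a (resref, type) pair wins.
fn build_index(entries: &[KeyEntry]) -> HashMap<(String, u16), u32> {
    let mut index = HashMap::new();
    for entry in entries {
        // Assumption: ResRefs compare case-insensitively, so normalise first.
        let key = (entry.resref.to_ascii_lowercase(), entry.type_code);
        // or_insert leaves an existing value untouched, so later duplicates lose.
        index.entry(key).or_insert(entry.resource_id);
    }
    index
}
```

Using `insert` instead of `entry(..).or_insert(..)` would flip the precedence to last-entry-wins, which is the opposite of the linear-scan behaviour described above.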
Engine Audits & Decompilation
The following information documents the KOTOR engine’s exact load sequence and field constraints for genuine .key files. All behavior was mapped natively from swkotor.exe during clean-room reverse engineering.
Key Table Registration (CExoKeyTable::AddKeyTableContents)
Mapped from 0x0040fb80.
| Action | Engine Behavior |
|---|---|
| Signature Check | Validates exactly for the KEY magic and the explicit V1 version signature. |
| Version Branching | There is absolutely zero logic handling any speculative V1.1 version branch in vanilla K1. It is currently unknown if a V1.1 KEY format actually exists in the wild, but the engine certainly wouldn’t load it. |
| Payload Mapping | Extrapolates the file location natively by tearing apart the ResourceId bitmask to locate both the target BIF file index and the internal struct array offset. |
Note
The engine handles KEY table loading extremely early in the application lifecycle during CExoBase::InitObject. If a global KEY fails to mount due to malformed headers, the engine immediately aborts execution.
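As a rough sketch of the Payload Mapping step in the table above, the packed ResourceId can be split into a BIF index and a table entry index. The 0x3fff mask matches the BIF audit; the 20-bit shift for the BIF index is an assumption based on the commonly documented Aurora KEY layout, and `split_resource_id` is a hypothetical name.

```rust
/// Splits a packed ResourceId into (bif_index, variable_entry_index).
/// Assumption: the BIF index lives in the top 12 bits (shift by 20).
fn split_resource_id(resource_id: u32) -> (u32, u32) {
    let bif_index = resource_id >> 20;
    // Low 14 bits index the BIF's variable table, as in the BIF audit.
    let entry_index = resource_id & 0x3fff;
    (bif_index, entry_index)
}
```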
ERF (Encapsulated Resource File)
ERFs are the heavy lifters for standard game modules (.mod) and save game architectures (.sav). Unlike BIFs, which rely entirely on an external KEY file to resolve their resource identities, ERFs are completely self-contained entities that carry their own internal file tables, localized descriptions, and asset payloads.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .erf, .mod, .hak, .sav |
| Magic Signatures | ERF , MOD , HAK , SAV (version V1.0) |
| Type | Self-Contained Archive |
| Rust Reference | View rakata_formats::Erf in Rustdocs |
Data Model Structure
Because ERF files share the exact same structural responsibility as RIM files (acting as self-contained module wrappers), the rakata-extract crate abstracts both ERF and RIM parsing directly behind the unified Capsule struct.
- Capsule Generalization: Standard module extraction relies entirely on calling rakata_extract::Capsule::read_from_bytes(). This actively probes and dynamically mounts either ERF or RIM boundaries identically in memory, completely hiding the underlying structural container differences from the developer API.
Engine Audits & Decompilation
The following information documents the KOTOR engine’s exact load sequence and field requirements for genuine .erf capsule variants. All behavior was mapped natively from swkotor.exe during clean-room reverse engineering.
Capsule Header Initialization (CExoEncapsulatedFile::LoadHeader)
Mapped from 0x0040e1f0.
| Action | Engine Behavior |
|---|---|
| Signature Check | Explicitly validates the header against exactly matching ERF , MOD , or HAK signatures, paired with the mandatory V1.0 version string. |
| Unchecked Saves | The engine completely lacks a validation branch for .sav files. If a file is loaded as a Save Game (param flag 1), the engine falls through the validation tree and explicitly mandates the file use the MOD magic string natively. An ERF file with SAV magic will physically crash or reject here! |
| Header Truncation | The loader explicitly pulls the entire 160-byte header into scope (CExoFile::Read(..., 0xa0)), but only evaluates offsets 0x00 through 0x1C. Offset 0x18 (Key List) and anything beyond 0x1C is entirely ignored during initialization. |
Tip
The 116-Byte “Dead Zone”: The block of bytes stretching from offset 0x2C up to 0xA0 inside the 160-byte header is loaded into the engine’s active memory and then immediately discarded. It is totally inert data containing old BioWare build metadata.
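The signature rules from the audit table can be expressed as a small validator. This is a sketch under stated assumptions: `check_erf_header` and `HeaderError` are hypothetical names, and the function only models the magic and version checks, not the rest of the load path.

```rust
/// Number of header bytes the loader pulls in one read (0xa0).
const ERF_HEADER_SIZE: usize = 0xa0;

#[derive(Debug, PartialEq, Eq)]
enum HeaderError {
    TooShort,
    BadMagic,
    BadVersion,
}

/// Hypothetical validator modelled on the audit table; not Rakata's API.
fn check_erf_header(bytes: &[u8], is_save_game: bool) -> Result<(), HeaderError> {
    if bytes.len() < ERF_HEADER_SIZE {
        return Err(HeaderError::TooShort);
    }
    // Save-game loads only accept the MOD magic; SAV is never matched.
    let magic_ok = match &bytes[0..4] {
        b"MOD " => true,
        b"ERF " | b"HAK " => !is_save_game,
        _ => false,
    };
    if !magic_ok {
        return Err(HeaderError::BadMagic);
    }
    if &bytes[4..8] != b"V1.0" {
        return Err(HeaderError::BadVersion);
    }
    Ok(())
}
```

A writer that emits save games could run this check with `is_save_game = true` before writing, so that a SAV-tagged header is caught before it reaches the engine.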
RIM (Resource Image)
RIM files operate as a radically leaner alternative to ERFs. They are used exclusively by the game engine for distributing absolutely essential or lightweight modules without the hefty structural metadata overhead of an ERF file. They provide rapid, self-contained loading for core engine environments.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .rim |
| Magic Signatures | RIM (version V1.0) |
| Type | Lightweight Archive |
| Rust Reference | View rakata_formats::Rim in Rustdocs |
Data Model Structure
Because RIM files act as a lightweight twin to the ERF format, the rakata-extract crate extracts them identically.
- Capsule Generalization: Standard module extraction relies entirely on calling rakata_extract::Capsule::read_from_bytes(). The developer API makes no programmatic distinction between querying an ERF module or a RIM module; it behaves identically either way.
Engine Audits & Decompilation
The following information documents the KOTOR engine’s exact load sequence and field requirements for genuine .rim capsule variants. All behavior was mapped natively from swkotor.exe during clean-room reverse engineering.
Resource Image Overrides (CExoKeyTable::AddResourceImageContents)
Mapped from 0x0040f990.
| Action | Engine Behavior |
|---|---|
| Signature Check | Validates the exact RIM magic and the V1.0 version string upon loading. |
| Header Evaluation | The engine physically reads the entry_count (offset 0x0C) and the keys_offset (offset 0x10) from the header to explicitly navigate the file structures. |
Tip
The 96-Byte “Dead Zone”: Exactly like the ERF dead zone, RIM files feature a massive 96 bytes of completely inert padding sitting physically between offsets 0x18 and 0x77 inside the 120-byte header. The engine blindly sweeps right past it during initialization. It is perfectly safe to zero out this region when generating new synthetic fixtures.
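A synthetic fixture header along these lines could be built as below. This is a minimal sketch: `build_rim_header` and `read_rim_counts` are hypothetical helpers, and only the fields named in the audit (magic, version, entry_count at 0x0C, keys_offset at 0x10) are written; every other byte, including the dead zone, is left zeroed.

```rust
/// Total RIM header size in bytes.
const RIM_HEADER_SIZE: usize = 120;

/// Builds a minimal synthetic RIM header (hypothetical helper).
fn build_rim_header(entry_count: u32, keys_offset: u32) -> [u8; RIM_HEADER_SIZE] {
    let mut header = [0u8; RIM_HEADER_SIZE];
    header[0x00..0x04].copy_from_slice(b"RIM ");
    header[0x04..0x08].copy_from_slice(b"V1.0");
    header[0x0c..0x10].copy_from_slice(&entry_count.to_le_bytes());
    header[0x10..0x14].copy_from_slice(&keys_offset.to_le_bytes());
    // Bytes 0x18..=0x77 stay zero: the engine never evaluates them.
    header
}

/// Reads back the two fields the engine is documented to consume.
fn read_rim_counts(header: &[u8; RIM_HEADER_SIZE]) -> (u32, u32) {
    let entry_count =
        u32::from_le_bytes([header[0x0c], header[0x0d], header[0x0e], header[0x0f]]);
    let keys_offset =
        u32::from_le_bytes([header[0x10], header[0x11], header[0x12], header[0x13]]);
    (entry_count, keys_offset)
}
```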
GFF (Generic File Format)
The Generic File Format (GFF) is BioWare’s core binary serialization format, functioning like a binary JSON object or XML tree. It holds arbitrarily nested structures, typed fields, and lists, powering UI layouts, character sheets, dialogues, and area descriptions.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .gff, .utc, .uti, .utp, .ute, .utd, .dlg, .are, .ifo, etc. |
| Magic Signature | Target type (e.g. UTC ) / V3.2 |
| Type | Generic Hierarchical Data |
| Rust Reference | View rakata_formats::Gff in Rustdocs |
Data Model Structure
The rakata-formats crate gracefully abstracts the GFF struct/field/list indexing graph into a user-friendly memory model (rakata_formats::Gff).
- Typestate Wrapping: GFF natively supports discrete types (e.g., BYTE, SHORT, VOID, STRUCT, LIST). rakata_formats::GffValue encapsulates these identically, shielding developers from raw byte layouts and indirect arrays.
- Data Deduplication: Unlike standard web JSON, GFF binaries limit all field labels to 16 characters and deduplicate them via a contiguous LabelTable. The rakata-formats implementation mimics this memory layout exactly during serialization, guaranteeing structurally deterministic binaries natively acceptable by the engine!
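Label deduplication can be pictured with a small interning table. The type below is a hypothetical sketch, not the rakata-formats `LabelTable`; padding short labels with zero bytes and rejecting (rather than truncating) over-long labels are assumptions made for the example.

```rust
/// Fixed on-disk size of one GFF label slot.
const LABEL_SIZE: usize = 16;

/// Hypothetical label table sketch (not the rakata-formats type).
#[derive(Default)]
struct LabelTable {
    labels: Vec<[u8; LABEL_SIZE]>,
}

impl LabelTable {
    /// Returns the index of `label`, appending it only if it is new.
    /// Returns None for labels that do not fit in 16 bytes.
    fn intern(&mut self, label: &str) -> Option<u32> {
        let bytes = label.as_bytes();
        if bytes.len() > LABEL_SIZE {
            return None;
        }
        // Assumption: unused trailing bytes of a slot are zero-filled.
        let mut slot = [0u8; LABEL_SIZE];
        slot[..bytes.len()].copy_from_slice(bytes);
        if let Some(pos) = self.labels.iter().position(|l| *l == slot) {
            return Some(pos as u32);
        }
        self.labels.push(slot);
        Some((self.labels.len() - 1) as u32)
    }

    /// Serialized size: 16 bytes per unique label, no padding between slots.
    fn byte_len(&self) -> usize {
        self.labels.len() * LABEL_SIZE
    }
}
```

Interning in first-seen order keeps the output deterministic: the same field sequence always yields the same label indices.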
Engine Audits & Decompilation
Binary: swkotor.exe
Serialization Architecture (WriteGFFFile)
Derived from 0x00413030 / 0x004113d0.
The engine allocates the output buffer entirely in-memory and serializes exactly 7 contiguous sections in an absolutely strict order. No inter-section padding or reserved alignment bytes are inserted anywhere natively. Each section’s byte-offset is dynamically snapshotted into the 56-byte header, operating as the canonical write path utilized for save games and area extraction.
| Phasing Order | Section Component | Memory Footprint / Quirk |
|---|---|---|
| Phase 1 | Root Header | Exactly 56 bytes (0x38). |
| Phase 2 | Struct Array | 12B × struct_count |
| Phase 3 | Field Array | 12B × field_count |
| Phase 4 | Label Array | 16B × label_count |
| Phase 5 | Field Data Blob | Arbitrary bounds constraint. |
| Phase 6 | Field Indices | Dynamic array bounds. |
| Phase 7 | List Indices | Dynamic array bounds. |
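Since the seven sections are written back to back with no padding, every section offset follows from the sizes before it. The sketch below works through that arithmetic; `GffCounts`, `GffOffsets`, and `layout` are hypothetical names, and the element sizes come from the table above.

```rust
/// Size of the GFF root header in bytes (0x38).
const HEADER_SIZE: u32 = 56;

/// Element counts and raw byte lengths for one GFF file (hypothetical type).
struct GffCounts {
    struct_count: u32,
    field_count: u32,
    label_count: u32,
    field_data_len: u32,
    field_indices_len: u32,
    list_indices_len: u32,
}

/// Byte offsets of each section, in write order (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct GffOffsets {
    structs: u32,
    fields: u32,
    labels: u32,
    field_data: u32,
    field_indices: u32,
    list_indices: u32,
    total_len: u32,
}

/// Computes offsets assuming contiguous sections with no alignment padding.
fn layout(c: &GffCounts) -> GffOffsets {
    let structs = HEADER_SIZE;
    let fields = structs + 12 * c.struct_count;
    let labels = fields + 12 * c.field_count;
    let field_data = labels + 16 * c.label_count;
    let field_indices = field_data + c.field_data_len;
    let list_indices = field_indices + c.field_indices_len;
    let total_len = list_indices + c.list_indices_len;
    GffOffsets { structs, fields, labels, field_data, field_indices, list_indices, total_len }
}
```

For example, 2 structs, 3 fields, and 3 labels place the field data blob at 56 + 24 + 36 + 48 = 164.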
Warning
Because BioWare enforces fixed 16-byte elements inside the Label arrays, any label that exceeds 16 characters is strictly truncated by the engine array bounds.
Engine Blueprints: Specialized GFF Containers
While the gff.md reference explains the layout of raw GFF nodes, the engine frequently uses GFF as a structural wrapper to serialize completely deterministic entities known as Blueprints. These blueprints operate as the strict layouts defining creatures, dialogue trees, placeables, and area parameters.
Because rakata-lint provides deep behavioral validation over these blueprints natively, we have comprehensively audited how the K1 GOG executable (swkotor.exe) maps these layouts into active memory via its Load*FromGFF functions.
The Blueprint Engine Audits
The audits listed in this section’s navigation bar are formal, decompilation-backed blueprints cataloging KOTOR’s physical constraints. They document the exact fields, load phasing, and engine rule evaluations that supersede any generic structural validity.
If a field exists in GFF but breaks the engine, our Linter rules will flag it using these documentation audits as the source of truth.
| Ext | Type | Core Function |
|---|---|---|
| .are | Area Static Blueprint | Defines overarching static world properties (weather, day/night limits, physics constraints). |
| .dlg | Dialogue | Encapsulates the conversation graph, branching logic, and cinematic execution sequences. |
| .git | Game Instance Template | The physical object manifest. Orchestrates exact placement, vector orientations, and template spawning. |
| .ifo | Module Info | Root environment metadata bridging modules together and orchestrating spawn states. |
| .utc | Creature | Instantiates NPCs, stat-blocks, and character body configurations. |
| .utd | Door | Configures transitions, linked bounds, and structural barriers. |
| .ute | Encounter | Orchestrates dynamic boundary triggers and valid enemy spawning constraints. |
| .uti | Item | Unifies structural stats across weapons, armors, and consumables. |
| .utm | Store | Limits merchant arrays and details markup/markdown behaviors. |
| .utp | Placeable | Standardizes interactive storage boxes, unusable statues, and deployable traps. |
| .uts | Sound | Configures local dynamic audio emitters and distance volume calculations. |
| .utt | Trigger | Plots physical interactive polygons tracking spatial events. |
| .utw | Waypoint | Anchors spatial float positions for navigation grids and area transitions. |
ARE Format (Area Static Blueprint)
The Area (.are) blueprint format operates as the static environmental foundation of any game module. It establishes the rigid, overarching properties of a level, orchestrating the terrain’s grass rendering definitions, dynamic sunlight and fog constraints, ambient audio scale, and the primary interior/exterior state configurations. It effectively constructs the structural ‘stage’ that dynamic entities (like creatures and doors) populate later on.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .are |
| Magic Signature | ARE / V3.2 |
| Type | Area Static Blueprint |
| Rust Reference | View rakata_generics::Are in Rustdocs |
Data Model Structure
Rakata maps the Area definition directly into the rakata_generics::Are struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .are files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSArea::LoadArea at 0x0050e190.)
The initial LoadArea dispatch branches out to parse the .are GFF, .lyt layout, .git instance tracking, and .pth bounds. The engine processes roughly 61 scalar fields, 4 scripts, 3 lists, and a nested minigame struct natively within the LoadAreaHeader subroutine.
Core Environmental Identity
| Field Category | Engine Property & Behavioral Quirk |
|---|---|
| Identity | Name (LocString), Comments (String), ID (Int) -> Standard definition strings. |
| Identity | Tag (String) -> Lowercased on load (via CExoString::LowerCase). The only tag to behave this way! |
| Scripts | OnHeartbeat, OnUserDefined, ... -> CResRef script payloads. |
| State Flags | Flags (DWord) -> Bit 0 explicitly marks an Interior environment. |
| State Flags | RestrictMode (Byte) -> Hardcoded Event: Changing this to a non-zero value during gameplay forces CSWPartyTable::UnstealthParty. |
Note
Internal Weather Truncation: If Flags (Bit 0) marks the area as an interior space, the engine zeros out all weather properties upon load, actively discarding any prior weather assignments.
Weather & Terrain Generation
| Field | Type | Engine Evaluation |
|---|---|---|
| ChanceFog | INT | Stored persistently as an integer. |
| ChanceRain, ChanceSnow, ChanceLightning, WindPower | INT | Warning: The engine explicitly truncates these INT properties to 8-bit bytes at runtime. Values over 255 silently wrap around. |
| Grass_TexName | ResRef | If empty or invalid, the engine forces a hard fallback to "grass". |
| AlphaTest | FLOAT | Defaults to 0.2 (older tools commonly assume 0.0). |
Area Lighting & Sun/Moon Tracking
KOTOR handles dynamic sunlight constraints separately between Sun and Moon.
| Property Groups | Type | Engine Evaluation |
|---|---|---|
| Fog Ranges (MoonFogNear/Far, SunFogNear/Far) | FLOAT | Defaults to an immense distance of 10000.0. The engine aggressively clamps values to be ≥0.0. |
| Tints (*AmbientColor, *DiffuseColor, *FogColor) | DWORD | Processed seamlessly as standard DWORD color masks. |
| Environment Shadows (ShadowOpacity, *Shadows) | BYTE | Basic toggles and opacities orchestrating render limits. |
Map Transitions & Saving states
| Feature Category | Engine Evaluation & Triggers |
|---|---|
| Minimap Logic | Geographic vectors (MapResX, spatial coordinate structs like WorldPt1X) are only loaded if an actual Minimap TGA/TPC asset matching the level name exists on disk! |
| Parsing Type | If read, the engine parses MapPt along a dual-path logic checking if it is formally a FLOAT or INT type. |
| Zoom Bias | Area maps evaluate MapZoom to a default scaling scalar of 1, not 0! |
| Stealth Save-States | The stealth framework leverages the .are struct to snapshot .StealthXPMax and .StealthXPCurrent directly as DWORDs when parsing the layout. |
The Minigame Struct
Read via CSWMiniGame::Load (0x006723d0). If a minigame context triggers, the .are reads the nested Type (DWORD mapping 1=Swoop, 2=Turret). It injects highly specialized float properties modifying basic terrain speeds:
| Field | Injection Default / Constraint |
|---|---|
| LateralAccel | Defaults safely to 60.0. |
| MovementPerSec | Scales to 6.0 (Swoops), 90.0 (Turrets), or 0.0 otherwise! |
| Bump_Plane | Bounds are heavily clamped to 0..3. |
| Nested Arrays | The struct natively requires sub-struct Player arrays (Models, Camera, Axes) and Enemy/Obstacles lists to operate properly. |
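The defaults in the table above can be summarised in a few lines. This is only a sketch: the function names are hypothetical, and treating Bump_Plane as an integer with an inclusive 0 to 3 range is an assumption, since the table does not state the field type or whether the upper bound is inclusive.

```rust
/// Default LateralAccel documented in the table above.
const DEFAULT_LATERAL_ACCEL: f32 = 60.0;

/// Default MovementPerSec by minigame Type (1 = Swoop, 2 = Turret).
fn default_movement_per_sec(minigame_type: u32) -> f32 {
    match minigame_type {
        1 => 6.0,
        2 => 90.0,
        _ => 0.0,
    }
}

/// Clamps Bump_Plane. Assumption: integer field, inclusive 0..=3 range.
fn clamp_bump_plane(raw: i32) -> i32 {
    raw.clamp(0, 3)
}
```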
Rakata Linter Rules
The core priority of rakata-lint is shielding users from fields that look valid in older editors but fail in the K1 engine. E.g., there are 19 distinct fields generated by standard modding tools (like DisableTransit, KOTOR 2 ForceRating, etc) that are completely evaluated as dead data by the vanilla engine.
(Seven crucial engine-read fields were previously obfuscated by strict model bindings, but now pass through source_root validation.)
Lint Diagnostics Implemented:
- Weather Truncation: Identifies Rain/Snow/Lightning chance above 255 before they wrap around as bytes.
- Context Discards: Flags interior environments that contain weather parameters the engine will inevitably zero out.
- Index Fallbacks: Informs the developer that an empty Grass_TexName operates identically as "grass".
- Behavioral Flags: Warns that area Tag edits are natively lowercased during instantiation.
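To make the first three diagnostics concrete, here is a rough sketch of how such checks might look. `AreWeather` and `lint_are` are hypothetical and do not reflect the actual rakata-lint rule API or message wording; the byte wrap is modelled with a plain `as u8` cast.

```rust
/// Hypothetical subset of ARE fields (not the rakata_generics::Are struct).
struct AreWeather {
    flags: u32,
    chance_rain: i32,
    chance_snow: i32,
    chance_lightning: i32,
    grass_tex_name: String,
}

/// Returns human-readable diagnostics for the weather and grass rules.
fn lint_are(are: &AreWeather) -> Vec<String> {
    let mut out = Vec::new();
    // Bit 0 of Flags marks an interior area.
    let interior = (are.flags & 1) != 0;
    let chances = [
        ("ChanceRain", are.chance_rain),
        ("ChanceSnow", are.chance_snow),
        ("ChanceLightning", are.chance_lightning),
    ];
    for &(name, value) in chances.iter() {
        if value > 255 {
            // `as u8` keeps only the low 8 bits, mirroring the runtime wrap.
            out.push(format!(
                "{}: value {} wraps to {} when truncated to a byte",
                name, value, value as u8
            ));
        }
        if interior && value != 0 {
            out.push(format!("{}: discarded on load because the area is interior", name));
        }
    }
    if are.grass_tex_name.is_empty() {
        out.push("Grass_TexName: empty value behaves like \"grass\"".to_string());
    }
    out
}
```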
DLG Format (Dialogue Blueprint)
Description: The Dialogue (.dlg) format is the beating heart of KOTOR’s storytelling. It acts as the master “script” for every conversation, cutscene, and cinematic sequence. Rather than just holding localized text, it acts as a branching storyboard that tells the engine exactly what the characters should say in audio, what animations they should perform, which camera angles to use, and when to fire off scripts that impact the plot.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .dlg |
| Magic Signature | DLG / V3.2 |
| Type | Dialogue Blueprint |
| Rust Reference | View rakata_generics::Dlg in Rustdocs |
Data Model Structure
Rakata parses the raw GFF structure into the rakata_generics::Dlg struct.
- Typestate Extraction: By extracting the loosely-typed GFF binary into a strict Rust struct, Rakata replaces unsafe dynamic string queries with compile-time guaranteed data types (such as DlgAnimation and DlgCamera models).
- (Note: rakata-lint does not currently implement behavioral validation for .dlg formats.)
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .dlg files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSDialog::LoadDialog (0x005a2ae0), cascading through LoadDialogBase (0x005a11c0) and LoadDialogCamera (0x005a1ab0).)
The LoadDialog subroutine processes the root-level conversation configuration before iterating over the heavily nested EntryList and ReplyList. For each of those conversational nodes, it delegates parsing to LoadDialogBase (for text and scripts) and LoadDialogCamera (for viewport directions).
Additionally, StartingList provides the dialogue entry points, while the StuntList associates cutscene actor models.
Root Conversation Configuration
| Field Category | Engine Property & Type | Notable Default or Behavioral Quirk |
|---|---|---|
| Identity & Rules | CameraModel (ResRef), DelayEntry/Reply (DWord) | Standard execution behaviors. |
| Identity & Rules | Skippable (Byte) | Defaults to 1 (True). |
| Logic Hooks | EndConversation, EndConverAbort (ResRefs) | Fire when the dialogue concludes normally or is aborted, respectively. |
| Hardware Interfacing | ConversationType (Int) | 0 = Cinematic, 1 = Computer, 2 = Special. Cinematic explicitly unstealths the party. |
| Hardware Interfacing | ComputerType (Byte) | Only evaluated if ConversationType is 1. Otherwise, completely dead data. |
| Equipment & Actions | UnequipItems, UnequipHItem, AnimatedCut | AnimatedCut forces a global unpauseable state within the client if non-zero. |
Shared Dialogue Node Properties (LoadDialogBase)
These fields apply to both entries (NPC spoken) and replies (Player spoken), and are parsed via LoadDialogBase.
| Field | Type | Engine Evaluation |
|---|---|---|
| Text | LocString | The spoken localized string. |
| Script, Speaker, Quest | Strings/ResRefs | Standard execution scripts and entity mapping. |
| Sound, VO_ResRef | ResRef | Sound Fallback: If Sound fails to execute, the engine will attempt to play VO_ResRef. If both fail, the bitmask SoundExists is forcibly downgraded to 0. |
| Delay | DWord | Delay Special Case: If value is 0xFFFFFFFF, the engine explicitly reads from the root DelayEntry/DelayReply field instead and modulates WaitFlags! |
| FadeType | Byte | Determines the FadeDelay and FadeLength. If set to 0 or missing, all fade configurations are zeroed inherently. |
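The Delay and FadeType rules in the table above can be sketched as two small resolver functions. All names here are hypothetical; the sketch assumes entry nodes inherit DelayEntry and reply nodes inherit DelayReply, treats FadeDelay and FadeLength as floats, and does not model the WaitFlags side effect.

```rust
/// Sentinel meaning "use the root-level delay instead".
const DELAY_USE_ROOT: u32 = 0xFFFF_FFFF;

#[derive(Clone, Copy)]
enum NodeKind {
    Entry,
    Reply,
}

/// Root-level DelayEntry / DelayReply values (hypothetical type).
struct RootDelays {
    delay_entry: u32,
    delay_reply: u32,
}

/// Resolves the delay a node actually uses.
fn effective_delay(node_delay: u32, kind: NodeKind, root: &RootDelays) -> u32 {
    if node_delay != DELAY_USE_ROOT {
        return node_delay;
    }
    // Assumption: entries fall back to DelayEntry, replies to DelayReply.
    match kind {
        NodeKind::Entry => root.delay_entry,
        NodeKind::Reply => root.delay_reply,
    }
}

/// Returns (FadeDelay, FadeLength), zeroed when FadeType is 0 or missing.
fn effective_fade(fade_type: Option<u8>, fade_delay: f32, fade_length: f32) -> (f32, f32) {
    match fade_type {
        None | Some(0) => (0.0, 0.0),
        Some(_) => (fade_delay, fade_length),
    }
}
```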
Viewport Framing (LoadDialogCamera)
| Field | Type | Engine Evaluation |
|---|---|---|
| CameraID | INT | Dependent Field: Only permitted when CameraAngle = 6 (Placeable Camera). Otherwise, the engine forces the ID to -1 regardless of the static binary value. |
| CamFieldOfView | FLOAT | Aggressively validated. If the property is entirely missing or is explicitly negative, the engine forces the perspective to -1.0. |
| CamHeightOffset, TarHeightOffset | FLOAT | Standard float deltas. |
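The two camera rules in the table above reduce to a pair of small normalisation functions. The names are hypothetical and the sketch models only the documented outcomes, not the engine's internal control flow.

```rust
/// CameraAngle value that selects a placeable camera.
const PLACEABLE_CAMERA_ANGLE: u32 = 6;

/// CameraID survives only for placeable cameras; otherwise it becomes -1.
fn effective_camera_id(camera_angle: u32, camera_id: i32) -> i32 {
    if camera_angle == PLACEABLE_CAMERA_ANGLE {
        camera_id
    } else {
        -1
    }
}

/// A missing or negative CamFieldOfView resolves to -1.0.
fn effective_fov(raw: Option<f32>) -> f32 {
    match raw {
        Some(v) if v >= 0.0 => v,
        _ => -1.0,
    }
}
```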
Relational Data Trees
Dialogues operate as highly interconnected link-lists.
- Entry -> Reply Links (RepliesList within an Entry Node): Maps the Index (DWORD) to the overarching .ReplyList bounds. Unique in that it exclusively parses the DisplayInactive Byte.
- Reply -> Entry Links (EntriesList within a Reply Node): Maps the Index to the .EntryList bounds.
- Start Indices (StartingList): Uses the exact same linkage schema as a Reply->Entry link. Validates Index against entry_count.
Warning
Corrupted Link Constraints: Index paths are strictly evaluated against the internal array bounds prior to traversing. If a node tries to link out of bounds, it immediately triggers a fatal Load Failure within the engine.
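A static bounds check over the three link lists might look like the sketch below. `DialogLinks`, `LinkError`, and `validate_links` are hypothetical; the model keeps only the Index values and ignores every other node field.

```rust
/// Hypothetical, simplified dialogue graph: only link indices are kept.
struct DialogLinks {
    /// For each entry node, the Index values in its RepliesList.
    entry_replies: Vec<Vec<u32>>,
    /// For each reply node, the Index values in its EntriesList.
    reply_entries: Vec<Vec<u32>>,
    /// Index values from StartingList (these point at entries).
    starts: Vec<u32>,
}

#[derive(Debug, PartialEq, Eq)]
enum LinkError {
    ReplyIndex { entry: usize, index: u32 },
    EntryIndex { reply: usize, index: u32 },
    StartIndex { index: u32 },
}

/// Returns the first out-of-bounds link, if any.
fn validate_links(d: &DialogLinks) -> Result<(), LinkError> {
    let entry_count = d.entry_replies.len() as u32;
    let reply_count = d.reply_entries.len() as u32;
    for (entry, links) in d.entry_replies.iter().enumerate() {
        for &index in links {
            if index >= reply_count {
                return Err(LinkError::ReplyIndex { entry, index });
            }
        }
    }
    for (reply, links) in d.reply_entries.iter().enumerate() {
        for &index in links {
            if index >= entry_count {
                return Err(LinkError::EntryIndex { reply, index });
            }
        }
    }
    for &index in &d.starts {
        if index >= entry_count {
            return Err(LinkError::StartIndex { index });
        }
    }
    Ok(())
}
```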
Ancillary Configuration Lists
- AnimList: Defines custom Participant models and their accompanying Animation (WORD) action index to loop.
- StuntList: Dictates which StuntModel should proxy standard rendering behavior for a given Participant.
Proposed Linter Rules
The rakata-lint dialogue ruleset has not been formally implemented yet. However, the following diagnostics are heavily recommended to combat engine failure domains directly derived from these decompilation audits:
- Camera Angle Compliance: Detect if CameraID holds a value while CameraAngle is anything other than 6, warning that the data is ignored by the engine.
- Conversation Type Mismatch: Warn if a ComputerType sub-property is set, but the parent ConversationType is not explicitly flagged to 1 (Computer Dialog).
- Ghost Delay Flags: Warn when an entry delay is maxed (0xFFFFFFFF), but execution triggers evaluate to an instantaneous termination sequence (Warning on Sound invalidation).
- Fatal Bounds Checking: Statically trace every Index parameter in node link-lists to ensure they never exceed array bounds and cause an engine hard-stop.
- Context Zeroing: Inform the developer if fade delays are configured, but the parent FadeType is 0, causing the engine to discard the timings.
GIT Format (Game Instance Template)
Description: The Game Instance Template (.git) orchestrates the exact placement of every single entity within an environment. If the .are file is the underlying “stage”, the .git file acts as the blueprint for its “actors”, defining exactly where creatures initially spawn, where placeables sit, the physical rotation of doors, and the bounds of any active sound emitters.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .git |
| Magic Signature | GIT / V3.2 |
| Type | Instance Blueprint |
| Rust Reference | View rakata_generics::Git in Rustdocs |
Data Model Structure
Rakata parses the raw GFF structure into the rakata_generics::Git struct.
- Typestate Extraction: By extracting the loosely-typed GFF binary into a strict Rust struct, Rakata inherently standardizes all 13 object sub-lists, creating deterministic representations of GitCreature, GitDoor, GitPlaceable, etc.
- (Note: rakata-lint does not currently implement behavioral validation for .git formats.)
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .git files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSArea::LoadGIT at 0x0050dd80.)
The LoadGIT subroutine is a massive dispatcher. It evaluates 3 immediate root scalars before handing off evaluation to 13 distinct object-list loaders mapping entities. Crucially, the flag UseTemplates dominates this process by dictating whether these lists refer to external files or contain fully inline entity data.
Root Behavior Properties
| Field | Type | Engine Evaluation |
|---|---|---|
| UseTemplates | BYTE | Controls whether object arrays read TemplateResRef to construct entities, or fall back to inline evaluation. |
| CurrentWeather | BYTE | Standard BYTE. Forced to 0xFF on Interior Areas. |
| WeatherStarted | BYTE | Standard BYTE. Forced to 0 on Interior Areas. |
(The engine validates weather fields against the .are properties immediately during load).
Field Naming Inconsistencies
Due to legacy asset sprawl, the engine evaluates vectors explicitly according to vastly different naming conventions depending entirely on the entity class. This is hardcoded into swkotor.exe.
| Target Lists | Position Paradigm | Orientation Paradigm |
|---|---|---|
| Creatures, Items, Waypoints, Stores | XPosition, YPosition, ZPosition | XOrientation, YOrientation, ZOrientation |
| Doors, Placeables | X, Y, Z | Bearing (Float angle) |
| Area Effects | PositionX, PositionY, PositionZ | OrientationX, OrientationY, OrientationZ |
Warning
Orientation Normalization: The engine strictly evaluates 3D orientation logic. If a normalized orientation vector (like in StoreList or AreaEffectList) inadvertently resolves to 0.0, the engine catches the math fault and applies a hard fallback vector of (0, 1, 0).
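The fallback described in this warning can be approximated as follows. This is a sketch under an assumption: it treats "resolves to 0.0" as a zero-length (or non-finite) vector, which may not match the engine's exact test. `normalize_or_fallback` is a hypothetical name.

```rust
/// Normalizes an orientation vector, falling back to (0, 1, 0).
/// Assumption: the fallback triggers on zero-length or non-finite input.
fn normalize_or_fallback(v: [f32; 3]) -> [f32; 3] {
    let len = (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt();
    if len == 0.0 || !len.is_finite() {
        return [0.0, 1.0, 0.0];
    }
    [v[0] / len, v[1] / len, v[2] / len]
}
```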
Standard Instance Arrays
Standard loaders evaluate the generic ObjectId, process the localized position/orientation floats, and dispatch behavior mapping logic.
| List Name | Struct Target | Engine Triggers & Fallbacks |
|---|---|---|
| Creature List | LoadCreatures | Positions are explicitly validated defensively through ComputeSafeLocation bounds. |
| Door List | LoadDoors | Save states trigger LoadObjectState. External templates dynamically route to LoadDoorExternal. |
| WaypointList | LoadWaypoints | Completely ignores UseTemplates: it relies solely on inline data! Z-height is shifted dynamically via ComputeHeight. |
| TriggerList | LoadTriggers | Geometry properties reuse native UTT formatting. Contains unique linkage arrays: LinkedToModule, TransitionDestination, LinkedTo. |
Specialized Struct Parsings
| Engine Dispatch Target | Description & Findings |
|---|---|
| LoadSounds (0x00505560) | Discard logic: Translates GeneratedType via DWord, but physically truncates it to an 8-bit byte on save, silently discarding the upper 24 bits! |
| LoadEncounters (0x00505060) | Highly nested structural array reusing both Geometry and SpawnPointList formats natively built for UTE boundaries. |
| LoadPlaceableCameras (0x00505eb0) | Client-side only struct that reads composite GFF spatial types correctly natively! Camera Limit: If it hits 51 camera entries, the loader formally rejects it. |
| “List” (Items) (0x00504de0) | Bizarrely, the generic parent entity list List is used specifically to orchestrate Item instances! |
Singular Structs
- AreaProperties: Orchestrates stealth behavior state tracking and dynamic audio states. It physically reads AmbientSndDayVol/AmbientSndNitVol and explicitly truncates their INT declarations into a single native runtime byte value.
- AreaMap: Strict binary blobs evaluating rendering properties (AreaMapData). It is absolutely bypassed during fresh loads, only executed conditionally during save-game states.
Proposed Linter Rules (Rakata-Lint)
The rakata-lint engine hasn’t implemented git.rs validations yet. However, the exact engine behaviors discovered during decompilation dictate these static constraints:
- Weather Zeroing: If CurrentWeather or WeatherStarted are configured on an area interior, the engine forcibly resets them on load (CurrentWeather to 0xFF, WeatherStarted to 0).
- Camera Array Bounds: If a CameraList contains 51 or more entries, it triggers an immediate engine-level loader failure.
- Stealth Clamping Constraint: The engine triggers hard integer clamping on StealthXPCurrent against the StealthXPMax bounds thresholds during evaluation.
- Volume Sub-Type Truncation: If AmbientSndDayVol or GeneratedType exceed 255, the engine natively wraps the integer into an 8-bit byte value, resulting in immediate data wrapping/corruption.
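Three of these proposed rules are simple enough to sketch directly. Since the real git.rs validations do not exist yet, everything below is hypothetical: the `GitSummary` struct, the `lint_git` function, and the message wording are illustrative only.

```rust
/// Largest camera count the loader accepts (it rejects 51 or more).
const MAX_CAMERAS: usize = 50;

/// Hypothetical subset of GIT data relevant to the proposed rules.
struct GitSummary {
    is_interior: bool,
    current_weather: u8,
    weather_started: u8,
    camera_count: usize,
    ambient_snd_day_vol: i32,
}

/// Returns diagnostics for weather resets, camera limits, and volume wrap.
fn lint_git(git: &GitSummary) -> Vec<String> {
    let mut out = Vec::new();
    // Interior areas force CurrentWeather to 0xFF and WeatherStarted to 0.
    if git.is_interior && (git.current_weather != 0xFF || git.weather_started != 0) {
        out.push("weather fields are reset on load for interior areas".to_string());
    }
    if git.camera_count > MAX_CAMERAS {
        out.push(format!(
            "CameraList has {} entries; the loader rejects 51 or more",
            git.camera_count
        ));
    }
    if git.ambient_snd_day_vol > 255 {
        // `as u8` keeps the low 8 bits, mirroring the runtime truncation.
        out.push(format!(
            "AmbientSndDayVol {} wraps to {}",
            git.ambient_snd_day_vol,
            git.ambient_snd_day_vol as u8
        ));
    }
    out
}
```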
IFO Format (Module Info Blueprint)
Description: The Module Info (.ifo) is the absolute root metadata file for any environment. It dictates global module behavior, handling everything from the starting spawn location, to the local calendar and time-of-day progression, to script execution for global module events.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .ifo |
| Magic Signature | IFO / V3.2 |
| Type | Module Blueprint |
| Rust Reference | View rakata_generics::Ifo in Rustdocs |
Data Model Structure
Rakata parses the raw GFF structure into the rakata_generics::Ifo struct.
- (Note: rakata-lint does not currently implement behavioral validation for .ifo formats.)
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .ifo files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSModule::LoadModuleStart at 0x004c9050.)
Global State Configurations
| Field | Type | Engine Evaluation |
|---|---|---|
| Mod_Entry_Area | ResRef | The primary spawning area ResRef. |
| Mod_Entry_X / Mod_Entry_Y / Mod_Entry_Z | FLOAT | Exact spawning XYZ coordinates. |
| Mod_Entry_Dir_X / Mod_Entry_Dir_Y | FLOAT | Entry Direction Fallback: If the engine cannot evaluate Mod_Entry_Dir_Y, it falls back to rendering the entity facing east (X=1.0, Y=0.0). |
| Mod_XPScale | BYTE | Global XP multiplier scale. Defaults to 10. |
Time & Cycle Management
| Field | Type | Description |
|---|---|---|
| Mod_DawnHour | BYTE | Dawn hour marker. |
| Mod_DuskHour | BYTE | Dusk hour marker. |
| Mod_MinPerHour | BYTE | Number of real-time gameplay minutes that make up one module hour. |
Note
Day/Night Cycle Computations: The engine continuously computes day/night phases against Mod_DawnHour, Mod_DuskHour, and the current_hour. This dynamically updates an internal state flag denoting: 1=Day, 2=Night, 3=Dawn, 4=Dusk.
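The state flag values listed in the note map naturally onto a small enum. This is a minimal sketch: the `DayPhase` name and its `TryFrom<u8>` conversion are illustrative assumptions and not part of the rakata_generics API. Only the numeric values come from the note above, and the rule the engine uses to pick a phase from the hour fields is deliberately not modeled.

```rust
use std::convert::TryFrom;

/// Hypothetical model of the engine's internal day/night state flag.
/// Only the numeric values (1 through 4) come from the decompilation notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DayPhase {
    Day = 1,
    Night = 2,
    Dawn = 3,
    Dusk = 4,
}

impl TryFrom<u8> for DayPhase {
    type Error = u8;

    /// Maps a raw flag byte to a phase, handing back the raw value on failure.
    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(DayPhase::Day),
            2 => Ok(DayPhase::Night),
            3 => Ok(DayPhase::Dawn),
            4 => Ok(DayPhase::Dusk),
            other => Err(other),
        }
    }
}
```

Returning the unrecognized byte as the error keeps the conversion loss-aware, which matches the general approach described for the typed wrappers.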
Global Event Scripts
Event scripts are stored as ResRef strings that point to compiled NSS scripts. The engine evaluates 15 separate global events (such as Mod_OnHeartbeat, Mod_OnModLoad, Mod_OnClientEntr, and Mod_OnPlrDeath).
- Asymmetric I/O (Equipping): The Mod_OnEquipItem hook is loaded during module startup (LoadModuleStart), but it is omitted entirely during the save-game serialization cycle (SaveModuleIFOStart).
Save-State Injection (Save Games Only)
Certain blocks of data inside the .ifo are evaluated only when the engine mounts a module from a loaded .sav archive.
| Engine Target | Description |
|---|---|
| Player / Mod Variables | Structures like Mod_PlayerList, Mod_Tokens, VarTable, and the EventQueue are bypassed unless the module is loaded under is_save_game conditions. |
| Area Overrides | The Mod_Area_list technically supports arrays (an NWN legacy), but KOTOR enforces a single active area. The secondary ObjectId within this array is only ever read inside a save-state flow. |
| Legacy Hak De-sync | “Hak Packs” are custom override archives natively used in Neverwinter Nights (the engine’s predecessor). While KOTOR’s save routine (SaveModuleIFOStart) blindly writes a Mod_Hak string into save-games as leftover legacy behavior, the actual load cycle (LoadModuleStart) completely ignores it. Modders cannot use this field to hook custom archives. |
Proposed Linter Rules (Rakata-Lint)
While rakata-lint does not currently implement .ifo validation, the exact engine behaviors discovered during decompilation dictate these static constraints:
- Direction Fallback: If Mod_Entry_Dir_X and Mod_Entry_Dir_Y both evaluate to 0.0, the engine forces an unrecorded fallback direction, pointing the player spawn toward (1.0, 0.0).
- XP Dead-Scaling: Since the Mod_XPScale default value is 10, an unexpected value of 0 halts all XP acquisition in the module.
- Eternal Day/Night Bounds: If Mod_DawnHour equals Mod_DuskHour, the module becomes locked into perpetual daylight.
- Void Area Initialization: An empty Mod_Area_list array faults the load cycle, as the module has no area to place the player into.
- Dangling NWM Structure: Setting Mod_IsNWMFile to 1 without supplying the conditionally mandatory Mod_NWMResName leaves the load in an unstable state.
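The proposed constraints above are all cheap field comparisons. The sketch below shows one way they could be checked, under stated assumptions: `IfoFields`, `lint_ifo`, and the finding labels are hypothetical and do not reflect the actual `rakata_generics::Ifo` layout or any existing rakata-lint rule.

```rust
/// Hypothetical subset of `.ifo` fields; names mirror the GFF labels.
pub struct IfoFields {
    pub entry_dir_x: f32,
    pub entry_dir_y: f32,
    pub xp_scale: u8,
    pub dawn_hour: u8,
    pub dusk_hour: u8,
    /// Number of entries in Mod_Area_list.
    pub area_count: usize,
    pub is_nwm_file: bool,
    pub nwm_res_name: Option<String>,
}

/// Returns one label per proposed rule that the input would trip.
pub fn lint_ifo(ifo: &IfoFields) -> Vec<&'static str> {
    let mut findings = Vec::new();
    // A zero direction vector triggers the (1.0, 0.0) fallback.
    if ifo.entry_dir_x == 0.0 && ifo.entry_dir_y == 0.0 {
        findings.push("direction-fallback");
    }
    if ifo.xp_scale == 0 {
        findings.push("xp-dead-scaling");
    }
    if ifo.dawn_hour == ifo.dusk_hour {
        findings.push("eternal-day-night");
    }
    if ifo.area_count == 0 {
        findings.push("void-area-list");
    }
    // Treat an absent and an empty resource name the same way.
    let name_missing = ifo.nwm_res_name.as_deref().map_or(true, str::is_empty);
    if ifo.is_nwm_file && name_missing {
        findings.push("dangling-nwm");
    }
    findings
}
```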
UTC Format (Creature Blueprint)
Description: The Creature (.utc) blueprint format defines the attributes, stats, and behavior of all in-scene NPCs and monsters. It covers a creature’s identity, class/level, appearance, equipment, and event scripts. Because they hold so much state, Creatures are one of the most dynamic and memory-heavy templates processed by the Odyssey Engine.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .utc |
| Magic Signature | UTC / V3.2 |
| Type | Creature Blueprint |
| Rust Reference | View rakata_generics::Utc in Rustdocs |
Data Model Structure
Rakata maps the Creature definition directly into the rakata_generics::Utc struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Creature breaks down into six main categories:
- Core Statistics: The basic stats that define the creature’s physical capabilities (e.g., Strength, Dexterity, base HitPoints).
- Identity & Graphics: Identifiers that define who the creature is and what 3D model they use (e.g., Tag, Appearance_Type, Conversation).
- Class & Skill Progression: The mechanics that define their level, classes, and skills (e.g., ClassList, SkillList).
- Combat Capabilities: The specific feats and Force powers the creature can use (e.g., FeatList, SpellList).
- Inventory & Equipment: The exact items the creature spawns with, including both equipped gear and inventory drops (e.g., Equip_ItemList, ItemList).
- Event Hooks (Scripts): The behavior scripts that run when the creature reacts to the world, such as taking damage or noticing an enemy (e.g., OnNotice, OnDamaged).

- State Validation: rakata-lint checks the data against engine constraints to prevent fatal runtime crashes.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .utc files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSCreatureStats::ReadStatsFromGff at 0x005afce0.)
Structural Load Phasing
| Function | Size | Behavior |
|---|---|---|
| ReadStatsFromGff | 7835 B | The massive initial pass that parses 57 basic creature scalars including strength, dexterity, and physical appearance. |
| LoadCreature | – | Sets up how the creature physically sits in the world, handling their stealth states, collision size, and idle animations. |
| ReadScriptsFromGff | – | Attaches all the custom event scripts that fire when the creature notices an enemy, takes damage, dies, or simply stands around (heartbeat). |
| ReadItemsFromGff | – | Pulls all loot into memory, structuring items specifically into equipped slots, the backpack, or dropping them entirely if a creature spawns dead. |
| ReadSpellsFromGff | – | Specifically extracts the list of any Force powers or combat feats the creature is allowed to use. |
Note
Zeroed Data Elements: Legacy structures referencing Tail and Wings are explicitly hardcoded to 0 during parsing and completely bypassed by the binary loader.
Core Structural Findings
The engine strictly validates parameters when loading a .utc file. Improper formatting will trigger some of KOTOR’s most notorious game crashes.
Warning
Understanding Fatal Crash Codes (0x5fX): When the game engine parses a file and hits an invalid stat, it completely aborts loading. Instead of recovering gracefully, the engine deliberately triggers a fatal crash to your desktop and returns a specific hexadecimal error code (e.g., 0x5f7 or 0x5f4). The rules below track the specific scenarios where the game will crash.
| Engine Rule | Runtime Behavior |
|---|---|
| Class Limits | The engine expects a strict limit of 2 discrete class types. Providing duplicate class configuration completely crashes the game (Engine Error 0x5f7). |
| Race Bounds | The engine compares Race against the compiled row count of racialtypes.2da. Exceeding this boundary fatally crashes the map loader (Engine Error 0x5f4). |
| Saves Calculation | Pre-computed saving throws (SaveWill, SaveFortitude) in the .utc file are dead data that the engine ignores. The engine overrides them exclusively by reading willbonus and fortbonus. |
| Perception Faults | A non-PC PerceptionRange initiates a read against appearance.2da for PERCEPTIONDIST. Failing to resolve this distance fails the entire creature load (Engine Error 0x5f5). |
| Movement Fallbacks | If a unique MovementRate isn’t declared, the engine logic falls back directly to default WalkRate parameters. |
| Hard Clamping | The engine limits specific numeric fields upon load: Gender is clamped to a maximum of 4, and GoodEvil is clamped so that it cannot exceed 100. |
| Appearance Shifting | If Appearance_Head is 0, the engine overrides it to 1 to prevent rendering bugs. |
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Legacy Engine Artifacts | A staggering 17 .utc fields (such as Morale, SaveWill, BlindSpot, PaletteID) present in older files are actually Neverwinter Nights or KOTOR 2 superset metrics that the K1 engine natively ignores. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targeted for implementation under rakata_lint::rules::utc.
- Class Duplications: Checks if a creature is misconfigured with identical core class identifiers (preventing Game Crash Error 0x5f7).
- Race Bounds: Asserts the mapped Race identifier exists against the actual row bounds of the compiled racialtypes.2da map (preventing Game Crash Error 0x5f4).
- Class Limit: Ensures the creature never exceeds the hard-limit of two defined classes.
- Structure Clamping: Flags invalid scalars by actively verifying Gender (max 4) and GoodEvil (max 100) configurations, directly mirroring the binary’s hard clamp logic.
- Appearance Correction: Detects unconfigured Appearance_Head fields tracking to 0, predicting the engine’s hard override to 1.
- Dead Field Tracking: Validates that legacy or ignored values (like SaveWill and SaveFortitude) aren’t configured, saving payload evaluation cost.
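To make the rules above concrete, here is a minimal sketch of how the first five checks might look as a single pass over a creature. Everything named here is an assumption for illustration: `UtcFields`, its field types, `lint_utc`, and the labels are hypothetical, and the row count of racialtypes.2da is assumed to be loaded elsewhere and passed in. Treating a Race equal to the row count as out of bounds is also an assumption based on zero-based row indexing.

```rust
use std::collections::HashSet;

/// Hypothetical subset of `.utc` fields; widths are illustrative assumptions.
pub struct UtcFields {
    pub class_ids: Vec<i32>,
    pub race: u8,
    pub gender: u8,
    pub good_evil: u8,
    pub appearance_head: u8,
}

/// Returns one label per rule that the creature would trip.
pub fn lint_utc(utc: &UtcFields, racialtypes_rows: usize) -> Vec<&'static str> {
    let mut findings = Vec::new();

    // `insert` returns false when the class id was already seen.
    let mut seen = HashSet::new();
    if utc.class_ids.iter().any(|id| !seen.insert(*id)) {
        findings.push("class-duplication (0x5f7)");
    }
    if utc.class_ids.len() > 2 {
        findings.push("class-limit");
    }
    // Assumption: rows are zero-indexed, so the row count itself is invalid.
    if usize::from(utc.race) >= racialtypes_rows {
        findings.push("race-bounds (0x5f4)");
    }
    if utc.gender > 4 {
        findings.push("gender-clamp");
    }
    if utc.good_evil > 100 {
        findings.push("goodevil-clamp");
    }
    if utc.appearance_head == 0 {
        findings.push("appearance-head-override");
    }
    findings
}
```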
UTD Format (Door Blueprint)
Description: The Door (.utd) blueprint defines interactive pathways on a level map. Beyond acting as physical barriers or transitions between areas, doors house lock mechanics, trap configurations, script hooks, and basic visual states (open, destroyed, jammed).
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .utd |
| Magic Signature | UTD / V3.2 |
| Type | Door Blueprint |
| Rust Reference | View rakata_generics::Utd in Rustdocs |
Data Model Structure
Rakata maps the Door definition directly into the rakata_generics::Utd struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Door breaks down into four main categories:
- Core Identity & Geometry: The configuration for what the door looks like, its faction, and the text displayed when targeted (e.g., Appearance, TemplateResRef, LocName).
- Lock & Trap Mechanics: The parameters defining whether it’s locked, what key is needed, and rules for any attached traps (e.g., Locked, KeyName, TrapType, DisarmDC).
- Transition Pathways: The linked destination used when a door acts as a loading zone to another area (e.g., LinkedTo, LinkedToFlags).
- Behavioral Hooks (Scripts): The scripts that run when a player opens, destroys, or fails to unlock the door (e.g., OnOpen, OnFailToOpen, OnMeleeAttacked).

- Active Validation: rakata-lint enforces checks against missing keys or invalid transition references before a module ever reaches the game engine.
Engine Audits & Decompilation
The following information documents the engine’s exact load sequence and field requirements for .utd files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from the primary dispatcher CSWSDoor::LoadDoor at 0x0058a1f0.)
Structural Load Phasing
The engine processes a Door structurally by mapping its sub-fields into distinct operational constraints.
| Domain | Sub-fields Evaluated | Purpose |
|---|---|---|
| Scales & State | 22 | Reads the physical health, visual appearance, and base traits determining whether the door is locked or indestructible. |
| Hooks | 15 | Attaches custom event scripts that fire when the door is opened, forced, unlocked, or trapped. |
| Mechanical | 9 | Configures the lock difficulty tiers and the specific skill hurdles required to detect and disarm any attached traps. |
| Transitions | 4 | Links the door strictly to another area (.are), turning it into a physical loading screen transition node. |
Core Structural Findings
The CSWSDoor parser natively guarantees strict state adjustments upon parsing.
| Engine Rule | Runtime Behavior |
|---|---|
| Appearance Truncation | The engine reads Appearance as a 32-bit integer but forcefully truncates it to a single byte ((byte)uVar5). Any ID above 255 wraps modulo 256 (256 becomes 0, 257 becomes 1) and breaks the physical door model. |
| Static Enforcement | If the door is marked Static, the engine automatically forces plot = 1. This safely guarantees that static level architecture cannot be destroyed by players. |
| Portrait Fallbacks | If PortraitId is 0, the engine hardcodes it to 0x22E. If it is >= 0xFFFE, the engine ignores the integer and falls back to looking up the string Portrait resref instead. |
| Trap Hook Fallback | If the OnTrapTriggered script is left empty, set to null, or literally named "default", the engine pulls the default standard script from traps.2da instead. |
| HP Synchronization | CurrentHP is securely clamped against the door’s maximum HP to prevent overflow bugs. |
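The load-time adjustments in the table above can be sketched as a pure normalization function. This is an illustration under stated assumptions rather than a transcription of CSWSDoor::LoadDoor: the `RawDoor`, `LoadedDoor`, and `Portrait` types, the field widths, and the `normalize_door` function are hypothetical, and the trap script fallback is left out because it depends on traps.2da.

```rust
/// Where the portrait ends up coming from after load.
#[derive(Debug, PartialEq)]
pub enum Portrait {
    Id(u16),
    /// The integer is ignored and the Portrait resref string is used instead.
    FromResRef,
}

/// Hypothetical view of the raw GFF values; widths are assumptions.
pub struct RawDoor {
    pub appearance: u32,
    pub is_static: bool,
    pub plot: bool,
    pub portrait_id: u16,
    pub hp: i16,
    pub current_hp: i16,
}

#[derive(Debug, PartialEq)]
pub struct LoadedDoor {
    pub appearance: u8,
    pub plot: bool,
    pub portrait: Portrait,
    pub current_hp: i16,
}

pub fn normalize_door(raw: &RawDoor) -> LoadedDoor {
    let portrait = match raw.portrait_id {
        0 => Portrait::Id(0x22E),
        id if id >= 0xFFFE => Portrait::FromResRef,
        id => Portrait::Id(id),
    };
    LoadedDoor {
        // `as u8` keeps only the low 8 bits, so 257 becomes 1.
        appearance: raw.appearance as u8,
        // A static door is always treated as plot.
        plot: raw.plot || raw.is_static,
        portrait,
        // CurrentHP can never exceed the maximum HP.
        current_hp: raw.current_hp.min(raw.hp),
    }
}
```

Comparing a door before and after a function like this is one way a linter could report which stored values the engine would silently change.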
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Legacy Engine Artifacts | 7 explicitly mapped template structures (like AnimationState, NotBlastable, OpenLockDiff) are Neverwinter Nights or KOTOR 2 legacy dependencies inherently ignored by the K1 parser. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targeted for implementation under rakata_lint::rules::utd.
- Truncation Faults: (Pending) Flags Appearance values over 255 to prevent the engine from truncating the 32-bit integer to a single byte.
- Static Parity: Asserts that Plot is active if Static is also active.
- Invalid Hooks: (Pending) Scans for explicitly empty or "default" OnTrapTriggered references that invoke the traps.2da fallback.
- Portrait Anomalies: (Pending) Detects PortraitId mappings equal to 0 or >= 0xFFFE.
- HP Bounds: Ensures initialized CurrentHP rests at or below the standard HP total.
UTE Format (Encounter Blueprint)
Description: The Encounter (.ute) blueprint defines interactive spawn points and boundary triggers across a level map. Instead of acting merely as a spatial zone, encounters handle difficulty scaling, creature limits, a CR-sorted spawn pool, and explicit coordinate vertices to dynamically deploy combatants when a player crosses their geometry bounds.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .ute |
| Magic Signature | UTE / V3.2 |
| Type | Encounter Blueprint |
| Rust Reference | View rakata_generics::Ute in Rustdocs |
Data Model Structure
Rakata maps the Encounter definition directly into the rakata_generics::Ute struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
An Encounter breaks down into four main categories:
- Spawn Population (CreatureList): The list of creature blueprints the encounter can spawn.
- Difficulty & Limits: Setting how many creatures spawn at once and how difficult they should be relative to the player (e.g., MaxCreatures, DifficultyIndex).
- Trigger Boundaries (Geometry): The coordinates defining the physical tripwire that triggers the spawn.
- Behavioral Hooks (Scripts): The scripts that run when a player enters or exits the trigger, or when the spawn pool runs dry (e.g., OnEntered, OnExhausted).

- Model Validation: rakata-lint checks the data against engine constraints to prevent fatal runtime crashes.
Engine Audits & Decompilation
The following information documents the engine’s exact load sequence and field requirements for .ute files mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from the primary dispatcher CSWSEncounter::LoadEncounter at 0x00593830.)
Structural Load Phasing
The engine processes an Encounter structurally across several chunked subroutines, each responsible for unique spatial and logic bindings.
| Function | Size | Behavior |
|---|---|---|
| ReadEncounterFromGff | 3445 B | The initial pass that sets up the encounter’s identity, difficulty limits, and the spawn list. |
| ReadEncounterScriptsFromGff | 567 B | Attaches scripts that trigger when players enter, exit, or exhaust the spawn pool. |
| LoadEncounterSpawnPoints | 364 B | Reads the coordinates so the engine knows exactly where to spawn the creatures. |
| LoadEncounterGeometry | 651 B | Reads the coordinates that trace the trigger’s boundaries on the floor. |
Core Structural Findings
The engine rigorously evaluates geometric and spatial boundaries. Improper definitions break the spawn mapping algorithm.
Warning
Understanding Fatal Log Drops: While minor coordinate math errors usually just cause creatures to spawn inside walls, failing strict geometry constraints causes KOTOR to abruptly abort parsing the Encounter. Specifically, if a .ute file declares it has geometry boundaries but fails to provide the actual coordinate vertices, the engine dumps a fatal error to its trace log and refuses to spawn the encounter at all.
| Engine Rule | Runtime Behavior |
|---|---|
| Tag Overrides | The engine forcefully converts any Tag to all-lowercase via CSWSObject::SetTag. Any static casing is lost immediately upon load. |
| Geometry Integrity | If Geometry is explicitly defined but has 0 vertices, the engine logs a “has geometry, but no vertices” error and aborts loading the encounter entirely. |
| Geometry Synthesis | If the Geometry list is completely omitted from the blueprint, the engine falls back and safely synthesizes a default 4-vertex spatial box. |
| Difficulty Resolution | The engine prioritizes using DifficultyIndex to look up the difficulty in encdifficulty.2da. The static Difficulty field is ignored unless the 2DA table fails to resolve. |
| Bubble Sorting | Upon loading the CreatureList, the engine runs a Bubble Sort algorithm to firmly re-order the encounter’s spawn pool by ascending CR (Challenge Rating), completely overriding any custom static display order. |
| Area Instantiation | AreaList buffer allocation size is strictly dictated by AreaListMaxSize. If the real list exceeds this size, the buffer will silently overrun. |
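Two of the rules in the table above, the Geometry handling and the CR ordering, can be illustrated with a short sketch. The names `Vertex`, `SpawnEntry`, `resolve_geometry`, and `bubble_sort_by_cr` are hypothetical, the CR type is assumed to be a float, and the coordinates of the synthesized box are placeholders because the source does not state the real dimensions.

```rust
/// Hypothetical vertex type: x, y, z.
pub type Vertex = [f32; 3];

/// Mirrors the three Geometry outcomes: omitted, present but empty, present.
pub fn resolve_geometry(geometry: Option<Vec<Vertex>>) -> Result<Vec<Vertex>, &'static str> {
    match geometry {
        // List omitted: synthesize a 4-vertex box (placeholder coordinates).
        None => Ok(vec![
            [0.0, 0.0, 0.0],
            [1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0],
            [0.0, 1.0, 0.0],
        ]),
        // List present but empty: the encounter load is aborted.
        Some(vertices) if vertices.is_empty() => Err("has geometry, but no vertices"),
        Some(vertices) => Ok(vertices),
    }
}

/// Hypothetical spawn pool entry; the CR type is an assumption.
pub struct SpawnEntry {
    pub resref: String,
    pub cr: f32,
}

/// Plain bubble sort by ascending CR, discarding the authored order.
pub fn bubble_sort_by_cr(pool: &mut [SpawnEntry]) {
    let n = pool.len();
    for pass in 0..n {
        let mut swapped = false;
        // After each pass the largest remaining CR has settled at the end.
        for i in 0..n.saturating_sub(1 + pass) {
            if pool[i].cr > pool[i + 1].cr {
                pool.swap(i, i + 1);
                swapped = true;
            }
        }
        if !swapped {
            break;
        }
    }
}
```

Because the engine reorders the pool on load, a tool that displays CreatureList in file order will not match what the game actually uses.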
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Passive Legacy Artifacts | Unused fields left over from older tools or Odyssey branches (e.g., TemplateResRef, Comment, PaletteID) are completely dark. The engine inherently ignores them. |
| Superseded Legacy Fields | The static Difficulty field is a completely inactive legacy metric as long as DifficultyIndex maps to a valid row inside encdifficulty.2da. |
Implemented Linter Rules (Rakata-Lint)
These rules are documented for engine parity but are not yet implemented into rakata-lint/src/rules/.
- Dead Difficulty Traces: (Pending) Flags instances where a file statically defines Difficulty alongside a valid DifficultyIndex.
- Deficient Spawn Loops: (Pending) Warns when an Encounter evaluates as Active but initializes a completely empty CreatureList.
- Dead Field Evaluation: (Pending) Maps extraneous legacy engine artifacts (TemplateResRef, Comment, PaletteID) as dead fields.
UTI Format (Item Blueprint)
Description: The Item (.uti) blueprint serves as the central data model for all tangible loot, weapons, armor, and usable gear in the game. It defines how an item appears on characters, what custom properties or stat bonuses it applies through specific upgrade hierarchies, its monetary cost, and how its runtime state behaves when it is dropped into the world.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .uti |
| Magic Signature | UTI / V3.2 |
| Type | Item Blueprint |
| Rust Reference | View rakata_generics::Uti in Rustdocs |
Data Model Structure
Rakata maps the Item definition directly into the rakata_generics::Uti struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
An Item breaks down into four main categories:
- Core Identity: The basic text strings that provide the item’s name and description, including both identified and unidentified states (e.g., TemplateResRef, LocName, Description).
- Economic & Charge Mechanics: The value of the item, and the number of charges left for consumable abilities (e.g., Cost, Charges).
- Visual Geometry (Appearance): Setting what the item looks like when dropped on the floor or equipped (e.g., ModelVariation, TextureVar).
- Combat & Upgrade Properties (PropertiesList): The stat buffs, damage modifiers, and abilities bound to the item, alongside slots for workbench upgrades.

- Model Validation: rakata-lint checks the data against engine constraints to prevent fatal runtime crashes.
Engine Audits & Decompilation
The following information documents the engine’s exact load sequence and field requirements for .uti files mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from the primary dispatcher CSWSItem::LoadDataFromGff at 0x0055fcd0.)
Structural Load Phasing
The engine processes an Item in several passes, each handled by a separate function.
| Function | Size | Behavior |
|---|---|---|
| LoadDataFromGff | – | The main parser that sets what the item is, how many charges it holds, and its descriptions. |
| LoadItemPropertiesFromGff | – | Reads the special properties (like energy damage or stat boosts), splitting them into ‘useable’ abilities versus permanent buffs. |
| LoadItem | – | The constructor that decides whether to load the item onto a character or leave it idle in an inventory. |
| LoadFromTemplate | – | A fallback used when spawning an item dynamically from a script instead of off a character. |
| SaveItem / SaveItemProperties | – | The opposite pipeline that writes the item into a save game, which notoriously forces the item to always be flagged as “Identified”. |
Core Structural Findings
The engine rigorously evaluates base-item mapping constraints from 2DA arrays and aggressively overrides improperly defined models.
| Engine Rule | Runtime Behavior |
|---|---|
| Description Cross-Swap | If either Description or DescIdentified is missing, the engine automatically duplicates the provided string into the missing field so item identification mechanics never crash the game. |
| Model Truncation | If an older tool incorrectly configures ModelVariation to 0, the engine forcefully bumps it to 1 upon load, ensuring the item always has visible geometry instead of rendering an invisible weapon or armor piece. |
| Model & Body Variation Hooks | The engine completely ignores the .uti’s BodyVariation field, opting instead to enforce the exact body_var value predefined in baseitems.2da. Additionally, TextureVar is unconditionally bypassed unless the item’s base type is strictly configured as Model Type 1. |
| Cost Generation Fallback | The Cost integer provided in the file is dead data. The engine computes economic value at runtime via GetCost(), based on the item’s properties, and ignores the stored value. |
| Identifier Enforcement | During explicit serializing via SaveItem (when the player creates a save game), the engine actively forces and hardcodes Identified to 1 unconditionally. |
| Property Capabilities | Item properties are structurally split into Active and Passive memory tables at load. The engine evaluates every PropertyName index: any ID strictly mapping to 10, 37, 46, or 53 (e.g., Cast Power, Trap) is actively hooked as a usable player ability, while all other integers are silently applied as passive stat modifiers. |
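The Active/Passive split described in the Property Capabilities row reduces to a membership test on the PropertyName index. The following sketch assumes a `u16` index and uses hypothetical names (`ACTIVE_PROPERTY_IDS`, `is_active_property`, `split_properties`); only the four IDs come from the table above.

```rust
/// PropertyName indices the engine hooks as usable abilities.
pub const ACTIVE_PROPERTY_IDS: [u16; 4] = [10, 37, 46, 53];

/// True when the property becomes a usable player ability.
pub fn is_active_property(property_name: u16) -> bool {
    ACTIVE_PROPERTY_IDS.contains(&property_name)
}

/// Splits property indices into (active, passive), preserving file order.
pub fn split_properties(property_names: &[u16]) -> (Vec<u16>, Vec<u16>) {
    property_names
        .iter()
        .copied()
        .partition(|id| is_active_property(*id))
}
```

For example, `split_properties(&[10, 1, 53, 7])` yields `[10, 53]` as active and `[1, 7]` as passive.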
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Superseded Legacy Fields | Directly supplying static Cost or BodyVariation values is a byproduct of older file versions; these remain inherently unused overhead compared to the physical runtime 2DA evaluation. |
| Passive Legacy Artifacts | General nodes left over from older tools (like TemplateResRef, Comment, PaletteID, and explicitly UpgradeLevel) are bypassed on load entirely. |
Linter Rules
These rules are documented for engine parity but are not yet implemented into rakata-lint/src/rules/.
- Dead Cost Fields: (Pending) Flags .uti files configured with explicit Cost declarations, which the engine treats as dead data.
- Model Truncation Safety: (Pending) Throws a validation error if ModelVariation is set to 0, since the engine bumps it to 1 at runtime.
- Dead Body Overrides: (Pending) Flags redundant definitions of BodyVariation, which duplicate the value resolved from baseitems.2da.
- Valid Capability Bounds: (Pending) Scans all properties directly ensuring PropertyName, UpgradeType (0xFF), and UsesPerDay (0xFF) meet standard operational targets.
UTM Format (Merchant Blueprint)
Description: The Merchant (.utm) blueprint natively handles the interactive storefront data for merchants and shops. Because shops strictly behave as container interfaces that dynamically buy, sell, and map economic value onto spawned .uti items, the structure of a .utm is highly compact, primarily consisting of economic markups and inventory sorting parameters.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .utm |
| Magic Signature | UTM / V3.2 |
| Type | Merchant Blueprint |
| Rust Reference | View rakata_generics::Utm in Rustdocs |
Data Model Structure
Rakata maps the Merchant definition directly into the rakata_generics::Utm struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Merchant breaks down into three main categories:
- Core Identity: The basic identifiers providing the shop’s name and tag (e.g., Tag, LocName).
- Economic Metrics: The percentages controlling price scaling when buying or selling items, alongside basic shop rules (e.g., MarkUp, MarkDown, BuySellFlag).
- Store Inventory (ItemList): The list of items actively available in the shop’s stock, including rules for infinite regeneration.

- State Validation: rakata-lint checks the data against engine constraints to ensure merchants don’t silently fail during initialization.
Engine Audits & Decompilation
Because .utm evaluation is structurally straightforward, the engine avoids heavy memory allocations and maps the fields in a fast iteration.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSStore::LoadStore at 0x005c7180.)
Structural Load Phasing
| Function | Size | Behavior |
|---|---|---|
| LoadStore | 1341 B | The primary parser that pulls the merchant’s basic identity, economic constraints (MarkUp/MarkDown), and buying capabilities. |
| ItemList Read | – | Iterates through the list of store stock, actively pulling either explicitly saved item instances or generating them freshly from templates (InventoryRes). |
| AddItemToInventory | – | Pushes the fully sorted loot stack into the physical storefront container so the player can actually interact with and purchase them. |
Core Structural Findings
| Engine Rule | Runtime Behavior |
|---|---|
| Cost Sorting | When building the store inventory, the engine actively sorts the merchant’s final stock from cheapest to most expensive by checking the cost of each item. This completely overrides whatever custom display order you try to dictate statically. |
| Dynamic Economics | The engine relies entirely on the MarkUp and MarkDown integers to control shop prices. These act as simple percentages that mathematically bump or slash the base cost of every item the merchant sells or buys. |
| Buy/Sell Bit Flags | BuySellFlag is split into basic toggles: bit 0 controls whether you are allowed to sell your gear to the merchant, and bit 1 controls whether the merchant will actually sell anything to you. |
| Infinite Stacking | If an item is flagged as Infinite, the engine specifically locks that item in memory so that no matter how many times a player buys it, the shop never physically runs out of stock. |
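The flag layout, percentage scaling, and cost ordering described in the table above can be sketched as follows. All names (`PLAYER_MAY_SELL`, `MERCHANT_SELLS`, `scale_cost`, `StockItem`, `sort_stock`) are hypothetical. The percentage formula, including truncating integer division, is an assumption: the source only says that MarkUp and MarkDown act as simple percentages on the base cost.

```rust
/// Bit 0 of BuySellFlag: the player may sell gear to the merchant.
pub const PLAYER_MAY_SELL: u8 = 1 << 0;
/// Bit 1 of BuySellFlag: the merchant sells items to the player.
pub const MERCHANT_SELLS: u8 = 1 << 1;

pub fn player_may_sell(buy_sell_flag: u8) -> bool {
    (buy_sell_flag & PLAYER_MAY_SELL) != 0
}

pub fn merchant_sells(buy_sell_flag: u8) -> bool {
    (buy_sell_flag & MERCHANT_SELLS) != 0
}

/// Assumed formula: base cost scaled by a percentage, rounded down.
/// The engine's exact rounding behavior is not documented here.
pub fn scale_cost(base_cost: u32, percent: u32) -> u64 {
    u64::from(base_cost) * u64::from(percent) / 100
}

/// Hypothetical stock entry.
pub struct StockItem {
    pub resref: String,
    pub cost: u32,
    pub infinite: bool,
}

/// Orders stock from cheapest to most expensive, as the engine does on load.
pub fn sort_stock(stock: &mut [StockItem]) {
    stock.sort_by_key(|item| item.cost);
}
```

Widening to `u64` before multiplying avoids overflow for large costs; whether the engine guards against that is not something this sketch claims.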
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Legacy Interface Configurations | Some older tools expose positional values like Repos_PosX or Repos_PosY inherited from other Odyssey games, but the engine completely ignores them. The game physically builds its shop UI dynamically when you open it, rendering those grid coordinates totally useless. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targeted for implementation under rakata_lint::rules::utm.
- Economic Bounding: (Pending) Ensures MarkUp and MarkDown exist as INT types, preventing reads from failing at parse time.
- Flag Enforcement: (Pending) Asserts that BuySellFlag and Infinite are stored as BYTE values so that wider types are not misread.
- Reference Mapping: (Pending) Confirms OnOpenStore script hooks resolve to existing files.
- Inventory Integrity: (Pending) Prevents broken shops by verifying that InventoryRes strings fit the standard 16-character limit and link to valid .uti items.
UTP Format (Placeable Blueprint)
Description: The Placeable (.utp) blueprint dictates the configuration of universally interactive scenery and containers within a map. Ranging from simple locked footlockers to rigged command consoles and explodable starship barricades, .utp structs blend physical static properties (like structural HP and lock difficulties) with heavy dynamic script bindings.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .utp |
| Magic Signature | UTP / V3.2 |
| Type | Placeable Blueprint |
| Rust Reference | View rakata_generics::Utp in Rustdocs |
Data Model Structure
Rakata maps the Placeable definition directly into the rakata_generics::Utp struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Placeable breaks down into five main categories:
- Core Identity & Geometry: The configuration for what the placeable looks like, its faction, and the text displayed when targeted (e.g., Appearance, TemplateResRef, LocName).
- Interactive State & Dialogue: Flags determining if the placeable can be clicked, if it starts a conversation/computer sequence, or if it acts as a loot container (e.g., Useable, Conversation, HasInventory).
- Lock & Trap Mechanics: The parameters defining whether it’s locked, what key is needed, and rules for any attached traps (e.g., Locked, KeyName, TrapType, DisarmDC).
- Health & Destruction: The physical integrity of the object, defining if it can be destroyed and its defensive thresholds (e.g., HP, Hardness, Static, Plot).
- Behavioral Hooks (Scripts): The scripts that run when a player explores, attacks, or opens the placeable (e.g., OnOpen, OnInvDisturbed, OnDamaged).

- State Validation: rakata-lint checks the data against engine constraints to prevent runtime bugs.
Engine Audits & Decompilation
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSPlaceable::LoadPlaceable at 0x00585670.)
Because Placeables act as physical junctions for event hooking, they expose a massive suite of script triggers natively.
Structural Load Phasing
| Function | Size | Behavior |
|---|---|---|
| LoadPlaceable | 5092 B | The primary physical parser evaluating 46 core metrics including health, conversation dialogues, basic trap bindings, and physical alignment states. |
| ReadScriptsFromGff | – | Attaches 16 dedicated script hooks dictating behavior when the placeable is bashed, opened, unlocked, or triggered. |
Core Structural Findings
| Engine Rule | Runtime Behavior |
|---|---|
| Appearance Truncation | The engine reads Appearance as a 32-bit integer but truncates it to a single byte. Any ID above 255 therefore wraps around (256 becomes 0), which breaks the placeable’s model rendering. |
| Static vs. Plot Chaining | Just like Doors, if a Placeable is marked Static=1, the engine overrides all other behaviors and acts as if Plot=1 is set, making the placeable indestructible even if it has an HP value defined. |
| Default Usability Check | If the Static field is missing from the binary file, the engine derives it from the Useable flag, defaulting Static to !Useable. |
| Ground Pile Forcing | The engine reads whatever value you place in GroundPile, but physically overwrites it and forces it to 1 in memory, making native static configuration of this field utterly pointless. |
| Missing Door Hooks | Toolsets erroneously expose OnFailToOpen for Placeables, but the engine specifically treats this as a Door-exclusive (.utd) script hook and completely ignores it here. |
| Trap Hook Fallback | If a trap bounds check fails or the OnTrapTriggered script is left blank, the engine automatically attempts to read the traps.2da table and pulls the default script based on the specific TrapType. |
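The load-time rules in the table above can be sketched in Rust as follows. This is a minimal illustration, not the engine's or Rakata's actual code: RawPlaceable, LoadedPlaceable, and normalize are hypothetical names, and the sketch assumes the byte truncation keeps the low 8 bits of Appearance.

```rust
/// Hypothetical raw view of the UTP fields discussed above.
/// `None` models a field that is absent from the GFF struct.
#[derive(Debug, Clone, Default)]
pub struct RawPlaceable {
    pub appearance: u32,
    pub is_static: Option<bool>,
    pub useable: bool,
    pub plot: bool,
    pub ground_pile: u8,
}

/// Values as the engine is described to hold them after load.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LoadedPlaceable {
    pub appearance: u8,
    pub is_static: bool,
    pub plot: bool,
    pub ground_pile: u8,
}

pub fn normalize(raw: &RawPlaceable) -> LoadedPlaceable {
    // Appearance is read as 32 bits, but only the low byte is kept (assumption:
    // truncation is a plain mask, so 256 becomes 0 and 300 becomes 44).
    let appearance = (raw.appearance & 0xFF) as u8;
    // A missing Static field is derived from !Useable.
    let is_static = raw.is_static.unwrap_or(!raw.useable);
    // Static placeables behave as if Plot were set.
    let plot = raw.plot || is_static;
    // GroundPile is overwritten with 1 regardless of the file value.
    LoadedPlaceable { appearance, is_static, plot, ground_pile: 1 }
}
```

Comparing the raw values with the normalized ones is one way a linter could report fields whose on-disk value never takes effect.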
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Legacy Engine Artifacts | Placeable binaries are littered with legacy metrics from older tools or other Odyssey games (Comment, OpenLockDiff, Interruptable, Type, PaletteID). The physical KOTOR engine constructor entirely ignores these. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targeted for implementation under rakata_lint::rules::utp.
- Appearance Truncation: (Pending) Prevents rendering crashes by asserting Appearance never exceeds 255.
- Plot Chaining Context: (Pending) Asserts that if Static=1 is defined, Plot must explicitly match the forced reality of being indestructible.
- Ghost Value Detection: (Pending) Warns when GroundPile is set to anything other than 1, since the engine forces it to 1.
- Dead Hook Pruning: (Pending) Flags OnFailToOpen instances because the engine never reads this hook for Placeables.
- HP Health Ceiling: (Pending) Confirms CurrentHP is less than or equal to HP, preventing game-breaking state on spawn.
- Animation Conditional Limits: (Pending) Verifies that custom AnimationState indices are strictly guarded by Open==0 conditions.
UTS Format (Sound Object Blueprint)
Description: The Sound Object (.uts) blueprint defines dynamic, positional, and ambient audio emitters placed throughout a game map. Ranging from environmental hums and randomized crowd chatter to highly localized looping sound effects, .uts files act as physical sound nodes combining strict spatial coordinates with randomized pitch, interval, and varying volume matrices.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .uts |
| Magic Signature | UTS / V3.2 |
| Type | Sound Object Blueprint |
| Rust Reference | View rakata_generics::Uts in Rustdocs |
Data Model Structure
Rakata maps the Sound Object definition directly into the rakata_generics::Uts struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Sound Object breaks down into five main categories:
- Audio Emitters (SoundsList): An array containing the audio files (.wav files) the engine will sequence or shuffle through.
- Spatial Geometry: Distance boundaries determining exactly where the sound is audible in the map (MinDistance, MaxDistance).
- Playback Automation: Rules for how the sound loops and strings together (Continuous, Random, Active, Looping).
- Algorithmic Variations: Modifiers that dynamically distort the audio file’s pitch and volume at runtime (PitchVariation, FixedVariance, VolumeVrtn).
- Procedural Generators: Identifiers that tell the engine if the sound represents specific background noise like crowd chatter or combat ambiance (GeneratedType).
- State Validation: rakata-lint checks the data against engine constraints to prevent runtime bugs.
Engine Audits & Decompilation
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSSoundObject::Load at 0x005c9040.)
Sound Objects represent one of the most streamlined parsers in the engine. They completely lack script triggers and rely almost entirely on mathematically calculating randomized positional matrices and variations natively.
Structural Load Phasing
| Function | Size | Behavior |
|---|---|---|
| Load | 1345 B | The primary physical parser evaluating 24 core audio metric bounds, defining spatial positioning, volume variation, pitch scales, and active looping capabilities. |
| Sounds List | – | Iterates through the list of associated audio clips, actively loading sound resrefs into memory sequentially for playback. |
Core Structural Findings
| Engine Rule | Runtime Behavior |
|---|---|
| Generated Type Truncation | The engine reads GeneratedType as a 32-bit integer from the file but stores only the lowest byte in memory. Any value above 255 silently changes the generator type the engine actually uses. |
| Constructor Defaults | If fields are missing from the .uts binary, the engine relies on its internal C++ constructor to populate default values rather than on hardcoded literals at parse time. |
| Spatial Loading Context | When loaded via a static map (CSWSArea::LoadSounds), the engine skips reading positional coordinates from the .uts file entirely and uses the X, Y, and Z values defined in the area’s .git file. |
| Silent Sound Lists | When pulling the list of sounds, the engine actively ignores missing entries. It only pushes a sound struct into playable memory if the file actually provided a valid Sound reference string. |
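Two of the rules above, the GeneratedType truncation and the silent sound list filtering, can be sketched as follows. SoundEntry and both function names are hypothetical, and treating an empty string as an invalid Sound reference is an assumption of this sketch rather than a confirmed engine behavior.

```rust
/// Hypothetical entry in the sounds list; `None` models a missing Sound field.
pub struct SoundEntry {
    pub sound: Option<String>,
}

/// GeneratedType is read as 32 bits, but only the lowest byte is stored.
pub fn stored_generated_type(raw: u32) -> u8 {
    (raw & 0xFF) as u8
}

/// Keep only entries that actually provide a Sound reference.
/// Assumption: an empty string counts as "not provided".
pub fn playable_sounds(entries: &[SoundEntry]) -> Vec<&str> {
    entries
        .iter()
        .filter_map(|entry| entry.sound.as_deref())
        .filter(|name| !name.is_empty())
        .collect()
}
```

A list containing one named entry, one missing entry, and one empty entry would yield a single playable sound under these assumptions.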
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Legacy Engine Artifacts | Some older tools and legacy file revisions include values like TemplateResRef, LocName, Comment, Elevation, Priority, and PaletteID. These are artifacts from other Odyssey Engine branches (like Neverwinter Nights) and the KOTOR engine never evaluates them natively. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targeted for implementation under rakata_lint::rules::uts.
- Volume Ceiling: (Pending) Prevents playback distortion by asserting Volume stays within the standard 0-127 engine byte threshold.
- Float Sanity Parsing: (Pending) Confirms FixedVariance parses as a valid FLOAT, protecting the engine from invalid arithmetic operations during randomization.
- Audio Integrity: (Pending) Asserts that every defined Sound reference resolves to an audio resource in the active game modules.
- Emitter Verification: (Pending) Ensures the emitter has at least 1 mapped Sounds entry to prevent dead objects from polluting active map memory.
- Byte Truncation Warnings: (Pending) Flags when GeneratedType exceeds 255, predicting the engine’s byte wrap.
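As a rough idea of how three of the pending rules above might look once implemented, here is a small sketch. UtsLint and lint_sound_object are hypothetical names and do not reflect the real rakata_lint API; the inputs are plain integers so the example stays self-contained.

```rust
/// Hypothetical diagnostics for the pending UTS rules.
#[derive(Debug, PartialEq, Eq)]
pub enum UtsLint {
    /// Volume is outside the 0-127 range.
    VolumeOutOfRange(u32),
    /// GeneratedType exceeds 255; `stored` is the byte the engine would keep.
    GeneratedTypeTruncated { raw: u32, stored: u8 },
    /// The emitter has no sound entries.
    NoSounds,
}

pub fn lint_sound_object(volume: u32, generated_type: u32, sound_count: usize) -> Vec<UtsLint> {
    let mut findings = Vec::new();
    if volume > 127 {
        findings.push(UtsLint::VolumeOutOfRange(volume));
    }
    if generated_type > 255 {
        findings.push(UtsLint::GeneratedTypeTruncated {
            raw: generated_type,
            stored: (generated_type & 0xFF) as u8,
        });
    }
    if sound_count == 0 {
        findings.push(UtsLint::NoSounds);
    }
    findings
}
```

Returning the predicted stored byte alongside the raw value lets a report show exactly which generator type the engine would end up with.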
UTT Format (Trigger Blueprint)
Description: The Trigger (.utt) blueprint defines invisible zones placed across level maps. While encounters spawn creatures, triggers operate as tripwires – firing scripts, acting as loading zones to new areas, or springing mechanical traps when a character crosses them.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .utt |
| Magic Signature | UTT / V3.2 |
| Type | Trigger Blueprint |
| Rust Reference | View rakata_generics::Utt in Rustdocs |
Data Model Structure
Rakata maps the Trigger definition directly into the rakata_generics::Utt struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Trigger breaks down into four main categories:
- Core Identity & Geometry: The basic identifiers and coordinate boundaries that define what the trigger is and where it sits on the ground (e.g., Tag, Geometry).
- Interactive State & Sub-types: Settings that determine if the trigger acts as a loading zone, a trap, or just a generic scripting boundary (e.g., Type, Cursor, HighlightHeight).
- Trap Mechanics: The parameters defining rules for trap visibility and skill checks required to disarm them (e.g., TrapType, TrapOneShot).
- Transition & Behavioral Hooks (Scripts): The event scripts that fire when a character enters, clicks, leaves, or disarms the trigger, as well as the destination area if the trigger acts as a loading zone (e.g., ScriptOnEnter, LinkedTo).
- State Validation: rakata-lint checks the GFF structure directly against the constraints the engine expects.
Engine Audits & Decompilation
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSTrigger::LoadTrigger at 0x0058da80.)
Structural Load Phasing
| Function | Size | Behavior |
|---|---|---|
| LoadTrigger | 3381 B | The main constructor. It reads the trigger’s properties, scripts, and trap rules. |
| LoadTriggerGeometry | 743 B | Reads the X, Y, and Z coordinates that draw the trigger’s boundary on the floor. |
Core Structural Findings
| Engine Rule | Runtime Behavior |
|---|---|
| Behavior Derived from Type | The engine determines the trigger’s behavior and UI cursor based on the Type field. Type 1 makes it a map transition zone. Type 2 makes it a trap. |
| OnClick Duplication Bug | The engine has a known bug where it copies the ScriptOnEnter value and uses it to overwrite the OnClick listener by default, unless explicitly overridden. |
| Trap Hook Fallback | If the OnTrapTriggered script is left empty, set to null, or named "default", the engine ignores it and pulls the default script from traps.2da based on the TrapType. |
| Highlight Clamping | The trigger’s HighlightHeight is ignored by the engine unless it is greater than 0.0. If it is exactly zero or negative, the engine falls back to a default rendering height of 0.1. |
| Contextual Loading | Fields like LinkedTo, LinkedToModule, AutoRemoveKey, Tag, and Faction are only loaded into memory when the Trigger is processed from a .git area layout file. |
| Dual-Path Portraits | If PortraitId is < 0xFFFE, the engine treats it as an ID to resolve the 2DA map icon. If it is >= 0xFFFE, the engine ignores the integer and uses the explicit Portrait string instead. |
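Three of the rules in the table above reduce to small pure functions, sketched below. All names are hypothetical. The sketch assumes the "default" comparison is an exact, case-sensitive match and models a null script as an empty string; neither detail is confirmed by the findings above.

```rust
/// Engine fallback height used when HighlightHeight is not positive.
pub const DEFAULT_HIGHLIGHT_HEIGHT: f32 = 0.1;

/// Highlight Clamping: only values greater than 0.0 are honored.
pub fn effective_highlight_height(value: f32) -> f32 {
    if value > 0.0 { value } else { DEFAULT_HIGHLIGHT_HEIGHT }
}

/// Trap Hook Fallback: true when the engine would ignore OnTrapTriggered
/// and pull the default script from traps.2da instead.
pub fn uses_default_trap_script(on_trap_triggered: &str) -> bool {
    on_trap_triggered.is_empty() || on_trap_triggered == "default"
}

/// Dual-Path Portraits: which source the engine consults.
#[derive(Debug, PartialEq, Eq)]
pub enum PortraitSource<'a> {
    /// PortraitId below 0xFFFE: resolve through the 2DA table.
    TableRow(u16),
    /// PortraitId at or above 0xFFFE: use the explicit Portrait string.
    Explicit(&'a str),
}

pub fn portrait_source(portrait_id: u16, portrait: &str) -> PortraitSource<'_> {
    if portrait_id < 0xFFFE {
        PortraitSource::TableRow(portrait_id)
    } else {
        PortraitSource::Explicit(portrait)
    }
}
```

Keeping each rule as its own function makes it straightforward to unit test the boundary values (0.0, 0xFFFD, 0xFFFE) individually.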
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Legacy Engine Artifacts | As with other templates, older asset revisions include TemplateResRef, Comment, PaletteID, and PartyRequired. The engine completely ignores these. |
| Superseded Legacy Fields | Older asset revisions typically map TrapDetectDC and DisarmDC in the .utt file itself, but the engine ignores them – it calculates DCs dynamically using the rules in the .2da files instead. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targets for implementation under rakata_lint::rules::utt.
- Trap Type Verification: (Pending) Warns if a trigger has its TrapFlag set but its Type is not equal to 2. The engine will ignore its trap settings in this state.
- Transition Enforcement: (Pending) Flags triggers where Type==1 is set but no LinkedTo or TransitionDestination is defined.
- Height Bounding: (Pending) Detects configuration patterns where HighlightHeight is ≤ 0.0, triggering the mandatory engine fallback to 0.1.
- Default Script Identification: (Pending) Identifies empty or "default" OnTrapTriggered entries to explicitly document which default script the engine will pull from traps.2da.
- Geometry Safety: (Pending) Ensures that the trigger’s geometry contains at least 3 vertices to form a valid map boundary.
UTW Format (Waypoint Blueprint)
Description: The Waypoint (.utw) blueprint defines static reference coordinates within an area map. Unlike functional triggers or physical placeables, waypoints act exclusively as invisible logic markers. They provide coordinate anchors for creature patrol routes, spawn locations, camera focal points, or visible map pins in the player’s UI.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .utw |
| Magic Signature | UTW / V3.2 |
| Type | Waypoint Blueprint |
| Rust Reference | View rakata_generics::Utw in Rustdocs |
Data Model Structure
Rakata maps the Waypoint definition directly into the rakata_generics::Utw struct. To view the exhaustive binary schema and strict GFF field mappings, please refer to the Rustdocs for this struct, where each field is explicitly documented.
A Waypoint breaks down into three main categories:
- Core Identity: The basic identifiers that define the waypoint’s name and tag used heavily by scripts (e.g., Tag, LocalizedName).
- Spatial Geometry: The exact map coordinates and facing orientation that creatures or cameras will reference (e.g., XPosition, XOrientation).
- Map Navigation Notes: The text and toggles that dictate whether the waypoint draws a physical pin on the player’s mini-map UI (e.g., HasMapNote, MapNote).
- State Validation: rakata-lint checks the data against engine constraints to prevent runtime bugs or dead data paths.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for .utw files mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWSWaypoint::LoadWaypoint at 0x005c7f30.)
Structural Load Phasing
| Function | Size | Behavior |
|---|---|---|
| LoadWaypoint | 682 B | The main constructor. It loads the waypoint’s identity, map geometry, and checks for mini-map pins. |
| LoadFromTemplate | 134 B | A fallback used when dynamically spawning a waypoint from a script. |
Core Structural Findings
| Engine Rule | Runtime Behavior |
|---|---|
| Map Note Two-Gate Pattern | If HasMapNote is 0 or missing, the engine skips reading the map note entirely. If it is 1, it reads the strings but uses a second gate: if the MapNote string itself is missing, the entire map pin block is discarded silently. |
| Orientation Normalization | The engine computes the squared magnitude of the orientation vectors. If it is not exactly 1.0, it automatically calls Vector::Normalize() to fix the math. Non-unit vectors are tolerated but corrected instantly at load. |
| Position Override | When a waypoint is loaded from a .git area layout via LoadWaypoints, the engine re-reads the X and Y coordinates directly from the .git file, completely overriding the .utw. It also forcefully calculates the Z height based on the terrain collision mesh via ComputeHeight. |
| Dynamic Identification | Waypoints never pull an ObjectId from their own .utw file. It is always forcibly assigned by the .git list element (defaulting to 0x7f000000). |
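The map note gating and the orientation normalization described above can be sketched as follows. The function names and the three-component vector are hypothetical, the zero-length guard is an addition of this sketch to avoid dividing by zero, and treating any non-zero HasMapNote as set is an assumption (the findings above only cover 0, 1, and missing).

```rust
/// Orientation Normalization: vectors whose squared magnitude is exactly 1.0
/// are left alone; everything else is rescaled to unit length.
pub fn normalize_orientation(v: [f32; 3]) -> [f32; 3] {
    let mag_sq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    // The zero check is a safety guard of this sketch, not an engine finding.
    if mag_sq == 1.0 || mag_sq == 0.0 {
        return v;
    }
    let inv = 1.0 / mag_sq.sqrt();
    [v[0] * inv, v[1] * inv, v[2] * inv]
}

/// Map Note Two-Gate Pattern: the note survives only if HasMapNote is set
/// and the MapNote string itself is present.
pub fn effective_map_note<'a>(has_map_note: Option<u8>, text: Option<&'a str>) -> Option<&'a str> {
    match has_map_note {
        // Assumption: any non-zero byte opens the first gate.
        Some(flag) if flag != 0 => text,
        _ => None,
    }
}
```

A linter could call effective_map_note to spot dead data: a MapNote that is present in the file but returns None here is never shown in game.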
Legacy & Ignored Data
| Finding Type | Explanation |
|---|---|
| Superseded Legacy Fields | Older asset revisions pad the file with fields like TemplateResRef, Appearance, PaletteID, Comment, LinkedTo, and Description. The KOTOR engine completely ignores these. |
Implemented Linter Rules (Rakata-Lint)
These static constraints are targeted for implementation under rakata_lint::rules::utw.
- Tag Enforcement: (Pending) Flags if Tag is empty, as waypoints are primarily targeted by scripts.
- Boolean Clamping: (Pending) Ensures HasMapNote is stored as a BYTE holding 0 or 1.
- Double-Gating Check: (Pending) Detects dead data patterns where MapNote or MapNoteEnabled are defined but HasMapNote is configured to 0.
- Orientation Warnings: (Pending) Warns if the orientation vector’s magnitude is not ~1.0, documenting the engine’s forced correction.
3D Geometry & Models
At the heart of the Odyssey Engine’s visual presentation is a proprietary structural design for interpreting and rendering 3D geometry. Modern formats like .glTF or .fbx bundle all visual and physical data into a single asset. KotOR, however, splits this data across several distinct files. The engine strictly decouples the node hierarchy tree, the raw vertex buffers, and the mathematical collision boundaries.
Note
If you are looking for the raw Ghidra decompilation notes detailing the K1 Engine’s InputBinary::Read pipeline and byte-level structural layout, please refer to the preserved Raw MDL Decompilation Archive.
Implementation Blueprints
This section documents the primary pillars of KOTOR geometry and their mathematical foundations, backed by swkotor.exe clean-room reverse engineering.
| Format | Name | Layout & Purpose |
|---|---|---|
| MDL | Model Hierarchy | The architectural scaffold holding the model together. It defines the scene bounding volumes, spatial rotations, embedded animations, engine rendering parameters, and a deep recursive tree of typed Nodes (e.g., Lights, Bones, Emitters, Trimeshes). |
| MDX | Vertex Data | The abstract mathematical arrays defining the actual rendering payload. It directly encodes interleaved array blocks mapping exact spatial coordinates (X, Y, Z), texture UV layouts, and Lighting Normals. |
| BWM | Walkmeshes | The raw mathematical graph of AABB bounds and face intersections that serve as physics collision boxes for area environments (.wok), placeables (.pwk), and interactive doors (.dwk). |
| Math | TriMesh Derivations | Documentation explaining exactly how variables like coordinate bounds and face offsets are mathematically derived across both visual Trimeshes and collision Walkmeshes. |
MDL Format (Model Hierarchy)
The .mdl format serves as the overarching structural spine for 3D model geometry. Rather than storing literal vertex positions directly, it recursively structures a tree of generalized nodes (Bones, Trimeshes, Lights, Emitters) into a unified visual mesh. It delegates vertex geometry out, binds textures, links dynamic controllers (keyframe transformations), and maps bounding sphere matrices directly to the model’s rigid physical space.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .mdl |
| Magic Signature | Text (filedependancy) or Binary (\0 byte header) |
| Type | 3D Hierarchical Mesh |
| Rust Reference | View rakata_formats::Mdl in Rustdocs |
Data Model Structure
Rakata maps the .mdl binary tree exactly into rakata_formats::Mdl.
Because a model uses 11 distinct struct sub-types, Rakata resolves the pointer-based tree structure into a safe Rust Vec<MdlNode>. Native file pointer offsets, which KOTOR normally resolves through a raw in-memory relocation pass, are converted into safe recursive structures at parse time.
Node Sub-Types
The engine determines exact node allocations using a rigid bitflag header.
| Sub-Type | Description |
|---|---|
| Base | A pure structure node (Dummy) acting strictly as an invisible visual group or spatial pivot. |
| Light | Projects localized dynamic lighting, lens flares, and shading priorities. |
| Emitter | Configures particle spawning systems (fountains, single-shots, lightning, explosions). |
| Camera | An empty node serving as a static viewport anchor for dialogue cinematics. |
| Reference | An anchor point explicitly linking an external 3D model asset to a point. |
| TriMesh | A rigid standard triangle geometry boundary carrying static vertex arrays. |
| SkinMesh | A procedural mesh utilizing skeleton bone-weights and vectors to calculate organic deformations. |
| AnimMesh | A mesh carrying hardcoded, explicitly sampled vertex coordinate animation loops. |
| DanglyMesh | A sub-mesh evaluated through swinging physics constraints (displacement, tightness, period). |
| AABB | A strict spatial collision tree structurally defining an internal walkmesh barrier. |
| Saber | Allocates dynamic 3D quad arrays utilized exclusively to generate stretching lightsaber swing trails. |
Engine Audits & Decompilation
Deep Dive: For an exhaustive archive of the Ghidra decompilation notes detailing the exact byte-level layout of the binary MDL format and engine loading pipeline, refer to the MDL & MDX Deep Dive.
The following information documents the engine’s exact load sequence for genuine Binary MDL models. All behavior was mapped from natively analyzing swkotor.exe execution pipelines via Ghidra.
Loading and Wrapper Validation
Read initially via Input::Read (0x004a14b0).
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Binary vs ASCII Detection | The engine checks the exact first byte of the file. If it hits a \0 (NULL), it dispatches the asset entirely to the InputBinary track. If it hits text ("filedependancy" or "newmodel"), it loops into the FuncInterp ASCII parser track. |
| Wrapper Mapping | The Binary format evaluates the initial 12 bytes as an abstract Wrapper block defining explicit sizes for the .MDL and the associated .MDX geometry. |
| In-Memory Heap Dump | The engine allocates the sizes noted in the wrapper, runs memcpy on both the .MDL and .MDX assets blindly into memory, and then runs the recursive Reset path to relocate spatial internal pointer offsets to absolute memory addresses. |
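The detection and wrapper steps above can be sketched as a small classifier. MdlKind and detect_mdl are hypothetical names. The sketch assumes little-endian fields and that the MDL size sits at wrapper offset 0x04; only the MDX size at wrapper + 0x08 is stated elsewhere in this manual. Any non-NULL first byte is treated as ASCII without checking the keyword.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MdlKind {
    /// First byte is NULL: binary track, with the sizes from the 12-byte wrapper.
    Binary { mdl_size: u32, mdx_size: u32 },
    /// Anything else: handed to the ASCII parser track.
    Ascii,
}

/// Reads a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let s = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Returns `None` for empty input or a truncated binary wrapper.
pub fn detect_mdl(bytes: &[u8]) -> Option<MdlKind> {
    let first = *bytes.first()?;
    if first != 0 {
        return Some(MdlKind::Ascii);
    }
    // Assumed layout: +0x00 zero marker, +0x04 MDL size, +0x08 MDX size.
    let mdl_size = read_u32_le(bytes, 0x04)?;
    let mdx_size = read_u32_le(bytes, 0x08)?;
    Some(MdlKind::Binary { mdl_size, mdx_size })
}
```

A safe reader would also compare both sizes against the actual file lengths before allocating, since the engine itself copies the data without such checks.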
Node Dispatch Architecture
Read initially via InputBinary::ResetMdlNode (0x004a0900). The engine recursively navigates downwards matching against a constant 16-bit node-type flag lookup spanning from 0x0001 (Base Node) to 0x0821 (Lightsaber).
| Mapped Property | Engine Behavior |
|---|---|
| Sub-node Allocation Sizes | Nodes are dynamically allocated varying byte lengths strictly based on their type-mask. A root Base node only evaluates 80 contiguous bytes, but an Emitter allocates 304, and a Skin allocates 512. |
| Parent/Child Graph Resolution | Engine structures evaluate nodes continuously downward via embedded raw pointer arrays. These arrays branch a group of distinct sub-children implicitly off their master parent. At load time, the engine must safely rewrite all relative file offsets into absolute physical memory locations, otherwise the entire hierarchy will instantly detach. |
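To make the type-mask dispatch more concrete, the sketch below decodes a 16-bit node-type value into its component features. Only 0x0001 (Base) and the combined 0x0821 (Lightsaber) come from the text above. The individual bit assignments in NODE_FLAGS follow the layout commonly described in community documentation and should be read as assumptions, as should the names NODE_FLAGS and node_features.

```rust
/// Assumed single-bit flags making up the node-type mask.
pub static NODE_FLAGS: [(u16, &str); 11] = [
    (0x0001, "Base"),
    (0x0002, "Light"),
    (0x0004, "Emitter"),
    (0x0008, "Camera"),
    (0x0010, "Reference"),
    (0x0020, "TriMesh"),
    (0x0040, "SkinMesh"),
    (0x0080, "AnimMesh"),
    (0x0100, "DanglyMesh"),
    (0x0200, "AABB"),
    (0x0800, "Saber"),
];

/// Lists every feature whose bit is set in `type_mask`, lowest bit first.
pub fn node_features(type_mask: u16) -> Vec<&'static str> {
    NODE_FLAGS
        .iter()
        .filter(|(bit, _)| (type_mask & bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Under these assumptions, 0x0821 decodes to Base, TriMesh, and Saber, which matches the idea that specialized meshes extend the base and TriMesh layouts.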
Mapped Behavior Quirks
| Mapped Property | Ghidra Provenance & Engine Behavior |
|---|---|
| LOD Suffix Generation | The engine natively evaluates if the cullWithLOD property is set. If true, it explicitly triggers string concatenations for FindModel(name + "_x") and FindModel(name + "_z") sequentially to dynamically attach lower-quality auxiliary geometry instances based on viewport distance. |
| Animation Bone Binding | When building the live hierarchy tree for a rendering sequence, the engine explicitly ignores the node’s textual string name. Instead, it rigidly evaluates physical pairings against a mapped node_id integer. If the bone isn’t properly sequenced to that numeric ID array, it detaches from the runtime arrays entirely. |
| Self-Describing Keyframes | Unlike older properties that rely on rigid dictionaries, KOTOR determines how an animation was saved dynamically by reading the keyframe’s controller type integer. It applies a bitwise AND check against the type’s lowest hex digit (& 0x0F) to instantly dictate whether the loaded keyframe is a single float (like scaling), 3 floats (like an XYZ positional vector), or 4 floats (for a Slerp quaternion rotation). |
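The self-describing keyframe check can be illustrated with two bit tests. KeyframeShape, keyframe_shape, and is_bezier are hypothetical names, and the single-byte descriptor is a simplification of whatever field the engine actually reads. The 0x10 Bezier indicator is the one referenced by the controller lint rule in this section.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum KeyframeShape {
    /// One float, e.g. scaling.
    Scalar,
    /// Three floats, e.g. an XYZ position.
    Vector3,
    /// Four floats, e.g. a quaternion rotation.
    Quaternion,
}

/// Classifies a keyframe from the lowest hex digit of its descriptor.
/// Returns `None` for widths not covered by the findings above.
pub fn keyframe_shape(descriptor: u8) -> Option<KeyframeShape> {
    match descriptor & 0x0F {
        1 => Some(KeyframeShape::Scalar),
        3 => Some(KeyframeShape::Vector3),
        4 => Some(KeyframeShape::Quaternion),
        _ => None,
    }
}

/// True when the Bezier indicator bit (0x10) is set.
pub fn is_bezier(descriptor: u8) -> bool {
    (descriptor & 0x10) != 0
}
```

Masking before matching matters: a descriptor of 0x13 is still a three-float keyframe, just with the Bezier bit set.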
Proposed Linter Rules (Rakata-Lint)
While rakata-lint currently only evaluates GFF formats and does not yet parse .mdl models dynamically, the engine behaviors above hint at some suggested lint diagnostics:
Planned Lint Diagnostics:
- Skeleton / Animation Tracing: Flags animation nodes where the internal skeletal node_number binding parameter implicitly equals 0, ensuring the mesh does not hard freeze via pointing to the rigid root spine.
- Controller Mask Encoding: Validates that generic Controller properties properly bit-mask against the Bezier indicator (0x10) rather than reading explicitly raw quaternion values (which causes cascading loop failures through the rest of the array block).
- Emitter Detonation Allocation: Flags interactive Emitter nodes attempting to bind the detonate key (Controller 502) while structurally mis-identifying as "Fountain". The engine natively maps controller 502 data only to strict "Explosion" memory paths, resulting in an Access Violation engine crash otherwise.
- Name Graph Sanitization: Notifies developers if the node graph contains unreferenced entries in the unified Name Table. (BioWare notoriously shipped identical shared name tables compiling .pwk and .wok models into .mdl nodes throughout the 2003 pipeline.)
MDX Format (Vertex Data)
The .mdx format is a companion file that always pairs tightly with a .mdl model. While the .mdl file handles the complex math, skeletal hierarchy, and animation logic, the .mdx file acts as bulk storage, holding the massive lists of raw 3D coordinates (vertices) that make up the physical shape of the model.
Architecturally, the swkotor.exe engine treats these two files as a single combined asset: the .mdl dictates where and how the model moves, and the .mdx provides the points to physically draw on the screen.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .mdx |
| Magic Signature | Raw binary stream (No explicit signature block) |
| Type | Interleaved Vertex Payload Array |
| Rust Reference | View rakata_formats::Mdx in Rustdocs |
Data Model Structure
Rakata safely consumes the unindexed byte sequences into a typed geometry definition mapped within rakata_formats::Mdx.
At the raw binary level, .mdx data is strictly an interleaved buffer. Variables (like positional 3D XYZ vectors, Texture Parameter UV planes, and light-calculating Normals) are sequentially woven directly across the byte stream.
Engine Audits & Decompilation
Deep Dive: For an exhaustive archive of the Ghidra decompilation notes detailing the exact byte-level layout of the binary MDL/MDX format and engine loading pipeline, refer to the MDL & MDX Deep Dive.
The following documents the engine’s exact load sequence and structure for .mdx interleaved data pipelines mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from InputBinary::Read (0x004a1230) and InputBinary::ResetMdlNode (0x004a0900).)
Loading and Lifecycle
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Memory Wrapping | Triggered immediately alongside the .mdl. The wrapper dynamically outlines the exact byte-count of .mdx data required (wrapper + 0x08). |
| Buffer Liberation | MDX arrays are entirely stateless. Once InputBinary::ResetMdlNode computes the geometry arrays and translates the buffer directly into the OpenGL hardware render-pools during loading, the engine immediately calls free() wiping the MDX byte arrays from physical memory entirely. |
TriMesh Structural Addressing
The KOTOR Engine avoids parsing the MDX data by scanning through it block-for-block. Instead, traversing the actual MDL hierarchy drives vertex payload requests explicitly.
| Mapped Property | Ghidra Provenance & Engine Behavior |
|---|---|
| Array Slicing | Every distinct TriMesh instantiated in the parent MDL tree explicitly registers an mdx_data_offset pointer (TriMesh + 0x144). This dictates exactly where the engine explicitly seeks within the interleaved .mdx payload array to fetch this mesh’s native points. |
| Node Alignment Constraints | Vanilla assets maintain extremely strict alignment formats. Meshes are dynamically sorted prior to hardware parsing: static rendering models fall to the top of the index chain, whereas dynamic procedural meshes (like character .Skin nodes) are specifically dumped sequentially to the rear of the .mdx. |
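Offset-driven slicing, as opposed to scanning the buffer sequentially, can be sketched as follows. Only mdx_data_offset comes from the table above. The stride and vertex_count parameters, both function names, little-endian floats, and the position sitting at the start of each interleaved row are assumptions of this sketch.

```rust
/// Returns the byte range holding one mesh's interleaved rows, or `None`
/// if the requested span falls outside the MDX buffer.
pub fn mesh_vertex_bytes(
    mdx: &[u8],
    mdx_data_offset: u32,
    stride: u32,
    vertex_count: u32,
) -> Option<&[u8]> {
    let start = mdx_data_offset as usize;
    let len = (stride as usize).checked_mul(vertex_count as usize)?;
    let end = start.checked_add(len)?;
    mdx.get(start..end)
}

/// Reads the XYZ position of vertex `index`, assuming the position is stored
/// as three little-endian f32 values at the start of each row.
pub fn vertex_position(rows: &[u8], stride: usize, index: usize) -> Option<[f32; 3]> {
    if stride < 12 {
        return None;
    }
    let start = index.checked_mul(stride)?;
    let row = rows.get(start..start.checked_add(12)?)?;
    let f = |i: usize| f32::from_le_bytes([row[i], row[i + 1], row[i + 2], row[i + 3]]);
    Some([f(0), f(4), f(8)])
}
```

Every bounds check returns None instead of panicking, which is the behavior a validation tool wants when it is fed deliberately malformed offsets.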
Note
Ghost Payload Sentinels: During memory extraction, the engine implicitly pads geometric mesh payloads out to distinct 16-byte aligned boundaries using Terminator Rows. Any mesh vertex iteration falling slightly out of stride will be back-filled with ghost/sentinel float arrays ([0.0, 0.0, 0.0]) to ensure OpenGL buffer calculations remain uniform without overflowing pointer indexes during hardware streaming.
Proposed Linter Rules (Rakata-Lint)
Incorrectly calculated .mdx offset spans or payload array lengths can cause the engine to read misaligned bytes or overflow data bounds. Providing a linter rule to validate these payload alignments helps prevent geometry corruption and potential engine/gpu crashes.
While rakata-lint currently only evaluates GFF formats and does not yet parse .mdx buffers dynamically, the engine behaviors above hint at the foundational requirements for .mdx stability:
Planned Lint Diagnostics:
- Mesh Slice Verification: Enforces explicit iteration seeking. Validates .mdx vector boundaries by jumping through the file according to the individual mdx_data_offset assignments on bound TriMesh headers, rather than assuming unverified sequential payload lengths.
Walkmesh (BWM / WOK)
Walkmeshes govern physical collision and pathfinding across an area. They dictate exactly where a character can stand, what slopes they can climb, and what physical materials block their path.
BWM Binary
The binary implementation of the Walkmesh is entirely designed to be dumped straight into memory. Instead of smoothly parsing the file piece-by-piece, the engine constantly jumps around the file using a complex array of offsets located at the very top.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .bwm, .wok |
| Magic Signature | None standard header block |
| Type | Memory-Mapped Collision Net |
| Rust Reference | View rakata_formats::Bwm in Rustdocs |
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and field requirements for Binary Walkmeshes mapped from swkotor.exe.
(Decompilation logic for this section was entirely audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWCollisionMesh::LoadMeshBinary at 0x00597120.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Pointer Jumping | The engine doesn’t read the file linearly from top to bottom. Instead, it uses direct memory math (pointer arithmetic) to aggressively jump between the header and the raw data payload. |
| Offset Extraction | The beginning of the file contains exact byte locations the engine uses to orient itself: +0x08 yields the total vertex_count; +0x0C..+0x18 provides the maximum limits for faces, materials, and walk-edges; +0x18..+0x24 yields adjacency boundaries; +0x3C..+0x48 stores the direct starting addresses for the geometry data. |
| Bounding Box Offsets | The spans immediately following (+0x48..+0x6C and +0x6C..+0x84) are reserved specifically for tracking offsets that point to the Axis-Aligned Bounding Box (AABB) collision trees. |
| Ignoring the Magic ID | Magic and version identifiers (BWM ) are ignored during the LoadMeshBinary process. The engine relies on a different system entirely to verify file signatures beforehand. |
| Read-Only Format | Vanilla KOTOR has strictly read-only capabilities for BWM binaries. Developers removed any functionality needed to compile or save collision data dynamically. |
Tip
Orphaned Memory Gaps: The engine skips two blocks of header bytes entirely and never reads them off the disk:
+0x24..+0x3C (24 bytes) and +0x64..+0x6C (8 bytes). For a byte-perfect roundtrip toolset, these gaps must be preserved verbatim!
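A minimal sketch of how a roundtrip tool might carry those two unread spans through a rewrite. The `BwmHeaderGaps` type and both function names are hypothetical and are not part of the rakata-formats API; only the byte ranges come from the audit above.

```rust
/// Hypothetical holder for the two header spans the engine never reads.
/// Not a Rakata type; the ranges are the ones listed in the tip above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BwmHeaderGaps {
    pub gap_a: [u8; 24], // +0x24..+0x3C
    pub gap_b: [u8; 8],  // +0x64..+0x6C
}

/// Copies both unread spans out of a raw header, or returns None if the
/// buffer is too short to contain them.
pub fn capture_gaps(header: &[u8]) -> Option<BwmHeaderGaps> {
    let mut gap_a = [0u8; 24];
    let mut gap_b = [0u8; 8];
    gap_a.copy_from_slice(header.get(0x24..0x3C)?);
    gap_b.copy_from_slice(header.get(0x64..0x6C)?);
    Some(BwmHeaderGaps { gap_a, gap_b })
}

/// Writes the captured spans back into a freshly built header.
/// Returns false (and writes nothing) if the header is too short.
pub fn restore_gaps(header: &mut [u8], gaps: &BwmHeaderGaps) -> bool {
    if header.len() < 0x6C {
        return false;
    }
    header[0x24..0x3C].copy_from_slice(&gaps.gap_a);
    header[0x64..0x6C].copy_from_slice(&gaps.gap_b);
    true
}
```

Capturing at read time and restoring at write time keeps the writer free to recompute every other header field.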
BWM ASCII
For tooling purposes, BioWare engine modules support a raw ASCII readable version of the walkmesh that can be dynamically parsed at runtime at a massive performance cost.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .bwm (ASCII formatted) |
| Magic Signature | ASCII Text Directives |
| Type | Uncompiled Collision Text |
Engine Audits & Decompilation
The following documents the engine’s exact load sequence and constraints for ASCII text walkmeshes mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWRoomSurfaceMesh::LoadMeshText at 0x00582d70.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Searching for Keywords | The engine scans the text file reading line-by-line to look for the specific keywords node, verts, faces, and aabb. |
| Strict Face Formatting | Every face line must contain exactly 8 numbers separated by spaces. Interestingly, while the engine reads the adjacency input, it immediately discards it! The engine forces adjacency to be recomputed from scratch post-load to prevent geometric errors from old assets. |
| Line Length Limits | The engine will aggressively truncate or glitch if any single text line stretches beyond 256 characters (0x100 bytes). |
| Face Reordering | Using the surfacemat.2da file, the engine completely shuffles the order of the faces while loading. It essentially groups every geometry face marked “walkable” at the absolute top of the array, and pushes all non-walkable geometry straight to the bottom. |
| Fudging the Boundaries | When figuring out the Axis-Aligned Bounding Box (AABB) limits, the text loader artificially stretches the box outwards by roughly 0.01 across every axis. Due to the face reordering mentioned above, the engine also has to build a temporary remap table under the hood just to keep track of where everything moved! |
Warning
Because the LoadMeshText routine reorders faces into a walkable cluster followed by a non-walkable cluster, a clean 1-to-1 binary-to-ASCII-to-binary round trip of a KOTOR walkmesh is impossible: the original face indexing is lost.
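The reordering and its remap table can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the walkability lookup (which the engine takes from surfacemat.2da) is replaced by a caller-supplied closure, the function name is hypothetical, and keeping the original relative order inside each cluster is an assumption.

```rust
/// Moves walkable faces to the front and returns the reordered faces plus a
/// remap table where `remap[old_index] == new_index`.
/// Assumption: relative order inside each cluster is kept (stable partition).
pub fn partition_walkable<T: Clone>(
    faces: &[T],
    is_walkable: impl Fn(&T) -> bool,
) -> (Vec<T>, Vec<usize>) {
    // Old indices in their new order: walkable first, then the rest.
    let mut order: Vec<usize> = (0..faces.len())
        .filter(|&i| is_walkable(&faces[i]))
        .collect();
    order.extend((0..faces.len()).filter(|&i| !is_walkable(&faces[i])));

    // Invert the ordering so old indices can be translated to new ones.
    let mut remap = vec![0usize; faces.len()];
    for (new_idx, &old_idx) in order.iter().enumerate() {
        remap[old_idx] = new_idx;
    }

    let reordered = order.iter().map(|&i| faces[i].clone()).collect();
    (reordered, remap)
}
```

Anything that referenced a face by its old index (for example AABB leaves) would be translated through `remap` after the shuffle.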
TriMesh Derived & Computed Fields Reference
This document catalogs derived or computable fields specifically impacting TriMesh generation for MDL/MDX structures.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .mdl |
| Domain | Geometry Math / Model Reconstruction |
| Rust Reference | View rakata_formats::MdlNodeTriMesh in Rustdocs |
Data Model Structure
Rakata attempts to make building a TriMesh as painless as possible by handling the complex math under the hood.
- Derived Fields: Rakata explicitly understands the difference between data you must supply (like static 3D coordinates) and data that can safely be calculated on the fly (like bounding limits, spherical radii, or adjacency maps). The rakata-formats API automatically calculates all of these required boundaries for you whenever you serialize the file!
Engine Audits & Decompilation
This document catalogues every field on MdlMesh and MdlFace that can be
derived from geometry, documenting what each field means, how community tools
handle it, and what algorithm is needed to recompute it. This is the reference
for future model-editing API work.
Field Categories
- User-authored: Provided by the modeller. Never recomputed.
- Derivable: Can be recomputed from geometry. Tools recompute on ASCII import / model rebuild; preserve verbatim on binary roundtrip.
- Runtime-only: Written by the engine at load time. On-disk values are meaningless stubs.
1. Internal CExoArrayList Fields (+0x98 .. +0xC8)
The five CExoArrayList slots in the TriMesh header form a coordinated GL index buffer submission system. Each stores a 12-byte header (ptr/count/alloc) in the mesh header plus a single u32 data value in the content blob.
1.1 vertex_indices (+0x98) – Dead in KotOR
What it is: A legacy engine array block. In BioWare’s older titles (like Neverwinter Nights), this block pointed to vertex index data. In KOTOR, the engine never actually looks at this field at all.
Community tools:
- mdledit: Misidentifies as cTexture3 (12-byte string). Byte-exact preserve.
- mdlops: Reads as raw bytes via darray struct. Byte-exact preserve.
- PyKotor: Reads as indices_counts. Byte-exact preserve.
- xoreos/reone: Skip entirely.
Vanilla values: Always zeros (ptr=0, count=0, alloc=0).
Rakata Processing Rule: Store as [u8; 12] for lossless preservation, or zero on
write. No computation needed.
1.2 left_over_faces (+0xA4) – Dead in KotOR
What it is: Another legacy array block. In NWN, this stored “left over” face geometry. In KOTOR, the engine updates the pointer location dynamically but never actually reads the data during the OpenGL rendering loop.
Community tools:
- mdledit: Misidentifies as cTexture4 (12-byte string). Byte-exact preserve.
- mdlops: Reads as raw bytes via darray struct. Points to the packed u16 vertex index data (mdlops uses this as the indirection to find face indices).
- PyKotor: Reads as indices_offsets. Byte-exact preserve.
- xoreos: Only field it actually follows – reads the pointer to find packed u16 face vertex indices.
- reone: Reads as indicesOffsetArrayDef. Uses first element as pointer to u16 index data.
Vanilla values: Typically non-zero. The pointer value points to the packed u16 face vertex index data. Count is 1, alloc is 1.
Rakata Processing Rule: Store the raw pointer and count variables. The pointer is content-relative and must be explicitly backpatched on write to point to the packed u16 face index data block.
1.3 vertex_indices_count (+0xB0) – Derivable
What it is: Single u32 value = total number of u16 vertex indices in the face index buffer.
Formula: face_count * 3
Community tools:
- mdledit: Recomputes on every write (nVertIndicesCount = Faces.size() * 3).
- mdlops: Recomputes on ASCII import.
- PyKotor: Preserves from binary, creates empty for new models.
Rakata Processing Rule: Dynamically derive from faces.len() * 3. Never store a static value in the struct.
1.4 mdx_offsets (+0xBC) – Derivable (pointer)
What it is: Single u32 value = content-relative offset to the packed u16 face vertex index data in the MDL content blob.
Community tools:
- mdledit: Writes placeholder, backpatches when VertIndices data is written.
- mdlops: Same approach.
- PyKotor: Same approach.
Rakata Processing Rule: Compute strictly at serialization time via the binary writer. Never store a static value in the struct.
1.5 index_buffer_pools / Inverted Counter (+0xC8) – Preserve or Derive
What it is: A standard 32-bit number. On the physical hard drive, this acts exclusively as a sequence counter that numbers meshes using a bizarre “inverted” counting pattern. However, the moment the engine loads the file into memory, it deletes this number and overwrites the exact memory space with an OpenGL hardware connection handle.
The inverted counter formula (from mdledit asciipostprocess.cpp:1024):
mesh_counter: sequential 1-based index across all mesh nodes in DFS tree order.
Saber meshes consume TWO increments (one per inverted counter).
Quo = mesh_counter / 100
Mod = mesh_counter % 100
inverted_counter = (2^Quo) * 100 - mesh_counter
+ (Mod != 0 ? Quo * 100 : 0)
+ (Quo != 0 ? 0 : -1)
Example sequence: 98, 97, 96, …, 1, 0, 100, 199, 198, …, 101, 200, …
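The formula translates directly into Rust. This is a sketch of the arithmetic only: the function name is hypothetical, and returning `None` when the result no longer fits in a u32 is a choice made for this example rather than documented tool behavior.

```rust
/// Computes the on-disk inverted counter from a 1-based DFS mesh counter,
/// following the quoted formula term by term.
/// Returns None if the result does not fit in a u32 (sketch-level guard).
pub fn inverted_counter(mesh_counter: u32) -> Option<u32> {
    let quo = mesh_counter / 100;
    let rem = mesh_counter % 100;
    if quo >= 32 {
        return None; // 2^quo * 100 would be far outside the u32 range
    }
    // (2^Quo) * 100 - mesh_counter
    let mut value = (1i64 << quo) * 100 - i64::from(mesh_counter);
    // + (Mod != 0 ? Quo * 100 : 0)
    if rem != 0 {
        value += i64::from(quo) * 100;
    }
    // + (Quo != 0 ? 0 : -1)
    if quo == 0 {
        value -= 1;
    }
    if value < 0 || value > i64::from(u32::MAX) {
        None
    } else {
        Some(value as u32)
    }
}
```

Feeding counters 1, 2, 99, 100, 101, 199 and 200 through the function gives 98, 97, 0, 100, 199, 101 and 200, which matches the example sequence.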
Community tools:
- mdledit: Preserves from binary. Recomputes from formula only for ASCII import when value is missing (!nMeshInvertedCounter.Valid()).
- mdlops: Recomputes on ASCII import using same formula.
- PyKotor: Preserves from binary.
Rakata Processing Rule: Map as a static u32 field to perfectly preserve binary roundtripping. When natively constructing new models, dynamically compute the inverted sequence according to the formula using a DFS mesh counter.
2. Packed u16 Face Vertex Indices
What it is: A tightly packed list of u16 index triplets (yielding exactly 6 bytes per face). Each 3-piece triplet tells the renderer which three vertex dots to connect to draw one flat triangle. This entire block is physically uploaded straight to the graphics card to render the final model.
Relationship to MdlFace: The packed u16 data is identical to
MdlFace.vertex_indices for each face, laid out sequentially. It is fully
redundant with the face array.
Community tools:
- mdledit: Reads from binary into nVertIndices (3 u16 per face, stored alongside face data). Writes from face data.
- mdlops: Reads as vertindexes darray. Writes from face data on ASCII import.
- xoreos/reone: Read from the pointer at +0xA4 or +0xBC.
Rakata Processing Rule: Always dynamically derive identical copies directly from faces[i].vertex_indices during binary emission. Never map a redundant array inside the Rakata struct.
3. Face Fields (MdlFace, 32 bytes per face)
3.1 plane_normal ([f32; 3]) – Derivable
What it is: The geometric direction the triangle’s flat surface is facing (a unit normal vector).
Formula:
edge1 = positions[v1] - positions[v0]
edge2 = positions[v2] - positions[v0]
normal = normalize(cross(edge1, edge2))
Community tools: All tools that recompute adjacency also recompute normals.
3.2 plane_distance (f32) – Derivable
What it is: The raw distance measured straight from the physical center of the world (origin) to the face’s flat surface along its normal vector.
Formula: plane_distance = -dot(plane_normal, positions[v0])
Note: some tools negate this differently. Verify against vanilla data.
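The two face-plane formulas (3.1 and 3.2) can be sketched together. The function name is hypothetical, the sign of the distance follows the formula as written in this document, and returning `None` for a zero-area triangle is an assumption of this example.

```rust
type Vec3 = [f32; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Returns (plane_normal, plane_distance) for one triangle, or None when the
/// triangle is degenerate and has no usable normal (sketch-level choice).
pub fn face_plane(p0: Vec3, p1: Vec3, p2: Vec3) -> Option<(Vec3, f32)> {
    let n = cross(sub(p1, p0), sub(p2, p0));
    let len = dot(n, n).sqrt();
    if len == 0.0 {
        return None;
    }
    let normal = [n[0] / len, n[1] / len, n[2] / len];
    // Sign convention taken from the formula above; verify against vanilla data.
    let distance = -dot(normal, p0);
    Some((normal, distance))
}
```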
3.3 surface_id (u32) – User-authored
What it is: Material/surface type identifier. Determines footstep sounds, walkability, etc. in walkmeshes; material properties in render meshes.
Not derivable – assigned by the modeller or inherited from the source asset.
3.4 adjacent ([u16; 3]) – Derivable
What it is: For each edge of the triangle, the index of the face sharing
that edge. 0xFFFF means no adjacent face (boundary edge).
Edge-to-adjacent mapping:
- adjacent[0]: face sharing edge (v0, v1)
- adjacent[1]: face sharing edge (v1, v2)
- adjacent[2]: face sharing edge (v2, v0)
Rakata Hash-Map Adjacency Algorithm:
1. Build position_key(v) = format!("{:.4e},{:.4e},{:.4e}", pos[0], pos[1], pos[2])
2. Build vertex_group: HashMap<String, Vec<usize>>
For each vertex index i:
vertex_group[position_key(i)].push(i)
3. Build vertex_to_faces: HashMap<usize, Vec<usize>>
For each face f, for each vertex v in face.vertex_indices:
vertex_to_faces[v].push(f)
4. Build face_set(vertex_index) -> HashSet<usize>:
Collect all faces touching any vertex in the same position group:
group = vertex_group[position_key(vertex_index)]
union of vertex_to_faces[g] for all g in group
5. For each face f:
For each edge (va, vb) in [(v0,v1), (v1,v2), (v2,v0)]:
candidates = face_set(va) & face_set(vb) - {f}
adjacent[edge] = if candidates.is_empty() { 0xFFFF }
else { min(candidates) }
Complexity: O(F * V_avg) where V_avg is the average number of faces per vertex group. Effectively O(F) for well-behaved meshes.
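A runnable sketch of the five steps above. The function and constant names are hypothetical and this is not the rakata-formats implementation; it uses a `BTreeSet` for the per-vertex face sets so that the first surviving candidate is always the smallest face index.

```rust
use std::collections::{BTreeSet, HashMap};

pub const NO_NEIGHBOR: u16 = 0xFFFF;

/// Step 1: position key with 4 fractional digits in exponent notation.
fn position_key(p: &[f32; 3]) -> String {
    format!("{:.4e},{:.4e},{:.4e}", p[0], p[1], p[2])
}

/// Computes adjacent[edge] for every face using position-based matching.
/// Assumes fewer than 0xFFFF faces so indices fit in a u16.
pub fn compute_adjacency(positions: &[[f32; 3]], faces: &[[u16; 3]]) -> Vec<[u16; 3]> {
    // Step 2: group vertex indices that share a position.
    let mut vertex_group: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, p) in positions.iter().enumerate() {
        vertex_group.entry(position_key(p)).or_default().push(i);
    }
    // Step 3: map each vertex index to the faces that use it.
    let mut vertex_to_faces: HashMap<usize, Vec<usize>> = HashMap::new();
    for (f, face) in faces.iter().enumerate() {
        for &v in face {
            vertex_to_faces.entry(v as usize).or_default().push(f);
        }
    }
    // Step 4: all faces touching any vertex in the same position group.
    let face_set = |v: u16| -> BTreeSet<usize> {
        let mut set = BTreeSet::new();
        let group = positions
            .get(v as usize)
            .and_then(|p| vertex_group.get(&position_key(p)));
        if let Some(group) = group {
            for g in group {
                if let Some(fs) = vertex_to_faces.get(g) {
                    set.extend(fs.iter().copied());
                }
            }
        }
        set
    };
    // Step 5: intersect the two endpoint sets per edge, excluding the face itself.
    faces
        .iter()
        .enumerate()
        .map(|(f, face)| {
            let edges = [(face[0], face[1]), (face[1], face[2]), (face[2], face[0])];
            let mut adjacent = [NO_NEIGHBOR; 3];
            for (edge, &(va, vb)) in edges.iter().enumerate() {
                let (a, b) = (face_set(va), face_set(vb));
                // BTreeSet intersection iterates in ascending order, so the
                // first hit is min(candidates).
                if let Some(n) = a.intersection(&b).find(|&&c| c != f) {
                    adjacent[edge] = *n as u16;
                }
            }
            adjacent
        })
        .collect()
}
```

For two triangles that meet along a seam with duplicated vertices, the shared edge resolves to the other face while the outer edges stay at 0xFFFF. Recomputing `face_set` per edge keeps the sketch short; a production version would likely cache the sets per position group.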
No-neighbor sentinel: 0xFFFF (u16::MAX). All tools agree except PyKotor
which incorrectly uses 0 (bug – face 0 is a valid index).
Non-manifold edges: When more than 2 faces share an edge, tools differ:
- mdledit: First match wins, logs a warning.
- mdlops: Arbitrary (hash iteration order).
- PyKotor: Smallest face index wins (min(candidates)).
Rakata Processing Rule: Always use min(candidates) internally so evaluation remains deterministic and aligns with PyKotor output. If non-manifold edges are detected, the formatter must log a warning.
Important: Vertex matching must be position-based, not index-based. Meshes commonly have duplicate vertices at the same position with different normals/UVs (hard edges, UV seams). Index-based matching would miss adjacency across these seams.
3.5 vertex_indices ([u16; 3]) – User-authored
What it is: The three vertex indices forming this triangle.
Not derivable – defines the mesh topology.
4. Mesh Bounding Geometry – Derivable
4.1 bounding_min / bounding_max ([f32; 3])
What it is: The smallest axis-aligned box that encloses every vertex in the model (an Axis-Aligned Bounding Box). Its sides are generally not equal in length.
Formula:
bounding_min = [min of all positions[i][0], min of [1], min of [2]]
bounding_max = [max of all positions[i][0], max of [1], max of [2]]
4.2 bsphere_center / bsphere_radius ([f32; 3], f32)
What it is: Minimum bounding sphere enclosing all vertices. Used by the
engine for frustum culling (PartTriMesh::GetMinimumSphere at 0x00443330).
Engine algorithm (from Ghidra, confirmed in mdl_mdx.md):
center = average of all vertex positions (centroid)
radius = max distance from center to any vertex
This is NOT the true minimum bounding sphere (Welzl’s algorithm), but a simpler centroid-based approximation. Matches what vanilla files contain.
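A sketch of the centroid-based approximation described above. The function name is hypothetical, and returning `None` for an empty vertex list is an assumption; the source does not say what the engine does in that case.

```rust
/// Centroid-based bounding sphere: returns (center, radius).
/// Returns None for an empty vertex list (sketch-level choice).
pub fn centroid_sphere(positions: &[[f32; 3]]) -> Option<([f32; 3], f32)> {
    if positions.is_empty() {
        return None;
    }
    // center = average of all vertex positions
    let n = positions.len() as f32;
    let mut center = [0.0f32; 3];
    for p in positions {
        for axis in 0..3 {
            center[axis] += p[axis];
        }
    }
    for c in center.iter_mut() {
        *c /= n;
    }
    // radius = max distance from center to any vertex
    let radius = positions
        .iter()
        .map(|p| {
            let d = [p[0] - center[0], p[1] - center[1], p[2] - center[2]];
            (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
        })
        .fold(0.0f32, f32::max);
    Some((center, radius))
}
```

With two vertices at the origin and one at (3, 0, 0), the centroid lands at (1, 0, 0) and the radius is 2, whereas the true minimum sphere would have radius 1.5. That gap is the difference the paragraph above refers to.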
4.3 total_surface_area (f32)
What it is: Sum of all triangle areas in the mesh.
Formula:
For each face:
edge1 = positions[v1] - positions[v0]
edge2 = positions[v2] - positions[v0]
area += 0.5 * length(cross(edge1, edge2))
total_surface_area = sum of all face areas
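The same summation as a small Rust function. The name and signature are illustrative only, and the sketch assumes every face index is in range for `positions` (it panics otherwise).

```rust
/// Sums the area of every triangle: 0.5 * |edge1 x edge2| per face.
/// Panics if a face references a vertex index outside `positions`.
pub fn total_surface_area(positions: &[[f32; 3]], faces: &[[u16; 3]]) -> f32 {
    faces
        .iter()
        .map(|f| {
            let p0 = positions[f[0] as usize];
            let p1 = positions[f[1] as usize];
            let p2 = positions[f[2] as usize];
            let e1 = [p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]];
            let e2 = [p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2]];
            let c = [
                e1[1] * e2[2] - e1[2] * e2[1],
                e1[2] * e2[0] - e1[0] * e2[2],
                e1[0] * e2[1] - e1[1] * e2[0],
            ];
            0.5 * (c[0] * c[0] + c[1] * c[1] + c[2] * c[2]).sqrt()
        })
        .sum()
}
```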
5. AABB Tree – Derivable (complex)
What it is: A mathematical collision-detection tree (Binary Space Partition) built over the faces of the mesh. It recursively slices the physics block into smaller and smaller floating boxes so the engine can quickly determine if a player bumps into a wall, saving it from checking collision against every single polygon.
When needed: Only for MdlNodeData::Aabb nodes (walkmesh-like collision
geometry). Regular render meshes don’t have AABB trees.
Node layout: 40 bytes (see mdl_mdx.md for full struct).
Build algorithm: Recursive spatial partition:
- Compute AABB of all face centroids.
- Choose split axis (longest AABB dimension).
- Sort faces by centroid along split axis.
- Split at median into left/right subsets.
- Recurse on each subset until single-face leaves.
Community tools generally don’t rebuild AABB trees from scratch – they preserve the existing tree or require external tooling to generate it.
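The recursive median split listed above can be sketched like this. The `AabbNode` enum is a hypothetical in-memory shape and does not model the 40-byte on-disk node; sizing each node's box from the vertices of its faces is an assumption of this example, since the steps above only describe how the split is chosen.

```rust
type Vec3 = [f32; 3];

/// Hypothetical in-memory tree; not the 40-byte on-disk node layout.
#[derive(Debug)]
pub enum AabbNode {
    Leaf { face: usize, min: Vec3, max: Vec3 },
    Branch { min: Vec3, max: Vec3, left: Box<AabbNode>, right: Box<AabbNode> },
}

fn centroid(tri: &[Vec3; 3]) -> Vec3 {
    [0usize, 1, 2].map(|a| (tri[0][a] + tri[1][a] + tri[2][a]) / 3.0)
}

fn extent<'a>(points: impl Iterator<Item = &'a Vec3>) -> (Vec3, Vec3) {
    let mut min = [f32::INFINITY; 3];
    let mut max = [f32::NEG_INFINITY; 3];
    for p in points {
        for a in 0..3 {
            min[a] = min[a].min(p[a]);
            max[a] = max[a].max(p[a]);
        }
    }
    (min, max)
}

/// Builds a tree over the given face indices; None for an empty face list.
pub fn build(tris: &[[Vec3; 3]], mut faces: Vec<usize>) -> Option<AabbNode> {
    if faces.is_empty() {
        return None;
    }
    // Assumption: the node box encloses every vertex of the faces below it.
    let (min, max) = extent(faces.iter().flat_map(|&f| tris[f].iter()));
    if faces.len() == 1 {
        return Some(AabbNode::Leaf { face: faces[0], min, max });
    }
    // Compute the AABB of all face centroids.
    let cents: Vec<Vec3> = faces.iter().map(|&f| centroid(&tris[f])).collect();
    let (cmin, cmax) = extent(cents.iter());
    // Choose the longest dimension as the split axis.
    let axis = (0..3usize)
        .max_by(|&a, &b| (cmax[a] - cmin[a]).total_cmp(&(cmax[b] - cmin[b])))
        .unwrap();
    // Sort faces by centroid along that axis.
    faces.sort_by(|&a, &b| centroid(&tris[a])[axis].total_cmp(&centroid(&tris[b])[axis]));
    // Split at the median; both halves are non-empty when len >= 2.
    let right = faces.split_off(faces.len() / 2);
    // Recurse until single-face leaves remain.
    Some(AabbNode::Branch {
        min,
        max,
        left: Box::new(build(tris, faces)?),
        right: Box::new(build(tris, right)?),
    })
}
```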
6. Fields That Are NOT Derivable
These distinct fields are explicitly user-authored or carried over from tooling. Rakata must treat them strictly as rigid payload endpoints. They are never mathematically recomputed across the pipeline:
| Field | Source |
|---|---|
| Vertex positions, normals, UVs, tangent space | 3D modeller |
| Vertex colors | 3D modeller or material editor |
| Texture names (texture_0, texture_1) | Material assignment |
| Diffuse/ambient colors | Material properties |
| Transparency hint, light_mapped, beaming, etc. | Material flags |
| Surface ID per face | Surface type assignment |
| Vertex indices per face | Mesh topology |
| Controller keyframes | Animation data |
| Bone weights, indices, bonemap | Rigging tool |
| Emitter properties | Particle editor |
7. Tool Cross-Reference: CExoArrayList Naming
The naming across tools is wildly inconsistent:
| Offset | Engine (Ghidra) | rakata | mdledit | mdlops | PyKotor | xoreos |
|---|---|---|---|---|---|---|
| +0x98 | vertex_indices | vertex_indices_array | cTexture3 | pntr_to_vert_num | indices_counts | (skip) |
| +0xA4 | left_over_faces | left_over_faces_array | cTexture4 | pntr_to_vert_loc | indices_offsets | offOffVerts |
| +0xB0 | vertex_indices_count | vertex_indices_count_array | IndexCounterArray | array3 | counters | (skip) |
| +0xBC | mdx_offsets | mdx_offsets_array | IndexLocationArray | (backpatch only) | (not modeled) | offOffVerts |
| +0xC8 | index_buffer_pools | index_buffer_pools_array | MeshInvertedCounterArray | inv_count | (not modeled) | (skip) |
Note: mdledit’s identification of +0x98/+0xA4 as texture name slots is incorrect for KotOR. In NWN, the mesh header has 4 texture name slots (64 bytes each) at this region. KotOR reduced to 2 texture names (32 bytes each at +0x58/+0x78) and repurposed the remaining space as CExoArrayList headers. The CExoArrayLists are always empty (all zeros) in vanilla KotOR, so mdledit’s string-based read/write produces byte-identical results.
8. MDL vs BWM Adjacency Encoding
A critical distinction for anyone working with both formats:
| Property | MDL Face Adjacency | BWM Walkmesh Adjacency |
|---|---|---|
| Storage | u16 per edge | i32 per edge |
| Encoding | Plain face index | face_index * 3 + edge_index |
| No-neighbor | 0xFFFF | -1 (0xFFFFFFFF) |
| Purpose | GL rendering hints | Pathfinding / collision |
BWM’s edge-encoded adjacency tells you not just WHICH face is adjacent, but WHICH EDGE of that face connects – needed for the pathfinding walk algorithm. MDL only needs to know which face, not which edge.
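A small sketch of the BWM edge encoding from the table. The function names are hypothetical; the arithmetic is just face_index * 3 + edge_index with -1 as the no-neighbor value.

```rust
/// Encodes a BWM adjacency entry: Some((face_index, edge_index)) becomes
/// face_index * 3 + edge_index, and None becomes -1.
pub fn encode_bwm_adjacency(neighbor: Option<(u32, u8)>) -> i32 {
    match neighbor {
        Some((face, edge)) => {
            debug_assert!(edge < 3, "a triangle only has edges 0, 1 and 2");
            (face as i32) * 3 + i32::from(edge)
        }
        None => -1,
    }
}

/// Decodes a BWM adjacency entry back into (face_index, edge_index).
/// Any negative value is treated as "no neighbor" in this sketch.
pub fn decode_bwm_adjacency(raw: i32) -> Option<(u32, u8)> {
    if raw < 0 {
        None
    } else {
        Some(((raw / 3) as u32, (raw % 3) as u8))
    }
}
```

An MDL-style entry would keep only the face half of the decoded pair, using 0xFFFF when the result is `None`.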
9. Write-Order Dependencies
When writing a mesh node, fields must be emitted in a specific order because
some fields are content-relative pointers that must be backpatched. The
canonical order (from mdledit binarywrite.cpp) is:
1. Face array (32 bytes per face)
2. vertex_indices_count data (single u32: face_count * 3)
3. Content vertex positions (12 bytes per vertex, only for MDL content blob)
4. mdx_offsets data (single u32: placeholder, backpatched)
5. index_buffer_pools data (single u32: inverted counter value)
6. Packed u16 vertex indices (face_count * 3 u16 values)
After step 6, backpatch the mdx_offsets pointer to point to the start of
step 6’s data.
CExoArrayList headers at +0x98..+0xC8 are written as part of the mesh extra header (332 bytes), with pointer values backpatched after the data is written.
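The placeholder-then-backpatch pattern for this write order can be sketched with a plain byte buffer. Everything here is illustrative: the type and function names are hypothetical, little-endian encoding is assumed, face records are passed in pre-serialized, and offsets are relative to the start of this buffer rather than to a real MDL content blob.

```rust
/// Output of the sketch: the emitted bytes plus where the mdx_offsets value sits.
pub struct MeshContent {
    pub bytes: Vec<u8>,
    pub mdx_offsets_slot: usize,
}

/// Emits the six data blocks in the canonical order and backpatches the
/// mdx_offsets placeholder once the packed index data position is known.
pub fn write_mesh_content(
    face_records: &[[u8; 32]],
    positions: &[[f32; 3]],
    inverted_counter: u32,
    indices: &[[u16; 3]],
) -> MeshContent {
    let mut bytes = Vec::new();
    // 1. Face array (32 bytes per face).
    for rec in face_records {
        bytes.extend_from_slice(rec);
    }
    // 2. vertex_indices_count data: face_count * 3.
    bytes.extend_from_slice(&((indices.len() * 3) as u32).to_le_bytes());
    // 3. Content vertex positions (12 bytes per vertex).
    for p in positions {
        for c in p {
            bytes.extend_from_slice(&c.to_le_bytes());
        }
    }
    // 4. mdx_offsets data: write a placeholder and remember where it is.
    let mdx_offsets_slot = bytes.len();
    bytes.extend_from_slice(&0u32.to_le_bytes());
    // 5. index_buffer_pools data: the inverted counter value.
    bytes.extend_from_slice(&inverted_counter.to_le_bytes());
    // 6. Packed u16 vertex indices.
    let indices_start = bytes.len() as u32;
    for tri in indices {
        for i in tri {
            bytes.extend_from_slice(&i.to_le_bytes());
        }
    }
    // Backpatch: point the placeholder at the start of step 6's data.
    bytes[mdx_offsets_slot..mdx_offsets_slot + 4].copy_from_slice(&indices_start.to_le_bytes());
    MeshContent { bytes, mdx_offsets_slot }
}
```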
Texture Formats
KOTOR handles graphics via multiple tailored texture formats. It uses hardware-accelerated DXT compression techniques natively supported by its OpenGL backend.
Implementation Blueprints
This section details the primary texture architectures parsed natively by rakata-formats.
| Format | Name | Layout & Purpose |
|---|---|---|
| TPC | Texture Pack Compressed | A proprietary BioWare wrapper around native DXT-compressed OpenGL texture data. This is the primary format used for all base-game environment and character textures. |
| DDS | DirectDraw Surface | A proprietary BioWare variation of the standard Microsoft DDS format. Rather than utilizing standard headers, the legacy engine requires a bespoke 20-byte magic wrapper. |
| TGA | Truevision Targa | An uncompressed, lossless visual format. Used for rendering crisp UI elements, visual effects (VFX), etc. |
| TXI | Texture Extensions | Plaintext routing files that accompany primary textures. They direct the engine how to apply advanced rendering hints, such as procedural animations or bump-mapping. |
TPC (Texture Pack Compressed)
TPC is the proprietary bundled texture format created by BioWare. It contains the raw DXT-compressed texture data, pre-computed mipmaps, and potentially appended TXI configuration data all in one blob.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .tpc |
| Magic Signature | None |
| Type | Compressed Texture Pack |
| Rust Reference | View rakata_formats::Tpc in Rustdocs |
Data Model Structure
The rakata-formats crate provides a formally mapped Tpc container that completely shields you from managing pixel type bitmasks.
- Pixel Enum Decoding: Instead of raw integer flag codes, calling known_pixel_format() instantly resolves the byte code into a robust TpcHeaderPixelFormat enumeration (e.g., Dxt1, Dxt5, Rgb, Greyscale).
- Footer Management: Trailing TXI text is seamlessly maintained, and can be cleanly updated via .set_txi_text_strict().
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for TPC textures mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CAuroraProcessedTexture::ReadProcessedTextureHeader at 0x0070f590.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Format Byte Mapping | The single header format byte acts as a strict bitmask. The engine explicitly checks bit0, bit1, and bit2 to generate internal format codes: 1, 3, and 4. |
| Compression Dispatch | The runtime ignores other variants. It strictly requires Code 3 to process 8-byte blocks (standard S3TC DXT1) or Code 4 to process 16-byte blocks (standard S3TC DXT5). |
| Mipmap Calculations | Rather than parsing explicit counts, the engine calculates mipmap storage dimensions by right-shifting the base dimensions for each level without clamping the result to 1. Because of this, very deep mip levels can compute to 0 bytes! |
| OpenGL Hardware Binding | When aggressively pushing the TPC bytes into OpenGL video memory, the engine natively maps Code 3 directly to OpenGL constant 0x83F0 (DXT1) and Code 4 straight to 0x83F3 (DXT5). Technically, there is zero branching logic to support native DXT3 (0x83F2) inside the vanilla engine’s parser. |
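A sketch of how the unclamped shift leads to zero-byte mip levels. The function names are hypothetical, and the block-count arithmetic (dimensions rounded up to whole 4x4 blocks) is an assumption about the sizing formula; only the block sizes, the missing clamp, and the two OpenGL constants come from the table above.

```rust
/// Bytes per 4x4 block for the internal format codes the engine dispatches on.
pub fn block_bytes(code: u8) -> Option<u64> {
    match code {
        3 => Some(8),  // DXT1
        4 => Some(16), // DXT5
        _ => None,
    }
}

/// OpenGL internal format constant the engine binds for each code.
pub fn gl_internal_format(code: u8) -> Option<u32> {
    match code {
        3 => Some(0x83F0), // DXT1
        4 => Some(0x83F3), // DXT5
        _ => None,         // no DXT3 (0x83F2) branch exists
    }
}

/// Size of one mip level with NO clamp to 1 on the shifted dimensions.
/// Assumption: dimensions are rounded up to whole 4x4 blocks.
pub fn mip_level_bytes(width: u32, height: u32, level: u32, code: u8) -> Option<u64> {
    let w = width.checked_shr(level).unwrap_or(0);
    let h = height.checked_shr(level).unwrap_or(0);
    let blocks_w = u64::from(w / 4 + u32::from(w % 4 != 0));
    let blocks_h = u64::from(h / 4 + u32::from(h % 4 != 0));
    Some(blocks_w * blocks_h * block_bytes(code)?)
}
```

For an 8x8 DXT1 texture, level 3 is still one 8-byte block, but level 4 shifts both dimensions to 0 and the computed size collapses to 0 bytes. A writer that clamps to 1 would emit 8 bytes there instead.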
DDS (DirectDraw Surface)
The .dds extension in KOTOR does not represent a standard Microsoft DirectDraw Surface file. Instead, the engine strictly expects a proprietary format consisting of a bespoke 20-byte configuration prefix followed by raw DXT compression blocks. The vanilla parsing logic completely ignores standard 124-byte DDS magic headers.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .dds |
| Magic Signature | None (Proprietary 20-Byte Prefix) |
| Type | BioWare DirectDraw Wrapper |
| Rust Reference | View rakata_formats::Dds in Rustdocs |
Data Model Structure
rakata-formats is built to natively parse both standard Microsoft DDS architecture and KotOR’s proprietary CResDDS format transparently. When evaluating a .dds file via rakata_formats::Dds:
- Bilateral Read Path: If the file begins with the standard Microsoft DDS magic bytes, Rakata leverages a standard pipeline to extract the payload. If those magic bytes are missing, Rakata immediately pivots and parses the data natively as a proprietary K1 CResDDS 20-byte payload.
- Strict Serialization: Regardless of which variation is ingested from the disk, Rakata will strictly emit valid 20-byte KotOR-compliant payloads during binary serialization.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for DDS textures mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CResDDS::GetDDSAttrib at 0x00710ee0.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Prefix Stripping | The engine’s parser explicitly expects and strips a proprietary 20-byte magic header wrapper prepended to the DDS buffer: width (+0x00), height (+0x04), byte code (+0x08), base-size (+0x0C), and an alpha_mean FLOAT (+0x10). |
| Block Calculation | The runtime mirrors the TPC logic for memory block sizing. The block size in bytes is computed via the formula (pixel_type == 4) * 8 + 8: Code 3 evaluates to 8-byte texture blocks, while Code 4 evaluates to 16-byte blocks. |
Tip
Reserved Gaps: The bytes spanning +0x09 to +0x0B in the header prefix are entirely ignored by the GetDDSAttrib read path. We preserve them strictly for round-trip fidelity.
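The 20-byte prefix layout can be sketched as a small parser. The `DdsPrefix` struct and function names are hypothetical and are not the rakata_formats::Dds API; little-endian field encoding is an assumption, as is reading the byte code as a single byte at +0x08 with the three reserved bytes after it.

```rust
/// Hypothetical view of the proprietary 20-byte prefix.
#[derive(Debug, PartialEq)]
pub struct DdsPrefix {
    pub width: u32,        // +0x00
    pub height: u32,       // +0x04
    pub pixel_code: u8,    // +0x08
    pub reserved: [u8; 3], // +0x09..=+0x0B, ignored by the engine, kept for roundtrip
    pub base_size: u32,    // +0x0C
    pub alpha_mean: f32,   // +0x10
}

fn u32_at(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the prefix, or returns None if fewer than 20 bytes are available.
pub fn parse_prefix(data: &[u8]) -> Option<DdsPrefix> {
    if data.len() < 20 {
        return None;
    }
    Some(DdsPrefix {
        width: u32_at(data, 0x00)?,
        height: u32_at(data, 0x04)?,
        pixel_code: data[0x08],
        reserved: [data[0x09], data[0x0A], data[0x0B]],
        base_size: u32_at(data, 0x0C)?,
        alpha_mean: f32::from_bits(u32_at(data, 0x10)?),
    })
}

/// Block size in bytes: (pixel_type == 4) * 8 + 8.
pub fn block_size(pixel_code: u8) -> u32 {
    u32::from(pixel_code == 4) * 8 + 8
}
```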
TGA (Truevision Targa)
TGA is the standard uncompressed image format utilized by the engine, typically reserved for UI elements, icons, or high-fidelity models that demand lossless alpha channels.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .tga |
| Magic Signature | Truevision Standard |
| Type | Uncompressed RGB/A Raster |
| Rust Reference | View rakata_formats::Tga in Rustdocs |
Data Model Structure
rakata-formats natively emulates the engine’s parsing logic. When evaluating a .tga file, Rakata ignores non-essential Truevision header flags (such as image_type and id_len) and strictly validates the payload against the engine’s natively supported pixel_depth thresholds.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for TGA textures mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from ImageReadTGAHeader at 0x0045e2e0.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Header Stripping | Function: ImageReadTGAHeader (0x0045e2e0). The native engine parser is exceptionally loose. Standard Truevision fields such as image_type (offset +0x02), image_descriptor (offset +0x11 governing the origin bit), and the id_len field are completely ignored and never validated during a read sequence. |
| Depth Validation | Function: ImageReadTGAHeader (0x0045e2e0). The sole structural validation check performed before memory allocation dictates that the pixel_depth must strictly equal 8, 24, or 32. Any other depth integer triggers an immediate process failure. |
| Write Generation | Function: ImageWriteTGA. The engine’s in-memory rasterization is strictly top-left, but its canonical on-disk .tga format is entirely bottom-left. When saving screenshot files or extracting buffers to disk, the engine forcefully accommodates this by hardcoding image_type=2, id_len=0, and image_descriptor=0, explicitly triggering an ImageFlipY vertical inversion on the memory payload before pushing the image to disk. |
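The depth gate is simple enough to sketch directly. The function name and error strings are hypothetical; the sketch assumes the standard 18-byte Truevision header, where the pixel depth byte sits at offset 0x10, and deliberately checks nothing else to mirror the loose read path described above.

```rust
const TGA_HEADER_LEN: usize = 18;
const PIXEL_DEPTH_OFFSET: usize = 0x10;

/// Accepts a TGA header only if pixel_depth is 8, 24, or 32.
/// image_type, id_len and image_descriptor are intentionally not inspected.
pub fn validate_tga_depth(header: &[u8]) -> Result<u8, String> {
    if header.len() < TGA_HEADER_LEN {
        return Err(format!("header too short: {} bytes", header.len()));
    }
    let depth = header[PIXEL_DEPTH_OFFSET];
    if depth == 8 || depth == 24 || depth == 32 {
        Ok(depth)
    } else {
        Err(format!("unsupported pixel depth: {}", depth))
    }
}
```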
TXI (Texture Extensions)
TXI files (or TPC appended arrays) are highly forgiving plain-text metadata blocks applied adjacent to graphical files to enforce custom mipmap, bumpmap, or animation shaders.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .txi |
| Magic Signature | None |
| Type | ASCII Configuration Strings |
| Rust Reference | View rakata_formats::Txi in Rustdocs |
Data Model Structure
rakata-formats inherently pairs TXI payload access alongside its target texture. When querying the virtual resolver, textures are natively returned as a combined TextureWithTxiResult object. This architecture guarantees that the raw graphic bytes and their exact applied TXI rule block are inextricably tracked as a coupled pair throughout the virtual environment.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for TXI text configurations mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CAurTextureBasic::ParseField at 0x00422390.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Invalid Commands | Function: CAurTextureBasic::ParseField (0x00422390). Unknown or unsupported TXI commands are safely bypassed. If the parsed string evaluation fails to match an explicit configuration branch, the subroutine immediately exits without throwing any logger alarms or terminating texture load. |
| Case Agnosticism | Function: CAurTextureBasic::ParseField (0x00422390). Field matching acts strictly case-insensitive (e.g. cMgTxi == cmgtxi). |
| Line Normalization | Function: CAurTextureBasic::ParseField (0x00422390). The native internal engine scanner searches exclusively for LF (\n) bounds. However, if the read targets an active disk file, the underlying standard C fgets call automatically handles CRLF normalization before handing strings to the regex evaluator. |
| Boolean Parsing | Function: Parse_bool (0x00463680). The native Parse_bool validation explicitly performs lowercase scans evaluating against exact variants of "true", "false", "1", or "0". |
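The boolean-flag handling documented for ParseField and Parse_bool in this section can be approximated as follows. This is a behavioral sketch, not a port of the decompiled code: both function names are hypothetical, and treating the argument comparison as case-insensitive is an interpretation of the "lowercase scans" wording.

```rust
/// Approximates Parse_bool: looks at the first whitespace-delimited token and
/// accepts only "true", "false", "1" or "0" (compared in lowercase).
pub fn parse_bool(arg: &str) -> Option<bool> {
    let token = arg.split_whitespace().next().unwrap_or("");
    match token.to_ascii_lowercase().as_str() {
        "true" | "1" => Some(true),
        "false" | "0" => Some(false),
        _ => None,
    }
}

/// Applies one TXI line to a boolean field. The first word must equal `key`
/// (case-insensitively); otherwise, or if no valid argument follows, the
/// current value is returned unchanged.
pub fn apply_bool_field(line: &str, key: &str, current: bool) -> bool {
    let line = line.trim_start();
    let (word, rest) = match line.find(char::is_whitespace) {
        Some(i) => line.split_at(i),
        None => (line, ""),
    };
    if !word.eq_ignore_ascii_case(key) {
        return current; // e.g. "decal1" never matches "decal"
    }
    parse_bool(rest).unwrap_or(current)
}
```

With this model, "decal 1" sets the flag, "decal1" is ignored because the merged word matches no key, and a bare "decal" leaves the previous value in place.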
Note
Boolean Parsing Nuance: Modding documentation often warns against specific formats or keywords (like decal). Decompilation reveals the universal behavior applied to all boolean flags:
- Missing Space: Keys merged with their arguments (e.g. "decal1", "mipmap0") silently abort. The firstword() extractor pulls the merged string, which matches no entry in the list of known keys.
- Separated Numbers: Space-separated numbers (e.g. "decal 1") are completely valid. firstword() pulls "decal" and hands " 1" off to Parse_bool(). An sscanf strips the whitespace and evaluates "1" to true.
- Argument-less Flags: Passing just a flag ("decal") triggers the branch, but Parse_bool finds no argument. It fails to match "true", "false", "1", or "0", and silently leaves the boolean at its previous value.
Text & Data Formats
KOTOR heavily relies on structured text and data layouts to manage everything from stat numbers to map meshes. Engine-native evidence for these varied structures (2DA, TLK, VIS, LYT, LTR) is documented below.
Implementation Blueprints
| Specification | Core Focus |
|---|---|
| 2DA (2D Array) | Binary/text relational database format managing core engine rules, constants, and stats. |
| TLK (Talk Table) | Centralized localized string dictionary managing all in-game dialogue and UI text. |
| VIS (Visibility Graph) | ASCII topology mapping the render-culling relationships between area geometry rooms. |
| LYT (Layout File) | ASCII configuration defining spatial positioning and linking of a module’s room geometry. |
| LTR (Letter Frequency) | Character-frequency matrices supporting the in-game random name generator algorithms. |
2DA (2D Array)
2DAs are data tables defining the engine’s core rules and constraints (such as item costs and Force powers, which the engine internally stores as spells.2da). They bridge the gap between human-readable text for modding and fast-loading binaries for the final game.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .2da |
| Magic Signature | 2DA / V2.b (Binary) or V2.0 (Text) |
| Type | Tabular Data |
| Rust Reference | View rakata_formats::TwoDa in Rustdocs |
Data Model Structure
The rakata-formats crate parses 2DAs so that binary and text formats look identical to the rest of the application. The TwoDa container lets developers simply retrieve cells using twoda.cell(row, "Label"), completely hiding the inner offset calculations and padding differences between text and binary structures.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for 2D Arrays mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from C2DA::Load2DArray at 0x004143b0.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Magic/Version Gate | The engine first checks for the "2DA " signature. It then branches down a binary parsing path for "V2.b" or a text parsing path for "V2.0". Any other version string triggers an instant load failure. |
| Binary Load (V2.b) | The parser starts with an 8-byte skip into the file (data_ptr = raw_data_ptr + 8), jumping right past the header to the starting newline character. Column headers are a tab-separated, null-terminated block. The cell offsets are then parsed as an array of u16 integers (rows × cols) in row-major order. |
| Text Load (V2.0) | The text parser strips whitespace and newlines, specifically hunting for "DEFAULT:" or "DEFAULT" blocks. When parsing individual cells, the literal text "****" is converted into an empty string "" to signal the fallback rule. Finally, it runs _strlwr on all column headers to immediately convert them to lowercase. |
Tip
Orphaned Size Field: In binary row blocks, the 2-byte cell_data_size u16 is completely bypassed. The engine skips it with +2 and performs no reading or validation.
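The version gate and the two text-side normalizations can be sketched in a few lines. The enum and function names are hypothetical and are not the rakata_formats::TwoDa API; ASCII lowercasing stands in for _strlwr, which is an assumption about locale behavior.

```rust
/// Which parser branch the header selects.
#[derive(Debug, PartialEq, Eq)]
pub enum TwoDaKind {
    Binary, // "V2.b"
    Text,   // "V2.0"
}

/// Mirrors the magic/version gate: "2DA " followed by "V2.b" or "V2.0".
/// Any other signature or version yields None (load failure).
pub fn detect_kind(data: &[u8]) -> Option<TwoDaKind> {
    if data.get(0..4)? != b"2DA " {
        return None;
    }
    let version = data.get(4..8)?;
    if version == b"V2.b" {
        Some(TwoDaKind::Binary)
    } else if version == b"V2.0" {
        Some(TwoDaKind::Text)
    } else {
        None
    }
}

/// Text cells: the literal "****" becomes an empty string.
pub fn normalize_text_cell(raw: &str) -> String {
    if raw == "****" {
        String::new()
    } else {
        raw.to_string()
    }
}

/// Column headers are lowercased on text load (ASCII-only in this sketch).
pub fn normalize_header(label: &str) -> String {
    label.to_ascii_lowercase()
}
```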
TLK (Talk Table)
The Talk Table is a massive localized string repository. Every item description, line of dialogue, and UI text in KOTOR references an index (a StrRef) pointing into this master dictionary file.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .tlk |
| Magic Signature | TLK / V3.0 |
| Type | Localized String Bundle |
| Rust Reference | View rakata_formats::Tlk in Rustdocs |
Data Model Structure
The entire Talk Table format maps to the rakata_formats::Tlk struct. Each entry fuses the separated audio and text flags into a single TlkEntry. The struct safely handles missing text flags natively, preventing out-of-bounds string lookups if an entry contains audio parameters but no valid string text offset.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for Talk Tables mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CTlkFile::ReadHeader at 0x0041d890 and CTlkFile::AddFile.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Magic Check | Function: CTlkFile::ReadHeader (0x0041d890). The parser requires a "TLK " signature. However, strict version validation is entirely absent. The engine accepts essentially any version tag without raising a failure. |
| Size Dispatching | Function: CTlkFile::ReadHeader (0x0041d890). While the version isn’t used for rejection, it dynamically determines memory block sizing. A "V3.0" tag dictates 40 bytes (0x28) per entry, whereas any other version tag automatically falls back to 36 bytes (0x24). |
| Feminine Dialects | Function: CTlkFile::AddFile. When mounting the primary archive, the engine systematically queries the directory for a secondary <basename>F.tlk (e.g., dialogF.tlk) specifically to supply overriding feminine vocabulary strings for character-gendered text queries. |
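As a rough sketch of the two behaviors in the table above, the entry-size dispatch and the feminine file name can each be expressed as a one-line function. Both function names are hypothetical and exist only for this illustration.

```rust
/// Entry stride selected by the version tag, per the Size Dispatching row:
/// "V3.0" means 40 bytes (0x28), anything else falls back to 36 (0x24).
pub fn tlk_entry_size(version: &[u8; 4]) -> usize {
    if version == b"V3.0" {
        0x28
    } else {
        0x24
    }
}

/// Name of the secondary feminine table probed next to the primary one,
/// e.g. "dialog" -> "dialogF.tlk".
pub fn feminine_tlk_name(basename: &str) -> String {
    format!("{}F.tlk", basename)
}
```

Note that the fallback branch accepts any tag at all, which matches the documented absence of version rejection.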
VIS (Visibility Graph)
VIS is an ASCII graph structure used extensively by the rendering engine to calculate occlusion culling. It plots mathematical relationships defining which room meshes are visible from any given observer room.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .vis |
| Magic Signature | None |
| Type | Room Graph |
| Rust Reference | View rakata_formats::Vis in Rustdocs |
Data Model Structure
The rakata-formats crate parses raw VIS text blocks into a strongly typed Vis structure. Rather than storing flat arrays of strings, Vis models room visibility as an adjacency list using BTreeMap<String, BTreeSet<String>>. This structural choice guarantees deterministic lookups while automatically mimicking the engine’s internal deduplication algorithms.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for Visibility graphs mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from Scene::LoadVisibility at 0x004568d0.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Text Loading | Function: Scene::LoadVisibility (0x004568d0). The .vis file is processed purely as raw text. The engine continuously extracts observer and child string pairs by looping AurResGetNextLine() over the file buffer. |
| Silent Forgiveness | Function: Scene::LoadVisibility (0x004568d0). If the parser extracts a room reference (either observer or child) that does not exist in the active area layout (which it verifies via a FindRoom call), the visibility entry is quietly dropped without crashing or generating logs. |
| Bidirectional Application | Function: Scene::SetVisibility. Calling SetVisibility(room_a, room_b, 1) records visibility in both directions. The function inserts room_b into room_a’s visibility list, and immediately mirrors this by adding room_a to room_b’s list while executing native deduplication. |
| Write Generation | Function: Scene::SaveVisibility. When generating a .vis file natively, the engine relies on an _sscanf block structure mapping to "%s%d" and uniformly pads a dual-space indent onto all child elements beneath observer headers. |
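The bidirectional, deduplicating behavior of SetVisibility maps naturally onto the BTreeMap<String, BTreeSet<String>> shape described earlier. The following is a minimal sketch under that assumption; VisGraph and its methods are hypothetical names, not the actual rakata_formats::Vis API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy adjacency list: observer room -> set of rooms visible from it.
#[derive(Default)]
pub struct VisGraph {
    rooms: BTreeMap<String, BTreeSet<String>>,
}

impl VisGraph {
    /// Records the pair in both directions. The set insert makes
    /// repeated calls a no-op, which stands in for the deduplication.
    pub fn set_visible(&mut self, a: &str, b: &str) {
        self.rooms.entry(a.to_owned()).or_default().insert(b.to_owned());
        self.rooms.entry(b.to_owned()).or_default().insert(a.to_owned());
    }

    /// True if `target` is listed under `observer`.
    pub fn sees(&self, observer: &str, target: &str) -> bool {
        self.rooms
            .get(observer)
            .map_or(false, |set| set.contains(target))
    }

    /// Number of distinct rooms visible from `observer`.
    pub fn visible_count(&self, observer: &str) -> usize {
        self.rooms.get(observer).map_or(0, |set| set.len())
    }
}
```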
LYT (Layout File)
LYT files are ASCII configuration arrays that define the spatial 3D placement and orientation of independent room models to construct a complete area map.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .lyt |
| Magic Signature | None |
| Type | Plain Text Layout |
| Rust Reference | View rakata_formats::Lyt in Rustdocs |
Data Model Structure
The rakata-formats crate parses LYT files into the strongly-typed Lyt container. The parser segregates the raw nested lines into distinct rooms, tracks, obstacles, and doorhooks collections, natively mapping coordinate strings into engine-standard Vec3 and Quaternion structs for immediate mathematical interoperability.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for Layout configurations mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CLYT::LoadLayout at 0x005de900.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Newline Bounds | The parser heavily expects explicit \r\n (CRLF) endings. Scanning extracts target strings utilizing _sscanf("%[^\r\n]", ...) patterns and frequently relies on blind +2 byte pointer leaps to manually clear the terminators. |
| Preamble Skipping | All file lines existing prior to the beginlayout execution marker (such as the ubiquitous #MAXLAYOUT ASCII header) are deliberately skipped and ignored. |
| Sequential Parsing | The structure mandates rigid sequential ingestion. Data collections must appear in the file in exactly this order: roomcount → trackcount → obstaclecount → doorhookcount → donelayout. |
Warning
Boundary Oversight: While the engine systematically verifies donelayout boundaries separating the primary collections, the underlying parse loop functionally neglects to verify the final donelayout signature upon closing the doorhooks segment.
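A tool-side check for the preamble and ordering rules above might look like the sketch below. It is not a reconstruction of the engine's _sscanf loop: the function name is hypothetical, it assumes all four section headers are present, and it only inspects the first token of each line after beginlayout.

```rust
/// Section headers in the order the engine expects them.
static SECTION_ORDER: [&str; 4] =
    ["roomcount", "trackcount", "obstaclecount", "doorhookcount"];

/// Returns true if every section header appears once, in engine order,
/// after the `beginlayout` marker. Preamble lines are skipped.
pub fn sections_in_engine_order(text: &str) -> bool {
    let mut expected = SECTION_ORDER.iter().copied();
    let body = text
        .lines() // splits on \n and drops a trailing \r
        .map(str::trim)
        .skip_while(|line| *line != "beginlayout")
        .skip(1); // drop the marker itself
    for line in body {
        let keyword = line.split_whitespace().next().unwrap_or("");
        // Only section headers advance the expected sequence.
        if SECTION_ORDER.contains(&keyword) && expected.next() != Some(keyword) {
            return false;
        }
    }
    // All four headers must have been consumed.
    expected.next().is_none()
}
```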
LTR (Letter Frequency)
LTR files contain matrices defining the probabilistic sequence groupings of letters used by the engine’s random name generator.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .ltr |
| Magic Signature | LTR / V1.0 |
| Type | Naming State Matrix |
| Rust Reference | View rakata_formats::Ltr in Rustdocs |
Data Model Structure
The rakata-formats crate maps character frequency architectures directly into the strongly-typed Ltr container, safely abstracting away the fallible raw string-parsing logic for downstream implementations.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for Letter Frequency structures mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CResLTR::OnResourceServiced at 0x00712410.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Magic Validation | The parser requires the "LTR " signature and strictly validates the "V1.0" version tag. Together with the single-byte letter_count at offset +0x08, these form a rigid 9-byte header block. |
| Contiguous Ingestion | Reading of the payload begins immediately at offset +0x09. The parser sequentially extracts the chained arrays grouping the start, middle, and end blocks, which map onto the procedural probability matrices. |
| Payload Bounds Check | After the final read, the loader verifies that the terminal parsing offset exactly matches the buffer’s total byte length. |
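The 9-byte header described above is small enough to validate directly. This is a minimal sketch: LtrHeader and read_ltr_header are hypothetical names, and the payload itself (the start, middle, and end blocks) is deliberately left unparsed.

```rust
/// Result of validating the 9-byte LTR header (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct LtrHeader {
    /// Single byte stored at offset +0x08.
    pub letter_count: u8,
    /// Offset where the probability payload begins (+0x09).
    pub payload_offset: usize,
}

/// Returns None if the buffer is too short or the signature/version
/// do not match "LTR " and "V1.0".
pub fn read_ltr_header(data: &[u8]) -> Option<LtrHeader> {
    let header = data.get(..9)?;
    if &header[..4] != b"LTR " || &header[4..8] != b"V1.0" {
        return None;
    }
    Some(LtrHeader {
        letter_count: header[8],
        payload_offset: 9,
    })
}
```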
Audio Formats
KOTOR handles audio via specialized implementations of the Miles Sound System, utilizing specific prefix wrappers for streaming dialogue, sound effects, and lip-syncing animations.
Implementation Blueprints
| Specification | Core Focus |
|---|---|
| WAV (Waveform Audio) | Modified audio streams typically utilizing a proprietary Miles Sound System prefix wrapper. |
| LIP (Lip Synching) | Timed phonetic animation sequence data mapped explicitly to character speech tracks. |
| SSF (Sound Set File) | Mapping configuration assigning specific audio events to standard creature interaction triggers (e.g., attacking or dying). |
WAV (Waveform Audio)
While standard RIFF WAV files are supported, KOTOR utilizes a multi-tiered routing structure to evaluate audio buffers dynamically based on whether the file encapsulates voice-overs (VO), ambient sound effects (SFX), or unmodified bytes.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .wav |
| Magic Signature | RIFF |
| Type | Streamed / Buffered Audio |
| Rust Reference | View rakata_formats::Wav in Rustdocs |
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for audio formats mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoSoundInternal::LoadSoundProvider at 0x005d9140.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Standard Audio (WAV) | If the payload begins with the exact "RIFF" 4-byte signature and is evaluated as a non-MP3 track, the parser starts at offset 0 and passes the contiguous buffer to the Miles Sound System unmodified. |
| Ambient Audio (SFX) | When evaluated as an SFX structure, the "RIFF" signature is deliberately absent from offset 0. The engine interprets a custom proprietary prefix that displaces the standard "RIFF" block exactly 470 bytes into the payload buffer (+0x01d6). It calculates size = file_size - 0x1d6 and extracts exactly that sub-segment. |
| Voice Audio (VO) | For streaming voice-over tracks, the .wav wrapper does begin with a "RIFF" tag. However, the check riff_size + 8 < file_size succeeds, meaning extra data follows the declared RIFF chunk. The engine seeks to byte offset riff_size + 8 and pipes the remaining data as a literal .mp3 stream. |
| Delegation Hand-off | The main executable natively acts as a dispatch router, executing almost zero internal chunk structural parsing routines. Total specialization for deep RIFF chunk deserialization is deferred unconditionally to the external Miles Sound System layer. |
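The three routes in the table can be approximated with a small classifier. This sketch makes assumptions the table does not spell out: it treats "no RIFF at offset 0 but RIFF at 0x1d6" as the SFX case, and it uses the riff_size + 8 < file_size comparison alone to separate VO from standard audio. AudioRoute and route_wav are hypothetical names.

```rust
/// Which slice of the file would be handed to the sound system.
#[derive(Debug, PartialEq, Eq)]
pub enum AudioRoute<'a> {
    /// Plain RIFF file, passed through untouched.
    Standard(&'a [u8]),
    /// RIFF block found behind the 470-byte prefix.
    Sfx(&'a [u8]),
    /// Bytes following the declared RIFF chunk, treated as MP3.
    VoiceMp3(&'a [u8]),
}

/// Length of the proprietary SFX prefix (+0x01d6).
const SFX_PREFIX_LEN: usize = 0x1d6;

pub fn route_wav<'a>(data: &'a [u8]) -> Option<AudioRoute<'a>> {
    if data.len() >= 8 && data.starts_with(b"RIFF") {
        let riff_size = u32::from_le_bytes([data[4], data[5], data[6], data[7]]) as usize;
        let tail_start = riff_size.saturating_add(8);
        // Data beyond the declared chunk marks the VO wrapper.
        if tail_start < data.len() {
            return Some(AudioRoute::VoiceMp3(&data[tail_start..]));
        }
        return Some(AudioRoute::Standard(data));
    }
    // Assumed SFX detection: RIFF displaced by the fixed prefix.
    let inner = data.get(SFX_PREFIX_LEN..)?;
    if inner.starts_with(b"RIFF") {
        Some(AudioRoute::Sfx(inner))
    } else {
        None
    }
}
```

In every branch the returned slice is still opaque to the caller, mirroring the hand-off row: chunk parsing is left to the audio layer.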
LIP (Lip Synching)
LIP files provide keyframed facial morph data directly bound to audio streams, instructing character models how to physically animate their mouths to match speech.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .lip |
| Magic Signature | LIP V1.0 |
| Type | Facial Animation Keyframes |
| Rust Reference | View rakata_formats::Lip in Rustdocs |
Data Model Structure
The rakata-formats crate maps LIP binaries into the Lip structure. It extracts the raw 5-byte sequential keyframe array and cleanly projects it into a format that pairs each chronological float timestamp directly with its localized mouth shape.
Structural Layout
| Offset | Type | Description |
|---|---|---|
| 0x00 | CHAR[8] | Signature (LIP V1.0) |
| 0x08 | FLOAT | Animation Length |
| 0x0C | DWORD | Entry Count |
| 0x10 | Struct[] | Keyframe Array (5 bytes per entry) |
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for Lip Synching animations mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CLIP::LoadLip at 0x0070c590.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Zero-Copy Loading | The engine handles LIP files as completely flat structures. Instead of parsing the variables out individually, it simply verifies the "LIP V1.0" signature and pulls the animation length and entry count directly from offsets +0x08 and +0x0C. |
| Direct Array Assignment | The keyframes are packed into identical 5-byte chunks (a 4-byte float for the timestamp, and a 1-byte integer determining the mouth shape). Because of this flat layout, the engine never loops through the data to read it. It simply points its internal animations memory pointer perfectly to file offset +0x10 and natively runs the animation straight off the raw file buffer. |
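The flat layout above translates into a short reader. Unlike the engine, which points straight into the file buffer, this sketch copies the keyframes into owned structs. LipFile, LipKeyframe, and parse_lip are hypothetical names and not the rakata_formats::Lip API; little-endian encoding is assumed.

```rust
#[derive(Debug, PartialEq)]
pub struct LipKeyframe {
    /// 4-byte float timestamp.
    pub time: f32,
    /// 1-byte mouth shape index.
    pub shape: u8,
}

#[derive(Debug, PartialEq)]
pub struct LipFile {
    pub length: f32,
    pub keyframes: Vec<LipKeyframe>,
}

/// Reads a little-endian u32 at `off`, or None if out of range.
fn u32_at(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

pub fn parse_lip(data: &[u8]) -> Option<LipFile> {
    if data.get(..8)? != b"LIP V1.0" {
        return None;
    }
    let length = f32::from_bits(u32_at(data, 0x08)?);
    let count = u32_at(data, 0x0C)? as usize;
    // Keyframes start at +0x10, five bytes each.
    let end = 0x10usize.checked_add(count.checked_mul(5)?)?;
    let body = data.get(0x10..end)?;
    let keyframes = body
        .chunks_exact(5)
        .map(|c| LipKeyframe {
            time: f32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            shape: c[4],
        })
        .collect();
    Some(LipFile { length, keyframes })
}
```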
SSF (Sound Set File)
Sound sets map specific generic triggers (e.g. “Battle Cry”, “Agony”, “Selected”) to physical sound references by mapping enum hooks to strings.
At a Glance
| Property | Value |
|---|---|
| Extension(s) | .ssf |
| Magic Signature | None |
| Type | Enum-String Mapping |
| Rust Reference | View rakata_formats::Ssf in Rustdocs |
Data Model Structure
The rakata-formats crate maps SSF files into the Ssf structure. It parses the raw table offset and builds a collection of 28 nullable sound reference integers mapped directly back to their standard gameplay triggers.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for Sound Set mappings mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSoundSet::GetStrres at 0x00678820.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Finding the Table | The parser reads a single 4-byte integer (DWORD) at offset +0x08. This number acts as a direct distance pointer, telling the game explicitly where the audio mapping table begins inside the file payload. |
| Reading the Slots | Starting directly at that pointer, the engine grabs exactly 28 continuous integers. Each position in this span represents a hardcoded character action (e.g. slot 1 is always ‘Battle Cry’, slot 2 is always ‘Agony’). |
| Handling Blanks | Obviously, not all characters have recorded audio for every obscure trigger. If a sound slot is supposed to be empty, it utilizes the default sentinel value 0xFFFFFFFF (-1) to let the engine know to skip playback. |
Note
1-Indexed Triggers: When modders fire off audio events using gameplay scripts, the event identifiers are natively 1-indexed (1 to 28). To find the matching audio string underneath, the engine simply subtracts 1 behind the scenes to correctly navigate the literal 0-indexed array in memory.
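Putting the table pointer, the 28 slots, the 0xFFFFFFFF sentinel, and the 1-indexed lookup together gives the following sketch. The function names are hypothetical and little-endian encoding is assumed; this is an illustration, not the rakata_formats::Ssf API.

```rust
pub const SSF_SLOT_COUNT: usize = 28;
/// Sentinel marking a slot with no sound assigned.
const EMPTY_SLOT: u32 = 0xFFFF_FFFF;

/// Reads a little-endian u32 at `off`, or None if out of range.
fn u32_at(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Follows the DWORD at +0x08 to the table and reads 28 slots.
pub fn read_ssf_slots(data: &[u8]) -> Option<[Option<u32>; SSF_SLOT_COUNT]> {
    let table = u32_at(data, 0x08)? as usize;
    let mut slots = [None; SSF_SLOT_COUNT];
    for (i, slot) in slots.iter_mut().enumerate() {
        let raw = u32_at(data, table.checked_add(i * 4)?)?;
        *slot = if raw == EMPTY_SLOT { None } else { Some(raw) };
    }
    Some(slots)
}

/// Script-facing lookup: event ids run 1..=28, the array is 0-indexed.
pub fn strref_for_event(slots: &[Option<u32>; SSF_SLOT_COUNT], event_id: usize) -> Option<u32> {
    let index = event_id.checked_sub(1)?;
    slots.get(index).copied().flatten()
}
```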
Resource System & Resolution
The Odyssey Engine’s resource resolution dictates exactly how the game searches for files when it needs to render a texture, load a module, or mount a script – including the exact precedence logic when multiple mods attempt to overwrite the same asset.
TXI Sidecar Lookup
Texture Extensions (TXIs) are independent ascii text configurations used to override material instructions for specific graphics.
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for TXI sidecar files mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from AurResGet at 0x0044c740.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Global Callback | When the game needs a TXI file, it always routes through a global helper calling AurResGet(name, ".txi", ..., true). Three different rendering systems use this exact same path to hunt for TXIs: CAurTextureBasic::Init, Gob::EnableRenderBumpedOut, and Material::Init. |
| Total Independence | Because AurResGet only checks the raw filename and the .txi extension, it performs a totally fresh, global search through the game’s file systems. It does not know or care where the parent texture actually came from (like a specific BIF archive). |
Note
Because it is entirely independent from the parent texture handle, swkotor.exe supports pulling a TXI from the /override folder even if the parent texture was sourced natively from a KEY/BIF package. Rakata maintains this independent sidecar lookup model natively via the rakata_extract::resolver::TextureWithTxiResult logic to guarantee resolver parity.
Key/BIF Resolution Mapping
The engine has a strict hierarchical override order when hunting for identical overlapping resource identifiers across multiple virtual disk mounts.
Engine Audits & Decompilation
The following documents the engine’s exact resource directory search order mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoKeyTable::FindKey at 0x0040ec50.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| First-Match Exit | When hunting for a file, the key table loops through standard folders in a hardcoded order. The second it finds a matching file name, FindKey returns success and completely ignores any duplicates hiding deeper in other archives. |
| Duplicate Checking | During startup, the engine’s AddKey function actually scans for duplicates. If it finds one, it ignores it, permanently locking in the file that had the higher resolution priority. |
Tip
Resolution Priority: resource_directory (Override folder) → ERF (Pass 1) → RIM → ERF (Pass 2) → Fixed / Archive
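The first-match rule over that priority chain can be illustrated with a toy lookup. This is a simplified model, not the engine's key table: Source, SEARCH_ORDER, and find_key are hypothetical names, and each source is reduced to a plain list of resource names.

```rust
/// The five source tiers, named after the priority chain above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Override,
    ErfPass1,
    Rim,
    ErfPass2,
    Fixed,
}

/// Highest priority first.
pub static SEARCH_ORDER: [Source; 5] = [
    Source::Override,
    Source::ErfPass1,
    Source::Rim,
    Source::ErfPass2,
    Source::Fixed,
];

/// Returns the first tier (in priority order) that holds `name`.
/// Lower-priority duplicates are never consulted.
pub fn find_key(tables: &[(Source, Vec<&str>)], name: &str) -> Option<Source> {
    SEARCH_ORDER.iter().copied().find(|src| {
        tables
            .iter()
            .any(|(s, names)| s == src && names.contains(&name))
    })
}
```

The order in which the tables are supplied does not matter here; only SEARCH_ORDER decides which duplicate wins.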
Module Loading Priorities
Modules orchestrate KOTOR’s area hubs. They are layered collections of ERF/RIM files functioning as a localized state.
Composition Loading Precedence
Because KOTOR modules are often fragmented into multiple discrete archive files (e.g., separating rigid layouts from variable area dialog), the engine uses the following concrete precedence when constructing a single “virtual” module (the order below lists the highest priority target first).
1. <root>_dlg.erf (K2 Dialog overrides)
2. <root>_s.rim (Supplemental properties)
3. <root>_a.rim (Base Area) if present, ELSE <root>_adx.rim (Extended Area) if present, ELSE <root>.rim (Main/Vanilla)
4. <root>.mod (Single-file Mod archive)
Tip
Rust Integration: The rakata-extract crate natively replicates this exact priority order through the CompositeModule struct. When you pass a directory path to CompositeModule::load_from_directory, it automatically scans the folder and merges the _dlg, _s, _a/_adx, and base .mod files together using the engine’s strict precedence hierarchy.
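The precedence list above, including the either/or choice for the area archive, can be sketched without touching the filesystem. This is not the CompositeModule implementation: composite_parts is a hypothetical helper, and the exists callback stands in for whatever directory probe a real tool would use.

```rust
/// Returns the archive names that make up a virtual module,
/// highest priority first. `exists` reports whether a file is present.
pub fn composite_parts<F: Fn(&str) -> bool>(root: &str, exists: F) -> Vec<String> {
    let mut parts = Vec::new();

    // 1 and 2: dialog overrides, then supplemental properties.
    for suffix in ["_dlg.erf", "_s.rim"].iter() {
        let name = format!("{}{}", root, suffix);
        if exists(name.as_str()) {
            parts.push(name);
        }
    }

    // 3: exactly one area archive, first candidate that exists wins.
    let area = ["_a.rim", "_adx.rim", ".rim"]
        .iter()
        .map(|suffix| format!("{}{}", root, suffix))
        .find(|name| exists(name.as_str()));
    if let Some(name) = area {
        parts.push(name);
    }

    // 4: single-file .mod archive, lowest priority.
    let module = format!("{}.mod", root);
    if exists(module.as_str()) {
        parts.push(module);
    }
    parts
}
```

A caller could pass a closure over a directory listing, for example one that checks membership in a set of file names.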
Engine Audits & Decompilation
The following documents the engine’s exact load sequence for module assemblies mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CExoResMan::AsyncLoad at 0x004094a0.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Primary MOD Search | The game natively attempts to load the highest-level package by explicitly targeting the MODULES:<root>.mod path first. |
| RIM Fallback Chains | If the .mod file doesn’t exist, the system catches the failure and immediately shifts to look for the <root>_s.rim fallback. |
| Area Extension Probes | Throughout the module loading process, the engine actively probes for the <root>_a.rim and <root>_adx.rim extension files and merges in their physical area geometry. |
Save Game
While KotOR Save Games (.sav) are structurally just ERF containers under the hood, the engine employs complex party-synchronization and module loading logic to physically reconstruct the player’s session.
Data Model Structure
A standard Save ERF container packages a specific set of internal GUI and logic files that the game actively requires to reconstruct a valid player state.
savenfo.res
The overarching save metadata block, primarily responsible for the main menu UI.
- save_name: Display string for the save file.
- pc_name: (Optional in K1) Player character name.
- area_name: Localized display name for the area.
- last_module: The resref of the specific module being loaded.
- time_played: Running game time.
- cheat_used: Global boolean flag to mark corrupted/cheat sessions.
globalvars.res
The universal state trackers running the campaign plot.
- Segmented explicitly into numbers and booleans.
- Each global uses a strict Symbol Name.
partytable.res
The live snapshot of the physical adventuring group and global resources:
- Shared credits and shared party_xp.
- cheat_used: Independent table-specific cheat flag.
- members: Fixed list of party members tracking who is currently active and which member is flagged is_leader.
- journal_entries: Currently active quest plot_ids and their numeric stages.
Additional Constituents
- Character List: Populated using a mix of sources (leader, pc, and availnpc* resources).
- Inventory: Represents all items the player carries (inventory.res), tracking stack sizes, charges, and upgrade bitfields.
- Doors: Transient state (locked/open attributes) extracted directly from the module’s GIT.
Tip
Rust Integration: The rakata-save crate handles this structure natively via the SaveEditorModel struct. You can use it to directly parse, validate, and write back these internal save components without manually managing the ERF layer.
Engine Audits & Decompilation
The following documents the engine’s exact state restoration logic mapped from swkotor.exe.
(Decompilation logic for this section was audited and verified via native Ghidra pipeline against swkotor.exe, explicitly pulling from CSWPartyTable::SaveTableInfo at 0x005648c0.)
| Pipeline Event | Ghidra Provenance & Engine Behavior |
|---|---|
| Cheat Synchronization | The CHEATUSED flag in the savenfo header file isn’t tracked in isolation. When saving the game, the engine simply copies the raw PT_CHEAT_USED numeric flag directly from the party table to keep the UI in sync. |
| Character Loading Hierarchy | When the engine pushes a character into the game during a standard load (via LoadCharacterFromIFO), it defaults to pulling character data strictly from the active module’s IFO roster matrix. It only falls back to reading the save’s dedicated .pifo (Party Info) file if the target parameter index explicitly equals 0xffffffff. |
| Area Restoration | To rebuild the dynamic state of the room you were standing in (like which doors are open or locked), the engine restores the area’s Game Instance data by targeting the GIT resource type (0x7E7) and matching it against the module’s core resref string. |
Warning
There is zero K1 runtime string evidence for K2 (The Sith Lords) crafting or influence fields (PT_ITEM_COMPONEN, PT_INFLUENCE, UpgradeSlot*). If constructing K1-native party utilities, those fields must be aggressively excluded!
Engine Internals
This section contains notes and breakdowns of the Odyssey engine’s execution pipelines, case studies on community tooling bugs, and other engine-level logic or behaviors that are discovered during clean-room reverse engineering. These notes partially serve as the foundational research powering rakata-lint.
Research Notes
| Topic | Description |
|---|---|
| MDL & MDX Deep Dive | Deep dive into the Ghidra decompilation notes detailing the exact byte-level layout of the binary MDL/MDX format and the engine loading pipeline. |
| GFF List Corruption | Case study analyzing out-of-bounds GFF list behavior in the Odyssey engine vs. loose community tooling abstractions. |
GFF List Index Corruption
Summary
A binary GFF writer can silently corrupt list mapping if it writes list index entries in a way that allows recursive nested-list writes to interleave with the parent list’s index block.
This is a compatibility-critical issue for KOTOR data because many resources depend on stable list ordering and correct struct index mapping.
How GFF Lists Work
In binary GFF, a List field stores:
- A relative offset into the list_indices table.
- At that offset:
  - count (u32)
  - count struct indices (u32 each), each pointing into the struct table.
If these indices are wrong, the parser will load the wrong list structs.
Failure Mode
The bug class occurs when a writer:
- Starts writing a parent list.
- Recursively builds child structs.
- Appends list indices directly while recursion is still producing nested list index data.
Because nested lists also write into the same list_indices buffer, parent and child index blocks can interleave and the parent list can point at unintended structs.
Observable Symptoms
- Struct IDs in list entries change after roundtrip.
- Expected fields are missing from entries after roundtrip.
- Mod compatibility breaks for list-heavy resources due to reordered/remapped entries.
Correct Writer Strategy
For each list field:
- Write list count.
- Reserve contiguous slots for all struct indices up front.
- Build each child struct recursively.
- Backfill each reserved slot with the final struct index.
This guarantees parent list index layout is stable even when nested lists write their own index blocks.
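The reserve-then-backfill strategy can be shown on a toy model. This is a simplified sketch and not the code in rakata-formats: Node, Writer, and their methods are hypothetical, every struct carries at most one list field, and offsets into list_indices are counted in u32 units rather than bytes.

```rust
/// Toy input: a struct with an id and one list field of child structs.
pub struct Node {
    pub id: u32,
    pub children: Vec<Node>,
}

#[derive(Default)]
pub struct Writer {
    /// Struct table: (struct id, offset of its list block, if any).
    pub structs: Vec<(u32, Option<usize>)>,
    /// Shared index buffer used by every list, nested or not.
    pub list_indices: Vec<u32>,
}

impl Writer {
    /// Appends a struct and returns its index in the struct table.
    pub fn write_struct(&mut self, node: &Node) -> u32 {
        let index = self.structs.len() as u32;
        self.structs.push((node.id, None));
        if !node.children.is_empty() {
            let offset = self.write_list(&node.children);
            self.structs[index as usize].1 = Some(offset);
        }
        index
    }

    /// Writes one list block and returns its offset in `list_indices`.
    pub fn write_list(&mut self, items: &[Node]) -> usize {
        let offset = self.list_indices.len();
        // 1. Write the count.
        self.list_indices.push(items.len() as u32);
        // 2. Reserve contiguous slots before any recursion happens.
        let first_slot = self.list_indices.len();
        self.list_indices.resize(first_slot + items.len(), u32::MAX);
        for (i, item) in items.iter().enumerate() {
            // 3. Recurse; nested lists append after the reserved slots.
            let struct_index = self.write_struct(item);
            // 4. Backfill the slot with the final struct index.
            self.list_indices[first_slot + i] = struct_index;
        }
        offset
    }

    /// Reads a list block back and resolves it to struct ids.
    pub fn list_struct_ids(&self, offset: usize) -> Vec<u32> {
        let count = self.list_indices[offset] as usize;
        self.list_indices[offset + 1..offset + 1 + count]
            .iter()
            .map(|&i| self.structs[i as usize].0)
            .collect()
    }
}
```

Because the parent's slots are allocated before recursion begins, a nested list can only ever append after them, so the parent block stays contiguous.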
Implementation Status
In this repository:
- rakata-formats/src/gff/writer.rs reserves list index slots and backfills them.
- Regression tests cover:
  - synthetic list order + struct-id stability
  - UTC fixture roundtrip stability on lists like FeatList, ItemList, ClassList.
- rakata-generics/src/utc.rs includes a no-op rebuild test to ensure typed conversion does not drift list order/IDs.
The MDL/MDX Format
BioWare’s Aurora/Odyssey engine stores 3D models in a pair of files: .mdl and .mdx. This page documents what’s inside them, how the engine consumes them, and – occasionally – why they look the way they do. Evidence throughout is drawn from Ghidra decompilation of swkotor.exe (K1 GOG build), cross-checked against hex dumps of vanilla assets and community references (kotorblender, mdledit, mdlops, pykotor, reone, xoreos).
Overview
At a glance:
| Property | Value |
|---|---|
| Extensions | .mdl, .mdx |
| Magic | Binary: first u32 == 0. ASCII: text (filedependancy, newmodel, …) |
| Type | Hierarchical scene graph + animation + vertex data |
| Resource type ID | 2002 (MDL), 3008 (MDX) in KEY/BIF |
| Rust reference | rakata_formats::Mdl |
A model is a tree of nodes. Each node carries a transform (position + orientation), an animation track (“controllers”), and – depending on its type – geometry, light parameters, particle-emitter configuration, a skinning skeleton, a lightsaber blade, and so on. One MDL file can carry multiple named animations that operate on that tree.
The surprising shape of the format only makes sense once you understand one design choice, so let’s start there.
The core idea: load-and-fixup
The binary MDL is not a parsed format in the usual sense. The engine does not walk a byte stream field by field, calling read_u32, read_string, read_float. Instead, it does this:
- Allocate a buffer exactly the size of the model data.
- Copy the whole file into that buffer in one memcpy.
- Walk the now-in-memory structure and convert relative offsets into absolute pointers.
That’s it. The “parser” is a pointer rewriter. Every Reset* function you’ll see in the engine (InputBinary::Reset, ResetMdlNode, ResetTriMeshParts, …) takes a buffer base pointer and a struct pointer, and its job is essentially struct->field += base for every relocatable pointer in the struct, recursing into children as it goes.
An analogy: think of IKEA instructions that say “screw part A into the hole next to part B” rather than giving exact millimetre coordinates. The instructions are valid anywhere you choose to assemble the furniture. The MDL blob is identical: every pointer is expressed relative to the blob’s origin, so the engine can drop the blob anywhere in memory and then do a one-time pass to convert those relative offsets to real addresses.
This design choice ripples through everything:
- On-disk layout matches in-memory layout exactly. If a MdlNodeTriMesh is 412 bytes in RAM, it’s 412 bytes on disk. Struct field offsets you see in a Ghidra decompilation are the file offsets.
- Binary files are architecture-bound. This format is a snapshot of a specific compiler’s struct layout on 32-bit Windows. Field alignment, pointer size (4 bytes), endianness (little), and even padding bytes all match that ABI.
- “Parsing” is really validation + relocation. A Rust reader doesn’t need to convert a byte stream into a Rust struct; it needs to interpret a memory image as a struct overlay, following pointers to walk the tree.
- The engine never writes binary MDL. The shipping engine only has code to emit ASCII MDL. Binary MDL is produced exclusively by BioWare’s model compiler (a build-time tool). The runtime reads it but never round-trips it.
With that frame in place, the rest of the format falls into shape.
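To make the field += base idea concrete, here is a minimal relocation pass over a byte blob. It is a sketch under stated assumptions: the caller supplies the list of pointer-field offsets (the engine derives them from struct layouts), zero is treated as a null pointer and left alone, and relocate/unrelocate are hypothetical names.

```rust
/// Reads the little-endian u32 stored at `off`. Panics if out of range.
fn read_u32(blob: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([blob[off], blob[off + 1], blob[off + 2], blob[off + 3]])
}

/// Converts blob-relative offsets into absolute addresses by adding `base`.
/// Zero fields are assumed to mean "null" and are skipped.
pub fn relocate(blob: &mut [u8], pointer_fields: &[usize], base: u32) {
    for &off in pointer_fields {
        let rel = read_u32(blob, off);
        if rel != 0 {
            let abs = rel.wrapping_add(base);
            blob[off..off + 4].copy_from_slice(&abs.to_le_bytes());
        }
    }
}

/// Inverse pass, as a writer would need before re-serializing.
pub fn unrelocate(blob: &mut [u8], pointer_fields: &[usize], base: u32) {
    for &off in pointer_fields {
        let abs = read_u32(blob, off);
        if abs != 0 {
            let rel = abs.wrapping_sub(base);
            blob[off..off + 4].copy_from_slice(&rel.to_le_bytes());
        }
    }
}
```

A safe Rust reader would normally skip this rewrite entirely and keep the offsets relative, indexing into the blob instead of forming addresses.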
File structure
The 12-byte wrapper
The file begins with a tiny header:
| Offset | Type | Field | Notes |
|---|---|---|---|
| +0x00 | u32 | zero marker | Always 0. Used to tell binary from ASCII. |
| +0x04 | u32 | MDL content size | Bytes of model data that follow. |
| +0x08 | u32 | MDX file size | Size of the accompanying .mdx file. |
Input::Read at 0x004a14b0 is the dispatcher: it peeks at the first byte, and if it’s \0 the file is binary (the first u32 is always zero). Otherwise the file starts with ASCII tokens like filedependancy or newmodel, and processing hands off to a line-based interpreter.
For binary files, InputBinary::Read at 0x004a1260 does the rest:
- Record mdl_content_size and mdx_file_size from the wrapper.
- Allocate a heap buffer the size of the MDL content; memcpy the model data into it.
- If MDX size is non-zero, allocate a second buffer and memcpy the MDX file into it.
- Call Reset(mdl_buf, mdx_buf, resource_handle).
Note: the wrapper is not part of the model data. Byte 12 of the on-disk file is byte 0 of the in-memory MDL blob. All internal offsets are relative to the in-memory origin.
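A reader can peel the wrapper off and hand back the model blob whose byte 0 is file byte 12. The sketch below assumes little-endian fields as described above; MdlWrapper and split_binary_mdl are hypothetical names rather than the rakata_formats::Mdl API.

```rust
/// The two size fields carried by the 12-byte wrapper.
#[derive(Debug, PartialEq, Eq)]
pub struct MdlWrapper {
    pub mdl_content_size: u32,
    pub mdx_file_size: u32,
}

/// Reads a little-endian u32 at `off`, or None if out of range.
fn u32_at(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Splits a binary MDL file into its wrapper and the model blob.
/// Returns None for ASCII input or a truncated file.
pub fn split_binary_mdl(file: &[u8]) -> Option<(MdlWrapper, &[u8])> {
    // Same test as the dispatcher: a leading zero byte means binary.
    if *file.first()? != 0 {
        return None;
    }
    let mdl_content_size = u32_at(file, 4)?;
    let mdx_file_size = u32_at(file, 8)?;
    // All internal offsets are relative to this slice, not to the file.
    let end = 12usize.checked_add(mdl_content_size as usize)?;
    let blob = file.get(12..end)?;
    Some((
        MdlWrapper {
            mdl_content_size,
            mdx_file_size,
        },
        blob,
    ))
}
```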
Three kinds of pointer
Inside the MDL blob you’ll encounter three distinct flavours of “pointer”, which is worth keeping straight:
- MDL-relative offsets – the vast majority. Relocated to absolute pointers by Reset* functions. On re-serialization, they must be rewritten back to relative offsets.
- MDX-file byte offsets – used by a few fields (e.g. per-mesh mdx_data_offset at +0x144) to locate vertex data in the separate MDX file.
- String pointers – themselves MDL-relative, but pointing into a string table at the end of the blob, pointed to by the name-offsets array at model +0xB8.
Confusingly, there are two similarly named fields on each mesh node: mdx_data_offset at +0x144 (an MDX file offset) and vert_array_offset at +0x148 (a content-relative pointer to embedded position data). Conflating these produced one of the nastier bugs in our reader’s history (see War stories below).
Model header
Once the blob is in memory, InputBinary::Reset at 0x004a1030 walks the model header. Here’s the relevant field map:
| Offset | Field | Notes |
|---|---|---|
| +0x00 | ModelDestructor vptr | Populated at load time. |
| +0x04 | ModelParseField vptr | Populated at load time. |
| +0x28 | root node offset | Relocated. ResetMdlNode recurses from here. |
| +0x48 | resource handle | Populated at load time. |
| +0x4C | type byte | `GetType() |
| +0x50 | classification | 0=Other, 1=Effect, 2=Tile, 4=Character, 8=Door. |
| +0x54 | ref count | |
| +0x58 | animations array ptr | Relocated; count at +0x5C. |
| +0x64 | supermodel pointer | Populated via FindModel(buf+0x88). |
| +0x68..+0x80 | bbox min/max | Vector bmin, bmax. |
| +0x80 | radius | f32, default 7.0. |
| +0x84 | animation scale | f32, default 1.0. ASCII: setanimationscale. |
| +0x88 | supermodel name | char[36], null-terminated. Drives recursive model load. |
| +0xA8 | node array (secondary) | Relocated if non-zero. |
| +0xAC | MDX vertex pool offset | Source offset into MDX data (consumed into a GL pool). |
| +0xB0 | MDX data size | Size of the vertex-pool copy. |
| +0xB8 | name offsets array ptr | Relocated; count at +0xBC. Array entries also relocated. |
Two fields deserve special mention:
- +0x50 classification is the model’s high-level category (Character, Door, Tile, …). It’s never read during the Reset pass – it’s carried through as part of the memory-mapped blob and consulted at runtime. Cross-validated against hex dumps:

  | File | +0x50 | Category |
  |---|---|---|
  | c_dewback.mdl | 0x04 | Character ✓ |
  | dor_lhr01.mdl | 0x08 | Door ✓ |
  | m01aa_01a.mdl | 0x00 | Other ✓ |

- +0x88 supermodel name is a 32-byte (plus 4 padding) ASCII name. Loading a model with a supermodel triggers a recursive FindModel call for that name – think of supermodels as CSS-style inheritance, where animation data and bones defined on the parent are available to the child.
The node tree
The root node sits at model +0x28. From there, children are reached through a standard in-memory array layout: ptr + count_used + count_allocated at offsets +0x2C, +0x30, +0x34. This three-u32 pattern is BioWare’s CExoArrayList and shows up everywhere in the format – any time you see “12 bytes of array header”, this is what it is.
Base node layout (80 bytes)
All node types begin with the same 80-byte header:
| Offset | Size | Field | Notes |
|---|---|---|---|
| +0x00 | u16 | node_type | Flag bitmask. Drives type dispatch. |
| +0x02 | u16 | node_id | Sequential 0..N-1. |
| +0x04 | u16 | node_id_dup | Identical copy of node_id. Never read. |
| +0x06 | u16 | padding | Always zero. |
| +0x08 | u32 | name pointer | Relocated. Points into the string table. |
| +0x0C | u32 | parent pointer | Relocated if non-zero. |
| +0x10 | 12 | position | Vector{x, y, z} as 3×f32. |
| +0x1C | 16 | orientation | Quaternion{w, x, y, z} as 4×f32. |
| +0x2C | 12 | children array | CExoArrayList of MdlNode*. |
| +0x38 | 12 | controller keys array | CExoArrayList of NewController (16B each). |
| +0x44 | 12 | controller data array | CExoArrayList of float (packed key data). |
The two bytes at +0x04 are a redundant duplicate of node_id – identical to +0x02 in all 209 nodes checked across four vanilla files, with zero mismatches. No known engine function reads it. Best guess: legacy field or exporter artifact. It’s preserved for round-trip fidelity but has no semantic meaning.
A few conventions worth noting:
- Quaternion order is `(w, x, y, z)`. Confirmed via `Gob::GetOrientation` at `0x004499a0`, which copies fields in that order. Identity quaternion is `[1.0, 0.0, 0.0, 0.0]`. The Rust API uses the same convention.
- Position and orientation are read directly from the blob. They’re not relocated – they’re inline values, not pointers.
- Only two fields need relocation in the base header: name pointer at +0x08 and parent pointer at +0x0C.
InputBinary::ResetMdlNodeParts at 0x004a0b60 handles the base relocations and then recurses: for each entry in the children array, relocate the child pointer and call ResetMdlNode on it.
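The scalar part of the 80-byte header could be decoded roughly as follows. This is a sketch under the offsets in the table above, assuming little-endian data; `NodeHeader` and `parse_node_header` are hypothetical names, and the three array headers are left out for brevity.

```rust
pub const NODE_HEADER_SIZE: usize = 80;

/// Scalar fields of the base node header (array headers omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct NodeHeader {
    pub node_type: u16,
    pub node_id: u16,
    pub name_ptr: u32,   // content-relative until relocated
    pub parent_ptr: u32, // content-relative until relocated; 0 = none
    pub position: [f32; 3],
    pub orientation_wxyz: [f32; 4],
}

fn u16_at(b: &[u8], o: usize) -> u16 {
    u16::from_le_bytes([b[o], b[o + 1]])
}

fn u32_at(b: &[u8], o: usize) -> u32 {
    u32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}

fn f32_at(b: &[u8], o: usize) -> f32 {
    f32::from_bits(u32_at(b, o))
}

/// Returns `None` when fewer than 80 bytes are available.
pub fn parse_node_header(buf: &[u8]) -> Option<NodeHeader> {
    let b = buf.get(..NODE_HEADER_SIZE)?;
    Some(NodeHeader {
        node_type: u16_at(b, 0x00),
        node_id: u16_at(b, 0x02),
        name_ptr: u32_at(b, 0x08),
        parent_ptr: u32_at(b, 0x0C),
        position: [f32_at(b, 0x10), f32_at(b, 0x14), f32_at(b, 0x18)],
        // Stored in (w, x, y, z) order.
        orientation_wxyz: [
            f32_at(b, 0x1C),
            f32_at(b, 0x20),
            f32_at(b, 0x24),
            f32_at(b, 0x28),
        ],
    })
}
```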
Type dispatch
InputBinary::ResetMdlNode at 0x004a0900 reads the node_type field and dispatches:
| `node_type` | Handler | Kind |
|---|---|---|
| `0x0001` | ResetMdlNodeParts only | Dummy / base |
| `0x0003` | ResetLight | Light |
| `0x0005` | ResetMdlNodeParts only | Emitter |
| `0x0009` | ResetMdlNodeParts only | Camera |
| `0x0011` | ResetMdlNodeParts only | Reference |
| `0x0021` | ResetTriMesh → ResetTriMeshParts | TriMesh |
| `0x0061` | ResetSkin | Skin mesh |
| `0x00A1` | ResetAnim | AnimMesh |
| `0x0121` | ResetDangly | Dangly mesh (cloth) |
| `0x0221` | ResetAABBTree + ResetTriMeshParts | Walkmesh with AABB |
| `0x0401` | (no-op) | Trigger / unused |
| `0x0821` | ResetLightsaber | Saber mesh |
The type values are stored as a lookup table in the executable at 0x00740a18 (12 × u32).
Though the type codes are shaped like a bitmask – HEADER=0x01, LIGHT=0x02|HEADER, EMITTER=0x04|HEADER, TRIMESH=0x20|HEADER, SKIN=0x40|TRIMESH, SABER=0x800|TRIMESH, and so on – the dispatch is an exact value match, not individual bit checks. The bitmask structure is meaningful (skin is a superset of trimesh, for instance), it’s just not how the engine branches.
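To make the exact-match behaviour concrete, here is a small sketch of a classifier that mirrors the dispatch table rather than testing individual bits. `NodeKind` and `classify` are illustrative names and not necessarily how Rakata models this.

```rust
/// The twelve node kinds from the dispatch table (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeKind {
    Dummy,
    Light,
    Emitter,
    Camera,
    Reference,
    TriMesh,
    Skin,
    AnimMesh,
    Dangly,
    Aabb,
    Trigger,
    Saber,
}

/// Exact value match: any combination not listed is rejected, even if
/// its individual bits look plausible.
pub fn classify(node_type: u16) -> Option<NodeKind> {
    use NodeKind::*;
    match node_type {
        0x0001 => Some(Dummy),
        0x0003 => Some(Light),
        0x0005 => Some(Emitter),
        0x0009 => Some(Camera),
        0x0011 => Some(Reference),
        0x0021 => Some(TriMesh),
        0x0061 => Some(Skin),
        0x00A1 => Some(AnimMesh),
        0x0121 => Some(Dangly),
        0x0221 => Some(Aabb),
        0x0401 => Some(Trigger),
        0x0821 => Some(Saber),
        _ => None,
    }
}
```

With this shape, a value such as `0x0041` (skin bit set without the trimesh bit) is simply unknown, which matches the engine's value-based branching more closely than a chain of bit tests would.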
Size summary
Every node type has a known fixed size, both on disk and in memory:
| Flag | Type | Total | Base | Extra | Extends |
|---|---|---|---|---|---|
| 0x0001 | Base | 80 | 80 | 0 | – |
| 0x0003 | Light | 172 | 80 | 92 | MdlNode |
| 0x0005 | Emitter | 304 | 80 | 224 | MdlNode |
| 0x0009 | Camera | 80 | 80 | 0 | MdlNode |
| 0x0011 | Reference | 116 | 80 | 36 | MdlNode |
| 0x0021 | TriMesh | 412 | 80 | 332 | MdlNode |
| 0x0061 | Skin | 512 | 412 | 100 | TriMesh |
| 0x00A1 | AnimMesh | 468 | 412 | 56 | TriMesh |
| 0x0121 | Dangly | 440 | 412 | 28 | TriMesh |
| 0x0221 | AABB | 416 | 412 | 4 | TriMesh |
| 0x0401 | Trigger | 80 | 80 | 0 | MdlNode |
| 0x0821 | Saber | 432 | 412 | 20 | TriMesh |
Verified via ParseNode’s operator_new(size) calls and Ghidra struct definitions. All mesh subtypes extend MdlNodeTriMesh – their extra data begins at node offset +0x19C, immediately after the TriMesh block.
Node types in depth
The lightweight types
Camera (0x009) has no extra data. Same 80-byte footprint as the base node. ResetMdlNode dispatches to ResetMdlNodeParts only. There are no camera-specific ASCII fields either – the ASCII parser also falls through to the base handler.
Reference (0x011) carries just two fields in 36 extra bytes: a 32-byte ref_model name and a 4-byte reattachable flag. Both inline (no pointers to relocate).
Trigger (0x401) – the decompiled ResetMdlNode explicitly returns void without calling any reset function for this type. In practice it appears to be unused in shipping content.
Light (0x003)
Lights carry 92 bytes of extra data. Most of the scalar fields are straightforward (priority, shadow flag, ambient-only flag, flare radius, etc.), but lights are the most complex non-mesh type because of their array fields:
| Extra offset | Field | Layout | Runtime relocation |
|---|---|---|---|
| +0x04 | texture SafePointers | 12-byte array header | Zeroed on disk |
| +0x10 | flaresizes | CExoArrayList | ptr relocated |
| +0x1C | flarepositions | CExoArrayList | ptr relocated |
| +0x28 | flarecolorshifts | CExoArrayList | ptr relocated |
| +0x34 | texturenames | CExoArrayList<char*> (each ptr too!) | all ptrs relocated |
Lights also drive their colour, radius, shadow radius, vertical displacement, and multiplier via controllers (types 0x4C, 0x58, 0x60, 0x64, 0x8C) – these live in the base node’s controller arrays, not in the light-specific block.
Emitter (0x005)
Emitters are 304 bytes and – pleasantly – contain no relocatable pointers. Everything is inline: a fistful of floats and ints, four 32-byte name fields (update, render, blend, texture), and a 16-byte chunk_name. The full field map is in the appendix.
The most important field is update at extra offset +0x20. It’s the emitter type string, a case-sensitive selector against:
- `"Fountain"` → steady particle stream (most common)
- `"Explosion"` → one-shot burst
- `"Single"` → single particle
- `"Lightning"` → lightning-bolt effect
MdlNodeEmitter::InternalCreateInstance at 0x0049d5c0 branches on this string to instantiate the appropriate runtime emitter class.
Known engine-level footgun: controller 502 (`detonate`) is only valid on `"Explosion"` emitters. `InternalCreateInstance` only allocates the detonation memory for that branch, so a `detonate` controller on a `"Fountain"` emitter reads unallocated memory at runtime and crashes. This is a known flaw in `mdlops`-based exporters (KotorMax); `rakata-lint` will validate this.
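A lint check for this case might look like the sketch below. `EmitterView` and `detonate_is_unsafe` are hypothetical names chosen for illustration; the actual `rakata-lint` rule may be structured quite differently.

```rust
/// Controller code for `detonate`, as stated above.
pub const CONTROLLER_DETONATE: u32 = 502;

/// Hypothetical minimal view of an emitter node for linting purposes.
pub struct EmitterView<'a> {
    /// The `update` string from extra offset +0x20.
    pub update: &'a str,
    /// Type codes of every controller attached to the node.
    pub controller_types: &'a [u32],
}

/// True when a `detonate` controller sits on an emitter whose `update`
/// string is anything other than "Explosion" (case-sensitive).
pub fn detonate_is_unsafe(emitter: &EmitterView<'_>) -> bool {
    emitter.controller_types.contains(&CONTROLLER_DETONATE)
        && emitter.update != "Explosion"
}
```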
TriMesh (0x021)
This is the big one. 332 bytes of extra data, encoding everything you’d expect in a mesh plus many things you wouldn’t.
Inline fields
At a high level:
- Runtime function pointers (+0x00, +0x04): written by the constructor. Zero on disk; never consumed from a file.
- Faces array (+0x08): CExoArrayList of `MaxFace` (32 bytes each). See Face layout below.
- Bounding volumes (+0x14..+0x38): bbox min, bbox max, bounding sphere (radius + centre xyz). The sphere is the one actually consumed at runtime – `PartTriMesh::GetMinimumSphere` hierarchically unions it with children’s spheres for culling. These sphere fields have no ASCII-parser equivalent; they’re exclusively binary-format fields written by the BioWare toolset.
- Material (+0x3C..+0x54): diffuse RGB, ambient RGB, `transparency` hint.
- Textures (+0x58..+0x98): `texture_0` (primary/diffuse) and `texture_1` (secondary/lightmap), each a 32-byte null-terminated string, plus 32 bytes of padding up to +0xE8.
- UV animation (+0xEC..+0xF8): `uv_direction_x`, `uv_direction_y`, `uv_jitter`, `uv_jitter_speed`. Gated by `animate_uv` (+0xE8).
- MDX vertex layout (+0x100..+0x12F): flags bitmask plus 11 per-attribute byte offsets. Described in the next subsection.
- Counts and flags (+0x130..+0x13B): `vertex_count` (u16), `texture_channel_count` (u16), six 1-byte booleans (`light_mapped`, `rotate_texture`, `is_background_geometry`, `shadow`, `beaming`, `render`).
- Tail (+0x13C..+0x14B): `total_surface_area`, one unresolved reserved slot, `mdx_data_offset`, `vertex_data_ptr`.
Out of 332 bytes, 61 fields are fully confirmed through Ghidra cross-referencing, 5 are confirmed-unused, 1 is “very likely” (the always-3 indices_per_face), and exactly 1 remains unresolved (the 4 bytes at +0x140, which the constructor initializes to zero and no known function ever touches).
MDX vertex layout
The flags field at extra +0x100 is a bitmask describing what each MDX vertex record contains:
| Bit | Component | Size |
|---|---|---|
| 0x01 | position | 3×f32 (12B) – always set |
| 0x02 | UV1 / tverts0 | 2×f32 (8B) |
| 0x04 | UV2 / tverts1 | 2×f32 (8B) |
| 0x08 | UV3 / tverts2 | 2×f32 (8B) |
| 0x10 | UV4 / tverts3 | 2×f32 (8B) |
| 0x20 | normal | 3×f32 (12B) – always set |
| 0x80 | tangent space | 3×3×f32 (36B) – bump-mapped meshes |
Common patterns in vanilla K1: 0x21 (pos+norm only, 24B stride), 0x23 (+UV1, 32B), 0x27 (+UV2, 40B), 0xA7 (+tangent, 76B).
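The stride figures quoted above follow directly from the component sizes in the table. A minimal sketch, assuming the flags field is a `u32` and ignoring any data that has no flag bit; `stride_from_flags` is an illustrative helper name:

```rust
/// (flag bit, size in bytes) for each MDX vertex component in the table.
const COMPONENTS: [(u32, usize); 7] = [
    (0x01, 12), // position
    (0x02, 8),  // UV1
    (0x04, 8),  // UV2
    (0x08, 8),  // UV3
    (0x10, 8),  // UV4
    (0x20, 12), // normal
    (0x80, 36), // tangent space
];

/// Sums the sizes of every component whose flag bit is set.
pub fn stride_from_flags(flags: u32) -> usize {
    let mut stride = 0;
    for &(bit, size) in COMPONENTS.iter() {
        if flags & bit != 0 {
            stride += size;
        }
    }
    stride
}
```

This reproduces the common vanilla patterns: `0x21` gives 24, `0x23` gives 32, `0x27` gives 40, and `0xA7` gives 76.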
Note that vertex colours have no flag bit. Their presence is signalled by the per-attribute offset slot being != -1. The 11 offset slots are:
| Slot | Extra offset | Field | Evidence |
|---|---|---|---|
| 0 | +0x104 | position | LightPartTriMesh reads 3×f32, world-transforms |
| 1 | +0x108 | normal | LightPartTriMesh reads 3×f32, rotation only |
| 2 | +0x10C | vertex color | Checked != -1, reads RGB only. Alpha unused. |
| 3 | +0x110 | UV1 | PartTriMesh reads 2×f32 |
| 4 | +0x114 | UV2 | Structural: tverts1 in InternalGenVertices |
| 5 | +0x118 | UV3 | Structural: tverts2 |
| 6 | +0x11C | UV4 | Structural: tverts3 |
| 7 | +0x120 | tangent space | Filled by CalculateTangentSpaceBasis |
| 8–10 | +0x124..+0x12C | reserved | Always -1 across 215 surveyed vanilla meshes |
Vertex colour alpha is unused (confirmed 2026-04-04). `LightPartTriMesh` reads only bytes [0], [1], [2] (RGB). Byte [3] is stored but never read. The rendered output hardcodes alpha to `0xFF`. The fourth byte exists purely for alignment.
Important subtlety: the engine doesn’t trust any of these values on load. InternalPostProcess at 0x0043cf00 recomputes the flags, stride, per-attribute offsets, and mdx_data_offset from scratch, based on which vertex components are actually present in the node’s arrays. It also recomputes vertex normals via edge cross products, and re-derives the bounding box and sphere. The on-disk values preserve the compiler’s original output, but they’re cosmetic from the engine’s perspective.
This has a consequence for tooling: you can largely get away with wrong values in these fields as long as your mesh is otherwise valid, because the engine will fix them up at load time. But a correct writer should still populate them – community tools (kotorblender, mdledit) depend on them, and the BioWare build pipeline does too.
Skin mesh (0x061)
100 extra bytes beyond TriMesh. Skinning data (bone weights, inverse-bind-pose rotation and translation, bone-index mapping) sits here, along with several padding regions:
| Skin offset | Field | Layout | Notes |
|---|---|---|---|
| +0x00 | weights | CExoArrayList | Always zero in binary files. |
| +0x14 | bone_weight_data | ptr | Relocated if count at +0x18 > 0. |
| +0x1C | qbone_ref_inv | CExoArrayList | Inverse-bind rotations. |
| +0x28 | tbone_ref_inv | CExoArrayList | Inverse-bind translations. |
| +0x34 | bone_constant_indices | CExoArrayList | Bone-index remap. |
The weights array deserves a call-out. A 52-byte SkinVertexWeight struct exists and is fully specified by the ASCII parser – 4 bone names, 4 weights, some metadata – but in the binary path, ResetSkin never relocates its pointer, and a corpus scan of all 968 skin nodes across 2832 vanilla models found zero non-empty weights arrays. Binary models store per-vertex bone data exclusively in MDX (via dedicated bone-weight and bone-index offsets), and the weights CExoArray is just a 12-byte zero blob on disk.
AnimMesh (0x0A1)
56 extra bytes. Carries a sample_period scalar and two CExoArrayList fields (anim_verts, anim_t_verts) for time-sampled vertex animation. The remaining six fields (three pointers + three counts + some padding) are runtime-only and zero on disk. Fun fact: no community tool (kotorblender, mdledit, kotormax, reone, xoreos, pykotor) parses AnimMesh nodes – we may have the first structured reader for this type.
Also: ResetAnim is peculiar in that it processes the extra data before calling ResetTriMeshParts, the reverse of every other mesh subtype. There’s no obvious reason for this.
Dangly mesh (0x121)
The simplest mesh subtype, 28 extra bytes. Four fields: a per-vertex constraints CExoArrayList, and three inline floats (displacement, tightness, period) that parameterize the soft-body simulation. A single conditional pointer at the tail is relocated only when the TriMesh vertex count is non-zero.
Dangly meshes are BioWare’s hack for cloth and hair – rigged to the skeleton like a skin mesh, but with simulation parameters that let parts of the geometry lag and swing.
AABB walkmesh (0x221)
4 extra bytes: a single pointer to the root of an AABB tree stored inline in the MDL blob.
The AABB tree is a flattened binary search tree written in DFS preorder. Each node is 40 bytes:
| Offset | Size | Field | Notes |
|---|---|---|---|
| +0x00 | 12 | box_min | 3×f32 AABB minimum corner |
| +0x0C | 12 | box_max | 3×f32 AABB maximum corner |
| +0x18 | 4 | right_child | Content-relative offset (0 = no child) |
| +0x1C | 4 | left_child | Content-relative offset (0 = no child) |
| +0x20 | 4 | face_index | i32. Leaves: ≥ 0. Internal: −1. |
| +0x24 | 4 | split_direction_flags | Axis bitmask: 1=+X, 2=+Y, 4=+Z, 8=−X, 16=−Y, 32=−Z |
Note that right_child comes before left_child in the struct – this is the actual field order, not a typo. Matches Ghidra and the mdledit/mdlops implementations.
Leaf nodes have left = 0, right = 0, face_index ≥ 0, split_direction_flags = 0. Internal nodes have both children non-zero, face_index = -1, and flags computed from the child bounding-box separation. The format is the classic spatial subdivision tree used for fast triangle lookups during pathfinding and collision queries.
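A sketch of decoding one 40-byte tree node and applying the leaf rule described above. It assumes little-endian data; `AabbNode` is a hypothetical type, and the child offsets are left content-relative (unrelocated).

```rust
pub const AABB_NODE_SIZE: usize = 40;

#[derive(Debug, Clone, PartialEq)]
pub struct AabbNode {
    pub box_min: [f32; 3],
    pub box_max: [f32; 3],
    pub right_child: u32, // stored before left_child on disk
    pub left_child: u32,
    pub face_index: i32,
    pub split_direction_flags: u32,
}

fn u32_at(b: &[u8], o: usize) -> u32 {
    u32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}

fn f32_at(b: &[u8], o: usize) -> f32 {
    f32::from_bits(u32_at(b, o))
}

impl AabbNode {
    /// Returns `None` when fewer than 40 bytes are available.
    pub fn parse(buf: &[u8]) -> Option<Self> {
        let b = buf.get(..AABB_NODE_SIZE)?;
        Some(Self {
            box_min: [f32_at(b, 0x00), f32_at(b, 0x04), f32_at(b, 0x08)],
            box_max: [f32_at(b, 0x0C), f32_at(b, 0x10), f32_at(b, 0x14)],
            right_child: u32_at(b, 0x18),
            left_child: u32_at(b, 0x1C),
            face_index: u32_at(b, 0x20) as i32,
            split_direction_flags: u32_at(b, 0x24),
        })
    }

    /// Leaf: no children and a non-negative face index.
    pub fn is_leaf(&self) -> bool {
        self.left_child == 0 && self.right_child == 0 && self.face_index >= 0
    }
}
```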
ResetAABBTree at 0x004a0260 recurses the tree, relocating each child pointer. It manually unrolls to depth 4 before recursing (the engine’s author was clearly worried about stack depth on a modest C++ compiler).
Lightsaber (0x821)
20 extra bytes – small but architecturally notable:
| Saber offset | Field | Notes |
|---|---|---|
| +0x00 | saber vert data | Relocated pointer |
| +0x04 | saber UV data | Relocated pointer |
| +0x08 | saber normal data | Relocated pointer |
| +0x0C | GL vertex pool ID | Runtime-only (set by RequestPool) |
| +0x10 | GL index pool ID | Runtime-only |
Three arrays of exactly 176 vertices each (NUM_SABER_VERTS = 176, confirmed by kotorblender): position, UV, normal. The saber blade is a fixed-topology mesh – BioWare pre-baked the geometry as a flexible band that can be animated by swinging the endpoint controllers.
Unlike Skin/Dangly/AnimMesh, the saber uses the base TriMesh gen_vertices and remove_temporary_array callbacks. Its geometry doesn’t morph dynamically at the vertex-processing level – the animation is in the controller track.
Controllers and animation
The controller header
Controllers are the keyframe-animation primitive. Each node has an array of 16-byte NewController headers (at node +0x38) plus a shared pool of float data (at +0x44). Each header describes one animatable property of that node:
| Offset | Size | Field | Notes |
|---|---|---|---|
| +0x00 | u32 | type_code | Byte offset of the target property in the Part struct. |
| +0x04 | i16 | supermodel_link | Additive-blending property offset; -1 = no blending. |
| +0x06 | u16 | row_count | Number of keyframes. |
| +0x08 | u16 | time_data_offset | Float-array index for time values. |
| +0x0A | u16 | data_offset | Float-array index for value data. |
| +0x0C | u8 | value_type_and_flags | Low nibble: 1=float, 2/4=quaternion, 3=vector. Bit 4=0x10=Bezier. |
| +0x0D | 3 | padding | Alignment to 16 bytes. Never read. |
The type_code is elegant: it’s literally the byte offset into the Part struct where the animated value lives. NewController::Control dereferences it as *(float*)(part_ptr + type_code). So type_code = 8 means “position” because position sits at Part+0x08; type_code = 20 means “orientation” because orientation sits at Part+0x14 (as a compressed axis-angle quaternion); and so on. This collapses what would otherwise be a switch over property IDs into direct pointer arithmetic.
The value_type_and_flags byte at +0x0C has a compound encoding that bit us hard early on:
- Low nibble (`& 0x0F`) – value-type discriminator: `1`=float, `2` or `4`=quaternion, `3`=vector. Selects the interpolation path (Lerp/Slerp/VectorLerp).
- High nibble (`& 0xF0`) – flags. `0x10` signals Bezier interpolation, which triples the per-keyframe value count (each keyframe is value + in-tangent + out-tangent).
- Special case: for orientation controllers (type code 20) with raw byte value `== 2`, the keyframe is a compressed quaternion packed into a single `u32`, not two f32 values.
The low nibble happens to coincide with the “number of floats per keyframe row” for simple cases (1, 3, 4), which is why the earlier interpretation of this byte as column_count mostly worked – until it didn’t. See the controller bug below.
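The decoding rules above can be sketched as a single function that returns how many 4-byte slots one keyframe row occupies in the controller data array. `values_per_row` is an illustrative name; the real reader may split this differently.

```rust
/// Type code of the orientation controller.
pub const ORIENTATION: u32 = 20;
/// High-nibble flag signalling Bezier interpolation.
pub const BEZIER_FLAG: u8 = 0x10;

/// Number of 4-byte slots consumed per keyframe row.
pub fn values_per_row(type_code: u32, raw: u8) -> usize {
    // Special case: compressed quaternion, one u32 per row.
    if type_code == ORIENTATION && raw == 2 {
        return 1;
    }
    // For the simple cases the low nibble equals the value count.
    let base = usize::from(raw & 0x0F);
    if raw & BEZIER_FLAG != 0 {
        base * 3 // value + in-tangent + out-tangent
    } else {
        base
    }
}
```

A Bezier position controller (`raw = 0x13`) therefore consumes 9 slots per row, while treating the raw byte as a plain count would have produced 19.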
Self-describing rows
Because value_type_and_flags is inline in each controller header, the binary format is entirely self-describing for animation data. The reader doesn’t need a lookup table mapping type codes to column counts – it reads the flags byte and knows how many floats to consume per row.
This is useful because vanilla K1 contains controller type codes (0x68, 0x188) that aren’t documented in any community reference. Trying to parse these with a closed enum caused 517 of 2832 vanilla MDLs (18.3%) to fail. MdlControllerType is therefore a newtype struct MdlControllerType(u32) with named constants for the three universally-confirmed base types (POSITION = 8, ORIENTATION = 20, SCALE = 36) and accepts any other u32 losslessly.
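The open newtype described above might look roughly like this. Only the struct shape and the three constants come from the text; the public field, the `From<u32>` impl, and `is_base` are assumptions added for illustration.

```rust
/// Open set of controller type codes: any u32 is representable.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MdlControllerType(pub u32);

impl MdlControllerType {
    pub const POSITION: Self = Self(8);
    pub const ORIENTATION: Self = Self(20);
    pub const SCALE: Self = Self(36);

    /// Hypothetical helper: true for the three universal controllers.
    pub fn is_base(self) -> bool {
        self == Self::POSITION || self == Self::ORIENTATION || self == Self::SCALE
    }
}

impl From<u32> for MdlControllerType {
    /// Never fails: undocumented codes are carried through unchanged.
    fn from(code: u32) -> Self {
        Self(code)
    }
}
```

Compared with a closed `enum`, this trades exhaustive matching for lossless handling of codes such as `0x68` and `0x188`, which is the property the corpus required.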
Base vs type-specific controllers
Three controllers are universal – they exist on every node type:
| ASCII name | Code | Columns | Meaning |
|---|---|---|---|
| `position` | 8 | 3 | x, y, z |
| `orientation` | 20 | 4 | x, y, z, angle (compressed axis-angle) |
| `scale` | 36 | 1 | uniform scale factor |
Type-specific codes live at higher numbers: light controllers start at 76 (color), emitter controllers are at 88+. All three base codes also support a Bezier variant (signalled by the flag bit, not a separate type code).
The MDX file: a mystery
Now for the strangest part of the format.
The MDX file contains interleaved vertex data – positions, normals, UVs, tangent space, colours – packed into records of width given by the mesh vertex_stride field, aligned into per-mesh blocks with sentinel-float terminators separating them. It looks exactly like what you’d expect a GPU vertex buffer to look like.
And the K1 engine never reads it.
Here’s the complete trace through InputBinary::Read:
- Read the MDX file into a buffer (`pbVar9`).
- Call `Reset(mdl_content, mdx_content, resource)`.
- `Reset` passes `mdx_content` as `param_3` through a chain of function calls (`ResetMdlNode`, `ResetTriMeshParts`, …). Every downstream function has `param_3` as a formal parameter.
- `param_3` is never used. In `ResetTriMeshParts`, it’s literally overwritten as a loop counter on line 67.
- Back in `InputBinary::Read`, line 78: `_free(pbVar9)`. The MDX buffer is freed.
At no point does any vertex-related code path consume MDX data. InternalGenVertices builds vertex buffers from verts_arrays, which lives in the MDL content blob. ProcessVerts recomputes normals from geometry. LightPartTriMesh reads from the GL pool populated at +0xAC of the model header – which is sourced from the MDL content, not the MDX file.
So where does the vertex data actually come from? From a parallel set of position-only arrays stored inside the MDL content blob, pointed to by vert_array_offset at mesh +0x148 (content-relative), with additional UV/colour/normal data in the MdlNodeTriMeshVertArrays structures.
The MDX file, in short, is a redundant interleaved copy of data that the K1 engine could reconstruct from the MDL alone. Most likely theories for why it exists:
- Build-pipeline artifact. BioWare’s Aurora engine (Neverwinter Nights) may have used the MDX format directly, and the K1 pipeline inherited the file-layout convention without the consuming code path.
- Toolset requirement. Third-party editors and the BioWare toolset itself may still parse MDX for authoring workflows.
- `ResetLite` path. There’s a separate “lightweight” loader (`InputBinary::ResetLite` at `0x004a11b0`) that may use MDX for a reduced in-memory representation – unverified.
For our purposes, this has two consequences:
- Engine-functional MDX is near-trivial. Any MDX file the K1 engine happily ignores is a valid MDX file. You could write all zeros and the game would run.
- Round-trip-accurate MDX requires the per-mesh terminator convention (described next), because community tools do read MDX, and byte-identical round-trip is a useful correctness check.
Per-mesh terminators and alignment
Empirically, vanilla MDX files are larger than sum(vertex_count × stride). Across 2832 vanilla K1 models, 2445 have MDX files with excess bytes, totalling 3,278,456 bytes corpus-wide.
The excess has structure. After each mesh’s vertex data, there’s a terminator row of exactly one stride’s worth of bytes, beginning with three sentinel floats and padded with zeros:
| Mesh type | Sentinel value | Hex (f32 LE) |
|---|---|---|
| Non-skin (`type & 0x40 == 0`) | 10,000,000.0 | 80 96 18 4B |
| Skin (`type & 0x40 != 0`) | 1,000,000.0 | 00 24 74 49 |
Corpus sentinel detection: 6,973 non-skin sentinels, 6 skin sentinels, 0 unknown patterns.
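A terminator row as described above could be built like this. It is a sketch that assumes the same sentinel value is repeated in all three leading floats; `terminator_row` is a hypothetical helper name.

```rust
/// Builds one stride-sized terminator row: three sentinel floats
/// (little-endian) followed by zero padding.
pub fn terminator_row(node_type: u16, stride: usize) -> Vec<u8> {
    // Skin meshes (bit 0x40 set) use the smaller sentinel.
    let sentinel: f32 = if node_type & 0x40 != 0 {
        1_000_000.0
    } else {
        10_000_000.0
    };
    let mut row = vec![0u8; stride];
    for chunk in row.chunks_exact_mut(4).take(3) {
        chunk.copy_from_slice(&sentinel.to_le_bytes());
    }
    row
}
```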
Between meshes, the cursor is padded to the next 16-byte boundary. The last mesh has no trailing alignment:

    cursor = 0
    for each mesh in MDX order:
        cursor += vertex_count × stride      # vertex data
        cursor += stride                     # terminator row
        if not last mesh:
            cursor = (cursor + 15) & ~15     # 16-byte alignment
    mdx_file_size = cursor

For stride-24 meshes, the gap between meshes is either 24 or 32 bytes depending on current alignment. For stride-32 and stride-64 meshes, it’s always exactly stride because the stride is already a multiple of 16.
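The size calculation above translates to Rust almost line for line. `MeshSpan` and `mdx_file_size` are illustrative names; the meshes are assumed to be supplied in MDX order already.

```rust
/// Per-mesh inputs to the size calculation (hypothetical type).
pub struct MeshSpan {
    pub vertex_count: usize,
    pub stride: usize,
}

/// Expected MDX file size for meshes given in MDX order.
pub fn mdx_file_size(meshes: &[MeshSpan]) -> usize {
    let mut cursor = 0usize;
    for (i, mesh) in meshes.iter().enumerate() {
        cursor += mesh.vertex_count * mesh.stride; // vertex data
        cursor += mesh.stride; // terminator row
        if i + 1 != meshes.len() {
            cursor = (cursor + 15) & !15; // pad to 16-byte boundary
        }
    }
    cursor
}
```

For example, a 4-vertex stride-24 mesh followed by a 2-vertex stride-32 mesh occupies 120 bytes, is padded to 128, and the second mesh adds 96 for a total of 224.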
Mesh ordering in MDX
Non-skin meshes come first, then skin meshes. Within each group, the order is DFS-traversal-of-the-tree – mostly. About 27% of vanilla models exhibit a compiler-specific permutation that defers “second children” of paired parents until after all their siblings’ first children. This is reproducible for our own output (if we write DFS, we read DFS), but not for byte-identical round-trip of every BioWare file.
Writing in standard DFS order (non-skin first, skin second) produces semantically identical MDX data with the correct total size. 1784 of 2444 models match byte-for-byte; the remaining 660 have the non-standard compiler ordering.
What this means for mdx_data_offset
The mesh header has two adjacent u32 fields at +0x144 and +0x148:
- +0x144 `mdx_data_offset`: per-mesh byte offset into the MDX file. Used by community tools to seek directly to that mesh’s vertex block. The engine also uses this after `InternalPostProcess` overwrites it with a GL-pool offset.
- +0x148 `vert_array_offset`: content-relative pointer to the position-only vertex data embedded in the MDL content blob. Used by the engine during load. Relocated by `ResetTriMeshParts` via `param_1->field60_0x198 = param_2 + param_1->field60_0x198` – where `param_2` is the MDL content base, not the MDX base.
These two fields were conflated under a single MDX_OFFSET = 0x148 constant in our implementation for several months, which caused the reader to lose the MDX offset entirely and the writer to overwrite the content pointer with an MDX offset. Full story in War stories.
Face layout
Faces are 32-byte records (MaxFace) stored in the TriMesh faces CExoArray:
| Offset | Size | Field | Type | Notes |
|---|---|---|---|---|
| +0x00 | 12 | plane_normal | 3×f32 | Face plane normal. |
| +0x0C | 4 | plane_distance | f32 | Plane equation: n·p = d. |
| +0x10 | 4 | surface_id | u32 | Walkability / material identifier. |
| +0x14 | 6 | adjacent | 3×u16 | Indices of adjacent faces (for AABB/pathfinding). |
| +0x1A | 6 | vertex_indices | 3×u16 | Triangle vertex indices. |
The plane normal and distance are pre-computed by the BioWare toolset. They can be re-derived from the geometry but the binary format preserves them. The adjacency graph is what makes AABB walkmesh lookups fast – each triangle points to its neighbours, enabling constant-time stepping during pathfinding.
An early version of our reader assumed 12-byte faces (just the vertex indices). This led to every 2.67th “face” being interpreted from garbage bytes belonging to the next face’s plane normal. It was masked by synthetic round-trip tests – write wrong, read wrong, match! – and only caught when vanilla-file validation found vertex indices exceeding the mesh’s vertex count.
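A sketch of reading `MaxFace` records with the bounds check that caught the bug. It assumes little-endian data and the offsets in the table above; `parse_faces` and its string error type are illustrative, not Rakata's actual API.

```rust
pub const FACE_SIZE: usize = 32;

#[derive(Debug, Clone, PartialEq)]
pub struct MaxFace {
    pub plane_normal: [f32; 3],
    pub plane_distance: f32,
    pub surface_id: u32,
    pub adjacent: [u16; 3],
    pub vertex_indices: [u16; 3],
}

fn u16_at(b: &[u8], o: usize) -> u16 {
    u16::from_le_bytes([b[o], b[o + 1]])
}

fn u32_at(b: &[u8], o: usize) -> u32 {
    u32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}

fn f32_at(b: &[u8], o: usize) -> f32 {
    f32::from_bits(u32_at(b, o))
}

/// Parses consecutive 32-byte records and rejects out-of-range vertex indices.
pub fn parse_faces(buf: &[u8], vertex_count: u16) -> Result<Vec<MaxFace>, String> {
    if buf.len() % FACE_SIZE != 0 {
        return Err(format!("face data length {} is not a multiple of {}", buf.len(), FACE_SIZE));
    }
    let mut faces = Vec::with_capacity(buf.len() / FACE_SIZE);
    for (i, r) in buf.chunks_exact(FACE_SIZE).enumerate() {
        let face = MaxFace {
            plane_normal: [f32_at(r, 0x00), f32_at(r, 0x04), f32_at(r, 0x08)],
            plane_distance: f32_at(r, 0x0C),
            surface_id: u32_at(r, 0x10),
            adjacent: [u16_at(r, 0x14), u16_at(r, 0x16), u16_at(r, 0x18)],
            vertex_indices: [u16_at(r, 0x1A), u16_at(r, 0x1C), u16_at(r, 0x1E)],
        };
        if face.vertex_indices.iter().any(|&v| v >= vertex_count) {
            return Err(format!("face {} has a vertex index outside 0..{}", i, vertex_count));
        }
        faces.push(face);
    }
    Ok(faces)
}
```

A synthetic write-then-read test cannot catch a wrong stride, whereas the index check fails quickly on real files.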
War stories and implementation history
A brief chronicle of the bugs found while building the Rust reader/writer, because the “how we know this” is often as useful as the “what we know”.
The 12-byte face bug
Described above. The MaxFace stride is 32 bytes, not 12. Caught by vertex-index bounds checking against vanilla files.
Mesh header size corrections
The whole mesh extra-header was misunderstood for a long time. A sample of the corrections, all fixed in late February 2026:
- `VERTEX_COUNT` offset was 0x9E → actually 0x130
- `MDX_OFFSET` was 0xB8 → actually two separate fields at 0x144 and 0x148
- `VERTEX_STRUCT_SIZE` was 0xBC → actually 0xFC
- `MESH_EXTRA_SIZE` was 200 bytes → actually 332 (0x14C)
- `RENDER` boolean was missing entirely → added at 0x139
- `SHADOW` boolean was missing entirely → added at 0x137
All of these stemmed from extrapolating offsets from partial hex dumps rather than decompiling the struct. Ghidra’s MdlNodeTriMesh struct definition settled the whole thing – once the Ghidra type was aligned, the field offsets fell out directly.
Controller column-count encoding
Our reader initially used the raw value_type_and_flags byte (at controller +0x0C) directly as a float count per row. This worked for the common case (position=3, orientation=4, scale=1) but broke in two scenarios:
- Bezier controllers set bit 0x10, turning `raw=3` (Bezier position) into a byte value of `0x13` = 19 columns, not 9.
- Integral orientation: ORIENTATION controllers with raw byte `== 2` mean “compressed quaternion packed into one u32 per row”, not “2 f32 values per row”.
The integral-orientation case was the more painful bug: a c_dewback scan showed 876 integral-orientation controllers; c_rancor had 1,212. Reading 2 floats instead of 1 consumed double the expected data, desynchronizing every subsequent controller in the data array. Every node’s animation after the first compressed-quaternion keyframe was reading from a shifted window of garbage.
Fix: decode the raw byte with & 0x0F masking plus the two special cases (Bezier multiplies by 3; integral orientation uses 1 u32 per row regardless). The raw byte is preserved in a raw_column_count field for round-trip fidelity.
Animation node_number at +0x02
The 80-byte node header’s first 8 bytes are type_flags (u16), node_number (u16), name_index (u16), padding (u16). Our offset map had NODE_ID = 0x04, which pointed to name_index, not node_number.
For animation nodes specifically, node_number is the engine’s key for matching animation keyframe nodes to their geometry-side skeleton bones. Writing zeros at +0x02 and stuffing the name_index at +0x04 meant every animation node had node_number = 0, so every keyframe targeted the root bone. Visually: characters froze in T-pose with no skeletal motion whatsoever.
Fix: read node_number from +0x02 explicitly; derive name_index from the name map at +0x04.
MDX per-mesh seeking
Our MDX reader used a cumulative cursor assuming non-skin-first DFS ordering. For the ~51% of vanilla models where MDX layout doesn’t match that assumption, vertex data was assigned to the wrong mesh nodes. Self-round-trip tests couldn’t detect this – we were reading and writing the same wrong assignment, which is a consistency check for the tool’s own output, not for correctness against vanilla.
Fix: seek to info.mdx_data_offset (the +0x144 field) for each mesh, matching kotorblender and mdledit behaviour. The cumulative-cursor logic remains in the writer, which produces its own layout and backpatches the offset field; the reader trusts whatever the file says.
Name-table dead entries
220 vanilla K1 models have name tables containing entries that no node references. These turn out to be walkmesh node names (*_wok, *_pwk, *_dwk variants) from BioWare’s build pipeline, which apparently shared a single name table across the MDL and WOK outputs.
The engine only performs indexed lookups via name_index; it never iterates the full table or validates the count. Extra entries are harmless dead weight.
Decision: not preserved. Our writer builds the name table from the node tree (matching kotorblender and mdledit), producing files that are functionally identical but 20–80 bytes shorter. This is a known, benign size delta – not a parity bug.
Emitter controller code verification
All 48 emitter controller type codes were independently verified against the engine binary via Ghidra. For each, we located the __stricmp call for the ASCII field name and traced the controller type value stored on match. Every code matched mdledit’s ReturnControllerName table exactly – no additions, no omissions.
One naming correction: the engine’s canonical string for code 200 is "lightningZigzag" (camelCase Z). mdledit has "lightningzigzag" (all lowercase). Functionally identical because the engine uses __stricmp (case-insensitive), but the engine’s capitalization is now what we emit.
Corpus validation status
As of 2026-02-24: 2832/2832 (100%) structural round-trip success (parse → write → parse → compare). This was achieved after fixing three comparison issues in the test harness:
- NaN ≠ NaN (IEEE 754): 1559 false failures – floats containing NaN don’t equal themselves. Fixed with bitwise `f32::to_bits()` comparison.
- Parent index ordering: 135 mismatches from depth-first vs. original node ordering. The binary format preserves node ordering but our parent-index reconstruction uses DFS. Semantically equivalent, numerically different – skipped in comparison.
- Face NaN values: exactly one model (`w_dblsbr_001`) has NaN in its pre-computed plane_normal/plane_distance, because one of its faces is degenerate. Round-trips correctly once NaN-aware comparison is used.
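The NaN-aware comparison can be sketched with `f32::to_bits`. The helper names below are illustrative and not taken from the test harness.

```rust
/// Bitwise equality: NaN equals an identical NaN, and 0.0 differs from -0.0.
pub fn f32_bits_eq(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}

/// Element-wise bitwise equality for float slices of equal length.
pub fn slices_bits_eq(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

One trade-off to keep in mind: bitwise comparison is stricter than `==` for signed zeros, which is usually what a round-trip check wants but would be wrong for a numeric-tolerance check.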
Byte-level MDL/MDX equality is a separate target – 1784 of 2444 MDX files match byte-for-byte, with the remaining 660 showing the non-standard BioWare compiler traversal discussed earlier.
Appendix
Emitter field map
304 bytes total (80 base + 224 extra). Emitter-specific data:
| Node offset | Extra offset | Field | Type |
|---|---|---|---|
| +0x50 | +0x00 | deadspace | f32 |
| +0x54 | +0x04 | blast_radius | f32 |
| +0x58 | +0x08 | blast_length | f32 |
| +0x5C | +0x0C | num_branches | i32 |
| +0x60 | +0x10 | control_pt_smoothing | i32 |
| +0x64 | +0x14 | x_grid | i32 |
| +0x68 | +0x18 | y_grid | i32 |
| +0x6C | +0x1C | spawn_type | i32 |
| +0x70 | +0x20 | update | char[32] |
| +0x90 | +0x40 | render | char[32] |
| +0xB0 | +0x60 | blend | char[32] |
| +0xD0 | +0x80 | texture | char[32] |
| +0xF0 | +0xA0 | chunk_name | char[16] |
| +0x100 | +0xB0 | two_sided_tex | i32 |
| +0x104 | +0xB4 | loop | i32 |
| +0x108 | +0xB8 | render_order | u16 |
| +0x10A | +0xBA | frame_blending | u8 |
| +0x10B | +0xBB | depth_texture_name | char[16] |
| +0x11B | +0xCB | (reserved) | 21 bytes |
LOD naming convention
When a model has cullWithLOD set, the engine searches for LOD variants by appending suffixes to the model name:
- `<name>_x` – medium LOD
- `<name>_z` – far LOD
Loaded via FindModel(name + "_x") and FindModel(name + "_z") as separate Model instances linked to the primary. Not relevant to format parsing, but useful for model validation and lint rules.
Resource type IDs
| Format | Resource type |
|---|---|
| MDL | 2002 (0x7D2) |
| MDX | 3008 (0xBC0) |
These map to the KEY/BIF resource type system. CAuroraInterface::RequestModel at 0x0070d8d0 resolves models through a sorted requestedModelList.
Dynamic type casts
The engine exposes As* functions for type-checked downcasts. Caller counts indicate runtime usage frequency:
| Function | Callers |
|---|---|
| AsModel | 34 |
| AsMdlNodeTriMesh | 14 |
| AsMdlNodeEmitter | 11 |
| AsAnimation | 7 |
| AsMdlNodeLightsaber | 5 |
| AsMdlNodeSkin | 4 |
| AsMdlNodeAABB | 3 |
| AsMdlNodeDanglyMesh | 3 |
| AsMdlNodeLight | 3 |
| AsMdlNodeAnimMesh | 2 |
| AsMdlNodeCamera | 2 |
| AsMdlNodeReference | 2 |
TriMesh (14) and Emitter (11) are the most-queried node types – useful signal for prioritizing implementation completeness.
Binary MDL call graph
For reference when reading Ghidra decompilations:
NewCAurObject (0x00449cc0)
└── FindModel (0x00464110) [by name; checks cache via BinarySearchModel]
    └── LoadModel (0x00464200) [on cache miss]
        └── IODispatcher::ReadSync (0x004a15d0)
            └── Input::Read (0x004a14b0) ← format dispatcher
                ├── InputBinary::Read (0x004a1260) if first_byte == 0x00
                │   └── Reset / ResetLite (pointer rewriting)
                │       ├── ResetMdlNode (per-node dispatch)
                │       │   ├── ResetMdlNodeParts (base fields)
                │       │   ├── ResetTriMesh (mesh subtypes)
                │       │   ├── ResetLight (light extras)
                │       │   ├── ResetSkin, ResetAnim, ...
                │       │   └── ResetAABBTree (recursive tree walk)
                │       └── ResetAnimation (per-animation)
                └── FuncInterp loop otherwise (ASCII MDL)
└── CreateInstanceTreeR (0x00449200) [builds runtime Part tree from MdlNode tree]
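The format dispatch in Input::Read hinges on the first byte of the file. A minimal sketch of the same check on the tooling side follows; the `MdlEncoding` enum and `detect_mdl_encoding` function are hypothetical names, and the empty-input handling is an assumption of this sketch rather than documented engine behavior.

```rust
/// Which reader path a model file would take (illustrative type).
#[derive(Debug, PartialEq)]
enum MdlEncoding {
    /// First byte is 0x00: handled by the binary reader.
    Binary,
    /// Any other first byte: handled by the ASCII interpreter loop.
    Ascii,
}

/// Mirrors the first-byte check from the call graph above.
/// Returns `None` for empty input (an assumption of this sketch).
fn detect_mdl_encoding(data: &[u8]) -> Option<MdlEncoding> {
    data.first().map(|&first_byte| {
        if first_byte == 0x00 {
            MdlEncoding::Binary
        } else {
            MdlEncoding::Ascii
        }
    })
}
```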
Key Ghidra addresses
For anyone continuing this archaeology, the foundational set of function addresses in swkotor.exe (K1 GOG build):
| Function | Address |
|---|---|
| Input::Read | 0x004a14b0 |
| InputBinary::Read | 0x004a1260 |
| InputBinary::Reset | 0x004a1030 |
| InputBinary::ResetMdlNode | 0x004a0900 |
| InputBinary::ResetMdlNodeParts | 0x004a0b60 |
| InputBinary::ResetTriMeshParts | 0x004a0c00 |
| InputBinary::ResetAABBTree | 0x004a0260 |
| InputBinary::ResetLight | 0x004a05e0 |
| InputBinary::ResetSkin | 0x004a01b0 |
| InputBinary::ResetDangly | 0x004a0100 |
| InputBinary::ResetAnim | 0x004a0060 |
| InputBinary::ResetLightsaber | 0x004a0460 |
| InputBinary::ResetAnimation | 0x004a0fb0 |
| MdlNodeTriMesh::InternalPostProcess | 0x0043cf00 |
| MdlNodeTriMesh::InternalGenVertices | 0x00439df0 |
| MdlNodeTriMesh::InternalParseField | 0x004658b0 |
| MdlNodeEmitter::InternalParseField | 0x004658b0 |
| MdlNodeEmitter::InternalCreateInstance | 0x0049d5c0 |
| PartTriMesh::GetMinimumSphere | 0x00443330 |
| LightPartTriMesh | 0x0046a9e0 |
| NewController::Control | 0x00483330 |
| NewController::GetFloatValue | 0x00482bf0 |
| Model constructor | 0x0044aa70 |
| MaxTree constructor | 0x0044a900 |
| ParseNode | 0x004680e0 |
| Node type flag table | 0x00740a18 |