Architecture Guide

This document outlines how the rakata workspace is structured and the design principles we try to stick to.

Core Principles

Vanilla K1 First
- By default, we target the original vanilla behavior of KotOR 1.
- Compatibility with TSL or community tools is strictly opt-in behind feature flags, not the default assumption.
- When deciding how to parse something, the original game engine is our ultimate source of truth. We use local fixtures and original game data to prove our parsers work, rather than just copying how older community tools did things.

Aim for Lossless
- We want to be able to read a file and write it back out to the exact same bytes. We’ve largely achieved this for standard archives and data formats (GFF, ERF, RIM, KEY, TLK, etc.).
- For highly complex formats (like MDL/MDX models), there are some known divergences where achieving a byte-exact roundtrip is essentially impossible due to how the original compilers ordered geometry blocks. We track these exceptions, but the output still safely runs in-game.
- No Lazy Pass-throughs: If a file has undocumented fields, we don’t just read them as an opaque Vec<u8> blob and blindly pass them through. Our goal is to properly reverse-engineer and map every single struct boundary. However, if we identify defined “reserved” fields in the binary layout that we haven’t cracked the meaning of yet, we will map them as properly sized reserved values so we don’t accidentally drop data the engine might rely on. (Note: explicit blank padding bytes aren’t stored in memory at all - we just recalculate those dynamically on write).
- Layer scope: this lossless guarantee applies to the byte-level format layer (rakata-formats). Typed views in rakata-generics (Utc, Uti, Are, …) are explicitly honest projections that model only the fields they enumerate; byte-exact preservation stays with the raw Gff tree. See Typed Views and Raw GFF below for the full rule.
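
As a rough illustration of the reserved-field rule, here is a minimal sketch built around a made-up 12-byte header. The DemoHeader type, its layout, and the DEMO magic are hypothetical and do not correspond to any real rakata format; the point is only that an undeciphered field is kept as a properly sized value so the write path reproduces the input bytes.

```rust
/// Hypothetical 12-byte header: 4-byte magic, little-endian u32 count,
/// then 4 reserved bytes whose meaning is not yet known.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DemoHeader {
    pub entry_count: u32,
    /// Undeciphered, but stored at its real size so a rewrite cannot drop it.
    pub reserved: [u8; 4],
}

/// Parses the header, returning `None` on short input or a wrong magic.
pub fn read_demo_header(bytes: &[u8]) -> Option<DemoHeader> {
    if bytes.len() < 12 || &bytes[0..4] != b"DEMO" {
        return None;
    }
    let mut count = [0u8; 4];
    count.copy_from_slice(&bytes[4..8]);
    let mut reserved = [0u8; 4];
    reserved.copy_from_slice(&bytes[8..12]);
    Some(DemoHeader {
        entry_count: u32::from_le_bytes(count),
        reserved,
    })
}

/// Writes the header back out, including the reserved bytes verbatim.
pub fn write_demo_header(header: &DemoHeader) -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    out.extend_from_slice(b"DEMO");
    out.extend_from_slice(&header.entry_count.to_le_bytes());
    out.extend_from_slice(&header.reserved);
    out
}
```

Feeding the output of write_demo_header back through read_demo_header yields an equal value, and the bytes match the original input, which is the property the fixture-based roundtrip tests are meant to prove for real formats.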

Strict Text Handling
- All text decoding goes through rakata-core::text.
- Localized text (TLK entries, strings) uses language-aware encodings (Windows-1252, Shift-JIS, etc.) to match what the engine expects.
- Binary strings (like node names or texture paths) use TextEncoding::Windows1252, since that is what the engine uses under the hood. We do not silently strip unusual characters; decoding has to stay lossless.
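
To make the "no silent stripping" idea concrete, here is a standalone sketch of a lossless Windows-1252 round trip. It is not the rakata-core::text API: the function names are hypothetical, and mapping the five undefined bytes to their C1 control codes is an assumption borrowed from the WHATWG encoding tables, chosen so every byte survives a decode-then-encode cycle.

```rust
/// Unicode code points for bytes 0x80..=0x9F in Windows-1252.
/// The five bytes the code page leaves undefined (0x81, 0x8D, 0x8F, 0x90,
/// 0x9D) map to the matching C1 control code so they stay recoverable.
const HIGH_TABLE: [u16; 32] = [
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

/// Decodes every byte to exactly one character; nothing is dropped.
pub fn decode_windows_1252(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => {
                let cp = HIGH_TABLE[(b - 0x80) as usize] as u32;
                char::from_u32(cp).expect("table holds valid scalar values")
            }
            // ASCII and 0xA0..=0xFF share their code point with Unicode.
            _ => b as char,
        })
        .collect()
}

/// Encodes text back to bytes, or returns `None` if a character has no
/// Windows-1252 representation (instead of silently replacing it).
pub fn encode_windows_1252(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            let cp = c as u32;
            if let Some(i) = HIGH_TABLE.iter().position(|&t| t as u32 == cp) {
                Some(0x80 + i as u8)
            } else if cp < 0x80 || (0xA0..=0xFF).contains(&cp) {
                Some(cp as u8)
            } else {
                None
            }
        })
        .collect()
}
```

The encoder reports unrepresentable characters to the caller rather than substituting a placeholder, which keeps the decision about lossy fallbacks out of the decoding layer.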

(For day-to-day coding rules around iterators, zero-cost abstractions, and memory safety, see the Idiomatic Rust section in the contributing.md guide!)

Note: The layout below is a living target! Some of these crates (like rakata-audio and rakata-saveeditor) are currently under active development. As we tackle our near-term roadmap goals (like building out the rakata-lint validation engine), expect these existing crates to be fleshed out, alongside brand new sibling crates being added to the ecosystem.

The workspace is organized in a clean dependency chain. Crates can only depend on crates listed “above” them:

rakata-core          (no workspace deps)
  rakata-formats     (depends on: core)
    rakata-audio     (depends on: core, formats)
    rakata-generics  (depends on: core, formats)
    rakata-extract   (depends on: core, formats)
    rakata-lint      (depends on: formats, generics)
    rakata-save      (depends on: core, formats)
rakata               (facade: re-exports all library crates)

Library Crates (`crates/`)

rakata-core: The absolute basics (ResRef, IDs) and core utilities like file streams and text encoding.
rakata-formats: Our massive library of parsers and writers (GFF, ERF, BIF, MDL, TPC, etc.). This parses bytes into objects, but doesn’t know anything about how the game actually uses them.
rakata-audio: Audio streaming and decoding for the engine’s various sound formats (WAV, ADPCM, MP3).
rakata-generics: Strongly-typed Rust models for all the different GFF files (like Doors, Items, Characters).
rakata-extract: The logic for hunting down actual game files in the wild. It knows how to look inside ERFs, check the Override folder, and resolve files just like the engine does.
rakata-lint: Our rule engine for scanning modded files and checking them against vanilla schema constraints.
rakata-save: High-level logic for safely reading, editing, and backing up save files.
rakata: A handy facade crate that re-exports everything so you only need to add one dependency.

Tool Crates (`tools/`)

rakata-saveeditor: The actual desktop application for editing save files.
vanilla-inspector: A testing utility for validating our parsers against the actual mass of game files.

Format API Guidelines

Public API Shape

Every format parser in rakata-formats generally provides the same clean interface:

read_<fmt><R: Read>(reader: &mut R) -> Result<T, E>
read_<fmt>_from_bytes(bytes: &[u8]) -> Result<T, E>
write_<fmt><W: Write>(writer: &mut W, data: &T) -> Result<(), E>
write_<fmt>_to_vec(data: &T) -> Result<Vec<u8>, E>

Formats with multiple input or output modes (like exporting models to ASCII text or JSON) use variations of these names, e.g. read_mdl_ascii().

Generic Traits: We strongly prefer accepting generic I/O trait bounds (Read, BufRead, Write, Seek) over concrete types. Accept the narrowest trait that covers your API’s needs so callers aren’t forced to jump through hoops.
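
The four-function shape above might look like this for an imaginary "demo" format (a little-endian u32 count followed by that many u32 values). The Demo type and the format itself are invented for illustration, and std::io::Error stands in for the per-format error enum a real module would define.

```rust
use std::io::{self, Read, Write};

/// Hypothetical parsed value for the imaginary "demo" format.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Demo {
    pub values: Vec<u32>,
}

fn read_u32<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Streaming entry point. It only needs `Read`, the narrowest bound that works.
pub fn read_demo<R: Read>(reader: &mut R) -> io::Result<Demo> {
    let count = read_u32(reader)?;
    // Grow on demand instead of trusting an untrusted count for preallocation.
    let mut values = Vec::new();
    for _ in 0..count {
        values.push(read_u32(reader)?);
    }
    Ok(Demo { values })
}

/// Convenience wrapper: `&[u8]` already implements `Read`.
pub fn read_demo_from_bytes(bytes: &[u8]) -> io::Result<Demo> {
    let mut cursor = bytes;
    read_demo(&mut cursor)
}

/// Streaming writer. It only needs `Write`.
pub fn write_demo<W: Write>(writer: &mut W, data: &Demo) -> io::Result<()> {
    if data.values.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "too many values"));
    }
    writer.write_all(&(data.values.len() as u32).to_le_bytes())?;
    for value in &data.values {
        writer.write_all(&value.to_le_bytes())?;
    }
    Ok(())
}

/// Convenience wrapper that collects the output into an owned buffer.
pub fn write_demo_to_vec(data: &Demo) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    write_demo(&mut out, data)?;
    Ok(out)
}
```

Because the streaming functions are generic over Read and Write, the byte-slice and Vec variants are thin wrappers rather than separate code paths.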

Error Handling

Robust parsing means strict error boundaries:

Each format module must define its own domain-specific error enum (e.g., GffError, ErfError) using the thiserror crate. Do not use generic stringly-typed errors or Box<dyn Error>.
Low-level read failures (like sudden bounds exhaustion or bad magic numbers) should wrap our shared BinaryLayoutError.
Never unwrap() at an API boundary! Fail explicitly via Result. Reserve .expect() for cases that cannot fail, and state the reason in its message.
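
A sketch of the error layering described above. To stay dependency-free, the Display, Error, and From impls are written by hand here; in the workspace they would be generated by thiserror's derive. The variants shown for BinaryLayoutError and the DemoError enum are assumptions made for illustration, not the real definitions.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the shared low-level error; these variants are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum BinaryLayoutError {
    UnexpectedEof { needed: usize, available: usize },
    BadMagic { found: [u8; 4] },
}

impl fmt::Display for BinaryLayoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BinaryLayoutError::UnexpectedEof { needed, available } => {
                write!(f, "needed {} bytes but only {} available", needed, available)
            }
            BinaryLayoutError::BadMagic { found } => write!(f, "bad magic {:?}", found),
        }
    }
}

impl Error for BinaryLayoutError {}

/// Hypothetical per-format error enum that wraps the shared layout error.
#[derive(Debug, PartialEq, Eq)]
pub enum DemoError {
    Layout(BinaryLayoutError),
    UnsupportedVersion(u32),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Layout(inner) => write!(f, "invalid demo layout: {}", inner),
            DemoError::UnsupportedVersion(v) => write!(f, "unsupported demo version {}", v),
        }
    }
}

impl Error for DemoError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            DemoError::Layout(inner) => Some(inner),
            DemoError::UnsupportedVersion(_) => None,
        }
    }
}

impl From<BinaryLayoutError> for DemoError {
    fn from(inner: BinaryLayoutError) -> Self {
        DemoError::Layout(inner)
    }
}

/// Parses a made-up header (magic + u32 version) without any unwrap().
pub fn parse_demo_version(bytes: &[u8]) -> Result<u32, DemoError> {
    if bytes.len() < 8 {
        let err = BinaryLayoutError::UnexpectedEof { needed: 8, available: bytes.len() };
        return Err(err.into());
    }
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&bytes[0..4]);
    if &magic != b"DEMO" {
        return Err(BinaryLayoutError::BadMagic { found: magic }.into());
    }
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[4..8]);
    match u32::from_le_bytes(raw) {
        1 => Ok(1),
        other => Err(DemoError::UnsupportedVersion(other)),
    }
}
```

Callers can match on DemoError variants directly, and the source() chain still leads back to the low-level cause when one exists.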

Memory & Ownership

While we try to avoid deep cloning and heavy allocations behind the scenes, we default to owned data types when crossing public API boundaries. Unless a module is explicitly built and documented as a zero-copy “View” type, you should avoid passing nasty lifetimes into the caller’s lap.

Keeping Concerns Separated

Dumb Parsers: Format modules in rakata-formats are intentionally “dumb”. They solely translate between raw byte streams and Rust structs without any awareness of game architecture, filesystems, or what a “module” is.
Smart Extractors: All the messy environment logic – hunting down loose files, enforcing vanilla precedence rules (e.g., checking the Override folder before extracting from a BIF archive), and assembling composite files – lives safely isolated inside rakata-extract. This separation guarantees our parsers can cleanly process isolated test files just as well as they operate in a massive live-game workflow.

Tracing & Telemetry

We strongly encourage instrumenting format parsers with tracing::instrument spans to help pinpoint exactly where a badly formed file breaks during a parse. However, this telemetry must remain entirely zero-cost for consumers who don’t need it! We achieve this by wrapping public parser entry points in conditional attributes: #[cfg_attr(feature = "tracing", tracing::instrument(...))]. If a user doesn’t explicitly opt in via their Cargo.toml, the attribute is never applied and no instrumentation is compiled in.
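
A minimal sketch of that pattern on a hypothetical entry point. The function name and the specific instrument arguments (level, skip) are illustrative choices, and the sketch assumes the crate declares a tracing feature that enables an optional tracing dependency.

```rust
use std::io::{self, Read};

/// Hypothetical parser entry point. With the `tracing` feature off, the
/// `cfg_attr` expands to nothing, so the `tracing` crate is never referenced.
#[cfg_attr(
    feature = "tracing",
    tracing::instrument(level = "debug", skip(reader))
)]
pub fn read_demo_magic<R: Read>(reader: &mut R) -> io::Result<[u8; 4]> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    Ok(magic)
}
```

Skipping the reader argument keeps the span from requiring a Debug impl on the caller's I/O type, which would otherwise leak an extra bound into the public signature.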

Serialization (Serde)

Just like tracing, serde support for exporting our parsed files to JSON or YAML must be treated as a zero-cost, opt-in feature. Format structs and types should generously derive Serialize and Deserialize when the serde feature flag is enabled. This allows downstream utilities (like the Save Editor) to effortlessly convert memory layouts into text formats, while ensuring the core parsers stay extremely light for purely binary-focused applications.

Beyond Basic Parsing

While rakata-formats gives us the ability to parse isolated bytes, the game engine is much more complicated. Our higher-level crates exist to bridge that gap between “dumb bytes” and “actual game logic”.

Finding Files (`rakata-extract`)

rakata-extract handles the messy reality of finding files scattered across a massive KOTOR installation. It mirrors the vanilla engine’s lookup hierarchy in three distinct layers:

Primitives: Grabbing a file out of a single archive (like unpacking a standalone ERF or BIF file).
Composition: Treating related archive sets as a single “Module” (like grouping a .mod file with its matching _s.rim and _dlg.erf files so they load transparently together).
Game-wide: Creating a unified GameResources tree that maps out the entire game installation.

Because we want our extraction to perfectly mirror vanilla behavior, lookups are strictly case-insensitive, and loading precedence is explicitly designed to mirror how the original game works (so a file in the Override folder automatically beats a file buried in a BIF archive).
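
The lookup rules in that paragraph can be sketched with a toy index. DemoResources and Source are hypothetical types, not the rakata-extract API, and only the two layers named in the text (Override and BIF) are modelled; the rest of the engine's precedence order is left out.

```rust
use std::collections::HashMap;

/// Where a resource was found. Only "Override beats BIF" is taken from the
/// text; other layers of the real lookup order are not modelled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Override,
    Bif,
}

/// Toy game-wide index with one map per layer, keyed by lowercased name.
#[derive(Default)]
pub struct DemoResources {
    override_dir: HashMap<String, Vec<u8>>,
    bif: HashMap<String, Vec<u8>>,
}

impl DemoResources {
    /// Normalizes names once so every lookup is case-insensitive.
    fn key(name: &str) -> String {
        name.to_ascii_lowercase()
    }

    pub fn add(&mut self, source: Source, name: &str, data: Vec<u8>) {
        let layer = match source {
            Source::Override => &mut self.override_dir,
            Source::Bif => &mut self.bif,
        };
        layer.insert(Self::key(name), data);
    }

    /// Checks the Override layer first and falls back to the BIF layer.
    pub fn resolve(&self, name: &str) -> Option<(Source, &[u8])> {
        let key = Self::key(name);
        self.override_dir
            .get(&key)
            .map(|data| (Source::Override, data.as_slice()))
            .or_else(|| self.bif.get(&key).map(|data| (Source::Bif, data.as_slice())))
    }
}
```

Returning the Source alongside the bytes lets a caller (a linter, for instance) report which layer won, not just what the winning content was.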

Strongly-Typed Data (`rakata-generics`)

When we parse a .utc Character file, rakata-formats just hands us a raw GFF tree of untyped labels and values. rakata-generics wraps those raw data blobs in strongly-typed Rust structs (Utc, Uti, Are, Git, Dlg, Ifo, and friends). This means that if a developer needs to access a character’s “Strength” stat, they get a u8 property rather than blindly guessing string handles inside a raw binary tree.

Typed Views and Raw GFF

These typed structs sit beside the raw Gff tree, not on top of it. They are projections, not replacements. You construct one with Uti::from_gff(&gff) and round-trip back with uti.to_gff(); the original Gff stays accessible the whole time.

The projection layer follows one load-bearing rule: model what’s enumerated; drop what isn’t. from_gff extracts the fields each typed view documents and silently ignores anything else; to_gff writes only those documented fields. There is intentionally no extra_fields: Vec<GffField> accumulator on Utc / Uti / Are / etc. that would round-trip unmodelled fields through the typed layer.

The reason is correctness. Unmodelled fields often depend semantically on neighbouring fields (a savegame’s animation state only makes sense at the exact moment of save; a toolset’s custom annotations describe a specific revision). If the typed view silently preserved them while a caller edited a modelled field, the output would be internally inconsistent. The staleness contract is real, but it belongs explicitly with whoever needs byte-exact preservation, not buried inside a layer whose only job is type-safe access to known fields.

This naturally splits into two audiences served by one storage layer:

Tools (save editor, mod linter, format inspector) reach for the typed views. They want type-safe access to known fields and don’t care about unmodelled bytes.
Engines or byte-fidelity workflows (a future engine shim, a roundtrip auditor, anyone preserving toolset annotations) work directly with the raw Gff from rakata-formats. They own the staleness contract explicitly.
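
The "model what's enumerated; drop what isn't" rule can be sketched with toy types. ToyGff is a flat map standing in for the real Gff tree, ToyUtc enumerates just two fields, and the labels and the Option return type are illustrative assumptions rather than the real from_gff / to_gff signatures.

```rust
use std::collections::BTreeMap;

/// Toy field value; the real GFF tree has many more field kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Byte(u8),
    Text(String),
}

/// Toy stand-in for the raw GFF tree: a flat label -> value map.
pub type ToyGff = BTreeMap<String, Value>;

/// Toy typed view that enumerates exactly two fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyUtc {
    pub tag: String,
    pub strength: u8,
}

impl ToyUtc {
    /// Extracts the enumerated fields and ignores everything else.
    pub fn from_gff(gff: &ToyGff) -> Option<Self> {
        let tag = match gff.get("Tag")? {
            Value::Text(s) => s.clone(),
            _ => return None,
        };
        let strength = match gff.get("Str")? {
            Value::Byte(b) => *b,
            _ => return None,
        };
        Some(ToyUtc { tag, strength })
    }

    /// Writes only the enumerated fields; there is no extra-fields accumulator.
    pub fn to_gff(&self) -> ToyGff {
        let mut gff = ToyGff::new();
        gff.insert("Tag".to_string(), Value::Text(self.tag.clone()));
        gff.insert("Str".to_string(), Value::Byte(self.strength));
        gff
    }
}
```

In this sketch an unmodelled label in the input never appears in the output of to_gff, while the original ToyGff value is untouched and still holds it, which mirrors the split between the typed-view audience and the byte-fidelity audience.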

The one place where projection meets enumeration-by-design is rakata_generics::decoded::DecodedProperty. UTI item properties carry a PropertyName that indexes into itempropdef.2da, a table mods can extend with new rows. The enum has an Unknown variant that preserves the raw fields for one entry within an enumerated list, so an unrecognized property kind still surfaces through the decode pass instead of being dropped. It is a per-entry catch-all, not a struct-level accumulator, and the staleness risk is low because property entries are independent records.
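
A per-entry catch-all of this kind might be shaped roughly as follows. The type names, the raw field set, and the single AbilityBonus variant are hypothetical; the real DecodedProperty enum and its dispatch against itempropdef.2da are more involved.

```rust
/// Raw fields of one item-property entry. The field set is illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawProperty {
    pub property_name: u16,
    pub subtype: u16,
    pub cost_value: u16,
}

/// Toy decoded property with a per-entry catch-all variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToyDecodedProperty {
    AbilityBonus { ability: u16, amount: u16 },
    /// Unrecognized kind: the raw fields are kept so the entry still surfaces.
    Unknown(RawProperty),
}

/// Decodes one entry. `ability_bonus_row` stands in for a row lookup in the
/// property-kind dispatch table.
pub fn decode_property(raw: &RawProperty, ability_bonus_row: u16) -> ToyDecodedProperty {
    if raw.property_name == ability_bonus_row {
        ToyDecodedProperty::AbilityBonus {
            ability: raw.subtype,
            amount: raw.cost_value,
        }
    } else {
        ToyDecodedProperty::Unknown(raw.clone())
    }
}
```

Each entry decodes independently, so an Unknown entry sitting next to recognized ones carries no hidden dependency on them.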

When in doubt: if you need byte-exact preservation across a parse-then-write cycle, work with the raw Gff. If you need ergonomic, type-safe access to the fields Rakata has audited, work with the typed view.

Decoded Views: Projection and Snapshot

The typed structs (Uti, Utc, etc.) bring file-native fields into Rust types. A second layer on top, the decoded view, resolves those file-native fields against external context. For UTI, that context is the 2DA tables the engine consults at item-property evaluation time: itempropdef.2da for property-kind dispatch, baseitems.2da for combat / equip metadata, the iprp_* cost tables for magnitude resolution.

A decoded view splits into two stages so cross-scope analysis is a first-class operation:

Uti::project(itempropdef) -> UtiProjection<'_>. File-native typed-variant dispatch. Cheap, scope-free, takes only the minimal context (the property-kind dispatch table) needed to pick variants. The projection is the intermediate from which one or many snapshots are built.
UtiProjection::snapshot(&mut TwoDaCache) -> UtiSnapshot<'_>. Resolves the projection against a full per-scope context. Loads every table the snapshot’s query methods could need and caches the resolved values. All query methods on the snapshot take &self and read from those cached values without borrowing the cache again.
Uti::snapshot(&mut TwoDaCache) -> UtiSnapshot<'_>. Single-scope shortcut for project(...).snapshot(...). Most callers want this.

The split exists because tools, the linter, and a future engine shim want to ask “what does this UTI look like under condition X” without re-running the file-native dispatch step for each context. Mod conflict analysis (does this item resolve differently with mod A loaded?), vanilla-vs-modded diffs, and the upcoming save / module VFS scoping (Codeberg #27, #28) all reduce to “build one projection, snapshot under several contexts, compare.” The projection step is shared across snapshots; only the per-scope resolution repeats.

Snapshots do not retain the cache borrow once constructed. To query a snapshot under a different scope, call projection.snapshot(&mut other_cache) again on the same projection. The typed-variant dispatch is not redone.
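
The "build one projection, snapshot under several contexts, compare" workflow can be sketched with toy types. Everything here (ToyUti, ToyCache, ToyProjection, ToySnapshot, and the single base-item lookup) is a simplified stand-in for the real Uti / TwoDaCache / UtiProjection / UtiSnapshot types, and the toy snapshot owns its data, which is one assumed way of not retaining the cache borrow.

```rust
use std::collections::HashMap;

/// Toy per-scope 2DA cache: table name -> row index -> label.
#[derive(Default)]
pub struct ToyCache {
    tables: HashMap<String, Vec<String>>,
    lookups: usize,
}

impl ToyCache {
    pub fn with_table(mut self, name: &str, rows: &[&str]) -> Self {
        let rows = rows.iter().map(|r| r.to_string()).collect();
        self.tables.insert(name.to_string(), rows);
        self
    }

    /// Takes `&mut self` because a real cache would load tables lazily.
    fn label(&mut self, table: &str, row: usize) -> Option<String> {
        self.lookups += 1;
        self.tables.get(table)?.get(row).cloned()
    }
}

/// Toy typed struct with one file-native field.
pub struct ToyUti {
    pub base_item: usize,
}

/// Stage 1: file-native view, built once and shared across snapshots.
pub struct ToyProjection<'a> {
    uti: &'a ToyUti,
}

/// Stage 2: values resolved under one scope. It owns its data, so the cache
/// is free again as soon as `snapshot` returns.
#[derive(Debug, PartialEq, Eq)]
pub struct ToySnapshot {
    base_item_label: Option<String>,
}

impl ToyUti {
    pub fn project(&self) -> ToyProjection<'_> {
        ToyProjection { uti: self }
    }
}

impl<'a> ToyProjection<'a> {
    pub fn snapshot(&self, cache: &mut ToyCache) -> ToySnapshot {
        ToySnapshot {
            base_item_label: cache.label("baseitems", self.uti.base_item),
        }
    }
}

impl ToySnapshot {
    /// Pure read against the cached value; the cache is not needed here.
    pub fn base_item_label(&self) -> Option<&str> {
        self.base_item_label.as_deref()
    }
}
```

With one projection and two caches (say, a vanilla scope and a modded scope), the two resulting snapshots can be compared directly, which is the shape a mod-conflict check would take.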

The cost-table magnitude resolution recipe each UTI snapshot bakes in is documented in the Cost-Table Magnitude Resolution subsection of the UTI engine audit: which iprp_costtable.2da index every typed property kind dispatches through, which column carries the magnitude, and which handlers bypass the dispatch chain entirely.

UTC follows the same shape with format-specific differences: Utc::project() takes no minimal context (UTC has no single dispatch table; typed list dispatch happens at snapshot time against per-list 2DAs), while UtcProjection::snapshot(&mut TwoDaCache) loads racialtypes.2da / appearance.2da / portraits.2da / soundset.2da / classes.2da / spells.2da and caches scalar-id resolutions, typed DecodedClass variants, and typed DecodedSpecialAbility variants. UtcSnapshot exposes the same &self borrow-free query surface (race_label, classes, total_level, is_force_user, is_droid, has_class, special_abilities, equipment, inventory, etc.). AreSnapshot and any future generic that grows a decoded view follow the same two-stage rule.

High-Level Interaction (`rakata-save` & `rakata-lint`)

Finally, crates at the top of the stack use our extraction logic and strongly typed generic structs to actually do things. rakata-lint compares typed structs against vanilla constraints to catch modding errors, while rakata-save gracefully handles unpacking, editing, and re-compressing massive save-game directories without corrupting the player’s campaign!
