Architecture Guide
This document outlines how the rakata workspace is structured and the design principles we try to stick to.
Core Principles
- Vanilla K1 First
  - By default, we target the original vanilla behavior of KotOR 1.
  - Compatibility for TSL or community tools is strictly opt-in behind feature flags, not the default assumption.
  - When deciding how to parse something, the original game engine is our ultimate source of truth. We use local fixtures and original game data to prove our parsers work, rather than just copying how older community tools did things.
- Aim for Lossless
  - We want to be able to read a file and write it back out to the exact same bytes. We’ve largely achieved this for standard archives and data formats (GFF, ERF, RIM, KEY, TLK, etc.).
  - For highly complex formats (like MDL/MDX models), there are some known divergences where achieving a byte-exact roundtrip is essentially impossible due to how the original compilers ordered geometry blocks. We track these exceptions, but the output still safely runs in-game.
  - No Lazy Pass-throughs: If a file has undocumented fields, we don’t just read them as an opaque Vec<u8> blob and blindly pass them through. Our goal is to properly reverse-engineer and map every single struct boundary. However, if we identify defined “reserved” fields in the binary layout that we haven’t cracked the meaning of yet, we will map them as properly sized reserved values so we don’t accidentally drop data the engine might rely on. (Note: explicit blank padding bytes aren’t stored in memory at all - we just recalculate those dynamically on write).
- Strict Text Handling
  - All text decoding goes through rakata-core::text.
  - Localized text (TLK entries, strings) uses language-aware encodings (Windows-1252, Shift-JIS, etc.) to match what the engine expects.
  - Binary strings (like node names or texture paths) use TextEncoding::Windows1252 since that’s what the engine actually uses under the hood. No silently stripping weird characters with lossless backups.
(For day-to-day coding rules around iterators, zero-cost abstractions, and memory safety, see the Idiomatic Rust section in the contributing.md guide!)
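To make the Aim for Lossless principle a bit more concrete, here is a minimal sketch of a byte-exact roundtrip. The ToyRecord layout (length, one reserved u32, payload, zero padding to a 4-byte boundary) is invented for illustration and is not a real KotOR format. The reserved field is kept as a sized value, while the padding is dropped on read and recomputed on write.

```rust
use std::io::{self, Read, Write};

/// Hypothetical toy record, invented for this sketch (not a real KotOR format).
#[derive(Debug, PartialEq)]
struct ToyRecord {
    /// Meaning unknown: stored as a properly sized value so it survives a rewrite.
    reserved: u32,
    payload: Vec<u8>,
}

/// Number of zero bytes needed to align the payload to a 4-byte boundary.
fn pad_len(payload_len: usize) -> usize {
    (4 - payload_len % 4) % 4
}

fn read_toy<R: Read>(reader: &mut R) -> io::Result<ToyRecord> {
    let mut header = [0u8; 8];
    reader.read_exact(&mut header)?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let reserved = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);

    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;

    // Padding is consumed but never stored in memory.
    let mut pad = [0u8; 3];
    reader.read_exact(&mut pad[..pad_len(len)])?;

    Ok(ToyRecord { reserved, payload })
}

fn write_toy<W: Write>(writer: &mut W, record: &ToyRecord) -> io::Result<()> {
    writer.write_all(&(record.payload.len() as u32).to_le_bytes())?;
    writer.write_all(&record.reserved.to_le_bytes())?;
    writer.write_all(&record.payload)?;
    // Padding is recalculated from the payload length on every write.
    writer.write_all(&[0u8; 3][..pad_len(record.payload.len())])?;
    Ok(())
}
```

Reading a buffer with read_toy and passing the result to write_toy should reproduce the input bytes, as long as the original padding was zero-filled.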
Workspace Boundaries
Note: This layout is a living target! Some of these crates (like rakata-audio and rakata-saveeditor) are currently under active development. As we tackle our near-term roadmap goals – like building out the rakata-lint validation engine – expect these existing crates to flesh out, alongside brand new sibling crates being added to the ecosystem.
The workspace is organized in a clean dependency chain. Crates can only depend on crates listed “above” them:
rakata-core (no workspace deps)
rakata-formats (depends on: core)
rakata-audio (depends on: core, formats)
rakata-generics (depends on: core, formats)
rakata-extract (depends on: core, formats)
rakata-lint (depends on: formats, generics)
rakata-save (depends on: core, formats)
rakata (facade: re-exports all library crates)
Library Crates (crates/)
- rakata-core: The absolute basics (ResRef, IDs) and core utilities like file streams and text encoding.
- rakata-formats: Our massive library of parsers and writers (GFF, ERF, BIF, MDL, TPC, etc.). This parses bytes into objects, but doesn’t know anything about how the game actually uses them.
- rakata-audio: Audio streaming and decoding for the engine’s various sound formats (WAV, ADPCM, MP3).
- rakata-generics: Strongly-typed Rust models for all the different GFF files (like Doors, Items, Characters).
- rakata-extract: The logic for hunting down actual game files in the wild. It knows how to look inside ERFs, check the Override folder, and resolve files just like the engine does.
- rakata-lint: Our rule engine for scanning modded files and checking them against vanilla schema constraints.
- rakata-save: High-level logic for safely reading, editing, and backing up save files.
- rakata: A handy facade crate that re-exports everything so you only need to add one dependency.
Tool Crates (tools/)
- rakata-saveeditor: The actual desktop application for editing save files.
- vanilla-inspector: A testing utility for validating our parsers against the actual mass of game files.
Format API Guidelines
Public API Shape
Every format parser in rakata-formats generally provides the same clean interface:
- read_<fmt><R: Read>(reader: &mut R) -> Result<T, E>
- read_<fmt>_from_bytes(bytes: &[u8]) -> Result<T, E>
- write_<fmt><W: Write>(writer: &mut W, data: &T) -> Result<(), E>
- write_<fmt>_to_vec(data: &T) -> Result<Vec<u8>, E>
Formats with multiple output modes (like exporting models to ASCII text or JSON) just use variations of these names (read_mdl_ascii()).
- Generic Traits: We strongly prefer accepting generic I/O trait bounds (Read, BufRead, Write, Seek) over concrete types. Accept the narrowest trait that covers your API’s needs so callers aren’t forced to jump through hoops.
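As a rough illustration of the four-function shape described above, here is a sketch for a made-up format. The Tag struct, TagError enum, and the "TAG " magic are all hypothetical names invented for this example; the real modules in rakata-formats may differ in detail.

```rust
use std::io::{self, Read, Write};

/// Hypothetical format: a 4-byte magic followed by one little-endian u16.
#[derive(Debug, PartialEq)]
pub struct Tag {
    pub value: u16,
}

/// Hypothetical domain error for the toy format.
#[derive(Debug)]
pub enum TagError {
    Io(io::Error),
    BadMagic([u8; 4]),
}

impl From<io::Error> for TagError {
    fn from(err: io::Error) -> Self {
        TagError::Io(err)
    }
}

const MAGIC: [u8; 4] = *b"TAG ";

/// Only `Read` is required: the narrowest bound that covers this parser.
pub fn read_tag<R: Read>(reader: &mut R) -> Result<Tag, TagError> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(TagError::BadMagic(magic));
    }
    let mut value = [0u8; 2];
    reader.read_exact(&mut value)?;
    Ok(Tag { value: u16::from_le_bytes(value) })
}

/// Convenience wrapper: a byte slice already implements `Read`.
pub fn read_tag_from_bytes(bytes: &[u8]) -> Result<Tag, TagError> {
    let mut cursor = bytes;
    read_tag(&mut cursor)
}

pub fn write_tag<W: Write>(writer: &mut W, data: &Tag) -> Result<(), TagError> {
    writer.write_all(&MAGIC)?;
    writer.write_all(&data.value.to_le_bytes())?;
    Ok(())
}

/// Convenience wrapper: `Vec<u8>` already implements `Write`.
pub fn write_tag_to_vec(data: &Tag) -> Result<Vec<u8>, TagError> {
    let mut out = Vec::new();
    write_tag(&mut out, data)?;
    Ok(out)
}
```

Because the streaming functions take any Read or Write, the same parser works on files, sockets, and in-memory buffers, and the two convenience wrappers stay one-liners.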
Error Handling
Robust parsing means strict error boundaries:
- Each format module must define its own domain-specific error enum (e.g., GffError, ErfError) using the thiserror crate. Do not use generic stringly-typed errors or Box<dyn Error>.
- Low-level read failures (like sudden bounds exhaustion or bad magic numbers) should wrap our shared BinaryLayoutError.
- Never unwrap() at an API boundary! Only fail explicitly via Result or use .expect() with a hardcoded rationale if it is impossible to fail.
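The rules above might look roughly like this in practice. To keep the sketch dependency-free, the Display and Error impls are written by hand; in the real crates the thiserror derive would generate them. DemoError and parse_demo are hypothetical, and the single BinaryLayoutError variant shown here is an assumption about what the shared type could contain.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the shared low-level error; this variant is an assumption.
#[derive(Debug, PartialEq)]
pub enum BinaryLayoutError {
    UnexpectedEof { needed: usize, available: usize },
}

impl fmt::Display for BinaryLayoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnexpectedEof { needed, available } => {
                write!(f, "needed {needed} bytes, only {available} available")
            }
        }
    }
}

impl Error for BinaryLayoutError {}

/// Hypothetical domain-specific error for a toy "DEMO" format.
#[derive(Debug, PartialEq)]
pub enum DemoError {
    Layout(BinaryLayoutError),
    BadMagic([u8; 4]),
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Layout(err) => write!(f, "layout error: {err}"),
            Self::BadMagic(magic) => write!(f, "bad magic: {magic:?}"),
        }
    }
}

impl Error for DemoError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Layout(err) => Some(err),
            Self::BadMagic(_) => None,
        }
    }
}

impl From<BinaryLayoutError> for DemoError {
    fn from(err: BinaryLayoutError) -> Self {
        Self::Layout(err)
    }
}

/// Returns the version byte that follows the magic. Never panics on bad input.
pub fn parse_demo(bytes: &[u8]) -> Result<u8, DemoError> {
    let eof = |needed| BinaryLayoutError::UnexpectedEof { needed, available: bytes.len() };
    let head = bytes.get(..4).ok_or_else(|| eof(4))?;
    let magic = [head[0], head[1], head[2], head[3]];
    if magic != *b"DEMO" {
        return Err(DemoError::BadMagic(magic));
    }
    Ok(*bytes.get(4).ok_or_else(|| eof(5))?)
}
```

Callers can match on DemoError variants directly, and the low-level cause stays reachable through source() instead of being flattened into a string.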
Memory & Ownership
While we try to avoid deep cloning and heavy allocations behind the scenes, we default to owned data types when crossing public API boundaries. Unless a module is explicitly built and documented as a zero-copy “View” type, you should avoid passing nasty lifetimes into the caller’s lap.
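A small sketch of the difference, using hypothetical Label and LabelView types and an assumed 16-byte, NUL-padded label layout. For brevity it accepts ASCII only, whereas the real crates decode binary strings as Windows-1252.

```rust
/// Owned result: the default shape for anything crossing a public boundary.
#[derive(Debug, PartialEq)]
pub struct Label {
    pub name: String,
}

/// Explicit zero-copy view: borrows from the input buffer and says so in its name.
#[derive(Debug, PartialEq)]
pub struct LabelView<'a> {
    pub name: &'a str,
}

/// Borrowing variant. Returns None for non-ASCII input (a simplification).
pub fn view_label(bytes: &[u8; 16]) -> Option<LabelView<'_>> {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    let raw = &bytes[..end];
    if !raw.is_ascii() {
        return None;
    }
    std::str::from_utf8(raw).ok().map(|name| LabelView { name })
}

/// Owned variant: the caller gets a value with no lifetime attached.
pub fn read_label(bytes: &[u8; 16]) -> Option<Label> {
    view_label(bytes).map(|view| Label { name: view.name.to_owned() })
}
```

The owned Label can outlive the buffer it was parsed from, while LabelView ties the caller to that buffer, which is why views should be an explicit, documented choice.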
Keeping Concerns Separated
- Dumb Parsers: Format modules in rakata-formats are intentionally “dumb”. They solely translate between raw byte streams and Rust structs without any awareness of game architecture, filesystems, or what a “module” is.
- Smart Extractors: All the messy environment logic – hunting down loose files, enforcing vanilla precedence rules (e.g., checking the Override folder before extracting from a BIF archive), and assembling composite files – lives safely isolated inside rakata-extract. This separation guarantees our parsers can cleanly process isolated test files just as well as they operate in a massive live-game workflow.
Tracing & Telemetry
We strongly encourage instrumenting format parsers with tracing::instrument spans to help pinpoint exactly where a badly formed file breaks during a parse. However, this telemetry must remain entirely zero-cost for consumers who don’t need it! We achieve this by wrapping public parser entry points in conditional attributes: #[cfg_attr(feature = "tracing", tracing::instrument(...))]. If a user doesn’t explicitly opt-in via their Cargo.toml, the Rust compiler strips the instrumentation entirely.
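A minimal sketch of the conditional attribute, assuming the crate declares a tracing feature in its Cargo.toml that enables an optional tracing dependency. The function name and the TooShort error are hypothetical. With the feature off, the cfg_attr line expands to nothing, so this compiles without the tracing crate being present at all.

```rust
/// Hypothetical error for the toy parser below.
#[derive(Debug, PartialEq)]
pub struct TooShort;

/// Hypothetical parser entry point. The span is only generated when the
/// (assumed) `tracing` feature is enabled; `skip_all` keeps the raw bytes
/// out of the recorded span fields.
#[cfg_attr(feature = "tracing", tracing::instrument(skip_all))]
pub fn read_demo_from_bytes(bytes: &[u8]) -> Result<u32, TooShort> {
    match bytes {
        [a, b, c, d, ..] => Ok(u32::from_le_bytes([*a, *b, *c, *d])),
        _ => Err(TooShort),
    }
}
```

Only public entry points carry the attribute here; instrumenting inner helpers is a judgment call that depends on how hot the code path is.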
Serialization (Serde)
Just like tracing, serde support for exporting our parsed files to JSON or YAML must be treated as a zero-cost, opt-in feature. Format structs and types should generously derive Serialize and Deserialize when the serde feature flag is enabled. This allows downstream utilities (like the Save Editor) to effortlessly convert memory layouts into text formats, while ensuring the core parsers stay extremely light for purely binary-focused applications.
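For derives, the same opt-in idea could be sketched like this. DemoHeader is a hypothetical struct, and the serde feature is assumed to enable an optional serde dependency with its derive macros. Without the feature, the struct is a plain Rust type with no serialization code attached.

```rust
/// Hypothetical format struct. The Serialize/Deserialize derives are only
/// emitted when the (assumed) `serde` feature is enabled.
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct DemoHeader {
    pub version: u32,
    pub entry_count: u32,
}
```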
Beyond Basic Parsing
While rakata-formats gives us the ability to parse isolated bytes, the game engine is much more complicated. Our higher-level crates exist to bridge that gap between “dumb bytes” and “actual game logic”.
Finding Files (rakata-extract)
rakata-extract handles the messy reality of finding files scattered across a massive KotOR installation. It mirrors the vanilla engine’s lookup hierarchy in three distinct layers:
- Primitives: Grabbing a file out of a single archive (like unpacking a standalone ERF or BIF file).
- Composition: Treating related archive sets as a single “Module” (like grouping a .mod file with its matching _s.rim and _dlg.erf files so they load transparently together).
- Game-wide: Creating a unified GameResources tree that maps out the entire game installation.
Because we want our extraction to perfectly mirror vanilla behavior, lookups are strictly case-insensitive, and loading precedence is explicitly designed to mirror how the original game works (so a file in the Override folder automatically beats a file buried in a BIF archive).
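The two rules in this paragraph (case-insensitive names, Override beats BIF) can be sketched with a small index. ResourceIndex and Source are hypothetical names, and only the two sources mentioned above are modeled; the actual GameResources tree is richer than this.

```rust
use std::collections::HashMap;

/// Where a resource was found. Variant order encodes precedence in this
/// sketch: later variants win over earlier ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Bif,
    Override,
}

/// Hypothetical game-wide index keyed on the lowercased resource name.
#[derive(Default)]
pub struct ResourceIndex {
    entries: HashMap<String, Source>,
}

impl ResourceIndex {
    /// Registers a resource, keeping whichever source has higher precedence.
    pub fn register(&mut self, name: &str, source: Source) {
        let slot = self.entries.entry(name.to_ascii_lowercase()).or_insert(source);
        if source > *slot {
            *slot = source;
        }
    }

    /// Case-insensitive lookup of the winning source for a resource.
    pub fn resolve(&self, name: &str) -> Option<Source> {
        self.entries.get(&name.to_ascii_lowercase()).copied()
    }
}
```

Registration order does not matter in this design: an Override entry wins whether it is seen before or after the BIF entry with the same name.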
Strongly-Typed Data (rakata-generics)
When we parse a .utc Character file, rakata-formats just hands us a raw GFF tree of untyped labels and values. rakata-generics wraps those raw data blobs in strongly-typed Rust structs (like Character, Item, Door). This guarantees that if a developer needs to access a character’s “Strength” stat, they get a guaranteed u8 property rather than blindly guessing string handles inside a raw binary tree.
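As a rough sketch of that wrapping step: the GffValue enum below is a two-variant stand-in for the real untyped tree, and the Character fields, the "Str" and "Tag" labels, and CharacterError are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Minimal stand-in for an untyped GFF field value (the real tree has many more kinds).
#[derive(Debug, Clone, PartialEq)]
pub enum GffValue {
    Byte(u8),
    Text(String),
}

/// Stand-in for a raw GFF struct: labels mapped to untyped values.
pub type GffStruct = HashMap<String, GffValue>;

/// Hypothetical conversion error.
#[derive(Debug, PartialEq)]
pub enum CharacterError {
    MissingField(&'static str),
    WrongType(&'static str),
}

/// Typed model: field types are checked once, at conversion time.
#[derive(Debug, PartialEq)]
pub struct Character {
    pub tag: String,
    pub strength: u8,
}

impl Character {
    pub fn from_gff(raw: &GffStruct) -> Result<Self, CharacterError> {
        let tag = match raw.get("Tag") {
            Some(GffValue::Text(text)) => text.clone(),
            Some(_) => return Err(CharacterError::WrongType("Tag")),
            None => return Err(CharacterError::MissingField("Tag")),
        };
        let strength = match raw.get("Str") {
            Some(GffValue::Byte(value)) => *value,
            Some(_) => return Err(CharacterError::WrongType("Str")),
            None => return Err(CharacterError::MissingField("Str")),
        };
        Ok(Character { tag, strength })
    }
}
```

After conversion, callers read character.strength as a plain u8, and a wrong label or wrong field type surfaces once as a CharacterError instead of at every access site.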
High-Level Interaction (rakata-save & rakata-lint)
Finally, crates at the top of the stack use our extraction logic and strongly typed generic structs to actually do things. rakata-lint compares typed structs against vanilla constraints to catch modding errors, while rakata-save gracefully handles unpacking, editing, and re-compressing massive save-game directories without corrupting the player’s campaign!