MARS Technical Architecture
Deep dive into the forensic recovery pipeline
System Overview
The MARS architecture follows a two-phase approach: first building a reference knowledge base (exemplars), then using that knowledge to classify and recover data from damaged or carved databases (candidates).
High-Level Architecture
┌─────────────────────────────────────────────────────────┐
│ EXEMPLAR PHASE │
│ (Known-Good macOS System → Reference Knowledge Base) │
└─────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────┐
│ Scan Live System / Image │
│ Extract Database Schemas │
└──────────────────────────────────┘
│
▼
┌──────────────────────────────────┐
│ Generate Rubrics (JSON) │
│ Create Hash Lookup Table │
└──────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ CANDIDATE PHASE │
│ (Carved Files → Classified & Recovered Data) │
└─────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────┐
│ File Categorization │
│ (raw_scanner/file_categorizer) │
└──────────────────────────────────┘
│
▼
┌──────────────────────────────────┐
│ Database Variant Selection │
│ (O/C/R/D/X variant testing) │
└──────────────────────────────────┘
│
┌──────────┴──────────┐
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Variant X │ │ Valid Variants │
│ (Empty/Failed) │ │ (O/C/R/D) │
└──────────────────┘ └──────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Byte Carving │ │ LF Processor │
│ (Extract raw) │ │ (4 modes) │
└──────────────────┘ └──────────────────┘
│ │
└──────────┬──────────┘
▼
┌──────────────────────────────────┐
│ Final Output Organization │
│ (catalog/, metamatches/, etc.) │
└──────────────────────────────────┘
Data Flow: Exemplar → Candidate
The system operates in two phases:

Exemplar Phase

Scans a known-good macOS system to extract database schemas and create matching rubrics. This builds the "source of truth" for identifying carved databases.

- Input: Live macOS system, disk image, or directory
- Output: Database schemas, JSON rubrics, hash lookup table
- Key Artifact: exemplar_hash_lookup.json enables O(1) matching

Candidate Phase

Processes carved/recovered files and matches them against exemplar schemas. Recovers data from damaged databases using multiple strategies.

- Input: Carved files from PhotoRec, Scalpel, etc.
- Output: Classified databases, reconstructed tables, carved data
Key Terminology
| Term | Definition |
|---|---|
| Exemplar | Reference database from known-good system; provides schema baseline |
| Candidate | Carved/recovered database being analyzed against exemplars |
| Rubric | JSON schema definition with table structures, column roles, and statistics |
| Variant | Recovery method: O=Original, C=Clone, R=Recover, D=Dissect, X=Failed |
| Lost & Found | SQLite's recovery mechanism for orphaned database pages |
| Catalog | Exact schema matches - recovered databases that perfectly match exemplar schemas |
| Metamatch | Non-catalog matches with intact data; identical schemas are combined and deduplicated |
| Found Data | ORPHAN: Unmatched L&F fragments preserved for manual review |
Exemplar Scanning
Exemplar scanning builds the reference knowledge base that powers the entire recovery pipeline. By analyzing known-good databases from a live macOS system, MARS creates detailed schema fingerprints that enable accurate classification and reconstruction of carved data.
What Gets Extracted
The exemplar scanner performs comprehensive analysis of each database:
- Schema Structure: Full table and column definitions
- Column Roles: Semantic classification (timestamp, UUID, URL, email, etc.)
- String Statistics: Average string lengths for disambiguation
- Example Values: Sample data for pattern matching
- Row Counts: Expected data volumes per table
Rubric Generation Process
1. File Discovery: Scan the source using dfVFS, matching patterns from artifact_recovery_catalog.yaml.
2. Schema Extraction: For each SQLite database, extract the complete schema using sqlite_master queries. Identify primary keys, foreign keys, and indexes.
3. Semantic Analysis: Analyze column data to assign semantic roles (timestamp, UUID, URL, email, path). Extract example values and calculate string statistics.
4. Rubric Generation: Combine schema + semantics into the JSON rubric format. Save to databases/schemas/{db_name}.rubric.json.
5. Hash Lookup Creation: Compute the MD5 hash of the schema signature (tables + columns). Store in exemplar_hash_lookup.json for O(1) matching.
Hash-Based Fast Matching
The hash lookup table enables rapid O(1) exemplar identification without loading full rubrics or iterating through databases:
MD5(table1|col1,col2,col3|table2|col4,col5)
Each table is represented by name + sorted column names, separated by pipes. This creates a unique fingerprint for each schema variant.
{
"a1b2c3d4": "Safari_History",
"e5f6g7h8": "Chrome_Cookies",
"i9j0k1l2": "Messages_chat"
}
Schema hashes include both table names AND column names, making false positives virtually impossible. Only databases with identical structures will match.
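A minimal sketch of how such a fingerprint can be computed with Python's sqlite3 and hashlib. The exact canonical string MARS builds (ordering, separators) may differ, so treat the details as illustrative:

```python
import hashlib
import sqlite3

def schema_fingerprint(db_path: str) -> str:
    """Hash a schema as table|col,col,...|table|... (illustrative form)."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        parts = []
        for table in tables:
            cols = sorted(row[1] for row in conn.execute(f'PRAGMA table_info("{table}")'))
            parts.append(f"{table}|{','.join(cols)}")
        return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
    finally:
        conn.close()

# Matching is then a single dict lookup against exemplar_hash_lookup.json:
#   exemplar_name = hash_lookup.get(schema_fingerprint("carved.db"))
```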
Rubric System
Rubrics are the intelligence layer that enables MARS to match fragmented data back to its original table structure.
Rubric Anatomy
A complete rubric contains three layers of information:

Schema structure:
- Table names and column definitions
- Data types (INTEGER, TEXT, REAL, BLOB)
- Primary key and foreign key relationships
- Index definitions

Semantic analysis:
- Column roles: timestamp, UUID, URL, email, path, domain
- Pattern detection: timestamp formats, ID patterns
- Semantic anchors: weighted confidence scores
- Example values for pattern matching

Statistics:
- Average string lengths per column (avg_length)
- Row counts and data volumes
- Value distributions and frequency patterns
- Most common values for disambiguation
Column Roles & Semantic Anchors
Semantic anchors are weighted scores that boost confidence when specific patterns are detected. They help create a strong "fingerprint" and disambiguate between similar schemas:
| Role | Weight | Detection Method |
|---|---|---|
| timestamp | 0.9 | Unix epoch, Cocoa timestamp, ISO 8601 formats |
| uuid | 1.0 | 8-4-4-4-12 hexadecimal pattern |
| url | 0.8 | HTTP/HTTPS scheme detection |
| email | 0.7 | name@domain.tld pattern |
| domain | 0.6 | Hostname pattern without scheme |
| path | 0.5 | Filesystem path patterns |
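A hedged sketch of how column-role detection could work during exemplar scanning. Only the role names and weights come from the table above; the regex patterns, sampling threshold, and function name are illustrative assumptions (timestamp and domain detection are omitted):

```python
import re

# Weights from the table above; detection patterns below are illustrative
ANCHOR_WEIGHTS = {"uuid": 1.0, "timestamp": 0.9, "url": 0.8,
                  "email": 0.7, "domain": 0.6, "path": 0.5}

ROLE_PATTERNS = {
    "uuid": re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-"
                       r"[0-9a-f]{4}-[0-9a-f]{12}$", re.I),
    "url": re.compile(r"^https?://", re.I),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.I),
    "path": re.compile(r"^(/[^/\0]+)+/?$"),
}

def detect_role(sample_values):
    """Return (role, weight) when most sampled values fit one pattern."""
    strings = [v for v in sample_values if isinstance(v, str)]
    for role, pattern in ROLE_PATTERNS.items():
        hits = sum(1 for v in strings if pattern.match(v))
        if strings and hits / len(strings) >= 0.8:   # illustrative threshold
            return role, ANCHOR_WEIGHTS[role]
    return None
```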
String Statistics for Disambiguation
Average string lengths (avg_length) help distinguish between similar tables
with different purposes:
| Column | Type | Avg Length |
|---|---|---|
| url | TEXT | 87 |
| title | TEXT | 32 |
Both are TEXT columns, but length statistics reveal their different purposes and improve matching accuracy.
Rubric Matching Algorithm
When matching lost_and_found fragments to rubrics, MARS uses a multi-factor
confidence scoring system:
- Column Matching: Compare column names and types between fragment and rubric
- Chunk Analysis: Identify longest matching column sequences (chunks)
- Semantic Boost: Add weights for detected semantic anchors
- String Validation: Compare avg_length statistics
- Threshold Filtering: Reject matches below minimum confidence (default 0.7)
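The sketch below shows the shape of such a multi-factor score. The rubric field names (columns, anchor_weight) and blend weights are hypothetical, and chunk analysis plus avg_length validation are omitted for brevity:

```python
def score_fragment(fragment_cols, rubric_table, min_confidence=0.7):
    """Score a lost_and_found fragment against one rubric table (sketch)."""
    rubric_cols = {c["name"]: c for c in rubric_table["columns"]}

    # 1. Column matching: fraction of rubric columns found in the fragment
    matched = [c for c in fragment_cols if c in rubric_cols]
    column_score = len(matched) / max(len(rubric_cols), 1)

    # 3. Semantic boost: add anchor weights for matched semantic columns
    anchor_boost = sum(rubric_cols[c].get("anchor_weight", 0.0) for c in matched)

    # 5. Threshold filtering on the combined confidence
    confidence = min(1.0, 0.7 * column_score + 0.1 * anchor_boost)
    return confidence if confidence >= min_confidence else None
```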
Candidate Scan Pipeline
The candidate pipeline processes carved files through multiple specialized stages, each handling a specific aspect of recovery and classification.
Pipeline Stages
1. raw_scanner/ File Discovery & Categorization
file_categorizer ├─ Identify file types via fingerprinting
├─ Skip media/executables
└─ Organize by artifact type
│
▼
2. fingerprinter/ Type-Specific Identification
├─ SQLite: Magic bytes + schema extraction
├─ Text logs: Pattern-based classification
└─ Archives: Compression format detection
│
▼
3. matcher/ Schema Matching & Classification
rubric_matcher ├─ Hash-based O(1) exemplar lookup
├─ Rubric confidence scoring
└─ Group unmatched by schema similarity
│
▼
4. db_variant_selector/ Recovery Variant Testing
├─ O (Original): Test as-is
├─ C (Clone): Clean copy via VACUUM
├─ R (Recover): SQLite .recover output
├─ D (Dissect): sqlite_dissect rebuild
└─ X (Failed): Send to byte carving
│
┌─────────────┴─────────────┐
│ │
▼ ▼
5a. carver/ 5b. lf_processor/
Byte-Level Carving Fragment Reconstruction
├─ Extract timestamps ├─ MERGE: Metamatch groups
├─ Parse protobuf ├─ CATALOG: Exact matches
├─ Analyze URLs ├─ NEAREST: Best-fit exemplar
└─ Output JSONL/CSV └─ ORPHAN: Unmatched
│
▼
6. output/ Final Organization
structure ├─ catalog/ (exact matches)
├─ metamatches/ (identical schemas, no exemplar)
├─ found_data/ (L&F orphans)
└─ carved/ (byte-carved)
Stage 1: File Categorization
The file categorizer scans input directories and identifies file types using magic bytes and content analysis:
- SQLite databases: Identified by `SQLite format 3\x00` magic bytes
- Text logs: WiFi logs, system logs, install logs via pattern matching
- Archives: gzip, bzip2 (decompressed and re-scanned)
- Ignored: Images, videos, executables (defined in config)
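A minimal sketch of magic-byte categorization. The SQLite, gzip, and bzip2 signatures are standard; the function name and the text-log heuristic are illustrative, and the real categorizer also applies content analysis and the configured skip lists:

```python
SQLITE_MAGIC = b"SQLite format 3\x00"
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"

def categorize(path: str) -> str:
    """First-pass categorization of a carved file by its header bytes."""
    with open(path, "rb") as f:
        header = f.read(16)
    if header.startswith(SQLITE_MAGIC):
        return "sqlite"
    if header.startswith(GZIP_MAGIC) or header.startswith(BZIP2_MAGIC):
        return "archive"
    # Crude text heuristic: header is entirely printable ASCII or whitespace
    if header and all(32 <= b < 127 or b in (9, 10, 13) for b in header):
        return "text_log_candidate"
    return "other"
```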
Stage 2: Fingerprinting
The fingerprinter performs deep type analysis beyond simple magic bytes:
For SQLite databases:
- Extract full schema via sqlite_master queries
- Compute schema hash (MD5 of table+column structure)
- Count rows per table
- Detect lost_and_found tables from recovery

For text logs:
- Sample first 1000-10000 lines
- Detect timestamp patterns (multiple formats)
- Match against known log prefixes
- Calculate pattern frequency for confidence scoring
Stage 3: Schema Matching
The matcher uses the exemplar hash lookup table for instant classification:
- Compute schema hash of candidate database
- O(1) lookup in exemplar_hash_lookup.json
- On match: Load full rubric for detailed validation
- On miss: Group with other unmatched databases by schema hash
Stage 4: Database Variant Selection
See detailed section below on variant selection system.
Stage 5: Lost & Found Reconstruction
See detailed section below on LF processing (MERGE/CATALOG/NEAREST/ORPHAN).
Stage 6: Output Organization
Final databases are organized by match quality and processing path:
| Directory | Content | Quality |
|---|---|---|
| catalog/ | CATALOG exact matches + promoted NEAREST (successful L&F recovery) | Highest confidence |
| metamatches/ | MERGE combined identical schemas (no exemplar match) | High confidence |
| found_data/ | ORPHAN: Unmatched L&F fragments | Low confidence - requires review |
| empty/ | Catalog match with no usable data | May contain rejected/ data for manual review |
| carved/ | Byte-carved residue (variant X) | Requires manual review |
Database Variant Selection System
When a carved SQLite database is encountered, it may be corrupted, incomplete, or structurally damaged. The variant selector attempts multiple recovery strategies and chooses the best result.
The O/C/R/D/X Variant Approach
Each candidate database is processed through up to five different recovery methods:
| Variant | Method | When to Use |
|---|---|---|
| O | Original | Test raw carved file as-is. Often works for cleanly-carved databases. |
| C | Clone | Copy database to clean state using VACUUM INTO. Removes freelist corruption. |
| R | Recover | Run sqlite3 .recover command. Creates lost_and_found tables for orphaned pages. |
| D | Dissect | Use sqlite_dissect to rebuild from raw pages. Only when exemplar match found. |
| X | Failed | All variants failed validation. Send to byte carving pipeline. |
Variant Selection Logic
1. Discovery & Introspection: Attempt O, C, R (and D if an exemplar matched). Collect metadata: table sets, row counts, integrity check results.
2. Hash-Based Matching: Compute the schema hash for each variant. O(1) lookup against exemplar_hash_lookup.json. Skip non-matching variants.
3. Profiling & Weighting: For matched variants, sample up to PROFILE_TABLE_SAMPLE_LIMIT tables. Generate per-table row counts and completeness scores.
4. Best Variant Selection: Combine the profile score with the base heuristic (integrity + row count). Choose the highest-scoring variant. Mark the others for cleanup.
Variant Scoring Heuristics
Each variant is scored based on multiple factors:
- Integrity: PRAGMA integrity_check result (pass = higher score)
- Row Count: Total rows across all tables (more = better)
- Table Completeness: Percentage of expected tables present
- Match Quality: Exact hash match vs. fuzzy table match
- lost_and_found Presence: R variant bonus for LF reconstruction potential
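An illustrative combination of the factors above into a single score. The weights are assumptions, and the metadata keys loosely mirror the decision-record fields shown below rather than the selector's real internals:

```python
def score_variant(meta: dict) -> float:
    """Heuristic score for one recovery variant (illustrative weights)."""
    score = 0.0
    if meta.get("integrity") == "ok":
        score += 2.0                                   # passes PRAGMA integrity_check
    score += min(meta.get("rows", 0) / 1000, 2.0)      # more recovered rows is better
    score += meta.get("table_completeness", 0.0)       # fraction of expected tables
    if meta.get("match_type") == "hash":
        score += 1.0                                   # exact schema-hash match
    if meta.get("has_lf"):
        score += 0.5                                   # R-variant lost_and_found bonus
    return score

# best_variant = max(variants, key=lambda name: score_variant(variants[name]))
```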
Each database gets a decision record documenting the chosen variant and rationale:
{
"case_name": "f12345678",
"chosen_variant": "R",
"match_type": "hash",
"exemplar_name": "Safari_History",
"profile_score": 0.95,
"variants": {
"O": {"valid": true, "rows": 150, "integrity": "ok"},
"C": {"valid": true, "rows": 150, "integrity": "ok"},
"R": {"valid": true, "rows": 168, "has_lf": true},
"D": {"valid": false}
}
}
Residue Processing
After variant selection, the residue processor performs cleanup and extraction:
- Lost-and-Found Extraction: Extract lost_and_found_* tables from the chosen variant into separate databases
- Storage Cleanup: Delete non-chosen variants to save disk space
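A sketch of the lost-and-found extraction step using sqlite3's ATTACH; the function name and destination layout are illustrative:

```python
import sqlite3

def extract_lost_and_found(src_db: str, dest_db: str) -> int:
    """Copy lost_and_found_* tables from the chosen variant into dest_db."""
    conn = sqlite3.connect(dest_db)
    conn.execute("ATTACH DATABASE ? AS src", (src_db,))
    lf_tables = [row[0] for row in conn.execute(
        "SELECT name FROM src.sqlite_master "
        "WHERE type='table' AND name LIKE 'lost_and_found%'")]
    for table in lf_tables:
        conn.execute(f'CREATE TABLE "{table}" AS SELECT * FROM src."{table}"')
    conn.commit()
    conn.execute("DETACH DATABASE src")
    conn.close()
    return len(lf_tables)
```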
Lost & Found (LF) Reconstruction
When SQLite's .recover command succeeds, it creates
lost_and_found tables containing orphaned database pages. The LF processor
matches these fragments against exemplar rubrics and reconstructs coherent databases.
The Four Processing Modes
LF reconstruction follows a prioritized processing order based on match quality:
| Mode | Description | Match Type | Output |
|---|---|---|---|
| MERGE | Metamatch groups | Identical schemas, no exemplar match | metamatches/ |
| CATALOG | Exact matches | Exact schema match to exemplar | catalog/ |
| NEAREST | Best-fit exemplar | Databases matched to nearest (not exact) exemplar. Rebuilds using exemplar schema as template. Successful recoveries moved to catalog/. | catalog/ or empty/ |
| ORPHAN | Unmatched tables | No match found. Adds match hints to remnant LF tables if possible. | found_data/ |
Processing Order & Rationale
Phase 1: Split Databases
Extract lost_and_found tables from all recovered databases into separate "split" databases for matching.
Phase 2: MERGE (First)
Group databases with identical schemas (tables + columns) that don't match any exemplar. Combine into superrubrics for later matching.
Phase 3: CATALOG (Second)
Process exact matches using canonical exemplar rubrics. Highest-quality reconstruction with known-good schemas.
Phase 4: NEAREST (Third)
For databases that don't fit MERGE or CATALOG, match to the nearest exemplar based
on schema similarity. Rebuild using the exemplar schema as template. Results
initially go to found_data/, then Phase 7 reclassifies.
Phase 5: ORPHAN
Collect all unmatched fragments from MERGE/CATALOG/NEAREST. Preserve for manual forensic review. Must run after matching phases to capture remnants.
Phase 7: Reclassification (Final)
Reclassify NEAREST results based on recovery success: successful L&F recovery
(total_lf_rows > 0) promotes to catalog/; no recovery
moves to empty/. Also cleans up empty CATALOG entries.
MERGE: Metamatch Groups (Identical Schemas)
When multiple databases share identical schemas (same tables AND columns) but don't match any known exemplar, they're grouped by schema hash and processed together:
- Grouping: Databases classified by schema hash (tables + columns)
- Combining: Merge all group members into single database
- Superrubric Generation: Create schema rubric from merged data
- Fragment Matching: Match lost_and_found tables against superrubric
- Reconstruction: Rebuild combined database with intact + recovered data
Three carved databases with identical schema (no exemplar match):
- f12345678: Unknown app database from disk offset 0x00BC
- f23456789: Same schema found at disk offset 0x1A00
- f34567890: Same schema found at disk offset 0x2F00
Result: Combined into a single unknown_app_a1b2c3d4 database with data from all three sources. The filename is based on the first table name.
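A simplified sketch of the grouping step that feeds MERGE; the candidate field names (schema_hash, exemplar_name) are illustrative:

```python
from collections import defaultdict

def group_metamatches(candidates):
    """Group exemplar-less candidates by schema hash for merging (sketch)."""
    groups = defaultdict(list)
    for cand in candidates:
        if cand.get("exemplar_name") is None:          # no exemplar match
            groups[cand["schema_hash"]].append(cand)
    # Only hashes shared by two or more databases form a metamatch group here
    return {h: members for h, members in groups.items() if len(members) > 1}
```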
CATALOG: Exact Matches
Exact schema matches use the canonical exemplar rubric for reconstruction:
- Hash Match: Schema hash exactly matches exemplar
- Rubric Loading: Load canonical exemplar rubric
- Fragment Matching: Match lost_and_found columns to rubric tables
- Reconstruction: Rebuild using exemplar schema as template
- Remnant Handling: Unmatched fragments saved to lost_and_found/
NEAREST: Best-Fit Exemplar Matching
When a database doesn't exactly match any exemplar but is close enough for useful reconstruction, NEAREST matches it to the nearest exemplar schema based on similarity:
- Schema Comparison: Find the nearest matching exemplar based on table and column similarity
- Fragment Matching: Match lost_and_found fragments against the nearest exemplar rubric
- Schema Rebuild: Reconstruct database using the nearest exemplar schema as template (structurally compatible with CATALOG results)
- Initial Output: Results go to found_data/ initially
- Phase 7 Reclassification: Successful recovery (L&F rows > 0) promotes to catalog/; no recovery moves to empty/
ORPHAN: Unmatched Fragments
Fragments that don't match any schema are preserved for manual review:
- Collection: Gather all remnants from MERGE/CATALOG/NEAREST processing
- Preservation: Create standalone databases with original fragment structure
- Naming: Include match hints when partial matches existed
- Traceability: Filenames preserved for forensic correlation
Example orphan filename components:
- Safari_History: Match hint (what we think it is)
- f12345678: Original file offset (where found)
- orphans: Contains unmatched lost_and_found fragments
Data Source Tracking
All reconstructed tables include a data_source column for provenance:
| Value | Meaning |
|---|---|
| carved_{db_name} | Intact data from original database structure |
| found_{db_name} | Reconstructed data from lost_and_found fragments |
Manifest Files
Each reconstructed database includes a
*_manifest.json documenting:
- Source databases that contributed data
- Intact rows vs. LF-recovered rows per source
- Remnant tables (unmatched fragments)
- Duplicates removed during deduplication
- Table-level statistics (row counts per table)
Byte-Carving Pipeline
When all variant recovery methods fail (variant X), MARS falls back to byte-level carving. This extracts raw data directly from database pages without relying on SQLite's structural integrity.
Carving Strategy
The byte carver processes databases page-by-page, extracting forensic artifacts:
Timestamp detection: identifies multiple timestamp formats:
- Unix epoch: Seconds, milliseconds, nanoseconds since 1970
- Cocoa/Core Data: Seconds since 2001-01-01 (macOS/iOS)
- Chrome: Microseconds since 1601-01-01
- WebKit: Seconds since 2001-01-01
- Windows FILETIME: 100ns ticks since 1601
- ISO 8601: Text timestamps (2025-01-18T10:30:00Z)
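A hedged sketch of multi-epoch classification: the same number is interpreted under several conventions and kept only when it lands in a plausible window (mirroring the ts_start/ts_end carver settings described later). Nanosecond, WebKit, and FILETIME handling are omitted for brevity:

```python
from datetime import datetime, timedelta, timezone

COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def classify_timestamp(value, lo=datetime(2015, 1, 1, tzinfo=timezone.utc),
                       hi=datetime(2030, 1, 1, tzinfo=timezone.utc)):
    """Return the (format, datetime) interpretations that fall in range."""
    candidates = {
        "unix_seconds": lambda v: datetime.fromtimestamp(v, tz=timezone.utc),
        "unix_millis": lambda v: datetime.fromtimestamp(v / 1e3, tz=timezone.utc),
        "cocoa_seconds": lambda v: COCOA_EPOCH + timedelta(seconds=v),
        "chrome_micros": lambda v: CHROME_EPOCH + timedelta(microseconds=v),
    }
    results = []
    for fmt, convert in candidates.items():
        try:
            dt = convert(value)
        except (OverflowError, OSError, ValueError):
            continue                       # value is absurd under this epoch
        if lo <= dt <= hi:
            results.append((fmt, dt))
    return results
```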
URL analysis: uses regex patterns to find URLs, then analyzes them with Unfurl:
- Parse query parameters and fragments
- Extract embedded timestamps from URL structure
- Detect UUIDs and session IDs in paths
- Identify platform (Facebook, YouTube, Twitter, etc.)
Protobuf decoding: uses blackboxprotobuf for schema-agnostic decoding:
- Detect protobuf magic patterns in BLOB fields
- Decode without schema using type inference
- Extract nested timestamps and strings
- Generate JSON representation with inferred typedef
Text extraction:
- Extract printable ASCII strings (minimum length filtering)
- Preserve context (surrounding bytes for analysis)
- Deduplicate close timestamps to reduce noise
Carving Process Flow
Database File (Variant X)
│
▼
┌───────────────────┐
│ Read Page-by-Page │ (4096-byte SQLite pages)
└───────────────────┘
│
├─── Scan for numeric values ───→ Timestamp Classifier
│ ├─ Unix epoch?
│ ├─ Cocoa timestamp?
│ ├─ Chrome time?
│ └─ Valid range filter
│
├─── Regex match URLs ──────────→ Unfurl Analyzer
│ ├─ Parse structure
│ ├─ Extract query params
│ └─ Detect embedded timestamps
│
├─── Scan for BLOB data ────────→ Protobuf Decoder
│ ├─ blackboxprotobuf decode
│ ├─ Extract nested data
│ └─ Convert to JSON
│
└─── Extract text strings ──────→ Text Scanner
├─ Printable ASCII filter
└─ Context preservation
│
▼
┌──────────────────────────────────┐
│ Output Generation │
│ ├─ timestamps.csv (optional) │
│ ├─ carved.jsonl (detailed) │
│ └─ carved.db (structured SQLite) │
└──────────────────────────────────┘
Integration with Unfurl
Unfurl provides context-aware URL analysis that helps distinguish real timestamps from ID values:
https://facebook.com/photo.php?fbid=123456789&id=987654321
Unfurl extracts:
- Platform: facebook
- Photo ID: 123456789 (NOT a timestamp, despite numeric format)
- User ID: 987654321 (confirmed ID, not time)
This context prevents false positive timestamp classifications.
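One common way to drive Unfurl programmatically looks roughly like this; method names can vary between releases, so check the API of the version you have installed:

```python
import unfurl.core

def unfurl_url(url: str) -> str:
    """Parse a URL with Unfurl and return its text tree of extracted nodes."""
    instance = unfurl.core.Unfurl()
    instance.add_to_queue(data_type="url", key=None, value=url)
    instance.parse_queue()
    return instance.generate_text_tree()

print(unfurl_url("https://facebook.com/photo.php?fbid=123456789&id=987654321"))
```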
Integration with blackboxprotobuf
Schema-agnostic protobuf decoding recovers structured data without requiring .proto definitions:
Input: Binary BLOB (unknown structure)
blackboxprotobuf output:
{
"message": {
"1": "user@example.com",
"2": 1705584600,
"3": {
"1": "https://example.com/path",
"2": 42
}
},
"typedef": {
"1": {"type": "string"},
"2": {"type": "int"},
"3": {"type": "message", ...}
}
}
Field "2" is identified as an integer and classified as a timestamp candidate by the timestamp detector.
Carving Output Formats
| Format | Use Case | Content |
|---|---|---|
| timestamps.csv | Timeline analysis | All detected timestamps with format, confidence, and offset |
| carved.jsonl | Detailed review | Page-by-page extraction with URLs, protobuf, text, and context |
| carved.db | Structured analysis | SQLite database with tables for timestamps, URLs, and extracted data |
dfVFS Integration
MARS uses Digital Forensics Virtual File System (dfVFS) for universal disk image access. This provides consistent file access across all forensic image formats and archive types.
Supported Formats
| Category | Formats |
|---|---|
| Disk Images | E01, Ex01, DD, DMG, VMDK, VHD |
| Volumes | GPT, APM, MBR, APFS containers |
| Filesystems | APFS, HFS+, NTFS, ext4, FAT |
| Archives | TAR, ZIP, GZIP, BZIP2 |
Glob Pattern Matching
MARS extends dfVFS with full globstar (**) support for flexible pattern
matching:
| Pattern | Matches |
|---|---|
| `/Users/*/Library/Safari/History.db` | Safari history for any user |
| `**/Library/Caches` | All cache directories (any depth) |
| `/private/var/**/com.apple.*.db` | Apple databases in /private/var tree |
Globstar Implementation
The ** wildcard matches zero or more directory levels:
- `**` at start: Matches from root (any depth prefix)
- `/**/` in middle: Matches zero or more intermediate directories
- `/**` at end: Matches everything below current level
- `*` alone: Matches exactly one directory level
Example pattern: `**/Library/Caches`
Matches:
- /Library/Caches (zero segments before)
- /Users/admin/Library/Caches (two segments before)
- /System/Volumes/Data/Users/admin/Library/Caches (many segments)
Does NOT match:
- /Users/admin/Library/Caches/Chrome (extends past pattern)
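A simplified regex translation of these globstar rules; MARS's dfVFS glob helper is more thorough about edge cases, so treat this purely as a sketch of the semantics:

```python
import re

def globstar_to_regex(pattern: str) -> re.Pattern:
    """Translate a globstar pattern into an anchored regex (simplified)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading/intermediate levels
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # ** at end: everything below this level
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # * stays within a single path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

matcher = globstar_to_regex("**/Library/Caches")
assert matcher.match("/Library/Caches")
assert matcher.match("/Users/admin/Library/Caches")
assert not matcher.match("/Users/admin/Library/Caches/Chrome")
```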
Volume System Integration
dfVFS provides automatic volume enumeration and metadata extraction:
- GPT Partitions: Partition name, GUID, size, type
- APFS Containers: Volume names, encryption status, roles
- Volume Labels: Filesystem labels from volume attributes
EWF/E01 Mount Utilities
MARS includes utilities for mounting forensic images for interactive exploration:
- macOS: Requires Fuse-T for userspace filesystem mounting
- Windows: Arsenal Image Mounter or similar tools
- Linux: libewf + FUSE for native EWF support
Directory Filtering & Exclusion
To prevent hangs on directories with millions of cache files, MARS includes smart exclusions:
- `*/*/Library/Caches/Google/Chrome/*/Code Cache`
- `*/*/Library/Caches/com.apple.Safari/fsCachedData`
- `*/*/Library/Caches/*/com.apple.metal`
- `*/*/Library/Caches/*/GPUCache`
These directories contain hundreds of thousands of small cache files that slow down scans without forensic value.
MARS also generates System/Volumes/Data/ pattern variants for paths starting with Users/, Library/, private/, or var/. This handles macOS Big Sur+ volume layout changes.
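A small sketch of that variant expansion; the prefix list comes from the paragraph above, while the function name is an illustrative assumption:

```python
FIRMLINK_PREFIXES = ("Users/", "Library/", "private/", "var/")

def add_data_volume_variants(patterns):
    """Also search the same paths under /System/Volumes/Data (Big Sur+)."""
    expanded = list(patterns)
    for p in patterns:
        rel = p.lstrip("/")
        if rel.startswith(FIRMLINK_PREFIXES):
            expanded.append("/System/Volumes/Data/" + rel)
    return expanded
```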
Configuration System
MARS uses a centralized configuration system defined in
config/schema.py. All settings are organized into logical sections with
dataclasses.
Configuration Sections
Controls database matching and confidence thresholds:
- `min_confidence`: Minimum match confidence (default 0.7)
- `min_rows`: Minimum rows required for valid match (default 10)
- `min_columns`: Minimum columns for substantial match (default 3)
- `semantic_anchor_threshold`: Minimum anchor score (default 2.0)
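As a sketch of that dataclass style, a matching section could look like the following; the class name and grouping are hypothetical, while the defaults are the documented ones:

```python
from dataclasses import dataclass

@dataclass
class MatchingConfig:
    """Hypothetical shape of a config section in config/schema.py."""
    min_confidence: float = 0.7             # minimum match confidence
    min_rows: int = 10                      # minimum rows for a valid match
    min_columns: int = 3                    # minimum columns for a substantial match
    semantic_anchor_threshold: float = 2.0  # minimum semantic anchor score
```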
Weights for pattern detection in matching:
- `uuid`: 1.0 (UUID pattern weight)
- `timestamp_text`: 0.9 (timestamp detection)
- `url`: 0.8 (URL pattern)
- `email`: 0.7 (email pattern)
- `uuid_in_pk`: 2.0 (UUID in primary key - strong signal)
Tables and prefixes to ignore during schema comparison:
- `GLOBAL_IGNORABLE_TABLES`: sqlite_sequence, sqlite_stat*, meta, z_metadata, etc.
- `ignorable_prefixes`: sqlite_, sqlean_ (system tables)
- `ignorable_suffixes`: _content, _segments, _segdir (FTS tables)
- `salvage_tables`: lost_and_found, carved, recovered_rows
Database variant selection behavior:
- `dissect_all`: Attempt sqlite_dissect on all variants (default false)
Byte-carving settings:
- `ts_start`: Filter timestamps after this date (default 2015-01-01)
- `ts_end`: Filter timestamps before this date (default 2030-01-01)
- `filter_mode`: 'permissive', 'balanced', 'strict', or 'all'
- `decode_protobuf`: Attempt protobuf decoding (default true)
- `csv_export`: Generate CSV output (default false)
Exemplar scanning parameters:
- `epoch_min`: Minimum valid timestamp date (default 2000-01-01)
- `epoch_max`: Maximum valid timestamp date (default 2038-01-19)
- `min_role_sample_size`: Minimum rows for semantic role assignment (default 5)
- `enabled_catalog_groups`: Database groups to include (empty = all)
- `excluded_file_types`: File types to skip (e.g., 'cache', 'log')
Ignorable Tables
GLOBAL_IGNORABLE_TABLES defines tables that are always filtered during
schema comparison:
| Category | Tables |
|---|---|
| SQLite System | sqlite_sequence, sqlite_stat1/2/3/4, sqlite_master, sqlite_temp_master |
| Extensions | sqlean_define, rtree_*, fts5_*, etc. |
| CoreData | meta, dbinfo, z_primarykey, z_metadata, z_modelcache |
User-Configurable Settings
Settings marked with user_configurable=True are exposed in the UI and saved
to .marsproj files:
- Basic: Debug mode, progress bars
- Exemplar: Date ranges, catalog groups, excluded file types
- Carver: Timestamp filtering, protobuf decoding, CSV export
- Advanced: Variant selection, dissect options