Lost & Found (LF) Database Reconstruction
This module reconstructs SQLite databases from fragmented data recovered from disk images. Databases carved from raw disk images often contain lost_and_found tables holding orphaned data fragments. This processor matches those fragments against known database schemas (exemplars) and reconstructs coherent databases.
Processing Modes
Four modes handle recovered fragments based on their match quality and processing requirements.
The Four Modes
MERGE: Metamatch Groups
Multiple databases with identical schemas that don't match any exemplar
When multiple recovered databases share identical table and column structures but don't match any known exemplar, they're grouped by schema hash and processed together. The schema hash is computed from both table names AND column names, so only truly identical schemas are grouped.
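A minimal sketch of how such a grouping key can be computed; the schema_hash helper and digest choice below are illustrative, not necessarily what this module does:

import hashlib
import sqlite3

def schema_hash(db_path):
    """Illustrative grouping key: digest of sorted (table, columns) pairs."""
    con = sqlite3.connect(db_path)
    try:
        tables = [r[0] for r in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        parts = []
        for table in tables:
            cols = [r[1] for r in con.execute(f'PRAGMA table_info("{table}")')]
            parts.append(table + ":" + ",".join(cols))
        # Both table names and column names feed the digest, so two
        # databases group together only when their schemas are identical.
        return hashlib.sha256("|".join(parts).encode()).hexdigest()
    finally:
        con.close()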
Processing:
- Group databases by schema hash (tables + columns)
- Combine all databases in the group
- Generate a "superrubric" from the merged database
- Match lost_and_found fragments against the superrubric
- Reconstruct combined database with both intact and recovered data
Output: databases/metamatches/{group_label}/{group_label}.sqlite
Example: Three non-catalog databases, all with identical schema structure, combined into a single output database.
CATALOG: Exact Matches
Databases with exact schema matches to known exemplars
When a recovered database exactly matches a known exemplar schema from the catalog, we can confidently reconstruct it using that exemplar's rubric.
Processing:
- Match lost_and_found fragments against the exact exemplar rubric
- Reconstruct database using the canonical exemplar schema
- Combine intact data + matched fragments from all matching databases
- Create remnant databases for unmatched fragments
Output: databases/catalog/{exemplar_name}/{exemplar_name}.sqlite
Example: A Chrome history database that exactly matches the known Chrome exemplar schema.
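Lost_and_found fragments typically carry generic column names, so rubric matching leans on column count and type affinity rather than names. The RUBRIC layout and candidate_tables helper below are a hypothetical sketch of that idea, not the module's actual rubric format:

# Hypothetical rubric: expected type affinities per exemplar table.
RUBRIC = {
    "urls": ["INTEGER", "TEXT", "TEXT", "INTEGER"],
    "visits": ["INTEGER", "INTEGER", "INTEGER"],
}

def candidate_tables(fragment_types):
    """Exemplar tables whose arity and type affinities fit the fragment."""
    return [
        name for name, expected in RUBRIC.items()
        if len(expected) == len(fragment_types)
        # An empty affinity (an all-NULL column) is allowed to match anything.
        and all(f == e or f == "" for f, e in zip(fragment_types, expected))
    ]

candidate_tables(["INTEGER", "TEXT", "TEXT", "INTEGER"])  # -> ["urls"]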
NEAREST: Best-Fit Exemplar
Databases matched to nearest (but not exact) exemplar
When a database doesn't exactly match any exemplar but is close enough for useful reconstruction, we match it to the nearest exemplar schema.
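A common way to score schema similarity is Jaccard overlap of (table, column) pairs; whether this module uses exactly that metric is an assumption, so treat the sketch below as illustrative:

def schema_signature(schema):
    """Flatten {table: [columns]} into a set of (table, column) pairs."""
    return {(t, c) for t, cols in schema.items() for c in cols}

def similarity(a, b):
    """Jaccard overlap: 1.0 for identical schemas, 0.0 for disjoint ones."""
    sa, sb = schema_signature(a), schema_signature(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

def nearest_exemplar(db_schema, exemplars):
    """Return the (name, schema) pair with the highest similarity score."""
    return max(exemplars.items(), key=lambda kv: similarity(db_schema, kv[1]))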
Processing:
- Find nearest matching exemplar based on schema similarity
- Match lost_and_found fragments against nearest exemplar rubric
- Reconstruct database using nearest exemplar schema as template
- Initially output to found_data/, then reclassified in Phase 7
Final Output: If L&F rows were recovered → databases/catalog/{exemplar_name}/
(merged with an existing CATALOG entry or newly created). If no rows were recovered → databases/empty/.
Example: A database with L&F fragments that resemble Firefox Places.
After reconstruction, if fragments were recovered, the output is promoted to
catalog/Firefox Places/.
ORPHAN: Unmatched Tables
Lost & found fragments that don't match any schema
When lost_and_found fragments can't be matched to any exemplar (catalog, metamatch, or nearest), they're preserved as "orphans" for manual review.
Processing:
- Collect all unmatched lost_and_found tables across MERGE/CATALOG/NEAREST
- Create standalone databases preserving the original fragment structure
- Name using match hints when available (e.g., when the database had partial matches)
Output: databases/found_data/{match_label}_{db_name}_orphans/{match_label}_{db_name}_orphans.sqlite
Example: Fragments from an unknown application database that don't match any known schema.
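A minimal sketch of preserving a fragment table into a standalone database; SQLite's ATTACH makes the cross-database copy a single statement (the paths and table name are placeholders):

import sqlite3

def preserve_orphan(src_db, table, dest_db):
    """Copy one lost_and_found table, structure intact, into a new database."""
    con = sqlite3.connect(dest_db)
    try:
        con.execute("ATTACH DATABASE ? AS src", (src_db,))
        # CREATE TABLE ... AS SELECT carries over column names and data,
        # which is all the orphan output needs for manual review.
        con.execute(f'CREATE TABLE "{table}" AS SELECT * FROM src."{table}"')
        con.commit()
    finally:
        con.close()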
Processing Order & Rationale
The orchestrator processes use cases in this specific order:
Phase 1: Prepare Split Databases
Extract lost_and_found tables from all recovered databases into separate "split" databases for matching.
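A sketch of how the split step might enumerate fragment tables, assuming the lost_and_found naming prefix produced by SQLite recovery tooling:

import sqlite3

def lost_and_found_tables(db_path):
    """List the tables that hold recovered fragments in a carved database."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name LIKE 'lost_and_found%'")
        return [r[0] for r in rows]
    finally:
        con.close()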
Phase 2: Group Databases
Classify databases into catalog matches, metamatch groups, or individual databases based on their schema matching results.
Phase 3: MERGE
Metamatch processing combines multiple databases and generates superrubrics. Processing metamatches first:
- Creates new composite schemas that might help match fragments
- Reduces the number of databases to process in later phases
Phase 4: CATALOG
Catalog matches are high-confidence exact matches that use canonical schemas for the highest-quality reconstruction.
Phase 5: NEAREST
For databases that don't fit MERGE or CATALOG, NEAREST attempts to match their L&F fragments against the most similar exemplar schema. Processing them after exact matches:
- Ensures we've exhausted higher-confidence matching strategies first
- Provides match hints for fragments that resemble known schemas
- Rebuilds using the nearest exemplar schema as template
- Results go to found_data/ initially (may be promoted to catalog/ in Phase 7)
Phase 6: ORPHAN
Orphan processing collects all unmatched fragments from MERGE/CATALOG/NEAREST. It must run last because:
- Needs to know which fragments were successfully matched in previous phases
- Only processes remnants that couldn't be matched anywhere else
- Preserves everything for manual forensic review
Phase 7: Reclassification
NEAREST results initially go to found_data/ because they're based on
schema similarity rather than exact matches. However, NEAREST rebuilds
databases using the exemplar schema as template, making them structurally
compatible with CATALOG results.
Phase 7 reclassifies NEAREST results based on recovery success:
- Successful recovery (total_lf_rows > 0): Promoted to catalog/. Since NEAREST outputs use the exemplar schema, they can be merged with existing CATALOG entries for the same exemplar.
- No recovery (total_lf_rows == 0): Moved to empty/; these matched a schema but had no recoverable L&F fragments.
- Orphans (no manifest): Remain in found_data/ for manual review.
Phase 7 also scans catalog/ for databases that are effectively empty
(only contain ignorable tables like sqlite_sequence) and moves them to empty/.
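A condensed sketch of the reclassification decision, assuming the manifest layout shown under Manifest Files below; the dest_for helper is illustrative:

import json
from pathlib import Path

def dest_for(output_dir: Path) -> str:
    """Pick the destination for one NEAREST result after reconstruction."""
    manifests = list(output_dir.glob("*_manifest.json"))
    if not manifests:
        return "found_data"   # no manifest: orphan output, keep for review
    stats = json.loads(manifests[0].read_text())["combined_stats"]
    if stats["total_lf_rows"] > 0:
        return "catalog"      # L&F rows recovered: promote
    return "empty"            # matched a schema, recovered nothing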
Output Directory Structure
databases/
├── selected_variants/ # Input: Variant selection results
│ └── {f_offset}_{hash}/ # Best variant for each carved database
│ └── {variant}.sqlite # O/C/R/D variant chosen
│
├── catalog/ # CATALOG + promoted NEAREST
│ └── {exemplar_name}/
│ ├── {exemplar_name}.sqlite
│ ├── {exemplar_name}_manifest.json
│ └── rejected/
│ └── {exemplar_name}_rejected.sqlite
│
├── metamatches/ # MERGE: Identical schema groups
│ └── {group_label}/
│ ├── {group_label}.sqlite
│ ├── {group_label}_manifest.json
│ └── rejected/
│ └── {group_label}_rejected.sqlite
│
├── found_data/ # ORPHAN
│ └── {match_hint}_{db_name}_orphans/
│ └── {match_hint}_{db_name}_orphans.sqlite
│
├── empty/ # Databases with no recoverable data
│ └── {exemplar_name}/ # Matched schema but total_lf_rows == 0
│
├── carved/ # Byte-carved residue (variant X)
│ └── {exemplar_name}_{f_offset}_carved/
│ └── {exemplar_name}_{f_offset}_carved.sqlite
│
└── schemas/ # Generated rubrics and schemas
├── {exemplar_name}/
│ └── {exemplar_name}.rubric.json
└── {group_label}/
└── {group_label}.superrubric.json
Manifest Files
Each reconstructed database includes a *_manifest.json file documenting:
- Source databases: Which original databases contributed data
- Intact rows: Rows copied from original database tables
- LF rows: Rows recovered from lost_and_found fragments
- Remnant tables: Number of unmatched fragments
- Duplicates removed: Deduplication statistics
- Table-level stats: Row counts per table
Example manifest:
{
"output_type": "catalog",
"output_name": "Chrome_History",
"created": "2025-01-18T10:30:00",
"source_databases": [
{
"db_name": "f12345678",
"intact_rows": 1500,
"lf_rows": 342,
"remnant_tables": 2
}
],
"combined_stats": {
"total_intact_rows": 1500,
"total_lf_rows": 342,
"total_remnant_tables": 2,
"duplicates_removed": 45,
"table_stats": [
{"name": "urls", "rows": 1200},
{"name": "visits", "rows": 642}
]
}
}
Data Source Tracking
All reconstructed databases include a data_source column in every table
(except FTS virtual tables) to track data provenance:
- carved_{db_name}: Intact data from the carved database
- found_{db_name}: Reconstructed data from lost_and_found fragments
This allows forensic analysts to distinguish between original intact data and recovered fragments, which can be especially helpful when combining with exemplar data.
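At the SQL level, provenance tagging can look like the sketch below; the data_source column name comes from this module, while the tag_table helper and its FTS check are illustrative. The single UPDATE stands in for the per-row tagging the real pipeline performs during reconstruction, where rows from different origins receive different tags:

import sqlite3

def tag_table(con, table, source):
    """Add a data_source column and tag every row, skipping virtual tables."""
    sql = con.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (table,)).fetchone()[0]
    if "VIRTUAL TABLE" in sql.upper():
        return  # FTS and other virtual tables don't accept extra columns
    con.execute(f'ALTER TABLE "{table}" ADD COLUMN data_source TEXT')
    con.execute(f'UPDATE "{table}" SET data_source = ?', (source,))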
Key Modules
Orchestrator
lf_orchestrator.py: Main orchestrator coordinating all 7 phases
Processor Modules
- lf_merge.py: MERGE - Metamatch group processing
- lf_catalog.py: CATALOG - Exact match processing
- lf_nearest.py: NEAREST - Best-fit exemplar matching
- lf_orphan.py: ORPHAN - Unmatched table preservation
Shared Logic
- lf_reconstruction.py: Shared reconstruction logic for CATALOG/NEAREST
- uc_helpers.py: Utility functions (FTS detection, labeling, sanitization)
Core Components
- lf_matcher.py: Fragment-to-exemplar matching engine with schema validation
- lf_combiner.py: Fragment combination and column mapping
- db_reconstructor.py: Low-level database reconstruction
- lf_splitter.py: Extract lost_and_found tables into split databases