WarpedWing Labs

Fingerprinter Module

File fingerprinting for accurate type detection.

Purpose

Analyze files to determine their type and structure without relying solely on file extensions or metadata. Essential for processing carved files where metadata is often lost.

Module Components

text_fingerprinter.py - Text Log Detection

Identifies log types from text content.

from pathlib import Path

from mars.pipeline.fingerprinter.text_fingerprinter import (
    LogType,
    identify_log_type,
)

# Identify log type
result = identify_log_type(
    file_path=Path("./unknown.log"),
    min_confidence=0.6
)

print(f"Type: {result.log_type}")
print(f"Confidence: {result.confidence}")
print(f"Reasons: {result.reasons}")
print(f"First timestamp: {result.first_timestamp}")

Supported Log Types:

Detection Methods:

Returns:


How Fingerprinting Works

Text Fingerprinting Process

  1. Quick Checks (fast rejection)
     - File size < 50 bytes → UNKNOWN
     - Binary magic bytes → classify as binary type

  2. Magic Byte Detection
     - JSONLZ4: b"mozLz40\x00"
     - ASL: b"ASL DB\x00"

  3. Content Analysis
     - Read first 1000-10000 lines
     - Look for timestamp patterns
     - Match against known log prefixes
     - Calculate pattern frequency

  4. Scoring
     - Each detection method adds to confidence
     - Multiple matching patterns → higher confidence
     - Return best match above threshold
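
The four stages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function name sketch_identify, the two text patterns, the string labels, and the frequency-based score are all hypothetical. Only the 50-byte size cutoff and the two magic byte prefixes come from the list above.

```python
from __future__ import annotations

import re
from pathlib import Path

# Magic byte prefixes from the process description; labels are illustrative.
MAGIC_PREFIXES = {
    b"mozLz40\x00": "JSONLZ4",
    b"ASL DB\x00": "ASL",
}

# Hypothetical line patterns (assumption: the real module's patterns differ).
TEXT_PATTERNS = {
    "SYSLOG": re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} "),
    "ISO_TIMESTAMPED": re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"),
}

MIN_SIZE = 50     # bytes, from the quick-check step
MAX_LINES = 1000  # assumption: lower end of the 1000-10000 line range


def sketch_identify(path: Path, min_confidence: float = 0.6) -> tuple[str, float]:
    """Return (label, confidence), or ("UNKNOWN", 0.0) if nothing matches."""
    # 1. Quick checks: reject files too small to classify.
    if path.stat().st_size < MIN_SIZE:
        return "UNKNOWN", 0.0

    # 2. Magic byte detection: compare the file header against known prefixes.
    with path.open("rb") as fh:
        head = fh.read(16)
    for magic, label in MAGIC_PREFIXES.items():
        if head.startswith(magic):
            return label, 1.0

    # 3. Content analysis: count how many sampled lines match each pattern.
    counts = dict.fromkeys(TEXT_PATTERNS, 0)
    total = 0
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        for total, line in enumerate(fh, start=1):
            for label, pattern in TEXT_PATTERNS.items():
                if pattern.match(line):
                    counts[label] += 1
            if total >= MAX_LINES:
                break
    if total == 0:
        return "UNKNOWN", 0.0

    # 4. Scoring: here, confidence is simply the fraction of matching lines.
    best = max(counts, key=counts.get)
    confidence = counts[best] / total
    if confidence < min_confidence:
        return "UNKNOWN", 0.0
    return best, confidence
```

Using the match fraction as the confidence is the simplest possible scoring rule. The description above suggests the real scorer combines several detection methods, so treat this only as a starting point.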

Usage Examples

Fingerprint Unknown Files

from pathlib import Path
from mars.pipeline.fingerprinter.text_fingerprinter import (
    LogType,
    identify_log_type,
)

file_path = Path("./unknown_file")

# Try text fingerprinting
result = identify_log_type(file_path, min_confidence=0.6)
if result.log_type != LogType.UNKNOWN:
    print(f"Identified as: {result.log_type}")
    print(f"Confidence: {result.confidence:.2%}")

Performance Notes