Case Study

    6 Hours a Week Back, 340 GB Recovered: A Real CLI Verification Workflow

    Mehrab Ali

    Author: Mehrab Ali, Data Scientist, Researcher & Entrepreneur

    Founder of ARCED Foundation, ARCED International, and Solutions of Things Lab (SoTLab). Built FolderManifest to help teams protect file integrity and stay audit-ready.

    6 hrs/wk of manual checking eliminated

    340 GB recovered from duplicate files

    3 silent corruptions caught in month 1

    The situation

    Six-person video production team: three editors, one colorist, one project coordinator, and one producer. They keep shared project storage on a Synology NAS (~8 TB usable), accessed over SMB from their Windows machines. A nightly robocopy job mirrors the NAS to an offsite Windows Server box. Every active project lives on the NAS. Delivered projects get archived to the backup box and, eventually, to cold storage.

    The team delivers four to six short-form commercial projects per month. Each project folder averages 80–120 GB of raw footage, project files, exports, and client deliverables.

    They had no automated way to verify that a robocopy job actually produced a complete, uncorrupted copy. They also had no process to clean up accumulated duplicates — editors routinely copied project folders to make "backup versions" before major edits, and those copies piled up.

    What manual checking was costing them

    Before the CLI setup, their verification workflow was:

    1. After each project delivery, the coordinator manually browsed both the NAS and the backup box to spot-check that files were present.
    2. Before archiving to cold storage, someone would open the most important export files on both machines and visually compare them.
    3. Before starting a new project that reused assets from an old one, an editor would manually search for duplicates across the NAS.

    The coordinator estimated she spent roughly 90 minutes per week on backup spot-checks. Each editor also spent 20–30 minutes per project hunting for duplicate assets or prior-version folders. Across the three editors, that added up to about 4.5 hours a week on top of the coordinator's time, or roughly 6 hours weekly in total.

    The incident that triggered the change

    Two months before setting up the CLI, an editor opened a 4K export from the backup box for a client revision request. The file played back corrupted after the 1-hour mark. The original on the NAS was fine. The robocopy job had silently transferred a partially written file during a crash. The file size looked correct and the timestamp was right; only the bytes were wrong. They had to re-render a 4-hour job because the backup was unusable. That incident is what motivated the change.

    The CLI setup (30 minutes, one Saturday morning)

    They installed FolderManifest on the Windows Server backup box (one license). The CLI is available immediately after installation — no separate install, no PATH configuration needed beyond what the installer does.

    Step 1: Generate a baseline manifest, run robocopy, then verify

    The core idea: generate a SHA-256 manifest of the NAS source immediately before robocopy runs. That manifest is the ground truth. Once robocopy finishes, verify the backup destination against the manifest to confirm it is byte-for-byte identical.

    PowerShell
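    # wrap-backup.ps1: builds the baseline, runs robocopy, then verifies the copy
    
    $SOURCE   = "\\nas\projects"
    $DEST     = "D:\backup\projects"
    $MANIFEST = "D:\manifests\projects-baseline.json"
    $STAMP    = Get-Date -Format 'yyyyMMdd'
    $LOG      = "D:\logs\backup-verify-$STAMP.log"
    $REPORT   = "D:\logs\verify-report-$STAMP.html"
    # Log lines use Add-Content: in Windows PowerShell 5.1, Out-File and >> write
    # UTF-16, which mixes badly with robocopy's (non-Unicode) log output.
    
    # Send-MailMessage requires -From; without it the alert never goes out
    $MAIL = @{
      From       = "backup@example.com"
      To         = "team@example.com"
      SmtpServer = "mail.example.com"
    }
    
    # 1. Generate baseline from NAS (source of truth)
    Write-Host "Generating baseline manifest..."
    foldermanifest generate `
      --path $SOURCE `
      --format json `
      --checksum sha256 `
      --output $MANIFEST
    
    if ($LASTEXITCODE -ne 0) {
      Add-Content -Path $LOG -Value "BASELINE FAILED at $(Get-Date)"
      exit 1
    }
    
    # 2. Run robocopy; its exit code is a bitmask where 8 or higher means failure
    Write-Host "Running robocopy..."
    robocopy $SOURCE $DEST /MIR /MT:8 "/LOG+:$LOG"
    $RC = $LASTEXITCODE
    if ($RC -ge 8) {
      Add-Content -Path $LOG -Value "ROBOCOPY FAILED (exit $RC) at $(Get-Date)"
      Send-MailMessage @MAIL -Subject "BACKUP FAILED: robocopy exit $RC" -Body "See $LOG"
      exit 1
    }
    
    # 3. Verify destination against baseline
    Write-Host "Verifying backup integrity..."
    foldermanifest verify `
      --path $DEST `
      --manifest $MANIFEST `
      --report $REPORT |
      Add-Content -Path $LOG
    $VERIFY = $LASTEXITCODE   # save it before anything else can overwrite it
    
    if ($VERIFY -eq 0) {
      Add-Content -Path $LOG -Value "BACKUP OK: all files verified at $(Get-Date)"
    } elseif ($VERIFY -eq 1) {
      Add-Content -Path $LOG -Value "BACKUP MISMATCH at $(Get-Date), see $REPORT"
      # alert via the internal SMTP relay, with the HTML report attached
      Send-MailMessage @MAIL `
        -Subject "BACKUP MISMATCH: action required" `
        -Body "Verification found mismatches. Report attached; full log at $LOG" `
        -Attachments $REPORT
    } else {
      Add-Content -Path $LOG -Value "BACKUP CHECK ERROR (exit $VERIFY) at $(Get-Date)"
      Send-MailMessage @MAIL -Subject "BACKUP CHECK ERROR (exit $VERIFY)" -Body "See $LOG"
    }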

    This script wraps their existing robocopy call. It generates a manifest from the NAS, runs robocopy, then verifies the destination against the manifest. If anything diverges (a file corrupted during transfer, truncated, or missed entirely), the script emails the team and the HTML report lists every mismatched file.

    Task Scheduler runs this at 2 AM daily. The team wakes up to either silence (all good) or an email with the report attached.
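    The same baseline-then-verify pattern can be sketched with standard tools, which helps when reasoning about what a verify step actually checks. This is a minimal sketch, assuming a Linux box with GNU coreutils (sha256sum, find, xargs). It is not FolderManifest's manifest format or implementation, and the function names make_baseline and verify_copy are hypothetical. It hashes the source with relative paths, then checks a copy against that list. It returns 0, 1 or 2, in the same spirit as the exit codes the wrapper branches on.

```bash
#!/usr/bin/env bash
# Sketch of "baseline from source, verify destination" with GNU coreutils.
# Hypothetical helpers; NOT FolderManifest's format or behaviour.

# make_baseline SRC MANIFEST
# Hash every regular file under SRC (the source of truth). Paths are stored
# relative to SRC so the same manifest can be checked against any copy.
make_baseline() {
  local src=$1 manifest=$2
  ( cd "$src" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# verify_copy DEST MANIFEST
# Returns 0 if every file in the manifest exists in DEST with the same hash,
# 1 if any file is missing or differs, 2 if the check cannot run at all.
# Files that exist only in DEST are NOT reported by this sketch.
verify_copy() {
  local dest=$1 manifest=$2
  if [ ! -d "$dest" ] || [ ! -r "$manifest" ]; then
    return 2
  fi
  # Feed the manifest on stdin so a relative MANIFEST path still works after cd.
  if ( cd "$dest" && sha256sum --check --quiet ) < "$manifest"; then
    return 0
  fi
  return 1
}
```

    A nightly job would call make_baseline on the NAS path before copying and verify_copy on the backup path afterwards. One limitation is deliberate: sha256sum --check only looks at files listed in the manifest, so files that exist only in the destination go unreported.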

    Step 2: Project wrap verification

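    When a project is delivered and moved to the archive path, the coordinator runs a single command to generate a permanent manifest for that project. It reads the backup copy rather than the NAS. That is safe here because the nightly job has already verified that copy against the NAS baseline: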

    PowerShell
    # run once per project at delivery
    foldermanifest generate `
      --path "D:\backup\projects\ClientName_ProjectSlug" `
      --format json `
      --checksum sha256 `
      --output "D:\manifests\delivered\ClientName_ProjectSlug.json"
    
    # save the manifest alongside the project in cold storage too
    # so verification is possible years later without the NAS

    The manifest file is small (tens of KB for most projects) and is archived alongside the project in cold storage. Two years from now, if a client requests an asset, the team can check the copy retrieved from cold storage against the manifest before handing anything over.
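    One way to keep a manifest travelling with the project is to store it inside the project folder itself. That way, a restored copy can be checked wherever it lands. The sketch below assumes GNU coreutils. The manifest name .manifest.sha256 and the functions seal_project and check_project are hypothetical, and the output is plain sha256sum text rather than FolderManifest's JSON.

```bash
#!/usr/bin/env bash
# Sketch: a self-verifying project folder. Hypothetical names throughout;
# plain sha256sum output, not FolderManifest's JSON. Assumes GNU coreutils.

MANIFEST_NAME=".manifest.sha256"

# seal_project DIR: hash every file under DIR except the manifest itself,
# storing paths relative to DIR.
seal_project() {
  local dir=$1
  ( cd "$dir" &&
    find . -type f ! -path "./$MANIFEST_NAME" -print0 |
      sort -z | xargs -0 -r sha256sum > "$MANIFEST_NAME" )
}

# check_project DIR: 0 if every listed file is present and unchanged,
# non-zero otherwise (including when no manifest exists).
check_project() {
  local dir=$1
  [ -f "$dir/$MANIFEST_NAME" ] || return 1
  ( cd "$dir" && sha256sum --check --quiet "$MANIFEST_NAME" )
}
```

    Because paths are stored relative to the project folder, the check still works after the folder is renamed or restored to a different drive.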

    Results after 30 days

    Three silent corruptions caught

    In the first month, the nightly verify script fired three alerts. Two had the same cause: over a flaky SMB connection to the NAS, robocopy transferred files that were still being written. The files arrived, but their SHA-256 hashes did not match the baseline, and robocopy reported success. Without manifest verification, the backup would have looked fine.

    The third was a different issue: a drive in the RAID array had a pending sector error, causing one 4K export file to return corrupted bytes on read. The file appeared intact in Windows Explorer — same size, same date. The hash mismatch was the only indication anything was wrong. They replaced the drive before it failed completely.

    6 hours a week back

    The coordinator stopped spot-checking backups manually. The script does it nightly and emails only on failure. She estimates she spends under 5 minutes a week on backup-related tasks now, down from 90 minutes.

    Editors stopped manually hunting for duplicate footage because the weekly dedup report (below) flags them automatically. Per-editor time on "is this folder already somewhere?" dropped to near zero.

    340 GB of storage recovered

    The dedup pass (see below) identified 340 GB of duplicate files across the NAS — about 4% of total capacity. The largest single offender was a 90 GB raw footage folder copied four times under different names across three projects by three different editors, each unaware the others had done it.

    The nightly verification cron (Linux/macOS equivalent)

    If your backup box runs Linux or macOS, the same workflow maps to a cron job:

    Bash
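    #!/usr/bin/env bash
    # /usr/local/bin/wrap-backup.sh
    
    SOURCE="/mnt/nas/projects"
    DEST="/backup/projects"
    MANIFEST="/manifests/projects-baseline.json"
    LOG="/logs/backup-verify-$(date +%Y%m%d).log"
    REPORT="/logs/verify-report-$(date +%Y%m%d).html"
    ALERT="team@example.com"
    
    # 1. Baseline from the NAS (source of truth); stop if it fails
    echo "Generating baseline..." | tee "$LOG"
    if ! foldermanifest generate \
      --path "$SOURCE" \
      --format json \
      --checksum sha256 \
      --output "$MANIFEST" >> "$LOG" 2>&1; then
      echo "BASELINE FAILED at $(date)" >> "$LOG"
      exit 1
    fi
    
    # 2. Copy with rsync instead of robocopy; any non-zero exit is treated as a failure
    if ! rsync -av --delete "$SOURCE/" "$DEST/" >> "$LOG" 2>&1; then
      echo "RSYNC FAILED at $(date)" >> "$LOG"
      echo "rsync failed, see $LOG" | mail -s "BACKUP FAILED" "$ALERT"
      exit 1
    fi
    
    # 3. Verify the destination against the baseline
    echo "Verifying..." | tee -a "$LOG"
    foldermanifest verify \
      --path "$DEST" \
      --manifest "$MANIFEST" \
      --report "$REPORT" >> "$LOG" 2>&1
    EXIT=$?   # capture immediately; the next command overwrites $?
    
    if [ "$EXIT" -eq 0 ]; then
      echo "BACKUP OK at $(date)" >> "$LOG"
    elif [ "$EXIT" -eq 1 ]; then
      echo "BACKUP MISMATCH at $(date)" >> "$LOG"
      # mail(1) comes from mailx or a sendmail-compatible MTA, which must be configured
      echo "See $REPORT for details" | mail -s "BACKUP MISMATCH" "$ALERT"
    else
      echo "BACKUP CHECK ERROR (exit $EXIT) at $(date)" >> "$LOG"
      echo "Verification could not run, see $LOG" | mail -s "BACKUP CHECK ERROR" "$ALERT"
    fi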
    Bash
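    # crontab entry: runs at 2 AM daily
    # cron starts jobs with a minimal PATH; include the directory foldermanifest is installed in
    PATH=/usr/local/bin:/usr/bin:/bin
    0 2 * * * /usr/local/bin/wrap-backup.sh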

    The dedup pass

    Separate from nightly verification, they run a weekly dedup scan across the NAS to identify duplicate files before they archive:

    PowerShell
    # weekly-dedup.ps1
    $NAS  = "\\nas\projects"
    $REPORT = "D:\logs\dedup-$(Get-Date -Format 'yyyyMMdd').html"
    
    # --dry-run: identify only, no deletions
    foldermanifest dedupe `
      --path $NAS `
      --checksum sha256 `
      --report $REPORT `
      --dry-run
    
    Write-Host "Dedup report written to $REPORT"

    The --dry-run flag is critical. It generates a report showing exactly which files are duplicates and how much space they consume, but deletes nothing. An editor reviews the report on Monday morning and deletes the flagged folders by hand. That usually takes about 5 minutes, because the report lists each group of duplicates with file sizes and paths side by side.
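    The detection half of that workflow can be sketched as a report-only scan. It hashes every file, groups identical hashes, and totals the space that removing the extra copies would free, without deleting anything. This assumes GNU coreutils (uniq -w, stat -c) and ordinary filenames. It illustrates hash-based duplicate grouping in general and is not how foldermanifest dedupe is implemented. dup_report and reclaimable_bytes are hypothetical names.

```bash
#!/usr/bin/env bash
# Report-only duplicate finder: groups files by SHA-256 and deletes nothing.
# Hypothetical helpers; assumes GNU coreutils and filenames without
# newlines or backslashes (sha256sum would escape those lines).

# dup_report DIR: print "hash  path" for every file whose content occurs more
# than once; groups of identical files are separated by a blank line.
dup_report() {
  find "$1" -type f -print0 \
    | xargs -0 -r sha256sum \
    | LC_ALL=C sort \
    | uniq -w64 --all-repeated=separate
}

# reclaimable_bytes: read dup_report output on stdin and print how many bytes
# deleting all but one copy in each group would free.
reclaimable_bytes() {
  local total=0 prev="" line hash path
  while IFS= read -r line; do
    if [ -z "$line" ]; then prev=""; continue; fi
    hash=${line:0:64}      # 64 hex characters
    path=${line:66}        # after the two-character separator
    # the first file of each group is the copy to keep; count the rest
    if [ "$hash" = "$prev" ]; then
      total=$(( total + $(stat -c %s "$path") ))
    fi
    prev=$hash
  done
  echo "$total"
}
```

    Piping dup_report into reclaimable_bytes gives the reclaimable total in bytes. The actual deletions stay a manual step.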

    Why --dry-run first, always

    Dedup by SHA-256 hash is safe in principle: two files with the same SHA-256 hash are, for all practical purposes, byte-for-byte identical. But on a creative team's NAS, identical bytes do not always mean a redundant file. Examples include a master export copied as a "safety" before the color grade, or a client deliverable that happens to match an internal review export. Dry-run lets a human confirm before anything disappears. Automate the detection; keep the deletion decision human.

    What they learned

    robocopy exit codes are not enough

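    robocopy's exit code is a bitmask. Values below 8 mean success: 0 means nothing needed copying, 1 means files were copied, and 2 and 4 flag extra or mismatched files. A value of 8 or above means at least one copy failed or a fatal error occurred. None of these codes say whether the bytes on disk are correct after transfer. A file can be copied with a success code of 1 and still be corrupt if the source was mid-write. Hashing the copy and comparing it against a baseline is what actually checks the bytes.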
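    A small decoder makes the bitmask concrete. This sketch follows robocopy's documented exit-code bits (1 copied, 2 extra, 4 mismatched, 8 failed, 16 fatal). It is written in Bash only to stay in one language with the Linux script. robocopy_summary and robocopy_ok are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Decode robocopy's exit-code bitmask. Bash is used only for consistency with
# the Linux script; in PowerShell the same tests are ($code -band 1), etc.

# robocopy_summary CODE: print the outcome flags set in CODE.
robocopy_summary() {
  local code=$1 out=""
  if (( code == 0 )); then out="no change"; fi
  if (( code & 1 ));  then out+=", files copied"; fi
  if (( code & 2 ));  then out+=", extra files in destination"; fi
  if (( code & 4 ));  then out+=", mismatched files"; fi
  if (( code & 8 ));  then out+=", some copies failed"; fi
  if (( code & 16 )); then out+=", fatal error"; fi
  echo "${out#, }"
}

# robocopy_ok CODE: succeeds for 0-7, fails for 8 and above.
robocopy_ok() {
  (( $1 < 8 ))
}
```

    robocopy_ok 1 succeeds for a run that copied files, yet nothing in that code says the copied bytes match the source. That is the gap the hash check fills.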

    File size and timestamp checks miss silent corruption

    Every silent corruption they caught had the correct file size and a plausible timestamp. Windows Explorer, robocopy logs, and a human spot-check all missed them. A cryptographic hash is non-negotiable for any file where corruption matters.
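    The failure mode is easy to reproduce. This sketch, assuming GNU coreutils and dd, damages one byte of a copy in place and restores its timestamps. A size-and-date comparison still reports a match, while a SHA-256 comparison does not. The file names and helper functions are illustrative.

```bash
#!/usr/bin/env bash
# Why size + timestamp checks miss silent corruption.
# Assumes GNU coreutils (stat -c, touch -r, sha256sum) and dd.

# same_size_and_mtime A B: succeeds if both files have equal size and mtime,
# which is all a quick size/date comparison looks at.
same_size_and_mtime() {
  [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

# same_sha256 A B: succeeds only if the file contents hash identically.
same_sha256() {
  [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

demo=$(mktemp -d)
printf 'frame-data-0001' > "$demo/original.mov"
cp "$demo/original.mov" "$demo/backup.mov"
# Overwrite one byte in place: the length stays the same, the content changes.
printf 'X' | dd of="$demo/backup.mov" bs=1 seek=6 conv=notrunc 2>/dev/null
# Give the damaged copy the original's timestamps, as a preserving copy would.
touch -r "$demo/original.mov" "$demo/backup.mov"

if same_size_and_mtime "$demo/original.mov" "$demo/backup.mov"; then
  echo "size and mtime: match"
fi
if ! same_sha256 "$demo/original.mov" "$demo/backup.mov"; then
  echo "sha256: MISMATCH"
fi
rm -rf "$demo"
```

    Run as-is, it prints "size and mtime: match" followed by "sha256: MISMATCH".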

    The baseline must be generated from the source, not the destination

    They initially tried generating the manifest from the backup destination and verifying the NAS source against it. That's backwards. If the backup is already corrupt, the manifest records the corrupted bytes as the truth, and any later check of the backup against it passes. A mismatch would then point at the healthy NAS file instead. Generate the baseline from the authoritative source, then verify the destination matches.

    Small manifests, long shelf life

    A JSON manifest for a 120 GB project is typically 30–80 KB. Storing it costs nothing. But two years from now, when a client requests a revision and you need to verify your cold-storage archive hasn't degraded, that 80 KB file is the difference between "verified good" and "probably fine."

    Try this workflow on your team's backup

    The CLI is included in the free trial — no separate download, no npm install. Install the Windows app and run foldermanifest --help from any terminal.

    No card required during trial.

    Frequently asked questions

    How do you automatically verify that a backup completed correctly?

    Generate a SHA-256 manifest of the source folder while it is in a known-good state. Then, on a nightly schedule, run foldermanifest verify on the backup copy against that manifest. Exit code 0 means the backup is byte-for-byte identical to the baseline; exit code 1 means something was added, removed, or corrupted. Your script can email or log the result so you find out without manual checking.

    How do you detect silent file corruption?

    Silent corruption, where a file still exists but its bytes have changed, is invisible to name, size, and date checks but is caught by comparing SHA-256 hashes. Run foldermanifest verify on any folder you care about. If a hash no longer matches its baseline, the file's contents have changed, even if its size and timestamp look normal. If nobody edited the file, that change is corruption.

    How much storage can deduplication recover on a creative team's shared drive?

    It varies widely. Teams that copy project folders for "versioning" or sync the same assets across multiple machines have the most to gain. In this case study, foldermanifest dedupe identified 340 GB of duplicates, about 4% of an 8 TB NAS. Run it with --dry-run first: the report shows exactly which files are duplicated and how much space they use, and nothing is deleted until you decide.

    Can I verify files on a NAS or network share with the FolderManifest CLI?

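    Yes. The CLI operates on any path the running user can read, including mapped drives (Z:\), UNC paths (\\server\share), or mounted SMB shares on Linux/macOS. Performance depends mostly on the network, because every byte has to be read to be hashed. Gigabit Ethernet tops out at 125 MB/s in theory (roughly 110–115 MB/s in practice), which is slower than a modern CPU core typically computes SHA-256. At that rate, a 2 TB share takes around 5 hours.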
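    The estimate is simple arithmetic: size divided by the slower of the link rate and the hashing rate. The sketch below uses decimal units and treats both rates as numbers you measure on your own hardware. The values 115 MB/s for gigabit and 500 MB/s for hashing are assumed examples, and scan_minutes is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Back-of-envelope scan time for hashing a network share. Hashing must read
# every byte, so the effective rate is the slower of link and hash rate.
# Decimal units: 1 GB = 1000 MB.

# scan_minutes SIZE_GB LINK_MB_PER_S HASH_MB_PER_S -> whole minutes, rounded up
scan_minutes() {
  local size_gb=$1 link=$2 hash=$3
  local rate=$(( link < hash ? link : hash ))
  local seconds=$(( size_gb * 1000 / rate ))
  echo $(( (seconds + 59) / 60 ))
}

# Gigabit Ethernet: 1000 Mbit/s / 8 = 125 MB/s raw, roughly 115 MB/s in practice.
scan_minutes 2000 115 500
```

    With those inputs it prints 290, a little under 5 hours. On a faster link, the hash rate or the disks become the limit instead.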

    What is the difference between verifying a backup with rsync --checksum and FolderManifest?

    rsync --checksum compares source vs destination live and re-syncs differences. FolderManifest compares a folder against a saved baseline — ideal for detecting drift over time or catching corruption after the sync finished. They solve different problems: rsync keeps two folders in sync; FolderManifest proves a folder has not changed since a specific point in time.

    How do I get an alert when a file changes in a folder?

    Run foldermanifest verify on a schedule and check the exit code. Code 0 = no change (silent). Code 1 = something changed. In a batch script or cron job, branch on the exit code and trigger an email, Slack message, or log entry only when code is 1. This gives you change alerts without a background daemon.

    Is the FolderManifest CLI included in the trial?

    Yes. The CLI ships inside the desktop app and works during the trial. Install the Windows app, open any terminal, and run foldermanifest --help to get started. No separate download or npm install required.