Automation & DevOps

    Bulk-Dedupe Huge Drives from the Terminal Without Regret

    Duplicate cleanup is terrifying because it usually means irreversible deletes. The FolderManifest CLI flips that: report first, clean to a reversible Recovery Bin, and restore anything in one command.

    Published June 28, 2026 · 11 min read
    Mehrab Ali

    Data Scientist, Researcher & Entrepreneur

    Founder of ARCED Foundation, ARCED International, and Solutions of Things Lab (SoTLab). Built FolderManifest to help teams protect file integrity and stay audit-ready.

    Last updated June 28, 2026

    Quick answer

    To remove duplicate files from the command line on Windows, run foldermanifest duplicates <folder> to report byte-identical groups, then add --clean --keep first to move the extra copies to the reversible Recovery Bin. One file per group is kept, and nothing is permanently deleted until you run recovery empty.

    • duplicates <folder> reports byte-identical groups and changes nothing.
    • --clean --keep first moves extras to the Recovery Bin, never a hard delete.
    • recovery restore --all undoes the whole pass.
    • Duplicates are matched by content hash, not by file name.

    Why dedupe from the terminal?

    Drag-and-drop dedupe tools fall apart at scale: thousands of groups, no audit trail, and a delete button that means gone. When you’re reclaiming space on a backup drive or a photo archive, you want three things a GUI rarely gives you all at once — a preview you can read, a cleanup that is reversible, and a command you can script and repeat.

    The FolderManifest CLI duplicates command is built around exactly that: identification is by content hash, the default is to report rather than delete, and --clean moves files to a Recovery Bin you can restore from. The whole workflow is designed so the scary step — deleting — is never irreversible.

    How duplicates are detected

    A duplicate is defined by content, not by name. FolderManifest hashes each file and groups files whose hashes match, so two files are duplicates only when their bytes are identical. This avoids the classic mistakes of name-based tools:

    • Two files named IMG_0421.jpg with different edits are not grouped.
    • The same photo saved as copy.jpg and original.jpg is grouped.
    • Files that merely share a size are not assumed equal — the hash decides.

    Because matching is exact, a clean pass never removes a file that only looked like a duplicate. If you want to understand the hashing itself, the free checksum calculator lets you hash a file in the browser.
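
    To make the idea concrete, here is a minimal Python sketch of content-based grouping. It is not FolderManifest's implementation: the article does not name the hash algorithm, so SHA-256 is an assumption, as are the function names file_digest and find_duplicate_groups. Grouping by size first is only a shortcut to skip files that cannot match; the hash still decides.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Stream the file through SHA-256 (assumed algorithm) in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root):
    """Return sorted lists of paths whose contents hash identically."""
    by_size = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        # Same size is not enough: only matching hashes form a group.
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

    With this sketch, two files named IMG_0421.jpg that differ by a single byte land in different hash buckets, while copy.jpg and original.jpg with identical bytes end up in the same group.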

    Step 1: Report, don’t delete

    Always start with a dry report. Run duplicates with no --clean flag and nothing moves — you just get the groups.

    Command Prompt
    foldermanifest duplicates "D:\Photos" --report dupes-report --json

    --report dupes-report writes a branded dupes-report.html you can scroll through, and --json prints the same groups as machine output for a script. Neither flag changes a single file.

    Step 2: Review the groups

    Each group is a set of byte-identical files; one will be kept and the rest are candidates to move. Skim the HTML report (or filter the JSON) and sanity-check before you clean, especially for files that are supposed to exist in two places, such as deliberate backups you keep separate.

    dupes-report.json (excerpt)
    {
      "groups": [
        {
          "hash": "a1b2c3…",
          "size": 5242880,
          "files": ["D:\\Photos\\2019\\IMG_0421.jpg", "D:\\Photos\\Import\\IMG_0421.jpg"]
        }
      ],
      "groupCount": 1873,
      "reclaimableBytes": 41203918720
    }

    reclaimableBytes is your prize: the space freed once the redundant copies are out of the way. Here that’s 41,203,918,720 bytes, roughly 38 GB as Windows counts it (about 41.2 GB in decimal units), across 1,873 duplicate groups.
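
    The arithmetic behind that figure can be checked with a few lines of Python. This is a sketch under one assumption: that size in the excerpt is the per-file size in bytes, so a group frees size × (number of files − 1). The helper names are hypothetical.

```python
def reclaimable_for_group(group):
    """Bytes freed when one file in the group is kept and the others are removed.

    Assumes "size" is the per-file size in bytes, as the excerpt suggests.
    """
    return group["size"] * (len(group["files"]) - 1)


def human(nbytes):
    """Show a byte count in decimal GB and binary GiB side by side."""
    return f"{nbytes / 1000**3:.1f} GB ({nbytes / 1024**3:.1f} GiB)"


group = {
    "size": 5242880,
    "files": ["D:\\Photos\\2019\\IMG_0421.jpg", "D:\\Photos\\Import\\IMG_0421.jpg"],
}
print(reclaimable_for_group(group))  # 5242880 (one redundant 5 MiB copy)
print(human(41203918720))            # 41.2 GB (38.4 GiB)
```

    Windows Explorer labels binary units as "GB", which is why the same total reads as about 38 GB on screen.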

    Step 3: Clean to the Recovery Bin

    Happy with the report? Add --clean. It keeps one file per group and moves the rest to the Recovery Bin.

    Command Prompt
    foldermanifest duplicates "D:\Photos" --clean --keep first

    This is the only step that moves anything, and even here nothing is destroyed — the extras are now in the Recovery Bin, not deleted. The kept copies stay exactly where they were.

    Which copy gets kept

    --keep controls which member of each duplicate group survives. The choice is by path order, so you can steer the keeper toward your canonical folder:

    Flag           Keeps                                Use when
    --keep first   First file by path order (default)   Your originals sort earliest
    --keep last    Last file by path order              Dated or final folders sort latest

    Whichever you choose, only the redundant copies move; the kept file is never touched. If you’re unsure which order matches your layout, run a report first and look at the file paths in each group.
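
    The selection rule can be sketched in Python as follows. The article says only "path order", so the case-insensitive sort used here is an assumption, and split_group is a hypothetical helper, not part of the CLI.

```python
def split_group(paths, keep="first"):
    """Return (kept_path, paths_to_move) for one duplicate group."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Assumed ordering: plain case-insensitive sort of the full path.
    ordered = sorted(paths, key=str.lower)
    kept = ordered[0] if keep == "first" else ordered[-1]
    return kept, [p for p in ordered if p != kept]


group = [r"D:\Photos\Import\IMG_0421.jpg", r"D:\Photos\2019\IMG_0421.jpg"]
print(split_group(group, "first")[0])  # D:\Photos\2019\IMG_0421.jpg
print(split_group(group, "last")[0])   # D:\Photos\Import\IMG_0421.jpg
```

    Under this assumed ordering, a folder whose name starts with a digit sorts before one that starts with a letter, so a year-named folder would be "first", not "last". That is one more reason to read the paths in the report before picking a --keep value.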

    Step 4: Undo on demand

    The Recovery Bin is the safety net that makes bulk cleanup sane. List what was moved, restore selectively or wholesale, and only purge once you’re certain.

    Command Prompt
    :: see everything that was moved
    foldermanifest recovery list
    :: undo the entire cleanup
    foldermanifest recovery restore --all
    :: commit: free the space for good
    foldermanifest recovery empty

    Until you run recovery empty (or recovery purge for specific ids), every moved file stays recoverable. That’s the difference between a confident cleanup and a held breath.
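
    For readers curious why a move-based cleanup is reversible, here is a conceptual Python sketch of the pattern: move the file, record where it came from, and delete only on an explicit final step. The article does not describe how FolderManifest stores its Recovery Bin, so the layout, file extensions, and function names below are all hypothetical.

```python
import json
import os
import shutil
import uuid


def move_to_bin(path, bin_dir):
    """Move a file into bin_dir and record its original path. Returns an item id."""
    original = os.path.abspath(path)
    os.makedirs(bin_dir, exist_ok=True)
    item_id = uuid.uuid4().hex
    shutil.move(original, os.path.join(bin_dir, item_id + ".bin"))
    with open(os.path.join(bin_dir, item_id + ".json"), "w", encoding="utf-8") as f:
        json.dump({"original": original}, f)
    return item_id


def restore(item_id, bin_dir):
    """Put one item back at its recorded original path."""
    record = os.path.join(bin_dir, item_id + ".json")
    with open(record, encoding="utf-8") as f:
        original = json.load(f)["original"]
    if os.path.exists(original):
        raise FileExistsError(original)  # never overwrite a file that reappeared
    os.makedirs(os.path.dirname(original), exist_ok=True)
    shutil.move(os.path.join(bin_dir, item_id + ".bin"), original)
    os.remove(record)
    return original


def empty_bin(bin_dir):
    """The only irreversible step: delete everything held in the bin."""
    shutil.rmtree(bin_dir, ignore_errors=True)
```

    The design point is that disk space is not freed by the move itself; it is freed only when the bin is emptied, which is what keeps the earlier steps safe to undo.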

    Case study: a 500 GB photo archive

    A common mess: years of camera imports where the same shoot got copied into Import, Edited, and a dated folder. Same files, three times over. The whole reclaim takes three steps:

    Command Prompt
    :: 1. Report — read it, trust it
    foldermanifest duplicates "E:\PhotoArchive" --report archive-dupes
    
    :: 2. Clean — keep the copy in the dated folders (last by path)
    foldermanifest duplicates "E:\PhotoArchive" --clean --keep last
    
    :: 3. Verify the win, then commit
    foldermanifest recovery list
    foldermanifest recovery empty

    Because every step is a command, you can paste the same routine onto the next drive, or wrap it in a script with --json and log how much each cleanup reclaimed. To run dedupe across several drives in one go, move the jobs into a reproducible --config file. To schedule recurring cleanups, see automating folder jobs with the CLI.
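
    A wrapper script along those lines might look like the Python sketch below. It uses only the duplicates <folder> --json invocation shown earlier and assumes that the JSON arrives on standard output with the groupCount and reclaimableBytes fields from the excerpt. The function names and the tab-separated log format are illustrative choices, not part of the CLI.

```python
import json
import subprocess
from datetime import datetime, timezone


def run_report(folder, runner=subprocess.run):
    """Run the report-only pass and parse its JSON (assumed to be on stdout)."""
    result = runner(
        ["foldermanifest", "duplicates", folder, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def log_line(folder, report):
    """Format one log entry from the fields shown in the report excerpt."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    gib = report["reclaimableBytes"] / 1024**3
    return f"{stamp}\t{folder}\t{report['groupCount']} groups\t{gib:.1f} GiB reclaimable"


def log_drives(folders, log_path, runner=subprocess.run):
    """Append one line per folder to a tab-separated log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        for folder in folders:
            log.write(log_line(folder, run_report(folder, runner)) + "\n")
```

    Calling log_drives([r"D:\Photos", r"E:\PhotoArchive"], "dedupe-log.tsv") would record the reclaimable total for each drive without moving anything, since no --clean flag is passed.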

    Reclaim your drive without the regret

    The CLI — duplicate detection, safe cleanup, and the Recovery Bin — is included with the 7-day trial and every lifetime license.

    Frequently asked questions

    How do I remove duplicate files from the command line on Windows?
    Run foldermanifest duplicates on the folder to report byte-identical groups, review it, then add --clean --keep first to move the extra copies to the reversible Recovery Bin. One file per group is kept; nothing is permanently deleted until you empty the bin.
    How does the CLI decide which files are duplicates?
    By content hash, not by name or size alone. Files are read and grouped into byte-identical sets, so two files are duplicates only when their contents are exactly equal. Same-named files with different contents are never grouped together.
    Does duplicates --clean delete files permanently?
    No. --clean moves the extra copies to the FolderManifest Recovery Bin, never a hard delete. It keeps one file per duplicate group and you can restore everything with recovery restore --all until you choose to empty the bin.
    How does it choose which copy to keep?
    The --keep flag decides which member of each duplicate set survives: first (default) or last, by path order. The kept file stays exactly where it is; only the redundant copies move to the Recovery Bin, so your canonical folder is preserved.
    Can I preview what would be removed before deleting anything?
    Yes. Run duplicates without --clean and nothing is moved. You get the full group report, optionally as an HTML --report or machine-readable --json. Add --clean only once the report looks right.
    Will it work on a large external or backup drive?
    Yes. Detection is checksum-based and streams file contents, so it handles multi-terabyte drives. For a very large archive, run a report pass first, review the groups, then clean, keeping the Recovery Bin as your undo.
    How much space will deduplication actually free?
    The report includes a reclaimableBytes figure: the total size of the redundant copies that would move out of the way. That number is the space you get back once the extra copies are cleaned and the Recovery Bin is emptied.
    How do I undo a cleanup I regret?
    Run recovery list to see everything that was moved, then recovery restore --all to put every file back where it came from. The cleanup is fully reversible until you run recovery empty or purge specific items.
    When is the disk space actually reclaimed?
    Moving duplicates to the Recovery Bin keeps them recoverable, so space is freed for good only after recovery empty (all items) or recovery purge (specific ids). Until then, a regretted cleanup costs you nothing.
    Is the duplicate cleaner a separate purchase?
    No. The duplicates command, the Recovery Bin, and the rest of the CLI ship inside the FolderManifest Windows app and are included with the trial and every paid license at no extra cost.

    Related: Automate folder verification with the CLI · Find duplicate files on Windows