Enterprise Guide

Corporate Batch Recovery: Unlock 100+ Excel Files at Once

When a corporate file server holds hundreds or thousands of password-protected Excel files — left behind by a departed employee, migrated from an old document management system, or protected by a now-forgotten departmental password — manually handling each file is not an option. Batch (automated) recovery turns a week of manual work into an overnight script run. This guide covers bulk structural removal (sheet and workbook protection), bulk cryptographic recovery (file-open passwords), the tools and scripts needed, and how to choose between in-house recovery and managed services for enterprise-scale operations.

The enterprise scale problem

Password-protected Excel files accumulate in corporate environments for many reasons: departing employees did not document their passwords, departmental shared passwords are lost when the team restructures, legacy document management systems apply a blanket password on exported files, or M&A integration brings in thousands of files from an acquired company with unknown protection.

At scale (100+ files), the manual approach — open each file, identify protection type, apply the appropriate removal — becomes impractical. A single batch recovery pipeline can process hundreds of files overnight with minimal human intervention.

The first step is always classification: separate files by protection type (structural vs file-open encryption) and by format generation (legacy 97-2003 binary vs Office 2007, 2010, and 2013 or later, which differ in their key derivation). The recovery technique for each category is different, and attempting the wrong technique wastes time or corrupts files.

Before batch recovery: take an inventory

Run a quick inventory script that inspects each file, detects the protection type (structural vs encryption), and, for encrypted files, records the matching hashcat mode. This prevents attempting XML edits on encrypted files (which fails, because an encrypted workbook is no longer a readable ZIP) or submitting structural files to GPU recovery (which wastes money).
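A minimal sketch of such a classifier, using only the standard library. It assumes that an encrypted OOXML workbook is stored as an OLE2 container with an `EncryptedPackage` stream, and it finds that stream with a raw byte search, which is a heuristic rather than a real OLE2 parse. The function name `classify` and its labels are illustrative, not part of any tool.

```python
import zipfile
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Stream name as it appears in an OLE2 directory entry (UTF-16LE).
ENCRYPTED_PACKAGE = "EncryptedPackage".encode("utf-16-le")
STRUCTURAL_TAGS = (b"<sheetProtection", b"<workbookProtection")


def classify(path: Path) -> str:
    """Return a coarse protection label for one spreadsheet file."""
    data = path.read_bytes()
    if data.startswith(OLE2_MAGIC):
        # Heuristic: an encrypted .xlsx/.xlsm is wrapped in an OLE2 container.
        if ENCRYPTED_PACKAGE in data:
            return "encrypted-ooxml"
        # Legacy .xls: needs a BIFF-aware check to tell encrypted from plain.
        return "legacy-ole2"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if name.startswith("xl/") and name.endswith(".xml"):
                    xml = zf.read(name)
                    if any(tag in xml for tag in STRUCTURAL_TAGS):
                        return "structural"
        return "unprotected"
    return "unknown"
```

Files labelled `legacy-ole2` or `unknown` are the ones worth a manual look before they enter any automated stage.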

Batch structural removal — sheet and workbook protection

Sheet protection and workbook structure protection are the easiest to batch-remove because they are XML elements inside a ZIP archive. The approach: iterate through all .xlsx files, unzip each, delete the relevant XML elements, rezip.

A Python script using the zipfile and xml.etree.ElementTree modules handles this reliably. The script opens each .xlsx (which is a ZIP), parses the relevant XML files, removes the <sheetProtection> and/or <workbookProtection> elements, and writes back the modified ZIP. Full processing time per file is under a second.
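The removal step can be sketched as follows. Note one deliberate swap: this sketch deletes the elements with a byte-level regular expression instead of ElementTree, because re-serializing a worksheet through ElementTree can rename or drop namespace prefixes that other attributes in the file still refer to. The function name `strip_protection` is illustrative, and the pattern assumes the protection elements are written as empty elements, which is how Excel normally emits them.

```python
import re
import zipfile
from pathlib import Path

# Matches <sheetProtection .../> and <workbookProtection .../>, with an
# optional namespace prefix, in self-closing or empty open/close form.
_NAMES = rb"(?:sheetProtection|workbookProtection)"
PROTECTION_RE = re.compile(
    rb"<(?:\w+:)?" + _NAMES + rb"\b[^>]*?(?:/>|>\s*</(?:\w+:)?" + _NAMES + rb">)"
)


def strip_protection(src: Path, dst: Path) -> int:
    """Copy src to dst without protection elements; return how many were removed."""
    removed = 0
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "xl/workbook.xml" or \
                    info.filename.startswith("xl/worksheets/"):
                data, count = PROTECTION_RE.subn(b"", data)
                removed += count
            # Reuse the original ZipInfo so names and timestamps carry over.
            zout.writestr(info, data)
    return removed
```

Writing to a separate `dst` path, rather than rewriting the archive in place, means a failure partway through never touches the input file.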

For .xls (legacy binary) files with structural protection, the approach is different: use a script that parses the OLE2 compound document (for example with olefile, the parser that oletools builds on) to locate the protection records in the workbook stream and patch them. Processing is slower (5-10 seconds per file) but still feasible for thousands of files.

Critical: always process on copies, not originals. A single byte error during XML patching can corrupt the ZIP structure. Processing on copies with a verification step (open each output file and check that protection is removed) prevents data loss.
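One way to enforce the copy-then-verify discipline is sketched below. The helper names `make_working_copy` and `verify_output` are hypothetical. The check confirms only that the archive is intact and that no protection elements remain; it does not prove that Excel will open the file, so a sample of outputs should still be opened by hand.

```python
import shutil
import zipfile
from pathlib import Path


def make_working_copy(original: Path, work_dir: Path) -> Path:
    """Copy one file into work_dir and return the path of the copy."""
    work_dir.mkdir(parents=True, exist_ok=True)
    copy = work_dir / original.name
    shutil.copy2(original, copy)  # copy2 also preserves timestamps
    return copy


def verify_output(path: Path) -> bool:
    """True if the ZIP is intact and no protection elements remain."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first corrupt member name, or None.
            if zf.testzip() is not None:
                return False
            for name in zf.namelist():
                if name.startswith("xl/") and name.endswith(".xml"):
                    xml = zf.read(name)
                    if b"<sheetProtection" in xml or b"<workbookProtection" in xml:
                        return False
    except zipfile.BadZipFile:
        return False
    return True
```

Because the copy keeps only the file name, two originals with the same name in different folders would collide; a real pipeline would mirror the source directory structure inside the working directory.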

Batch VBA project password removal

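VBA project passwords (DPB protection) are also batch-removable. The protection state is recorded in the DPB key of the VBA project's PROJECT stream. In an .xlsm file the project is stored as xl/vbaProject.bin inside the ZIP; in a legacy .xls file it is stored directly in the OLE2 compound document. A script can extract this binary, locate the DPB key, patch it, and write it back.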

The standard approach: use a Python script, with the oletools library (olevba) to identify which files actually contain a VBA project. The script walks through a directory of files, opens each, locates the VBA project, checks the DPB key, patches it, and writes the result back. Processing time: 2-5 seconds per file depending on file size.
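For .xlsm files, the patch can be sketched with the standard library alone. This follows a widely circulated technique: renaming the `DPB` key to an unknown key such as `DPx`, so that Excel rejects the key when loading the project and the operator can then set a new password in the VBA editor. It assumes the PROJECT stream text is stored uncompressed inside vbaProject.bin, and behaviour may differ between Office versions. The function name `neutralize_vba_password` is illustrative.

```python
import zipfile
from pathlib import Path

VBA_PART = "xl/vbaProject.bin"


def neutralize_vba_password(src: Path, dst: Path) -> bool:
    """Copy src to dst with the DPB key renamed; return True if patched."""
    patched = False
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == VBA_PART and b'DPB="' in data:
                # Same-length replacement, so OLE2 sector offsets are unchanged.
                data = data.replace(b'DPB="', b'DPx="')
                patched = True
            zout.writestr(info, data)
    return patched
```

A return value of `False` means the file had no VBA project or no DPB key, which is worth logging rather than treating as an error.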

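For very large batches (10,000+ files), parallel processing with a process pool (Python's multiprocessing or concurrent.futures.ProcessPoolExecutor) brings total runtime down from hours to tens of minutes on a multi-core server. The operation is CPU-bound per file (each requires ZIP decompression, a binary patch, and recompression), so a thread pool gains little in CPython; with worker processes, throughput scales roughly linearly with core count until disk I/O becomes the bottleneck.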
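A sketch of the fan-out, assuming a per-file `worker` function that is defined at module level so it can be sent to worker processes. The name `run_batch` and the `executor_cls` parameter are illustrative; the parameter exists only so that a thread pool can be substituted in tests or for I/O-bound workers.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_batch(paths, worker, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Run worker(path) for every path; return {path: (status, detail)}."""
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = ("ok", future.result())
            except Exception as exc:
                # One bad file must not abort the whole batch.
                results[path] = ("error", repr(exc))
    return results
```

On Windows and macOS, where worker processes are spawned rather than forked, the call to `run_batch` should sit under an `if __name__ == "__main__":` guard.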

Batch cryptographic recovery — file-open passwords

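For files with file-open encryption (real encryption, not structural), batch recovery requires hashcat or a professional GPU service. Hashcat supports batch mode: you extract the hash from each file (using office2john from the John the Ripper suite), collect the hashes into one file per hashcat mode, and run hashcat against each combined hash file. office2john prefixes every hash with the file name; either strip that prefix before handing the hashes to hashcat or run hashcat with its --username option so the prefix is ignored.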
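The collection step can be sketched as a small parser for office2john output. It assumes each output line has the form `name:$office$*2013*...` or `name:$oldoffice$1*...`, possibly followed by extra colon-separated fields that only John the Ripper uses. The prefix-to-mode table reflects the usual hashcat mapping and should be checked against the hashcat version in use; `group_hashes` is an illustrative name.

```python
import re
from collections import defaultdict

# Hash prefix -> hashcat mode (verify against `hashcat --help`).
MODE_BY_PREFIX = {
    "$office$*2007*": 9400,
    "$office$*2010*": 9500,
    "$office$*2013*": 9600,
    "$oldoffice$0*": 9700,
    "$oldoffice$1*": 9700,
    "$oldoffice$3*": 9800,
    "$oldoffice$4*": 9800,
}
# File name, then the hash itself; anything after the next colon is dropped.
LINE_RE = re.compile(r"^(.*?):(\$(?:old)?office\$[^:\s]+)")


def group_hashes(office2john_lines):
    """Return {hashcat_mode: [(filename, hash), ...]} from office2john output."""
    groups = defaultdict(list)
    for line in office2john_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # error messages, blank lines, unsupported files
        filename, hash_str = match.groups()
        for prefix, mode in MODE_BY_PREFIX.items():
            if hash_str.startswith(prefix):
                groups[mode].append((filename, hash_str))
                break
    return dict(groups)
```

Keeping the file name next to each hash is what later allows a recovered password to be matched back to its workbook.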

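A single hashcat run on a batch of hashes is easier to manage than a separate run per file: one session, one candidate list, one results file. It is not free, though. Each Office hash carries its own salt, so every candidate still has to be tested against each hash separately, and the time per pass grows with the number of hashes. The practical benefit is that weaker passwords in the batch are found early even if stronger ones take longer.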

The challenge is that a batch of files likely has multiple different passwords. Hashcat handles this natively: it reports which hash (and therefore which file) was cracked by which candidate. You can split the work into two phases: run a fast dictionary + rules pass first (this catches common passwords across the batch), then escalate each uncracked hash to an individual mask attack based on whatever is known about that file's password.
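The two-phase approach can be expressed as command lines built in Python. The options used here (`-m`, `-a 0`, `-a 3`, `-r`, `--potfile-path`) are standard hashcat options; the file names, the rule file path, and the example mask are placeholders to be replaced with real values. The helper names are illustrative.

```python
def dictionary_pass(mode, hash_file, wordlist, rules, potfile):
    """Phase 1: straight dictionary attack (-a 0) with a rule file."""
    return [
        "hashcat", "-m", str(mode), "-a", "0",
        "--potfile-path", potfile,
        hash_file, wordlist, "-r", rules,
    ]


def mask_pass(mode, hash_file, mask, potfile):
    """Phase 2: mask attack (-a 3) for hashes the dictionary missed."""
    return [
        "hashcat", "-m", str(mode), "-a", "3",
        "--potfile-path", potfile,
        hash_file, mask,
    ]


# Example (not executed here): one uppercase letter, four lowercase, two digits.
# import subprocess
# subprocess.run(dictionary_pass(9600, "mode9600.txt", "words.txt",
#                                "rules/best64.rule", "batch.pot"))
# subprocess.run(mask_pass(9600, "mode9600.txt", "?u?l?l?l?l?d?d", "batch.pot"))
```

Sharing one potfile between phases matters: hashcat skips hashes whose passwords are already recorded there, so the mask phase only spends time on what the dictionary phase missed.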

Enterprise recovery workflow architecture

A production batch recovery pipeline for an enterprise typically has four stages:

  • Stage 1 (inventory and classification): walk the file tree and classify each file by extension (.xlsx, .xlsm, .xls) and protection type. Output a CSV with file path, protection type, hash mode (if encrypted), and file size.
  • Stage 2 (structural removal): apply the batch XML edit (for .xlsx) and the OLE2 patch (for .xls) to all structurally protected files. Verify that each output file opens correctly in a headless validator, for example a script that loads it with openpyxl.
  • Stage 3 (hash extraction): for the remaining encrypted files, run office2john to extract hashcat-compatible hashes. Group them by hash mode (9400, 9500, and 9600 for Office 2007, 2010, and 2013+; 9700 and 9800 for legacy 97-2003 files), because a single hashcat run accepts only one mode.
  • Stage 4 (GPU cracking): run hashcat on each group of hashes, starting with dictionary + rules, then Markov/PCFG, then masks. Flag files that remain uncracked after the planned attacks have been exhausted for human review or professional escalation.
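The Stage 1 output could be produced along these lines. This is a sketch only: the column names are illustrative, the `classify` argument is a placeholder for whatever protection detector the pipeline uses, and the `hash_mode` column is left empty because it is only known after Stage 3.

```python
import csv
from pathlib import Path

EXTENSIONS = {".xlsx", ".xlsm", ".xls"}
HEADER = ["path", "extension", "protection_type", "hash_mode", "size_bytes"]


def write_inventory(root: Path, out_csv: Path,
                    classify=lambda path: "unclassified") -> int:
    """Walk root, write one CSV row per spreadsheet, return the row count."""
    rows = 0
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix.lower() in EXTENSIONS:
                writer.writerow([
                    str(path),
                    path.suffix.lower(),
                    classify(path),
                    "",  # hash mode is filled in after hash extraction
                    path.stat().st_size,
                ])
                rows += 1
    return rows
```

The CSV then doubles as the audit trail: later stages can append status columns instead of keeping their own separate lists.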

Tools and scripts for enterprise batch recovery

Python ecosystem: openpyxl (reading .xlsx metadata), zipfile + ElementTree (XML structural removal), oletools (binary .xls and VBA handling), office2john (hash extraction). For large file trees, os.walk or pathlib for directory traversal, concurrent.futures for parallelism.

Linux utilities: hashcat (GPU cracking), john (alternative cracking), unzip/zip (command-line ZIP manipulation), sed or xmlstarlet (command-line XML editing for simple cases).

For Windows-centric enterprise environments: PowerShell can handle ZIP manipulation and XML editing natively, though Python is more reliable for binary patching. Windows Subsystem for Linux (WSL) provides access to hashcat and John the Ripper.

Commercial enterprise tools: professional recovery services offer batch processing where you upload a ZIP of encrypted files and receive a report of recoverable passwords. This is the easiest option for IT departments without GPU infrastructure.

When to outsource vs build in-house

Build in-house if: you have a regular pipeline (monthly+ batches), your IT team includes someone comfortable with Python and hashcat, you have GPU hardware (even a single RTX 4090) or cloud GPU budget, and most files are structurally protected (cheap and fast to process).

Outsource if: this is a one-time or occasional cleanup, most files have file-open encryption (real AES), you lack GPU hardware or expertise, or the data is not sensitive enough to justify building infrastructure. Professional services charge per-file or per-password-recovered, often with volume discounts for batches over 100 files.

Hybrid: do structural removal in-house (XML edit is trivial and free), then send only the remaining encrypted files to a professional service. This minimizes cost while retaining control over the easy-to-process files.

Setting up a batch recovery pipeline

  1. Inventory all files

    Walk the directory tree. Separate by extension (.xlsx, .xlsm, .xls, .xltx) and file age. Back up everything before processing.

  2. Classify protection type

    Use Python + zipfile: if the file opens as a ZIP and the XML is readable, any protection is structural. If the file is not a ZIP but an OLE2 container holding an EncryptedPackage stream, it has file-open encryption. An OLE2 container without that stream is a legacy .xls file and needs a separate check. Anything else is possibly corrupted.

  3. Run batch structural removal

    For all structurally protected .xlsx files: unzip, delete <sheetProtection> and <workbookProtection>, rezip. Verify each output file with a quick open test.

  4. Extract hashes for encrypted files

    Run office2john on all remaining .xlsx/.xls files. Group the output by hashcat mode (9400, 9500, 9600, 9700, 9800).

  5. Run hashcat batch

    For each mode group, run hashcat in batch mode with dictionary + rules as the first pass. Escalate uncracked hashes to a mask attack or a professional service.
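Once the last step has run, the results need to be matched back to files. The sketch below assumes the hashcat potfile stores one `hash:password` pair per line, with the hash written exactly as it was submitted, and that a list of `(filename, hash)` pairs was kept from hash extraction. `split_cracked` is an illustrative name.

```python
def split_cracked(file_hashes, potfile_lines):
    """Return ({filename: password}, [uncracked filenames])."""
    recovered = {}
    for line in potfile_lines:
        line = line.rstrip("\r\n")
        if ":" in line:
            # Office hashes contain no colon; the password might.
            hash_str, password = line.split(":", 1)
            recovered[hash_str] = password

    cracked, uncracked = {}, []
    for filename, hash_str in file_hashes:
        if hash_str in recovered:
            cracked[filename] = recovered[hash_str]
        else:
            uncracked.append(filename)
    return cracked, uncracked
```

The `uncracked` list is what gets escalated. Passwords that hashcat writes in its `$HEX[...]` notation are returned as-is and would need decoding before use.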

Frequently Asked Questions

Can I batch-remove Excel passwords for free at enterprise scale?
Structural protection (sheet, workbook, VBA) can be batch-removed for free using Python scripts. File-open encryption requires hashcat (free but needs GPU hardware) or a paid service. The hybrid approach (DIY structural + outsourced crypto) is usually most cost-effective.
How long does batch recovery take for 1,000 files?
Structural removal: about 30 minutes run serially at 2 seconds per file, and less when parallelised. Hash extraction: about 1 hour. GPU batch cracking: depends on password strength. A simple dictionary pass may finish in minutes, while a full mask attack on a strong password may take weeks per file.
Can I process files without modifying originals?
Yes. Always set up the pipeline to copy files to a working directory, process the copies, and write outputs to a separate directory. Keep originals untouched for audit trail and rollback.
What if some files have different passwords?
Hashcat's batch mode handles multiple hashes simultaneously. Each password test is checked against all hashes. Files with the same password are cracked together; files with unique passwords are cracked independently as their candidates are tested.
Is there any way to skip the password check for modern Excel encryption?
No. Modern Excel file-open encryption (Office 2013 and later, hashcat mode 9600) uses AES with SHA-512 key derivation. There are no known backdoors and no structural bypass at the file level. The password must be recovered or found.
Do professional services offer enterprise batch discounts?
Most do. Typical pricing: volume discounts at 50+ files, dedicated queue priority for 500+ files, and negotiated flat rates for recurring enterprise contracts. Always ask about batch pricing before uploading individual files.
