Corporate Batch Recovery: Unlock 100+ Excel Files at Once
When a corporate file server holds hundreds or thousands of password-protected Excel files — left behind by a departed employee, migrated from an old document management system, or protected by a now-forgotten departmental password — manually handling each file is not an option. Batch (automated) recovery turns a week of manual work into an overnight script run. This guide covers bulk structural removal (sheet and workbook protection), bulk cryptographic recovery (file-open passwords), the tools and scripts needed, and how to choose between in-house recovery and managed services for enterprise-scale operations.
The enterprise scale problem
Password-protected Excel files accumulate in corporate environments for many reasons: departing employees did not document their passwords, departmental shared passwords are lost when the team restructures, legacy document management systems apply a blanket password on exported files, or M&A integration brings in thousands of files from an acquired company with unknown protection.
At scale (100+ files), the manual approach — open each file, identify protection type, apply the appropriate removal — becomes impractical. A single batch recovery pipeline can process hundreds of files overnight with minimal human intervention.
The first step is always classification: separate files by protection type (structural vs file-open encryption) and by Office format generation (97-2003 binary, 2007, 2010, 2013 and later), because each generation uses a different encryption scheme. The recovery technique for each category is different, and attempting the wrong technique wastes time or corrupts files.
Before batch recovery: take an inventory
Run a quick inventory script that opens each file, detects protection type (structural vs encryption), and reports the hash mode. This prevents applying XML edits to encrypted files (which does nothing) or submitting structural files to GPU recovery (which wastes money).
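One way to sketch such a classifier uses only file signatures and the standard library. It assumes that unencrypted .xlsx/.xlsm files are plain ZIP archives, while encrypted 2007+ workbooks and legacy .xls files are both OLE2 compound files. The label strings and the function name `classify` are illustrative, and the byte search for the stream name is a heuristic rather than a full OLE2 parse.

```python
import zipfile
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header

def classify(path):
    """Return a coarse protection label for one Excel file (labels are illustrative)."""
    path = Path(path)
    with path.open("rb") as fh:
        head = fh.read(8)

    if head.startswith(ZIP_MAGIC):
        # Unencrypted OOXML: look for structural protection markers in XML parts.
        try:
            with zipfile.ZipFile(path) as zf:
                for name in zf.namelist():
                    if not name.endswith(".xml"):
                        continue
                    data = zf.read(name)
                    if b"<sheetProtection" in data or b"<workbookProtection" in data:
                        return "structural"
        except zipfile.BadZipFile:
            return "corrupt"
        return "unprotected"

    if head == OLE2_MAGIC:
        # Either a legacy .xls or an encrypted OOXML package wrapped in OLE2.
        # Directory entry names are UTF-16LE, so a raw byte search is a cheap
        # heuristic for the EncryptedPackage stream.
        raw = path.read_bytes()
        if "EncryptedPackage".encode("utf-16-le") in raw:
            return "encrypted-ooxml"
        return "legacy-ole2"

    return "unknown"
```

Files labelled `legacy-ole2` still need a second check, because a binary .xls can carry either structural protection or file-open encryption.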
Batch structural removal — sheet and workbook protection
Sheet protection and workbook structure protection are the easiest to batch-remove because they are XML elements inside a ZIP archive. The approach: iterate through all .xlsx files, unzip each, delete the relevant XML elements, rezip.
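A Python script using the zipfile and xml.etree.ElementTree modules handles this reliably. The script opens each .xlsx (which is a ZIP), parses the relevant XML files, removes the <sheetProtection> and/or <workbookProtection> elements, and writes back the modified ZIP. When serializing with ElementTree, register the original namespace prefixes first; otherwise it renames them (ns0, ns1, ...) and Excel may reject the result. Processing time per file is typically under a second.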
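A minimal sketch of the removal step follows. Note that it swaps ElementTree for a byte-level regular expression, purely to sidestep namespace-prefix rewriting; that is a simplification chosen for this example, not the only valid approach. The function name `strip_protection` and the choice of which archive members to scan are assumptions.

```python
import re
import zipfile

# Matches <sheetProtection .../> and <workbookProtection .../>, with or without
# a namespace prefix, in self-closing or open/close form.
PROTECTION_RE = re.compile(
    rb"<(?:\w+:)?(sheetProtection|workbookProtection)\b[^>]*?"
    rb"(?:/>|>.*?</(?:\w+:)?\1>)",
    re.DOTALL,
)

def strip_protection(src, dst):
    """Copy the .xlsx at src to dst without protection elements.

    Returns the number of elements removed. src and dst must be different paths.
    """
    removed = 0
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            # Only the workbook part and worksheet parts are scanned here.
            if (item.filename == "xl/workbook.xml"
                    or item.filename.startswith("xl/worksheets/")):
                data, count = PROTECTION_RE.subn(b"", data)
                removed += count
            # Passing the original ZipInfo keeps names, dates, and compression.
            zout.writestr(item, data)
    return removed
```

A return value of 0 means the file had no structural protection in the scanned parts, which is worth logging so the file can be re-checked for encryption instead.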
For .xls (legacy binary) files with structural protection, the approach is different: use a script built on an OLE2 library such as olefile (the parser that oletools relies on) to locate and patch the protection records in the OLE2 compound document. oletools itself is an analysis toolkit, so it helps with detection rather than patching. Processing is slower (5-10 seconds per file) but still feasible for thousands of files.
Critical: always process on copies, not originals. A single byte error during XML patching can corrupt the ZIP structure. Processing on copies with a verification step (open each output file and check that protection is removed) prevents data loss.
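The copy-then-verify discipline can be sketched as a small wrapper. Here `transform` stands for whatever removal function is in use (a hypothetical callable taking a source and destination path), and the verification is deliberately simple: the ZIP must pass its CRC check and contain no remaining protection element names.

```python
import shutil
import zipfile
from pathlib import Path

def verify_unprotected(path):
    """Return True if the archive is intact and no protection elements remain."""
    try:
        with zipfile.ZipFile(path) as zf:
            if zf.testzip() is not None:  # name of the first corrupt member, else None
                return False
            for name in zf.namelist():
                if name.endswith(".xml"):
                    data = zf.read(name)
                    if b"sheetProtection" in data or b"workbookProtection" in data:
                        return False
    except zipfile.BadZipFile:
        return False
    return True

def process_copy(original, work_dir, transform):
    """Run transform(src, dst) on a copy inside work_dir, never on the original.

    Returns (output_path, verified).
    """
    original = Path(original)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    backup = work_dir / (original.name + ".orig")
    output = work_dir / original.name
    shutil.copy2(original, backup)   # the untouched original stays where it was
    transform(backup, output)
    return output, verify_unprotected(output)
```

This check confirms structural integrity only. Confirming that the workbook actually opens still requires a spreadsheet application or a reader library. Files with the same name in different source folders would also collide in a single `work_dir`, so a real pipeline would mirror the directory structure.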
Batch VBA project password removal
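VBA project passwords (DPB protection) are also batch-removable. The password data is stored under the DPB key in the project's PROJECT stream. In an .xlsm file that stream sits inside xl/vbaProject.bin, an OLE2 container stored in the ZIP; in a legacy .xls file, which is not a ZIP, the VBA project is a storage inside the OLE2 workbook itself. A script can extract this binary, locate the DPB key, patch it, and replace it in the file.
The standard approach: use a Python script built on oletools (olevba detects and extracts VBA projects) and the olefile library it depends on. Note that oledump.py is a separate tool by Didier Stevens, not part of oletools. The script walks through a directory of files, opens each, identifies the VBA stream, checks the DPB key, patches it, and writes back. Processing time: typically 2-5 seconds per file depending on file size.
For very large batches (10,000+ files), parallel processing with multiprocessing or a process pool from concurrent.futures cuts total runtime substantially. The work is CPU-bound per file (each requires ZIP decompression, binary patch, recompression), so in Python processes are a better fit than threads, and throughput scales roughly with core count until disk I/O becomes the bottleneck.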
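For .xlsm files, a commonly described variant of this patch renames the `DPB=` key to `DPx=` so that Excel no longer recognizes the stored password data. The sketch below assumes the PROJECT stream appears as plain text inside vbaProject.bin (the usual case) and uses a same-length byte replacement so the OLE2 layout is untouched. How Excel reacts to the patched file (typically a warning about an invalid key, after which the project can be opened and a new password set) may vary by version, so treat this as a starting point to verify on test files.

```python
import zipfile

def patch_vba_key(src, dst):
    """Copy an .xlsm from src to dst with DPB= renamed to DPx= in vbaProject.bin.

    Returns True if a DPB key was found and patched. src and dst must differ.
    """
    patched = False
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.lower().endswith("vbaproject.bin") and b'DPB="' in data:
                # Same-length replacement, so OLE2 sector offsets stay valid.
                data = data.replace(b'DPB="', b'DPx="', 1)
                patched = True
            zout.writestr(item, data)
    return patched
```

Legacy .xls files are not ZIP archives, so this sketch does not apply to them; they need an OLE2-aware patch instead.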
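A parallel driver can be sketched with concurrent.futures. The name `run_batch` and its parameters are illustrative; `worker` is any per-file function, and the executor class is a parameter so the same code can run with processes in production or threads in a quick test.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_batch(paths, worker, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Apply worker(path) to every path in parallel.

    Returns (results, errors): two dicts keyed by path.
    """
    results, errors = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # one bad file should not stop the batch
                errors[path] = exc
    return results, errors
```

With ProcessPoolExecutor, `worker` must be a module-level function so it can be pickled, and on Windows the call belongs under an `if __name__ == "__main__":` guard.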
Batch cryptographic recovery — file-open passwords
For files with file-open encryption (real encryption, not structural), batch recovery requires hashcat or a professional GPU service. Hashcat supports batch mode: you extract the hash from each file (using office2john from the John the Ripper suite), strip the filename prefix that office2john adds (or pass --username so hashcat ignores it), collect the hashes into one file per hash mode, and run hashcat against each combined file.
A single hashcat run over a batch of hashes is easier to manage than a separate run per file, but it is not proportionally faster. Office hashes are salted, so hashcat must test every candidate against each hash individually and total work grows with the number of files. What batching buys is one candidate list, one session to monitor, and automatic removal of each hash as soon as it is cracked, so files with weak passwords drop out early even if stronger ones take longer.
The challenge is that a batch of files likely has multiple different passwords. Hashcat handles this natively: it reports which hash (and therefore which file) was cracked by which candidate. You can split the batch: run a fast dictionary + rules pass first (catches common passwords across the batch), then escalate each uncracked hash to individual mask attacks based on its specific password hints.
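The grouping step can be sketched as below. The prefix-to-mode table reflects the commonly documented hashcat modes for Office hashes and should be checked against `hashcat --help` for the installed version. The function names, and the paths passed to the command builder, are hypothetical.

```python
import re
from collections import defaultdict

# Matches the hash portion of an office2john output line.
HASH_RE = re.compile(r"\$(?:old)?office\$[^:\s]+")

# Hash prefix -> hashcat mode (assumed mapping; verify with `hashcat --help`).
MODE_BY_PREFIX = {
    "$office$*2007*": 9400,
    "$office$*2010*": 9500,
    "$office$*2013*": 9600,
    "$oldoffice$0*": 9700,
    "$oldoffice$1*": 9700,
    "$oldoffice$3*": 9800,
    "$oldoffice$4*": 9800,
}

def group_by_mode(office2john_lines):
    """Group (label, hash) pairs by hashcat mode; unknown formats go under None."""
    groups = defaultdict(list)
    for line in office2john_lines:
        match = HASH_RE.search(line)
        if not match:
            continue  # not a hash line
        digest = match.group(0)
        label = line[:match.start()].rstrip(":")  # filename prefix, if any
        mode = next(
            (m for prefix, m in MODE_BY_PREFIX.items() if digest.startswith(prefix)),
            None,
        )
        groups[mode].append((label, digest))
    return dict(groups)

def hashcat_command(mode, hash_file, wordlist, rules):
    """Build (not run) a dictionary + rules command line for one mode group."""
    return ["hashcat", "-m", str(mode), "-a", "0", hash_file, wordlist, "-r", rules]
```

Keeping the label next to each hash makes it straightforward to map a cracked hash back to its source file when hashcat reports results.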
Enterprise recovery workflow architecture
A production batch recovery pipeline for an enterprise typically has four stages:
- Stage 1 (inventory and classification): walk the file tree, classify each file by extension (.xlsx, .xlsm, .xls) and protection type. Output a CSV with file path, protection type, hash mode (if encrypted), and file size.
- Stage 2 (structural removal): apply the batch XML edit (for .xlsx) and the OLE2 binary patch (for .xls) to all structurally protected files. Verify that each output file opens correctly in a headless Excel-like validator.
- Stage 3 (hash extraction): for the remaining encrypted files, run office2john to extract hashcat-compatible hashes. Group by hash mode (9400, 9500, and 9600 for Office 2007, 2010, and 2013 or later; 9700 and 9800 for legacy 97-2003 files) for efficient batch cracking.
- Stage 4 (GPU cracking): run hashcat on each group of hashes, starting with dictionary + rules, then Markov/PCFG, then masks. Flag files that remain uncracked after these passes for human review or professional escalation.
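The Stage 1 CSV can be sketched as follows. The column names are illustrative, and `classify` is a hypothetical callable supplied by the caller that returns a `(protection_type, hash_mode)` pair for a path, with `hash_mode` set to `None` for files that are not encrypted.

```python
import csv
from pathlib import Path

EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xls"}

def write_inventory(root, csv_path, classify):
    """Walk root and write one CSV row per Excel file. Returns the row count."""
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "protection", "hash_mode", "size_bytes"])
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in EXCEL_SUFFIXES:
                protection, hash_mode = classify(path)
                writer.writerow(
                    [str(path), protection, hash_mode or "", path.stat().st_size]
                )
                rows += 1
    return rows
```

The later stages can then filter this CSV instead of re-scanning the file tree, which also gives an audit trail of what was processed.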
Tools and scripts for enterprise batch recovery
Python ecosystem: openpyxl (reading .xlsx metadata), zipfile + ElementTree (XML structural removal), oletools (binary .xls and VBA handling), office2john (hash extraction). For large file trees, os.walk or pathlib for directory traversal, concurrent.futures for parallelism.
Linux utilities: hashcat (GPU cracking), john (alternative cracking), unzip/zip (command-line ZIP manipulation), sed or xmlstarlet (command-line XML editing for simple cases).
For Windows-centric enterprise environments: PowerShell can handle ZIP manipulation and XML editing natively, though Python is more reliable for binary patching. hashcat and John the Ripper both publish Windows builds; Windows Subsystem for Linux (WSL) is an alternative for running the Linux tooling around them.
Commercial enterprise tools: professional recovery services offer batch processing where you upload a ZIP of encrypted files and receive a report of recoverable passwords. This is the easiest option for IT departments without GPU infrastructure.
When to outsource vs build in-house
Build in-house if: you have a regular pipeline (monthly+ batches), your IT team includes someone comfortable with Python and hashcat, you have GPU hardware (even a single RTX 4090) or cloud GPU budget, and most files are structurally protected (cheap and fast to process).
Outsource if: this is a one-time or occasional cleanup, most files have file-open encryption (real AES), you lack GPU hardware or expertise, or the data is not so sensitive that it must stay inside your own infrastructure. Professional services charge per file or per password recovered, often with volume discounts for batches over 100 files.
Hybrid: do structural removal in-house (XML edit is trivial and free), then send only the remaining encrypted files to a professional service. This minimizes cost while retaining control over the easy-to-process files.
Setting up a batch recovery pipeline
- Step 1 (Inventory all files): Walk the directory tree. Separate by extension (.xlsx, .xlsm, .xls, .xltx) and file age. Back up everything before processing.
- Step 2 (Classify protection type): Use Python + zipfile together with a file-signature check. If the file opens as a ZIP and the XML is readable, it is not encrypted and any protection is structural (look for <sheetProtection> or <workbookProtection>). If the file is an OLE2 compound document containing an EncryptedPackage stream, it is a 2007+ workbook with file-open encryption; encrypted workbooks are not valid ZIP archives. Any other OLE2 file is a legacy-format .xls, which needs its own check. If neither signature matches, the file is possibly corrupted.
- Step 3 (Run batch structural removal): For all structurally protected .xlsx files: unzip, delete <sheetProtection> and <workbookProtection>, rezip. Verify each output file with a quick open test.
- Step 4 (Extract hashes for encrypted files): Run office2john on all remaining .xlsx/.xls files. Group the output by hashcat mode (9400/9500/9600 for Office 2007/2010/2013 or later, 9700/9800 for legacy 97-2003 files).
- Step 5 (Run hashcat batch): For each mode group, run hashcat in batch mode with dictionary + rules as the first pass. Escalate uncracked hashes to mask attacks or a professional service.
Frequently Asked Questions
Can I batch-remove Excel passwords for free at enterprise scale?
How long does batch recovery take for 1,000 files?
Can I process files without modifying originals?
What if some files have different passwords?
Is there any way to skip the password check for modern Excel encryption?
Do professional services offer enterprise batch discounts?