Automated Compliance Document Analysis: A Technical Guide
Executive Summary
Executive Order (EO) compliance review is a critical task for organizations that must ensure their documentation aligns with current regulations. This project provides an AI-powered, automated solution for reviewing large collections of documents (PDF, Word, Excel) for compliance-related keywords and phrasing. By combining exact keyword matching with Natural Language Processing (NLP) and semantic similarity, the tool can reduce review time from days to minutes.
Project Overview
The Compliance Document Analysis tool is a web application built with Shiny for Python. It allows users to upload documents and automatically scans them for specific keywords related to various compliance categories, such as Executive Orders, Grants, Science Compliance, and Limited Waivers.
Key Features
- Multi-Format Support: Analyzes PDF (.pdf), Word (.docx), and Excel (.xlsx, .xls) files.
- Intelligent Analysis:
  - Exact Match: Precise keyword detection.
  - Semantic Similarity: AI-powered detection of conceptually similar phrases using sentence-transformers (in ai-app.py).
- Categorized Compliance: Pre-configured checks for:
  - Executive Order 14168 (Gender & Identity, DEI)
  - Science Compliance (Research protocols, data collection)
  - Grants Management (Per diem, travel allowances)
  - Limited Waivers (Specific restricted terms)
- Interactive Reporting: View results in a responsive data grid and export detailed reports to Excel.
- Privacy-Focused: Files are processed in-memory and not permanently stored.
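The exact-match mode can be pictured as a whole-word, case-insensitive search that also captures the surrounding sentence for the results table. The sketch below is a minimal illustration, not the code in app.py: the function name find_exact_matches and the simple regex sentence splitter are assumptions, and it relies only on Python's standard re module.

```python
from __future__ import annotations

import re


def find_exact_matches(text: str, keywords: list[str]) -> list[tuple[str, str]]:
    """Return (keyword, sentence) pairs for whole-word, case-insensitive hits.

    Hypothetical helper for illustration; the real app.py logic may differ.
    """
    # Naive split on ., ! or ? followed by whitespace. Real documents
    # (tables, bullet points, abbreviations) may need a sturdier splitter.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    matches: list[tuple[str, str]] = []
    for keyword in keywords:
        # re.escape guards against regex metacharacters in keywords;
        # the \b anchors stop "equity" from matching inside "inequity".
        pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
        for sentence in sentences:
            if pattern.search(sentence):
                matches.append((keyword, sentence))
    return matches


text = "Staff completed DEI training. Inequity was discussed. Equity goals remain."
for keyword, sentence in find_exact_matches(text, ["DEI", "equity"]):
    print(f"{keyword!r}: {sentence}")
```

Run on the sample text, this reports the "DEI" sentence and the "Equity goals remain." sentence, while the "Inequity" sentence is skipped because of the word-boundary anchors.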
Technical Architecture
The project consists of two main application variants:
- Standard App (app.py): Combines regex-based exact matching with basic semantic matching. It is lightweight and fast.
- AI-Enhanced App (ai-app.py): Leverages the all-MiniLM-L6-v2 model from Hugging Face via sentence-transformers for robust semantic understanding, catching contextually similar phrases that are not exact matches.
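Semantic matching in the AI-enhanced variant comes down to comparing embedding vectors with cosine similarity and keeping pairs above a threshold, which is what the Similarity Threshold slider controls. The sketch below assumes the embeddings are already computed; in ai-app.py they would presumably come from all-MiniLM-L6-v2 via sentence-transformers (for example `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`), but tiny hand-made vectors stand in here so the example runs with the standard library alone. The function names cosine_similarity and semantic_matches are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def semantic_matches(
    sentence_vecs: dict[str, list[float]],
    keyword_vecs: dict[str, list[float]],
    threshold: float,
) -> list[tuple[str, str, float]]:
    """Return (keyword, sentence, score) for every pair scoring >= threshold."""
    hits = []
    for sentence, s_vec in sentence_vecs.items():
        for keyword, k_vec in keyword_vecs.items():
            score = cosine_similarity(s_vec, k_vec)
            if score >= threshold:
                hits.append((keyword, sentence, round(score, 3)))
    return hits


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
keywords = {"travel allowance": [1.0, 0.0, 0.0]}
sentences = {
    "Trip reimbursement rates apply.": [0.9, 0.3, 0.0],  # score ~0.949
    "Conference fees are covered.": [0.6, 0.0, 0.8],     # score ~0.6
    "The lab purchased reagents.": [0.0, 1.0, 0.0],      # score 0.0
}

print(len(semantic_matches(sentences, keywords, threshold=0.4)))  # 2 hits
print(len(semantic_matches(sentences, keywords, threshold=0.8)))  # 1 hit
```

Lowering the threshold from 0.8 to 0.4 admits the weaker "conference" match, mirroring the more-matches versus more-false-positives trade-off described in the Usage Guide. On real data, `sklearn.metrics.pairwise.cosine_similarity` or `sentence_transformers.util.cos_sim` compute the same score over whole matrices at once.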
Core Dependencies
- User Interface: shiny (Python)
- Data Processing: pandas, numpy
- Document Parsing: PyPDF2, pdfplumber (PDFs); python-docx (Word); openpyxl (Excel)
- AI & NLP: sentence-transformers, scikit-learn (for cosine similarity)
Installation & Setup
This guide assumes you have Python 3.8+ installed. We recommend using uv for fast package management, though pip works as well.
1. Prerequisite: Install uv
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
2. Clone the Repository
git clone https://github.com/danielmaangi/compliance-python-shiny.git
cd compliance-python-shiny
3. Initialize Environment
Install the dependencies pinned in pyproject.toml and uv.lock. The repository already contains a pyproject.toml, so uv init is not needed:
uv sync
Or manually with pip:
pip install -r requirements.txt
To add the key dependencies individually instead of syncing from the lockfile:
uv add shiny pandas PyPDF2 openpyxl python-docx sentence-transformers pdfplumber scikit-learn
Running the Application
You can run either the standard version or the AI-enhanced version.
Option A: Standard Application
Best for quick, keyword-based checks.
uv run shiny run app.py --reload --port 8001
# or
uv run python app.py
Access at: http://127.0.0.1:8001
Option B: AI-Enhanced Application
Best for deep analysis and finding semantic variations of compliance terms.
uv run shiny run ai-app.py --reload --port 8002
# or
uv run python ai-app.py
Access at: http://127.0.0.1:8002
Usage Guide
- Select Compliance Type: Choose the category of compliance you are checking (e.g., "EO Compliance" or "Grants Compliance") from the sidebar.
- Configure Search (AI App only): Adjust the Similarity Threshold slider.
  - Lower value (e.g., 0.4): more matches, but potentially more false positives.
  - Higher value (e.g., 0.8): fewer, stricter matches.
- Upload Files: Drag and drop PDF, Word, or Excel files into the upload zone.
- Analyze: Click the Start Analysis button.
- Review Results:
  - Summary Stats: View total matches and files processed.
  - Results Table: Interactive table showing the file, keyword, location (page/row), and the exact sentence context.
- Export: Click Download Excel to get a comprehensive report containing a summary sheet and detailed line-by-line findings.
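The summary statistics and the summary sheet of the Excel export amount to simple counts over the individual findings. A minimal sketch follows, assuming each finding is a dict with file, keyword, location and context fields (mirroring the results-table columns); these field names and the summarize_findings helper are hypothetical. It counts files with at least one match, whereas the app's "files processed" figure would also include files with no matches.

```python
from __future__ import annotations

from collections import Counter


def summarize_findings(findings: list[dict[str, str]]) -> dict[str, object]:
    """Aggregate per-match rows into summary figures (hypothetical helper)."""
    return {
        "total_matches": len(findings),
        # Only files that produced at least one match appear here.
        "files_with_matches": len({f["file"] for f in findings}),
        "matches_per_keyword": dict(Counter(f["keyword"] for f in findings)),
        "matches_per_file": dict(Counter(f["file"] for f in findings)),
    }


findings = [
    {"file": "budget.xlsx", "keyword": "per diem", "location": "row 12",
     "context": "Per diem is capped at $60."},
    {"file": "budget.xlsx", "keyword": "conference", "location": "row 15",
     "context": "Conference registration fees."},
    {"file": "travel.pdf", "keyword": "per diem", "location": "page 3",
     "context": "Claim per diem for each travel day."},
]
print(summarize_findings(findings))
```

To write both sheets, one option is pandas: build one DataFrame from the summary and one from the findings, then call `DataFrame.to_excel(writer, sheet_name=...)` for each inside a single `pd.ExcelWriter` block, with openpyxl acting as the .xlsx engine.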
Monitored Keywords & Categories
The system monitors specific lexicons for each category (see COMPLIANCE_KEYWORDS in app.py for the full dictionary).
| Category | Focus Area | Examples |
|---|---|---|
| Executive Order | Gender, Identity, DEI | gender, transgender, DEI, equity, inclusion, pronouns |
| Science | Research Protocols | medical records, focus group, surveillance, clinical study |
| Grants | Financial | per diem, travel allowance, conference |
| Limited Waiver | Restricted Activities | prison, family planning, MSM, needle exchange |
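The exact layout of COMPLIANCE_KEYWORDS lives in app.py and is not reproduced here. As an assumption, a category-to-terms dictionary populated with only the example terms from the table above might look like the sketch below; the category keys, the keywords_for helper and the reverse index are illustrative, not taken from the repository.

```python
from __future__ import annotations

# Hypothetical shape; only the example terms from the table are included.
COMPLIANCE_KEYWORDS: dict[str, list[str]] = {
    "Executive Order": ["gender", "transgender", "DEI", "equity", "inclusion", "pronouns"],
    "Science": ["medical records", "focus group", "surveillance", "clinical study"],
    "Grants": ["per diem", "travel allowance", "conference"],
    "Limited Waiver": ["prison", "family planning", "MSM", "needle exchange"],
}


def keywords_for(category: str) -> list[str]:
    """Return the lexicon for the category chosen in the sidebar."""
    try:
        return COMPLIANCE_KEYWORDS[category]
    except KeyError:
        raise ValueError(f"Unknown compliance category: {category!r}") from None


def build_reverse_index(lexicon: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each lower-cased term to the categories that list it."""
    index: dict[str, list[str]] = {}
    for category, terms in lexicon.items():
        for term in terms:
            index.setdefault(term.lower(), []).append(category)
    return index


print(keywords_for("Grants"))
print(build_reverse_index(COMPLIANCE_KEYWORDS)["msm"])
```

Keeping the lexicon in a single dictionary makes adding a category or term a data change rather than a code change, and the reverse index makes it easy to spot terms listed under more than one category.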
Deployment
The application is ready to deploy to Posit Connect or other Shiny-compatible hosting services.
Deploying to Posit Connect
- Configure rsconnect:
uv run rsconnect add --server https://your-connect-server.com --name myserver --api-key YOUR_API_KEY
- Deploy (from the project directory, with app.py as the entry point):
uv run rsconnect deploy shiny . --name myserver --entrypoint app:app --title "EO Compliance Analysis"
Troubleshooting
- "Model not found": The AI app requires downloading the all-MiniLM-L6-v2 model. This happens automatically on the first run but requires an internet connection.
- File Read Errors: Ensure Word documents are valid .docx files (not renamed .doc files) and are not password protected.
- Performance: Large PDFs may take time to process. The UI provides a progress bar to track status.
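For the File Read Errors case, a quick pre-check can tell a genuine .docx apart from a renamed .doc before the bytes reach python-docx. A .docx file is a ZIP package whose main body lives in word/document.xml, while legacy .doc files use the OLE2 compound-file format, recognizable by the 8-byte signature D0 CF 11 E0 A1 B1 1A E1; password-protected Office files are also typically wrapped in an OLE2 container. The classify_word_file helper below is a hypothetical, standard-library-only sketch, not part of the app.

```python
import io
import zipfile

# Signature at the start of OLE2 compound files (legacy .doc, encrypted Office files).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_word_file(data: bytes) -> str:
    """Return 'docx', 'zip-not-docx', 'ole2' or 'unknown' for uploaded bytes."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            # A Word package keeps its main document part at this path.
            if "word/document.xml" in archive.namelist():
                return "docx"
        return "zip-not-docx"
    if data.startswith(OLE2_MAGIC):
        return "ole2"  # likely a renamed .doc or a password-protected file
    return "unknown"


# Build a minimal in-memory ZIP that looks like a .docx for demonstration.
fake_docx = io.BytesIO()
with zipfile.ZipFile(fake_docx, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(classify_word_file(fake_docx.getvalue()))        # docx
print(classify_word_file(OLE2_MAGIC + b"\x00" * 504))  # ole2
```

In the Shiny app, a check like this could run on each uploaded file's bytes and surface a clear message instead of a parser traceback.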
Reproducibility Note
To reproduce the exact environment used in development, rely on pyproject.toml and uv.lock. This ensures all dependency versions are pinned correctly.
# Recreate environment exactly
uv sync