
Automated Compliance Document Analysis: A Technical Guide

November 1, 2025 • 15 min read


Executive Summary

Executive Order (EO) compliance review is a critical task for organizations that must keep their documentation aligned with current regulations. This project provides an AI-powered tool that automatically reviews large collections of documents (PDF, Word, Excel) for compliance-related keywords and phrasing. By combining Natural Language Processing (NLP) with semantic similarity, the tool can cut review time from days to minutes.

Project Overview

The Compliance Document Analysis tool is a web application built with Shiny for Python. It allows users to upload documents and automatically scans them for specific keywords related to various compliance categories, such as Executive Orders, Grants, Science Compliance, and Limited Waivers.

Key Features

  • Multi-Format Support: Analyzes PDF (.pdf), Word (.docx), and Excel (.xlsx, .xls) files.
  • Intelligent Analysis:
    • Exact Match: precise keyword detection.
    • Semantic Similarity: AI-powered detection of conceptually similar phrases using sentence-transformers (in ai-app.py).
  • Categorized Compliance: Pre-configured checks for:
    • Executive Order 14168 (Gender & Identity, DEI)
    • Science Compliance (Research protocols, data collection)
    • Grants Management (Per diem, travel allowances)
    • Limited Waivers (Specific restricted terms)
  • Interactive Reporting: View results in a responsive data grid and export detailed reports to Excel.
  • Privacy-Focused: Files are processed in-memory and not permanently stored.
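One way to keep parsing in memory is to read an upload into bytes and parse it from an io.BytesIO buffer instead of saving it somewhere permanent. python-docx's Document() accepts such a file-like object directly. To keep the sketch dependency-free, the version below swaps in the standard library: a .docx file is a ZIP package whose body lives in word/document.xml. The function name docx_paragraphs_from_bytes is hypothetical and is not taken from app.py.

```python
from __future__ import annotations

import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs_from_bytes(data: bytes) -> List[str]:
    """Return non-empty paragraph texts from .docx bytes without touching disk.

    Hypothetical helper: a stdlib stand-in for python-docx, which the app uses.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text can be split across several <w:t> runs
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Because the bytes never leave the process, nothing needs to be cleaned up after analysis. With python-docx, the equivalent is `[p.text for p in Document(io.BytesIO(data)).paragraphs]`.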

Technical Architecture

The project consists of two main application variants:

  1. Standard App (app.py): Uses a combination of regex-based exact matching and basic semantic matching. It is lightweight and fast.
  2. AI-Enhanced App (ai-app.py): Leverages the all-MiniLM-L6-v2 model from Hugging Face via sentence-transformers for robust semantic understanding, so it can catch contextually similar phrases that are not exact matches.
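The regex-based exact matching used by the standard app can be sketched as follows. This is an illustration under assumptions, not the code in app.py. It assumes text has already been extracted per page, uses case-insensitive whole-word patterns, and splits sentences with a deliberately naive rule so each hit can be reported with its page number and sentence context. The name find_exact_matches is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def find_exact_matches(
    pages: List[str], keywords: Iterable[str]
) -> List[Tuple[int, str, str]]:
    """Return (page_number, keyword, sentence) for every whole-word hit."""
    # \b anchors stop "DEI" from matching inside words such as "deity"
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for sentence in SENTENCE_SPLIT.split(text):
            for kw, pattern in patterns.items():
                if pattern.search(sentence):
                    hits.append((page_number, kw, sentence.strip()))
    return hits


pages = ["Travel is covered. A per diem applies to staff.", "The study uses a focus group."]
print(find_exact_matches(pages, ["per diem", "focus group", "equity"]))
```

Compiling each pattern once, outside the page loop, keeps the check cheap even when a category has many keywords.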

Core Dependencies

  • User Interface: shiny (Python)
  • Data Processing: pandas, numpy
  • Document Parsing:
    • PyPDF2, pdfplumber (PDFs)
    • python-docx (Word)
    • openpyxl (Excel)
  • AI & NLP: sentence-transformers, scikit-learn (for cosine similarity)
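In the AI app, sentence-transformers turns keywords and sentences into embedding vectors, and scikit-learn's cosine_similarity scores each pair. The arithmetic behind that score is small enough to show in plain Python. Two-dimensional toy vectors stand in for real all-MiniLM-L6-v2 embeddings here, and semantic_hits is a hypothetical name for threshold filtering, not a function from ai-app.py.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def semantic_hits(
    keyword_vec: Sequence[float],
    sentence_vecs: List[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Indices of sentences whose similarity to the keyword meets the threshold."""
    return [
        i
        for i, vec in enumerate(sentence_vecs)
        if cosine_similarity(keyword_vec, vec) >= threshold
    ]


# Toy "embeddings" with similarities 1.0, ~0.71 and 0.0 to the keyword
sentences = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(semantic_hits([1.0, 0.0], sentences, threshold=0.8))  # stricter
print(semantic_hits([1.0, 0.0], sentences, threshold=0.4))  # looser
```

Lowering the threshold admits the second sentence as well, which is the trade-off the app's Similarity Threshold slider controls.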

Installation & Setup

This guide assumes you have Python 3.8+ installed. We recommend using uv for fast package management, though pip works as well.

1. Prerequisite: Install uv

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Clone the Repository

git clone https://github.com/danielmaangi/compliance-python-shiny.git
cd compliance-python-shiny

3. Initialize Environment

Install the dependencies pinned in pyproject.toml and uv.lock. The cloned repository is already a uv project, so uv init is not needed:

uv sync

Or manually with pip:

pip install -r requirements.txt

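To add the key dependencies to a new environment yourself (instead of running uv sync):

uv add shiny pandas numpy PyPDF2 openpyxl python-docx sentence-transformers pdfplumber scikit-learn
# or
pip install shiny pandas numpy PyPDF2 openpyxl python-docx sentence-transformers pdfplumber scikit-learn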
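A quick way to confirm the installation is to check that every module can be found. Several of these packages are imported under a different name than the one you install (python-docx imports as docx, scikit-learn as sklearn), which is a common source of confusion. The sketch below is hypothetical: the file name you save it under (for example check_deps.py, run with uv run python check_deps.py) and the helper missing_packages are not part of the repository.

```python
import importlib.util
from typing import Dict, List, Optional

# Install (distribution) name -> name used in `import` statements
IMPORT_NAMES: Dict[str, str] = {
    "shiny": "shiny",
    "pandas": "pandas",
    "numpy": "numpy",
    "PyPDF2": "PyPDF2",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "sentence-transformers": "sentence_transformers",
    "scikit-learn": "sklearn",
}


def missing_packages(import_names: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the install names whose top-level modules cannot be found."""
    names = IMPORT_NAMES if import_names is None else import_names
    # find_spec locates a module without importing it (None if absent)
    return [dist for dist, module in names.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```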

Running the Application

You can run either the standard version or the AI-enhanced version.

Option A: Standard Application

Best for quick, keyword-based checks.

uv run shiny run app.py --reload --port 8001
# or
uv run python app.py

Access at: http://127.0.0.1:8001

Option B: AI-Enhanced Application

Best for deep analysis and finding semantic variations of compliance terms.

uv run shiny run ai-app.py --reload --port 8002
# or
uv run python ai-app.py

Access at: http://127.0.0.1:8002

Usage Guide

  1. Select Compliance Type: Choose the category of compliance you are checking (e.g., "EO Compliance" or "Grants Compliance") from the sidebar.
  2. Configure Search (AI App only): Adjust the Similarity Threshold slider.
    • Lower value (e.g., 0.4) = More matches, potentially more false positives.
    • Higher value (e.g., 0.8) = Fewer, stricter matches.
  3. Upload Files: Drag and drop PDF, Word, or Excel files into the upload zone.
  4. Analyze: Click the Start Analysis button.
  5. Review Results:
    • Summary Stats: View total matches and files processed.
    • Results Table: Interactive table showing the file, keyword, location (page/row), and the exact sentence context.
  6. Export: Click Download Excel to get a comprehensive report containing a summary sheet and detailed line-by-line findings.
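The summary numbers in steps 5 and 6 are simple aggregates over the results table. A minimal sketch, assuming each result row carries a file name, keyword, location and context; the Finding fields and the summarize function are hypothetical and may not match the app's actual column names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Finding:
    """One results-table row (illustrative field names)."""
    file: str
    keyword: str
    location: str  # e.g. "page 3" for PDFs or "row 12" for spreadsheets
    context: str


def summarize(findings: List[Finding], files_processed: int) -> Dict[str, object]:
    """Compute headline statistics for the summary view or summary sheet."""
    return {
        "total_matches": len(findings),
        "files_processed": files_processed,
        "files_with_matches": len({f.file for f in findings}),
        "matches_per_keyword": dict(Counter(f.keyword for f in findings).most_common()),
    }


rows = [
    Finding("a.pdf", "per diem", "page 1", "A per diem applies."),
    Finding("a.pdf", "conference", "page 2", "Conference travel is capped."),
    Finding("b.xlsx", "per diem", "row 4", "Per diem: 50"),
]
print(summarize(rows, files_processed=3))
```

With pandas, the summary and the detailed rows could then be written as two sheets through one pd.ExcelWriter, calling DataFrame.to_excel(writer, sheet_name=...) once per sheet.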

Monitored Keywords & Categories

The system checks each category against its own keyword list. (See COMPLIANCE_KEYWORDS in app.py for the full dictionary.)

| Category | Focus Area | Examples |
|---|---|---|
| Executive Order | Gender, Identity, DEI | gender, transgender, DEI, equity, inclusion, pronouns |
| Science | Research Protocols | medical records, focus group, surveillance, clinical study |
| Grants | Financial | per diem, travel allowance, conference |
| Limited Waiver | Restricted Activities | prison, family planning, MSM, needle exchange |
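A category-to-terms dictionary like the one described can be turned into one compiled pattern per category. The sketch below copies only the example terms from the table, so it is an illustrative subset, not the real COMPLIANCE_KEYWORDS from app.py (whose keys and contents may differ); build_category_pattern is a hypothetical helper. It lets any run of whitespace separate the words of a phrase, since PDF extraction often inserts line breaks or double spaces.

```python
from __future__ import annotations

import re
from typing import Dict, List

# Illustrative subset: example terms from the table above only
COMPLIANCE_KEYWORDS: Dict[str, List[str]] = {
    "Executive Order": ["gender", "transgender", "DEI", "equity", "inclusion", "pronouns"],
    "Science": ["medical records", "focus group", "surveillance", "clinical study"],
    "Grants": ["per diem", "travel allowance", "conference"],
    "Limited Waiver": ["prison", "family planning", "MSM", "needle exchange"],
}


def build_category_pattern(terms: List[str]) -> re.Pattern[str]:
    """One case-insensitive, whole-word alternation covering all terms."""
    # Longest terms first, so a phrase wins over a shorter term that is its
    # prefix; \s+ accepts any whitespace run between the words of a phrase.
    parts = [
        r"\s+".join(re.escape(word) for word in term.split())
        for term in sorted(terms, key=len, reverse=True)
    ]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


grants = build_category_pattern(COMPLIANCE_KEYWORDS["Grants"])
print(grants.search("Staff receive a Per  Diem during travel.").group(0))
```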

Deployment

The application can be deployed to Posit Connect or to other hosting services that support Shiny for Python.

Deploying to Posit Connect

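  1. Configure rsconnect (provided by the rsconnect-python package; run uv add rsconnect-python first if it is not already a project dependency):
    uv run rsconnect add --server https://your-connect-server.com --name myserver --api-key YOUR_API_KEY
    
  2. Deploy the project directory, naming the app object in app.py as the entrypoint:
    uv run rsconnect deploy shiny . --name myserver --entrypoint app:app --title "EO Compliance Analysis"
    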

Troubleshooting

  • "Model not found": The AI app requires downloading the all-MiniLM-L6-v2 model. This happens automatically on the first run but requires an internet connection.
  • File Read Errors: Ensure Word documents are valid .docx (not renamed .doc) and are not password protected.
  • Performance: Large PDFs may take time to process. The UI provides a progress bar to track status.
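Renamed or protected files can be detected before parsing by checking their leading bytes. .docx and .xlsx files are ZIP packages, most PDFs begin with %PDF-, and both legacy .doc/.xls files and password-protected Office files are stored as OLE2 compound files. The sketch below is a hypothetical pre-check, not something app.py is known to do.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"                                # usual start of a PDF file
ZIP_MAGIC = b"PK\x03\x04"                           # .docx / .xlsx (ZIP package)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"    # .doc / .xls / encrypted Office


def sniff_container(data: bytes) -> Optional[str]:
    """Classify content by its first bytes: 'pdf', 'zip', 'ole2' or None."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return None


def check_extension(filename: str, data: bytes) -> Optional[str]:
    """Return a warning if the extension disagrees with the content, else None."""
    ext = filename.rsplit(".", 1)[-1].lower()
    expected = {"pdf": "pdf", "docx": "zip", "xlsx": "zip", "xls": "ole2"}.get(ext)
    actual = sniff_container(data)
    if expected is None or actual == expected:
        return None  # unsupported extension, or content matches
    if ext in ("docx", "xlsx") and actual == "ole2":
        # Legacy binary files and password-protected OOXML share the OLE2 format
        return f"{filename}: legacy .{ext[:-1]} renamed to .{ext}, or password-protected"
    return f"{filename}: extension .{ext} does not match its content ({actual or 'unrecognized'})"
```

A check like this lets the app report a clear message for a single bad upload instead of failing inside the parser.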

Reproducibility Note

To reproduce the exact environment used in development, rely on pyproject.toml and uv.lock. This ensures all dependency versions are pinned correctly.

# Recreate environment exactly
uv sync
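To confirm that the active environment matches what uv.lock pins, you can print the installed versions of the key packages and compare them with the lock file. A minimal sketch using the standard library's importlib.metadata; installed_versions is a hypothetical helper, and the package list is an assumption you can extend.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            report[dist] = version(dist)
        except PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    packages = ["shiny", "pandas", "sentence-transformers", "scikit-learn"]
    for name, ver in installed_versions(packages).items():
        print(f"{name:25} {ver or 'not installed'}")
```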