A lightweight Python library for converting Azure AI Document Intelligence AnalyzeResult objects into well-formatted HTML documents.
- Clean HTML Generation: Converts document analysis results to semantic HTML
- Customizable Styling: Support for custom CSS styling
- Table Support: Handles complex tables with row/column spans
- Section Preservation: Maintains document structure with sections and paragraphs
- Flexible API: Works with JSON files, stdin/stdout, or Python objects
Install from PyPI:

```bash
pip install azure-ai-documentintelligence-html
```

Convert an `AnalyzeResult` returned by the SDK:

```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential
from azure_ai_doc_intel_html import parse_analyze_result

# Analyze a document with Azure AI Document Intelligence
client = DocumentIntelligenceClient(
    endpoint="<your-endpoint>",
    credential=AzureKeyCredential("<your-key>"),
)
document_url = "<your-document-url>"
poller = client.begin_analyze_document(
    "prebuilt-layout", AnalyzeDocumentRequest(url_source=document_url)
)
result = poller.result()

# Convert to HTML
html = parse_analyze_result(result)

# Save to file
with open("output.html", "w", encoding="utf-8") as f:
    f.write(html)

# Or let parse_analyze_result write the file for you
parse_analyze_result(result, output_html="output.html")
```

Command-line usage with a saved AnalyzeResult JSON file:

```bash
# From JSON file to HTML file
python -m azure_ai_doc_intel_html input.json output.html

# From stdin to stdout
cat input.json | python -m azure_ai_doc_intel_html > output.html
```

The same conversion from Python:

```python
from azure_ai_doc_intel_html import parse

# Parse from JSON file
html = parse("analyze_result.json")

# Save to file
parse("analyze_result.json", output_html="output.html")

# With custom title
html = parse("analyze_result.json", title="My Document Analysis")

# With custom CSS
custom_css = """
body { font-family: 'Segoe UI', Tahoma, sans-serif; }
h1 { color: #0078D4; }
table { border-color: #0078D4; }
"""
html = parse("analyze_result.json", custom_css=custom_css)
```

`parse_analyze_result()`: Convert an AnalyzeResult object to HTML.
Parameters:
- `analyze_result`: AnalyzeResult object from Azure AI Document Intelligence
- `output_html`: Optional path to save the HTML file
- `title`: Optional custom title (defaults to auto-detection from the document)
- `custom_css`: Optional custom CSS styles
Returns: HTML string
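The title auto-detection mentioned above is not specified in detail. As a rough illustration only, a fallback chain could prefer an explicit title, then a paragraph the layout model tagged with the `title` role, then the first `pageHeader` paragraph. The function `detect_title`, its fallback order, and the `"Document"` default are assumptions for this sketch, not the library's implementation; the `paragraphs`, `role`, and `content` keys follow the AnalyzeResult JSON shape.

```python
from typing import Any, Dict, Optional


def detect_title(result: Dict[str, Any], title: Optional[str] = None,
                 default: str = "Document") -> str:
    """Pick a document title using a hypothetical fallback chain."""
    # 1. An explicitly supplied title always wins.
    if title:
        return title
    paragraphs = result.get("paragraphs", [])
    # 2. Prefer a paragraph tagged as the document title,
    # 3. then fall back to the first non-empty page header.
    for role in ("title", "pageHeader"):
        for para in paragraphs:
            text = (para.get("content") or "").strip()
            if para.get("role") == role and text:
                return text
    # 4. Nothing usable was found.
    return default


sample = {"paragraphs": [
    {"role": "pageHeader", "content": "ACME Corp"},
    {"content": "Body text"},
]}
print(detect_title(sample))                     # ACME Corp
print(detect_title(sample, title="My Report"))  # My Report
```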
`parse()`: Parse an AnalyzeResult JSON file, or JSON read from stdin, to HTML.
Parameters:
- `path_or_stdin`: Path to a JSON file, or `None` to read from stdin
- `output_html`: Optional path to save the HTML file
- `title`: Optional custom title
- `custom_css`: Optional custom CSS styles
Returns: HTML string
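Since `parse` accepts either a path or `None` for stdin, here is one way such a loader could work, sketched with the standard library only. `load_analyze_json` is a hypothetical name, and unwrapping a top-level `analyzeResult` key is an assumption based on the raw REST polling response, which nests the result under that key; the library may not do this.

```python
import io
import json
import sys
from typing import Any, Optional, TextIO


def load_analyze_json(path: Optional[str] = None,
                      stdin: Optional[TextIO] = None) -> Any:
    """Load AnalyzeResult JSON from a file path, or from stdin when path is None."""
    if path is None:
        data = json.load(stdin if stdin is not None else sys.stdin)
    else:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    # Assumption: a raw REST response wraps the result in "analyzeResult";
    # unwrap it so callers always get the AnalyzeResult body itself.
    if isinstance(data, dict) and isinstance(data.get("analyzeResult"), dict):
        return data["analyzeResult"]
    return data


# Example: simulate stdin with an in-memory text stream
fake_stdin = io.StringIO('{"status": "succeeded", "analyzeResult": {"content": "Hello"}}')
print(load_analyze_json(stdin=fake_stdin))  # {'content': 'Hello'}
```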
Utility functions:

- `load_json(path_or_stdin)`: Load JSON from file or stdin
- `write_html(path_or_stdout, html)`: Write HTML to file or stdout
- `render_table_html(table)`: Render table object to HTML
- `render_paragraph_html(paragraph)`: Render paragraph object to HTML
- `render_sections_html(data)`: Render all sections to HTML
- `build_html_doc(title, sections_html, custom_css)`: Build complete HTML document
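`render_table_html(table)` is listed without details. As a hedged sketch of how spans in the AnalyzeResult table JSON (`rowIndex`, `columnIndex`, `rowSpan`, `columnSpan`, `kind`) can map onto HTML `rowspan`/`colspan`, here is a standalone renderer. `table_to_html` is a hypothetical name and this is not the library's code; emitting `<th>` for header-kind cells is a design assumption, and the `doc-table` class matches the CSS class this README documents.

```python
import html
from collections import defaultdict
from typing import Any, Dict, List

# Cell kinds treated as headers in this sketch (assumption).
HEADER_KINDS = {"columnHeader", "rowHeader", "stubHead"}


def table_to_html(table: Dict[str, Any]) -> str:
    """Render an AnalyzeResult table dict as an HTML <table> (illustrative)."""
    rows: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for cell in table.get("cells", []):
        rows[cell["rowIndex"]].append(cell)

    out = ['<table class="doc-table">']
    # Emit every row, even one fully covered by rowspans from above,
    # so the browser's rowspan arithmetic stays correct.
    for r in range(table.get("rowCount", 0)):
        out.append("  <tr>")
        # A spanning cell is listed once, at its top-left position, so
        # rowspan/colspan lay it out without placeholder cells.
        for cell in sorted(rows.get(r, []), key=lambda c: c["columnIndex"]):
            tag = "th" if cell.get("kind", "content") in HEADER_KINDS else "td"
            attrs = ""
            if cell.get("rowSpan", 1) > 1:
                attrs += f' rowspan="{cell["rowSpan"]}"'
            if cell.get("columnSpan", 1) > 1:
                attrs += f' colspan="{cell["columnSpan"]}"'
            text = html.escape(cell.get("content", ""))
            out.append(f"    <{tag}{attrs}>{text}</{tag}>")
        out.append("  </tr>")
    out.append("</table>")
    return "\n".join(out)


sample = {
    "rowCount": 2, "columnCount": 2,
    "cells": [
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0,
         "columnSpan": 2, "content": "Q1 & Q2"},
        {"rowIndex": 1, "columnIndex": 0, "content": "10"},
        {"rowIndex": 1, "columnIndex": 1, "content": "<20>"},
    ],
}
print(table_to_html(sample))
```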
The generated HTML includes:
- Document Title: Extracted from page headers, or set explicitly with the `title` parameter
- Sections: Logical document sections with proper hierarchy
- Paragraphs: Text content with role-based styling (headers, footers, body text)
- Tables: Complex tables with proper row/column spans and headers
- Semantic HTML: Uses appropriate HTML5 elements for better accessibility
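Role-based styling can be pictured as a lookup from a paragraph's `role` field to an HTML element and class. The layout model emits roles such as `title`, `sectionHeading`, `pageHeader`, `pageFooter`, and `pageNumber`; the `ROLE_TAGS` mapping and `paragraph_to_html` below are hypothetical (the `page-header`/`page-footer` classes come from this README, the element choices are guesses), and `html.escape` handles the text escaping.

```python
import html
from typing import Any, Dict, Tuple

# Hypothetical role -> (element, CSS class) mapping; unknown roles become <p>.
ROLE_TAGS: Dict[str, Tuple[str, str]] = {
    "title": ("h1", ""),
    "sectionHeading": ("h2", ""),
    "pageHeader": ("header", "page-header"),
    "pageFooter": ("footer", "page-footer"),
    "pageNumber": ("footer", "page-footer"),
}


def paragraph_to_html(paragraph: Dict[str, Any]) -> str:
    """Render one AnalyzeResult paragraph dict, escaping its text."""
    tag, css_class = ROLE_TAGS.get(paragraph.get("role", ""), ("p", ""))
    attr = f' class="{css_class}"' if css_class else ""
    text = html.escape(paragraph.get("content", ""))
    return f"<{tag}{attr}>{text}</{tag}>"


print(paragraph_to_html({"role": "pageHeader", "content": "ACME <Draft>"}))
# <header class="page-header">ACME &lt;Draft&gt;</header>
print(paragraph_to_html({"content": "Plain body text"}))
# <p>Plain body text</p>
```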
CSS classes available for custom styling:

- `.doc-table`: Document tables
- `.section`: Document sections
- `.page-header`: Page header content
- `.page-footer`: Page footer content
- `.summary`: Summary sections with definition lists
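`build_html_doc(title, sections_html, custom_css)` assembles the final page. A plausible shape is sketched below with a hypothetical `build_document` and a made-up minimal default stylesheet; the real library's markup and defaults may differ. Appending `custom_css` after the defaults means that, for selectors of equal specificity, the custom rules win by CSS cascade order.

```python
import html
from typing import Optional

# Hypothetical built-in stylesheet for this sketch only.
DEFAULT_CSS = """
.doc-table { border-collapse: collapse; }
.doc-table td, .doc-table th { border: 1px solid #ccc; padding: 4px; }
.page-header, .page-footer { color: #666; font-size: 0.9em; }
"""


def build_document(title: str, sections_html: str,
                   custom_css: Optional[str] = None) -> str:
    """Wrap rendered sections in a complete HTML5 document."""
    # Custom rules come last so they override defaults of equal specificity.
    css = DEFAULT_CSS + (custom_css or "")
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{safe_title}</title>\n"
        f"<style>{css}</style>\n"
        "</head>\n"
        "<body>\n"
        f"<h1>{safe_title}</h1>\n"
        f"{sections_html}\n"
        "</body>\n"
        "</html>\n"
    )


doc = build_document("Q3 Report", '<section class="section"><p>Hi</p></section>',
                     custom_css="h1 { color: #0078D4; }")
print(doc.splitlines()[0])  # <!DOCTYPE html>
```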
custom_css = """
body {
font-family: 'Segoe UI', Tahoma, Geneva, sans-serif;
max-width: 1200px;
margin: 0 auto;
padding: 20px;
}
h1 {
color: #0078D4;
border-bottom: 2px solid #0078D4;
padding-bottom: 10px;
}
.doc-table {
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
border-radius: 4px;
overflow: hidden;
}
.doc-table th {
background: #0078D4;
color: white;
}
.section {
background: #f9f9f9;
padding: 15px;
margin: 20px 0;
border-radius: 4px;
}
"""
html = parse_analyze_result(result, custom_css=custom_css)This library is designed to work seamlessly with Azure AI Document Intelligence. Here's a complete example:
```python
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential
from azure_ai_doc_intel_html import parse_analyze_result

# Set up the client
endpoint = "https://your-resource.cognitiveservices.azure.com/"
key = "your-api-key"
client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))

# Analyze a local PDF; azure-ai-documentintelligence 1.0.0 takes the
# request payload (here, the open file) as `body`
with open("document.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", body=f)
result = poller.result()

# Convert to HTML with a custom title and save it
html = parse_analyze_result(
    result,
    title="Financial Report Analysis",
    output_html="report.html",
)
print(f"HTML generated: {len(html)} characters")
```

Supported elements:

- Paragraphs with roles (headers, footers, section headings)
- Tables with complex spans
- Sections with hierarchical structure
- Text content with proper escaping
- Page structure preservation
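In the AnalyzeResult JSON, each entry in `sections` lists its children as reference strings such as `/paragraphs/3`, `/tables/0`, or `/sections/1`. A minimal recursive sketch of turning that into nested `<section>` elements follows; `sections_to_html` is a hypothetical name, the paragraph and table renderers are deliberately trivial stand-ins, and treating `sections[0]` as the root section is an assumption.

```python
import html
from typing import Any, Dict


def sections_to_html(result: Dict[str, Any], index: int = 0) -> str:
    """Recursively render sections[index] and everything it references."""
    section = result["sections"][index]
    parts = ['<section class="section">']
    for ref in section.get("elements", []):
        # "/paragraphs/3".split("/") -> ["", "paragraphs", "3"]
        _, kind, idx = ref.split("/")
        i = int(idx)
        if kind == "sections":
            parts.append(sections_to_html(result, i))
        elif kind == "paragraphs":
            text = html.escape(result["paragraphs"][i].get("content", ""))
            parts.append(f"<p>{text}</p>")
        elif kind == "tables":
            # Placeholder: a full renderer would emit the whole <table> here.
            parts.append(f'<table class="doc-table"><!-- table {i} --></table>')
        # Other element kinds (e.g. figures) are skipped in this sketch.
    parts.append("</section>")
    return "\n".join(parts)


sample = {
    "paragraphs": [{"content": "Intro"}, {"content": "Details"}],
    "tables": [{}],
    "sections": [
        {"elements": ["/paragraphs/0", "/sections/1"]},
        {"elements": ["/paragraphs/1", "/tables/0"]},
    ],
}
print(sections_to_html(sample))
```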
Requirements:

- Python 3.7+
- azure-ai-documentintelligence>=1.0.0
License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions, please use the GitHub issue tracker.