Function 5 performs bulk extraction of bibliographic record data from Alma, saving the complete XML of multiple records to a single JSON file. This function is essential for large-scale data analysis, backup, migration, or external processing of Alma records.
This function retrieves the full XML content of all records in a loaded set and exports them to a structured JSON file with:
- Each record's MMS ID as the key
- Complete XML as the value
- Proper JSON formatting and escaping
- UTF-8 character encoding
- Timestamped filename for versioning
Key features:
- Batch processing: Handles any number of records in a set
- Complete records: Exports full bibliographic XML
- JSON format: Machine-readable, parseable output
- Progress tracking: Real-time progress bar during export
- Kill switch: Can stop export mid-process
- Error handling: Continues on individual record failures
- Automatic file naming: Uses timestamp and record count
- UTF-8 support: Preserves special characters
Alma's standard export tools have limitations:
- Limited to specific fields or formats
- May not include complete XML
- Not optimized for programmatic access
- Difficult to process large sets
Function 5 provides:
- Complete XML export
- Programmatically accessible format (JSON)
- Easy parsing with standard libraries
- Full control over exported data
Common scenarios requiring bulk XML access:
- Data quality analysis: Examine metadata patterns across records
- Migration preparation: Extract data for transformation
- External processing: Feed to XSLT, Python scripts, or other tools
- Backup before changes: Snapshot state before bulk edits
- API development: Test data for applications
- Documentation: Examples for training or specifications
Load Set:
- User enters set ID
- Clicks "Load Set"
- Set members retrieved from Alma
- MMS IDs stored in application state
Select Function:
- Choose "Batch Fetch Records to JSON" from dropdown
- Function 5 button appears
Execute:
- Click function button
- Progress bar appears
- For each MMS ID:
- Send GET request to Alma Bibs API
- Receive full XML response
- Store in dictionary:
{mms_id: xml_string}
Progress Tracking:
- Progress bar updates after each record
- Shows "Processing record X of Y"
- Percentage completion visible
- Can click "Kill" to stop
JSON Generation:
- Convert dictionary to JSON
- Pretty-print with indentation
- Ensure UTF-8 encoding
- Escape special characters
File Output:
- Generate filename: batch_records_YYYYMMDD_HHMMSS_NNN.json
- Write JSON to file in project directory
- Log file location
- Display success message
Python Dictionary (internal):
{
"991234567890104641": "<bib>...</bib>",
"991234567890204641": "<bib>...</bib>",
"991234567890304641": "<bib>...</bib>",
...
}
JSON Output File:
{
"991234567890104641": "<?xml version=\"1.0\" ?>\n<bib>\n <mms_id>991234567890104641</mms_id>\n <record_format>marc21</record_format>\n ...\n</bib>",
"991234567890204641": "<?xml version=\"1.0\" ?>\n<bib>\n <mms_id>991234567890204641</mms_id>\n <record_format>marc21</record_format>\n ...\n</bib>",
"991234567890304641": "<?xml version=\"1.0\" ?>\n<bib>\n <mms_id>991234567890304641</mms_id>\n <record_format>marc21</record_format>\n ...\n</bib>"
}
Step 1: Load Set
- Enter set ID in "Set ID" field
- Example: 7071087320004641 (DCAP01 set)
- Or click the DCAP01 link to auto-populate
- Click "Load Set" button
- Wait for confirmation: "Set loaded: 2,847 records"
Step 2: Select Function
- Open function dropdown
- Select "Batch Fetch Records to JSON"
- Function 5 button becomes active
Step 3: Execute Export
- Click function 5 button
- Progress bar appears
- Watch progress: "Processing record 1 of 2,847"
- Wait for completion (may take 1-2 hours for large sets)
Step 4: Locate Output File
- Check CABB project directory
- Find file: batch_records_20241203_143022_2847.json
- File contains JSON with all record XML
Step 5: Verify Export
- Open JSON file in text editor
- Verify record count matches set size
- Check sample records for completeness
- Confirm UTF-8 encoding preserved
When to Use:
- Export taking too long
- Need to stop for system maintenance
- Discovered wrong set was loaded
- Want partial export for testing
How to Use:
- During export, progress bar shows "Kill" button
- Click "Kill" button
- Current record completes
- Export stops
- Partial results saved to JSON file
- Filename reflects actual record count exported
Example:
- Set has 2,847 records
- Kill after 500 records
- Output file: batch_records_20241203_143022_500.json
- Contains first 500 records only
For sets with 1,000+ records:
Strategy 1: Full Export Overnight
- Start export at end of day
- Let run overnight
- Review results in morning
- Typical: 2,847 records = ~2 hours
Strategy 2: Subset Exports
- Create smaller temporary sets in Alma
- Export each subset separately
- Combine JSON files afterward
- More control, easier to restart
Strategy 3: Progressive Exports
- Export first 500 records
- Use kill switch
- Verify partial export
- Resume with new set (remaining records)
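For Strategies 2 and 3, the per-subset JSON files eventually need to be recombined. A minimal sketch, assuming later files should win when the same MMS ID appears in more than one export:

```python
import json

def merge_exports(paths, out_path):
    """Combine several batch_records_*.json files into one.

    Later files overwrite earlier ones when the same MMS ID appears
    twice (an assumption; treat duplicates as an error if that fits
    your workflow better).
    """
    merged = {}
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            merged.update(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2, ensure_ascii=False)
    return len(merged)
```

After merging, the returned count should equal the original set size; a mismatch points at records that failed or were never exported.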
Pattern: batch_records_YYYYMMDD_HHMMSS_COUNT.json
Components:
- batch_records: Fixed prefix
- YYYYMMDD: Date (e.g., 20241203 = December 3, 2024)
- HHMMSS: Time (e.g., 143022 = 2:30:22 PM)
- COUNT: Number of records exported
- .json: File extension
Examples:
- batch_records_20241203_143022_2847.json - Full DCAP01 export
- batch_records_20241203_150000_500.json - Killed after 500 records
- batch_records_20241203_160000_1.json - Single record test
Root Object:
{
"MMS_ID_1": "XML_STRING_1",
"MMS_ID_2": "XML_STRING_2",
...
}
Key: MMS ID (string)
- Alma record identifier (18 digits in the examples shown)
- Example: "991234567890104641"
Value: XML (string)
- Complete bibliographic record XML
- Escaped for JSON (quotes, newlines, etc.)
- Includes XML declaration
- UTF-8 encoded
Pretty Printing:
- Indent: 2 spaces
- Ensure ASCII: False (allows Unicode)
- Sort keys: False (maintains insertion order)
UTF-8 Throughout:
- File written with UTF-8 encoding
- Special characters preserved
- JSON escape sequences for quotes/newlines
- No mojibake or data corruption
Example Special Characters:
- Accented letters: é, ñ, ü
- Quotes: "", '', ""
- Dashes: —, –
- Symbols: ©, ®, °
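The effect of the UTF-8 settings can be seen with plain standard-library behavior (this is not CABB code, just an illustration of why `ensure_ascii=False` matters):

```python
import json

record = {"991234567890104641": "<bib><title>Café für naïve ©</title></bib>"}

# With ensure_ascii=False the characters survive as themselves...
utf8_text = json.dumps(record, ensure_ascii=False)
# ...with the default (ensure_ascii=True) they become \uXXXX escapes.
ascii_text = json.dumps(record)

print("é" in utf8_text)   # True
print("é" in ascii_text)  # False
# Both forms decode back to the identical original data:
print(json.loads(utf8_text) == json.loads(ascii_text) == record)  # True
```

Either form is valid JSON; the literal-character form is simply easier to read and grep.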
Scenario: About to run Functions 2, 6, or 7 on large set, want backup first
Workflow:
- Load set (e.g., DCAP01 with 2,847 records)
- Run Function 5 to export all records
- Store JSON file safely
- Run editing functions (2, 6, 7)
- If issues arise, have complete pre-edit state
Benefits:
- Complete snapshot of data before changes
- Can analyze what changed
- Recovery option if needed
- Documentation of original state
Scenario: Analyze metadata patterns across collection
Workflow:
- Export entire collection to JSON
- Write Python script to parse JSON:
import json
import xml.etree.ElementTree as ET

with open('batch_records_20241203_143022_2847.json', 'r', encoding='utf-8') as f:
    records = json.load(f)

# Analyze each record
for mms_id, xml_string in records.items():
    root = ET.fromstring(xml_string)
    # Extract and analyze specific fields
    ...
- Generate reports on:
- Missing fields
- Field value patterns
- Data quality issues
- Metadata completeness
Benefits:
- Comprehensive analysis
- Identify systematic issues
- Document metadata quality
- Inform cleanup priorities
Scenario: Migrate metadata to different system or format
Workflow:
- Export all records from Alma
- Process JSON with transformation scripts:
- Extract Dublin Core
- Convert to MODS, EAD, or other format
- Map to new system's schema
- Import to target system
- Verify migration completeness
Benefits:
- Single export contains all data
- Process offline
- Repeatable transformation
- Version control for scripts
Scenario: Building web application that displays Alma records
Workflow:
- Export test data set
- Use JSON for development/testing
- Parse XML to extract display fields
- Test application against real data
- Deploy with live API integration
Benefits:
- Realistic test data
- No API calls during development
- Fast iteration
- Consistent test dataset
Scenario: Document metadata state for annual report or compliance
Workflow:
- Export collection at end of fiscal year
- Archive JSON file with date
- Generate statistics from JSON:
- Total records
- Field usage frequencies
- Rights statements distribution
- Format types breakdown
- Include in annual report
Benefits:
- Point-in-time snapshot
- Reproducible statistics
- Compliance documentation
- Year-over-year comparison
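A sketch of generating such statistics from the export file. The `record_format` element follows the sample output shown earlier; other tallies (rights statements, format types) would swap in different XPath expressions:

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

def export_statistics(path):
    """Summarize an export file: total records plus a record_format tally."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    formats = Counter()
    for xml_string in records.values():
        root = ET.fromstring(xml_string)
        formats[root.findtext(".//record_format", default="unknown")] += 1
    return {"total_records": len(records), "formats": dict(formats)}
```

Because the statistics are computed from an archived file rather than live Alma data, the numbers are reproducible for compliance purposes.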
Scenario: Create training materials showing real record examples
Workflow:
- Export sample set of diverse records
- Extract specific examples:
- Best practice records
- Problematic records
- Various content types
- Use in training documentation
- Provide to staff for reference
Benefits:
- Real examples from production
- Diverse record types
- Easily shareable
- Version controlled
For Each Record:
GET /almaws/v1/bibs/{mms_id}?view=full&expand=None
Accept: application/xml
Authorization: apikey {api_key}
Parameters:
- mms_id: Bibliographic record identifier
- view=full: Returns complete record data
- expand=None: No additional linked data
- API key from environment variables
Response:
- Content-Type: application/xml
- Body: Complete <bib>...</bib> XML
- Status: 200 on success
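The request can be sketched with the standard library. Nothing is sent here, only the Request object is constructed; the host name is the North America Alma gateway and is an assumption (it varies by region):

```python
import urllib.request

def build_bib_request(mms_id, api_key,
                      base="https://api-na.hosted.exlibrisgroup.com"):
    """Build (but do not send) the Bibs API GET request described above."""
    url = f"{base}/almaws/v1/bibs/{mms_id}?view=full&expand=None"
    return urllib.request.Request(url, headers={
        "Accept": "application/xml",
        "Authorization": f"apikey {api_key}",
    })

req = build_bib_request("991234567890104641", "MY_API_KEY")
# urllib.request.urlopen(req) would perform the actual call
```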
Python Code Pattern:
import json
from datetime import datetime

# Dictionary to store records
records_dict = {}

# Fetch each record
for mms_id in set_members:
    xml_string = fetch_record_xml(mms_id)
    records_dict[mms_id] = xml_string

# Write to JSON file
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
filename = f"batch_records_{timestamp}_{len(records_dict)}.json"
with open(filename, 'w', encoding='utf-8') as f:
    json.dump(records_dict, f, indent=2, ensure_ascii=False)
JSON Settings:
- indent=2: Pretty-print with 2-space indentation
- ensure_ascii=False: Allow Unicode characters
- encoding='utf-8': UTF-8 file encoding
- Default key order: insertion order (Python 3.7+)
Write Mode: 'w' (write, overwrite if exists)
Encoding: 'utf-8' (explicit UTF-8)
Location: CABB project directory (/Users/mcfatem/GitHub/CABB/)
Permissions: Uses default system permissions
Time per Record:
- API call: 1-2 seconds
- JSON processing: negligible
- Total: ~1.5 seconds average per record
Total Time Estimates:
- 100 records: 2-3 minutes
- 500 records: 12-15 minutes
- 1,000 records: 25-30 minutes
- 2,847 records: 1-2 hours
File Sizes:
- Average record XML: ~15-20 KB
- 100 records: ~1.5-2 MB JSON
- 1,000 records: ~15-20 MB JSON
- 2,847 records: ~43-57 MB JSON
Factors Affecting Speed:
- Network latency
- Alma server load
- Record complexity (large records take longer)
- Time of day (peak vs. off-peak)
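The estimates above reduce to simple arithmetic. A small helper using the per-record averages quoted in this section (adjust them to match your observed throughput):

```python
def export_estimate(record_count, secs_per_record=1.5, kb_per_record=17.5):
    """Return (estimated minutes, estimated JSON size in MB)."""
    minutes = record_count * secs_per_record / 60
    size_mb = record_count * kb_per_record / 1024
    return round(minutes), round(size_mb, 1)

print(export_estimate(2847))  # prints (71, 48.7): about 71 minutes, ~49 MB
```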
Individual Record Failures:
- Error logged with MMS ID and status code
- Record skipped in output JSON
- Processing continues to next record
- Final count reflects successful exports only
Common Errors:
| Error | Status | Cause | Handling |
|---|---|---|---|
| Not found | 404 | Invalid MMS ID | Skip, log error |
| Unauthorized | 401 | API key expired | Stop, display error |
| Forbidden | 403 | Insufficient permissions | Stop, display error |
| Timeout | - | Network issue | Skip, log error, continue |
| Rate limit | 429 | Too many requests | Retry with delay |
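The handling column of the table can be sketched as a per-record wrapper. The `fetch` callable here is hypothetical, standing in for the actual HTTP call, and the retry count and delay are illustrative defaults:

```python
import time

def fetch_with_handling(mms_id, fetch, max_retries=3, delay=2.0):
    """Apply the handling table above to one record.

    `fetch` is a hypothetical callable returning (status_code, body);
    the real function would wrap the HTTP call to the Bibs API.
    """
    for _ in range(max_retries):
        status, body = fetch(mms_id)
        if status == 200:
            return body
        if status == 429:            # rate limit: wait, then retry
            time.sleep(delay)
            continue
        if status in (401, 403):     # auth errors are fatal: stop the export
            raise PermissionError(f"HTTP {status} for {mms_id}")
        return None                  # 404, timeout, etc.: skip and log
    return None                      # retries exhausted
```

Returning `None` lets the caller skip the record and continue, matching the "continues on individual record failures" behavior.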
Network Failures:
- Logged with full traceback
- User notified of issue
- Can retry entire export or use kill switch
Load JSON:
import json
import xml.etree.ElementTree as ET
# Read JSON file
with open('batch_records_20241203_143022_2847.json', 'r', encoding='utf-8') as f:
    records = json.load(f)
print(f"Loaded {len(records)} records")
# Access specific record
mms_id = "991234567890104641"
xml_string = records[mms_id]
# Parse XML
root = ET.fromstring(xml_string)
# Extract fields
title = root.find('.//title').text
print(f"Title: {title}")
Extract Dublin Core:
namespaces = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcterms': 'http://purl.org/dc/terms/'
}
# Find the Dublin Core record element (ElementTree folds the xmlns
# declaration into the tag name, so match with a wildcard namespace)
record_elem = root.find('.//{*}record')
# Get all dc:title elements
titles = record_elem.findall('.//dc:title', namespaces)
for title in titles:
    print(f"Title: {title.text}")
# Get all dc:creator elements
creators = record_elem.findall('.//dc:creator', namespaces)
for creator in creators:
    print(f"Creator: {creator.text}")
Iterate All Records:
for mms_id, xml_string in records.items():
    try:
        root = ET.fromstring(xml_string)
        # Process record
        process_record(root, mms_id)
    except ET.ParseError as e:
        print(f"Error parsing {mms_id}: {e}")
Load JSON in Node.js:
const fs = require('fs');
const { DOMParser } = require('@xmldom/xmldom');
// Read JSON file
const data = fs.readFileSync('batch_records_20241203_143022_2847.json', 'utf-8');
const records = JSON.parse(data);
console.log(`Loaded ${Object.keys(records).length} records`);
// Parse specific record
const mmsId = '991234567890104641';
const xmlString = records[mmsId];
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'text/xml');
// Extract title
const titleElement = xmlDoc.getElementsByTagName('title')[0];
const title = titleElement.textContent;
console.log(`Title: ${title}`);
Browser Example:
// Assuming JSON loaded as 'records' object
// Iterate all records
Object.entries(records).forEach(([mmsId, xmlString]) => {
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(xmlString, 'text/xml');
  // Extract metadata
  const title = xmlDoc.querySelector('title')?.textContent;
  console.log(`${mmsId}: ${title}`);
});
Count Records:
jq 'length' batch_records_20241203_143022_2847.json
List All MMS IDs:
jq 'keys[]' batch_records_20241203_143022_2847.json
Extract Specific Record:
jq '.["991234567890104641"]' batch_records_20241203_143022_2847.json
Pretty Print Specific XML:
jq -r '.["991234567890104641"]' batch_records_20241203_143022_2847.json | xmllint --format -
- Verify set membership: Check set contains intended records
- Estimate time: Calculate expected duration based on record count
- Check disk space: Ensure sufficient space for output file
- Test with small set: Export 10-20 records first to verify
- Note timestamp: Document when export starts for file identification
- Monitor progress: Check progress bar periodically
- Don't close application: Keep browser window open
- Avoid system sleep: Disable sleep mode for long exports
- Check logs: Review log file if errors appear
- Use kill switch wisely: Only stop if necessary
- Verify file created: Check project directory for JSON file
- Validate JSON: Use JSON validator to ensure well-formed
- Check record count: Compare file count to set count
- Sample records: Parse and examine a few records
- Backup file: Copy to secure location if important
- Document export: Note date, purpose, and set details
- Descriptive naming: Include date, set name in filename if renaming
- Version control: Keep exports in dated folders
- Compression: Gzip large files for storage (can compress to ~10% of size)
- Retention policy: Delete old exports after specific period
- Security: Protect files if they contain sensitive metadata
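Compression can be done with the `gzip` command line tool or, as sketched here, with Python's standard library. The near-10:1 ratio mentioned above is plausible because JSON-wrapped XML is highly repetitive:

```python
import gzip
import shutil

def gzip_export(path):
    """Write a gzip-compressed copy of an export file as path + '.gz'."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

Keep the original until the compressed copy is verified; `gzip.open(gz_path, 'rb')` reads it back transparently.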
- Set-based only: Cannot export arbitrary MMS ID list (must be in set)
- No filtering: Exports all records in set, no field-level filtering
- JSON only: Does not support other formats (CSV, XML file per record, etc.)
- Full records: Cannot export subset of fields (always complete XML)
- No compression: Output file not automatically compressed
- Single file: All records in one JSON file (can be large)
- No resume: If export fails, must restart from beginning
- Memory usage: Large sets may require significant memory
Symptoms: Progress bar stops updating
Possible Causes:
- Network interruption
- Alma server timeout
- Very large record taking long time
Solutions:
- Wait 2-3 minutes before using kill switch
- Check network connection
- Review logs for error messages
- Use kill switch and retry
Symptoms: Cannot parse JSON, syntax errors
Possible Causes:
- Export interrupted
- File system error
- Character encoding issue
Solutions:
- Use JSON validator to identify problem
- Check if file ends abruptly (missing closing brace)
- Re-export if severely corrupted
- Contact support if persistent
Symptoms: Export completes but file not in directory
Possible Causes:
- Saved to different directory
- Permissions issue
- Filename different than expected
Solutions:
- Search entire system for "batch_records*.json"
- Check user has write permissions to CABB directory
- Review logs for actual filename
- Check for error messages during save
Symptoms: Accents, symbols appear as �� or ?
Possible Causes:
- File not opened with UTF-8 encoding
- Editor doesn't support UTF-8
- Character encoding lost
Solutions:
- Open file with UTF-8 encoding explicitly
- Use editor with good Unicode support (VS Code, Sublime)
- Verify JSON file itself is UTF-8 (check with file command)
- Re-export if file truly corrupted
Symptoms: Some MMS IDs missing from output
Possible Causes:
- Records returned 404 (deleted or invalid)
- API errors for specific records
- Kill switch used
Solutions:
- Check error log for failed MMS IDs
- Verify those records exist in Alma
- Compare file count to expected count
- Re-export missing records individually if needed
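Comparing the loaded set against the export file identifies exactly which records to re-export. A minimal sketch, assuming you have the set's MMS ID list available (e.g., from Function 3's tabular output or the Alma set itself):

```python
import json

def find_missing(set_mms_ids, export_path):
    """Return the MMS IDs from the loaded set that are absent from the export."""
    with open(export_path, "r", encoding="utf-8") as f:
        exported = json.load(f)
    return [mms_id for mms_id in set_mms_ids if mms_id not in exported]
```

The returned list can be cross-checked against the error log, then re-fetched individually with Function 1.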
Backup Workflow:
- Load set to be edited
- Run Function 5 to export all records
- Verify export completed successfully
- Store JSON file securely
- Run editing function
- Compare results using JSON backup
Complementary Use:
- Function 3: Tabular data for spreadsheet analysis
- Function 5: Complete XML for programmatic processing
- Export both formats for different purposes
- CSV for human review, JSON for scripts
Detailed Inspection:
- Use Function 5 for bulk export
- Use Function 1 to examine individual records
- Cross-reference between file and live data
- Verify specific records after export
- Alma Bibs API: https://developers.exlibrisgroup.com/alma/apis/bibs/
- JSON Format: https://www.json.org/
- Python json module: https://docs.python.org/3/library/json.html
- XML Processing: https://docs.python.org/3/library/xml.etree.elementtree.html
- Initial Implementation: Batch export capability
- Purpose: Bulk data extraction for analysis and backup
- Status: Active, production-ready