Function 5: Batch Fetch Records to JSON

Overview

Function 5 performs bulk extraction of bibliographic record data from Alma, saving the complete XML of multiple records to a single JSON file. This function is essential for large-scale data analysis, backup, migration, or external processing of Alma records.

What It Does

This function retrieves the full XML content of all records in a loaded set and exports them to a structured JSON file with:

Each record's MMS ID as the key
Complete XML as the value
Proper JSON formatting and escaping
UTF-8 character encoding
Timestamped filename for versioning

Key Features

Batch processing: Handles any number of records in a set
Complete records: Exports full bibliographic XML
JSON format: Machine-readable, parseable output
Progress tracking: Real-time progress bar during export
Kill switch: Can stop export mid-process
Error handling: Continues on individual record failures
Automatic file naming: Uses timestamp and record count
UTF-8 support: Preserves special characters

The Need for This Function

Data Export and Backup

Alma's standard export tools have limitations:

Limited to specific fields or formats
May not include complete XML
Not optimized for programmatic access
Difficult to process large sets

Function 5 provides:

Complete XML export
Programmatically accessible format (JSON)
Easy parsing with standard libraries
Full control over exported data

Analysis and Processing

Common scenarios requiring bulk XML access:

Data quality analysis: Examine metadata patterns across records
Migration preparation: Extract data for transformation
External processing: Feed to XSLT, Python scripts, or other tools
Backup before changes: Snapshot state before bulk edits
API development: Test data for applications
Documentation: Examples for training or specifications

How It Works

Step-by-Step Process

Load Set:
- User enters set ID
- Clicks "Load Set"
- Set members retrieved from Alma
- MMS IDs stored in application state
Select Function:
- Choose "Batch Fetch Records to JSON" from dropdown
- Function 5 button appears
Execute:
- Click function button
- Progress bar appears
- For each MMS ID:
  - Send GET request to Alma Bibs API
  - Receive full XML response
  - Store in dictionary: {mms_id: xml_string}
Progress Tracking:
- Progress bar updates after each record
- Shows "Processing record X of Y"
- Percentage completion visible
- Can click "Kill" to stop
JSON Generation:
- Convert dictionary to JSON
- Pretty-print with indentation
- Ensure UTF-8 encoding
- Escape special characters
File Output:
- Generate filename: batch_records_YYYYMMDD_HHMMSS_COUNT.json (COUNT is the number of records exported)
- Write JSON to file in project directory
- Log file location
- Display success message
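
The steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names export_records and write_export, the fetch_xml callable, the on_progress callback, and the use of threading.Event to model the Kill button are all assumptions made for the example.

```python
import json
import threading
from datetime import datetime
from typing import Callable, Dict, Iterable, Optional


def export_records(
    mms_ids: Iterable[str],
    fetch_xml: Callable[[str], str],
    stop_event: Optional[threading.Event] = None,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Dict[str, str]:
    """Fetch each record in order, skipping failures, until done or stopped."""
    ids = list(mms_ids)
    records: Dict[str, str] = {}
    for position, mms_id in enumerate(ids, start=1):
        # Kill switch (hypothetical): checked between records, so a record
        # that is already being fetched always completes.
        if stop_event is not None and stop_event.is_set():
            break
        try:
            records[mms_id] = fetch_xml(mms_id)
        except Exception as exc:
            # One failed record is logged and skipped; the batch continues.
            print(f"Skipping {mms_id}: {exc}")
        if on_progress is not None:
            on_progress(position, len(ids))  # "Processing record X of Y"
    return records


def write_export(records: Dict[str, str], directory: str = ".") -> str:
    """Write the dictionary to a timestamped JSON file and return its path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = f"{directory}/batch_records_{timestamp}_{len(records)}.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
    return path
```

Passing the fetch function in as a parameter keeps the loop testable without network access; the count in the filename comes from the dictionary, so it reflects only records that were actually stored.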

Data Structure

Python Dictionary (internal):

{
    "991234567890104641": "<bib>...</bib>",
    "991234567890204641": "<bib>...</bib>",
    "991234567890304641": "<bib>...</bib>",
    ...
}

JSON Output File:

{
  "991234567890104641": "<?xml version=\"1.0\" ?>\n<bib>\n  <mms_id>991234567890104641</mms_id>\n  <record_format>marc21</record_format>\n  ...\n</bib>",
  "991234567890204641": "<?xml version=\"1.0\" ?>\n<bib>\n  <mms_id>991234567890204641</mms_id>\n  <record_format>marc21</record_format>\n  ...\n</bib>",
  "991234567890304641": "<?xml version=\"1.0\" ?>\n<bib>\n  <mms_id>991234567890304641</mms_id>\n  <record_format>marc21</record_format>\n  ...\n</bib>"
}
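
To illustrate how the internal dictionary becomes the escaped JSON shown above, here is a small round trip using only the standard json module. The sample XML is shortened and purely illustrative.

```python
import json

# Shortened sample record; real records are much larger.
xml_string = (
    '<?xml version="1.0" ?>\n'
    "<bib>\n"
    "  <mms_id>991234567890104641</mms_id>\n"
    "</bib>"
)
records = {"991234567890104641": xml_string}

# Serialising escapes the double quotes and newlines inside the XML string.
encoded = json.dumps(records, indent=2, ensure_ascii=False)
print(encoded)

# Parsing the JSON restores the XML string exactly as it was stored.
decoded = json.loads(encoded)
print(decoded["991234567890104641"] == xml_string)
```

The escaping is reversible, so consumers never need to unescape the XML by hand; any JSON parser returns the original string.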

Usage

Basic Export

Step 1: Load Set

Enter set ID in "Set ID" field
- Example: 7071087320004641 (DCAP01 set)
- Or click DCAP01 link to auto-populate
Click "Load Set" button
Wait for confirmation: "Set loaded: 2,847 records"

Step 2: Select Function

Open function dropdown
Select "Batch Fetch Records to JSON"
Function 5 button becomes active

Step 3: Execute Export

Click function 5 button
Progress bar appears
Watch progress: "Processing record 1 of 2,847"
Wait for completion (may take 1-2 hours for large sets)

Step 4: Locate Output File

Check CABB project directory
Find file: batch_records_20241203_143022_2847.json
File contains JSON with all record XML

Step 5: Verify Export

Open JSON file in text editor
Verify record count matches set size
Check sample records for completeness
Confirm UTF-8 encoding preserved

Kill Switch Usage

When to Use:

Export taking too long
Need to stop for system maintenance
Discovered wrong set was loaded
Want partial export for testing

How to Use:

During export, progress bar shows "Kill" button
Click "Kill" button
Current record completes
Export stops
Partial results saved to JSON file
Filename reflects actual record count exported

Example:

Set has 2,847 records
Kill after 500 records
Output file: batch_records_20241203_143022_500.json
Contains first 500 records only

Large Set Handling

For sets with 1,000+ records:

Strategy 1: Full Export Overnight

Start export at end of day
Let run overnight
Review results in morning
Typical: 2,847 records take 1-2 hours

Strategy 2: Subset Exports

Create smaller temporary sets in Alma
Export each subset separately
Combine JSON files afterward
More control, easier to restart

Strategy 3: Progressive Exports

Export first 500 records
Use kill switch
Verify partial export
Resume with new set (remaining records)
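
For Strategy 2 and Strategy 3, the separate JSON files need to be combined afterward. A minimal sketch of that step follows; merge_exports is a hypothetical helper, not part of the application, and it assumes each file uses the MMS-ID-to-XML structure described in this document.

```python
import json
from typing import Dict, Iterable


def merge_exports(paths: Iterable[str]) -> Dict[str, str]:
    """Combine several export files into one dictionary keyed by MMS ID."""
    merged: Dict[str, str] = {}
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            subset = json.load(f)
        # Report MMS IDs that appear in more than one file.
        overlap = merged.keys() & subset.keys()
        if overlap:
            print(f"{path}: {len(overlap)} MMS IDs already seen; keeping the later copy")
        merged.update(subset)
    return merged
```

The merged dictionary can then be written back out with json.dump using the same indent=2 and ensure_ascii=False settings as the original export.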

Output File Format

Filename Convention

Pattern: batch_records_YYYYMMDD_HHMMSS_COUNT.json

Components:

batch_records: Fixed prefix
YYYYMMDD: Date (e.g., 20241203 = December 3, 2024)
HHMMSS: Time (e.g., 143022 = 2:30:22 PM)
COUNT: Number of records exported
.json: File extension

Examples:

batch_records_20241203_143022_2847.json - Full DCAP01 export
batch_records_20241203_150000_500.json - Killed after 500 records
batch_records_20241203_160000_1.json - Single record test
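
The naming pattern can be built and parsed mechanically. The sketch below assumes exactly the pattern documented above; build_filename and parse_filename are illustrative names, not functions from the application.

```python
import re
from datetime import datetime
from typing import Tuple

# batch_records_YYYYMMDD_HHMMSS_COUNT.json
FILENAME_RE = re.compile(r"^batch_records_(\d{8})_(\d{6})_(\d+)\.json$")


def build_filename(when: datetime, count: int) -> str:
    """Return the export filename for a given time and record count."""
    return f"batch_records_{when:%Y%m%d_%H%M%S}_{count}.json"


def parse_filename(name: str) -> Tuple[datetime, int]:
    """Recover the export time and record count from a filename."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"Not a batch export filename: {name}")
    date_part, time_part, count = match.groups()
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S"), int(count)


print(build_filename(datetime(2024, 12, 3, 14, 30, 22), 2847))
# batch_records_20241203_143022_2847.json
```

Parsing the count out of the filename gives a quick cross-check against the number of keys in the file.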

JSON Structure

Root Object:

{
  "MMS_ID_1": "XML_STRING_1",
  "MMS_ID_2": "XML_STRING_2",
  ...
}

Key: MMS ID (string)

Alma record identifier (18 digits in the example below)
Example: "991234567890104641"

Value: XML (string)

Complete bibliographic record XML
Escaped for JSON (quotes, newlines, etc.)
Includes XML declaration
UTF-8 encoded

Pretty Printing:

Indent: 2 spaces
Ensure ASCII: False (allows Unicode)
Sort keys: False (maintains insertion order)

Character Encoding

UTF-8 Throughout:

File written with UTF-8 encoding
Special characters preserved
JSON escape sequences for quotes/newlines
No mojibake or data corruption

Example Special Characters:

Accented letters: é, ñ, ü
Quotes: “ ”, ‘ ’, " "
Dashes: —, –
Symbols: ©, ®, °
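
The effect of the ensure_ascii setting on such characters can be seen with a short comparison. The sample title is made up for illustration.

```python
import json

# Made-up sample containing accented letters and symbols.
record = {"991234567890104641": "<title>Café Señor © 25°</title>"}

# ensure_ascii=False writes the characters themselves (the setting used here).
as_unicode = json.dumps(record, ensure_ascii=False)

# The default (ensure_ascii=True) writes \uXXXX escapes instead.
as_ascii = json.dumps(record)

print(as_unicode)
print(as_ascii)

# Both forms decode to the same data; only readability and size differ.
print(json.loads(as_unicode) == json.loads(as_ascii))
```

Either form is valid JSON. Writing the characters directly keeps the file readable in a text editor, provided the file is opened as UTF-8.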

Use Cases

1. Pre-Edit Backup

Scenario: About to run Functions 2, 6, or 7 on large set, want backup first

Workflow:

Load set (e.g., DCAP01 with 2,847 records)
Run Function 5 to export all records
Store JSON file safely
Run editing functions (2, 6, 7)
If issues arise, have complete pre-edit state

Benefits:

Complete snapshot of data before changes
Can analyze what changed
Recovery option if needed
Documentation of original state
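
One way to "analyze what changed" is to export the set again after editing and compare the two files. The following is a sketch under that assumption; diff_exports is a hypothetical helper, and it compares the raw XML strings, so whitespace-only differences are also reported as changes.

```python
import json
from typing import Dict, List


def load_export(path: str) -> Dict[str, str]:
    """Load an export file into a dictionary of MMS ID to XML string."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def diff_exports(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """List MMS IDs whose XML changed, and IDs absent from the later export."""
    changed = [m for m, xml in before.items() if m in after and after[m] != xml]
    missing_after = [m for m in before if m not in after]
    return {"changed": changed, "missing_after": missing_after}
```

For example, diff_exports(load_export(pre_edit_path), load_export(post_edit_path)) returns the MMS IDs worth inspecting individually, where both paths are whatever files the two exports produced.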

2. Data Quality Analysis

Scenario: Analyze metadata patterns across collection

Workflow:

Export entire collection to JSON

Write Python script to parse JSON:

import json
import xml.etree.ElementTree as ET

with open('batch_records_20241203_143022_2847.json', 'r', encoding='utf-8') as f:
    records = json.load(f)

# Analyze each record
for mms_id, xml_string in records.items():
    root = ET.fromstring(xml_string)
    # Extract and analyze specific fields

Generate reports on:
- Missing fields
- Field value patterns
- Data quality issues
- Metadata completeness

Benefits:

Comprehensive analysis
Identify systematic issues
Document metadata quality
Inform cleanup priorities
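
As one possible starting point for a missing-fields report, the sketch below counts how many records lack each of a list of element names. The function name count_missing and the element names in the usage note are illustrative assumptions; the names to check depend on the metadata schema of the records.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable


def count_missing(records: Dict[str, str], required_tags: Iterable[str]) -> Counter:
    """Count, per element name, the records in which that element is absent."""
    required = list(required_tags)
    missing: Counter = Counter()
    for mms_id, xml_string in records.items():
        try:
            root = ET.fromstring(xml_string)
        except ET.ParseError:
            missing["<unparseable>"] += 1
            continue
        # Compare local names only, ignoring any {namespace} prefix.
        present = {elem.tag.rsplit("}", 1)[-1] for elem in root.iter()}
        for tag in required:
            if tag not in present:
                missing[tag] += 1
    return missing
```

Calling count_missing(records, ["title", "creator", "rights"]) on a loaded export returns a Counter that can be printed directly or written to a report.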

3. Migration or Transformation

Scenario: Migrate metadata to different system or format

Workflow:

Export all records from Alma
Process JSON with transformation scripts:
- Extract Dublin Core
- Convert to MODS, EAD, or other format
- Map to new system's schema
Import to target system
Verify migration completeness

Benefits:

Single export contains all data
Process offline
Repeatable transformation
Version control for scripts

4. External Application Development

Scenario: Building web application that displays Alma records

Workflow:

Export test data set
Use JSON for development/testing
Parse XML to extract display fields
Test application against real data
Deploy with live API integration

Benefits:

Realistic test data
No API calls during development
Fast iteration
Consistent test dataset

5. Metadata Auditing

Scenario: Document metadata state for annual report or compliance

Workflow:

Export collection at end of fiscal year
Archive JSON file with date
Generate statistics from JSON:
- Total records
- Field usage frequencies
- Rights statements distribution
- Format types breakdown
Include in annual report

Benefits:

Point-in-time snapshot
Reproducible statistics
Compliance documentation
Year-over-year comparison

6. Training and Documentation

Scenario: Create training materials showing real record examples

Workflow:

Export sample set of diverse records
Extract specific examples:
- Best practice records
- Problematic records
- Various content types
Use in training documentation
Provide to staff for reference

Benefits:

Real examples from production
Diverse record types
Easily shareable
Version controlled

Technical Details

API Endpoint

For Each Record:

GET /almaws/v1/bibs/{mms_id}?view=full&expand=None
Accept: application/xml
Authorization: apikey {api_key}

Parameters:

mms_id: Bibliographic record identifier
view=full: Returns complete record data
expand=None: No additional linked data
API key from environment variables

Response:

Content-Type: application/xml
Body: Complete <bib>...</bib> XML
Status: 200 on success
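
A request of this shape could be assembled with the standard library as sketched below. The base URL is an assumption (a regional Alma API gateway; substitute the one for your institution), and build_bib_request is an illustrative name. The fetch_record_xml shown here is one possible implementation of a function with that name, not necessarily the application's.

```python
import urllib.parse
import urllib.request

# Assumption: North America gateway; other regions use a different host.
BASE_URL = "https://api-na.hosted.exlibrisgroup.com"


def build_bib_request(mms_id: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request for one bibliographic record."""
    query = urllib.parse.urlencode({"view": "full", "expand": "None"})
    url = f"{base_url}/almaws/v1/bibs/{urllib.parse.quote(mms_id)}?{query}"
    headers = {
        "Accept": "application/xml",
        "Authorization": f"apikey {api_key}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_record_xml(mms_id: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_bib_request(mms_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating request construction from sending makes it possible to inspect the URL and headers without calling Alma. The API key would typically be read from an environment variable, as noted above.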

JSON Generation

Python Code Pattern:

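import json
from datetime import datetime

# Dictionary to store records, keyed by MMS ID
records_dict = {}

# Fetch each record.
# set_members (the MMS IDs of the loaded set) and fetch_record_xml()
# (returns one record's XML as a string) are defined elsewhere in the application.
for mms_id in set_members:
    xml_string = fetch_record_xml(mms_id)
    records_dict[mms_id] = xml_string

# Timestamp in the YYYYMMDD_HHMMSS form used by the filename convention
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Write to JSON file; the count reflects the records actually fetched
filename = f"batch_records_{timestamp}_{len(records_dict)}.json"
with open(filename, 'w', encoding='utf-8') as f:
    json.dump(records_dict, f, indent=2, ensure_ascii=False)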

JSON Settings:

indent=2: Pretty-print with 2-space indentation
ensure_ascii=False: Allow Unicode characters
encoding='utf-8': UTF-8 file encoding
Default key order: insertion order (Python 3.7+)

File I/O

Write Mode: 'w' (write, overwrite if exists)

Encoding: 'utf-8' (explicit UTF-8)

Location: CABB project directory (/Users/mcfatem/GitHub/CABB/)

Permissions: Uses default system permissions

Performance Considerations

Time per Record:

API call: 1-2 seconds
JSON processing: negligible
Total: ~1.5 seconds average per record

Total Time Estimates:

100 records: 2-3 minutes
500 records: 12-15 minutes
1,000 records: 25-30 minutes
2,847 records: 1-2 hours

File Sizes:

Average record XML: ~15-20 KB
100 records: ~1.5-2 MB JSON
1,000 records: ~15-20 MB JSON
2,847 records: ~43-57 MB JSON

Factors Affecting Speed:

Network latency
Alma server load
Record complexity (large records take longer)
Time of day (peak vs. off-peak)
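
The estimates above follow from simple arithmetic on the per-record averages. A small helper makes it easy to repeat the calculation for any set size; the defaults are the averages quoted in this section, and actual figures will vary with the factors listed.

```python
from typing import Tuple


def estimate_minutes(record_count: int, seconds_per_record: float = 1.5) -> float:
    """Estimated export duration in minutes."""
    return record_count * seconds_per_record / 60


def estimate_size_mb(
    record_count: int, kb_per_record: Tuple[float, float] = (15, 20)
) -> Tuple[float, float]:
    """Estimated (low, high) output size in MB, taking 1 MB as 1,000 KB."""
    low, high = kb_per_record
    return record_count * low / 1000, record_count * high / 1000


print(estimate_minutes(1000))   # 25.0
print(estimate_size_mb(1000))   # (15.0, 20.0)
```

At the 1.5-second average, 2,847 records come to roughly 71 minutes, which is consistent with the 1-2 hour range once slower responses are allowed for.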

Error Handling

Individual Record Failures:

Error logged with MMS ID and status code
Record skipped in output JSON
Processing continues to next record
Final count reflects successful exports only

Common Errors:

Error	Status	Cause	Handling
Not found	404	Invalid MMS ID	Skip, log error
Unauthorized	401	API key expired	Stop, display error
Forbidden	403	Insufficient permissions	Stop, display error
Timeout	-	Network issue	Skip, log error, continue
Rate limit	429	Too many requests	Retry with delay

Network Failures:

Logged with full traceback
User notified of issue
Can retry entire export or use kill switch
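
The handling rules in the table can be expressed as a small policy function. This is a sketch, not the application's code: fetch_with_policy, StopExport, the request callable (which returns a status code and body), and the retry count and delay are all assumptions. The table does not say what happens when retries run out, so this sketch skips the record in that case.

```python
import time
from typing import Callable, Optional, Tuple

STOP_STATUSES = {401, 403}   # stop the export and display the error
RETRY_STATUSES = {429}       # retry after a delay


class StopExport(Exception):
    """Raised when the whole export should stop (hypothetical)."""


def fetch_with_policy(
    mms_id: str,
    request: Callable[[str], Tuple[int, str]],
    max_retries: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the XML on success, None if the record should be skipped."""
    for attempt in range(max_retries + 1):
        try:
            status, body = request(mms_id)
        except TimeoutError as exc:
            print(f"Skipping {mms_id}: timeout ({exc})")
            return None
        if status == 200:
            return body
        if status in STOP_STATUSES:
            raise StopExport(f"HTTP {status} for {mms_id}")
        if status in RETRY_STATUSES and attempt < max_retries:
            sleep(delay_seconds)
            continue
        # 404, other errors, or 429 after the last retry: log and skip.
        print(f"Skipping {mms_id}: HTTP {status}")
        return None
    return None
```

Injecting the sleep function keeps the retry path quick to exercise in tests; in normal use the default time.sleep applies the delay.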

Parsing the JSON Output

Python Example

Load JSON:

import json
import xml.etree.ElementTree as ET

# Read JSON file
with open('batch_records_20241203_143022_2847.json', 'r', encoding='utf-8') as f:
    records = json.load(f)

print(f"Loaded {len(records)} records")

# Access specific record
mms_id = "991234567890104641"
xml_string = records[mms_id]

# Parse XML
root = ET.fromstring(xml_string)

# Extract fields (findtext returns the default instead of raising when no <title> exists)
title = root.findtext('.//title', default='(no title)')
print(f"Title: {title}")

Extract Dublin Core:

namespaces = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcterms': 'http://purl.org/dc/terms/'
}

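# Find the embedded record element. ElementTree does not expose xmlns
# declarations as attributes, so a predicate such as [@xmlns] never matches;
# '{*}record' (Python 3.8+) matches <record> in any namespace or none.
record_elem = root.find('.//{*}record')
if record_elem is None:
    record_elem = root  # fall back to searching the whole document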

# Get all dc:title elements
titles = record_elem.findall('.//dc:title', namespaces)
for title in titles:
    print(f"Title: {title.text}")

# Get all dc:creator elements
creators = record_elem.findall('.//dc:creator', namespaces)
for creator in creators:
    print(f"Creator: {creator.text}")

Iterate All Records:

for mms_id, xml_string in records.items():
    try:
        root = ET.fromstring(xml_string)
        # Process record
        process_record(root, mms_id)
    except ET.ParseError as e:
        print(f"Error parsing {mms_id}: {e}")

JavaScript Example

Load JSON in Node.js:

const fs = require('fs');
const { DOMParser } = require('@xmldom/xmldom'); // maintained successor to the deprecated 'xmldom' package

// Read JSON file
const data = fs.readFileSync('batch_records_20241203_143022_2847.json', 'utf-8');
const records = JSON.parse(data);

console.log(`Loaded ${Object.keys(records).length} records`);

// Parse specific record
const mmsId = '991234567890104641';
const xmlString = records[mmsId];

const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'text/xml');

// Extract title (guard against records without a <title> element)
const titleElement = xmlDoc.getElementsByTagName('title')[0];
const title = titleElement ? titleElement.textContent : '(no title)';
console.log(`Title: ${title}`);

Browser Example:

// Assuming JSON loaded as 'records' object

// Iterate all records
Object.entries(records).forEach(([mmsId, xmlString]) => {
  const parser = new DOMParser();
  const xmlDoc = parser.parseFromString(xmlString, 'text/xml');
  
  // Extract metadata
  const title = xmlDoc.querySelector('title')?.textContent;
  console.log(`${mmsId}: ${title}`);
});

Command Line (jq)

Count Records:

jq 'length' batch_records_20241203_143022_2847.json

List All MMS IDs:

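# keys_unsorted keeps the file's record order (keys sorts them); -r prints IDs without quotes
jq -r 'keys_unsorted[]' batch_records_20241203_143022_2847.json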

Extract Specific Record:

jq '.["991234567890104641"]' batch_records_20241203_143022_2847.json

Pretty Print Specific XML:

jq -r '.["991234567890104641"]' batch_records_20241203_143022_2847.json | xmllint --format -

Best Practices

Before Export

Verify set membership: Check set contains intended records
Estimate time: Calculate expected duration based on record count
Check disk space: Ensure sufficient space for output file
Test with small set: Export 10-20 records first to verify
Note timestamp: Document when export starts for file identification

During Export

Monitor progress: Check progress bar periodically
Don't close application: Keep browser window open
Avoid system sleep: Disable sleep mode for long exports
Check logs: Review log file if errors appear
Use kill switch wisely: Only stop if necessary

After Export

Verify file created: Check project directory for JSON file
Validate JSON: Use JSON validator to ensure well-formed
Check record count: Compare file count to set count
Sample records: Parse and examine a few records
Backup file: Copy to secure location if important
Document export: Note date, purpose, and set details

File Management

Descriptive naming: Include date, set name in filename if renaming
Version control: Keep exports in dated folders
Compression: Gzip large files for storage (repetitive XML text often compresses to roughly 10% of the original size, though the ratio depends on the records)
Retention policy: Delete old exports after specific period
Security: Protect files if they contain sensitive metadata
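
Compressing an export and reading it back can be done with the standard gzip module. The helper names below are illustrative only.

```python
import gzip
import json
import shutil
from typing import Dict


def gzip_file(path: str) -> str:
    """Write a gzip-compressed copy next to the original and return its path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def load_gzipped_export(gz_path: str) -> Dict[str, str]:
    """Read a compressed export directly, without unpacking it first."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

The original file is left in place, so it can be deleted only after the compressed copy has been verified.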

Limitations

Set-based only: Cannot export arbitrary MMS ID list (must be in set)
No filtering: Exports all records in set, no field-level filtering
JSON only: Does not support other formats (CSV, XML file per record, etc.)
Full records: Cannot export subset of fields (always complete XML)
No compression: Output file not automatically compressed
Single file: All records in one JSON file (can be large)
No resume: If export fails, must restart from beginning
Memory usage: Large sets may require significant memory

Troubleshooting

Export Hangs or Stalls

Symptoms: Progress bar stops updating

Possible Causes:

Network interruption
Alma server timeout
Very large record taking long time

Solutions:

Wait 2-3 minutes before using kill switch
Check network connection
Review logs for error messages
Use kill switch and retry

JSON File Corrupted

Symptoms: Cannot parse JSON, syntax errors

Possible Causes:

Export interrupted
File system error
Character encoding issue

Solutions:

Use JSON validator to identify problem
Check if file ends abruptly (missing closing brace)
Re-export if severely corrupted
Contact support if persistent

File Not Found After Export

Symptoms: Export completes but file not in directory

Possible Causes:

Saved to different directory
Permissions issue
Filename different than expected

Solutions:

Search entire system for "batch_records*.json"
Check user has write permissions to CABB directory
Review logs for actual filename
Check for error messages during save

Special Characters Display Incorrectly

Symptoms: Accents, symbols appear as �� or ?

Possible Causes:

File not opened with UTF-8 encoding
Editor doesn't support UTF-8
Character encoding lost

Solutions:

Open file with UTF-8 encoding explicitly
Use editor with good Unicode support (VS Code, Sublime)
Verify JSON file itself is UTF-8 (check with file command)
Re-export if file truly corrupted

Partial Records in JSON

Symptoms: Some MMS IDs missing from output

Possible Causes:

Records returned 404 (deleted or invalid)
API errors for specific records
Kill switch used

Solutions:

Check error log for failed MMS IDs
Verify those records exist in Alma
Compare file count to expected count
Re-export missing records individually if needed
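
If the list of MMS IDs in the set is available (for example, saved from the set load), the missing records can be identified by comparing it with the keys of the export. A minimal sketch, with find_missing_ids as a hypothetical helper:

```python
import json
from typing import Iterable, List


def find_missing_ids(set_members: Iterable[str], export_path: str) -> List[str]:
    """Return the MMS IDs from the set that are absent from the export file."""
    with open(export_path, "r", encoding="utf-8") as f:
        exported = json.load(f)
    return [mms_id for mms_id in set_members if mms_id not in exported]
```

The returned list preserves the set's order, so after a kill-switch stop it typically consists of the unprocessed tail of the set plus any records skipped because of errors.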

Integration with Other Functions

Before Function 2, 6, or 7 (Editing Functions)

Backup Workflow:

Load set to be edited
Run Function 5 to export all records
Verify export completed successfully
Store JSON file securely
Run editing function
Compare results using JSON backup
