FastMARC API Reference

MARCReader

`__init__`

MARCReader(fp)

Parameters:

fp (file): Binary file object (open(..., "rb"))

Returns: MARCReader instance

Example:

with open("records.mrc", "rb") as f:
    reader = MARCReader(f)

`add_index(name, field_spec, mode=None)`

Parameters:

name (str): Index identifier (used for search operations)
field_spec (str): Field specification
- Control fields: "001", "008"
- Data field subfields: "245$a", "650$a"
mode (str, optional):
- "mask": Bitmask for substring search (fuzzy matching)
- "map": Hash map for exact lookup (O(1) retrieval)
- None: Auto-detect (control fields use "map", data fields use "mask")

Returns: self (for chaining)

Raises: RuntimeError if called after .build_index()

Example:

reader = (MARCReader(fp)
    .add_index("control_num", "001")              # Auto: map (control field)
    .add_index("title", "245$a")                  # Auto: mask (data field)
    .add_index("isbn", "020$a", mode="map")       # Explicit: map (ISBN)
    .add_index("subject", "650$a", mode="mask"))  # Explicit: mask (subjects)
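
The difference between the two modes can be pictured with a toy model. The sketch below is not FastMARC's implementation. It assumes a map index is a dictionary from exact value to record indices, and a mask index keeps one integer bitmask per record with a bit set for each byte that occurs in the value, so records that cannot contain the query are skipped before the real substring test. The names `build_map`, `char_mask`, and `mask_search` are hypothetical.

```python
from collections import defaultdict

def build_map(values):
    """Map each exact value to the list of record indices that hold it."""
    index = defaultdict(list)
    for idx, value in enumerate(values):
        index[value].append(idx)
    return dict(index)

def char_mask(text):
    """Return an int with bit b set for every byte b in the lowercased text."""
    mask = 0
    for byte in text.lower().encode("utf-8"):
        mask |= 1 << byte
    return mask

def mask_search(values, masks, query):
    """Skip records whose mask lacks a query byte, then verify by substring test."""
    q = char_mask(query)
    needle = query.lower()
    return [
        idx
        for idx, (value, m) in enumerate(zip(values, masks))
        if (m & q) == q and needle in value.lower()
    ]

# Toy data standing in for 245$a values, one per record.
titles = ["Music theory", "Cooking basics", "The music of Bach"]
masks = [char_mask(t) for t in titles]
title_map = build_map(titles)
hits = mask_search(titles, masks, "music")
```

In this model the map lookup is a single dictionary access, while the mask only narrows the candidates; the substring comparison still decides the final match.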

`build_index(charset=None)`

Build indexes and execute hooks.

Parameters:

charset (str, optional): Custom character set for fuzzy indexing
- Default: 256-bit mask (one bit for each possible byte value)
- Examples: "0123456789" (digits), "aeiou" (vowels)

Returns: self (for chaining)

Raises: ValueError if no indexes or hooks registered

Example:

# Basic indexing
reader = MARCReader(fp).add_index("title", "245$a").build_index()

# With custom charset (memory optimization)
reader = MARCReader(fp).add_index("isbn", "020$a").build_index(charset="0123456789")

# With hooks only (no search index)
reader = MARCReader(fp).hook("650$a", counter).build_index()
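
Why a smaller charset saves memory can be shown with a short sketch. It assumes each character in the charset is given one bit position and that characters outside the charset are ignored; how FastMARC actually lays out its masks is not documented here, and `make_masker` is a hypothetical helper.

```python
def make_masker(charset):
    """Build a mask function that assigns one bit per character in charset."""
    bit_for = {ch: i for i, ch in enumerate(charset)}

    def mask(text):
        m = 0
        for ch in text:
            if ch in bit_for:  # characters outside the charset are ignored
                m |= 1 << bit_for[ch]
        return m

    return mask

digit_mask = make_masker("0123456789")
isbn_mask = digit_mask("978-0-13-110362-7")
width = isbn_mask.bit_length()  # never more than len(charset) bits
```

With a ten-character charset every mask fits in ten bits, compared with 256 bits when every byte value gets its own position.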

`hook(field_specs, callable)`

Parameters:

field_specs:
- str: Single field (e.g., "650$a")
- list[str]: Multiple fields (e.g., ["008", "264$c"])
callable: Hook function

Returns: self (for chaining)

Hook Signatures:

Single-field hook:

def hook(values: list[str]) -> None:
    """Called once per record with list of all occurrences."""
    pass

Multi-field hook:

def hook(fields: dict[str, list[str]]) -> None:
    """Called once per record with dict of fields present."""
    pass

Example:

from collections import Counter

# Single-field hook
class FieldCounter:
    def __init__(self):
        self.counts = Counter()

    def __call__(self, values):
        for v in values:
            self.counts[v] += 1

subjects = FieldCounter()
reader = MARCReader(fp).hook("650$a", subjects).build_index()
print(subjects.counts.most_common(10))

# Multi-field hook (fallback logic)
class YearExtractor:
    def __init__(self):
        self.years = Counter()

    def __call__(self, fields):
        year = None
        if "008" in fields and fields["008"]:
            year = fields["008"][0][7:11]
        elif fields.get("264$c"):
            year = fields["264$c"][0].strip("[]c. ")
        if year and year.isdigit():
            self.years[year] += 1

years = YearExtractor()
reader = MARCReader(fp).hook(["008", "264$c"], years).build_index()
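
The two hook signatures differ only in what the callable receives. The sketch below models one record as a plain dict from field spec to values and shows one plausible way to dispatch to either kind of hook. `dispatch` is a hypothetical function, and passing an empty list to a single-field hook when the field is absent is an assumption, not documented behavior.

```python
def dispatch(record_fields, field_specs, hook):
    """Call hook once for one record, following the two documented signatures.

    record_fields is a stand-in for a parsed record: {field_spec: [values]}.
    """
    if isinstance(field_specs, str):
        # Single-field hook: receives the list of all occurrences.
        hook(record_fields.get(field_specs, []))
    else:
        # Multi-field hook: receives a dict limited to the fields present.
        hook({s: record_fields[s] for s in field_specs if s in record_fields})

seen_single, seen_multi = [], []
record = {"008": ["200101s2020    nyu"], "650$a": ["Music", "Jazz"]}
dispatch(record, "650$a", seen_single.append)
dispatch(record, ["008", "264$c"], seen_multi.append)
```

Because the multi-field dict holds only the fields present, a hook such as `YearExtractor` can test membership before falling back to another field.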

`search(field_spec, text)`

Search for records containing text in the specified field.

Parameters:

field_spec (str): Field specification (e.g., "245$a", "001")
text (str): Search query

Returns: list[int] - List of matching record indices

Behavior:

If index exists for field: Uses index (mask or map mode)
If no index: Performs sequential scan through all records
Mask mode: Case-insensitive substring match
Map mode: Exact value lookup (returns every record that shares the value)

Raises: RuntimeError if .build_index() not called

Example:

reader = (MARCReader(fp)
    .add_index("control_num", "001", mode="map")
    .add_index("title", "245$a", mode="mask")
    .build_index())

# Uses map index (fast)
ids = reader.search("001", "12345")

# Uses mask index (fast)
results = reader.search("245$a", "music")

# No index - sequential scan (slower but still works)
results = reader.search("260$a", "New York")
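
Since `search` returns record indices and not records, a common next step is to fetch the matching records. The helper below is a small sketch that relies only on the documented `search()` and `get_record()` methods; `first_matches` and its `limit` parameter are hypothetical.

```python
def first_matches(reader, field_spec, text, limit=5):
    """Return up to `limit` (index, record) pairs for a search.

    `reader` is expected to be a MARCReader on which build_index() was called.
    """
    indices = reader.search(field_spec, text)
    return [(idx, reader.get_record(idx)) for idx in indices[:limit]]
```

Capping the number of fetched records keeps an unindexed sequential scan from being followed by a large number of record retrievals.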

`get_record(idx)`

Retrieve record by index.

Parameters:

idx (int): Zero-based record index

Returns: pymarc.Record

Raises:

RuntimeError if .build_index() not called
IndexError if out of range

Example:

reader = MARCReader(fp).add_index("title", "245$a").build_index()
record = reader.get_record(0)
print(record['245']['a'])

`get_index(name)`

Direct access to map index dictionary.

Parameters:

name (str): Index name (must be mode="map")

Returns: dict[str, list[int]] - Value → record indices mapping

Raises: ValueError if index not found or not mode="map"

Example:

reader = MARCReader(fp).add_index("title", "245$a", mode="map").build_index()
title_index = reader.get_index("title")

# Find duplicates
for title, indices in title_index.items():
    if len(indices) > 1:
        print(f"'{title}': {len(indices)} records")
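
Because a map index is keyed on exact values, titles that differ only in case or in trailing ISBD punctuation (such as " /" or " :") end up under separate keys. The sketch below merges such keys in a dictionary shaped like the one `get_index` returns. `merge_normalized` is a hypothetical helper, and the normalization rule is an assumption that may be too aggressive for some catalogs.

```python
from collections import defaultdict

def merge_normalized(index):
    """Merge keys that are equal after stripping trailing punctuation and case."""
    merged = defaultdict(list)
    for value, indices in index.items():
        key = value.rstrip(" /:;,.").casefold()
        merged[key].extend(indices)
    # Sort so each merged entry lists record indices in ascending order.
    return {key: sorted(indices) for key, indices in merged.items()}

sample = {"Hamlet /": [0], "hamlet": [4], "Emma :": [1]}
merged = merge_normalized(sample)
```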

`get_all_values(field_spec)`

Extract all values of a field/subfield from every record.

Parameters:

field_spec (str): Field specification (e.g., "001", "245$a", "650$a")

Returns: list[list[str]] - List of lists, one per record

Outer list length always equals number of records
Each inner list contains all occurrences of the field in that record
Inner list is empty [] if record doesn’t have the field

Raises: RuntimeError if .build_index() not called

Behavior:

Scans through all records sequentially
Preserves record-level organization
For repeating fields (e.g., 650$a), inner lists may have multiple entries
Decodes bytes to UTF-8 strings

Example:

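# build_index() needs at least one index or hook, so register one first
reader = MARCReader(fp).add_index("title", "245$a").build_index()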

# Get all titles (one list per record)
all_titles = reader.get_all_values("245$a")
print(f"Total records: {len(all_titles)}")

# Count records with titles
records_with_titles = sum(1 for titles in all_titles if titles)
print(f"Records with titles: {records_with_titles}")

# Get all subject headings (repeating field)
all_subjects = reader.get_all_values("650$a")

# Find records with more than three subjects
for idx, subjects in enumerate(all_subjects):
    if len(subjects) > 3:
        print(f"Record {idx} has {len(subjects)} subjects")

# Flatten to get all unique subjects
from itertools import chain
unique_subjects = set(chain.from_iterable(all_subjects))
print(f"Unique subjects: {len(unique_subjects)}")
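
The list-of-lists shape makes coverage statistics straightforward to compute. The function below is a sketch that works on any value shaped like the return of `get_all_values`; `field_stats` is a hypothetical name.

```python
def field_stats(all_values):
    """Summarize how often a field occurs across records."""
    total = len(all_values)
    present = sum(1 for values in all_values if values)
    occurrences = sum(len(values) for values in all_values)
    return {
        "records": total,
        "with_field": present,
        "occurrences": occurrences,
        "coverage": present / total if total else 0.0,
    }

# Three records: two subjects, none, one subject.
stats = field_stats([["Music", "Jazz"], [], ["History"]])
```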

`__len__()`

Get total record count.

Returns: int

Raises: RuntimeError if .build_index() not called

Example:

reader = MARCReader(fp).add_index("title", "245$a").build_index()
print(f"{len(reader):,} records")

`__iter__()`

Iterate through all records.

Yields: pymarc.Record

Note: Can iterate without calling .build_index() (no hooks/search)

Example:

reader = MARCReader(fp)
for record in reader:
    print(record['245']['a'])
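
The loop above assumes every record has a 245 field with a subfield a. When that is not guaranteed, a defensive accessor avoids errors on incomplete records. The sketch below uses pymarc's `Record.get_fields()` and `Field.get_subfields()`, is intended for data fields only, and `first_subfield` is a hypothetical helper.

```python
def first_subfield(record, tag, code, default=None):
    """Return the first value of subfield `code` in field `tag`, else default."""
    for field in record.get_fields(tag):
        values = field.get_subfields(code)
        if values:
            return values[0]
    return default
```

Usage would look like `first_subfield(record, "245", "a", default="[no title]")` inside the iteration loop.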

`close()`

Free memory and close resources.

Note: Called automatically on garbage collection.
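
Garbage collection timing is not deterministic, so long-running programs may prefer to release resources explicitly. One way, sketched below, is the standard library's `contextlib.closing`, which calls `close()` when the block exits. `count_records` is a hypothetical helper, and it relies on the documented ability to iterate without calling `.build_index()`.

```python
from contextlib import closing

def count_records(reader):
    """Iterate a reader and close it deterministically when done."""
    with closing(reader) as r:
        return sum(1 for _ in r)
```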
