Configuration

RecordManager configuration is divided into two categories: general RecordManager settings and data source settings. The default distribution contains sample configuration files in the conf directory. Copy recordmanager.ini.sample to recordmanager.ini and datasources.ini.sample to datasources.ini, then edit the copies.

General Settings

General settings are in recordmanager.ini.

Site

This section contains general settings.

Setting	Description
timezone	Local time zone used to convert date stamps to/from OAI-PMH providers.
abbreviations	Name of a file containing abbreviations. When removing trailing periods, any abbreviations are left intact.
full_title_prefixes	Name of a file containing title prefixes. If a title starts with a listed prefix, it is not shortened in title_keys (used for deduplication). Add frequently occurring titles, such as "visual approach chart", to the list.
articles	Name of a file containing articles that should be removed from the beginning of a title for sorting.
dedup_handler	Name of the class and .php file containing the methods for handling record deduplication. Default is DedupHandler, which can be subclassed for modifications and the subclass specified here.
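
A minimal [Site] section might look like the sketch below. It assumes that the ini section is named after the heading above and that timezone takes a PHP time zone identifier. The file names are placeholders, not files guaranteed to ship with RecordManager.

```ini
[Site]
; PHP time zone identifier (assumed format) used for OAI-PMH date stamps
timezone = "Europe/Helsinki"
; Placeholder file names for the word lists
abbreviations = abbreviations.lst
full_title_prefixes = full_title_prefixes.lst
articles = articles.lst
; Default handler; put the name of a subclass here to customize deduplication
dedup_handler = DedupHandler
```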

Harvesting

This section contains settings controlling OAI-PMH harvesting.

Setting	Description
max_tries	Maximum number of attempts for a request to the OAI-PMH provider. If a harvesting request fails for any reason, RecordManager retries it until it succeeds or max_tries attempts have been made. Default is 5.
retry_wait	Delay between request attempts in seconds. Default is 30.
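
The sketch below spells out the defaults, assuming the section is named [Harvesting]. If retry_wait applies only between attempts, a request that keeps failing is tried 5 times with 4 pauses of 30 seconds, so RecordManager waits about 120 seconds in total (plus the time spent on the requests themselves) before giving up.

```ini
[Harvesting]
; Give up on a request after five attempts
max_tries = 5
; Wait 30 seconds between attempts
retry_wait = 30
```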

Mongo

This section specifies how to connect to the Mongo database.

Setting	Description
url	Mongo connection string in the format mongodb:///tmp/mongodb-27017.sock (preferred) or mongodb://username:password@server. In a typical default installation with Mongo on the same server, no username or password is needed and mongodb:///tmp/mongodb-27017.sock can be used. Unix sockets provide a significant performance advantage over TCP/IP.
database	Name of the Mongo database to use
counts	Whether to fetch counts from the Mongo database when processing records. Defaults to false because fetching counts can be slow in a large database, but setting this to true gives more feedback during operations.
compress_records	Whether to compress record metadata when it is stored in MongoDB. Compression/decompression increases CPU usage slightly but is offset by reduced disk space and I/O demand. Compression is enabled by default. Turn off if you use TokuMX instead of MongoDB (TokuMX has built-in compression).
connect_timeout	Connection timeout in milliseconds. Default is 300 000 ms.
cursor_timeout	Cursor timeout in milliseconds. Might be needed if a cursor doesn't live long enough for the whole operation to complete. Default is 300 000 ms.
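
A sketch of a [Mongo] section for a local installation that connects through a Unix socket. The database name is a placeholder, and the timeouts simply restate the 300 000 ms (5 minute) defaults.

```ini
[Mongo]
; Local Unix socket: no username or password needed
url = "mongodb:///tmp/mongodb-27017.sock"
; Placeholder database name
database = recman
; Faster on large databases; set to true for more progress feedback
counts = false
; Turn off when running TokuMX, which compresses data itself
compress_records = true
; 300000 ms = 5 minutes
connect_timeout = 300000
cursor_timeout = 300000
```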

Solr

This section contains settings used when RecordManager updates Solr directly. These settings are not needed if the updatesolr function is not used. Note that RecordManager uses the JSON update method, which requires a fairly recent Solr version and, in some cases, that the method be enabled separately. See http://wiki.apache.org/solr/UpdateJSON for more information.

Setting	Description
update_url	URL of the Solr JSON update handler
max_commit_interval	Maximum number of record updates to send to Solr between commits. Note that Solr also has settings for automatic commit that may override this and cause more frequent commits. Committing changes means that the updated version of the search index is brought online, which requires some resources for warmup etc. Therefore it is recommended to keep the commit interval at a fairly high value. A commit is always done at the end of the Solr update process regardless of this setting, if there were changes and the --nocommit parameter was not used.
username	User name if basic http authentication is required to connect to the Solr index for update
password	Password if basic http authentication is required to connect to the Solr index for update
background_update	Number of background tasks to be used for making Solr http calls. Can improve indexing performance as batches of records can be created and sent to Solr in parallel. Disabled (0) by default. Requires the pcntl extension in PHP.
threaded_merged_record_update	Whether merged record update is run in parallel with individual record update. Default is false. Enabling this setting may speed up indexing as server resources are utilized by two processes instead of one (especially when Solr is running on a separate server). Note that this effectively doubles background_update value as long as the two processes run in parallel. Requires the pcntl extension in PHP.
max_update_tries	Maximum number of attempts to send an update to Solr. Default is 15. Useful for keeping a RecordManager updatesolr task running while Solr is restarted.
update_retry_wait	Delay between Solr update request attempts in seconds. Default is 60.
merge_records	If true, a merged record is created for duplicate records and indexed alongside the normal records. The merged record is marked with the field merged_boolean=true and the records belonging to it with merged_child_boolean=true. This allows the merged child records to be excluded from search results and the merged record in a result list to be replaced with the appropriate original record. This requires support in VuFind, which is included since VuFind 2.3; for VuFind 1.x, see sys/Solr.php for our customization.
merged_fields	A comma-separated list of multivalued fields to be added to the merged records. Default contains the normal VuFind multivalued fields. There is one special case, "author=author2": since author is a single-valued field, if two records to be merged have different values in the author field, the other value is copied to author2.
single_fields	A comma-separated list of single-valued fields to be added to the merged records. Default contains normal VuFind single-valued fields apart from fullrecord. For single-valued fields only the first occurrence is taken.
suffixed_merged_fields	A comma-separated list of merged fields to which the data source id is appended. Default is empty.
ignore_in_comparison	A comma-separated list of fields that are ignored in comparesolr function (typically fields that are created with Solr's copyField command or where stored="false").
format_in_allfields	Whether the format (e.g. "Book") should be added to allfields. Default is false.
unicode_normalization_form	Unicode normalization form to use. Valid values: NFC, NFD, NFKC and NFKD. See e.g. the Wikipedia entry for more information.
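
The following [Solr] sketch assumes a Solr instance on localhost; the update URL, core name and commit interval are placeholders to adjust for your installation. With max_update_tries = 15 and update_retry_wait = 60, and assuming the wait applies only between tries, an update is retried for roughly 14 minutes (14 pauses of 60 seconds), which is what allows an updatesolr run to survive a Solr restart.

```ini
[Solr]
; Placeholder URL: adjust host, port and core name
update_url = "http://localhost:8983/solr/biblio/update/json"
; Keep commits infrequent; a commit is also done at the end of the update
max_commit_interval = 50000
; Only needed if Solr requires HTTP basic authentication
;username = "solr_user"
;password = "solr_password"
; Send batches with two background tasks (requires the pcntl extension)
background_update = 2
; Keep retrying so that a Solr restart does not abort the update
max_update_tries = 15
update_retry_wait = 60
; Index a merged record for each group of duplicates
merge_records = true
unicode_normalization_form = NFKC
```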

OAI-PMH

These settings are specific to the OAI-PMH provider. The provider is not a mandatory part of RecordManager, but it allows RecordManager to be used as an OAI-PMH aggregator. See Setting up the OAI-PMH Provider for more information on setting up the provider.

Setting	Description
repository_name	Name of the repository displayed in the Identify response
base_url	Base url of the provider (e.g. http://x.y.z/oai-pmh with the default configuration)
admin_email	Email address displayed in the Identify response
result_limit	Limit of results per single response (additional results are requested with a resumptionToken)
format_definitions	File that contains the descriptions of the available metadata formats
set_definitions	File that contains the set definitions (for selective harvesting)
transformation_to_[format]	XSL transformation to be used for outputting records in the given [format] in OAI-PMH provider
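
An illustrative [OAI-PMH] section is shown below. The repository name, e-mail address, result limit and file names are placeholders, and transformation_to_oai_dc assumes that [format] is replaced with the name of the metadata format (here oai_dc).

```ini
[OAI-PMH]
repository_name = "Example Repository"
base_url = "http://x.y.z/oai-pmh"
admin_email = "admin@example.com"
; Records per response; the rest are requested with a resumptionToken
result_limit = 1000
; Placeholder file names
format_definitions = "oai-pmh-formats.ini"
set_definitions = "oai-pmh-sets.ini"
; Hypothetical XSL used to output records as oai_dc
transformation_to_oai_dc = "marc_to_oai_dc.xsl"
```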

Record Classes

These settings map formats to the record classes used to process them. By default the class used is FormatRecord, where Format is the record format with its first letter capitalized (e.g. MarcRecord for marc). The section contains a list of key=value pairs, where the key is the format and the value is the class name (e.g. marc=MyOwnMarcRecord). classes/NdlEadRecord.php contains an example of a custom record class that overrides or adds functionality to the original one.
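
For example, a [Record Classes] section that swaps in a custom MARC class and the NDL EAD class could look like this. MyOwnMarcRecord is the hypothetical class name used in the text above, and the section name is assumed to match the heading.

```ini
[Record Classes]
; Use a custom class instead of the default MarcRecord
marc = MyOwnMarcRecord
; Use the example class from classes/NdlEadRecord.php
ead = NdlEadRecord
```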

Log

Setting	Description
log_file	File where RecordManager writes its log
log_level	The level of information written to the log file. It is recommended to keep this at level 2 or higher; level 3 is also safe for production use, but level 4 may cause the log file to grow rapidly. See the table below for the log levels.
error_email	An optional email address, or a comma-separated list of email addresses, where a message is sent if any fatal errors are encountered

Log Levels

Level	Description
4	Debug, the most verbose level
3	Info, some extra information in addition to errors and warnings
2	Warning, only errors and warning messages
1	Error, only errors are logged
0	Fatal, only fatal errors that prevent continuing the current function are logged
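
A sketch of a [Log] section using the recommended level 2. The log path and e-mail addresses are placeholders.

```ini
[Log]
; Placeholder path
log_file = "/var/log/recordmanager/recman.log"
; 2 = errors and warnings; 3 (info) is also safe in production
log_level = 2
; Several recipients are separated with commas
error_email = "admin@example.com,ops@example.com"
```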

Data Source Settings

Data source settings are further divided into two categories. The first category is used for all data sources, and the second is specific to the harvesting method (OAI-PMH, MetaLib or SFX). All data source settings belong to a section that identifies the data source. The section name is used as the "source" parameter in the command-line programs.

Common Settings

Setting	Description
idPrefix	By default the section name in datasources.ini is used as an identifier prefix for the institution. idPrefix can be used to override this e.g. in case multiple OAI-PMH sets need to be harvested from the same data source (which requires multiple uniquely named sections in datasources.ini).
institution	The institution code mapped to the data source. Used e.g. to fill an organization field in the Solr index.
recordXPath	An XPath expression used when loading records from a file to identify a single record (e.g. //record)
oaiIDXPath	An XPath expression used when loading records from a file to find a record's OAI ID if it is present in the file (typically when importing a file containing an OAI-PMH ListRecords response). Relative to recordXPath (e.g. ../../header/identifier).
format	Record format in RecordManager (e.g. dc, ead, lido or marc)
preTransformation	Optional XSL transformation to be applied to files before import, e.g. to strip namespaces. Specify just the name of the XSL file in the transformations directory.
recordSplitter	Optional XSL transformation or PHP class used to split records during import or OAI-PMH harvest. Specify only the .xsl or .php file name without a path (XSL files are looked up in the transformations directory). See classes/EadSplitter.php for an example of a PHP-based splitter or transformations/EadSplit.xsl for an example XSL transformation.
normalization	Optional XSL Transformation to be applied to each record. Points to a properties file in transformations directory (enter only the file name, no path). The properties file further defines the actual XSL transformation and any PHP-based helper functions or classes used in the transformation.
solrTransformation	XSL Transformation to be used when converting a record for import to Solr. Must be specified if the record driver does not provide a usable toSolrArray method. Points to a properties file in transformations directory.
dedup	Whether this data source needs deduplication (true/false, defaults to false)
keepMissingHierarchyMembers	Whether members of a hierarchical record that are not present in an imported or harvested record are kept instead of deleted (true/false, defaults to false). Normally an imported hierarchical record is assumed to contain all of its child records, and those no longer present are deleted. If a record hierarchy is imported in multiple parts, this setting can be enabled to keep the previously imported parts intact. The downside is that deletions must then be handled in some other way (e.g. an OAI-PMH harvest with the reharvest option).
componentParts	How component parts, if any, are handled in the data source during load to Solr. See the table below for possible values.
indexMergedParts	Whether to index merged component parts also separately with hidden_component_boolean field set to true. Defaults to true.
{field}_mapping[,regexp]	A mapping file in mappings directory to be used to map values of {field} when updating Solr index. Useful for e.g. mapping multiple location codes to one. See below for an explanation of mapping files.
institutionInBuilding	How institution is converted to building field. See below for possible values.
extraFields[]	An array of static fields to add to each record when sending them to Solr. Format is fieldname:value, e.g. `extraFields[] = "building:mainLibrary"` or `extraFields[] = "sector_str_mv:library"`
driverParams[]	An array of driver-specific parameters that control driver behavior. Format is fieldname:value, e.g. `driverParams[] = "holdingsInBuilding:true"`. See below for available driver parameters.
enrichments[]	An array of enrichment classes to use for the records, e.g. `enrichments[] = "MarcOnkiLightEnrichment"`
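
The sketch below combines several common settings into a single data source section. The section name mylib, the institution code and the mapping file name are hypothetical; the remaining values come from the descriptions and examples above.

```ini
; The section name is the "source" parameter of the command-line programs
[mylib]
institution = MYLIB
format = marc
; Identifies a single record when loading from a file
recordXPath = "//record"
dedup = true
; Merge component parts unless they are articles
componentParts = merge_non_articles
; Map building values with mappings/building.map (hypothetical file)
building_mapping = building.map
; Static field added to every record
extraFields[] = "sector_str_mv:library"
driverParams[] = "holdingsInBuilding:true"
enrichments[] = "MarcOnkiLightEnrichment"
```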

Possible Settings for componentParts

Setting	Description
as_is	No special handling (default)
merge_all	Merge all component parts to their host records
merge_non_articles	Merge to host record unless article (including e-journal articles)
merge_non_earticles	Merge to host record unless e-journal article

Possible Settings for institutionInBuilding

Setting	Description
default	Use institution setting from datasources.ini
"none"	No mapping. Note that due to PHP ini file handling, the quotes are required.
driver	Use whatever the record driver provided in institution field
source	Use source id
institution/source	Use institution and source id separated with a slash
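
For example, to disable the mapping for a data source (mylib is a hypothetical section name), quote the value. PHP's ini parser converts the bare word none to an empty string, so without the quotes the setting would not be read as "none".

```ini
[mylib]
; Disable the institution-to-building mapping (quotes required)
institutionInBuilding = "none"
; Alternatives:
;institutionInBuilding = driver
;institutionInBuilding = institution/source
```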

Possible Settings for driverParams

Setting	Description
splitTitles=true	Lido: Split titles at the end of the first sentence. Some heuristics are applied when searching for the end of the sentence. If a title is split, the full title is recorded in description field.
holdingsInBuilding=true	Marc: Include holdings locations (852b) in the building field.

There are further parameters specific to NDL record drivers, and they are documented below for completeness, but the NDL drivers generally include functionality not useful for others or not compatible with the standard VuFind index.

Setting	Description
institutionInBuilding=true	NdlLido: Add institution information into building field.
collectionInBuilding=true	NdlLido: Add collection information into building field.
003InLinkingID=true	NdlMarc: Whether links from component parts to the host records include 003 field.
projectIdIn960=true	NdlMarc: 960 field contains a project id
categoriesIn650=true	NdlMarc: Whether 650 field contains categories (typically MetaLib records)

Mapping Files

Normal mapping files are simple .ini-style files where the original value is on the left side of an equals sign and the resulting value on the right side. Mappings are case-sensitive, and if multiple values in a multivalued field map to the same result, only one is kept. There is a simple example mapping file in the mappings directory.

There are a couple of special mapping strings that can be used to provide default values:

; A default value of xyz is used if none of the other strings match
##default = xyz
; A default for a single-valued field where no original value exists
##empty = xyz
; A default for a multivalued field where no original value exists
##emptyarray = xyz
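
A small example mapping file, using the hypothetical building.map referenced by a building_mapping setting. Because mappings are case-sensitive, main1 would not match MAIN1 and would fall through to the ##default value; if a record's multivalued building field contains both MAIN1 and MAIN2, only one MAIN value is kept.

```ini
; building.map (hypothetical): collapse several location codes into one value
MAIN1 = MAIN
MAIN2 = MAIN
BR = BRANCH
; Used when a value matches none of the lines above
##default = OTHER
```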

It is also possible to use mapping files with regular expressions by adding ,regexp after the mapping file name. In regexp files, the left-hand side is used as a regexp pattern and the right-hand side as the replacement for strings that match the pattern. The expressions are tested one by one, and the process ends when a match is found. Slashes must not be escaped in the pattern. In the replacement, $1 to $9 can be used to refer to capture groups in the pattern. An example:

; Remove a number from the beginning
\d+(.*) = "$1"

; Convert a string to a hierarchical value using its first character as the top level (e.g. h12 becomes h/h12)
(.)(.*) = "$1/$1$2"
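
To use a regexp mapping file, append ,regexp to the file name in the data source settings. In this sketch, mylib and building_regexp.map are hypothetical names. If the file contained only the hierarchical rule above, a building value of h12 would be indexed as h/h12.

```ini
[mylib]
; ",regexp" after the file name makes the left-hand sides regular expressions
building_mapping = "building_regexp.map,regexp"
```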

OAI-PMH Harvesting Specific Settings

Setting	Description
url	OAI-PMH provider base URL
set	Identifier of a set to harvest (normally found in the setSpec tag of an OAI-PMH ListSets response). Omit this setting to harvest all records.
metadataPrefix	Format to harvest. The default is oai_dc.
idSearch[], idReplace[]	Can be used to manipulate record IDs with regular expressions (search patterns in idSearch[], replacements in idReplace[]).
dateGranularity	The granularity the server uses for representing dates: "YYYY-MM-DDThh:mm:ssZ", "YYYY-MM-DD" or "auto" (query the server for details). The default is "auto".
verbose	Can be set to true to log more detailed output while harvesting, which may be useful for troubleshooting. Defaults to false.
debugLog	Can be set to a file where all the OAI-PMH requests and responses are written. There is also a splitlog.php utility that can be used to split the responses from the debug log so that they can be reloaded with the import program. This is especially useful when testing record splitters.
oaipmhTransformation	An XSL transformation that is applied to OAI-PMH responses before they are processed (just the name of the xsl file in the transformations directory, e.g. to strip namespaces).
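
A sketch of an OAI-PMH data source. The section name, provider URL, set and ID prefix are placeholders. The idSearch[] value assumes the patterns are PHP-style regular expressions with delimiters, which is an assumption rather than something stated above.

```ini
[mylib_oai]
; Placeholder provider address and set
url = "http://example.com/oai"
set = "books"
metadataPrefix = marc21
format = marc
; Hypothetical ID rewrite (delimiter syntax assumed): strip an OAI prefix
idSearch[] = "/^oai:example.com:/"
idReplace[] = ""
dateGranularity = auto
; Uncomment while troubleshooting
;verbose = true
;debugLog = "/tmp/mylib_oai_debug.log"
```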

MetaLib IRD Harvest Specific Settings

Note that MetaLib IRD Harvest uses MetaLib X-Server. While easy to set up, it doesn't include categories in the records.

Setting	Description
type	Only valid value is metalib. This tells RecordManager to harvest from MetaLib X-Server instead of OAI-PMH.
url	MetaLib X-Server address
xUser	User name for X-Server login
xPassword	Password for X-Server login
query	X-Server source_locate query used to identify records to be harvested (e.g. "WIN=INSTITUTE")

See MetaLib documentation at EL Commons for more information on the X-Server call used and the syntax used in query (locate_command).
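
A sketch of a MetaLib IRD data source. Only the harvest-specific keys are shown, so the common settings still need to be added; the section name, X-Server address and credentials are placeholders, and the query is the example given above.

```ini
[metalib_ird]
type = metalib
; Placeholder X-Server address and credentials
url = "http://metalib.example.com/X"
xUser = "xuser"
xPassword = "xpassword"
; source_locate query selecting the records to harvest
query = "WIN=INSTITUTE"
```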

MetaLib CKB Harvest Specific Settings

MetaLib CKB harvest is actually "fetch export files and import them". MetaLib export files are fetched according to their time stamps and processed in RecordManager.

Setting	Description
type	Only valid value is metalib_export. This tells RecordManager to harvest MetaLib export files via HTTP.
url	HTTP address of the export directory on the MetaLib server. Remember to include the trailing slash.
filePrefix	File name prefix used to distinguish the files to be processed from any other export files
fileSuffix	File name suffix used to distinguish the files to be processed from any other export files

The MetaLib export harvest requires that a MetaLib export be scheduled to run on the MetaLib server and the results exposed via Apache. See Harvesting MetaLib CKB Export for information on how to set up the MetaLib side.
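
A sketch of a MetaLib CKB export data source, with placeholder section name, server address and file name affixes. Common settings are omitted.

```ini
[metalib_ckb]
type = metalib_export
; Export directory on the MetaLib server; note the trailing slash
url = "http://metalib.example.com/export/"
; Only export files with this prefix and suffix are processed
filePrefix = "ckb_"
fileSuffix = ".xml"
```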

SFX KB Harvest Specific Settings

SFX KB harvest is actually "fetch export files and import them". SFX export files are fetched according to their time stamps and processed in RecordManager.

Setting	Description
type	Only valid value is sfx. This tells RecordManager to harvest SFX exports via HTTP.
url	HTTP address of the export directory on the SFX server
filePrefix	File name prefix used to distinguish the files to be processed from any other export files

The SFX harvest requires that an SFX export be scheduled to run on the SFX server and the results exposed via the proxy Apache on the SFX server. See Harvesting SFX Objects for information on how to set up the SFX side.
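
A sketch of an SFX KB data source. The section name, export directory address and file prefix are placeholders, and common settings are omitted.

```ini
[sfx]
type = sfx
; Export directory exposed by the proxy Apache on the SFX server
url = "http://sfx.example.com/export/"
; Only export files starting with this prefix are processed
filePrefix = "sfx_export"
```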
