Ere Maijala edited this page Apr 15, 2014 · 62 revisions

Configuration

RecordManager configuration can be divided into two categories: the general RecordManager settings and the data source settings. The default distribution contains sample configuration files in the conf directory. To use them, copy datasources.ini.sample to datasources.ini and recordmanager.ini.sample to recordmanager.ini.

General Settings

General settings are in recordmanager.ini.

Site

This section contains general settings.

| Setting | Description |
| --- | --- |
| timezone | Local time zone used to convert date stamps to/from OAI-PMH providers. |
| abbreviations | Name of a file containing abbreviations. When trailing periods are removed, any listed abbreviations are left intact. |
| full_title_prefixes | Name of a file containing title prefixes. If a title starts with a listed title prefix, it will not be shortened in title_keys (for deduplication). Add frequently found titles, such as "visual approach chart", to the list. |
| articles | Name of a file containing articles that should be removed from the beginning of a title for sorting. |
| dedup_handler | Name of the class and .php file containing the methods for handling record deduplication. Default is DedupHandler. To modify its behavior, subclass it and specify the subclass here. |
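
As a rough illustration, a Site section in recordmanager.ini might look like the sketch below. The section name is assumed to match the heading above, and the time zone and file names are placeholders rather than shipped defaults.

```ini
; Hypothetical values; adjust to your installation
[Site]
timezone = "Europe/Helsinki"
; The three file names below are placeholders
abbreviations = abbreviations.lst
full_title_prefixes = full_title_prefixes.lst
articles = articles.lst
; Documented default; replace with the name of your own subclass if needed
dedup_handler = DedupHandler
```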

Harvesting

This section contains settings controlling OAI-PMH harvesting.

| Setting | Description |
| --- | --- |
| max_tries | Number of attempts to fetch data from the OAI-PMH provider. Default is 5. |
| retry_wait | Wait time between request attempts in seconds. Default is 30. |
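
A minimal sketch of this section, assuming the section name matches the heading above. Both values are the documented defaults, so the section only needs to be edited if different retry behavior is wanted.

```ini
[Harvesting]
; Try each request up to 5 times
max_tries = 5
; Wait 30 seconds between attempts
retry_wait = 30
```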

Mongo

This section specifies how to connect to the Mongo database.

| Setting | Description |
| --- | --- |
| url | Mongo connection string in the format mongodb:///tmp/mongodb-27017.sock (preferred) or mongodb://username:password@server. In a typical default installation with Mongo residing on the same server, username and password are not needed, and mongodb:///tmp/mongodb-27017.sock can be used. Using Unix sockets provides a significant performance advantage over TCP/IP. |
| database | Mongo database to be used. |
| counts | Whether to fetch counts from the Mongo database when processing records. Defaults to false because fetching counts can be slow in a large database, but setting this to true gives more feedback during operations. |
| compress_records | Whether to compress record metadata when it is stored in MongoDB. Compression/decompression increases CPU usage slightly, but this is offset by reduced disk space and I/O demand. Compression is enabled by default. Turn it off if you use TokuMX instead of MongoDB (TokuMX has built-in compression). |
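
The following sketch shows a local installation that connects over a Unix socket. The section name is assumed to match the heading above, and the database name is a placeholder.

```ini
[Mongo]
; Unix socket connection to a Mongo instance on the same server
url = "mongodb:///tmp/mongodb-27017.sock"
; Placeholder database name
database = recman
; Documented defaults
counts = false
compress_records = true
```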

Solr

This section contains settings used when running direct Solr updates from RecordManager. These settings are not needed if the updatesolr function is not used. Note that RecordManager uses the JSON update method, which requires a fairly recent Solr version and, in some cases, must be enabled separately. See http://wiki.apache.org/solr/UpdateJSON for more information.

| Setting | Description |
| --- | --- |
| update_url | The URL used for the JSON update in Solr. |
| max_commit_interval | Maximum number of record updates to send to Solr between commits. Note that Solr also has settings for automatic commit that may override this and cause more frequent commits. Committing changes means that the updated version of the search index is brought online, which requires some resources for warmup etc. Therefore it is recommended to keep the commit interval at a fairly high value. A commit is always done at the end of the Solr update process regardless of this setting. |
| username | User name if basic HTTP authentication is required to connect to the Solr index for update. |
| password | Password if basic HTTP authentication is required to connect to the Solr index for update. |
| background_update | Number of background tasks to be used for making Solr HTTP calls. Can improve indexing performance as batches of records can be created and sent to Solr in parallel. Disabled (0) by default. Requires the pcntl extension in PHP. |
| max_update_tries | Maximum number of tries to send an update to Solr. Default is 15. |
| update_retry_wait | Wait time between Solr update request attempts in seconds. Default is 60. |
| merge_records | If true, a merged record is created for duplicate records. This merged record is indexed alongside normal records. The merged record is marked with field merged_boolean=true and the normal records belonging to it with merged_child_boolean=true. This allows the merged child records to be excluded from search results, and the merged record to be replaced in the result list with the appropriate original record (requires that VuFind support this, see sys/Solr.php for our customization to do this). |
| merged_fields | A comma-separated list of multivalued fields to be added to the merged records. Default contains normal VuFind multivalued fields. |
| single_fields | A comma-separated list of single-valued fields to be added to the merged records. Default contains normal VuFind single-valued fields apart from fullrecord. For single-valued fields only the first occurrence is taken. |
| suffixed_merged_fields | A comma-separated list of merged fields to which the data source id is appended. Default is empty. |
| format_in_allfields | Whether the format (e.g. "Book") should be added to allfields. Default is false. |
| unicode_normalization_form | Unicode normalization form to use. Valid values: NFC, NFD, NFKC and NFKD. See e.g. http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization for more information. |
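
A sketch of a Solr section for direct updates, assuming the section name matches the heading above. The update URL, the commit interval and the normalization form are illustrative choices, not recommended or default values. The retry and background settings repeat the documented defaults.

```ini
[Solr]
; Placeholder URL; point this at the JSON update handler of your own Solr core
update_url = "http://localhost:8080/solr/biblio/update/json"
; Illustrative value; keep it fairly high to avoid frequent commits
max_commit_interval = 50000
; Documented defaults
background_update = 0
max_update_tries = 15
update_retry_wait = 60
; Create merged records for duplicates (requires VuFind-side support)
merge_records = true
format_in_allfields = false
; One of NFC, NFD, NFKC or NFKD
unicode_normalization_form = NFKC
```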

OAI-PMH

These settings are specific to the OAI-PMH provider. It is not a mandatory part of RecordManager, but with it RecordManager can be used as an OAI-PMH aggregator. See Setting up the OAI-PMH Provider for more information on setting up the OAI-PMH provider.

| Setting | Description |
| --- | --- |
| repository_name | Name of the repository displayed in the Identify response. |
| base_url | Base URL of the provider (e.g. http://x.y.z/oai-pmh with the default configuration). |
| admin_email | Email address displayed in the Identify response. |
| result_limit | Limit of results per single response (additional results are requested with a resumptionToken). |
| format_definitions | File that contains the descriptions of the available metadata formats. |
| set_definitions | File that contains the set definitions (for selective harvesting). |
| transformation_to_[format] | XSL transformation to be used for outputting records in the given [format] in the OAI-PMH provider. |
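
For orientation, a provider section could be sketched as follows. The section name is assumed to match the heading above. The repository name, email address, result limit and definition file names are all hypothetical placeholders.

```ini
[OAI-PMH]
repository_name = "Example Repository"
; Example base URL taken from the table above
base_url = "http://x.y.z/oai-pmh"
admin_email = "admin@example.org"
; Illustrative limit; further results are fetched with a resumptionToken
result_limit = 1000
; Hypothetical file names
format_definitions = oai-pmh-formats.ini
set_definitions = oai-pmh-sets.ini
```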

Record Classes

These settings provide mappings between formats and the record classes used to process them. By default the class used is FormatRecord where Format is the record format with first letter capitalized. The section contains a list of key=value pairs, where key is the format and value is the class name (e.g. marc=MyOwnMarcRecord). An example of creating a custom record class that can override or add functionality to the original one can be found in classes/NdlEadRecord.php.
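
A minimal sketch using the example from the paragraph above, assuming the section name matches the heading. Without this entry, the naming rule described above would map the marc format to a class named MarcRecord.

```ini
[Record Classes]
; Use a custom class for MARC records instead of the default
marc = MyOwnMarcRecord
```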

Geocoding

These settings control how geocoding is done. See Geocoding for more information on how geocoding works.

| Setting | Description |
| --- | --- |
| geocoder | The geocoder to use. Only NominatimGeocoder is provided out of the box. |
| delay | Delay in milliseconds between requests when using NominatimGeocoder. Set to at least 1000 when using OpenStreetMap's servers. |
| url | Address of the Nominatim server. |
| email | Your email address. Mandatory when using OpenStreetMap's servers. |
| preferred_area | Rectangle defining the preferred area for matches (can be copied from http://nominatim.openstreetmap.org/). |
| simplification_tolerance | Tolerance initially used for simplification if a polygon has more than simplification_max_length elements. 0 is a no-op, 0.001 is a good starting point, and higher fractions result in polygons with fewer elements. See e.g. http://gis.stackexchange.com/questions/11910/meaning-of-simplifys-tolerance-parameter for more information. |
| simplification_max_length | Maximum number of elements in a polygon. If exceeded, the polygon is simplified using simplification_tolerance. If the limit is still exceeded, simplification_tolerance is doubled until the number of elements is low enough or 100 tries are exceeded. |
| solr_field | Solr field where polygon data is stored. Must use the SpatialRecursivePrefixTreeFieldType field type. |
| important_threshold | Threshold governing whether a location is considered important. If such a location is found, locations with lower importance are ignored. Default is 0.9. |
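
A possible Geocoding section is sketched below, assuming the section name matches the heading above. The server address, email address, maximum polygon length and Solr field name are hypothetical. preferred_area is left out because its exact format should be copied from the Nominatim site.

```ini
[Geocoding]
geocoder = NominatimGeocoder
; At least 1000 ms when using OpenStreetMap's servers
delay = 1000
; Placeholder address of a Nominatim server
url = "http://nominatim.example.org/"
email = "admin@example.org"
; Suggested starting point from the table above
simplification_tolerance = 0.001
; Hypothetical limit on polygon elements
simplification_max_length = 100
; Hypothetical field name; must be a SpatialRecursivePrefixTreeFieldType field
solr_field = location_geo
; Documented default
important_threshold = 0.9
```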

Log

| Setting | Description |
| --- | --- |
| log_file | File where RecordManager writes its log. |
| log_level | The level of information written to the log file. It is recommended to keep this at least at level 2. Level 3 is also safe for production use, but level 4 might cause the log file size to increase rapidly. Levels: `4` Debug, the most verbose level; `3` Info, some extra information in addition to errors and warnings; `2` Warning, only error and warning messages; `1` Error, only errors are logged; `0` Fatal, only fatal errors that prevent continuing the current function are logged. |
| error_email | An optional email address, or a comma-separated list of email addresses, where a message is sent if any fatal errors are encountered. |
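
A short sketch of the logging settings, assuming the section name matches the heading above. The log path and the email address are placeholders.

```ini
[Log]
; Placeholder path; the user running RecordManager must be able to write here
log_file = /var/log/recordmanager/recordmanager.log
; 3 = Info, described above as safe for production use
log_level = 3
; Optional; a comma-separated list is also accepted
error_email = "admin@example.org"
```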

Data Source Settings

Data source settings are further divided into two categories. The first category of settings is used for all data sources, and the second one is specific to OAI-PMH harvesting. All data source settings always belong to a section that identifies the data source. The section name is used as the "source" parameter in the command line programs.

Common Settings

| Setting | Description |
| --- | --- |
| idPrefix | By default the section name in datasources.ini is used as an identifier prefix for the institution. idPrefix can be used to override this, e.g. in case multiple OAI-PMH sets need to be harvested from the same data source (which requires multiple uniquely named sections in datasources.ini). |
| institution | The institution code mapped to the data source. Used e.g. to fill an organization field in the Solr index. |
| recordXPath | An XPath expression used when loading records from a file to identify a single record (e.g. //record). |
| oaiIDXPath | An XPath expression used when loading records from a file to find the record's OAI ID, if it's present in the file (typically when importing a file containing an OAI-PMH ListRecords response). Relative to recordXPath (e.g. ../../header/identifier). |
| format | Record format in RecordManager (e.g. dc, ead, lido or marc). |
| preTransformation | Optional transformation to be applied to files to be imported (just the name of the xsl file in the transformations directory, e.g. to strip namespaces). |
| recordSplitter | Optional XSL transformation or PHP class used to split records in import or OAI-PMH harvest. See classes/EadSplitter.php for an example implementation of a PHP-based splitter or transformations/EadSplit.xsl for an example of an XSL transformation. Specify only the .xsl or .php file name without path. |
| normalization | Optional XSL transformation to be applied to each record. Points to a properties file in the transformations directory (enter only the file name, no path). The properties file further defines the actual XSL transformation and any PHP-based helper functions or classes used in the transformation. |
| solrTransformation | XSL transformation to be used when converting a record for import to Solr. Must be specified if the record driver does not provide a usable toSolrArray method. Points to a properties file in the transformations directory. |
| dedup | Whether this data source needs deduplication (true/false, defaults to false). |
| componentParts | How component parts, if any, are handled in the data source during load to Solr: `as_is` no special handling (default); `merge_all` merge all component parts to their host records; `merge_non_articles` merge to host record unless article (including e-journal articles); `merge_non_earticles` merge to host record unless e-journal article. |
| indexMergedParts | Whether to index merged component parts also separately with the hidden_component_boolean field set to true. Defaults to true. |
| {field}_mapping | A mapping file in the mappings directory to be used to map values of {field} when updating the Solr index. Useful for e.g. mapping multiple location codes to one. The mapping file is a simple .ini-style file where the original value is on the left side of an equals sign and the resulting value on the right side. Mappings are case-sensitive, and if multiple values in a multivalued field map to the same result, only one is kept. There is a simple example mapping file in the mappings directory. A few special mapping strings can be used to provide default values: `##default = xyz` uses xyz if none of the other strings match; `##empty = xyz` is the default for a single-valued field where no original value exists; `##emptyarray = xyz` is the default for a multivalued field where no original value exists. |
| institutionInBuilding | How institution is converted to the building field: `default` use the institution setting from datasources.ini; `"none"` no mapping (note that due to PHP ini file handling, the quotes are required); `driver` use whatever the record driver provided in the institution field; `source` use the source id; `institution/source` use institution and source id separated with a slash. |
| extraFields[] | An array of static fields to add to each record when sending them to Solr. Format is fieldname:value, e.g. `extraFields[] = building:mainLibrary` and `extraFields[] = sector_str_mv:library`. |
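
Putting several of the common settings together, a data source section in datasources.ini might be sketched like this. The section name, institution code and mapping file name are hypothetical. The building field is used for the mapping only because it is the field mentioned above.

```ini
; The section name doubles as the "source" parameter of the command line programs
[sample_source]
institution = SampleLibrary
recordXPath = "//record"
format = marc
dedup = true
componentParts = merge_non_articles
indexMergedParts = true
; Hypothetical mapping file in the mappings directory
building_mapping = sample_building.map
; The quotes are required for this value
institutionInBuilding = "none"
extraFields[] = sector_str_mv:library
```

The mapping file referenced above could then contain entries such as these, again with made-up values:

```ini
; mappings/sample_building.map (hypothetical file)
LOC1 = Main Library
LOC2 = Main Library
; Used when none of the other strings match
##default = Other
```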

OAI-PMH Specific Settings

| Setting | Description |
| --- | --- |
| url | OAI-PMH provider base URL. |
| set | Identifier of a set to harvest (normally found in the setSpec tag of an OAI-PMH ListSets response). Omit this setting to harvest all records. |
| metadataPrefix | Format to harvest. The default is oai_dc. |
| idSearch[] and idReplace[] | Can be used to manipulate record IDs with regular expressions. |
| dateGranularity | The granularity used by the server for representing dates. This may be "YYYY-MM-DDThh:mm:ssZ", "YYYY-MM-DD" or "auto" (to query the server for details). The default is "auto". |
| verbose | Can be set to true in order to log more detailed output while harvesting. This may be useful for troubleshooting purposes, but it defaults to false. |
| debugLog | Can be set to a file where all the OAI-PMH requests and responses are written. There is also a splitlog.php utility that can be used to split the responses from the debug log so that they can be reloaded with the import program. This is especially useful when testing record splitters. |
| oaipmhTransformation | An XSL transformation that is applied to OAI-PMH responses before they are processed (just the name of the xsl file in the transformations directory, e.g. to strip namespaces). |
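
As an illustration, an OAI-PMH data source could be sketched as below. The section name, URL and set identifier are hypothetical. The idSearch[] pattern assumes PHP-style regular expressions with delimiters, which this page does not confirm, so check the pattern syntax against your RecordManager version.

```ini
[sample_oai]
institution = SampleLibrary
format = dc
; Placeholder provider base URL
url = "http://oai.example.org/request"
; Omit "set" to harvest all records
set = sample_set
metadataPrefix = oai_dc
; Assumed syntax: strip a hypothetical "oai:sample:" prefix from record IDs
idSearch[] = "/^oai:sample:/"
idReplace[] = ""
; Documented defaults
dateGranularity = auto
verbose = false
```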

MetaLib IRD Harvest Specific Settings

| Setting | Description |
| --- | --- |
| type | Only valid value is metalib. This tells RecordManager to harvest from X-Server instead of OAI-PMH. |
| url | MetaLib X-Server address. |
| xUser | User name for X-Server login. |
| xPassword | Password for X-Server login. |
| query | X-Server source_locate query used to identify records to be harvested (e.g. "WIN=INSTITUTE"). |
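
A sketch of a MetaLib data source follows. The section name, institution code, server address and credentials are placeholders. The query is the example given in the table above.

```ini
[sample_metalib]
; Harvest from X-Server instead of OAI-PMH
type = metalib
institution = SampleLibrary
; Placeholder X-Server address
url = "http://metalib.example.org/X"
xUser = sample_user
xPassword = sample_password
query = "WIN=INSTITUTE"
```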

See MetaLib documentation at EL Commons for more information on the X-Server call used and the syntax used in query (locate_command).

SFX KB Harvest Specific Settings

SFX KB harvest is actually "fetch export files and import them". SFX export files are fetched according to their time stamps and processed in RecordManager.

| Setting | Description |
| --- | --- |
| type | Only valid value is sfx. This tells RecordManager to harvest SFX exports via HTTP. |
| url | HTTP address of the export directory on the SFX server. |
| filePrefix | File name prefix used to distinguish the files to be processed from any other export files. |

The SFX harvest requires that an SFX export be scheduled to run on the SFX server and the results exposed via the proxy Apache on the SFX server. See [Harvesting SFX Objects](Harvesting SFX Objects) for information on how to set up the SFX side.
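
An SFX data source might be sketched as follows. The section name, institution code, export directory URL and file prefix are all hypothetical and depend on how the export is scheduled on the SFX server.

```ini
[sample_sfx]
; Harvest SFX export files via HTTP
type = sfx
institution = SampleLibrary
; Placeholder address of the export directory on the SFX server
url = "http://sfx.example.org/export/"
; Only files whose names start with this prefix are processed
filePrefix = sample_export
```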
