Configuration
RecordManager configuration can be divided into two categories: general RecordManager settings and data source settings. The default distribution contains sample configuration files in the conf directory. Copy datasources.ini.sample to datasources.ini and recordmanager.ini.sample to recordmanager.ini before editing them.
General settings are in recordmanager.ini.
This section contains general settings.
| Setting | Description |
|---|---|
| timezone | Local time zone used to convert date stamps to/from OAI-PMH providers. |
| abbreviations | Name of a file containing abbreviations. When removing trailing periods, any abbreviations are left intact. |
| full_title_prefixes | Name of a file containing title prefixes. If a title starts with a listed title prefix, it will not be shortened in title_keys (for deduplication). Add frequently found titles, such as "visual approach chart", to the list. |
| articles | Name of a file containing articles that should be removed from the beginning of a title for sorting. |
| dedup_handler | Name of the class and .php file containing the methods for handling record deduplication. Default is DedupHandler, which can be subclassed for modifications and the subclass specified here. |
This section contains settings controlling OAI-PMH harvesting.
| Setting | Description |
|---|---|
| max_tries | Number of attempts to fetch data from the OAI-PMH provider. Default is 5. |
| retry_wait | Wait time between request attempts in seconds. Default is 30. |
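As a minimal sketch, the harvesting settings above could appear in recordmanager.ini as follows. The section name `[Harvesting]` is an assumption (the section headings are not shown here); check recordmanager.ini.sample for the actual name. The values are the documented defaults.

```ini
; Section name is an assumption; see recordmanager.ini.sample.
[Harvesting]
; Try each OAI-PMH request up to 5 times (documented default)
max_tries = 5
; Wait 30 seconds between attempts (documented default)
retry_wait = 30
```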
This section specifies how to connect to the Mongo database.
| Setting | Description |
|---|---|
| url | Mongo connection string in the format mongodb:///tmp/mongodb-27017.sock (preferred) or mongodb://username:password@server. In a typical default installation with Mongo residing on the same server, a username and password are not needed, and mongodb:///tmp/mongodb-27017.sock can be used. Using Unix sockets provides a significant performance advantage over TCP/IP. |
| database | Mongo database to be used |
| counts | Whether to fetch counts from the Mongo database when processing records. Defaults to false because fetching counts can be slow in a large database, but setting this to true gives more feedback during operations. |
| compress_records | Whether to compress record metadata when it is stored in MongoDB. Compression/decompression increases CPU usage slightly but is offset by reduced disk space and I/O demand. Compression is enabled by default. Turn off if you use TokuMX instead of MongoDB (TokuMX has built-in compression). |
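A possible Mongo section for a local installation might look like the sketch below. The section name `[Mongo]` and the database name `recman` are assumptions made for illustration; the socket path is the one given in the table.

```ini
; Section name is an assumption; see recordmanager.ini.sample.
[Mongo]
; Unix socket connection (preferred when Mongo runs on the same server)
url = "mongodb:///tmp/mongodb-27017.sock"
; Hypothetical database name
database = recman
; Leave counts off for large databases; enable for more feedback
counts = false
; Enabled by default; set to false when using TokuMX
compress_records = true
```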
This section contains settings used when running direct Solr updates from RecordManager. These settings are not needed if the updatesolr function is not used. Note that RecordManager uses the JSON update method, which requires a fairly recent Solr version and in some cases must be enabled separately. See http://wiki.apache.org/solr/UpdateJSON for more information.
| Setting | Description |
|---|---|
| update_url | The url used for the JSON update in Solr |
| max_commit_interval | Maximum number of record updates to send to Solr between commits. Note that Solr also has settings for automatic commit that may override this and cause more frequent commits. Committing changes means that the updated version of the search index is brought online, which requires some resources for warmup etc. Therefore it is recommended to keep the commit interval at a fairly high value. A commit is always done at the end of the Solr update process regardless of this setting. |
| username | User name if basic http authentication is required to connect to the Solr index for update |
| password | Password if basic http authentication is required to connect to the Solr index for update |
| background_update | Number of background tasks to be used for making Solr http calls. Can improve indexing performance as batches of records can be created and sent to Solr in parallel. Disabled (0) by default. Requires the pcntl extension in PHP. |
| max_update_tries | Maximum number of tries to send an update to Solr. Default is 15. |
| update_retry_wait | Wait time between Solr update request attempts in seconds. Default is 60. |
| merge_records | If true, a merged record is created for duplicate records. This merged record is indexed alongside normal records. The merged record is marked with the field merged_boolean=true and the normal records belonging to it with merged_child_boolean=true. This allows the merged child records to be excluded from search results, and the merged record to be replaced in the result list with the appropriate original record (this requires support in VuFind; see sys/Solr.php for our customization to do this). |
| merged_fields | A comma-separated list of multivalued fields to be added to the merged records. Default contains normal VuFind multivalued fields. |
| single_fields | A comma-separated list of single-valued fields to be added to the merged records. Default contains normal VuFind single-valued fields apart from fullrecord. For single-valued fields only the first occurrence is taken. |
| suffixed_merged_fields | A comma-separated list of merged fields to which the data source id is appended. Default is empty. |
| format_in_allfields | Whether the format (e.g. "Book") should be added to allfields. Default is false. |
| unicode_normalization_form | Unicode normalization form to use. Valid values: NFC, NFD, NFKC and NFKD. See e.g. [http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization](http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization) for more information. |
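The Solr settings above could be combined roughly as in the following sketch. The section name `[Solr]`, the update URL, the commit interval value, and the choice of NFKC are illustrative assumptions, not values taken from the distribution; the retry values are the documented defaults.

```ini
; Section name is an assumption; see recordmanager.ini.sample.
[Solr]
; Hypothetical JSON update URL; adjust host, port and core to your Solr
update_url = "http://localhost:8983/solr/biblio/update/json"
; Hypothetical value; keep this fairly high to avoid frequent warmups
max_commit_interval = 50000
; 0 disables background tasks (default); values above 0 need pcntl
background_update = 0
; Documented defaults
max_update_tries = 15
update_retry_wait = 60
; Create merged records for duplicates
merge_records = true
format_in_allfields = false
; One of NFC, NFD, NFKC, NFKD
unicode_normalization_form = NFKC
```

If the Solr index requires basic HTTP authentication, the username and password settings would be added to the same section.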
These settings are specific to the OAI-PMH provider. The provider is not a mandatory part of RecordManager, but with it RecordManager can be used as an OAI-PMH aggregator. See Setting up the OAI-PMH Provider for more information on setting up the OAI-PMH provider.
| Setting | Description |
|---|---|
| repository_name | Name of the repository displayed in the Identify response |
| base_url | Base url of the provider (e.g. http://x.y.z/oai-pmh with the default configuration) |
| admin_email | Email address displayed in the Identify response |
| result_limit | Limit of results per single response (additional results are requested with a resumptionToken) |
| format_definitions | File that contains the descriptions of the available metadata formats |
| set_definitions | File that contains the set definitions (for selective harvesting) |
| transformation_to_[format] | XSL transformation to be used for outputting records in the given [format] in OAI-PMH provider |
These settings provide mappings between formats and the record classes used to process them. By default the class used is FormatRecord where Format is the record format with first letter capitalized. The section contains a list of key=value pairs, where key is the format and value is the class name (e.g. marc=MyOwnMarcRecord). An example of creating a custom record class that can override or add functionality to the original one can be found in classes/NdlEadRecord.php.
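For illustration, a mapping that replaces the default MARC record class with a custom subclass might be written as below. The section name `[Record Classes]` is an assumption; `MyOwnMarcRecord` is the hypothetical class name used in the text above.

```ini
; Section name is an assumption; see recordmanager.ini.sample.
[Record Classes]
; format = class name; formats not listed here use the default class
marc = MyOwnMarcRecord
```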
These settings control how geocoding is done. See Geocoding for more information on how geocoding works.
| Setting | Description |
|---|---|
| geocoder | The geocoder to use. Only NominatimGeocoder is provided out of the box. |
| delay | Delay in milliseconds between requests when using NominatimGeocoder. Set to at least 1000 when using OpenStreetMap's servers. |
| url | Address of Nominatim server |
| Your email address. Mandatory when using OpenStreetMap's servers. | |
| preferred_area | Rectangle defining the preferred area for matches (can be copied from http://nominatim.openstreetmap.org/) |
| simplification_tolerance | Tolerance initially used for simplification if a polygon has more than simplification_max_length elements. 0 is a no-op, 0.001 is a good starting point, and higher fractions result in polygons with fewer elements. See e.g. http://gis.stackexchange.com/questions/11910/meaning-of-simplifys-tolerance-parameter for more information. |
| simplification_max_length | Maximum number of elements in a polygon. If exceeded, the polygon is simplified using simplification_tolerance. If still exceeded, simplification_tolerance is doubled until the number of elements is low enough or 100 tries are exceeded. |
| solr_field | Solr field where polygon data is stored. Must use the SpatialRecursivePrefixTreeFieldType field type |
| important_threshold | Threshold governing whether a location is considered important. If such a location is found, locations with lower importance are ignored. Default is 0.9. |
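A geocoding configuration along these lines might look like the sketch below. The section name `[Geocoding]`, the server URL, the maximum polygon length, and the Solr field name are assumptions for illustration; the delay, tolerance, and threshold values come from the descriptions above. The email and preferred_area settings are left out because their exact form is not shown here.

```ini
; Section name is an assumption; see recordmanager.ini.sample.
[Geocoding]
geocoder = NominatimGeocoder
; At least 1000 ms when using OpenStreetMap's servers
delay = 1000
; Hypothetical Nominatim server address
url = "http://nominatim.example.org/search"
; Suggested starting point for simplification
simplification_tolerance = 0.001
; Hypothetical limit on polygon elements
simplification_max_length = 100
; Hypothetical field; must use SpatialRecursivePrefixTreeFieldType
solr_field = location_geo
; Documented default
important_threshold = 0.9
```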
| Setting | Description |
|---|---|
| log_file | File where RecordManager writes its log |
| log_level | The level of information written to the log file. It is recommended to keep this at least at level 2, and level 3 is also safe for production use, but level 4 might cause the log file size to increase rapidly. |
| error_email | An optional email address, or a comma-separated list of email addresses, where a message is sent if any fatal errors are encountered |
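As a rough example, the logging settings could be set as follows. The section name `[Log]`, the log file path, and the email addresses are assumptions chosen for illustration.

```ini
; Section name is an assumption; see recordmanager.ini.sample.
[Log]
; Hypothetical path; the user running RecordManager must be able to write here
log_file = "/var/log/recordmanager/recman.log"
; Level 3 is described above as safe for production use
log_level = 3
; Optional; a comma-separated list is allowed
error_email = "admin@example.org,ops@example.org"
```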
Data source settings are further divided into two categories. The first category of settings is used for all data sources, and the second one is specific to OAI-PMH harvesting. All data source settings belong to a section that identifies the data source. The section name is used as the "source" parameter in the command line programs.
| Setting | Description |
|---|---|
| idPrefix | By default the section name in datasources.ini is used as an identifier prefix for the institution. idPrefix can be used to override this e.g. in case multiple OAI-PMH sets need to be harvested from the same data source (which requires multiple uniquely named sections in datasources.ini). |
| institution | The institution code mapped to the data source. Used e.g. to fill an organization field in the Solr index. |
| recordXPath | An XPath expression used when loading records from a file to identify a single record (e.g. //record) |
| oaiIDXPath | An XPath expression used when loading records from a file to find the record's OAI ID, if it's present in the file (typically when importing a file containing an OAI-PMH ListRecords response). Relative to recordXPath (e.g. ../../header/identifier). |
| format | Record format in RecordManager (e.g. dc, ead, lido or marc) |
| preTransformation | Optional transformation to be applied to files to be imported (just the name of the xsl file in transformations directory, e.g. to strip namespaces) |
| recordSplitter | Optional XSL transformation or PHP class used to split records in import or OAI-PMH harvest. See classes/EadSplitter.php for an example implementation of a PHP-based splitter or transformations/EadSplit.xsl for an example of an XSL transformation. Specify only the .xsl or .php file name without path. |
| normalization | Optional XSL transformation to be applied to each record. Points to a properties file in transformations directory (enter only the file name, no path). The properties file further defines the actual XSL transformation and any PHP-based helper functions or classes used in the transformation. |
| solrTransformation | XSL transformation to be used when converting a record for import to Solr. Must be specified if the record driver does not provide a usable toSolrArray method. Points to a properties file in transformations directory. |
| dedup | Whether this data source needs deduplication (true/false, defaults to false) |
| componentParts | How component parts, if any, are handled in the data source during load to Solr. |
| indexMergedParts | Whether to index merged component parts also separately with hidden_component_boolean field set to true. Defaults to true. |
| {field}_mapping | A mapping file in mappings directory to be used to map values of {field} when updating Solr index. Useful for e.g. mapping multiple location codes to one. The mapping file is a simple .ini-style file where on the left side of an equals sign is the original value and on the right side the resulting value. Mappings are case-sensitive, and if multiple values in a multivalued field map to the same result, only one is kept. There is a simple example mapping file in the mappings directory. A few special mapping strings can be used to provide default values: ##default = xyz (a default value of xyz is used if none of the other strings match), ##empty = xyz (a default for a single-valued field where no original value exists) and ##emptyarray = xyz (a default for a multivalued field where no original value exists). |
| institutionInBuilding | How the institution is converted to the building field. |
| extraFields[] | An array of static fields to add to each record when sending them to Solr. Format is fieldname:value, e.g. extraFields[] = building:mainLibrary and extraFields[] = sector_str_mv:library |
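To show how these settings fit together, here is a sketch of a data source section in datasources.ini for MARC records loaded from files. The section name `sample_marc`, the institution code, and the mapping file name are hypothetical; `building_mapping` assumes that `building` is the field being mapped.

```ini
; Section name doubles as the "source" parameter on the command line
[sample_marc]
; Hypothetical institution code
institution = SampleLibrary
format = marc
; Each <record> element in an imported file is one record
recordXPath = "//record"
dedup = true
; {field}_mapping with field = building; file name is hypothetical
building_mapping = sample_locations.map
; Static fields added to every record sent to Solr
extraFields[] = building:mainLibrary
extraFields[] = sector_str_mv:library
```

The mapping file referenced above could then map several location codes to one value and supply defaults. The codes and labels are invented for the example:

```ini
; mappings/sample_locations.map (hypothetical file)
MAIN1 = Main Library
MAIN2 = Main Library
; Used when none of the other strings match
##default = Other
; Used for a single-valued field with no original value
##empty = Unknown
```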
| Setting | Description |
|---|---|
| url | OAI-PMH provider base URL |
| set | Identifier of a set to harvest (normally found in the setSpec tag of an OAI-PMH ListSets response). Omit this setting to harvest all records. |
| metadataPrefix | Format to harvest. The default is oai_dc. |
| idSearch[] and idReplace[] | Can be used to manipulate record IDs with regular expressions. |
| dateGranularity | The granularity used by the server for representing dates. This may be "YYYY-MM-DDThh:mm:ssZ", "YYYY-MM-DD" or "auto" (to query the server for details). The default is "auto". |
| verbose | Can be set to true in order to log more detailed output while harvesting; this may be useful for troubleshooting purposes, but it defaults to false. |
| debugLog | Can be set to a file where all the OAI-PMH requests and responses are written. There is also a splitlog.php utility that can be used to split the responses from the debug log so that they can be reloaded with the import program. This is especially useful when testing record splitters. |
| oaipmhTransformation | An XSL transformation that is applied to OAI-PMH responses before they are processed (just the name of the xsl file in the transformations directory, e.g. to strip namespaces). |
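An OAI-PMH data source could be sketched as below. The section name, institution code, provider URL, set name, and debug log path are hypothetical. The delimiter-style pattern in `idSearch[]` is also an assumption about the expected regular expression syntax; verify it against the sample file before use.

```ini
[sample_oai]
; Hypothetical institution code
institution = SampleLibrary
format = dc
; Hypothetical provider base URL
url = "http://example.org/oai"
; Omit "set" to harvest all records
set = sampleset
metadataPrefix = oai_dc
dateGranularity = auto
verbose = false
; Assumed regex syntax: strip a leading "oai:" from record IDs
idSearch[] = "/^oai:/"
idReplace[] = ""
; Optional: write all requests and responses to a file for troubleshooting
debugLog = "/tmp/sample_oai_debug.log"
```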
| Setting | Description |
|---|---|
| type | Only valid value is metalib. This tells RecordManager to harvest from X-Server instead of OAI-PMH. |
| url | MetaLib X-Server address |
| xUser | User name for X-Server login |
| xPassword | Password for X-Server login |
| query | X-Server source_locate query used to identify records to be harvested (e.g. "WIN=INSTITUTE") |
See MetaLib documentation at EL Commons for more information on the X-Server call used and the syntax used in query (locate_command).
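A MetaLib data source might be configured roughly as follows. The section name, server address, and credentials are hypothetical; the query is the example given in the table. General data source settings such as institution and format may also be needed, which this sketch does not cover.

```ini
[sample_metalib]
; Harvest from X-Server instead of OAI-PMH
type = metalib
; Hypothetical X-Server address
url = "http://metalib.example.org/X"
; Hypothetical credentials
xUser = xserver_user
xPassword = changeme
; source_locate query identifying the records to harvest
query = "WIN=INSTITUTE"
```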
SFX KB harvest is actually "fetch export files and import them". SFX export files are fetched according to their time stamps and processed in RecordManager.
| Setting | Description |
|---|---|
| type | Only valid value is sfx. This tells RecordManager to harvest SFX exports via HTTP. |
| url | HTTP address of the export directory on the SFX server |
| filePrefix | File name prefix used to distinguish the files to be processed from any other export files |
The SFX harvest requires that an SFX export be scheduled to run on the SFX server and the results exposed via the proxy Apache on the SFX server. See [Harvesting SFX Objects](Harvesting SFX Objects) for information on how to set up the SFX side.
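For completeness, an SFX data source could look like the sketch below. The section name, export directory URL, and file prefix are hypothetical and depend on how the export is scheduled and exposed on the SFX server.

```ini
[sample_sfx]
; Harvest SFX export files via HTTP
type = sfx
; Hypothetical address of the export directory
url = "http://sfx.example.org/export/"
; Hypothetical prefix; only files whose names start with this are processed
filePrefix = "recman_export"
```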