[![Build Status](https://travis-ci.org/thedatahub/Datahub-Factory.svg?branch=master)](https://travis-ci.org/thedatahub/Datahub-Factory)

# NAME

Datahub::Factory - A conveyor belt which transports data from a data source to a data sink.

# SYNOPSIS

```
dhconveyor [ARGUMENTS] [OPTIONS]
```

# DESCRIPTION

Datahub::Factory is a command line conveyor belt which automates three tasks:

- Data is fetched automatically from a local or remote data source.
- Data is converted to an exchange format.
- The output is pushed to a data sink.

Datahub::Factory fetches data from several sources as specified by the Importer settings, executes a Fix and sends the result to a data sink, set by the Exporter. Several importer and exporter modules are supported. Datahub::Factory includes Log4perl support to monitor conveyor belt operations.

Note: this toolset is not a generic tool. It has been tailored towards the functional requirements of the Flemish Art Collection use case.

# CONFIGURATION

Datahub::Factory uses a general configuration file called `settings.ini`. It can be located at `/etc/datahub-factory/settings.ini` or `conf/settings.ini`. The one in `/etc` takes priority. An example file is provided at `conf/settings.example.ini`.

The file is in INI format and has two parts: a `[General]` block that contains some generic options, and (optionally) multiple module-specific blocks called `[module_Module_name]`. For a list of module options, see the documentation of each module.

## Supported modules

- PIDS

## General options

- `log_level`

    Set the log level. Takes a numeric parameter. Supported levels are: 1 (WARN), 2 (INFO), 3 (DEBUG). WARN (1) is the default.

## Example

```ini
[General]
# 1 => WARN; 2 => INFO; 3 => DEBUG
log_level = 1

[module_PIDS]
username = username
api_key = api_key
```

# COMMANDS

## help COMMAND

Documentation about command line options.

It is possible to provide either all importer and/or exporter options on the command line, or to create a pipeline configuration file that sets those options.

## transport [OPTIONS]

Fetch data from a local or remote source, convert it to an exchange format and push the data to a Datahub instance.

### Command line options

- `--importer NAME`

    The importer which fetches data from a Collection Registration system. Currently only "Adlib" and "TMS" are supported. All `--oimport` arguments are tied to the specific importer used.

- `--fixes PATH`

    The path to the Catmandu Fix files that transform the data.

- `--exporter NAME`

    The exporter that will do something with your data. It is possible to print to STDOUT in a specific format ("YAML" and "LIDO" are supported) or to export to a Datahub instance. All `--oexport` arguments are tied to the specific exporter used.

- `--oimport file_name=PATH`

    The path to a flat file containing data. This option is only relevant when the input is an Adlib XML export file.

- `--oimport db_user=VALUE`

    The database user. This option is only relevant when the input is a TMS database.

- `--oimport db_password=VALUE`

    The database user's password. This option is only relevant when the input is a TMS database.

- `--oimport db_name=VALUE`

    The database name. This option is only relevant when the input is a TMS database.

- `--oimport db_host=VALUE`

    The database host. This option is only relevant when the input is a TMS database.

- `--oexport datahub_url=VALUE`

    The URL of the Datahub instance. This should be a fully qualified URL, e.g. `http://datahub.lan/`.

- `--oexport oauth_client_id=VALUE`

    The public client ID. Used for OAuth authentication against the Datahub endpoint.

- `--oexport oauth_client_secret=VALUE`

    The client secret passphrase. Used for OAuth authentication against the Datahub endpoint.

- `--oexport oauth_username=VALUE`

    The username of the Datahub user. Used for OAuth authentication against the Datahub endpoint.

- `--oexport oauth_password=VALUE`

    The password of the Datahub user. Used for OAuth authentication against the Datahub endpoint.
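As an illustration, a hypothetical `transport` invocation could combine the options above as follows; all paths, URLs and credentials are placeholders taken from the examples in this document, and `pipeline.ini` is a placeholder name for the pipeline configuration file described in the next section:

```bash
# Pass all importer, fixer and exporter options on the command line
# (placeholder values; adapt to your own environment):
dhconveyor transport \
    --importer Adlib \
    --fixes /tmp/msk.fix \
    --exporter Datahub \
    --oimport file_name=/tmp/adlib.xml \
    --oexport datahub_url=http://datahub.lan/ \
    --oexport oauth_client_id=datahub \
    --oexport oauth_client_secret=datahub \
    --oexport oauth_username=datahub \
    --oexport oauth_password=datahub

# Or collect the same options in a pipeline configuration file
# and point to it with the --pipeline switch (see below):
dhconveyor transport --pipeline pipeline.ini
```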
### Pipeline configuration file

The pipeline configuration file is in the INI format and its location is provided to the application using the `--pipeline` switch. The file is broadly divided in two parts: the first (shortest) part configures the pipeline itself and sets the plugins to use for the import, fix and export actions. The second part sets options specific to the plugins used.

#### Pipeline configuration

This part has three sections: `[Importer]`, `[Fixer]` and `[Exporter]`. Every section has a single option, `plugin`: set it to the plugin you want to use for that action. All currently supported plugins are in the Importer and Exporter folders. For the `[Fixer]`, only the Fix plugin is supported.

The `[Fixer]` section has an additional option, `id_path`. This option contains the path (in Fix syntax) of the identifier of each record in your data, after the fix has been applied but before the record is submitted to the Exporter. It is used for reporting and logging.

Supported Importer plugins:

- TMS
- Adlib
- KMSKA
- MSK
- VKC
- OAI

Supported Exporter plugins:

- Datahub
- LIDO
- YAML

#### Plugin configuration

All plugins have their own configuration options in sections called `[plugin_type_name]`, where `type` can be `importer`, `exporter` or `fixer` and `name` is the name of the plugin (see above). For a list of supported and required options, see the plugin documentation.

#### Example configuration file

```ini
[Importer]
plugin = Adlib

[Fixer]
plugin = Fix
id_path = 'administrativeMetadata.recordWrap.recordID.0._'

[Exporter]
plugin = Datahub

[plugin_importer_Adlib]
file_name = '/tmp/adlib.xml'
data_path = 'recordList.record.*'

[plugin_fixer_Fix]
fix_file = '/tmp/msk.fix'

[plugin_exporter_Datahub]
datahub_url = my.thedatahub.io
datahub_format = LIDO
oauth_client_id = datahub
oauth_client_secret = datahub
oauth_username = datahub
oauth_password = datahub
```

# AUTHORS

- Pieter De Praetere
- Matthias Vandermaesen

# COPYRIGHT

Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw

# LICENSE

This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.