Technical Implementation Guidelines

Metadata representation in CERIF XML

The CRIS-to-OpenAIRE information interchange uses the OAI-PMH 2.0 protocol with the CERIF XML defined by these Guidelines as the metadata language. This CERIF XML uses the namespace https://www.openaire.eu/cerif-profile/1.1/. Its structure is defined and constrained by the corresponding XML Schema. [1] Accompanying these Guidelines is a comprehensive set of examples. [2]

Each metadata object is represented as a top-level XML element: Publication, Product, Patent, Person, OrgUnit, Project, Funding, Event, Equipment. The content model for each of these elements is specified in the previous section; the rest of this subsection gives guidelines on their usage.

CERIF represents titles, names, abstracts and similar text attributes as multilingual. In CERIF XML the language is expressed using the standard xml:lang attribute. Unless stated otherwise, the value is considered to be the one in the original language.
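As an illustration of the xml:lang convention, the following minimal sketch builds a Publication element carrying a title in two languages. It assumes that Title is a child element of Publication (see the content model in the previous section); the identifier "pub-1", the title texts and the helper name multilingual_titles are hypothetical.

```python
import xml.etree.ElementTree as ET

CERIF_NS = "https://www.openaire.eu/cerif-profile/1.1/"
# Clark-notation name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def multilingual_titles(publication_id, titles):
    """Build a Publication element with one Title per language.

    `titles` maps a language code to the title text.
    Hypothetical helper, not part of the Guidelines.
    """
    # Serialise the CERIF namespace as the default namespace.
    ET.register_namespace("", CERIF_NS)
    pub = ET.Element(f"{{{CERIF_NS}}}Publication", {"id": publication_id})
    for lang, text in titles.items():
        title = ET.SubElement(pub, f"{{{CERIF_NS}}}Title", {XML_LANG: lang})
        title.text = text
    return pub


# Illustrative data only.
pub = multilingual_titles(
    "pub-1", {"en": "An example title", "de": "Ein Beispieltitel"}
)
xml_text = ET.tostring(pub, encoding="unicode")
print(xml_text)
```

Each Title element carries its own xml:lang value, so a harvester can select the language variant it needs without any further convention.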

Although the CERIF profile XML syntactically allows structures of any depth, the contents of each metadata record should be limited to the nearest objects that are representable by a top-level element. These neighboring objects should be expressed in as much detail as is practical to identify them. This includes links to any higher-level structures of which the object is part, e.g. to an institution of which an organisation unit is part.

However, the neighboring object XML shall never contain more information than, or information different from, what is expressed in the main record for that object, i.e. the record in which the object is retrieved as a top-level object. This is a stronger form of a functional dependency requirement.

Footnotes

[1] The XML schema is located at https://github.com/openaire/guidelines-cris-managers/raw/v1.1/schemas/openaire-cerif-profile.xsd.
[2] Please see an overview map at https://github.com/openaire/guidelines-cris-managers/blob/v1.1/docs/_illustrations/OpenAIRE-examples-map.png; the individual examples are available as full OAI-PMH 2.0 response messages at https://github.com/openaire/guidelines-cris-managers/tree/v1.1/samples.

OAI-PMH for Harvesting

OpenAIRE uses the OAI-PMH 2.0 protocol for harvesting metadata from CRIS systems.

Metadata Format and Prefix

A CRIS compatible with OpenAIRE Guidelines 1.1 should use the OAI-PMH metadata prefix oai_cerif_openaire and XML metadata contents from the https://www.openaire.eu/cerif-profile/1.1/ namespace.

A sample response to a ListMetadataFormats OAI-PMH request is available in openaire_oaipmh_example_ListMetadataFormats.xml.
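A harvester's requests using this prefix could be formed as in the sketch below. The base URL is a hypothetical endpoint and oai_request_url is an illustrative helper; only the verbs and the metadataPrefix argument come from OAI-PMH 2.0 and these Guidelines.

```python
from urllib.parse import urlencode

METADATA_PREFIX = "oai_cerif_openaire"

# Hypothetical OAI-PMH endpoint of a CRIS; replace with the real base URL.
BASE_URL = "https://cris.exampleton.ac.uk/oai"


def oai_request_url(base_url, verb, **params):
    """Compose an OAI-PMH GET request URL (illustrative helper)."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


# Ask the CRIS which metadata formats it supports.
formats_url = oai_request_url(BASE_URL, "ListMetadataFormats")

# Request records in the CERIF XML format defined by these Guidelines.
records_url = oai_request_url(
    BASE_URL, "ListRecords", metadataPrefix=METADATA_PREFIX
)

print(formats_url)
print(records_url)
```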

OpenAIRE OAI-PMH Sets

For harvesting the records relevant to OpenAIRE, the use of specific OAI-PMH sets at the local CRIS system is mandatory. All of the following OAI-PMH sets shall be recognized by the CRIS, even if not all of them are populated.

OpenAIRE_CRIS_publications (setSpec: openaire_cris_publications): The list of CERIF XML records for publications and publishing channels.

OpenAIRE_CRIS_products (setSpec: openaire_cris_products): The list of CERIF XML records for datasets and other research products.

OpenAIRE_CRIS_patents (setSpec: openaire_cris_patents): The list of CERIF XML records for patents.

OpenAIRE_CRIS_persons (setSpec: openaire_cris_persons): The list of CERIF XML records for persons.

OpenAIRE_CRIS_orgunits (setSpec: openaire_cris_orgunits): The list of CERIF XML records for organisations and organisation units.

OpenAIRE_CRIS_projects (setSpec: openaire_cris_projects): The list of CERIF XML records for projects.

OpenAIRE_CRIS_funding (setSpec: openaire_cris_funding): The list of CERIF XML records for funding.

OpenAIRE_CRIS_events (setSpec: openaire_cris_events): The list of CERIF XML records for events.

OpenAIRE_CRIS_equipments (setSpec: openaire_cris_equipments): The list of CERIF XML records for equipment.

A sample response to a ListSets OAI-PMH request is available in openaire_oaipmh_example_ListSets.xml.
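Whether a CRIS advertises all of the sets above can be checked with a sketch like the following. It parses a ListSets response and reports the missing setSpec values. The embedded response is a shortened, made-up sample, and missing_set_specs is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# setSpec values listed in these Guidelines.
REQUIRED_SET_SPECS = (
    "openaire_cris_publications",
    "openaire_cris_products",
    "openaire_cris_patents",
    "openaire_cris_persons",
    "openaire_cris_orgunits",
    "openaire_cris_projects",
    "openaire_cris_funding",
    "openaire_cris_events",
    "openaire_cris_equipments",
)


def missing_set_specs(list_sets_xml):
    """Return the required setSpec values absent from a ListSets response."""
    root = ET.fromstring(list_sets_xml)
    advertised = {
        el.text.strip()
        for el in root.iterfind(".//oai:set/oai:setSpec", OAI_NS)
        if el.text
    }
    return [spec for spec in REQUIRED_SET_SPECS if spec not in advertised]


# Shortened, made-up response that advertises only two of the sets.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>openaire_cris_publications</setSpec>
      <setName>OpenAIRE_CRIS_publications</setName>
    </set>
    <set>
      <setSpec>openaire_cris_persons</setSpec>
      <setName>OpenAIRE_CRIS_persons</setName>
    </set>
  </ListSets>
</OAI-PMH>"""

missing = missing_set_specs(sample)
print(missing)
```

A compliant CRIS would yield an empty list here, since all sets must be recognized even when some of them contain no records.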

Referential integrity constraints for all relationships among entities must be satisfied in the CERIF XML data provided by the CRIS system.

Note that there is no set for services. Exactly one Service record, namely the one representing the CRIS, shall be given in the response to an OAI-PMH Identify request. For an example please see openaire_oaipmh_example_Identify.xml.

OAI identifiers

The identifiers of objects from the source CRIS shall be represented as OAI identifiers of the form oai:{service}:{type}/{internal ID} where {service} denotes the internet domain name of the CRIS, {type} stands for the type of the object, and {internal ID} denotes an internal identifier of the object within the CRIS.

The types are expressed by the plural form of the name of the XML element that represents the object, i.e. the name of the collection of all such objects.

The internal identifiers are also used in the id attributes in the CERIF XML mark-up. If several candidate internal identifiers are available, the most persistent one should be preferred. In many cases a UUID – if it is assigned – is more likely to be persistent than integer IDs.

For example, a publication with the internal ID 560d48b6-42c3-4ef9-81d6-32c949fb2cdb (a UUID) from a CRIS running on behalf of the University of Exampleton (www.exampleton.ac.uk, with a CRIS running at cris.exampleton.ac.uk) will have the OAI identifier oai:cris.exampleton.ac.uk:Publications/560d48b6-42c3-4ef9-81d6-32c949fb2cdb.
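The identifier pattern can be sketched in code as follows. The function names are hypothetical. The sketch only composes and splits the string and does not validate the domain name or the type.

```python
def oai_identifier(service, object_type, internal_id):
    """Compose an identifier of the form oai:{service}:{type}/{internal ID}."""
    return f"oai:{service}:{object_type}/{internal_id}"


def parse_oai_identifier(identifier):
    """Split an identifier into (service, type, internal ID).

    Raises ValueError if the string does not follow the pattern.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    service, local_part = parts[1], parts[2]
    object_type, separator, internal_id = local_part.partition("/")
    if not separator or not object_type or not internal_id:
        raise ValueError(f"missing type or internal ID: {identifier!r}")
    return service, object_type, internal_id


# The University of Exampleton publication from the text above.
identifier = oai_identifier(
    "cris.exampleton.ac.uk",
    "Publications",
    "560d48b6-42c3-4ef9-81d6-32c949fb2cdb",
)
print(identifier)
print(parse_oai_identifier(identifier))
```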

Compatibility of aggregators

Aggregating CRISs (e.g. at the regional or national level) can also become compliant with these Guidelines. Such CRISs should provide additional provenance information about their records. The relevant section of the Literature Repository Guidelines should be followed.