Abstract

This document describes the background and methodology for the design of the DataCite profile of DCAT-AP (DataCite+DCAT-AP), as well as the defined mappings.

Disclaimer

The views expressed are purely those of the author and may not in any circumstances be regarded as stating an official position of the European Commission.

Introduction

This document describes the background and methodology for the design of the DataCite profile of DCAT-AP (DataCite+DCAT-AP), as well as the defined mappings.

An XSLT-based implementation of the defined mappings is available in the dedicated GitHub repository.

The following sections first illustrate the background and the methodology for the design of DataCite+DCAT-AP, along with an analysis of related work. ...

Background

The DCAT Application Profile for Data Portals in Europe (DCAT-AP)

DCAT-AP [[DCAT-AP]] is a metadata profile developed in the framework of the EU Programme Interoperability Solutions for European Public Administrations (ISA), and based on and compliant with the W3C Data Catalog vocabulary (DCAT) [[VOCAB-DCAT]] - currently, one of the most widely used Semantic Web vocabularies for describing datasets and data catalogues.

The purpose of DCAT-AP is to define a common interchange metadata format for data portals of the EU and of EU Member States. In order to achieve this, DCAT-AP defines a set of classes and properties, grouped into mandatory, recommended and optional. Such classes and properties correspond to information on datasets and data catalogues that are shared by many European data portals, aiding interoperability. Although DCAT-AP is designed to be independent from its actual implementation, RDF [[RDF-CONCEPTS]] and Linked Data [[LD-BOOK]] are the reference technologies.

DataCite

DataCite is an international initiative meant to enable citation for scientific datasets. To achieve this, DataCite operates a metadata infrastructure, following the same approach used by CrossRef for scientific publications. As such, the DataCite infrastructure is responsible for issuing persistent identifiers (in particular, DOIs) for datasets, and for registering dataset metadata. Such metadata are to be provided according to the DataCite metadata schema - which is basically an extension to the DOI one.

Currently, DataCite is the de facto standard for data citation. Therefore, the ability to transform metadata records from and to the DataCite metadata schema would enable, respectively, the harvesting of DataCite records, and the publication of metadata records in the DataCite infrastructure (thus enabling their citation).

Aligning DataCite with DCAT-AP

The motivation for investigating the possibility of aligning DataCite metadata with DCAT-AP is twofold:

  1. To identify how to create a DCAT-AP-compliant representation of DataCite metadata, in order to enable their sharing across DCAT-AP-enabled data catalogues. This analysis is not meant to provide a complete representation of all DataCite metadata elements, but only of those included in DCAT-AP.
  2. To identify how to create a DataCite-compliant representation of DCAT-AP metadata, in order to enable their publishing on the DataCite infrastructure. This analysis is meant to develop an extension of DCAT-AP, covering all DataCite metadata elements.

Regarding point (2), the DataCite-based extension of DCAT-AP is also meant to integrate into DCAT-AP all the information required for data citation.

Based on these considerations, two versions of a DataCite profile of DCAT-AP have been defined, namely, DataCite+DCAT-AP Core (addressing the requirements of point (1)) and DataCite+DCAT-AP Extended (addressing the requirements of point (2)). More precisely, the core version includes alignments only for the subset of DataCite metadata elements included in the DCAT-AP specification, whereas the extended version tries to define alignments for all the DataCite metadata elements using DCAT-AP and other Semantic Web vocabularies (whenever DCAT-AP does not offer suitable candidates). As such, DataCite+DCAT-AP Extended is a superset of DataCite+DCAT-AP Core, and both are conformant with DCAT-AP.

Methodology

The reference DCAT-AP and DataCite specifications on which DataCite+DCAT-AP is based are version 1.1 of DCAT-AP and version 4.1 of the DataCite metadata schema.

For the mappings, existing work has been taken into account concerning the mapping of DataCite to other metadata standards, in particular DC2AP and DataCite2RDF.

DataCite+DCAT-AP builds upon these specifications to provide as complete a mapping as possible of all the metadata elements in version 4.1 of the DataCite metadata schema. Moreover, the defined mappings are backward compatible with earlier versions of the DataCite metadata schema.

The resulting mappings have been grouped into two classes, corresponding to two different DataCite+DCAT-AP profiles, namely, DataCite+DCAT-AP Core and DataCite+DCAT-AP Extended.

As far as the extended profile is concerned, the reference vocabularies have been chosen based on the following criteria:

  1. They have clear persistence and versioning policies.
  2. Preferably, they should be used across domains and data communities.

These criteria determine the main differences from the mappings defined in DC2AP and DataCite2RDF, which are illustrated in the following section.

Comparison with DC2AP & DataCite2RDF

DC2AP and DataCite2RDF provide a full mapping of version 3.1 of the DataCite metadata schema. To achieve this, they re-use a number of vocabularies, which can be grouped into two main classes:

  1. General purpose and widely used vocabularies, such as Dublin Core, FOAF, GeoSPARQL, and SKOS.
  2. A set of vocabularies developed specifically to model the publishing and academic domain. They include PRISM (Publishing Requirements for Industry Standard Metadata) [[PRISM]] and FRBR (Functional Requirements for Bibliographic Records) [[FRBR]], plus a set of ontologies developed in the framework of the SPAR (Semantic Publishing and Referencing Ontologies) project.

The current version of DataCite+DCAT-AP follows the mappings based on the former group of vocabularies, but not the ones based on the latter group. The reason is twofold. First, the persistence and versioning policies of these vocabularies are unclear, and so it has been considered safer to re-consider their use when DataCite+DCAT-AP is more consolidated. Second, the mappings defined in DC2AP and DataCite2RDF are not compliant with the general requirements of DCAT-AP. For instance, although the resource types defined in DataCite include datasets and metadata records, DC2AP and DataCite2RDF do not make use of DCAT.

It is worth noting that the second group of ontologies, and in particular the ones developed in the SPAR project, provide interesting solutions to modelling some aspects not explicitly addressed in DCAT-AP - e.g., the possibility of associating a temporal dimension to agent roles - but also alternative solutions for specifying the same information - one of the examples being resource identifiers. Complementing and aligning the different approaches would be mutually beneficial.

DataCite and DCAT-AP at a glance

The following sections provide a high-level comparison of the metadata elements defined in DataCite and DCAT-AP.

DataCite metadata elements supported in DCAT-AP

The following table provides the complete list of DataCite metadata elements, and shows whether they are supported in DCAT-AP.

For each of the DataCite metadata elements, the table specifies whether they are mandatory (M), recommended (R), or optional (O).

Element | Obligation in DataCite 4.1 | Supported in DCAT-AP 1.1 | Comments
Identifier | M | Partially | DataCite requires this to be a DOI, whereas DCAT-AP does not have such a requirement
Creator | M | No | This agent role is supported in GeoDCAT-AP
Title | M | Yes |
Publisher | M | Yes |
Publication year | M | Yes |
Subject | R | Yes |
Contributor | R | Partially | DCAT-AP supports only 1 out of the 21 DataCite contributor types (namely, contact point / person). GeoDCAT-AP supports 1 additional DataCite contributor type, namely, rights holder.
Date | R | Partially | DCAT-AP supports only 2 out of the 9 DataCite date types (namely, issue date and last modified date). GeoDCAT-AP also supports an additional date type, namely, creation date.
Resource type | R | Partially | DCAT-AP supports just one resource type, namely, dcat:Dataset. All DataCite resource types fall under the definition of dcat:Dataset, with the exception of event, physical object, and service. The last one (i.e., service) is however supported in GeoDCAT-AP.
Related identifier | R | Yes |
Description | R | Yes |
Geolocation | R | Yes |
Language | O | Yes |
Alternate identifier | O | Yes |
Size | O | Yes | In DCAT-AP, this is a property of the dataset distribution, and not of the dataset itself
Format | O | Yes | In DCAT-AP, this is a property of the dataset distribution, and not of the dataset itself
Version | O | Yes |
Rights | O | Yes | DataCite does not use specific elements for use conditions (i.e., licences) and access rights. In DCAT-AP, use conditions are a property of the dataset distribution, whereas access rights are associated with the dataset.
Funding Reference | O | No | This element specifies: (a) title, identifier and, possibly, URI of the funding project, and (b) name and identifier of the organisation that awarded that project

DCAT-AP classes and properties supported in DataCite

The following table provides the list of classes and properties defined in DCAT-AP, and shows whether they are supported in DataCite.

NB: The list of DCAT-AP classes and properties is here limited to those that are either mandatory (M) or recommended (R).

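Class | Class obligation | Property | Property obligation | Supported in DataCite 4.1 | Comments
Agent | M | name | M | Yes |
Agent | M | type | R | No |
Catalogue | M | dataset | M | No |
Catalogue | M | description | M | No |
Catalogue | M | publisher | M | No |
Catalogue | M | title | M | No |
Catalogue | M | homepage | R | No |
Catalogue | M | language | R | No |
Catalogue | M | licence | R | No |
Catalogue | M | release date | R | No |
Catalogue | M | themes | R | No |
Catalogue | M | update / modification date | R | No |
Dataset | M | description | M | Yes | In DataCite, this property is recommended, not mandatory
Dataset | M | title | M | Yes |
Dataset | M | contact point | R | Yes |
Dataset | M | dataset distribution | R | No |
Dataset | M | keyword / tag | R | Yes |
Dataset | M | publisher | R | Yes |
Dataset | M | theme / category | R | Yes |
Category | R | preferred label | M | Yes |
Category scheme | R | title | M | Yes |
Distribution | R | accessURL | M | No |
Distribution | R | description | R | No |
Distribution | R | format | R | Yes | In DataCite, this is always a property of the resource itself - even when such resource is a dataset
Distribution | R | licence | R | Yes | In DataCite, this is always a property of the resource itself - even when such resource is a dataset
Licence document | R | type | R | No |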

Summary of alignment issues

As shown in the previous section, DCAT-AP is able to represent all DataCite mandatory elements, with the exception of "creator". This poses an issue for the possible use of DCAT-AP for data citation purposes, since element "creator" is one of the required components. Notably, GeoDCAT-AP supports this agent role, so it can be re-used for this purpose.

On the other hand, DataCite includes all the DCAT-AP mandatory classes and related properties, with the only notable exception of dcat:Catalog. However, this does not pose particular compliance issues, since the catalogue description could be obtained separately from the relevant DataCite records. Actually, since DataCite records are supposed to be all available via the DataCite catalogue, the catalogue description can potentially be the same for all DataCite records. Of course, this does not apply for those records following the DataCite schema but not registered in the DataCite infrastructure.

There are however some key differences between the DCAT-AP and DataCite data models that need to be addressed. The following sections outline the solutions adopted in DataCite+DCAT-AP, as well as open issues.

Resource types

DataCite supports 14 different resource types - namely: audiovisual, collection, dataset, event, image, interactive resource, model, physical object, service, software, sound, text, workflow, other. They basically correspond to the classes included in the DCMI Type vocabulary, with the exception of model and workflow.

The definition of dcat:Dataset is broad enough to cover most of the DataCite resource types, the exceptions being event, physical object, and service, which are not supported in DCAT-AP. However, the notion of "service" is supported in GeoDCAT-AP via dctype:Service. For the rest, it is possible to re-use the DCMI Type vocabulary, which includes classes for event (dctype:Event) and physical object (dctype:PhysicalObject).

DataCite+DCAT-AP re-uses the approach outlined above. Moreover, in order to preserve the original information, it uses dct:type with the relevant classes of the DCMI Type vocabulary to denote the DataCite resource type. This is basically the solution adopted in GeoDCAT-AP to model the resource types defined in ISO 19115 - namely, dataset, dataset series, and services.

As said above, the DCMI Type vocabulary does not include classes for model and workflow, and no suitable candidates have been found in the reference vocabularies. As a result, in DataCite+DCAT-AP both are modelled only as dcat:Dataset, thus losing the original information.
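The resource type rules above can be sketched as a simple lookup. The following Python fragment is only an illustration for the extended profile: the function name `map_resource_type` and the dictionary layout are assumptions made for this example, while the class names are those discussed in this section. Resource types with no DCMI Type class (e.g., model and workflow) get no dct:type value.

```python
# Illustrative sketch only (extended profile); names are hypothetical.
# DCMI Type classes used to preserve the original DataCite resource type.
DCMI_TYPE = {
    "Audiovisual": "dctype:MovingImage",
    "Collection": "dctype:Collection",
    "Dataset": "dctype:Dataset",
    "Event": "dctype:Event",
    "Image": "dctype:Image",
    "InteractiveResource": "dctype:InteractiveResource",
    "PhysicalObject": "dctype:PhysicalObject",
    "Service": "dctype:Service",
    "Software": "dctype:Software",
    "Sound": "dctype:Sound",
    "Text": "dctype:Text",
}

# Resource types that do not fall under the definition of dcat:Dataset.
NOT_A_DATASET = {"Event", "PhysicalObject", "Service"}


def map_resource_type(resource_type):
    """Return (rdf:type, dct:type or None) for a DataCite resource type."""
    if resource_type in NOT_A_DATASET:
        rdf_type = DCMI_TYPE[resource_type]
    elif resource_type is None or resource_type == "Other":
        rdf_type = "rdfs:Resource"
    else:
        rdf_type = "dcat:Dataset"
    # No DCMI Type class exists for, e.g., Model and Workflow: dct:type is omitted.
    return rdf_type, DCMI_TYPE.get(resource_type)


print(map_resource_type("Software"))
print(map_resource_type("Workflow"))
```

In this sketch, "Workflow" is typed as dcat:Dataset with no dct:type, which mirrors the loss of information described above.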

Identifiers

The requirements are basically the following ones:

DCAT-AP already provides a mechanism to model primary and secondary identifiers, as well as the identifier type. More precisely:

Such solutions basically reflect the DataCite approach to modelling identifiers. However, identifiers modelled in this way are of no use for effectively linking the relevant resources. For this purpose, an option would be to encode identifiers as HTTP URIs, whenever possible. This is the case, e.g., for ORCIDs, ISNIs, and DOIs. As for the ability to model primary and secondary/alternative identifiers differently, the resource URI can denote the primary identifier, whereas URIs corresponding to alternative identifiers can be specified by using owl:sameAs.

Based on the above, DataCite+DCAT-AP models identifiers as follows:

Agent roles

DataCite supports three main types of agent roles, namely, creator, publisher, and contributor. The last can be further specialised by specifying a contributor "type". DataCite supports 22 contributor types, including, e.g., "contact person", "editor", "funder", "producer", "rights holder", "sponsor", "other".

DCAT-AP supports only two agent roles, namely, publisher and contact point (corresponding to contributor type "contact person" in DataCite). GeoDCAT-AP includes two other DataCite agent roles - namely, creator and rights holder.

As a result, together, DCAT-AP and GeoDCAT-AP cover publisher, creator, and 2 contributor types, namely, contact point and rights holder. For the other ones, DataCite+DCAT-AP includes the following mappings:

It is worth noting that some of the DataCite contributor types cannot be modelled with a direct relationship. This is the case of roles "project leader", "project manager", "project member", "researcher", "supervisor", and "workpackage leader". Such roles are not directly describing the relationship between a resource and an agent, but rather the role of the individual in the "activity" that created the resource. E.g., "project leader" can be described as "the leader of the project that created the resource".

In such cases, the approach used in DataCite+DCAT-AP is as follows:

In case of roles "project leader", "project manager", and "project member", the activity is additionally typed as a foaf:Project.

The following code snippet shows how contributor type "project member" is modelled in DataCite+DCAT-AP:

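# Prefix "a:" is a placeholder namespace added to make the snippet parseable.
@prefix a:    <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# The dataset is linked both to the contributor and to the generating activity.
a:Dataset a dcat:Dataset ;
  dct:contributor a:Contributor ;
  prov:wasGeneratedBy a:Project .

a:Contributor a foaf:Agent , prov:Agent .

# The activity is a project, and the contributor is one of its members.
a:Project a prov:Activity , foaf:Project ;
  foaf:member a:Contributor .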

The issue is that the reference vocabularies do not provide candidates for modelling such contributor types, with the exception of "project member".

For the remaining 14 DataCite contributor types, no candidates have been found in the reference vocabularies, so they are left unmapped in DataCite+DCAT-AP.
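As a rough illustration of the approach described in this section, the following Python sketch turns a DataCite contributor type into a list of triples. The function name `contributor_triples`, the tuple-based triple representation and the default project identifier are assumptions made for this example; the properties are the ones listed in the mapping summary. Contributor types with no candidate property yield no triples.

```python
# Illustrative sketch only; names and data structures are hypothetical.
# Contributor types that can be expressed with a direct property.
DIRECT_ROLE_PROPERTY = {
    "ContactPerson": "dcat:contactPoint",
    "Distributor": "duv:hasDistributor",
    "Editor": "schema:editor",
    "Funder": "schema:funder",
    "Producer": "schema:producer",
    "RightsHolder": "dct:rightsHolder",
    "Sponsor": "schema:sponsor",
    "Other": "dct:contributor",
}


def contributor_triples(resource, agent, contributor_type, project="a:Project"):
    """Return the (subject, predicate, object) triples for one contributor."""
    if contributor_type == "ProjectMember":
        # Indirect role: the agent is a member of the project (activity)
        # that generated the resource.
        return [
            (resource, "dct:contributor", agent),
            (resource, "prov:wasGeneratedBy", project),
            (project, "rdf:type", "prov:Activity"),
            (project, "rdf:type", "foaf:Project"),
            (project, "foaf:member", agent),
        ]
    prop = DIRECT_ROLE_PROPERTY.get(contributor_type)
    if prop is None:
        # No candidate property found: the contributor type is left unmapped.
        return []
    return [(resource, prop, agent)]


for triple in contributor_triples("a:Dataset", "a:Contributor", "ProjectMember"):
    print(triple)
```

For "ProjectMember", the triples produced correspond to the code snippet shown earlier in this section.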

Distributions

The DataCite data model does not distinguish between a dataset and its embodiment(s) ("distribution(s)", in the DCAT terminology).

As a consequence, attributes that in DCAT/DCAT-AP are specific to distributions (such as format, licence, size), in DataCite are associated with the dataset. Moreover, in DataCite there is no attribute equivalent to dcat:accessURL or dcat:downloadURL. Actually, the only information that can be used to access the dataset, and, possibly, its distribution(s), is the resource DOI.

Based on this, the approach used in DataCite+DCAT-AP to map DataCite records is as follows:

  1. If the described resource is an event, physical object, or service (i.e., if it cannot be modelled as a dataset), the notion of "distribution" does not apply. Therefore, all DataCite elements are used in DataCite+DCAT-AP to describe the resource. Otherwise:
  2. Each record is modelled in DataCite+DCAT-AP as a dataset (dcat:Dataset), having exactly 1 distribution.
  3. The resulting distribution gets the relevant DataCite elements (such as format, licence, size), as per the DCAT/DCAT-AP schema, whereas the remaining ones are used to describe the dataset.
  4. The dataset DOI is used both as the dataset identifier / URI and as the distribution access URL.
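The steps above can be sketched as follows. This Python fragment is a simplified illustration, not the XSLT implementation: the function name `split_record`, the dictionary-based record representation and the use of the `https://doi.org/` prefix to turn a DOI into a URI are assumptions made for this example.

```python
# Illustrative sketch only; names and data structures are hypothetical.
# DataCite elements that DCAT/DCAT-AP attaches to the distribution.
DISTRIBUTION_ELEMENTS = {"Size", "Format", "Rights"}

# Resource types that cannot be modelled as a dataset.
NOT_A_DATASET = {"Event", "PhysicalObject", "Service"}


def split_record(resource_type, doi, elements):
    """Split DataCite elements between the resource and its distribution."""
    uri = "https://doi.org/" + doi  # assumed URI form of the DOI
    if resource_type in NOT_A_DATASET:
        # Step 1: no distribution, all elements describe the resource.
        return {"resource": {"uri": uri, **elements}, "distribution": None}
    # Steps 2 and 3: one dataset with exactly one distribution.
    resource = {"uri": uri}
    distribution = {}
    for name, value in elements.items():
        target = distribution if name in DISTRIBUTION_ELEMENTS else resource
        target[name] = value
    # Step 4: the DOI is also used as the distribution access URL.
    distribution["dcat:accessURL"] = uri
    return {"resource": resource, "distribution": distribution}


record = split_record("Dataset", "10.1234/abc", {"Title": "T", "Format": "text/csv"})
print(record["resource"])
print(record["distribution"])
```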

Use and access conditions

DataCite includes a single element, namely, "rights", to specify use and access conditions. This element is also supported in DCAT-AP (dct:rights), but, in addition, specific properties are used for use conditions (dct:license) and access rights (dct:accessRights). Moreover, in DCAT-AP use conditions are associated with distributions, whereas access rights are associated with datasets.

Based on this, DataCite+DCAT-AP maps by default DataCite "rights" to dct:rights. In addition, they are mapped to dct:license and dct:accessRights when DataCite rights make explicit reference to some known licences and access rights vocabularies. More precisely, the recognised vocabularies are the following ones:

Keywords and controlled vocabularies

DataCite supports the specification of both free-text keywords and keywords from controlled vocabularies.

For the latter case, DCAT-AP recommends the use of URIs, but in DataCite only textual labels are used.

To comply with the DCAT-AP recommendation, an option is to implement mappings from textual labels to URIs. However, this poses two main issues:

  1. DataCite does not require / recommend the use of specific vocabularies, nor a particular format for the textual labels.
  2. It is often the case that no URIs are available for the used vocabularies.

This situation makes it difficult to implement vocabulary mapping effectively.

For this reason, DataCite+DCAT-AP preserves keywords from controlled vocabularies as textual labels.
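A minimal sketch of this choice is given below, based on the subject mappings listed in the mapping summary: a subject with a scheme becomes a skos:Concept carrying the original label, and a subject without a scheme becomes a plain keyword. The function name `map_subject` and the dictionary-based representation are assumptions made for this example.

```python
# Illustrative sketch only; names and data structures are hypothetical.
def map_subject(label, subject_scheme=None, scheme_uri=None):
    """Map a DataCite subject to (property, value), keeping the textual label."""
    if subject_scheme is None and scheme_uri is None:
        # Free-text keyword.
        return "dcat:keyword", label
    # Keyword from a controlled vocabulary: the label is preserved as is.
    concept = {"rdf:type": "skos:Concept", "skos:prefLabel": label}
    scheme = {"rdf:type": "skos:ConceptScheme"}
    if scheme_uri is not None:
        scheme["@rdf:about"] = scheme_uri
    if subject_scheme is not None:
        scheme["dct:title"] = subject_scheme
    concept["skos:inScheme"] = scheme
    return "dct:subject", concept


print(map_subject("air quality"))
print(map_subject("Environment", subject_scheme="Example thesaurus"))
```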

Used namespaces

Prefix | Namespace URI | Schema & documentation
adms | http://www.w3.org/ns/adms# | Asset Description Metadata Schema [[VOCAB-ADMS]]
dc | http://purl.org/dc/elements/1.1/ | Dublin Core Metadata Element Set, Version 1.1 [[DC11]]
dcat | http://www.w3.org/ns/dcat# | Data Catalog Vocabulary [[VOCAB-DCAT]]
dct | http://purl.org/dc/terms/ | DCMI Metadata Terms [[DCTERMS]]
duv | http://www.w3.org/ns/duv# | Dataset Usage Vocabulary [[VOCAB-DUV]]
foaf | http://xmlns.com/foaf/0.1/ | FOAF Vocabulary [[FOAF]]
frapo | http://purl.org/cerif/frapo/ | FRAPO, the Funding, Research Administration and Projects Ontology [[FRAPO]]
geo | http://www.w3.org/2003/01/geo/wgs84_pos# | W3C Basic Geo (WGS84 lat/long) vocabulary [[W3C-BASIC-GEO]]
gsp | http://www.opengis.net/ont/geosparql# | GeoSPARQL - A Geographic Query Language for RDF Data [[GeoSPARQL]]
locn | http://www.w3.org/ns/locn# | ISA Programme Core Location Vocabulary [[LOCN]]
org | http://www.w3.org/ns/org# | The Organization Ontology [[VOCAB-ORG]]
owl | http://www.w3.org/2002/07/owl# | OWL Web Ontology Language Reference [[OWL-REF]]
prov | http://www.w3.org/ns/prov# | PROV-O: The PROV Ontology [[PROV-O]]
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Resource Description Framework (RDF): Concepts and Abstract Syntax [[RDF-CONCEPTS]]
rdfs | http://www.w3.org/2000/01/rdf-schema# | RDF Vocabulary Description Language 1.0: RDF Schema [[RDF-SCHEMA]]
schema | http://schema.org/ | schema.org [[SCHEMA-ORG]]
skos | http://www.w3.org/2004/02/skos/core# | SKOS Simple Knowledge Organization System - Reference [[SKOS-REFERENCE]]
vcard | http://www.w3.org/2006/vcard/ns# | vCard Ontology [[vCARD-RDF]]
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema Part 2: Datatypes Second Edition [[XMLSCHEMA-2]]
wdrs | https://www.w3.org/2007/05/powder-s# | Protocol for Web Description Resources (POWDER): POWDER-S Vocabulary (WDRS) [[POWDER-S]]

Reference code lists for metadata elements

DataCite metadata element | Code list URI | Code list | Status
Language | http://publications.europa.eu/resource/authority/language | Language register operated by the Metadata Registry of the Publications Office of the EU [[MDR-LANG]] | stable
Format | http://publications.europa.eu/resource/authority/file-type | File type register operated by the Metadata Registry of the Publications Office of the EU [[MDR-FT]] | stable
Format | http://www.iana.org/assignments/media-types | IANA MIME Media Types register [[IANA-MEDIA-TYPES]] | testing

Mapping summary

The following section summarises the alignments defined in DataCite+DCAT-AP.

The alignments are grouped as follows:

The alignments supported only in the extended profile of DataCite+DCAT-AP are in bold.

1st-level mappings

The mappings illustrated in this section concern the 1st-level elements in the DataCite metadata schema.

These elements specify properties / relationships that, in some cases, can be further specialised with an attribute denoting their sub-type (e.g., the "type" of resource, the "type" of contributor, the "type" of related resource). For this reason, elements having a "type" attribute have both a default mapping for the element, and a specific mapping for the type. The default mapping is used in the following cases:

As a rule, the domain of the mappings is the one corresponding to the ResourceType element (i.e., rdfs:Resource, dcat:Dataset, dctype:Service, or dctype:Event). However, "starred" elements - i.e., elements whose name is preceded by an asterisk ("*") - are those having as domain dcat:Distribution when the resource is modelled as a dcat:Dataset.

Element | Type | Property or RDF/XML attribute | Range | Mapping status | Comments
Identifier | | @rdf:about | rdfs:Resource (URI reference) | testing |
Identifier | | dct:identifier | xsd:anyURI | testing |
Identifier | | dcat:landingPage | rdfs:Resource (URI reference) | testing | If the resource is modelled as a dcat:Dataset
Identifier | | foaf:page | rdfs:Resource (URI reference) | testing | If the resource is not modelled as a dcat:Dataset
* Identifier | | dcat:accessURL | rdfs:Resource (URI reference) | testing | If the resource is modelled as a dcat:Dataset, the domain is dcat:Distribution
Creator | | dct:creator | foaf:Agent | testing |
Title | default | dct:title | rdf:PlainLiteral | testing |
Title | AlternativeTitle | dct:alternative | rdf:PlainLiteral | testing |
Title | Subtitle | ??:?? | rdf:PlainLiteral | unstable | TBD
Title | TranslatedTitle | dct:title | rdf:PlainLiteral | testing |
Publisher | | dct:publisher | foaf:Agent | testing |
PublicationYear | | dct:issued | xsd:gYear | testing |
Subject | | dct:subject | skos:Concept | testing | If the subject is associated with a subject scheme
Subject | | dcat:keyword | rdf:PlainLiteral | testing | If the subject is not associated with a subject scheme
Contributor | default | dct:contributor | foaf:Agent | testing | Only for the extended profile
Contributor | ContactPerson | dcat:contactPoint | vcard:Individual | testing |
Contributor | DataCollector | ??:?? | foaf:Agent | unstable | TBD
Contributor | DataCurator | ??:?? | foaf:Agent | unstable | TBD
Contributor | DataManager | ??:?? | foaf:Agent | unstable | TBD
Contributor | Distributor | duv:hasDistributor | foaf:Agent | testing | Only for the extended profile
Contributor | Editor | schema:editor | foaf:Agent | testing | Only for the extended profile
Contributor | Funder | schema:funder | foaf:Agent | testing | Only for the extended profile. This element has been deprecated in DataCite 4.0, in favour of new element FundingReference.
Contributor | HostingInstitution | ??:?? | foaf:Agent | unstable | TBD
Contributor | Producer | schema:producer | foaf:Agent | testing | Only for the extended profile
Contributor | ProjectLeader | ??:?? | foaf:Agent | unstable | TBD
Contributor | ProjectManager | ??:?? | foaf:Agent | unstable | TBD
Contributor | ProjectMember | dct:contributor | foaf:Agent | testing | Only for the extended profile
* Contributor | ProjectMember | foaf:member | foaf:Agent | testing | Only for the extended profile. The domain of property foaf:member is class foaf:Project. The resource is linked to foaf:Project with property prov:wasGeneratedBy.
Contributor | RegistrationAgency | ??:?? | foaf:Agent | unstable | TBD
Contributor | RegistrationAuthority | ??:?? | foaf:Agent | unstable | TBD
Contributor | RelatedPerson | ??:?? | foaf:Agent | unstable | TBD
Contributor | Researcher | ??:?? | foaf:Agent | unstable | TBD
Contributor | ResearchGroup | ??:?? | foaf:Agent | unstable | TBD
Contributor | RightsHolder | dct:rightsHolder | foaf:Agent | testing | Only for the extended profile
Contributor | Sponsor | schema:sponsor | foaf:Agent | testing | Only for the extended profile
Contributor | Supervisor | ??:?? | foaf:Agent | unstable | TBD
Contributor | WorkPackageLeader | ??:?? | foaf:Agent | unstable | TBD
Contributor | Other | dct:contributor | foaf:Agent | testing | Only for the extended profile
Date | default | dct:date | xsd:date | testing | Only for the extended profile
Date | Accepted | dct:dateAccepted | xsd:date | testing | Only for the extended profile
Date | Available | dct:available | xsd:date | testing | Only for the extended profile
Date | Copyrighted | dct:dateCopyrighted | xsd:date | testing | Only for the extended profile
Date | Collected | dct:created | xsd:date | unstable | TBD
Date | Created | dct:created | xsd:date | testing | Only for the extended profile
Date | Issued | dct:issued | xsd:date | testing |
Date | Submitted | dct:dateSubmitted | xsd:date | testing | Only for the extended profile
Date | Updated | dct:modified | xsd:date | testing |
Date | Valid | dct:valid | xsd:date | testing | Only for the extended profile
Language | | dct:language | dct:LinguisticSystem | testing |
ResourceType | default | rdf:type | rdfs:Resource | unstable | TBD
ResourceType | Audiovisual | rdf:type | dcat:Dataset | testing |
ResourceType | Audiovisual | dct:type | dctype:MovingImage | testing | Only for the extended profile
ResourceType | Collection | rdf:type | dcat:Dataset | testing |
ResourceType | Collection | dct:type | dctype:Collection | testing | Only for the extended profile
ResourceType | DataPaper | rdf:type | dcat:Dataset | testing | Added in DataCite v4.1
ResourceType | DataPaper | dct:type | ??:?? | unstable | TBD
ResourceType | Dataset | rdf:type | dcat:Dataset | testing |
ResourceType | Dataset | dct:type | dctype:Dataset | testing | Only for the extended profile
ResourceType | Event | rdf:type | dctype:Event | testing | Only for the extended profile
ResourceType | Event | dct:type | dctype:Event | testing | Only for the extended profile
ResourceType | Image | rdf:type | dcat:Dataset | testing |
ResourceType | Image | dct:type | dctype:Image | testing | Only for the extended profile
ResourceType | InteractiveResource | rdf:type | dcat:Dataset | testing |
ResourceType | InteractiveResource | dct:type | dctype:InteractiveResource | testing | Only for the extended profile
ResourceType | Model | rdf:type | dcat:Dataset | testing |
ResourceType | Model | dct:type | ??:?? | unstable | TBD
ResourceType | PhysicalObject | rdf:type | dctype:PhysicalObject | testing | Only for the extended profile
ResourceType | PhysicalObject | dct:type | dctype:PhysicalObject | testing | Only for the extended profile
ResourceType | Service | rdf:type | dctype:Service | testing | Only for the extended profile
ResourceType | Service | dct:type | dctype:Service | testing | Only for the extended profile
ResourceType | Software | rdf:type | dcat:Dataset | testing |
ResourceType | Software | dct:type | dctype:Software | testing | Only for the extended profile
ResourceType | Sound | rdf:type | dcat:Dataset | testing |
ResourceType | Sound | dct:type | dctype:Sound | testing | Only for the extended profile
ResourceType | Text | rdf:type | dcat:Dataset | testing |
ResourceType | Text | dct:type | dctype:Text | testing | Only for the extended profile
ResourceType | Workflow | rdf:type | dcat:Dataset | testing |
ResourceType | Workflow | dct:type | ??:?? | unstable | TBD
ResourceType | Other | rdf:type | rdfs:Resource | unstable | TBD
ResourceType | Other | dct:type | ??:?? | unstable | TBD
AlternateIdentifier | | owl:sameAs | URI reference | testing |
AlternateIdentifier | | adms:identifier | adms:Identifier | testing |
RelatedIdentifier | default | dct:relation | rdfs:Resource | testing |
RelatedIdentifier | IsCitedBy | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | Cites | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsSupplementTo | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsSupplementedBy | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsContinuedBy | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | Continues | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | HasMetadata | foaf:isPrimaryTopicOf | dcat:CatalogRecord (URI reference) | testing |
RelatedIdentifier | IsMetadataFor | foaf:primaryTopic | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsNewVersionOf | dct:isVersionOf | rdfs:Resource | testing |
RelatedIdentifier | IsPreviousVersionOf | dct:hasVersion | rdfs:Resource | testing |
RelatedIdentifier | IsPartOf | dct:isPartOf | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | HasPart | dct:hasPart | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsReferencedBy | dct:isReferencedBy | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | References | dct:references | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsDocumentedBy | foaf:page | rdfs:Resource | testing |
RelatedIdentifier | Documents | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsCompiledBy | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | Compiles | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsVariantFormOf | schema:isVariantOf | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsOriginalFormOf | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsIdenticalTo | owl:sameAs | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsReviewedBy | schema:review | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | Reviews | schema:itemReviewed | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsDerivedFrom | dct:source | rdfs:Resource | testing |
RelatedIdentifier | IsSourceOf | prov:hadDerivation | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | Describes | ??:?? | rdfs:Resource | unstable | TBD
RelatedIdentifier | IsDescribedBy | wdrs:describedby | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | HasVersion | dct:hasVersion | rdfs:Resource | testing |
RelatedIdentifier | IsVersionOf | dct:isVersionOf | rdfs:Resource | testing |
RelatedIdentifier | Requires | dct:requires | rdfs:Resource | testing | Only for the extended profile
RelatedIdentifier | IsRequiredBy | dct:isRequiredBy | rdfs:Resource | testing | Only for the extended profile
* Size | | dct:extent | dct:SizeOrDuration | testing | If the resource is modelled as a dcat:Dataset, the domain is dcat:Distribution. Only for the extended profile.
* Format | | dct:format | dct:MediaTypeOrExtent | testing | If not specified with an IANA media type. If the resource is modelled as a dcat:Dataset, the domain is dcat:Distribution.
* Format | | dcat:mediaType | dct:MediaTypeOrExtent (URI reference) | testing | If specified with an IANA media type. If the resource is modelled as a dcat:Dataset, the domain is dcat:Distribution.
Version | | owl:versionInfo | rdf:PlainLiteral | testing |
* Rights | | dct:rights | dct:RightsStatement | testing | If the resource is modelled as a dcat:Dataset, the domain is dcat:Distribution.
Description | default | dct:description | rdf:PlainLiteral | testing |
Description | Abstract | dct:description | rdf:PlainLiteral | testing |
Description | Methods | dct:provenance | dct:ProvenanceStatement | testing |
Description | SeriesInformation | ??:?? | ??:?? | unstable | TBD
Description | TableOfContents | dct:tableOfContents | rdf:PlainLiteral | testing | Only for the extended profile.
Description | Other | rdfs:comment | rdf:PlainLiteral | testing | Only for the extended profile.
GeoLocation | | dct:spatial | dct:Location | testing |
FundingReference | | frapo:isFundedBy | foaf:Project | testing | Element added in DataCite 4.0. Only for the extended profile.
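To illustrate how a type-specific mapping and the default mapping interact, the following Python sketch covers the "Date" element. The function name `map_date_type` and the `extended` flag are assumptions made for this example. The sketch also assumes that the default mapping (dct:date) applies whenever no stable type-specific mapping exists, which is why date type "Collected", whose mapping is marked as unstable, is routed to the default here.

```python
# Illustrative sketch only; names are hypothetical.
# Date types mapped in both profiles.
CORE_DATE_PROPERTY = {
    "Issued": "dct:issued",
    "Updated": "dct:modified",
}

# Date types mapped only in the extended profile.
EXTENDED_DATE_PROPERTY = {
    "Accepted": "dct:dateAccepted",
    "Available": "dct:available",
    "Copyrighted": "dct:dateCopyrighted",
    "Created": "dct:created",
    "Submitted": "dct:dateSubmitted",
    "Valid": "dct:valid",
}


def map_date_type(date_type, extended=False):
    """Return the property for a DataCite date type, or None if unmapped."""
    if date_type in CORE_DATE_PROPERTY:
        return CORE_DATE_PROPERTY[date_type]
    if not extended:
        # The core profile has no mapping for the other date types.
        return None
    # Assumption: fall back to the default mapping when no stable
    # type-specific mapping is available.
    return EXTENDED_DATE_PROPERTY.get(date_type, "dct:date")


print(map_date_type("Issued"))
print(map_date_type("Valid", extended=True))
```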

2nd-level mappings

The mappings illustrated in this section concern the 2nd-level elements in the DataCite metadata schema.

These elements, and the corresponding mappings, are grouped in the following classes:

Elements with child elements

Element Child elements Mappings Mapping status Comments
Domain Property or RDF/XML attribute Range
Creator creatorName foaf:Agent foaf:name rdf:PlainLiteral testing
givenName foaf:givenName rdf:PlainLiteral testing
familyName foaf:familyName rdf:PlainLiteral testing
nameIdentifier @rdf:about URI reference testing
affiliation org:memberOf foaf:Organization testing
Contributor contributorName foaf:Agent foaf:name rdf:PlainLiteral testing
vcard:Individual vcard:fn rdf:PlainLiteral testing If the contributor type is "ContactPerson"
givenName foaf:Agent foaf:givenName rdf:PlainLiteral testing
vcard:Individual vcard:given-name rdf:PlainLiteral testing If the contributor type is "ContactPerson"
familyName foaf:Agent foaf:familyName rdf:PlainLiteral testing
vcard:Individual vcard:family-name rdf:PlainLiteral testing If the contributor type is "ContactPerson"
nameIdentifier foaf:Agent @rdf:about URI reference testing
vcard:Individual testing If the contributor type is "ContactPerson"
affiliation foaf:Agent org:memberOf foaf:Organization testing
vcard:Individual vcard:organization-name rdf:PlainLiteral testing If the contributor type is "ContactPerson"
GeoLocation geoLocationPoint dct:Location geo:lat_long rdfs:Literal testing In DataCite 4.0, this information is specified by using 2 child elements - namely, pointLatitude and pointLongitude. Earlier versions of DataCite use a literal instead.
locn:geometry gsp:gmlLiteral testing
gsp:wktLiteral
geoLocationBox locn:geometry gsp:wktLiteral testing In DataCite 4.0, this information is specified by using 4 child elements - namely, northBoundLatitude, eastBoundLongitude, southBoundLatitude, and westBoundLongitude. Earlier versions of DataCite use a literal instead.
gsp:gmlLiteral
schema:box rdfs:Literal testing
geoLocationPolygon locn:geometry gsp:wktLiteral testing Element added in DataCite 4.0. The polygon vertices are specified by using child element geoPolygonPoint. The coordinates of each vertex are specified by using two child elements - respectively, pointLatitude and pointLongitude.
gsp:gmlLiteral
schema:polygon rdfs:Literal testing
FundingReference awardNumber foaf:Project dct:identifier xsd:string | xsd:anyURI testing
awardTitle dct:title rdf:PlainLiteral testing
* funderName foaf:Organization foaf:name rdf:PlainLiteral testing The domain is foaf:Organization. The "funding project" (foaf:Project) is linked to the "funder" (foaf:Organization) by using property frapo:isAwardedBy.

* funderIdentifier dct:identifier xsd:string | xsd:anyURI testing
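As an illustration of the GeoLocation rows above, the following Python sketch builds the lexical form of a gsp:wktLiteral from the DataCite 4.0 child elements. It is not taken from the XSLT implementation: the function names are hypothetical, and the longitude/latitude axis order is an assumption based on the default coordinate reference system of GeoSPARQL (CRS84).

```python
# Minimal sketch, not the reference implementation.
# Assumption: WKT coordinates are written as "longitude latitude" (CRS84).

def point_to_wkt(point_latitude, point_longitude):
    """geoLocationPoint (pointLatitude, pointLongitude) -> WKT POINT."""
    return f"POINT({point_longitude} {point_latitude})"


def box_to_wkt(north, east, south, west):
    """geoLocationBox (4 bounding coordinates) -> WKT POLYGON (closed ring)."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def polygon_to_wkt(polygon_points):
    """geoLocationPolygon: list of (pointLatitude, pointLongitude) vertices."""
    ring = [(lon, lat) for lat, lon in polygon_points]
    # A WKT ring must end on its first vertex; close it if the input does not.
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"
```

The resulting string would be used as the value of locn:geometry, typed as gsp:wktLiteral. The GML, schema:box and schema:polygon encodings listed in the table are not covered by this sketch.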

Elements with attributes

Element Textual content & attributes Mappings Mapping status Comments
Domain Property or RDF/XML attribute Range
Subject textual content skos:Concept skos:prefLabel rdf:PlainLiteral testing
@schemeURI skos:inScheme skos:ConceptScheme (URI reference) testing
* @subjectScheme skos:ConceptScheme dct:title rdf:PlainLiteral testing The domain is skos:ConceptScheme
Rights textual content dct:RightsStatement rdfs:label rdf:PlainLiteral testing
@rightsURI @rdf:about URI reference testing
awardNumber textual content foaf:Project dct:identifier xsd:string | xsd:anyURI testing
@awardURI @rdf:about URI reference testing
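To make the Subject rows above more concrete, the following Python sketch lists the triples that the mapping would produce for one Subject element. It is only an illustration: the function name, the blank node labels and the representation of triples as tuples of strings are assumptions, and the use of a blank node when @schemeURI is missing is a guess, not something stated in the table.

```python
# Minimal sketch, not the reference implementation.
# Triples are plain (subject, predicate, object) tuples of strings.

def subject_triples(label, scheme_uri=None, subject_scheme=None):
    """Map a DataCite Subject (text, @schemeURI, @subjectScheme) to triples."""
    concept = "_:subject"  # hypothetical blank node label
    triples = [
        (concept, "rdf:type", "skos:Concept"),
        (concept, "skos:prefLabel", f'"{label}"'),
    ]
    if scheme_uri or subject_scheme:
        # Assumption: fall back to a blank node when no @schemeURI is given.
        scheme = f"<{scheme_uri}>" if scheme_uri else "_:scheme"
        triples.append((concept, "skos:inScheme", scheme))
        triples.append((scheme, "rdf:type", "skos:ConceptScheme"))
        if subject_scheme:
            # The domain of dct:title is the concept scheme, not the concept.
            triples.append((scheme, "dct:title", f'"{subject_scheme}"'))
    return triples
```

Note that @subjectScheme is attached to the concept scheme rather than to the concept, which is what the asterisk in the table signals.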

Identifiers

DataCite supports the use of persistent identifiers to denote resources, resource creators / contributors, and funders.

In DataCite, such identifiers are specified as follows:

In DataCite+DCAT-AP, all these identifiers are mapped to URIs, by concatenating the identifier in the DataCite record with a URI prefix defined for each identifier type / scheme. Whenever possible, dereferenceable HTTP URIs/URLs are used; otherwise, URNs.

Notably, DataCite provides code lists for the types / schemes of identifiers used to denote resources and funders, but no code list is defined in DataCite for types / schemes of identifiers used to denote resource creators / contributors (the specification uses, as an example, "ORCID" and "ISNI").

However, DataCite does not specify a code list for scheme URIs. Consequently, the mapping between identifier types / schemes and URI prefixes implemented in DataCite+DCAT-AP is based on the relevant registries and on the examples in the DataCite metadata schema specification. No URI prefix is used if the identifier is already a URI (as is the case for URLs and URNs).

The following table shows, for each identifier type / scheme, the URI prefix used in DataCite+DCAT-AP, along with examples of the results of such mappings. As mentioned above, all the identifier types / schemes in the table are defined in a code list in the DataCite metadata schema, with the exception of ORCID and ISNI (however, ISNI is defined in the code list for funder identifier types).

Identifier type / scheme Element(s) URI prefix used in DataCite+DCAT-AP Example Mapping status Comments
Original Transformed
ORCID nameIdentifier http://orcid.org/ 0000-0002-7285-027X http://orcid.org/0000-0002-7285-027X testing
ISNI nameIdentifier http://www.isni.org/ 0000000121032683 http://www.isni.org/0000000121032683 testing
funderIdentifier
GRID funderIdentifier https://www.grid.ac/institutes/ grid.270680.b https://www.grid.ac/institutes/grid.270680.b testing
CrossRef Funder ID funderIdentifier https://doi.org/ 10.13039/501100000900 https://doi.org/10.13039/501100000900 testing
DOI Identifier https://doi.org/ 10.1016/j.epsl.2011.11.037 https://doi.org/10.1016/j.epsl.2011.11.037 testing
AlternateIdentifier
RelatedIdentifier
ARK AlternateIdentifier http://n2t.net/ ark:/67531/metapth346793/ http://n2t.net/ark:/67531/metapth346793/ testing
RelatedIdentifier
arΧiv AlternateIdentifier http://arxiv.org/abs/ arXiv:0706.0001 http://arxiv.org/abs/0706.0001 testing The URI prefix replaces the namespace prefix arXiv: in the original identifier
RelatedIdentifier
bibcode AlternateIdentifier http://adsabs.harvard.edu/abs/ 2014Wthr...69...72C http://adsabs.harvard.edu/abs/2014Wthr...69...72C testing
RelatedIdentifier
EAN13 AlternateIdentifier urn:ean-13: 9783468111242 urn:ean-13:9783468111242 unstable
RelatedIdentifier
EISSN AlternateIdentifier urn:issn: 1562-6865 urn:issn:1562-6865 unstable
RelatedIdentifier
Handle AlternateIdentifier http://hdl.handle.net/ 10013/epic.10033 http://hdl.handle.net/10013/epic.10033 testing
RelatedIdentifier
IGSN AlternateIdentifier http://hdl.handle.net/10273/ (https://doi.org/10273/) SSH000SUA http://hdl.handle.net/10273/SSH000SUA (https://doi.org/10273/SSH000SUA) stable Identifier type added in DataCite 4.0.
RelatedIdentifier
ISBN AlternateIdentifier urn:isbn: 978-3-905673-82-1 urn:isbn:978-3-905673-82-1 unstable
RelatedIdentifier
ISSN AlternateIdentifier urn:issn: 0077-5606 urn:issn:0077-5606 unstable
RelatedIdentifier
ISTC AlternateIdentifier http://istc-search-beta.peppertag.com/ptproc/IstcSearch?tFrame=IstcListing&esfIstc= A12-2014-00013328-5 http://istc-search-beta.peppertag.com/ptproc/IstcSearch?tFrame=IstcListing&tForceNewQuery=Yes&esfIstc=A12-2014-00013328-5 testing
RelatedIdentifier
LISSN AlternateIdentifier urn:issn: 1188-1534 urn:issn:1188-1534 unstable
RelatedIdentifier
LSID AlternateIdentifier urn:lsid:ubio.org:namebank:11815 urn:lsid:ubio.org:namebank:11815 testing LSIDs are implemented as URNs, following the pattern urn:lsid:authority:namespace:identifier:revision. URNs are URIs - no need for a URI prefix.
RelatedIdentifier
PMID AlternateIdentifier http://www.ncbi.nlm.nih.gov/pubmed/ 12082125 http://www.ncbi.nlm.nih.gov/pubmed/12082125 testing
RelatedIdentifier
PURL AlternateIdentifier http://purl.org/dc/terms/ http://purl.org/dc/terms/ testing PURLs are HTTP URIs - no need for a URI prefix.
RelatedIdentifier
UPC AlternateIdentifier urn:upc: 123456789999 urn:upc:123456789999 unstable
RelatedIdentifier
URL AlternateIdentifier http://www.heatflow.und.edu/index2.html http://www.heatflow.und.edu/index2.html testing URLs are URIs - no need for a URI prefix.
RelatedIdentifier
URN AlternateIdentifier urn:nbn:de:101:1-201102033592 urn:nbn:de:101:1-201102033592 testing URNs are URIs - no need for a URI prefix.
RelatedIdentifier
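The concatenation rule described in this section can be sketched in Python as follows. This is an illustration only, not the XSLT implementation: the function and variable names are hypothetical, the dictionary keys reuse the labels from the table above, and ISTC is left out because the prefix and the transformed example in the table differ. Returning None for an unknown type is an assumption of this sketch.

```python
# Minimal sketch, not the reference implementation.
# URI prefixes copied from the table above, keyed by identifier type / scheme.
URI_PREFIXES = {
    "ORCID": "http://orcid.org/",
    "ISNI": "http://www.isni.org/",
    "GRID": "https://www.grid.ac/institutes/",
    "CrossRef Funder ID": "https://doi.org/",
    "DOI": "https://doi.org/",
    "ARK": "http://n2t.net/",
    "arXiv": "http://arxiv.org/abs/",
    "bibcode": "http://adsabs.harvard.edu/abs/",
    "EAN13": "urn:ean-13:",
    "EISSN": "urn:issn:",
    "Handle": "http://hdl.handle.net/",
    "IGSN": "http://hdl.handle.net/10273/",
    "ISBN": "urn:isbn:",
    "ISSN": "urn:issn:",
    "LISSN": "urn:issn:",
    "PMID": "http://www.ncbi.nlm.nih.gov/pubmed/",
    "UPC": "urn:upc:",
}

# Identifier types whose values are already URIs: no prefix is added.
ALREADY_URI = {"LSID", "PURL", "URL", "URN"}


def identifier_to_uri(id_type, value):
    """Return the URI for a DataCite identifier, or None for unknown types."""
    value = value.strip()
    if id_type in ALREADY_URI:
        return value
    # For arXiv, the URI prefix replaces the "arXiv:" namespace prefix.
    if id_type == "arXiv" and value.lower().startswith("arxiv:"):
        value = value[len("arxiv:"):]
    prefix = URI_PREFIXES.get(id_type)
    if prefix is None:
        return None
    return prefix + value
```

For instance, identifier_to_uri("DOI", "10.1016/j.epsl.2011.11.037") yields the transformed value shown in the DOI row of the table.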