Discoverability
Datasets are made discoverable by a variety of methods.
DataCite Integration
If you are using DataCite as your DOI provider, when datasets are published, metadata is pushed to DataCite, where it can be searched. For more information, see Legacy Single PID Provider: :DoiProvider in the Installation Guide.
OAI-PMH (Harvesting)
The Dataverse software supports a protocol called OAI-PMH that facilitates harvesting dataset metadata from one system into another. For details on harvesting, see the Managing Harvesting Server and Sets section.
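As a rough illustration of what a harvesting client sends, the sketch below builds an OAI-PMH request URL. The verb ListRecords and the metadataPrefix value oai_dc come from the OAI-PMH protocol itself. The base URL is a hypothetical placeholder, and the helper name oai_request_url is invented for this example, so substitute the OAI endpoint of the installation you want to harvest.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL: the verb plus its arguments as a query string."""
    query = {"verb": verb}
    query.update(arguments)
    return f"{base_url}?{urlencode(query)}"


# Hypothetical base URL (an assumption, not taken from this guide).
url = oai_request_url(
    "https://dataverse.example.edu/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

A harvesting client would fetch this URL and follow any resumption tokens in the response; that part is left out of the sketch.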
Machine-Readable Metadata on Dataset Landing Pages
As recommended in A Data Citation Roadmap for Scholarly Data Repositories, the Dataverse software embeds metadata on dataset landing pages in a variety of machine-readable ways.
Croissant Metadata in the <head> of Dataset Landing Pages
Croissant is a metadata format for machine learning datasets.
In Dataverse, the <head> of the HTML source of a dataset landing page includes Croissant metadata like this:
<script type="application/ld+json">{"@context":..."cr":"http://mlcommons.org/croissant/"...
This is the same Croissant file you can download from a dataset landing page by clicking “Metadata” then “Export Metadata” (see Supported Metadata Export Formats) or retrieve through the API (see croissant at Export Metadata of a Dataset in Various Formats).
We include Croissant in the <head> because it’s recommended by Google for Google Dataset Search, where they offer a filter to narrow results to only datasets with support for Croissant.
Before Croissant was invented, Google recommended a different format that Dataverse refers to as “Schema.org JSON-LD” in the user interface (and schema.org in the API). If you prefer to put that older format in the <head>, which was the behavior in older versions of Dataverse, see dataverse.feature.legacy-format-in-head.
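To show how a client might read the metadata embedded in the <head>, here is a minimal sketch that pulls every application/ld+json script out of an HTML document using only the Python standard library. The sample HTML is a made-up, heavily shortened stand-in for a real landing page. The names JsonLdExtractor, extract_jsonld, and looks_like_croissant are invented for this example, and the Croissant check is a simple heuristic that assumes the "cr" prefix sits inside "@context".

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed body of each <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


def looks_like_croissant(doc):
    """Heuristic (an assumption): the context maps "cr" to the Croissant namespace."""
    context = doc.get("@context")
    return isinstance(context, dict) and context.get("cr") == "http://mlcommons.org/croissant/"


# Made-up sample; a real landing page carries a much larger document.
SAMPLE_HTML = (
    '<html><head><script type="application/ld+json">'
    '{"@context": {"cr": "http://mlcommons.org/croissant/"}, "name": "Example dataset"}'
    "</script></head><body></body></html>"
)

docs = extract_jsonld(SAMPLE_HTML)
print(docs[0]["name"], looks_like_croissant(docs[0]))
```

If an installation has switched to the older format in the <head>, the same extraction applies, but the heuristic check would be expected to return False.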
Signposting
The Dataverse software supports Signposting. This allows machines to request more information about a dataset through the Link HTTP header. Links to all enabled metadata export formats are given. See Supported Metadata Export Formats for a list.
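The sketch below shows one way a client could read Signposting links from a Link header: it sends a HEAD request and splits the header into targets and parameters. The function names are invented for this example and the sample header value is made up. The parser handles only the simple `<url>; name="value"` form and is not a complete implementation of the Link header grammar.

```python
import re
import urllib.request

# One link: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def fetch_link_header(url):
    """Send an HTTP HEAD request and return the combined Link header value."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return ", ".join(response.headers.get_all("Link") or [])


def parse_link_header(value):
    """Split a Link header value into (target, parameters) pairs."""
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def targets_with_rel(links, rel):
    """Return the targets whose rel parameter includes the given relation type."""
    return [target for target, params in links if rel in params.get("rel", "").split()]


# Made-up header value for illustration; real headers list more links.
SAMPLE = (
    '<https://example.org/author/1>; rel="author", '
    '<https://example.org/linkset>; rel="linkset"; type="application/linkset+json"'
)
links = parse_link_header(SAMPLE)
print(targets_with_rel(links, "linkset"))
```

Against a live installation, the value returned by fetch_link_header for a dataset landing page URL could be passed to parse_link_header in place of SAMPLE.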
There are two Signposting profile levels, level 1 and level 2. In this implementation:
- Level 1 links are shown, as recommended, in the “Link” HTTP header, which can be fetched by sending an HTTP HEAD request, e.g. curl -I "https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/KPY4ZC". The number of author and file links in the level 1 header can be configured as described below.
- The level 2 linkset can be fetched by visiting the dedicated linkset page for that artifact. The link to that page appears among the level 1 links with rel="linkset".
Note: Authors without an author link are neither counted nor shown in any profile or linkset.
The following configuration options are available:
dataverse.signposting.level1-author-limit
Sets the maximum number of authors shown in the level 1 profile. If the number of authors (with identifier URLs) exceeds this value, no author links are shown in the level 1 profile. The default is 5.
dataverse.signposting.level1-item-limit
Sets the maximum number of items/files shown in the level 1 profile. Datasets whose file count exceeds this value show no file links in the level 1 profile; their file links appear in the level 2 linkset only. The default is 5.
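The all-or-nothing behavior of these two limits can be sketched as follows. This only illustrates the rule as described here and is not the Dataverse implementation; the function names and sample URLs are invented for the example.

```python
def level1_author_links(author_urls, limit=5):
    """Authors without a URL are ignored; if the rest exceed the limit, show none."""
    with_url = [url for url in author_urls if url]
    return with_url if len(with_url) <= limit else []


def level1_item_links(file_urls, limit=5):
    """If the dataset has more files than the limit, show no file links at level 1."""
    return list(file_urls) if len(file_urls) <= limit else []


# Two authors have identifier URLs and one does not, so two links are shown.
few_authors = level1_author_links(
    ["https://orcid.example/1", None, "https://orcid.example/2"]
)

# Six files exceed the default limit of 5, so level 1 shows no file links.
many_files = level1_item_links([f"https://example.org/file/{n}" for n in range(6)])

print(few_authors, many_files)
```

Note that exceeding a limit does not truncate the list to the first five entries; the links are dropped from level 1 altogether.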
See also Retrieve Signposting Information in the API Guide.
Additional Discoverability Through Integrations
See Discoverability in the Integrations section for additional discovery methods you can enable.