Uploading Metadata into Cafe Variome

There are two different ways to upload metadata into Cafe Variome.

Using the editing interface

The most straightforward way is to use the editing interface to create a new meta source manually. This is the recommended approach when there are only a few records to add.

Using the file uploader

For larger quantities of metadata, or for metadata exported from another system, it is better to use the file uploader. It does, however, require the metadata to be in a specific format so that it can be ingested directly.

Metadata file format

The metadata file to be uploaded should be in JSON format. It should be an array containing one or more objects, each describing one meta source. Each object should follow the meta-source model, although some fields can be omitted or left empty. The following is an example of a metadata file containing a single entry:

[ { "sourceName": "A minimum custom source", "sourceType": "custom", "publisher": { "publisherType": "individual", "name": "John Doe", "contactEmail": "john@example.com" }, "resourceUrls": [ "https://www.example.com" ] } ]
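If the metadata is exported from another system by a script, the file can be produced with Python's standard json module. This is an illustrative sketch only. The function names and the metadata.json filename are hypothetical, and the only rule it enforces is the one described above: a top-level array of objects.

```python
import json
from pathlib import Path


def to_metadata_json(entries: list) -> str:
    """Serialise meta source entries as a top-level JSON array (hypothetical helper)."""
    if not isinstance(entries, list) or not all(isinstance(e, dict) for e in entries):
        raise TypeError("entries must be a list of objects")
    # ensure_ascii=False keeps non-ASCII text (e.g. publisher names) readable.
    return json.dumps(entries, indent=2, ensure_ascii=False)


def write_metadata_file(entries: list, path: str) -> None:
    """Write the JSON array to disk as UTF-8."""
    Path(path).write_text(to_metadata_json(entries), encoding="utf-8")


minimum_custom = {
    "sourceName": "A minimum custom source",
    "sourceType": "custom",
    "publisher": {
        "publisherType": "individual",
        "name": "John Doe",
        "contactEmail": "john@example.com",
    },
    "resourceUrls": ["https://www.example.com"],
}

if __name__ == "__main__":
    # A single entry is still wrapped in a list, so the file is a JSON array.
    write_metadata_file([minimum_custom], "metadata.json")
```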

Fields and constraints

Some of the fields in the metadata model have constraints or formatting requirements. The following sections list every field, what it is, and its constraints. Fields not marked as optional are required. Unless otherwise specified, however, they may be left empty: an empty string for text fields, or 0 for numeric fields. All field names are case-sensitive.

Metadata entries cannot contain additional fields, except those placed in the customFields object or fields labeled as "custom". Any other additional fields are ignored: they are not stored or represented in any way.
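You can preview this "ignored, not stored" behaviour before uploading to see which keys would be discarded. This is a sketch, not Cafe Variome's implementation, and it rests on several assumptions:

- the allowed-field sets below are inferred from the field lists on this page;
- only top-level keys are checked;
- type-specific fields on a different source type are treated as additional fields;
- fields labeled as "custom" are not modeled.

The function name is hypothetical.

```python
# Top-level keys inferred from this page (an assumption, not read from Cafe Variome's code).
COMMON_FIELDS = {
    "sourceId", "connectionId", "sourceName", "sourceType", "resourceUrls",
    "publisher", "description", "themes", "releaseLicense", "language", "customFields",
}
TYPE_FIELDS = {
    "cohort": {"cohortDetails", "collectedTypes", "datasetIds"},
    "dataset": {"datasetVersions"},
    "dataCollection": {"dataCollectionDetails", "dataCollectionContent"},
}


def drop_unknown_fields(entry: dict) -> tuple:
    """Return (entry without undocumented top-level keys, sorted list of dropped keys)."""
    source_type = entry.get("sourceType")
    extra = TYPE_FIELDS.get(source_type, set()) if isinstance(source_type, str) else set()
    allowed = COMMON_FIELDS | extra
    kept = {key: value for key, value in entry.items() if key in allowed}
    dropped = sorted(key for key in entry if key not in allowed)
    return kept, dropped
```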

Internal fields and manual assignment

The fields described in the sections below form the metadata about a given resource. Cafe Variome also uses several other fields internally to enable specific features, such as interlinking metadata entries. These fields are not visible in the editing interface. They may still be assigned manually, provided the values are accurate and correctly formatted.

sourceId: The UUID of the source. If omitted, Cafe Variome assigns a UUID to the source. If present, it should be a valid UUID4 string. It identifies the source within a Cafe Variome instance (the UUID is not guaranteed to be unique across instances). It can also be used to link one metadata entry to another, for example by filling the datasetIds field in the cohort model.
connectionId: The UUID of the data source that this metadata entry describes. It is usually only valid when the metadata describes a dataset, but in rare cases it can be assigned to other metadata. Assigning this field directly is not recommended, because the UUID of a data source can only be found by checking the database.
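Python's standard uuid module can check that a supplied sourceId is a version-4 UUID, and can generate one when it is missing. Below is a minimal sketch that mirrors the documented "assign one if omitted" behaviour locally. Doing this yourself can help when other entries in the same file need to reference the ID, for example through datasetIds. The helper names are hypothetical. Note that uuid.UUID also accepts some alternative spellings (braces, a urn:uuid: prefix, no hyphens), so a stricter string check may be wanted.

```python
import uuid


def is_uuid4(value) -> bool:
    """True if value parses as a UUID whose version field is 4."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        # Malformed strings raise ValueError; non-string inputs raise TypeError/AttributeError.
        return False
    return parsed.version == 4


def ensure_source_id(entry: dict) -> dict:
    """Keep a valid sourceId, or return a copy with a freshly generated UUID4."""
    if "sourceId" not in entry:
        return {**entry, "sourceId": str(uuid.uuid4())}
    if not is_uuid4(entry["sourceId"]):
        raise ValueError(f"sourceId is not a valid UUID4: {entry['sourceId']!r}")
    return entry
```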

Common fields

These are the fields that are present in all types of meta-sources.

sourceName

string Cannot be empty.

sourceType

string Cannot be empty. Constrained vocabulary values:

custom
cohort
catalog
biobank
registry
guideline
dataset
dataCollection

resourceUrls

array[string] optional Full URI including the scheme (e.g. https://).

publisher

object Nested JSON object.

publisherType

string Cannot be empty. Constrained vocabulary values:

individual
organization
agency
other

name

string Cannot be empty.

contactEmail

string Cannot be empty.

contactName

string optional.

url

string optional Full URI including the scheme (e.g. https://).

location

string optional.

description

string optional Can be empty, but this is not recommended.

themes

array[string] optional URI format.

releaseLicense

string optional Full URI including the scheme (e.g. https://).

language

string optional Two-character code adhering to the ISO 639-1 standard, lower case (e.g. en).

customFields

object optional Key-value or key[values] pairs. Constraints:

Key must be a string and cannot contain special characters: ., $, /, or \.
If a key is present, the value cannot be `null` but can be an empty string or an empty array.
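Several of the common fields above have format rules that can be checked before uploading. The sketch below covers three of them:

- full URIs with a scheme (resourceUrls, publisher url, releaseLicense);
- the two-character lower-case language code;
- the customFields key and value rules.

It is an approximation. The URI check also requires a host part, which is stricter than "has a scheme". The language check only tests the shape of the code, not whether it is actually assigned in ISO 639-1. The function names are hypothetical.

```python
import re
from urllib.parse import urlparse

LANGUAGE_RE = re.compile(r"[a-z]{2}")
FORBIDDEN_KEY_CHARS = {".", "$", "/", "\\"}


def is_full_uri(value) -> bool:
    """Approximate 'full URI with scheme': needs a scheme and a host, e.g. https://example.com."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc)


def is_language_code(value) -> bool:
    """Shape check only: exactly two lower-case ASCII letters, e.g. 'en'."""
    return isinstance(value, str) and LANGUAGE_RE.fullmatch(value) is not None


def custom_field_errors(custom_fields: dict) -> list:
    """Return problems with a customFields object; an empty list means none were found."""
    errors = []
    for key, value in custom_fields.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
            continue
        bad = sorted(FORBIDDEN_KEY_CHARS.intersection(key))
        if bad:
            errors.append(f"key {key!r} contains forbidden characters {bad}")
        if value is None:
            # Empty strings and empty arrays are allowed; null is not.
            errors.append(f"value of {key!r} is null")
    return errors
```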

Cohort specific fields

These are the fields specific to the cohort type.

cohortDetails

object Nested JSON object.

siteType

string Cannot be empty. Constrained vocabulary values:

singleSite
multiSite
multiCountry

country

string Cannot be empty. Two-character code adhering to the ISO 3166-1 standard, upper case.

yearStart

integer Cannot be empty. Four-digit integer.
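The three cohortDetails rules translate directly into checks. This sketch makes two assumptions. It reads "four-digit integer" as 1000 to 9999. Its country check only tests for two upper-case letters and does not look the code up in the ISO 3166-1 list. The function name is hypothetical.

```python
import re

SITE_TYPES = {"singleSite", "multiSite", "multiCountry"}
COUNTRY_RE = re.compile(r"[A-Z]{2}")


def cohort_details_errors(details: dict) -> list:
    """Return rule violations for a cohortDetails object (empty list if none)."""
    errors = []
    site_type = details.get("siteType")
    if not isinstance(site_type, str) or site_type not in SITE_TYPES:
        errors.append("siteType must be one of: " + ", ".join(sorted(SITE_TYPES)))
    country = details.get("country")
    if not isinstance(country, str) or COUNTRY_RE.fullmatch(country) is None:
        errors.append("country must be a two-letter upper-case code")
    year = details.get("yearStart")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(year, int) or isinstance(year, bool) or not 1000 <= year <= 9999:
        errors.append("yearStart must be a four-digit integer")
    return errors
```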

collectedTypes

object optional Nested JSON object.

participants

object optional Nested JSON object.

diseases

array[string] Cannot be empty. Constrained vocabulary values:

controlGroup
ad
pd
irbd
dlb
caa
ftd
als
psp
cbd
msa
hd
ataxia
other

Note: An empty array will cause the entire participants object to be ignored.

numberOfSubjects

integer Must be greater than 0. If 0, the entire participants object will be ignored.
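The two notes above mean that an incomplete participants object is silently dropped rather than rejected. A small sketch of that rule can flag such objects before upload, so the omission does not go unnoticed. It assumes that a missing diseases key behaves like an empty array. It also assumes that a missing or non-integer numberOfSubjects, or any value below 1, behaves like 0. The function name is hypothetical.

```python
def effective_participants(collected_types: dict):
    """Return the participants object if it would be kept, or None if it would be ignored."""
    participants = collected_types.get("participants")
    if not isinstance(participants, dict):
        return None
    # Assumption: a missing (or null) diseases key counts as an empty array.
    diseases = participants.get("diseases") or []
    # Assumption: a missing, non-integer, or sub-1 numberOfSubjects behaves like 0.
    subjects = participants.get("numberOfSubjects", 0)
    if not diseases or not isinstance(subjects, int) or subjects < 1:
        return None
    return participants
```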

bioSamples

array[string] optional Constrained vocabulary values:

csf
serum
plasma
dna
saliva
urine
stool

images

array[string] optional Constrained vocabulary values:

mri
petAmyloid
petTau
spect
ocular

cognitiveData

array[string] optional Constrained vocabulary values:

crossSectional
longitudinal

datasetIds

array[string] optional UUID format. Datasets must be either present in the same file or already uploaded to the system.
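Because datasetIds must point to datasets that are either in the same file or already uploaded, a pre-upload check can do two things. It collects the dataset sourceIds declared in the file, then compares each cohort's datasetIds against those plus a set of IDs already known to the instance. How that existing set is obtained is not covered here, so existing_ids is a hypothetical input, as is the function name. Datasets without an explicit sourceId cannot be referenced this way, since their UUID is only assigned on upload.

```python
def unresolved_dataset_ids(entries: list, existing_ids: set) -> dict:
    """Map each cohort's sourceName to the datasetIds that cannot be resolved."""
    # Dataset sourceIds declared explicitly in this upload file.
    in_file = {
        entry["sourceId"]
        for entry in entries
        if entry.get("sourceType") == "dataset" and "sourceId" in entry
    }
    known = in_file | set(existing_ids)
    problems = {}
    for entry in entries:
        if entry.get("sourceType") != "cohort":
            continue
        missing = [d for d in entry.get("datasetIds", []) if d not in known]
        if missing:
            problems[entry.get("sourceName", "<unnamed>")] = missing
    return problems
```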

Dataset specific fields

These are the fields specific to the dataset type.

datasetVersions

array[object] Nested array of JSON objects.

datasetDetails

object Nested JSON object.

versionId: string optional Valid UUID4 string. If omitted, a UUID will be automatically assigned.
versionName: string optional Semantic versioning recommended. Non-semantic versions will disable version parsing, comparison, and sorting.
keywords: array[string] optional Array of keyword strings.
publishedDate: string optional Date string in the format of `YYYY-MM-DD`.
updateDate: string optional Date string in the format of `YYYY-MM-DD`.
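The versionName and date rules above can be sketched as follows. The version pattern is a simplified reading of semantic versioning: MAJOR.MINOR.PATCH with an optional leading v, because the examples on this page use v1.0.0. Pre-release and build suffixes are not handled. The sorting fallback imitates the documented "non-semantic versions disable sorting" behaviour by keeping the file order. These helpers are hypothetical.

```python
import re
from datetime import datetime

# Simplified semver: optional "v", then MAJOR.MINOR.PATCH (no pre-release/build parts).
SEMVER_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_version(name):
    """Return (major, minor, patch) as integers, or None for a non-semantic name."""
    match = SEMVER_RE.fullmatch(name) if isinstance(name, str) else None
    return tuple(int(part) for part in match.groups()) if match else None


def is_iso_date(value) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not isinstance(value, str) or DATE_RE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")  # rejects impossible dates such as 2023-13-02
    except ValueError:
        return False
    return True


def sort_versions(versions: list) -> list:
    """Sort datasetVersions by versionName; keep file order if any name is non-semantic."""
    keys = [parse_version(v.get("datasetDetails", {}).get("versionName")) for v in versions]
    if any(key is None for key in keys):
        return list(versions)
    ordered = sorted(zip(keys, versions), key=lambda pair: pair[0])
    return [version for _, version in ordered]
```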

datasetContent

object Nested JSON object.

numberOfSubjects

integer Must be greater than 0. May be an approximate number if the exact count is confidential.

minAge

integer optional Must be greater than 0.

maxAge

integer optional Must be greater than 0.

countries

array[string] optional Array of two-character codes adhering to the ISO 3166-1 standard, upper case.

diseases

array[string] optional Constrained vocabulary values:

controlGroup
ad
pd
irbd
dlb
caa
ftd
als
psp
cbd
msa
hd
ataxia
other

sex

array[string] optional Constrained vocabulary values:

male
female
other
undifferential
unknown

clinical

array[string] optional Constrained vocabulary values:

comorbidities
medicationUse
familyHistory
ageOfSymptomOnset
clinicalDiagnosis
exposure
lifeStyleInfo
vitalSigns

markers

array[string] optional Constrained vocabulary values:

amyloid
tau
neurofilamentLightChain
alphaSynuclein
dat

images

array[string] optional Constrained vocabulary values:

mri
petAmyloid
petTau
spect
ocular

electrophysiology

array[string] optional Constrained vocabulary values:

dataTypes

array[string] optional Constrained vocabulary values:

demographics
clinical
lifestyle
functionalRatings
motor
neuropsychiatric
neuropsychological
qualityOfLife
sleepScales
digitalData
imaging
electrophysiology
neuroPathology
other
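The datasetContent constraints can be checked in one pass. The vocabulary sets below are copied from the lists above. Electrophysiology is left out because its allowed values are not listed here, so values in that field are not checked. Country codes are only checked for shape (two upper-case letters). The function name is hypothetical.

```python
import re

# Vocabularies copied from the lists on this page; electrophysiology is omitted
# because its allowed values are not listed.
VOCABULARIES = {
    "diseases": {"controlGroup", "ad", "pd", "irbd", "dlb", "caa", "ftd", "als",
                 "psp", "cbd", "msa", "hd", "ataxia", "other"},
    "sex": {"male", "female", "other", "undifferential", "unknown"},
    "clinical": {"comorbidities", "medicationUse", "familyHistory", "ageOfSymptomOnset",
                 "clinicalDiagnosis", "exposure", "lifeStyleInfo", "vitalSigns"},
    "markers": {"amyloid", "tau", "neurofilamentLightChain", "alphaSynuclein", "dat"},
    "images": {"mri", "petAmyloid", "petTau", "spect", "ocular"},
    "dataTypes": {"demographics", "clinical", "lifestyle", "functionalRatings", "motor",
                  "neuropsychiatric", "neuropsychological", "qualityOfLife", "sleepScales",
                  "digitalData", "imaging", "electrophysiology", "neuroPathology", "other"},
}
COUNTRY_RE = re.compile(r"[A-Z]{2}")


def is_positive_int(value) -> bool:
    """True for an int greater than 0 (bool is excluded, since it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def content_errors(content: dict) -> list:
    """Return rule violations for a datasetContent object (empty list if none)."""
    errors = []
    if not is_positive_int(content.get("numberOfSubjects")):
        errors.append("numberOfSubjects must be an integer greater than 0")
    for age_field in ("minAge", "maxAge"):
        # Optional fields: only checked when present.
        if age_field in content and not is_positive_int(content[age_field]):
            errors.append(f"{age_field} must be an integer greater than 0")
    for code in content.get("countries", []):
        if not isinstance(code, str) or COUNTRY_RE.fullmatch(code) is None:
            errors.append(f"country code {code!r} is not two upper-case letters")
    for field, allowed in VOCABULARIES.items():
        for value in content.get(field, []):
            if not isinstance(value, str) or value not in allowed:
                errors.append(f"{field}: {value!r} is not an allowed value")
    return errors
```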

Data collection specific fields

These are the fields specific to the data collection type.

dataCollectionDetails

object optional Nested JSON object.

keywords: array[string] optional Array of keyword strings.
publishedDate: string optional Date string in the format of `YYYY-MM-DD`.
updateDate: string optional Date string in the format of `YYYY-MM-DD`.

dataCollectionContent

object optional Nested JSON object.

numberOfSubjects

integer Must be greater than 0. May be an approximate number if the exact count is confidential.

minAge

integer optional Must be greater than 0.

maxAge

integer optional Must be greater than 0.

countries

array[string] optional Array of two-character codes adhering to the ISO 3166-1 standard, upper case.

diseases

array[string] optional Constrained vocabulary values:

controlGroup
ad
pd
irbd
dlb
caa
ftd
als
psp
cbd
msa
hd
ataxia
other

sex

array[string] optional Constrained vocabulary values:

male
female
other
undifferential
unknown

clinical

array[string] optional Constrained vocabulary values:

comorbidities
medicationUse
familyHistory
ageOfSymptomOnset
clinicalDiagnosis
exposure
lifeStyleInfo
vitalSigns

markers

array[string] optional Constrained vocabulary values:

amyloid
tau
neurofilamentLightChain
alphaSynuclein
dat

images

array[string] optional Constrained vocabulary values:

mri
petAmyloid
petTau
spect
ocular

electrophysiology

array[string] optional Constrained vocabulary values:

dataTypes

array[string] optional Constrained vocabulary values:

demographics
clinical
lifestyle
functionalRatings
motor
neuropsychiatric
neuropsychological
qualityOfLife
sleepScales
digitalData
imaging
electrophysiology
neuroPathology
other
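The data collection content fields match the dataset content fields, but the object sits in a different place. A dataset has one datasetContent per entry in datasetVersions, while a data collection has at most one optional dataCollectionContent. The small hypothetical helper below locates the content object(s) for either type, so that the same content checks can be reused for both.

```python
def content_blocks(entry: dict) -> list:
    """Return the content object(s) an entry carries, depending on its sourceType."""
    source_type = entry.get("sourceType")
    if source_type == "dataset":
        # One datasetContent per version in datasetVersions.
        return [v.get("datasetContent", {}) for v in entry.get("datasetVersions", [])]
    if source_type == "dataCollection":
        # dataCollectionContent is optional, so it may be absent or empty.
        content = entry.get("dataCollectionContent")
        return [content] if content else []
    # Other source types carry no content object of this kind.
    return []
```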

Metadata model examples

Here are some minimum and maximum examples of metadata models for different source types.

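Minimum example for custom type:

{ "sourceName": "A minimum custom source", "sourceType": "custom", "publisher": { "publisherType": "individual", "name": "John Doe", "contactEmail": "john@example.com" }, "resourceUrls": [ "https://www.example.com" ] }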

Maximum example for custom type:

{ "sourceId": "8df136d8-7fb0-4bec-a72a-5deed972bbb6", "sourceName": "A maximum custom source", "sourceType": "custom", "publisher": { "publisherType": "organization", "name": "University of Leicester", "contactEmail": "brookeslab@le.ac.uk", "contactName": "John Doe", "url": "https://www.le.ac.uk", "location": "Leicester, UK, Europe" }, "resourceUrls": [ "https://www.example.com" ], "description": "This is a maximum example of a custom source", "themes": [ "https://example.com/theme1", "https://example.com/theme2" ], "releaseLicense": "https://opensource.org/licenses/MIT", "language": "en", "connectionId": "b1120b19-e631-46ad-915c-c964c8a278a2", "customFields": { "Some custom field": "Some value", "Another custom field": [ "Value 1", "Value 2" ] } }

Minimum example for cohort type:

{ "sourceName": "A minimum cohort", "sourceType": "cohort", "publisher": { "publisherType": "individual", "name": "John Doe", "contactEmail": "john@example.com" }, "resourceUrls": [ "https://www.example.com" ], "cohortDetails": { "siteType": "singleSite", "country": "UK", "yearStart": 2023 } }

Maximum example for cohort type:

{ "sourceId": "a6e001cb-bb60-48b9-a47a-3dccee13c085", "sourceName": "A maximum cohort", "sourceType": "cohort", "publisher": { "publisherType": "organization", "name": "University of Leicester", "contactEmail": "brookeslab@le.ac.uk", "contactName": "John Doe", "url": "https://www.le.ac.uk", "location": "Leicester, UK, Europe" }, "resourceUrls": [ "https://www.example.com" ], "description": "This is a maximum example of a cohort", "releaseLicense": "https://opensource.org/licenses/MIT", "language": "en", "themes": [ "https://example.com/theme1", "https://example.com/theme2" ], "cohortDetails": { "siteType": "multiSite", "country": "UK", "yearStart": 2023 }, "collectedTypes": { "participants": { "diseases": [ "controlGroup", "ad", "hd" ], "numberOfSubjects": 1000 }, "bioSamples": [ "csf", "serum", "plasma", "dna", "saliva", "urine" ], "images": [ "mri", "petTau", "datScan" ], "cognitiveData": [ "crossSectional" ] }, "connectionId": "6c3968af-3d29-4f81-8747-b2337c1cc01b", "datasetIds": [ "adbec8c2-9460-4814-9574-06a0dfe2efb5" ], "customFields": { "Some custom field": "Some value", "Another custom field": [ "Value 1", "Value 2" ] } }

Minimum example for dataset type:

{ "sourceName": "A minimum dataset", "sourceType": "dataset", "publisher": { "publisherType": "individual", "name": "John Doe", "contactEmail": "john@example.com" }, "resourceUrls": [ "https://www.example.com" ], "datasetVersions": [ { "datasetDetails": { "versionName": "v1.0.0" }, "datasetContent": { "numberOfSubjects": 100 } } ] }

Maximum example for dataset type:

{ "sourceId": "adbec8c2-9460-4814-9574-06a0dfe2efb5", "sourceName": "A maximum dataset", "sourceType": "dataset", "publisher": { "publisherType": "organization", "name": "University of Leicester", "contactEmail": "brookeslab@le.ac.uk", "contactName": "John Doe", "url": "https://www.le.ac.uk", "location": "Leicester, UK, Europe" }, "resourceUrls": [ "https://www.example.com" ], "description": "This is a maximum example of a dataset", "themes": [ "https://example.com/theme1", "https://example.com/theme2" ], "datasetVersions": [ { "datasetDetails": { "versionId": "1b71b513-33be-45ee-b6e9-a24b2bc9dc05", "versionName": "v1.0.0", "keywords": [ "keyword1", "keyword2" ], "publishedDate": "2023-12-02", "updateDate": "2023-12-12" }, "datasetContent": { "numberOfSubjects": 100, "minAge": 18, "maxAge": 35, "countries": [ "UK", "US" ], "diseases": [ "controlGroup", "ad", "hd" ], "sex": [ "male", "female" ], "clinical": [ "lifeStyleInfo", "vitalSigns" ], "markers": [ "amyloid", "tau" ], "images": [ "mri", "petTau", "datScan" ], "electrophysiology": [ "eeg", "meg" ], "dataTypes": [ "demographics" ] } }, { "datasetDetails": { "versionId": "4114682d-73f5-45eb-9b7c-023e18cd12c9", "versionName": "v2.0.0", "keywords": [ "keyword3", "keyword4" ], "publishedDate": "2024-01-01", "updateDate": "2024-01-11" }, "datasetContent": { "numberOfSubjects": 200, "minAge": 20, "maxAge": 40, "countries": [ "UK", "US", "CA" ], "diseases": [ "controlGroup", "ad", "hd" ], "sex": [ "male", "female", "other" ], "clinical": [ "lifeStyleInfo", "vitalSigns" ], "markers": [ "amyloid", "tau", "neurofilamentLightChain" ], "images": [ "mri", "petTau", "datScan", "spect" ], "electrophysiology": [ "eeg", "meg", "erp" ], "dataTypes": [ "demographics", "clinical", "lifestyle" ] } } ], "releaseLicense": "https://opensource.org/licenses/MIT", "language": "en", "connectionId": "ac743200-c8ff-485e-a82d-45d0e636f862", "customFields": { "Some custom field": "Some value", "Another custom field": [ "Value 1", "Value 2" ] } }

Minimum example for dataCollection type:

Maximum example for dataCollection type:

Uploading the metadata file

To upload the metadata file, go to the admin interface and click the "Create Meta Source" button.

Then, set the meta source type to "all" to use the file uploader. Multiple files can be selected, and selected files can be removed before finalising the upload.

After selecting the files, click the "Process" button to begin reading and processing them. All metadata entries will first be read by the frontend, sanitized, and validated, then sent to the server for storage. Once processing is complete, you'll see a prompt displaying the number of metadata entries read and asking you to confirm the upload. After your confirmation, the metadata entries will be stored in the database and become available for search and discovery.
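You do not need to reproduce the frontend processing step described above. Still, a local pre-flight check of the files can catch structural problems before you open the uploader. The sketch below only checks that each file parses as a JSON array of objects, and counts the entries. It is not the Cafe Variome frontend, and the function names and command-line usage are hypothetical.

```python
import json
import sys
from pathlib import Path


def parse_metadata_text(text: str, label: str) -> tuple:
    """Parse one file's contents; return (entries, errors) for that file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [], [f"{label}: invalid JSON ({exc})"]
    if not isinstance(data, list):
        return [], [f"{label}: top level is not a JSON array"]
    entries, errors = [], []
    for index, item in enumerate(data):
        if isinstance(item, dict):
            entries.append(item)
        else:
            errors.append(f"{label}[{index}]: entry is not an object")
    return entries, errors


def preflight(paths: list) -> tuple:
    """Read every file and combine entries and errors, like a multi-file selection."""
    all_entries, all_errors = [], []
    for path in paths:
        try:
            text = Path(path).read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            all_errors.append(f"{path}: {exc}")
            continue
        entries, errors = parse_metadata_text(text, path)
        all_entries.extend(entries)
        all_errors.extend(errors)
    return all_entries, all_errors


if __name__ == "__main__":
    # Usage: python preflight.py file1.json file2.json
    entries, errors = preflight(sys.argv[1:])
    print(f"{len(entries)} metadata entries read")
    for error in errors:
        print("error:", error)
```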

Last modified: 31 March 2025