Admin Guide Help

Uploading Metadata into Cafe Variome

Metadata can be uploaded into Cafe Variome in two ways.

Using the editing interface

The most straightforward way is to use the editing interface to create a new meta source manually. This is the recommended approach when there are only a few records to add.

Using the file uploader

If there is a large amount of metadata, or if the metadata is being exported from another system, it is better to use the file uploader. The file uploader does, however, require the metadata to be in a specific format to allow direct ingestion.

Metadata file format

The metadata file to be uploaded must be in JSON format. The top level is an array containing one or more objects, each object being a meta source. The structure of each meta source should follow the meta-source model, but some fields can be omitted or left empty. The following is an example of a metadata file containing a single meta source:

[
  {
    "sourceName": "A minimum custom source",
    "sourceType": "custom",
    "publisher": {
      "publisherType": "individual",
      "name": "John Doe",
      "contactEmail": "john@example.com"
    },
    "resourceURLs": ["https://www.example.com"]
  }
]
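When the metadata is exported from another system, the file can be generated by a script rather than written by hand. The following is a minimal Python sketch, not part of Cafe Variome itself; the function name build_metadata_file is hypothetical, and the only behaviour it relies on is the array-of-objects layout described above.

```python
import json

# A single meta source, using only fields from the minimal custom example.
minimal_source = {
    "sourceName": "A minimum custom source",
    "sourceType": "custom",
    "publisher": {
        "publisherType": "individual",
        "name": "John Doe",
        "contactEmail": "john@example.com",
    },
    "resourceURLs": ["https://www.example.com"],
}


def build_metadata_file(sources):
    """Serialize a list of meta-source dicts as the text of a JSON array.

    Hypothetical helper: it only enforces the top-level layout (a non-empty
    array of objects), not the per-field constraints.
    """
    if not isinstance(sources, list) or not sources:
        raise ValueError("expected a non-empty list of meta sources")
    if not all(isinstance(source, dict) for source in sources):
        raise ValueError("every meta source must be a JSON object")
    return json.dumps(sources, indent=2)
```

The returned string can be written to a file with a .json extension and selected in the file uploader.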

Fields and constraints

Some of the fields within the metadata model have constraints and/or formatting requirements. The following is a list of all of the fields, what they are, and their constraints. Fields not marked as optional are required, but may be left empty (as an empty string) or 0 (as a number) unless otherwise specified. All field names are case-sensitive.

Metadata entries cannot contain additional fields, except those placed in the customFields object or in fields labeled as "custom". Any other additional fields are ignored: they are not stored or represented in any way.
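Because unknown fields are dropped silently, it can be useful to list them before uploading. The sketch below is an illustration only: KNOWN_FIELDS is assembled from the top-level field names that appear in this guide and is assumed, not confirmed, to be complete, and find_unknown_fields is a hypothetical helper.

```python
# Top-level field names documented in this guide (assumed to be complete).
KNOWN_FIELDS = {
    "sourceID", "connectionID", "sourceName", "sourceType", "resourceURLs",
    "publisher", "description", "themes", "releaseLicense", "language",
    "customFields", "cohortDetails", "collectedTypes", "datasetIDs",
    "datasetVersions", "dataCollectionDetails", "dataCollectionContent",
}


def find_unknown_fields(entry):
    """Return the top-level keys of a meta source that are not documented.

    The comparison is case-sensitive, matching the rule that field names
    are case-sensitive, so "SourceType" is reported as unknown.
    """
    return sorted(key for key in entry if key not in KNOWN_FIELDS)
```

For example, an entry with the keys sourceName, SourceType, and notes would report SourceType and notes as fields that will not be stored.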

Internal fields and manual assignment

The fields explained in the sections below form the metadata about a given resource. However, there are several other fields used internally within Cafe Variome to enable specific features, such as interlinking of metadata entries. These are not visible in the editing interface, but may be assigned manually, provided that the data is accurate and the admin follows the required formatting for each field.

sourceID

The UUID of the source. If omitted, Cafe Variome will assign a UUID to the source. If present, it must be a valid UUID4 string. It is used within each Cafe Variome instance to identify the source (a UUID may not be unique across instances). It can be used to link one metadata entry to another, for example by filling in the datasetIDs field of the cohort model.
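A manually assigned sourceID can be checked before upload. The following sketch uses Python's standard uuid module; the helper name is_uuid4 is hypothetical, and requiring the canonical hyphenated form is an assumption about what Cafe Variome accepts.

```python
import uuid


def is_uuid4(value):
    """Return True if value is a canonical, hyphenated UUID version 4 string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # Not a string, or not parseable as a UUID at all.
        return False
    # Check the version and that the text is in canonical 8-4-4-4-12 form.
    return parsed.version == 4 and str(parsed) == value.lower()


def new_source_id():
    """Generate a fresh UUID4 string for use as a sourceID."""
    return str(uuid.uuid4())
```

Generating the identifier up front is mainly useful when another entry in the same file needs to refer to it.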

connectionID

The UUID of the data source that this metadata entry describes. This is usually only valid when the metadata describes a dataset, but in rare cases it can be assigned to other metadata. Assigning this field directly is not recommended, because the only way to find the UUID of a data source is to check the database.

Common fields

These are the fields that are present in all types of meta-sources.

sourceName

string Cannot be empty.

sourceType

string Cannot be empty. Constrained vocabulary values:

  • custom

  • cohort

  • catalog

  • biobank

  • registry

  • guideline

  • dataset

  • dataCollection

resourceURLs

array[string] optional Full URI format, including the scheme (for example, https://).

publisher

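object Nested JSON object containing the fields publisherType, name, contactEmail, contactName, url, and location, described below.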

publisherType

string Cannot be empty. Constrained vocabulary values:

  • individual

  • organization

  • agency

  • other

name

string Cannot be empty.

contactEmail

string Cannot be empty.

contactName

string optional.

url

string optional Full URI format, including the scheme.

location

string optional.
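The required publisher fields and the publisherType vocabulary can be checked with a few lines of code. This is a sketch under the constraints listed above, not the validation Cafe Variome runs; publisher_errors is a hypothetical helper, and the error messages are illustrative.

```python
PUBLISHER_TYPES = {"individual", "organization", "agency", "other"}
REQUIRED_PUBLISHER_FIELDS = ("publisherType", "name", "contactEmail")


def publisher_errors(publisher):
    """Return a list of problems found in a publisher object (empty if none)."""
    errors = []
    for field in REQUIRED_PUBLISHER_FIELDS:
        value = publisher.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} must be a non-empty string")
    publisher_type = publisher.get("publisherType")
    # Only report a vocabulary error when a non-empty string was supplied.
    if isinstance(publisher_type, str) and publisher_type:
        if publisher_type not in PUBLISHER_TYPES:
            errors.append(f"publisherType {publisher_type!r} is not in the vocabulary")
    return errors
```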

description

string optional Can be empty, but this is not recommended.

themes

array[string] optional URI format.

releaseLicense

string optional Full URI format, including the scheme.

language

string optional Two-character code adhering to the ISO 639-1 standard, lower case.

customFields

object optional Key-value or key[values] pairs. Constraints:

  • Key must be a string and cannot contain special characters: `.`, `$`, `/`, or `\`.

  • If a key is present, the value cannot be `null` but can be an empty string or an empty array.
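The two customFields constraints can be expressed as a short check. This is a minimal sketch with a hypothetical helper name, custom_field_errors; it covers only the rules stated above.

```python
# Characters that may not appear in a customFields key.
FORBIDDEN_KEY_CHARS = (".", "$", "/", "\\")


def custom_field_errors(custom_fields):
    """Return a list of problems found in a customFields object (empty if none)."""
    errors = []
    for key, value in custom_fields.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
            continue
        found = [char for char in FORBIDDEN_KEY_CHARS if char in key]
        if found:
            errors.append(f"key {key!r} contains forbidden characters: {' '.join(found)}")
        # Empty strings and empty arrays are allowed; only null is rejected.
        if value is None:
            errors.append(f"key {key!r} has a null value")
    return errors
```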

Cohort specific fields

These are the fields specific to the cohort type.

cohortDetails

object Nested JSON object containing the fields siteType, country, and yearStart, described below.

siteType

string Cannot be empty. Constrained vocabulary values:

  • singleSite

  • multiSite

  • multiCountry

country

string Cannot be empty. Two-character code adhering to the ISO 3166-1 standard (alpha-2), upper case.

yearStart

integer Cannot be empty. Four-digit integer.

collectedTypes

object optional Nested JSON object containing the fields participants, bioSamples, images, and cognitiveData, described below.

participants

object optional Nested JSON object containing the fields diseases and numberOfSubjects, described below.

diseases

array[string] Cannot be empty. Constrained vocabulary values:

  • controlGroup

  • ad

  • pd

  • irbd

  • dlb

  • caa

  • ftd

  • als

  • psp

  • cbd

  • msa

  • hd

  • ataxia

  • other

Note: An empty array will cause the entire participants object to be ignored.

numberOfSubjects

integer Must be greater than 0. If 0, the entire participants object will be ignored.
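The two rules that cause the participants object to be ignored can be previewed before upload. The sketch below is an illustration only: effective_participants is a hypothetical helper, and treating a missing diseases or numberOfSubjects key the same as an empty array or 0 is an assumption.

```python
def effective_participants(collected_types):
    """Return the participants object, or None if it would be ignored.

    Mirrors the documented rules: an empty diseases array or a
    numberOfSubjects of 0 causes the whole object to be ignored.
    Missing keys are treated the same way (an assumption).
    """
    participants = collected_types.get("participants")
    if not participants:
        return None
    if not participants.get("diseases"):
        return None
    count = participants.get("numberOfSubjects", 0)
    if not isinstance(count, int) or count <= 0:
        return None
    return participants
```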

bioSamples

array[string] optional Constrained vocabulary values:

  • csf

  • serum

  • plasma

  • dna

  • saliva

  • urine

  • stool

images

array[string] optional Constrained vocabulary values:

  • mri

  • petAmyloid

  • petTau

  • spect

  • ocular

cognitiveData

array[string] optional Constrained vocabulary values:

  • crossSectional

  • longitudinal

datasetIDs

array[string] optional UUID format. Datasets must be either present in the same file or already uploaded to the system.
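When cohorts and datasets are uploaded in the same file, the links between them can be checked in advance. The following sketch assumes that datasets are identified by an explicitly assigned sourceID; unresolved_dataset_ids and its already_uploaded parameter are hypothetical names, and the identifiers of previously uploaded datasets would have to be supplied by the admin.

```python
def unresolved_dataset_ids(entries, already_uploaded=()):
    """Map each source name to the datasetIDs that cannot be resolved.

    A datasetID resolves if it matches the sourceID of a dataset entry in
    the same file, or if it is listed in already_uploaded.
    """
    known = set(already_uploaded)
    for entry in entries:
        if entry.get("sourceType") == "dataset" and "sourceID" in entry:
            known.add(entry["sourceID"])

    missing = {}
    for entry in entries:
        absent = [i for i in entry.get("datasetIDs", []) if i not in known]
        if absent:
            missing[entry.get("sourceName", "")] = absent
    return missing
```

An empty result means every datasetID in the file points at a dataset that is either in the file or in the supplied list.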

Dataset specific fields

These are the fields specific to the dataset type.

datasetVersions

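array[object] Nested array of JSON objects, one per dataset version. Each object contains a datasetDetails object and a datasetContent object, described below.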

datasetDetails

object Nested JSON object.

versionID

string optional Valid UUID4 string. If omitted, a UUID will be automatically assigned.

versionCode

string optional Semantic versioning recommended. Non-semantic versions will disable version parsing, comparison, and sorting.
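To illustrate why semantic versions matter for sorting, the sketch below parses MAJOR.MINOR.PATCH strings and falls back to the original order when any version cannot be parsed. It is not Cafe Variome's parser: the function names are hypothetical, pre-release and build suffixes are not handled, and accepting a leading "v" is an assumption.

```python
import re

# MAJOR.MINOR.PATCH with an optional leading "v" (the prefix is an assumption).
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(code):
    """Return (major, minor, patch) as integers, or None if not parseable."""
    match = _SEMVER.match(code)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def sort_versions(codes):
    """Sort version codes numerically; keep the input order if any is not semantic."""
    parsed = [parse_version(code) for code in codes]
    if any(item is None for item in parsed):
        return list(codes)
    return [code for _, code in sorted(zip(parsed, codes))]
```

Numeric comparison places v1.10.0 after v1.2.0, which a plain string sort would not do.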

keywords

array[string] optional Array of keyword strings.

publishedDate

string optional Date string in the format of `YYYY-MM-DD`.

updateDate

string optional Date string in the format of `YYYY-MM-DD`.
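Date strings can be checked against the YYYY-MM-DD format with Python's standard datetime module. This is a minimal sketch with a hypothetical helper name, is_valid_date; requiring zero-padded months and days is an assumption based on the format shown above.

```python
from datetime import datetime


def is_valid_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except (ValueError, TypeError):
        return False
    # strptime accepts unpadded parts such as "2023-1-2"; the round trip
    # rejects them so that only the zero-padded form passes.
    return parsed.strftime("%Y-%m-%d") == value
```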

datasetContent

object Nested JSON object.

numberOfSubjects

integer Must be greater than 0. May be an approximate number if exact count is confidential.

minAge

integer optional Must be greater than 0.

maxAge

integer optional Must be greater than 0.

countries

array[string] optional Array of two-character codes adhering to the ISO 3166-1 standard (alpha-2), upper case.

diseases

array[string] optional Constrained vocabulary values:

  • controlGroup

  • ad

  • pd

  • irbd

  • dlb

  • caa

  • ftd

  • als

  • psp

  • cbd

  • msa

  • hd

  • ataxia

  • other

sex

array[string] optional Constrained vocabulary values:

  • male

  • female

  • other

  • undifferential

  • unknown

clinical

array[string] optional Constrained vocabulary values:

  • comorbidities

  • medicationUse

  • familyHistory

  • ageOfSymptomOnset

  • clinicalDiagnosis

  • exposure

  • lifeStyleInfo

  • vitalSigns

markers

array[string] optional Constrained vocabulary values:

  • amyloid

  • tau

  • neurofilamentLightChain

  • alphaSynuclein

  • dat

images

array[string] optional Constrained vocabulary values:

  • mri

  • petAmyloid

  • petTau

  • spect

  • ocular

electrophysiology

array[string] optional Constrained vocabulary values:

  • eeg

  • meg

  • erp

dataTypes

array[string] optional Constrained vocabulary values:

  • demographics

  • clinical

  • lifestyle

  • functionalRatings

  • motor

  • neuropsychiatric

  • neuropsychological

  • qualityOfLife

  • sleepScales

  • digitalData

  • imaging

  • electrophysiology

  • neuroPathology

  • other

Data collection specific fields

These are the fields specific to the data collection type.

dataCollectionDetails

object optional Nested JSON object.

keywords

array[string] optional Array of keyword strings.

publishedDate

string optional Date string in the format of `YYYY-MM-DD`.

updateDate

string optional Date string in the format of `YYYY-MM-DD`.

dataCollectionContent

object optional Nested JSON object.

numberOfSubjects

integer Must be greater than 0. May be an approximate number if exact count is confidential.

minAge

integer optional Must be greater than 0.

maxAge

integer optional Must be greater than 0.

countries

array[string] optional Array of two-character codes adhering to the ISO 3166-1 standard (alpha-2), upper case.

diseases

array[string] optional Constrained vocabulary values:

  • controlGroup

  • ad

  • pd

  • irbd

  • dlb

  • caa

  • ftd

  • als

  • psp

  • cbd

  • msa

  • hd

  • ataxia

  • other

sex

array[string] optional Constrained vocabulary values:

  • male

  • female

  • other

  • undifferential

  • unknown

clinical

array[string] optional Constrained vocabulary values:

  • comorbidities

  • medicationUse

  • familyHistory

  • ageOfSymptomOnset

  • clinicalDiagnosis

  • exposure

  • lifeStyleInfo

  • vitalSigns

markers

array[string] optional Constrained vocabulary values:

  • amyloid

  • tau

  • neurofilamentLightChain

  • alphaSynuclein

  • dat

images

array[string] optional Constrained vocabulary values:

  • mri

  • petAmyloid

  • petTau

  • spect

  • ocular

electrophysiology

array[string] optional Constrained vocabulary values:

  • eeg

  • meg

  • erp

dataTypes

array[string] optional Constrained vocabulary values:

  • demographics

  • clinical

  • lifestyle

  • functionalRatings

  • motor

  • neuropsychiatric

  • neuropsychological

  • qualityOfLife

  • sleepScales

  • digitalData

  • imaging

  • electrophysiology

  • neuroPathology

  • other

Metadata model examples

The following are minimum examples (required fields only) and maximum examples (all fields filled in) of the metadata model for each source type.

Minimum example for custom type:

{
  "sourceName": "A minimum custom source",
  "sourceType": "custom",
  "publisher": {
    "publisherType": "individual",
    "name": "John Doe",
    "contactEmail": "john@example.com"
  },
  "resourceURLs": ["https://www.example.com"]
}

Maximum example for custom type:

{
  "sourceID": "8df136d8-7fb0-4bec-a72a-5deed972bbb6",
  "sourceName": "A maximum custom source",
  "sourceType": "custom",
  "publisher": {
    "publisherType": "organization",
    "name": "University of Leicester",
    "contactEmail": "brookeslab@le.ac.uk",
    "contactName": "John Doe",
    "url": "https://www.le.ac.uk",
    "location": "Leicester, UK, Europe"
  },
  "resourceURLs": ["https://www.example.com"],
  "description": "This is a maximum example of a custom source",
  "themes": ["https://example.com/theme1", "https://example.com/theme2"],
  "releaseLicense": "https://opensource.org/licenses/MIT",
  "language": "en",
  "connectionID": "b1120b19-e631-46ad-915c-c964c8a278a2",
  "customFields": {
    "Some custom field": "Some value",
    "Another custom field": ["Value 1", "Value 2"]
  }
}

Minimum example for cohort type:

{
  "sourceName": "A minimum cohort",
  "sourceType": "cohort",
  "publisher": {
    "publisherType": "individual",
    "name": "John Doe",
    "contactEmail": "john@example.com"
  },
  "resourceURLs": ["https://www.example.com"],
  "cohortDetails": {
    "siteType": "singleSite",
    "country": "GB",
    "yearStart": 2023
  }
}

Maximum example for cohort type:

{
  "sourceID": "a6e001cb-bb60-48b9-a47a-3dccee13c085",
  "sourceName": "A maximum cohort",
  "sourceType": "cohort",
  "publisher": {
    "publisherType": "organization",
    "name": "University of Leicester",
    "contactEmail": "brookeslab@le.ac.uk",
    "contactName": "John Doe",
    "url": "https://www.le.ac.uk",
    "location": "Leicester, UK, Europe"
  },
  "resourceURLs": ["https://www.example.com"],
  "description": "This is a maximum example of a cohort",
  "releaseLicense": "https://opensource.org/licenses/MIT",
  "language": "en",
  "themes": ["https://example.com/theme1", "https://example.com/theme2"],
  "cohortDetails": {
    "siteType": "multiSite",
    "country": "GB",
    "yearStart": 2023
  },
  "collectedTypes": {
    "participants": {
      "diseases": ["controlGroup", "ad", "hd"],
      "numberOfSubjects": 1000
    },
    "bioSamples": ["csf", "serum", "plasma", "dna", "saliva", "urine"],
    "images": ["mri", "petTau"],
    "cognitiveData": ["crossSectional"]
  },
  "connectionID": "6c3968af-3d29-4f81-8747-b2337c1cc01b",
  "datasetIDs": ["adbec8c2-9460-4814-9574-06a0dfe2efb5"],
  "customFields": {
    "Some custom field": "Some value",
    "Another custom field": ["Value 1", "Value 2"]
  }
}

Minimum example for dataset type:

{
  "sourceName": "A minimum dataset",
  "sourceType": "dataset",
  "publisher": {
    "publisherType": "individual",
    "name": "John Doe",
    "contactEmail": "john@example.com"
  },
  "resourceURLs": ["https://www.example.com"],
  "datasetVersions": [
    {
      "datasetDetails": {
        "versionName": "v1.0.0"
      },
      "datasetContent": {
        "numberOfSubjects": 100
      }
    }
  ]
}

Maximum example for dataset type:

{
  "sourceID": "adbec8c2-9460-4814-9574-06a0dfe2efb5",
  "sourceName": "A maximum dataset",
  "sourceType": "dataset",
  "publisher": {
    "publisherType": "organization",
    "name": "University of Leicester",
    "contactEmail": "brookeslab@le.ac.uk",
    "contactName": "John Doe",
    "url": "https://www.le.ac.uk",
    "location": "Leicester, UK, Europe"
  },
  "resourceURLs": ["https://www.example.com"],
  "description": "This is a maximum example of a dataset",
  "themes": ["https://example.com/theme1", "https://example.com/theme2"],
  "datasetVersions": [
    {
      "datasetDetails": {
        "versionID": "1b71b513-33be-45ee-b6e9-a24b2bc9dc05",
        "versionName": "v1.0.0",
        "keywords": ["keyword1", "keyword2"],
        "publishedDate": "2023-12-02",
        "updateDate": "2023-12-12"
      },
      "datasetContent": {
        "numberOfSubjects": 100,
        "minAge": 18,
        "maxAge": 35,
        "countries": ["GB", "US"],
        "diseases": ["controlGroup", "ad", "hd"],
        "sex": ["male", "female"],
        "clinical": ["lifeStyleInfo", "vitalSigns"],
        "markers": ["amyloid", "tau"],
        "images": ["mri", "petTau"],
        "electrophysiology": ["eeg", "meg"],
        "dataTypes": ["demographics"]
      }
    },
    {
      "datasetDetails": {
        "versionID": "4114682d-73f5-45eb-9b7c-023e18cd12c9",
        "versionName": "v2.0.0",
        "keywords": ["keyword3", "keyword4"],
        "publishedDate": "2024-01-01",
        "updateDate": "2024-01-11"
      },
      "datasetContent": {
        "numberOfSubjects": 200,
        "minAge": 20,
        "maxAge": 40,
        "countries": ["GB", "US", "CA"],
        "diseases": ["controlGroup", "ad", "hd"],
        "sex": ["male", "female", "other"],
        "clinical": ["lifeStyleInfo", "vitalSigns"],
        "markers": ["amyloid", "tau", "neurofilamentLightChain"],
        "images": ["mri", "petTau", "spect"],
        "electrophysiology": ["eeg", "meg", "erp"],
        "dataTypes": ["demographics", "clinical", "lifestyle"]
      }
    }
  ],
  "releaseLicense": "https://opensource.org/licenses/MIT",
  "language": "en",
  "connectionID": "ac743200-c8ff-485e-a82d-45d0e636f862",
  "customFields": {
    "Some custom field": "Some value",
    "Another custom field": ["Value 1", "Value 2"]
  }
}

Minimum example for dataCollection type:

{
  "sourceName": "A minimum data collection",
  "sourceType": "dataCollection",
  "publisher": {
    "publisherType": "individual",
    "name": "John Doe",
    "contactEmail": "john@example.com"
  },
  "resourceURLs": ["https://www.example.com"],
  "dataCollectionContent": {
    "numberOfSubjects": 100
  }
}

Maximum example for dataCollection type:

{
  "sourceID": "adbec8c2-9460-4814-9574-06a0dfe2efb5",
  "sourceName": "A maximum data collection",
  "sourceType": "dataCollection",
  "publisher": {
    "publisherType": "organization",
    "name": "University of Leicester",
    "contactEmail": "brookeslab@le.ac.uk",
    "contactName": "John Doe",
    "url": "https://www.le.ac.uk",
    "location": "Leicester, UK, Europe"
  },
  "resourceURLs": ["https://www.example.com"],
  "description": "This is a maximum example of a data collection",
  "themes": ["https://example.com/theme1", "https://example.com/theme2"],
  "dataCollectionDetails": {
    "keywords": ["keyword1", "keyword2"],
    "publishedDate": "2023-12-02",
    "updateDate": "2023-12-12"
  },
  "dataCollectionContent": {
    "numberOfSubjects": 100,
    "minAge": 18,
    "maxAge": 35,
    "countries": ["GB", "US"],
    "diseases": ["controlGroup", "ad", "hd"],
    "sex": ["male", "female"],
    "clinical": ["lifeStyleInfo", "vitalSigns"],
    "markers": ["amyloid", "tau"],
    "images": ["mri", "petTau"],
    "electrophysiology": ["eeg", "meg"],
    "dataTypes": ["demographics"]
  },
  "releaseLicense": "https://opensource.org/licenses/MIT",
  "language": "en",
  "connectionID": "ac743200-c8ff-485e-a82d-45d0e636f862",
  "customFields": {
    "Some custom field": "Some value",
    "Another custom field": ["Value 1", "Value 2"]
  }
}

Uploading the metadata file

To upload the metadata file, go to the admin interface and click the "Create Meta Source" button.

interface-landing_page_dashboard.png
interface-meta_source_upload.png

Then, set the meta source type to "all" to use the file uploader. Multiple files can be selected, and selected files can be removed before finalising the upload.

interface-file_pick_multiple.png
interface-meta_source_file_selected.png

Once the files have been selected, click the "Process" button to start reading and processing them. All of the metadata entries are read, sanitized, and validated in the front end, and then sent to the server for storage. After processing, you are shown the number of metadata entries that were read and asked to confirm the upload. Once confirmed, the metadata entries are stored in the database and become available for search and discovery.
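The entry count shown at the confirmation step can be compared with an offline count of the same files. The sketch below is not the front-end code: count_entries is a hypothetical helper that takes the text of each file, checks only the top-level layout, and returns the total number of entries across all files.

```python
import json


def count_entries(file_texts):
    """Count meta-source entries across several metadata files.

    file_texts maps a file name to the JSON text of that file. Each file
    must contain a JSON array of objects; anything else raises ValueError.
    """
    total = 0
    for name, text in file_texts.items():
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError(f"{name}: top level must be a JSON array")
        for index, entry in enumerate(data):
            if not isinstance(entry, dict):
                raise ValueError(f"{name}: entry {index} is not a JSON object")
        total += len(data)
    return total
```

If the number reported by the uploader is lower than this count, some entries were probably rejected during sanitization or validation.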

Last modified: 11 September 2024