Data Indices
Cafe Variome V3 utilises a dynamic query interface, which automatically displays or hides query elements based on the data available in the database. To achieve this, it regularly indexes the databases, generating indices that are then distributed across the entire network.
Record Level Indices
Record-level indices are stored in RecordIndex objects. They follow this format:
Capability
The capability field identifies the types of data available in the database. The example above shows that the database contains subject data (demographic data), HPO terms, gene codes, and OIDC terms, among others.
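As a rough illustration of how a frontend might use the capability field to show or hide query elements, the sketch below assumes the field is a flat mapping of boolean flags. The key names (subject, hpo, gene), the panel titles, and the visible_panels helper are hypothetical and are not taken from the Cafe Variome codebase.

```python
from typing import Dict, List

# Hypothetical mapping from capability flags to query panels in the UI.
# The real key names in RecordIndex.capability may differ.
PANEL_FOR_CAPABILITY: Dict[str, str] = {
    "subject": "Demographics",
    "hpo": "HPO terms",
    "gene": "Genes",
}


def visible_panels(capability: Dict[str, bool]) -> List[str]:
    """Return the panels to display for the capability flags that are set."""
    return [
        panel
        for key, panel in PANEL_FOR_CAPABILITY.items()
        if capability.get(key, False)  # a missing flag is treated as unsupported
    ]


# Assumed example: a database that holds subject data and HPO terms only.
example_capability = {"subject": True, "hpo": True, "gene": False}
print(visible_panels(example_capability))
```

Treating a missing flag as "not supported" keeps the interface conservative: a query element only appears when the index explicitly reports the matching data.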
Demographic data is a special case. Most general-purpose ontologies define their own terms for demographic descriptions, which can cause confusion when several ontologies are in use at once. For instance, if subject gender is stored using NCIT terms but the user searches using SNOMED Clinical Terms, the system won't automatically translate between the two ontologies. Because demographic filters are used so frequently, we have introduced a generic subject filter that handles these ontology translations. The subjectCapability field reports the specific demographic filters supported by the database.
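The idea behind the generic subject filter can be sketched as a lookup from an ontology-neutral value to whichever code a given source stores. This is only an assumption about how such a translation could work, not the actual implementation; the table, the translate_gender helper, and the codes are placeholders rather than real NCIT or SNOMED CT identifiers.

```python
from typing import Dict, Optional

# Placeholder codes only: real NCIT / SNOMED CT identifiers would go here.
GENDER_CODES: Dict[str, Dict[str, str]] = {
    "female": {"NCIT": "NCIT:FEMALE", "SNOMED": "SNOMED:FEMALE"},
    "male": {"NCIT": "NCIT:MALE", "SNOMED": "SNOMED:MALE"},
}


def translate_gender(generic_value: str, source_ontology: str) -> Optional[str]:
    """Map a generic gender value to the code used by a source's ontology.

    Returns None when the value or the ontology is not in the table.
    """
    return GENDER_CODES.get(generic_value.lower(), {}).get(source_ontology)


# The user picks "Female" once; each source receives the code it understands.
print(translate_gender("Female", "NCIT"))
print(translate_gender("Female", "SNOMED"))
```

With a table like this, the user never has to know which ontology a particular source uses for demographic fields.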
Record Count
The record count shows the number of subjects present in the specified resource (which could be a source, an instance, a network, etc.). This count is used to calculate scores, such as response quality and differential privacy metrics.
EAV Index
Since the system cannot accommodate every possible data model, we've implemented a generic Entity-Attribute-Value (EAV) model for each subject and developed a query system to search these fields. The eavIndex field captures the attributes present, their possible values, and the corresponding display names. The query interface is populated dynamically from this data.
The eavIndex object contains three fields:
attributes: A key-value pair representing attribute names and their corresponding display names. The attribute name refers to the field stored in the database, which also appears in the Subject object when detailed subject information is returned to the frontend. However, users only see the user-friendly display name, as the frontend manages this translation.
values: A key-value pair listing the possible attribute values and their display names. This information is used to populate dropdown menus or checkboxes within the query interface.
mappings: Defines the relationships between attributes and their values. Each attribute appears as a key, and its value is an array of possible attribute values (using actual field names rather than display names). This mapping is only included for alphanumeric attributes with categorical values. When a mapping exists, the frontend assumes the attribute must contain one or more of the provided values and therefore displays a dropdown, checkbox selection, or a free-text box for regex searches. If an attribute isn't included in the mappings, the frontend assumes it is numerical and presents a free-text box alongside numerical operators (>, <, =, etc.).
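The rule for choosing an input control from the eavIndex can be sketched as follows. This is a minimal illustration under stated assumptions: the three field names come from the description above, but the describe_input helper, the shape of its return value, and the sample attributes (smoker, bmi) are hypothetical.

```python
from typing import Dict, List, TypedDict


class EavIndex(TypedDict):
    attributes: Dict[str, str]      # attribute name -> display name
    values: Dict[str, str]          # attribute value -> display name
    mappings: Dict[str, List[str]]  # categorical attribute -> allowed values


def describe_input(index: EavIndex, attribute: str) -> dict:
    """Decide which kind of control to render for one attribute."""
    label = index["attributes"].get(attribute, attribute)
    if attribute in index["mappings"]:
        # Categorical: offer the listed values, shown by their display names.
        options = [
            {"value": v, "label": index["values"].get(v, v)}
            for v in index["mappings"][attribute]
        ]
        return {"label": label, "kind": "categorical", "options": options}
    # No mapping: treated as numerical, with a subset of comparison operators.
    return {"label": label, "kind": "numeric", "operators": [">", "<", "="]}


# Hypothetical index with one categorical and one numerical attribute.
example_index: EavIndex = {
    "attributes": {"smoker": "Smoking status", "bmi": "Body mass index"},
    "values": {"y": "Yes", "n": "No"},
    "mappings": {"smoker": ["y", "n"]},
}

print(describe_input(example_index, "smoker"))
print(describe_input(example_index, "bmi"))
```

Here smoker has a mapping, so it would be rendered as a selection of "Yes" and "No", while bmi has none and falls back to a free-text box with numerical operators.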