Deploying Backend

Overview

The backend of Cafe Variome V3 is developed in Python and optimised for high efficiency. It is designed to run seamlessly in both clustered and cloud environments. This document outlines how to deploy the backend on a single server and how to configure the necessary supporting software to ensure everything works correctly.

Prerequisites

If running Cafe Variome V3 backend from source, the following dependencies are required on the server:

Python 3.11. The backend requires Python 3.11 or later due to specific syntax and asyncio features. While Python 3.12 may work in theory, it has not been thoroughly tested. For best compatibility, it’s recommended to use the latest release of Python 3.11.

The following services are also required. They do not need to run on the same server, but they must be reachable from it:

Keycloak 23+. Keycloak is an open-source OIDC provider maintained by Red Hat. CV3 uses OIDC for user authentication and management, as well as for advanced token handling supported by Keycloak. If you prefer to use a different authentication provider, it's recommended to connect it through Keycloak’s identity brokering feature. While CV3 may work with other OIDC providers, this is untested and not officially supported.
Vault by HashiCorp 1.15+. Vault is an all-in-one secret management tool and cryptographic engine. CV3 uses Vault to store sensitive information, and to encrypt and sign critical traffic for security.
MongoDB 6.0+. CV3 uses MongoDB to store all system data, including configuration and ingested data used for querying. Both standalone and clustered MongoDB deployments are supported, as CV3 does not depend on advanced cluster-specific features.
Redis 6.0+. CV3 uses Redis as a cache and a message broker. It's used to store temporary data, and to pass messages between different parts of the system.

For detailed instructions on how to set up these services, refer to Dependent Services Configuration.

Configuring the CV3 backend

The Cafe Variome V3 backend requires a configuration file before it can start. This file holds all essential settings that should not be modified during runtime. To begin, copy the sample configuration:

cp instance_config.json.example instance_config.json

The content of the JSON file resembles:

{
  "MongoDB": {
    "Host": "localhost",
    "Port": "27017",
    "User": "cafevariome",
    "Password": "cafevariome",
    "Database": "cafevariome",
    "MaxJobs": 40
  },
  "Redis": {
    "Host": "localhost",
    "Port": "6379",
    "Cluster": false
  },
  "Keycloak": {
    "Client": "test_client",
    "Realm": "cafe_variome",
    "URL": "http://localhost:8080",
    "BackendURL": "http://localhost:8080",
    "RedirectURLs": [
      "http://localhost:49430/callback.html",
      "http://localhost:49430/callback-silent.html"
    ]
  },
  "Vault": {
    "Host": "http://localhost:8200",
    "TransitPath": "transit_cv3",
    "KV2Path": "kv",
    "KV2Prefix": "cv3"
  },
  "CORS": {
    "AllowOrigin": [
      "*"
    ],
    "AllowMethods": [
      "GET",
      "POST",
      "PUT",
      "PATCH",
      "DELETE",
      "OPTIONS"
    ],
    "AllowHeaders": [
      "Content-Type",
      "Authorization",
      "Network-Id"
    ]
  },
  "Metrics": {
    "Prometheus": {
      "Enabled": false,
      "Path": "/metrics",
      "Key": ""
    }
  },
  "Email": {
    "From": {
      "Address": "admin@cafevariome.org",
      "Name": "Cafe Variome Admin"
    },
    "Sender": "noreply@system.le.ac.uk",
    "SMTP": {
      "Host": "localhost",
      "Port": 25,
      "Authentication": {
        "Required": false,
        "Username": "",
        "Password": ""
      }
    }
  },
  "Legacy": {
    "NexusMode": {
      "Enabled": false,
      "AccessTokenEnabled": true
    },
    "Query": true
  },
  "Logging": {
    "Name": "Cafe Variome V3",
    "Level": "INFO",
    "MaxBytes": 10485760,
    "BackupCount": 20,
    "SplunkHEC": {
      "Enabled": false,
      "HecEndpoint": "",
      "Token": ""
    },
    "Loki": {
      "Enabled": false,
      "Endpoint": "",
      "Tags": {},
      "Auth": {
        "Enabled": false,
        "Username": "",
        "Password": ""
      }
    }
  },
  "Data": {
    "UploadPath": "data_source/",
    "ValidFileFormats": [
      "vcf",
      "csv"
    ],
    "ChunkSize": 1024
  }
}
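
Before starting the backend, it can help to confirm that the copied file still parses as JSON and contains every top-level section shown in the sample. The following is a minimal sketch and not part of CV3: the function name missing_sections is hypothetical, and treating all ten sections as required is an assumption based on the sample file.

```python
import json

# Top-level sections present in the sample instance_config.json.
# Assumption: all of them are required by the backend.
EXPECTED_SECTIONS = (
    "MongoDB", "Redis", "Keycloak", "Vault", "CORS",
    "Metrics", "Email", "Legacy", "Logging", "Data",
)


def missing_sections(config_text: str) -> list[str]:
    """Return the expected top-level sections that are absent from the JSON text."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on malformed JSON
    return [name for name in EXPECTED_SECTIONS if name not in config]


# In-memory example; in practice pass the contents of instance_config.json.
fragment = '{"MongoDB": {"Host": "localhost"}, "Redis": {"Host": "localhost"}}'
print(missing_sections(fragment))
```

An empty list means every section from the sample is present; it does not validate the values inside each section.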

Here is a breakdown of the configuration:

MongoDB configs

This section defines the data storage settings for the instance. Most options are straightforward. The MaxJobs setting specifies the maximum number of concurrent connections used during intensive database operations, such as data ingestion. Regular, lightweight operations are not limited by this setting.

If you're running in cluster mode, keep in mind that MaxJobs refers to the number of connections EACH instance will maintain - not the total across the entire cluster. Be sure to scale this setting appropriately.

For example, if MaxJobs is set to 10, a single instance will hold open up to 10 connections during intensive operations like data ingestion, until the task completes. Other quick tasks (such as typical CRUD operations) are not limited by this setting and may use additional connections. This means the total number of connections can exceed 10, but only 10 will be dedicated to heavy operations.

In a cluster with three instances, each maintaining 10 ingestion connections, up to 30 connections across the cluster can be dedicated to heavy operations at the same time. Connections used by lightweight operations come on top of this figure.
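
The sizing arithmetic above can be sketched as follows. This is an illustrative helper, not part of CV3; the function names and the idea of a cluster-wide connection budget are assumptions introduced for the example.

```python
def heavy_connection_budget(instances: int, max_jobs: int) -> int:
    """Upper bound on connections dedicated to heavy operations across the cluster."""
    if instances < 1 or max_jobs < 1:
        raise ValueError("instances and max_jobs must be positive")
    return instances * max_jobs


def max_jobs_for_budget(cluster_budget: int, instances: int) -> int:
    """Largest per-instance MaxJobs that keeps the cluster within cluster_budget."""
    if instances < 1 or cluster_budget < instances:
        raise ValueError("budget must allow at least one connection per instance")
    return cluster_budget // instances


# Three instances with MaxJobs = 10, as in the example above.
print(heavy_connection_budget(instances=3, max_jobs=10))
# Working backwards from a hypothetical budget of 100 heavy connections.
print(max_jobs_for_budget(cluster_budget=100, instances=3))
```

Remember that lightweight operations are not counted here, so the MongoDB server should be sized with headroom beyond this bound.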

Redis configs

Redis configuration is simple: only the host and port need to be set. Authentication and ACLs are not currently supported. If the Redis server is a cluster, set Cluster to true.

Keycloak configs

The primary OIDC provider for CV3 must be a Keycloak server, rather than a different OIDC provider. This is because CV3 relies not only on standard OIDC flows for user authentication, but also on Keycloak-specific configurations and features to manage users and access tokens.

The configuration includes all required information to connect to the Keycloak server. It also specifies the backend service URL and a list of all supported redirect URIs. Note that the client secret is not stored in the configuration file - it is securely managed within HashiCorp Vault.
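
A quick way to check the Keycloak settings is to derive the realm's OIDC discovery URL and open it in a browser or with curl. The sketch below is not part of CV3 and its function names are hypothetical; it relies on the realm path layout used by current Keycloak releases (/realms/<realm>/...), and assumes the URL field is the base address of the Keycloak server.

```python
from urllib.parse import quote, urlparse


def oidc_discovery_url(keycloak_url: str, realm: str) -> str:
    """Build the OIDC discovery document URL for a Keycloak realm."""
    base = keycloak_url.rstrip("/")
    return f"{base}/realms/{quote(realm, safe='')}/.well-known/openid-configuration"


def invalid_redirect_urls(redirect_urls: list[str]) -> list[str]:
    """Return entries that are not absolute http(s) URLs."""
    bad = []
    for url in redirect_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(url)
    return bad


# Values taken from the sample configuration.
print(oidc_discovery_url("http://localhost:8080", "cafe_variome"))
print(invalid_redirect_urls([
    "http://localhost:49430/callback.html",
    "http://localhost:49430/callback-silent.html",
]))
```

If the discovery URL does not return a JSON document, the URL or Realm value is likely wrong or the server is not reachable from the backend host.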

Vault configs

Within a single HashiCorp Vault server, multiple secret engines can exist, and several applications may share the same engine. For CV3, two specific secret engines are required:

A transit engine, used to store user keys and handle encryption/decryption of data payloads.
A KV (Key-Value) engine, used to store other secrets such as the Keycloak client secret.

The TransitPath defines the path to the transit engine, while KV2Path defines the path to the KV engine. The KV2Prefix is the prefix applied to all keys stored in the KV engine.

It’s recommended to isolate access between different applications using Vault by applying Vault policies to distinct path prefixes.
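
To see how these three settings relate to Vault's HTTP API, the sketch below builds the URLs for reading a KV version 2 secret and for a transit encrypt call. It is an illustration only: the way CV3 joins KV2Prefix to a secret name is an assumption, and the names keycloak_client_secret and user-key are hypothetical. The /v1/<mount>/data/<path> and /v1/<mount>/encrypt/<key> layouts are those of Vault's KV v2 and transit engines.

```python
def kv2_read_url(host: str, kv2_path: str, kv2_prefix: str, secret_name: str) -> str:
    """URL for reading a secret from a KV v2 engine mounted at kv2_path."""
    # Assumption: the prefix is simply prepended to the secret name as a path segment.
    secret_path = "/".join(part.strip("/") for part in (kv2_prefix, secret_name) if part)
    return f"{host.rstrip('/')}/v1/{kv2_path.strip('/')}/data/{secret_path}"


def transit_encrypt_url(host: str, transit_path: str, key_name: str) -> str:
    """URL for encrypting a payload with a named key in a transit engine."""
    return f"{host.rstrip('/')}/v1/{transit_path.strip('/')}/encrypt/{key_name}"


# Values from the sample configuration; secret and key names are hypothetical.
print(kv2_read_url("http://localhost:8200", "kv", "cv3", "keycloak_client_secret"))
print(transit_encrypt_url("http://localhost:8200", "transit_cv3", "user-key"))
```

Paths of this shape are also what a Vault policy would match, so a policy scoped to the KV prefix and the transit mount keeps CV3 separate from other applications on the same server.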

CORS configs

The only setting that typically needs to be updated in the CORS configuration is AllowOrigin. Once the external URLs are finalised, this should be set to the origin(s) where the frontend is hosted, replacing the wildcard "*" from the sample. Avoid changing the other settings unless you have a specific reason to extend the allowed methods or headers. Do not remove any existing methods or headers, as this may prevent the backend from functioning correctly when accessed via a web browser.
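
The effect of the origin list can be illustrated with a simplified allow-list check. This is a sketch of the general CORS rule, not CV3's actual implementation; the function name and the example frontend origin https://admin.example.org are hypothetical.

```python
def is_origin_allowed(origin: str, allow_origin: list[str]) -> bool:
    """Return True if a browser request from `origin` passes the allow-list."""
    if "*" in allow_origin:
        return True  # wildcard: every origin is accepted
    # Origins are compared exactly: scheme, host and port must all match.
    return origin in allow_origin


wildcard = ["*"]
restricted = ["https://admin.example.org"]

print(is_origin_allowed("https://anything.example.com", wildcard))
print(is_origin_allowed("https://admin.example.org", restricted))
print(is_origin_allowed("http://admin.example.org", restricted))
```

Because the comparison is exact, an entry with a different scheme, a different port, or a trailing slash will not match the origin a browser sends.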

Metrics configs

The CV3 backend includes built-in support for Prometheus metrics collection, exporting several application-level metrics. These do not conflict with or duplicate metrics from other sources, such as MongoDB exporters. When enabled, metrics will be collected and made available at the specified sub-path. If the Key parameter is set, access to the metrics endpoint will require an Authorization header containing the API key. Prometheus can be configured to use bearer authentication to scrape this endpoint.
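
To test a protected metrics endpoint by hand, a request with the key in the Authorization header can be built as below. This is a minimal sketch under stated assumptions: the header is assumed to use the Bearer scheme (as the note on Prometheus bearer authentication suggests), and the base URL http://127.0.0.1:5000 and the key value are placeholders.

```python
import urllib.request


def build_metrics_request(base_url: str, path: str, key: str) -> urllib.request.Request:
    """Build a GET request for the metrics endpoint, adding the key if one is set."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    if key:
        # Assumption: the backend expects "Authorization: Bearer <Key>".
        request.add_header("Authorization", f"Bearer {key}")
    return request


request = build_metrics_request("http://127.0.0.1:5000", "/metrics", "example-key")
print(request.full_url)
print(request.get_header("Authorization"))

# To actually fetch the metrics from a running backend:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode())
```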

Email configs

This configuration is essential, as the CV3 backend relies on email to notify admins about important events - such as access requests or critical issues like database inconsistencies. Most hosting environments include a built-in SMTP server; if not, it's recommended to use a third-party service that supports SMTP delivery. If this configuration is missing or misconfigured, CV3 will be unable to send alerts or deliver user credentials. As a result, key features such as BEACON endpoints, Nexus mode, and access request handling may fail silently.
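
The SMTP settings can be verified independently of CV3 by sending a test message built from the same values. The sketch below uses only the Python standard library; the function names, recipient address, and message text are hypothetical, and how CV3 itself maps From and Sender onto message headers is an assumption.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def build_alert(email_cfg: dict, to_addr: str, subject: str, body: str) -> EmailMessage:
    """Build a message using the From and Sender values of the Email section."""
    message = EmailMessage()
    sender = email_cfg["From"]
    message["From"] = formataddr((sender["Name"], sender["Address"]))
    message["Sender"] = email_cfg["Sender"]  # the mailbox that actually submits the mail
    message["To"] = to_addr
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_alert(email_cfg: dict, message: EmailMessage) -> None:
    """Send the message through the configured SMTP server (not called here)."""
    smtp_cfg = email_cfg["SMTP"]
    with smtplib.SMTP(smtp_cfg["Host"], smtp_cfg["Port"]) as smtp:
        auth = smtp_cfg["Authentication"]
        if auth["Required"]:
            # A real deployment would normally negotiate TLS before logging in.
            smtp.login(auth["Username"], auth["Password"])
        smtp.send_message(message)


EMAIL_CFG = {
    "From": {"Address": "admin@cafevariome.org", "Name": "Cafe Variome Admin"},
    "Sender": "noreply@system.le.ac.uk",
    "SMTP": {"Host": "localhost", "Port": 25,
             "Authentication": {"Required": False, "Username": "", "Password": ""}},
}
message = build_alert(EMAIL_CFG, "user@example.org", "SMTP test", "Test message.")
print(message["From"], "|", message["Sender"])
```

Calling send_alert(EMAIL_CFG, message) on the backend host attempts real delivery, which is a direct way to confirm the host, port, and credentials before relying on CV3 notifications.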

Legacy configs

This section controls legacy features for compatibility with Cafe Variome V2 (CV2) instances. Nexus mode refers to the functionality previously known as Cafe Variome Net - a centralised network server that facilitates federation between CV2 nodes. When Nexus mode is enabled, this CV3 instance can act as the net server for CV2 instances, allowing admins to manage them via the Nexus page.

The AccessTokenEnabled option enables access tokens for a service account, which secure communication between CV2 and CV3. It requires a recent version of the CV2 codebase. The Query option determines whether CV2 instances are allowed to send queries to this CV3 instance:

When Query is set to true, the CV3 instance will appear as an installation named nexus in the CV2 networks and will accept incoming queries from CV2 nodes. However, it will not federate these queries outward, as the access control models differ between CV2 and CV3.
When Query is set to false, CV2 instances cannot query this CV3 instance. However, CV3 will still be able to query CV2 instances automatically, if Nexus mode is enabled.

Logging configs

This configures the logging behaviour of the CV3 server. It sets the log level for the Quart application, which controls how application-level logs are captured. Note that this does not affect the ASGI server used to run the app - you’ll need to configure that separately depending on which ASGI server you're using. If desired, logs can also be forwarded to external systems using SplunkHEC or Loki, enabling integration with log analysis tools for real-time processing and monitoring.

CV3’s built-in file logger supports automatic log rotation. When a log file exceeds the limit set by MaxBytes, it is rotated, and a new log file is started. The number of archived log files kept is defined by BackupCount. Once this limit is reached, the oldest log file is deleted, helping prevent excessive disk usage over time.
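
The rotation behaviour described above matches that of Python's standard RotatingFileHandler, whose maxBytes and backupCount parameters have the same meaning. The sketch below demonstrates it with deliberately tiny limits; whether CV3 uses this exact handler is an assumption, and the file name app.log is a placeholder.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(log_path: str, max_bytes: int, backup_count: int) -> logging.Logger:
    """Create a logger that rotates log_path once it would exceed max_bytes."""
    logger = logging.getLogger(f"rotation-sketch-{log_path}")
    logger.setLevel("INFO")
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count))
    return logger


def demo_rotation() -> list[str]:
    """Write many records with tiny limits and return the files left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        logger = make_rotating_logger(os.path.join(tmp, "app.log"), 200, 2)
        for i in range(100):
            logger.info("message %03d padded to make the file grow", i)
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)
        return sorted(os.listdir(tmp))


print(demo_rotation())

# Rough upper bound on disk usage with the sample settings:
# the active file plus BackupCount archives, each at most MaxBytes.
print(10485760 * (20 + 1), "bytes")
```

With the sample values of MaxBytes 10485760 and BackupCount 20, the log directory should therefore stay at roughly 210 MiB or less for the built-in file logger.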

Deploying the Cafe Variome V3 backend

The CV3 backend can be deployed in three ways: from source code, using pre-compiled binaries, or via Docker containers. Docker containers are the recommended option for production environments, as they simplify deployment, scaling, and maintenance. If your hosting environment does not support Docker, the pre-compiled binaries are a good alternative - they include all required libraries and dependencies as statically linked files, ensuring consistent behaviour across systems. Source code deployment is intended primarily for development purposes or when no other deployment method is feasible.

To deploy from source, first install the dependencies with pip. Run the following command in the downloaded source folder (Python and pip must be available in PATH; if you use a conda environment, activate it beforehand):

pip install -r requirements.txt

To run each component from source, you’ll need a configuration file placed in each component’s folder. These config files can be identical or customised individually, depending on your setup needs. Refer to the guide above for instructions on modifying the config content. Once ready, use the provided script to copy the configuration into the correct location and to set the Vault credentials.

./CafeVariomeV3 update-config
export VAULT_ROLE_ID=... # Role ID for AppRole authentication; ensure this role has access to the secret paths
export VAULT_SECRET_ID=... # Secret ID for AppRole authentication

The backend is now ready to start. However, before it can function properly, the storage must be bootstrapped with the correct configuration. This can be done using the CLI:

./CafeVariomeV3 cli

Type install, and follow the interactive process to finish the setup.

The binaries are pre-compiled Python code, built using Nuitka. In theory they should behave exactly like the source code. To use the binaries, first download them from the GitHub Release page. Then, extract the files to the location where the backend will run. The config file is located at cv3-backend/etc/instance_config.json. Modify the file to match your environment, and set the Vault credentials:

export VAULT_ROLE_ID=... # Role ID for AppRole authentication; ensure this role has access to the secret paths
export VAULT_SECRET_ID=... # Secret ID for AppRole authentication

The backend is now ready to start. However, before it can function properly, the storage must be bootstrapped with the correct configuration. This can be done using the CLI:

./CafeVariomeV3 cli

Type install, and follow the interactive process to finish the setup.

Docker is the recommended way to deploy the CV3 backend. Below is an example Docker Compose file that deploys the backend services, the frontends, and an nginx reverse proxy:

services:
  cv3-backend-admin:
    image: brookeslab/cv3-backend-admin:latest
    restart: unless-stopped
    network_mode: "host"
    depends_on:
      - cv3-backend-database-manager
      - cv3-backend-scheduler
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  # cv3-backend-cli:
  #   image: brookeslab/cv3-backend-cli:latest
  #   restart: unless-stopped
  #   network_mode: "host"
  #   depends_on:
  #     - cv3-backend-database-manager
  #     - cv3-backend-scheduler
  #   environment:
  #     VAULT_ROLE_ID: <Vault Role ID here>
  #     VAULT_SECRET_ID: <Vault Secret ID here>
  #   volumes:
  #     - ./config/backend_config.json:/app/instance_config.json
  #     - ./logs:/app/logs

  cv3-backend-database-manager:
    image: brookeslab/cv3-backend-database-manager:latest
    restart: unless-stopped
    network_mode: "host"
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
      KEYCLOAK_CLIENT_SECRET: <Your Keycloak Client Secret here>
      ADMIN_EMAIL: demo@cafevariome.org
      ADMIN_AFFILIATION: CafeVariome
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  # cv3-backend-exporter:
  #   image: brookeslab/cv3-backend-exporter:latest
  #   restart: unless-stopped
  #   network_mode: "host"
  #   depends_on:
  #     - cv3-backend-database-manager
  #     - cv3-backend-scheduler
  #   volumes:
  #     - ./config/backend_config.json:/app/instance_config.json
  #     - ./logs:/app/logs

  cv3-backend-network:
    image: brookeslab/cv3-backend-network:latest
    restart: unless-stopped
    network_mode: "host"
    depends_on:
      - cv3-backend-database-manager
      - cv3-backend-scheduler
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  # cv3-backend-nexus:
  #   image: brookeslab/cv3-backend-nexus:latest
  #   restart: unless-stopped
  #   network_mode: "host"
  #   depends_on:
  #     - cv3-backend-database-manager
  #     - cv3-backend-scheduler
  #   environment:
  #     VAULT_ROLE_ID: <Vault Role ID here>
  #     VAULT_SECRET_ID: <Vault Secret ID here>
  #   volumes:
  #     - ./config/backend_config.json:/app/instance_config.json
  #     - ./logs:/app/logs

  cv3-backend-query:
    image: brookeslab/cv3-backend-query:latest
    restart: unless-stopped
    network_mode: "host"
    depends_on:
      - cv3-backend-database-manager
      - cv3-backend-scheduler
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  cv3-backend-query-compiler:
    image: brookeslab/cv3-backend-query-compiler:latest
    restart: unless-stopped
    network_mode: "host"
    depends_on:
      - cv3-backend-database-manager
      - cv3-backend-scheduler
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  cv3-backend-query-meta:
    image: brookeslab/cv3-backend-query-meta:latest
    restart: unless-stopped
    network_mode: "host"
    depends_on:
      - cv3-backend-database-manager
      - cv3-backend-scheduler
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  cv3-backend-scheduler:
    image: brookeslab/cv3-backend-scheduler:latest
    restart: unless-stopped
    network_mode: "host"
    depends_on:
      - cv3-backend-database-manager
    environment:
      VAULT_ROLE_ID: <Vault Role ID here>
      VAULT_SECRET_ID: <Vault Secret ID here>
    volumes:
      - ./config/backend_config.json:/app/instance_config.json
      - ./logs:/app/logs

  cv3-frontend-admin:
    image: brookeslab/cv3-frontend-admin:latest
    restart: unless-stopped
    ports:
      - '5080:80'
    volumes:
      - ./config/frontend_admin_config.json:/usr/share/nginx/html/assets/assets/config.json

  cv3-frontend-query:
    image: brookeslab/cv3-frontend-query:latest
    restart: unless-stopped
    ports:
      - '5081:80'
    volumes:
      - ./config/frontend_query_config.json:/usr/share/nginx/html/assets/assets/config.json

  cv3-frontend-query-meta:
    image: brookeslab/cv3-frontend-query-meta:latest
    restart: unless-stopped
    ports:
      - '5082:80'
    volumes:
      - ./config/frontend_query_meta_config.json:/usr/share/nginx/html/assets/assets/config.json

  nginx:
    image: nginx:mainline-alpine3.18
    restart: unless-stopped
    network_mode: "host"
    ports:
      - '18080:80'
    volumes:
      - ./config/reverse_proxy.nginx.conf:/etc/nginx/nginx.conf

Running the Cafe Variome V3 backend

Once all required services are running and the config file is in place, the CV3 backend can be started.

The backend services are either Quart ASGI apps, or process pool applications built using aiomultiprocess. To start a specific service, you can run its corresponding Python script directly. For example:

cd cv3-backend-admin  # Change to the admin app folder
export PYTHONPATH="./src:$PYTHONPATH"  # Add the source folder to the Python path
python src/cv3_backend_admin/admin.py  # Run the admin app
# Or optionally use your preferred ASGI server
hypercorn cv3_backend_admin.asgi:app

Alternatively, use the management script to start all services:

# To install the script to PATH
./CafeVariome3.sh install <a_location_you_have_write_access>
# To start the server
CafeVariome3 start
# To stop the server
CafeVariome3 stop
# To restart the server
CafeVariome3 restart

To change any startup parameters, edit the script:

#!/bin/bash
CV3_APP_PATH="${CV3_APP_PATH:-.}"
CV3_APP_MODULE="${CV3_APP_MODULE:-cvf_app.app:app}"
CV3_LOGFILE="${CV3_LOGFILE:-$CV3_APP_PATH/hypercorn.log}"
CV3_ACCESS_LOG_FILE="${CV3_ACCESS_LOG_FILE:-$CV3_APP_PATH/access.log}"
CV3_ERROR_LOG_FILE="${CV3_ERROR_LOG_FILE:-$CV3_APP_PATH/error.log}"
CV3_PIDFILE="${CV3_PIDFILE:-$CV3_APP_PATH/hypercorn.pid}"
CV3_BIND="${CV3_BIND:-127.0.0.1:5000}"
CV3_GRACEFUL_TIMEOUT="${CV3_GRACEFUL_TIMEOUT:-30}"
CV3_HYPERCORN_PATH="$(which hypercorn)"

The parameters listed above appear at the beginning of the script and can be modified as needed. Alternatively, they can be overridden using environment variables. For example:

export CV3_BIND="0.0.0.0:8000"
./CafeVariome3.sh start

The binary build uses Nuitka’s multi-project build feature, which combines multiple binaries into a single package to share statically linked libraries. To run these binaries, they must either be renamed to match the expected entry point name, or executed with a different argv[0]. For example:

cp backend admin
            ./admin
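
The reason the file name matters is that a combined binary typically selects which program to run from the name it was invoked as. The sketch below shows that general pattern in plain Python; it is not Nuitka's or CV3's actual mechanism, and the entry point names and functions are hypothetical.

```python
import os


def run_admin() -> str:
    return "admin service"


def run_query() -> str:
    return "query service"


# Hypothetical table mapping invocation names to entry points.
ENTRY_POINTS = {"admin": run_admin, "query": run_query}


def dispatch(argv: list[str]) -> str:
    """Pick the entry point from the program name in argv[0]."""
    name, _extension = os.path.splitext(os.path.basename(argv[0]))
    try:
        entry_point = ENTRY_POINTS[name]
    except KeyError:
        raise SystemExit(
            f"unknown entry point {name!r}; expected one of {sorted(ENTRY_POINTS)}"
        ) from None
    return entry_point()


# Invoked as ./admin, the admin entry point is selected.
print(dispatch(["./admin"]))
```

Under this pattern, running the package under its original name (backend in the example above) matches no entry point, which is why a copy or rename is needed first.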