Biomedical Term Service Help

Deploying BTS

CV-BTS is a backend-only system designed to be queried by CV3 and its supporting services. It can also be used on its own or to support other biomedical tools.

Overview

The BTS backend is written in Python and built on Quart. In operation mode it does not write to the databases, so it is safe to scale horizontally. The API is RESTful, so no session management is required.

Dependencies

The following language and libraries are required to run BTS:

  • Python 3.11+

  • Packages listed in requirements.txt

The following software and tools are required to run BTS:

  • A web server (e.g., Nginx) for reverse proxy and HTTPS

  • An ASGI server (Hypercorn if running from the script directly)

  • MongoDB

  • Neo4j

  • Redis

The hardware requirements are:

  • 2 CPU cores minimum (4 recommended); add more as needed for calculating semantic similarity.

  • 2 GB RAM to start (excluding the database requirements); 8 GB to serve a reasonable number of requests; 64 GB if building the graph locally with up to 12 workers.

  • 30 GB storage, 60 GB if building the graph locally.
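The minimums listed above can be checked before installation with a small script. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of BTS, and the thresholds are simply the minimum figures from the lists above.

```python
import os
import shutil
import sys

# Minimums copied from the lists above; names here are illustrative only.
MIN_PYTHON = (3, 11)
MIN_CPU_CORES = 2
MIN_RAM_GB = 2
MIN_DISK_GB = 30


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None
    return pages * page_size / 1024**3


def preflight(python_version, cpu_cores, ram_gb, disk_gb):
    """Compare measured values with the minimums and return a list of problems."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or newer is required")
    if cpu_cores is not None and cpu_cores < MIN_CPU_CORES:
        problems.append(f"at least {MIN_CPU_CORES} CPU cores are required")
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB RAM is required")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"at least {MIN_DISK_GB} GB free storage is required")
    return problems


if __name__ == "__main__":
    free_disk_gb = shutil.disk_usage(".").free / 1024**3
    issues = preflight(sys.version_info, os.cpu_count(), total_ram_gb(), free_disk_gb)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Host meets the documented minimums")
```

The operating system usually reports slightly less RAM than the nominal size, so a machine sold with 2 GB may be flagged. Treat the result as a hint rather than a hard gate.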

Installation

BTS can be run from source, from a Docker container, from a pre-built binary, or as a Python package.

From source

Prepare the Python environment first:

    git clone https://github.com/CafeVariomeUoL/cv3-bioterms.git
    cd cv3-bioterms
    pip install .
    cp config.json.example config.json
    export BIOPORTAL_API_KEY=YOUR_BIOPORTAL_API_KEY
    export NHS_TRUD_API_KEY=YOUR_NHS_TRUD_API_KEY

The two API keys are required to download the ontology terms from BioPortal and NHS TRUD. However, if you're using the database dump we provide, the API keys are not needed.
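Before starting a download, it can help to confirm that both keys are visible to the process. The sketch below only inspects environment variables; the variable names come from the commands above, while the helper `missing_api_keys` is hypothetical and not part of BTS.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("BIOPORTAL_API_KEY", "NHS_TRUD_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required API keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Not set:", ", ".join(missing))
        print("This is fine when restoring from the provided database dump.")
    else:
        print("Both API keys are set")
```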

From Docker

The Docker images include the CLI component. However, since running the CLI from a container can be cumbersome, the images also support automatic downloading of data and database initialization. As the project requires multiple external databases, it's recommended to use Docker Compose to start the service. An example Docker Compose file is provided in the main repository:

    services:
      cv3-bioterms:
        image: brookeslab/cv3-bioterms:latest
        ports:
          - '3000:3000'
        environment:
          - BIOPORTAL_API_KEY=YOUR_BIOPORTAL_API_KEY
          - NHS_TRUD_API_KEY=YOUR_NHS_TRUD_API_KEY
        volumes:
          - ./config/bioterms_config.json:/app/config.json
        networks:
          - cv3_bioterms

      mongodb:
        image: mongo:7.0.11
        restart: always
        ports:
          - "27017:27017"
        volumes:
          - mongodb_data:/data/db
        networks:
          - cv3_bioterms

      neo4j:
        image: neo4j:5.22.0
        restart: always
        ports:
          - "7474:7474"
          - "7687:7687"
        volumes:
          - neo4j_data:/data
        networks:
          - cv3_bioterms

      redis:
        image: redis:7.4
        restart: always
        ports:
          - '6379:6379'
        command: redis-server
        volumes:
          - redis_data:/data

    networks:
      cv3_bioterms:
        driver: bridge

    volumes:
      mongodb_data:
      neo4j_data:
      redis_data:

When running in Docker, the databases are populated automatically. To disable this behavior, set the environment variable AUTO_LOAD to false. The API keys are used to download the source files. However, because of restrictions imposed by some data sources, not all data can be fetched through their APIs, and some data files must be downloaded manually.
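After starting the stack, a quick way to see whether the containers are accepting connections is to probe the host ports published in the example Compose file. This is a rough sketch using only the standard library; the service labels and function names are illustrative, and it assumes the ports are published on the local machine.

```python
import socket

# Host ports published by the example Compose file above.
SERVICE_PORTS = {
    "cv3-bioterms": 3000,
    "mongodb": 27017,
    "neo4j-http": 7474,
    "neo4j-bolt": 7687,
    "redis": 6379,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="127.0.0.1", ports=SERVICE_PORTS):
    """Map each service label to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:14} {'reachable' if ok else 'NOT reachable'}")
```

An open port only shows that the process is listening. It does not confirm that automatic data loading has finished, which may take considerably longer.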

From binary

We provide portable binary releases for Linux. The binary is built with Nuitka, a tool that compiles Python code to C and then to an executable. It bundles all necessary dependencies and should behave consistently across Linux distributions, provided that the glibc version is compatible. It does not require Python or any other runtime, and may perform slightly better because of the way Nuitka compiles the code. We do not currently provide binary releases for Windows or macOS, as we do not anticipate use cases on anything other than Linux servers.
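If the target machine happens to have Python available, the local glibc version can be inspected with the standard library (running `ldd --version` reports the same information). The sketch below is illustrative only: the minimum glibc version for the BTS binary is not stated here, so the "2.31" in the example is a placeholder, and the helper names are hypothetical.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.35' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_at_least(required, found=None):
    """Return True or False, or None when the host libc is not glibc or is unknown."""
    if found is None:
        name, found = platform.libc_ver()
        if name != "glibc" or not found:
            return None
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    print("Detected libc:", platform.libc_ver())
    # "2.31" is a placeholder, not a documented requirement of the BTS binary.
    print("glibc >= 2.31:", glibc_is_at_least("2.31"))
```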

To use the binary release, download it from either the GitHub releases page or our artifact repository. The artifact repository includes a "nightly" build, which stays up to date with the main branch (provided the test cases pass). However, due to storage limitations, older artifacts may be removed. If you need a release older than 180 days, you may have to build it from source or check the GitHub repository.

After downloading the tarball, extract it to a directory, and run the script:

    tar -xzf cv3-bioterms-nightly.tar.gz
    cd cv3-bioterms
    ./BioTermService.sh

From Python package

Using the CLI to load data

The CLI is used to download and load the data into the databases. In production, it might be the case that only part of the data is needed, so the CLI can selectively download and initialize only part of the graph.

    # When running from source code, use the script
    ./BioTermService.sh cli

    # When running from binary, use the binary
    ./app cli

    [ASCII-art startup banner]

    Database reachable
    CTV3 terms loaded
    Gene Symbol (HGNC Standard) terms loaded
    HGNC terms loaded
    HPO terms loaded
    NCIT terms loaded
    OMIM terms loaded
    Orphanet terms loaded
    Reactome terms loaded
    SNOMED CT terms loaded
    >

The CLI has an autocomplete and help feature, so you can follow the built-in guide to load the data.

Last modified: 28 March 2025