Requirements & results
Output
The goal of the software is the following: pseudonymized textual health records where the personally identifiable information is replaced by pseudo replacements for secondary health usage.
Requirements
In order to be able to use project blackbar, you will need a few things
- Documents with texts
- A Docker (version 24 or higher) or Podman environment which you need in order to
- start up the different webapps
- automate the anonymization and pseudonymization
- train and store the anonymization models
- host the applications and the API’s
This gets you started with the annotation so that you can provide already provide pseudonymized texts in adhoc runs to internal partners. For the automation part, you need:
- A database with documents which you would like to anonymize and pseudonymize
- A free or corporate account at Prefect to be able to monitor the anonymization progress at app.prefect.cloud or your own hosted version of Prefect
- Familiarity with Docker based deployments
- Familiarity with Python for testing the setup
We advise to run the software on an Linux machine with Ubuntu (e.g. 22.04) with 4 cores and at least 32 Gb of RAM within your own datacenter. Required disk space in a normal hospital which has roughly 2.5Mio patients would be 500Gb. Such a setup is able to anonymize/pseudonymize about 4000000 text records per day.
The webapps of project blackbar can be deployed behind an Apache webserver. Make sure you have an SSL certificate for your server and an (internal) URL where you want to host the apps.
Steps to get started
The following is the high-level list of steps to get you going
Basics
- Make sure you have a server with Docker which has access to internal data
- Launch the Docker container with Inception in order to annotate texts (see the section on apps and the section on training data)
- Use the cockpit app to import texts (see section apps), pre-annotate texts with an existing model from a partner hospital and automatically pseudonymize your manual annotations
Automation
- Set up the database with documents (see the database section) and test connectivity (see get-started-part1, get-started-part2, get-started-part3)
- Train a model on your own annotations (see the training section)
- Launch the anonymization and pseudonymization flows (see section automation)
- Inspect the results in the apps (see section apps)
- See behaviour of the entity detection and pseudonymization
- Validate where needed
- Export
- Optionally launch the API (see section api)
Use AI
- Use the included Generative AI apps to see how AI models work on the pseudonymized texts