Frequently Asked Questions
The name refers to the black bar that is placed over the eyes of people when they are anonymized in an interview, on a television broadcast or in a comic strip.
- In tests at a local hospital, where people were asked to label whether a text was pseudonymized or not, they could no longer tell a real text from a pseudonymized one.
- If you perform anonymization using NLP-only methods, recall (the share of entities that are retrieved) is generally around 92%-98%, with lower recall for addresses, as these are the most complex entities.
- If you have internal information on the persons involved in the texts (addresses of patients and doctors, and which doctors are treating which patient), the Smith-Waterman algorithm can be used as well. This increases recall on addresses and names to 98-99%, so that about 1-2% of the names and addresses in the texts are not detected.
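To illustrate why known names and addresses help, here is a minimal sketch of Smith-Waterman local alignment in Python. It is not the implementation used by the software: the function name `smith_waterman`, the scoring parameters and the example text are all assumptions chosen for illustration. The point is that a known name can still be located in a text when it is misspelled.

```python
def smith_waterman(query, text, match=2, mismatch=-1, gap=-1):
    """Return (best_score, matched_part_of_text) for the best local alignment.

    Illustrative sketch only; scoring values are assumptions, and the
    comparison is case-sensitive.
    """
    rows, cols = len(query) + 1, len(text) + 1
    # score[i][j] = best score of an alignment ending at query[i-1], text[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == text[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + step,  # match or mismatch
                score[i - 1][j] + gap,    # query character left unaligned
                score[i][j - 1] + gap,    # text character left unaligned
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell to find where the match starts in the text.
    i, j = best_pos
    end = j
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if query[i - 1] == text[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, text[j:end]


# A known patient name, searched in a text that contains a misspelling of it.
score, hit = smith_waterman("Janssens", "Mr Jansens called about his appointment")
print(score, repr(hit))
```

In practice a threshold on the score relative to the length of the known name would decide whether a hit is replaced. That threshold is a design choice and is not specified here.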
Yes, you can get these upon request to speed up your setup. Contact us here.
Yes. Your data does not need to be in the cloud: you can run the software on your premises or in the cloud, whichever you prefer.
Regarding public internet access:
- The server where you run the anonymization/pseudonymization should have access to the Prefect scheduler at api.prefect.io if you use Prefect Cloud as the scheduler.
- The server which runs the applications should also have access to the container registries at registry.datatailor.be and ghcr.io/bnosac to obtain the software.
- If you build the Docker images yourself instead of getting them from the above registries, you need access to the git repositories listed in the architecture section.
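The access requirements above can be checked from the server with a small script. The following is a minimal sketch under some assumptions: the helper name `check_hosts` is hypothetical, the hosts are assumed to be reached over HTTPS on port 443, and the server is assumed to connect directly rather than through an HTTP proxy. A successful TCP connection only shows that the host is reachable; it does not verify registry credentials.

```python
import socket

# Hosts named in the list above; ghcr.io is the host part of ghcr.io/bnosac.
REQUIRED_HOSTS = ["api.prefect.io", "registry.datatailor.be", "ghcr.io"]


def check_hosts(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Return {host: True/False} depending on whether a TCP connection succeeds.

    `connect` is injectable so the function can be exercised without network
    access; by default it is socket.create_connection.
    """
    results = {}
    for host in hosts:
        try:
            # The connection is closed again as soon as it has been opened.
            with connect((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # Covers DNS failures, refused connections and timeouts.
            results[host] = False
    return results
```

Running `check_hosts(REQUIRED_HOSTS)` on the server returns a dictionary with one entry per host. If you do not use Prefect Cloud, api.prefect.io can be left out of the list.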
We have tested scenarios where the text is stored in an IRIS database, a SQLite database or a PostgreSQL database. Other databases are possible but have not been tested.
The different status fields are there to keep track of progress: if you have large volumes of data, they allow you to indicate what has already been done for each set of texts.
- 0: anonymization is in progress
- 1: anonymization is done
- 2: pseudonymization is done
- x: other codes for you to keep track
- NULL: no anonymization is done / redo anonymization & pseudonymization
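As an illustration of how these codes can be used to follow progress, here is a minimal sketch with an in-memory SQLite database. The table name `documents`, the column names `doc_id`, `text` and `status`, and the helper functions are assumptions made for this example; they are not the schema used by the software.

```python
import sqlite3

# Status codes as listed above.
STATUS_IN_PROGRESS = 0
STATUS_ANONYMIZED = 1
STATUS_PSEUDONYMIZED = 2


def claim_pending(conn, limit=100):
    """Mark up to `limit` rows whose status is NULL as in progress; return their ids."""
    rows = conn.execute(
        "SELECT doc_id FROM documents WHERE status IS NULL ORDER BY doc_id LIMIT ?",
        (limit,),
    ).fetchall()
    ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE documents SET status = ? WHERE doc_id = ?",
        [(STATUS_IN_PROGRESS, doc_id) for doc_id in ids],
    )
    conn.commit()
    return ids


def set_status(conn, doc_id, status):
    """Set the status of one row; passing None resets it to NULL (redo)."""
    conn.execute("UPDATE documents SET status = ? WHERE doc_id = ?", (status, doc_id))
    conn.commit()


def progress(conn):
    """Return the number of rows per status code (NULL appears as None)."""
    return dict(conn.execute("SELECT status, COUNT(*) FROM documents GROUP BY status"))


# Hypothetical table with three texts that have not been processed yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, text TEXT, status INTEGER)")
conn.executemany("INSERT INTO documents (text) VALUES (?)", [("note A",), ("note B",), ("note C",)])

batch = claim_pending(conn, limit=2)           # rows 1 and 2 move from NULL to 0
set_status(conn, batch[0], STATUS_ANONYMIZED)  # row 1 is marked as anonymized
print(progress(conn))
```

Setting a row back to NULL with `set_status(conn, doc_id, None)` puts it in the queue again, which matches the "redo" meaning of NULL in the list above.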
Certainly. You can contact us here to request a token. If you want to clone the repositories to look at the code, you can use the token as follows, for example:
git clone https://<username>:${BLACKBAR_GITHUB_PAT}@github.com/bnosac/blackbar-py.git
git clone https://<username>:${BLACKBAR_GITHUB_PAT}@github.com/bnosac/textalignment.git
git clone https://<username>:${BLACKBAR_GITHUB_PAT}@github.com/bnosac/rlike.git
git clone https://<username>:${BLACKBAR_GITHUB_PAT}@github.com/bnosac/blackbar-docker.git
Certainly, contact us here