Allow to use a local Prefect Server instead of Prefect Cloud
Prefect anonymization / pseudonymization uses workers instead of agents
Based on base docker containers version 0.3.0 at https://github.com/bnosac/blackbar-docker/releases/tag/0.3.0
Upgrade VS Code / RStudio to Ubuntu 24.04
Upgrade the base image, the apps, the modelbuilding images and the images of the anonymization / pseudonymization flows to Debian trixie
upgrade to Prefect 3.4.22-python3.10
connectivity to Inception updated to pycaprio 0.3.0
Anonymization
Add extra tag: ID_Physician to tag list in the Inception projects for anonymization
Blackbar apps shinyproxy config
Add default container names for BLACKBAR_IMAGE_MODELBUILDING, BLACKBAR_IMAGE_PREFECT_ANONYMIZATION, BLACKBAR_IMAGE_PREFECT_PSEUDONYIMIZATION
Cockpit app
Fix showing predefined query for model building training data
Add public certificates of UZA
Add public certificates of Saint-Luc
Entry page contains a link to the local Prefect UI
Bug fixes
Anonymization deployment: no longer print all fetched document IDs to anonymize, to avoid log limit issues in Prefect in case of large doc_ids
Handle special characters in the user name when getting the CAS from Inception
Based on blackbar-py version 0.3.3
1.1
Enable using blackbar with Microsoft SQL Server, MySQL and MariaDB databases, in addition to InterSystems IRIS and SQLite
Based on blackbar-py version 0.3.2
Allow to build a model in the cockpit frontend using a combination of manual annotations in projects and automated annotations stored in blackbar_document combining an existing model with Smith-Waterman lookup
1.0
Added cockpit frontend webapp
Manage projects
Launch automated anonymization/pseudonymization + progress of annotation on the database
Launch model building
Inspect annotation progress
Easily import data from files / database to annotation frontend
Allow automatic anonymization of imported texts for quicker labelling
Allow automatic pseudonymization of labelled texts for quicker pseudonymization
Allow validation of anonymization / pseudonymization
Inspect logs
Allowed dynamic model selection in inspection + pseudonymization inspection + entity detector inspector
Allow usage of LLM models for edge case pseudonymization
Added app for talking to patient documents using an LLM
Added app for talking to an LLM
Based on blackbar-py version 0.3.1
Based on prefecthq/prefect version 2.15.0-python3.10
Based on base docker containers version 0.2.2 at https://github.com/bnosac/blackbar-docker/releases/tag/0.2.2
Increase ShinyProxy to 3.1.1 to enable later versions of Java and Docker >= 0.25 (https://github.com/bnosac/blackbar/issues/34)
Allow table names that differ from the default ones
Allow to have versioned models
Allow to build models based on manual annotations and automated annotations
Added details on Authentication alongside Keycloak and Microsoft Entra ID
Allow in the apps to switch to a test SQLite database
Provide docker containers for Inception annotation engine alongside Minio
Based on blackbar-py version 0.2.0
Based on prefecthq/prefect version 2.15.0-python3.10
Based on base docker containers version 0.1 at https://github.com/bnosac/blackbar-docker/releases/tag/0.1
blackbar-base: image based on prefecthq/prefect, version 2.15.0-python3.10 which is based on Ubuntu 20.04
blackbar-base-apps: image based on ghcr.io/bnosac/blackbar-base + shiny and commonly used shiny extensions (rmarkdown/shinydashboard/bslib/fontawesome/knitr/reticulate)
blackbar-inception-minio: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z)
blackbar-inception-minio-mariadb: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z) and MariaDB (10.7)
blackbar-shinyproxy: ShinyProxy 3.0.2 and openjdk 11
blackbar-rstudio: RStudio, R 4.2.1, Python 3.10, Python packages relevant for blackbar, Java 21 from eclipse-temurin
Sped up Smith-Waterman
Changes to blackbar-py
0.3.3
inception_cas
now applies urllib.parse.quote to the user_name in order to handle special characters in the user name
gains argument level to indicate if cas documents need to be created at the level of the unique entities (dropping documents without annotations) or at the level of the unique documents
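As an illustration of the user name handling above, the following is a minimal sketch of percent-encoding a user name with urllib.parse.quote before it is used in a request path. The helper name encode_user_name and the choice of safe="" are assumptions made for this example; they are not taken from blackbar-py.

```python
from urllib.parse import quote


def encode_user_name(user_name: str) -> str:
    """Percent-encode a user name so it can be placed in a URL path.

    Hypothetical helper for illustration only. safe="" is an assumption:
    it also escapes "/", which quote() leaves untouched by default.
    """
    return quote(user_name, safe="")


print(encode_user_name("jan.de smet@uza.be"))  # -> jan.de%20smet%40uza.be
```

Unreserved characters such as letters, digits, "." and "-" are left as they are; spaces, "@" and other special characters are replaced by their %XX escapes.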
inception_upload_documents catches documents which cannot be converted to XMI due to non-Unicode or non-ASCII characters
Add extra tag: ID_Physician to tag list
blackbar-example-empty.zip now contains an extra tag: ID_Physician
PseudoGenerator works with ID_Physician
blackbar_tagset_recode works with ID_Physician
BlackbarDB
read now correctly handles column names if the database type is MySQL or MariaDB. Column names defined with AS are not in the description and need to be fetched from the metadata
the init method uses the BLACKBAR_DB_TABLES environment variable if tables is not provided
Containers
blackbar_job_anonymization and blackbar_job_pseudonymization no longer pass on PREFECT_API_URL in case of prefect_env = “cloud” and no longer pass on PREFECT_API_KEY and PREFECT_WORKSPACE in case of prefect_env = “local”
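The rule above can be sketched as a small selection function. This is only an illustration under stated assumptions: the function name prefect_container_env and its signature are hypothetical and do not reflect how blackbar_job_anonymization or blackbar_job_pseudonymization are implemented; only the variable names and the values of prefect_env come from the changelog entry.

```python
import os
from typing import Dict, Mapping, Optional


def prefect_container_env(
    prefect_env: str, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Select the Prefect variables to pass on to a job container.

    Hypothetical helper: "cloud" forwards the API key and workspace,
    "local" forwards only the API URL of the local Prefect Server.
    """
    environ = os.environ if environ is None else environ
    if prefect_env == "cloud":
        keys = ("PREFECT_API_KEY", "PREFECT_WORKSPACE")
    elif prefect_env == "local":
        keys = ("PREFECT_API_URL",)
    else:
        raise ValueError(f"unknown prefect_env: {prefect_env!r}")
    # Only forward variables that are actually set.
    return {key: environ[key] for key in keys if key in environ}


# Example values below are made up for the demonstration.
example = {
    "PREFECT_API_URL": "http://localhost:4200/api",
    "PREFECT_API_KEY": "example-key",
    "PREFECT_WORKSPACE": "example/workspace",
}
print(prefect_container_env("local", example))
print(prefect_container_env("cloud", example))
```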
0.3.2
BlackbarDB
now ships jars for InterSystems IRIS (intersystems-jdbc-3.8.0.jar), MS SQL Server (mssql-jdbc-13.2.0.jre11), PostgreSQL (postgresql-42.7.7), MariaDB (mariadb-java-client-3.5.5.jar), MySQL (mysql-connector-j-9.4.0.jar)
connections to InterSystems IRIS, MS SQL Server, PostgreSQL and MariaDB use the drivers above. Connections to MySQL use the MariaDB driver
read_blackbar_status casts textCvtModDate to a DATE and takes the DATEPART of textCvtModDate for showing the daily and hourly status in order to be compatible with MS SQL Server. Result is sorted now by date/hour/status. For MySQL and MariaDB, uses HOUR function
Database variant names:
IRIS, InterSystems IRIS
MS SQL Server, Microsoft SQL Server, SQL Server, MSSQL, mssql, sqlserver
blackbar_job_modelbuilding gains argument projects_sql to enable training on annotated data stored in the database
0.3.1
Pseudonymization
Perform pseudonymization differently by gender
Allow a backup strategy for edge cases where the pseudonymization does not produce any new text
Allow to use existing pseudonymizations by patient
Add read_pseudonymization to database with type either ‘documents’, ‘entities’, ‘traindata’ or ‘documents-entities’
patient identifiers upper cased
patient id, keep the first letter + follow a date/month structure
if Leeftijd/Geboortedatum and mentions years, apply the dateshift
deid_pseudonymize now adds by default a pseudonymization of the patient id even if it was never found in the texts
deid_anonymization_entities now returns doc_id if it is missing (case of extended = False where the doc_id is not stored in the textCvt) and model (case of extended = False - only nlp)
deid_anonymization_entities now returns empty data frame if no data in textAnonymization
Added deid_pseudonymization_entities to get the document + detected entities + the pseudo replacements
PseudoGenerator gains extra argument indicating to provide a warning/error/pass in case the entity type is not known or not implemented
Allow pseudonymization with LLM generation as the failure strategy
Database
By default on IRIS, disable logging
Allow not to store the anonymized version in textCvt in deid_anonymize by passing output structure version v4
Pseudonymization results of entities max 4000 characters long
Specify default queries for the app
Allow to specify table names as json
Enable using a sqlite database for testing in the cockpit app
To enable switching to an example database, the sqlite jars are always included when starting the connection to the database
BlackbarDB
gains argument url instead of using the environment variable BLACKBAR_DB_URL
allows to use shorthands for the tables argument. E.g. ‘default_sqlite’
S3
Allow to work with versioned data
s3_download_file, blackbar_s3_download, blackbar_s3_read now allow to get data with a specific version_id
blackbar_s3_list now returns version information
new buckets created are enabled with versioning by default
s3_remove gains argument version_id
Smith Waterman
deid_enrich_identifiers for V. Achternaam returns None if Achternaam is None (vn_achternaam)
Training data
Exported blackbar_annotations to get a set of annotations from table blackbar_document
Added blackbar_typesystem with the required typesystem for NER and pseudonymization
Make blackbar_inception_entities more dynamic, uses by default ‘custom.Span’ + ‘label’, if label is not provided or does not exist: extract data contained in ‘entity’ and if that does not exist in ‘value’
blackbar_inception_entities gains extra argument ‘type’ to return entities within a sentence or to allow cross-sentence entities. By default it now returns data at the entity level, not at the entity-within-sentence level
Add functionality to read in a zipped inception file in a format similar to the data which is needed for blackbar_traindata
Added blackbar_cas, blackbar_typesystem and blackbar_inception_read_export
Added blackbar_cas_pseudo for validating pseudo documents + make sure uploaded pseudo documents still contain the entity_detail to use this for generating example data
Added inception_upload_documents + allow to upload in NEW/IN-PROGRESS/COMPLETE state
Added inception_create_project
Added inception_delete
blackbar_inception_annotations
returns empty data frame if no documents are in a project
adds field user_name to entities dataset
add option to decode the text in utf-8 instead of keeping it as bytes
add option to rename the doc_id to the label in order to keep the doc_id from the database
add argument document_state to filter the documents and annotations based on the document state
inception_list_documents gains extra argument encoding and encoding_errors in order to decode to utf-8/latin-1
Added BlackbarInception in order to get anonymized data based on Inception project directly
blackbar_traindata
allow leading/trailing spaces in entity tokens + more extensive logging where unexpected elements are happening + allow to use strictly strict without expand
do not disable ner in order to have aligned tokens which might be different when disabling the ner tokenization
keep document dataset with doc_id/text and indication if in training/test
add doc_id to user_data metadata
BlackbarDB
Extend ability to write to patients db.write
PII detection
min_size
Blackbar gains argument min_size to indicate the minimum length of the entity to be detected (smith-waterman). Default is 3 which was previously hardcoded.
deid_anonymize gains argument min_size to indicate the minimum length of the entity to be detected (smith-waterman). Default is 3 which was previously hardcoded. This argument is passed on to Blackbar and only is used if extended = True
Added BlackbarInceptionSmithWaterman to get NER evaluation metrics
Docker/Podman
pod_client, change default to Docker instead of Podman
add function to check connectivity to docker/podman
pod_container now adds type ‘restart’
change environment variables to use POD_TYPE/POD_URL
added blackbar_job_modelbuilding in order to launch the model building container
added blackbar_job_anonymization in order to launch anonymization container
added blackbar_job_pseudonymization in order to launch pseudonymization container
Added empty inception annotation project in the data folder and an empty inception pseudonymization validation project in the data folder
Utils
add split_df
Pseudonymization
0.3.0
Added blackbar_inception_annotations to easily extract all annotations from inception projects and get the entities for type custom.Span
Added blackbar_traindata to allow to generate training data from the output of blackbar_inception_annotations
Added deid class in order to integrate the training data from Inception with spacy model building
BlackbarDB - added queries to do pseudonymization differently by gender
Renamed blackbar_s3_download_file to s3_download_file, blackbar_s3_upload_file to s3_upload_file and blackbar_s3_remove to s3_remove
0.2.0
Allow to specify the tables of the data as parameter in BlackbarDB
Include sqlite test database in the package
Ability to have the data in MS SQL Server, PostgreSQL, Oracle, MySQL, SQLite, MariaDB
Bump dependency to textalignment 0.2.0 to be able to speed up smith waterman
Blackbar
gains argument alignment_method allowing to perform smith waterman alignment using biopython
gains 2 methods: anonymize and anonymize_dataframe
Fix bug in Blackbar.anonimise_extended in the offsets used for removing the entities: the length of the entity and the entity text were correct, but the end position was wrong. Fixed such that
start position is 0-based (Python-like)
end position is the 0-based index of the last letter of the textual entity (inclusive)
such that text[start:(end+1)] gives the entity text
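The offset convention above can be checked with a small sketch. The helper entity_span and the example sentence are made up for illustration and are not part of blackbar-py; only the convention itself (0-based start, end pointing at the last letter) comes from the changelog entry.

```python
from typing import Tuple


def entity_span(text: str, entity: str) -> Tuple[int, int]:
    """Return (start, end) for the first occurrence of entity in text.

    Hypothetical helper: start is 0-based and end is the index of the
    last letter of the entity, so the slice needs end + 1.
    """
    start = text.index(entity)
    end = start + len(entity) - 1
    return start, end


text = "Patient Jan Peeters was seen today."
start, end = entity_span(text, "Jan Peeters")
print(start, end)                # -> 8 18
print(text[start:(end + 1)])     # -> Jan Peeters
```

Because end is inclusive, the length of the entity is end - start + 1, which differs by one from the usual Python half-open slice bounds.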
Ability to save pseudonymization and test data
Store the model name in object Blackbar and the anonymization and pseudonymization results, keep the entities in the pseudonymization results per document as well
0.1.0
Added options to inspect, build, maintain and deploy containers using Docker and Podman with functions