Releases

High-level changes to apps / flows / containers

1.0

Added cockpit frontend webapp
- Manage projects
- Launch automated anonymization/pseudonymization + progress of annotation on the database
- Launch model building
- Inspect annotation progress
- Easily import data from files / database to annotation frontend
- Allow automatic anonymization of imported texts for quicker labelling
- Allow automatic pseudonymization of labelled texts for quicker pseudonymization
- Allow validation of anonymization / pseudonymization
- Inspect logs
Allowed dynamic model selection in inspection + pseudonymization inspection + entity detector inspector
Allow usage of LLM models for edge case pseudonymization
Added app for chatting with patient documents using an LLM
Added app for talking to an LLM
Based on blackbar-py version 0.3.1
Based on prefecthq/prefect version 2.15.0-python3.10
Based on base docker containers version 0.2 at https://github.com/bnosac/blackbar-docker/releases/tag/0.2
Increase ShinyProxy to 3.1.1 to enable later versions of Java and Docker >= 0.25 (https://github.com/bnosac/blackbar/issues/34)
Allow table names that differ from the default ones
Allow to have versioned models
Allow to build models based on manual annotations and automated annotations
Added details on Authentication alongside Keycloak and Microsoft Entra ID

0.1

Allow automated anonymization
Allow automated pseudonymization
Frontend webapps for annotation (Inception)
Frontend webapps for annotation inspection + pseudonymization inspection + entity detector inspector
Provide docker containers for API
Provide docker containers for Inception annotation engine alongside Minio
Based on blackbar-py version 0.2.0
Based on prefecthq/prefect version 2.15.0-python3.10
Based on base docker containers version 0.1 at https://github.com/bnosac/blackbar-docker/releases/tag/0.1
- blackbar-base: image based on prefecthq/prefect, version 2.15.0-python3.10 which is based on Ubuntu 20.04
- blackbar-base-apps: image based on ghcr.io/bnosac/blackbar-base + shiny and commonly used shiny extensions (rmarkdown/shinydashboard/slib/fontawesome/knitr/reticulate)
- blackbar-inception-minio: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z)
- blackbar-inception-minio-mariadb: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z) and MariaDB (10.7)
- blackbar-shinyproxy: ShinyProxy 3.0.2 and openjdk 11
- blackbar-rstudio: RStudio, R 4.2.1, Python 3.10, Python packages relevant for blackbar, Java 21 from eclipse-temurin
Sped up Smith-Waterman alignment

Changes to blackbar-py

0.3.1

Pseudonymization
- Perform pseudonymization differently by gender
- Allow a fallback strategy for edge cases where the pseudonymization does not produce a new text
- Allow to use existing pseudonymizations by patient
- Add read_pseudonymization to database with type either ‘documents’, ‘entities’, ‘traindata’ or ‘documents-entities’
- Patient identifiers are upper-cased
- Patient id: keep the first letter and follow a date/month structure
- If the entity is Leeftijd/Geboortedatum (age/date of birth) and mentions years, apply the dateshift
- deid_pseudonymize now adds by default a pseudonymization of the patient id even if it was never found in the texts
- deid_anonymization_entities now returns doc_id if it is missing (case of extended = False where the doc_id is not stored in the textCvt) and model (case of extended = False - only nlp)
- Added deid_pseudonymization_entities to get the document + detected entities + the pseudo replacements
- PseudoGenerator gains extra argument indicating to provide a warning/error/pass in case the entity type is not known or not implemented
- Allow pseudonymization with LLM generation as the failure strategy
Database
- By default on IRIS, disable logging
- Allow not to store the anonymized version in textCvt in deid_anonymize by passing output structure version v4
- Pseudonymization results of entities max 4000 characters long
- Specify default queries for the app
- Allow to specify table names as json
S3
- Allow to work with versioned data
- s3_download_file, blackbar_s3_download, blackbar_s3_read now allow to get data with a specific version_id
- blackbar_s3_list now returns version information
- new buckets created are enabled with versioning by default
- s3_remove gains argument version_id
Training data
- Exported blackbar_annotations to get a set of annotations from table blackbar_document
- Added blackbar_typesystem with the required typesystem for NER and pseudonymization
- Make blackbar_inception_entities more dynamic, uses by default ‘custom.Span’ + ‘label’, if label is not provided or does not exist: extract data contained in ‘entity’ and if that does not exist in ‘value’
- blackbar_inception_entities gains extra argument ‘type’ to return entities within a sentence or to allow cross-sentence entities. By default it now returns data at the entity level, not at the entity-within-sentence level
- Add functionality to read in a zipped inception file in a format similar to the data which is needed for blackbar_traindata
- Added blackbar_cas, blackbar_typesystem and blackbar_inception_read_export
- Added blackbar_cas_pseudo for validating pseudo documents
- Added inception_upload_documents + allow to upload in NEW/IN-PROGRESS/COMPLETE state
- Added inception_create_project
- Added inception_delete
- blackbar_inception_annotations
  - returns empty data frame if no documents are in a project
  - adds field user_name to entities dataset
  - add option to decode the text in utf-8 instead of keeping it as bytes
- inception_list_documents gains extra argument encoding and encoding_errors in order to decode to utf-8/latin-1
- Added BlackbarInception in order to get anonymized data based on Inception project directly
Docker/Podman
- pod_client, change default to Docker instead of Podman
- add function to check connectivity to docker/podman
- pod_container now adds type ‘restart’
- change environment variables to use POD_TYPE/POD_URL
- added blackbar_job_modelbuilding in order to launch the model building container
- added blackbar_job_anonymization in order to launch anonymization container
- added blackbar_job_pseudonymization in order to launch pseudonymization container
Added empty inception annotation project in the data folder and an empty inception pseudonymization validation project in the data folder
Utils
- add split_df

0.3.0

Added blackbar_inception_annotations to easily extract all annotations from inception projects and get the entities for type custom.Span
Added blackbar_traindata to allow to generate training data from the output of blackbar_inception_annotations
Added deid class in order to integrate the training data from Inception with spacy model building
BlackbarDB - added queries to do pseudonymization differently by gender
Renamed blackbar_s3_download_file to s3_download_file, blackbar_s3_upload_file to s3_upload_file and blackbar_s3_remove to s3_remove

0.2.0

Allow to specify the tables of the data as parameter in BlackbarDB
Include sqlite test database in the package
Ability to have the data in MS SQL Server, PostgreSQL, Oracle, MySQL, SQLite, MariaDB
Bump dependency to textalignment 0.2.0 to speed up Smith-Waterman alignment
Blackbar
- gains argument alignment_method allowing to perform Smith-Waterman alignment using biopython
- gains 2 methods: anonymize and anonymize_dataframe
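
For readers unfamiliar with Smith-Waterman, the idea of local alignment can be sketched in plain Python. This is only an illustration of the technique, not the biopython or textalignment implementation used by Blackbar; the function name smith_waterman_score and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between strings a and b.

    Hypothetical helper for illustration; scoring values are assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Scoring matrix; the first row and column stay 0.
    scores = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = scores[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = scores[i - 1][j] + gap
            left = scores[i][j - 1] + gap
            # Local alignment: a cell never drops below 0.
            scores[i][j] = max(0, diag, up, left)
            best = max(best, scores[i][j])
    return best


# A known identifier found inside a longer piece of text.
score = smith_waterman_score("Janssens", "Dr. Janssens, MD")
```

Because scores are floored at 0, the alignment can start and end anywhere in either string, which is what makes it suitable for locating a known name inside a longer text.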
Exported Blackbar
Exported deid_anonymize, deid_enrich_identifiers, deid_smith_waterman
Exported deid_anonymization_entities, deid_pseudonymize
Exported blackbar_example
Fix bug in Blackbar.anonimise_extended in the offsets used for removing the entities: the length and the text of the entity were correct, but the end position was wrong. Fixed such that
- start position is 0-based (Python-like)
- end position includes the last letter of the textual entity (Python-like)
- such that text[start:(end+1)] gives the entity text
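
The offset convention above can be illustrated with plain Python slicing. This is a minimal sketch; the example text and entity are made up and are not actual Blackbar output.

```python
# Hypothetical example text and entity; not actual Blackbar output.
text = "Patient Jan Janssens was seen today."
entity_text = "Jan Janssens"

start = text.index(entity_text)       # 0-based start position
end = start + len(entity_text) - 1    # position of the last letter (inclusive)

# Because the end position is inclusive, slicing needs end + 1.
extracted = text[start:(end + 1)]
```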
Ability to save pseudonymization and test data
Store the model name in object Blackbar and the anonymization and pseudonymization results, keep the entities in the pseudonymization results per document as well

0.1.0

Added options to inspect, build, maintain and deploy containers using Docker and Podman with functions
- pod_info, pod_ls, pod_pull, pod_build, pod_container, pod_container_log, pod_remove
Added functionality to perform pseudonymization
- PseudoGenerator (generate pseudo text)
- anonimisation_entities (extract the anonymized entities)
- pseudo_replacements, txt_mimic_readability (replacement functions while sticking as close to the original layout as possible)
- Utility functions which help with the pseudonymization
  - txt_insert, txt_freq, txt_leading_trailing
  - txt_n_capital, txt_contains, txt_contains_lot_of_capitals, txt_n_newlines
Utilities to connect to the database
- BlackbarDB class and the methods to interact with it
  - read/sendquery/read_documents/read_patientdoctoraddresses
  - update_anonimisation/read_blackbar_status/read_anonimisation + parsing the anonymization JSON: parse_anonimisation_json
- Deprecated database connectivity functions (now covered by class BlackbarDB)
  - read_iris, read_iris_sample, read_iris_documents, read_iris_patientartsadressen, iris_update_cvt, iris_sendquery
Utilities to store models and other files on S3 in particular using Minio
- blackbar_s3_list, blackbar_s3_upload, blackbar_s3_download (not model specific)
- blackbar_model_save, blackbar_model_load (model specific)
Utilities to interact with Inception and the inception API
- Connectivity: inception_client, inception_list_projects
- Exporting and fetching annotations: inception_export, inception_list_documents, inception_list_annotations
- Inspecting the annotations with their corresponding types: inception_types, inception_cas, read_xmi
- Fetching the log of the user interactions on the site: inception_read_eventlog
- Utility functions to extract information from the inception data:
  - blackbar_inception_entities, line_spans, token_spans, token_entity_spans
UZB specific functionality
- uzb_identify_chunks, uzb_harmonize_physician, uzb_vn_achternaam, uzb_txt_contains
General utility functions
- ascii_translit, txt_clean_word2vec
- txt_sample, txt_paste
- na_exclude
- chunk
- tokenize_letters, tokenize_spaces_punct, tokenize_lines
- combine_chunkranges to allow to combine ranges of entities from different models
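
Combining entity ranges from different models can be pictured as merging overlapping intervals. The sketch below is an assumption about the general idea, not the actual combine_chunkranges implementation; the function name merge_ranges and the inclusive (start, end) tuple format are hypothetical.

```python
from typing import List, Tuple

# Assumption for this sketch: (start, end) positions, both inclusive.
Range = Tuple[int, int]


def merge_ranges(*range_lists: List[Range]) -> List[Range]:
    """Merge overlapping (start, end) ranges coming from several models."""
    all_ranges = sorted(r for ranges in range_lists for r in ranges)
    merged: List[Range] = []
    for start, end in all_ranges:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it to cover both.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical entity ranges detected by two different models.
model_a = [(0, 4), (10, 15)]
model_b = [(3, 7), (20, 25)]
combined = merge_ranges(model_a, model_b)
```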
Entry point functionalities allowing to anonymise texts + structure the output
- class Blackbar and the methods anonimise, anonimise_extended
- deid_anonimise_dataframe