Releases

High-level changes to apps / flows / containers

1.0

  • Added cockpit frontend webapp
    • Manage projects
    • Launch automated anonymization/pseudonymization + progress of annotation on the database
    • Launch model building
    • Inspect annotation progress
    • Easily import data from files / database to annotation frontend
    • Allow automatic anonymization of imported texts for quicker labelling
    • Allow automatic pseudonymization of labelled texts for quicker pseudonymization
    • Allow validation of anonymization / pseudonymization
    • Inspect logs
  • Allowed dynamic model selection in inspection + pseudonymization inspection + entity detector inspector
  • Allow usage of LLM models for edge case pseudonymization
  • Added app for talking to patient documents using an LLM
  • Added app for talking to an LLM
  • Based on blackbar-py version 0.3.1
  • Based on prefecthq/prefect version 2.15.0-python3.10
  • Based on base docker containers version 0.2 at https://github.com/bnosac/blackbar-docker/releases/tag/0.2
  • Increase ShinyProxy to 3.1.1 to enable later versions of Java and Docker >= 0.25 (https://github.com/bnosac/blackbar/issues/34)
  • Allow table names that differ from the default ones
  • Allow to have versioned models
  • Allow to build models based on manual annotations and automated annotations
  • Added details on Authentication alongside Keycloak and Microsoft Entra ID

0.1

  • Allow automated anonymization
  • Allow automated pseudonymization
  • Frontend webapps for annotation (Inception)
  • Frontend webapps for annotation inspection + pseudonymization inspection + entity detector inspector
  • Provide docker containers for API
  • Provide docker containers for Inception annotation engine alongside Minio
  • Based on blackbar-py version 0.2.0
  • Based on prefecthq/prefect version 2.15.0-python3.10
  • Based on base docker containers version 0.1 at https://github.com/bnosac/blackbar-docker/releases/tag/0.1
    • blackbar-base: image based on prefecthq/prefect, version 2.15.0-python3.10 which is based on Ubuntu 20.04
    • blackbar-base-apps: image based on ghcr.io/bnosac/blackbar-base + shiny and commonly used shiny extensions (rmarkdown/shinydashboard/bslib/fontawesome/knitr/reticulate)
    • blackbar-inception-minio: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z)
    • blackbar-inception-minio-mariadb: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z) and MariaDB (10.7)
    • blackbar-shinyproxy: ShinyProxy 3.0.2 and openjdk 11
    • blackbar-rstudio: RStudio, R 4.2.1, Python 3.10, Python packages relevant for blackbar, Java 21 from eclipse-temurin
  • Sped up Smith-Waterman alignment

Changes to blackbar-py

0.3.1

  • Pseudonymization
    • Perform pseudonymization differently by gender
    • Allow a backup strategy for edge cases where the pseudonymization does not produce any new text
    • Allow to use existing pseudonymizations by patient
    • Add read_pseudonymization to database with type either ‘documents’, ‘entities’, ‘traindata’ or ‘documents-entities’
    • Patient identifiers are upper-cased
    • Patient id: keep the first letter and follow a date/month structure
    • If the entity is Leeftijd/Geboortedatum (age/date of birth) and mentions years, apply the date shift
    • deid_pseudonymize now adds by default a pseudonymization of the patient id even if it was never found in the texts
    • deid_anonymization_entities now returns doc_id if it is missing (case of extended = False where the doc_id is not stored in the textCvt) and model (case of extended = False - only nlp)
    • Added deid_pseudonymization_entities to get the document + detected entities + the pseudo replacements
    • PseudoGenerator gains extra argument indicating to provide a warning/error/pass in case the entity type is not known or not implemented
    • Allow pseudonymization with LLM generation as the failure strategy
  • Database
    • By default on IRIS, disable logging
    • Allow not to store the anonymized version in textCvt in deid_anonymize by passing output structure version v4
    • Pseudonymization results of entities max 4000 characters long
    • Specify default queries for the app
    • Allow to specify table names as json
  • S3
    • Allow to work with versioned data
    • s3_download_file, blackbar_s3_download, blackbar_s3_read now allow to get data with a specific version_id
    • blackbar_s3_list now returns version information
    • new buckets created are enabled with versioning by default
    • s3_remove gains argument version_id
  • Training data
    • Exported blackbar_annotations to get a set of annotations from table blackbar_document
    • Added blackbar_typesystem with the required typesystem for NER and pseudonymization
    • Make blackbar_inception_entities more dynamic, uses by default ‘custom.Span’ + ‘label’, if label is not provided or does not exist: extract data contained in ‘entity’ and if that does not exist in ‘value’
    • blackbar_inception_entities gains extra argument ‘type’ to return entities within a sentence or to allow cross-sentence entities. By default it now returns data at the entity level, not at the entity-within-sentence level
    • Add functionality to read in a zipped inception file in a format similar to the data which is needed for blackbar_traindata
    • Added blackbar_cas, blackbar_typesystem and blackbar_inception_read_export
    • Added blackbar_cas_pseudo for validating pseudo documents
    • Added inception_upload_documents + allow to upload in NEW/IN-PROGRESS/COMPLETE state
    • Added inception_create_project
    • Added inception_delete
    • blackbar_inception_annotations
      • returns empty data frame if no documents are in a project
      • adds field user_name to entities dataset
      • add option to decode the text in utf-8 instead of keeping it as bytes
    • inception_list_documents gains extra argument encoding and encoding_errors in order to decode to utf-8/latin-1
    • Added BlackbarInception in order to get anonymized data based on Inception project directly
  • Docker/Podman
    • pod_client, change default to Docker instead of Podman
    • add function to check connectivity to docker/podman
    • pod_container now adds type ‘restart’
    • change environment variables to use POD_TYPE/POD_URL
    • added blackbar_job_modelbuilding in order to launch the model building container
    • added blackbar_job_anonymization in order to launch anonymization container
    • added blackbar_job_pseudonymization in order to launch pseudonymization container
  • Added empty inception annotation project in the data folder and an empty inception pseudonymization validation project in the data folder
  • Utils
    • add split_df
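
The label fallback described above for blackbar_inception_entities (use ‘label’, otherwise ‘entity’, otherwise ‘value’) can be sketched as follows. This is only an illustration of the lookup order: the helper name pick_entity_label and the plain dict of annotation features are assumptions made for the example, not part of the blackbar-py API.

```python
from typing import Optional


def pick_entity_label(features: dict, field: Optional[str] = "label") -> Optional[str]:
    """Hypothetical helper: return the first usable value among
    `field`, 'entity' and 'value', or None if none of them is filled in."""
    for key in (field, "entity", "value"):
        # Skip a field name that was not provided, and skip missing/empty values.
        if key and features.get(key) not in (None, ""):
            return features[key]
    return None


# Feature dicts as they might look for three different annotation layers (made-up data).
print(pick_entity_label({"label": "NAME", "entity": "OTHER"}))
print(pick_entity_label({"entity": "DATE"}))
print(pick_entity_label({"value": "ADDRESS"}))
```

The first call prefers ‘label’, the second falls back to ‘entity’, and the third falls back to ‘value’.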

0.3.0

  • Added blackbar_inception_annotations to easily extract all annotations from inception projects and get the entities for type custom.Span
  • Added blackbar_traindata to allow to generate training data from the output of blackbar_inception_annotations
  • Added deid class in order to integrate the training data from Inception with spacy model building
  • BlackbarDB - added queries to do pseudonymization differently by gender
  • Renamed blackbar_s3_download_file to s3_download_file, blackbar_s3_upload_file to s3_upload_file and blackbar_s3_remove to s3_remove

0.2.0

  • Allow to specify the tables of the data as parameter in BlackbarDB
  • Include sqlite test database in the package
  • Ability to have the data in MS SQL Server, PostgreSQL, Oracle, MySQL, SQLite, MariaDB
  • Bump dependency to textalignment 0.2.0 to be able to speed up Smith-Waterman
  • Blackbar
    • gains argument alignment_method allowing to perform Smith-Waterman alignment using biopython
    • gains 2 methods: anonymize and anonymize_dataframe
  • Exported Blackbar
  • Exported deid_anonymize, deid_enrich_identifiers, deid_smith_waterman
  • Exported deid_anonymization_entities, deid_pseudonymize
  • Exported blackbar_example
  • Fixed a bug in Blackbar.anonimise_extended in the offsets used for removing entities: the length and the text of the entity were correct, but the end position was wrong. Positions are now defined such that
    • start position is 0-based (Python-like)
    • end position includes the last letter of the textual entity (Python-like)
    • such that text[start:(end+1)] gives the entity text
  • Ability to save pseudonymization and test data
  • Store the model name in object Blackbar and the anonymization and pseudonymization results, keep the entities in the pseudonymization results per document as well
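
The offset convention fixed in 0.2.0 (0-based start, end position including the last letter) can be illustrated with a small sketch. The helpers span_text and mask_span and the sample sentence are made up for this example and are not functions of blackbar-py.

```python
def span_text(text: str, start: int, end: int) -> str:
    """Return the entity text for a 0-based start and an inclusive end."""
    return text[start:end + 1]


def mask_span(text: str, start: int, end: int, mask: str = "X") -> str:
    """Replace the characters start..end (inclusive) by a mask of equal length."""
    return text[:start] + mask * (end - start + 1) + text[end + 1:]


text = "Patient Jan Janssens was seen today."  # made-up sample text
entity = "Jan Janssens"

start = text.index(entity)       # 0-based position of the first letter
end = start + len(entity) - 1    # position of the last letter (inclusive)

masked = mask_span(text, start, end)
print(span_text(text, start, end))
print(masked)
```

Because the end position is inclusive, slicing needs end + 1, and the entity length is end - start + 1.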

0.1.0

  • Added options to inspect, build, maintain and deploy containers using Docker and Podman with functions
    • pod_info, pod_ls, pod_pull, pod_build, pod_container, pod_container_log, pod_remove
  • Added functionality to perform pseudonymization
    • PseudoGenerator (generate pseudo text)
    • anonimisation_entities (extract the anonymized entities)
    • pseudo_replacements, txt_mimic_readability (replacement functions while sticking as close to the original layout as possible)
    • Utility functions which help with the pseudonymization
      • txt_insert, txt_freq, txt_leading_trailing
      • txt_n_capital, txt_contains, txt_contains_lot_of_capitals, txt_n_newlines
  • Utilities to connect to the database
    • BlackbarDB class and the methods to interact with it
      • read/sendquery/read_documents/read_patientdoctoraddresses
      • update_anonimisation/read_blackbar_status/read_anonimisation + parsing the anonymization JSON: parse_anonimisation_json
    • Deprecated database connectivity functions (now covered by class BlackbarDB)
      • read_iris, read_iris_sample, read_iris_documents, read_iris_patientartsadressen, iris_update_cvt, iris_sendquery
  • Utilities to store models and other files on S3 in particular using Minio
    • blackbar_s3_list, blackbar_s3_upload, blackbar_s3_download (not model specific)
    • blackbar_model_save, blackbar_model_load (model specific)
  • Utilities to interact with Inception and the inception API
    • Connectivity: inception_client, inception_list_projects
    • Exporting and fetching annotations: inception_export, inception_list_documents, inception_list_annotations
    • Inspecting the annotations with their corresponding types: inception_types, inception_cas, read_xmi
    • Fetching the log of the user interactions on the site: inception_read_eventlog
    • Utility functions to extract information from the inception data:
      • blackbar_inception_entities, line_spans, token_spans, token_entity_spans
  • UZB specific functionality
    • uzb_identify_chunks, uzb_harmonize_physician, uzb_vn_achternaam, uzb_txt_contains
  • General utility functions
    • ascii_translit, txt_clean_word2vec
    • txt_sample, txt_paste
    • na_exclude
    • chunk
    • tokenize_letters, tokenize_spaces_punct, tokenize_lines
    • combine_chunkranges to allow to combine ranges of entities from different models
  • Entry point functionalities allowing to anonymise texts + structure the output
    • class Blackbar and the methods anonimise, anonimise_extended
    • deid_anonimise_dataframe