Provide docker containers for Inception annotation engine alongside Minio
Based on blackbar-py version 0.2.0
Based on prefecthq/prefect version 2.15.0-python3.10
Based on base docker containers version 0.1 at https://github.com/bnosac/blackbar-docker/releases/tag/0.1
blackbar-base: image based on prefecthq/prefect, version 2.15.0-python3.10 which is based on Ubuntu 20.04
blackbar-base-apps: image based on ghcr.io/bnosac/blackbar-base + shiny and commonly used shiny extensions (rmarkdown/shinydashboard/bslib/fontawesome/knitr/reticulate)
blackbar-inception-minio: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z)
blackbar-inception-minio-mariadb: Inception (28.5) and Minio (RELEASE.2024-11-07T00-52-20Z) and MariaDB (10.7)
blackbar-shinyproxy: ShinyProxy 3.0.2 and openjdk 11
blackbar-rstudio: RStudio, R 4.2.1, Python 3.10, Python packages relevant for blackbar, Java 21 from eclipse-temurin
Sped up Smith-Waterman alignment
Changes to blackbar-py
0.3.1
Pseudonymization
Perform pseudonymization differently by gender
Allow a backup strategy for edge cases where the pseudonymization does not produce any new text
Allow to use existing pseudonymizations by patient
Add read_pseudonymization to database with type either ‘documents’, ‘entities’, ‘traindata’ or ‘documents-entities’
Patient identifiers are upper-cased
Patient id: keep the first letter and follow a date/month structure
For Leeftijd/Geboortedatum entities that mention years, apply the dateshift
deid_pseudonymize now adds by default a pseudonymization of the patient id even if it was never found in the texts
deid_anonymization_entities now returns doc_id if it is missing (the case of extended = False, where the doc_id is not stored in the textCvt) and model (the case of extended = False, nlp only)
Added deid_pseudonymization_entities to get the document + detected entities + the pseudo replacements
PseudoGenerator gains an extra argument indicating whether to warn, raise an error or pass when the entity type is not known or not implemented
Allow pseudonymization to use LLM generation as the failure strategy
Database
By default on IRIS, disable logging
Allow not to store the anonymized version in textCvt in deid_anonymize by passing output structure version v4
Pseudonymization results of entities are limited to a maximum of 4000 characters
Specify default queries for the app
Allow to specify table names as json
S3
Allow to work with versioned data
s3_download_file, blackbar_s3_download, blackbar_s3_read now allow to get data with a specific version_id
blackbar_s3_list now returns version information
Newly created buckets have versioning enabled by default
s3_remove gains argument version_id
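As a rough illustration of what reading a specific object version involves, the sketch below fetches an object through a boto3-style S3 client. It is not the implementation of s3_download_file or blackbar_s3_read: the helper name read_object_version and its parameters are hypothetical, and only the get_object call with Bucket, Key and VersionId is taken from the boto3 S3 client API.

```python
from typing import Any, Optional


def read_object_version(client: Any, bucket: str, key: str,
                        version_id: Optional[str] = None) -> bytes:
    """Return the bytes of an S3 object, optionally pinned to one version.

    Hypothetical helper, not part of blackbar-py. `client` is assumed to
    behave like a boto3 S3 client.
    """
    kwargs = {"Bucket": bucket, "Key": key}
    # Without VersionId, S3 returns the latest version of the object.
    if version_id is not None:
        kwargs["VersionId"] = version_id
    response = client.get_object(**kwargs)
    # The response body is a stream; read() returns its content as bytes.
    return response["Body"].read()
```

With a real client this would be called as read_object_version(boto3.client("s3"), "my-bucket", "my-key", version_id="..."), where the bucket, key and version id are placeholders.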
Training data
Exported blackbar_annotations to get a set of annotations from table blackbar_document
Added blackbar_typesystem with the required typesystem for NER and pseudonymization
Make blackbar_inception_entities more dynamic: by default it uses ‘custom.Span’ + ‘label’; if label is not provided or does not exist, it extracts the data contained in ‘entity’ and, if that does not exist, in ‘value’
blackbar_inception_entities gains extra argument ‘type’ to return entities within a sentence or to allow cross-sentence entities. By default it now returns data at the entity level, not at the entity-within-sentence level
Add functionality to read in a zipped inception file in a format similar to the data which is needed for blackbar_traindata
Added blackbar_cas, blackbar_typesystem and blackbar_inception_read_export
Added blackbar_cas_pseudo for validating pseudo documents
Added inception_upload_documents + allow to upload in NEW/IN-PROGRESS/COMPLETE state
Added inception_create_project
Added inception_delete
blackbar_inception_annotations
returns empty data frame if no documents are in a project
adds field user_name to entities dataset
add option to decode the text in utf-8 instead of keeping it as bytes
inception_list_documents gains extra arguments encoding and encoding_errors in order to decode to utf-8/latin-1
Added BlackbarInception in order to get anonymized data based on Inception project directly
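The fallback order described above for blackbar_inception_entities (‘label’, then ‘entity’, then ‘value’) can be sketched as follows. This is a minimal illustration under assumptions, not the package's code: the function pick_entity_label is hypothetical, the annotation is modelled as a plain dictionary of features, and a feature set to None is treated as not existing.

```python
from typing import Any, Mapping, Optional


def pick_entity_label(features: Mapping[str, Any],
                      label: Optional[str] = "label") -> Optional[Any]:
    """Return the first available feature among label, 'entity' and 'value'.

    Hypothetical helper illustrating the fallback order only.
    """
    # Try the requested label feature first (if one was provided),
    # then fall back to 'entity' and finally to 'value'.
    candidates = ([label] if label else []) + ["entity", "value"]
    for name in candidates:
        if features.get(name) is not None:
            return features[name]
    # None of the candidate features is present.
    return None
```

For example, pick_entity_label({"entity": "DATE", "value": "V"}) falls back to the ‘entity’ feature because no ‘label’ feature is present.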
Docker/Podman
pod_client, change default to Docker instead of Podman
add function to check connectivity to docker/podman
pod_container now adds type ‘restart’
change environment variables to use POD_TYPE/POD_URL
added blackbar_job_modelbuilding in order to launch the model building container
added blackbar_job_anonymization in order to launch anonymization container
added blackbar_job_pseudonymization in order to launch pseudonymization container
Added empty inception annotation project in the data folder and an empty inception pseudonymization validation project in the data folder
Utils
add split_df
0.3.0
Added blackbar_inception_annotations to easily extract all annotations from inception projects and get the entities for type custom.Span
Added blackbar_traindata to allow to generate training data from the output of blackbar_inception_annotations
Added deid class in order to integrate the training data from Inception with spacy model building
BlackbarDB - added queries to do pseudonymization differently by gender
Renamed blackbar_s3_download_file to s3_download_file, blackbar_s3_upload_file to s3_upload_file and blackbar_s3_remove to s3_remove
0.2.0
Allow to specify the tables of the data as parameter in BlackbarDB
Include sqlite test database in the package
Ability to have the data in MS SQL Server, PostgreSQL, Oracle, MySQL, SQLite, MariaDB
Bump dependency to textalignment 0.2.0 to be able to speed up Smith-Waterman alignment
Blackbar
gains argument alignment_method allowing to perform Smith-Waterman alignment using biopython
gains 2 methods: anonymize and anonymize_dataframe
Fix bug in Blackbar.anonimise_extended in the offsets used for removing the entities: the length and the text of the entity were correct, but the end position was wrong. Fixed such that
start position is 0-based (Python-like)
end position includes the last letter of the textual entity (Python-like)
such that text[start:(end+1)] gives the entity text
Ability to save pseudonymization and test data
Store the model name in object Blackbar and the anonymization and pseudonymization results, keep the entities in the pseudonymization results per document as well
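The offset convention from the Blackbar.anonimise_extended fix above (0-based start, end position that includes the last letter) can be checked with a short sketch. The sample text and the helper replace_entity are made up for illustration and are not part of blackbar-py.

```python
text = "Patient John Smith was admitted."
entity = "John Smith"

# start is 0-based; end points at the last character of the entity.
start = text.index(entity)
end = start + len(entity) - 1

# Because end is inclusive, the slice needs end + 1 to recover the entity.
assert text[start:(end + 1)] == entity


def replace_entity(text: str, start: int, end: int, replacement: str) -> str:
    """Replace the span [start, end] (end inclusive) with a replacement.

    Hypothetical helper showing how the inclusive end position is used.
    """
    return text[:start] + replacement + text[end + 1:]


masked = replace_entity(text, start, end, "[NAME]")
```

Here masked keeps everything before position start and everything after position end, so the text surrounding the entity is left untouched.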
0.1.0
Added options to inspect, build, maintain and deploy containers using Docker and Podman with functions