
The detector modules

The different detector modules and their options are explained in more detail in this section.

Text detector

Text on the images can be extracted using the TextDetector class (text module). The text is initially extracted using the Google Cloud Vision API and then translated into English with googletrans. The translated text is cleaned of whitespace, line breaks, and numbers using Python and spaCy.


The user can choose whether the text should additionally be summarized and analyzed for sentiment and named entity recognition (NER) by setting the keyword analyse_text to True (the default is False). If set, the transformers pipeline is used for each of these tasks, with the default models as of 03/2023. Other models can be selected by setting the optional keyword model_names to a list of models, one for each task: model_names=["sshleifer/distilbart-cnn-12-6", "distilbert-base-uncased-finetuned-sst-2-english", "dbmdz/bert-large-cased-finetuned-conll03-english"] for summary, sentiment, and NER, respectively. To be even more specific, revision numbers can also be pinned by setting the optional keyword revision_numbers to a list of revision numbers, one for each model, for example revision_numbers=["a4f8f3e", "af0f99b", "f2482bf"].
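
As a sketch of how these keywords could be passed (the image dictionary mydict and the exact constructor signature are illustrative assumptions, not taken from this section):

import ammico

# Illustrative sketch only: run the text detector on every image in mydict,
# enabling summary, sentiment analysis, and NER with explicitly chosen models.
for key in mydict.keys():
    mydict[key] = ammico.TextDetector(
        mydict[key],
        analyse_text=True,
        model_names=[
            "sshleifer/distilbart-cnn-12-6",                    # summary
            "distilbert-base-uncased-finetuned-sst-2-english",  # sentiment
            "dbmdz/bert-large-cased-finetuned-conll03-english", # NER
        ],
        revision_numbers=["a4f8f3e", "af0f99b", "f2482bf"],     # one revision per model
    ).analyse_image()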

Please note that for the Google Cloud Vision API (the TextDetector class) you need to set an API key in order to process the images. This key is ideally set as an environment variable, for example as sketched below.
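
A minimal sketch of setting such a key from within Python, assuming the standard Google Cloud convention of pointing GOOGLE_APPLICATION_CREDENTIALS at a service-account JSON file (the variable name follows the Google Cloud client libraries; it is not given in this section):

import os

# Illustrative: Google Cloud client libraries read the credentials path from
# the GOOGLE_APPLICATION_CREDENTIALS environment variable.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json"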


Image summary and query

The SummaryDetector can be used to generate image captions (summary) as well as to perform visual question answering (VQA).


This module is based on the LAVIS library. Since the models can be quite large, an initial detector object is created that loads the necessary models into RAM/VRAM and then uses them in the analysis. The user can specify the type of analysis to be performed using the analysis_type keyword: summary generates a caption (summary), questions prepares answers (VQA) to a list of questions set by the user, and summary_and_questions does both. Note that the desired analysis type needs to be set here, in the initialization of the detector object, and not when running the analysis for each image; the same holds true for the selected model.
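
A sketch of this usage pattern, assuming the analysis type and a question list are passed to the constructor and each image is then analysed with the same detector object (mydict, list_of_questions, and the analyse_image call are illustrative assumptions, not confirmed by this section):

import ammico

# Illustrative sketch: create the detector once so the large models are loaded
# into RAM/VRAM a single time, then reuse it for every image.
summary_detector = ammico.SummaryDetector(
    mydict,
    analysis_type="summary_and_questions",  # "summary", "questions", or "summary_and_questions"
    list_of_questions=["How many people are visible in the image?"],
)
for key in mydict.keys():
    mydict[key] = summary_detector.analyse_image(subdict=mydict[key])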

The implemented models are listed below.


Detection of faces and facial expression analysis

Faces and facial expressions are detected and analyzed using the EmotionDetector class from the faces module. First, RetinaFace is used to detect whether faces are present on the image, and Face-Mask-Detection then checks whether face masks are worn. The probabilistic detection of age, gender, race, and emotion is carried out with deepface, but only if the disclosure statement has been accepted (see above).
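
A minimal sketch of running the emotion detector over an image dictionary, following the same per-image pattern as the other modules (mydict and the call signature are illustrative assumptions):

import ammico

# Illustrative sketch: analyse each image; age/gender/race/emotion are only
# reported if the disclosure statement has been accepted beforehand.
for key in mydict.keys():
    mydict[key] = ammico.EmotionDetector(mydict[key]).analyse_image()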


Depending on the features found on the image, the face detection module returns different analysis content. If no faces are found on the image, all further steps are skipped and the result "face": "No", "multiple_faces": "No", "no_faces": 0, "wears_mask": ["No"], "age": [None], "gender": [None], "race": [None], "emotion": [None], "emotion (category)": [None] is returned. If one or several faces are found, up to three faces are analyzed; each is first checked for whether it is partially concealed by a face mask. If it is, only age and gender are detected; if not, race, emotion, and dominant emotion are detected as well. In the latter case, the output could look like this: "face": "Yes", "multiple_faces": "Yes", "no_faces": 2, "wears_mask": ["No", "No"], "age": [27, 28], "gender": ["Man", "Man"], "race": ["asian", None], "emotion": ["angry", "neutral"], "emotion (category)": ["Negative", "Neutral"]. For the two detected faces (given by no_faces), the per-face values are returned as lists, with the first item for the first (largest) face and the second item for the second (smaller) face; for example, "emotion" returns ["angry", "neutral"], meaning that the first face expresses anger and the second face has a neutral expression.
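
For readability, the second example output corresponds to a Python dictionary like the following (values copied from the example above):

# Example result for an image with two unmasked faces.
face_result = {
    "face": "Yes",
    "multiple_faces": "Yes",
    "no_faces": 2,
    "wears_mask": ["No", "No"],
    "age": [27, 28],
    "gender": ["Man", "Man"],
    "race": ["asian", None],
    "emotion": ["angry", "neutral"],
    "emotion (category)": ["Negative", "Neutral"],
}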