AMMICO/ammico/notebooks/DemoNotebook_ammico.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AMMICO Demonstration Notebook\n",
    "With ammico, you can analyze text on images and image content at the same time. This is a demonstration notebook to showcase the capabilities of ammico.\n",
    "You can run this notebook on google colab or locally / on your own HPC resource. The analysis can be quite slow on the google colab default runtime. For production data processing, it is recommended to run the analysis locally on a GPU-supported machine. You can also make use of the colab GPU runtime, or purchase additional runtime. However, google colab comes with pre-installed libraries that can lead to dependency conflicts. The setting on google colab changes frequently, so it is only ensured that this demonstration notebook runs on the default runtime. \n",
    "\n",
    "This first cell only runs on google colab; on all other machines, you need to create a conda environment first and install ammico from the Python Package Index using  \n",
    "```pip install ammico```  \n",
    "Alternatively you can install the development version from the GitHub repository  \n",
    "```pip install git+https://github.com/ssciwr/AMMICO.git```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# if running on google colab\\\n",
    "# PLEASE RUN THIS ONLY AS CPU RUNTIME\n",
    "# for a GPU runtime, there are conflicts with pre-installed packages -\n",
    "# you first need to uninstall them (prepare a clean environment with no pre-installs) and then install ammico\n",
    "# flake8-noqa-cell\n",
    "\n",
    "if \"google.colab\" in str(get_ipython()):\n",
    "    # update python version\n",
    "    # install setuptools\n",
    "    # %pip install setuptools==61 -qqq\n",
    "    # uninstall some pre-installed packages due to incompatibility\n",
    "    %pip uninstall --yes tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint flex-y jax jaxlib -qqq\n",
    "    # install ammico\n",
    "    %pip install git+https://github.com/ssciwr/ammico.git -qqq\n",
    "    # install older version of jax to support transformers use of diffusers\n",
    "    # mount google drive for data and API key\n",
    "    from google.colab import drive\n",
    "\n",
    "    drive.mount(\"/content/drive\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use a test dataset\n",
    "\n",
    "You can download this dataset for test purposes. Skip this step if you use your own data. If the data set on Hugging Face is gated or private, Hugging Face will ask you for a login token. However, for the default dataset in this notebook you do not need to provide one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from pathlib import Path\n",
    "\n",
    "# If the dataset is gated/private, make sure you have run huggingface-cli login\n",
    "dataset = load_dataset(\"iulusoy/test-images\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next you need to provide a path for the saved images - a folder where the data is stored locally. This directory is automatically created if it does not exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_path = \"./data-test\"\n",
    "data_path = Path(data_path)\n",
    "data_path.mkdir(parents=True, exist_ok=True)\n",
    "# now save the files from the Huggingface dataset as images into the data_path folder\n",
    "for i, image in enumerate(dataset[\"train\"][\"image\"]):\n",
    "    filename = \"img\" + str(i) + \".png\"\n",
    "    image.save(data_path / filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the ammico package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# jax also sometimes leads to problems on google colab\n",
    "# if this is the case, try restarting the kernel and executing this\n",
    "# and the above two code cells again\n",
    "import ammico\n",
    "\n",
    "# for displaying a progress bar\n",
    "from tqdm import tqdm\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes you may need to restart a session after installing the correct versions of packages, because `Tensorflow` and `EmotitionDetector` may not work and give an error. You can check it by running this code: \n",
    "```\n",
    "import tensorflow as tf\n",
    "tf.ones([2, 2])\n",
    "```\n",
    "If this code generates an error, you need to restart session. For this please click `Runtime` -> `Restart session`. And rerun the notebook again. All required packages will already be installed, so the execution will be very fast. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 0: Create and set a Google Cloud Vision Key\n",
    "\n",
    "Please note that for the [Google Cloud Vision API](https://cloud.google.com/vision/docs/setup) (the TextDetector class) you need to set a key in order to process the images. A key is generated following [these instructions](https://ssciwr.github.io/AMMICO/build/html/create_API_key_link.html). This key is ideally set as an environment variable using for example\n",
    "```\n",
    "os.environ[\n",
    "    \"GOOGLE_APPLICATION_CREDENTIALS\"\n",
    "] = \"/content/drive/MyDrive/misinformation-data/misinformation-campaign-981aa55a3b13.json\"\n",
    "```\n",
    "where you place the key on your Google Drive if running on colab, or place it in a local folder on your machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = (\n",
    "    \"../../data/misinformation-campaign-981aa55a3b13.json\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 1: Read your data into AMMICO\n",
    "The ammico package reads in one or several input files given in a folder for processing. The user can select to read in all image files in a folder, to include subfolders via the `recursive` option, and can select the file extension that should be considered (for example, only \"jpg\" files, or both \"jpg\" and \"png\" files). For reading in the files, the ammico function `find_files` is used, with optional keywords:\n",
    "\n",
    "| input key | input type | possible input values |\n",
    "| --------- | ---------- | --------------------- |\n",
    "`path` | `str` | the directory containing the image files (defaults to the location set by environment variable `AMMICO_DATA_HOME`) |\n",
    "| `pattern` | `str\\|list` | the file extensions to consider (defaults to \"png\", \"jpg\", \"jpeg\", \"gif\", \"webp\", \"avif\", \"tiff\") |\n",
    "| `recursive` | `bool` | include subdirectories recursively (defaults to `True`) |\n",
    "| `limit` | `int` | maximum number of files to read (defaults to `20`, for all images set to `None` or `-1`) |\n",
    "| `random_seed` | `str` | the random seed for shuffling the images; applies when only a few images are read and the selection should be preserved (defaults to `None`) |\n",
    "\n",
    "The `find_files` function returns a nested dictionary that contains the file ids and the paths to the files and is empty otherwise. This dictionary is filled step by step with more data as each detector class is run on the data (see below).\n",
    "\n",
    "If you downloaded the test dataset above, you can directly provide the path you already set for the test directory, `data_path`. The below cell is already set up for the test dataset.\n",
    "\n",
    "If you use your own dataset, you need to toggle the path and provide the directory where you have saved your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_path = \"./data-test\"\n",
    "image_dict = ammico.find_files(\n",
    "    # path = \"/content/drive/MyDrive/misinformation-data/\",\n",
    "    path=str(data_path),\n",
    "    limit=15,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Inspect the input files using the graphical user interface\n",
    "A Dash user interface is to select the most suitable options for the analysis, before running a complete analysis on the whole data set. The options for each detector module are explained below in the corresponding sections; for example, different models can be selected that will provide slightly different results. This way, the user can interactively explore which settings provide the most accurate results. In the interface, the nested `image_dict` is passed through the `AnalysisExplorer` class. The interface is run on a specific port which is passed using the `port` keyword; if a port is already in use, it will return an error message, in which case the user should select a different port number. \n",
    "The interface opens a dash app inside the Jupyter Notebook and allows selection of the input file in the top left dropdown menu, as well as selection of the detector type in the top right, with options for each detector type as explained below. The output of the detector is shown directly on the right next to the image. This way, the user can directly inspect how updating the options for each detector changes the computed results, and find the best settings for a production run.\n",
    "\n",
    "### Ethical disclosure statement\n",
    "\n",
    "If you want to run an analysis using the EmotionDetector detector type, you have first have to respond to an ethical disclosure statement. This disclosure statement ensures that you only use the full capabilities of the EmotionDetector after you have been made aware of its shortcomings.\n",
    "\n",
    "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashr` file.\n",
    "\n",
    "If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification depending on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# respond to the disclosure statement\n",
    "# this will set an environment variable for you\n",
    "# if you do not want to re-accept the disclosure every time, you can set this environment variable in your shell\n",
    "# to re-set the environment variable, uncomment the below line\n",
    "accept_disclosure = \"DISCLOSURE_AMMICO\"\n",
    "# os.environ.pop(accept_disclosure, None)\n",
    "_ = ammico.ethical_disclosure(accept_disclosure=accept_disclosure)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Privacy disclosure statement\n",
    "\n",
    "If you want to run an analysis using the TextDetector detector type, you have first have to respond to a privacy disclosure statement. This disclosure statement ensures that you are aware that your data will be sent to google cloud vision servers for analysis.\n",
    "\n",
    "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_privacy`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_privacy, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashr` file.\n",
    "\n",
    "If the privacy disclosure statement is accepted, the TextDetector will perform the text extraction, translation and if selected, analysis. If the privacy disclosure is rejected, no text processing will be carried out and you cannot use the TextDetector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# respond to the privacy disclosure statement\n",
    "# this will set an environment variable for you\n",
    "# if you do not want to re-accept the privacy disclosure every time, you can set this environment variable in your shell\n",
    "# to re-set the environment variable, uncomment the below line\n",
    "accept_privacy = \"PRIVACY_AMMICO\"\n",
    "# os.environ.pop(accept_privacy, None)\n",
    "_ = ammico.privacy_disclosure(accept_privacy=accept_privacy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "analysis_explorer = ammico.AnalysisExplorer(image_dict)\n",
    "analysis_explorer.run_server(port=8055)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Analyze all images\n",
    "The analysis can be run in production on all images in the data set. Depending on the size of the data set and the computing resources available, this can take some time. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is also possible to set the dump file  creation `dump_file` in order to save the calculated data every `dump_every` images. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dump file name\n",
    "dump_file = \"dump_file.csv\"\n",
    "# dump every N images\n",
    "dump_every = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The desired detector modules are called sequentially in any order, for example the `EmotionDetector`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the thresholds for the emotion detection\n",
    "emotion_threshold = 50  # this is the default value for the detection confidence\n",
    "# the lowest possible value is 0\n",
    "# the highest possible value is 100\n",
    "race_threshold = 50\n",
    "gender_threshold = 50\n",
    "for num, key in tqdm(\n",
    "    enumerate(image_dict.keys()), total=len(image_dict)\n",
    "):  # loop through all images\n",
    "    image_dict[key] = ammico.EmotionDetector(\n",
    "        image_dict[key],\n",
    "        emotion_threshold=emotion_threshold,\n",
    "        race_threshold=race_threshold,\n",
    "        gender_threshold=gender_threshold,\n",
    "    ).analyse_image()  # analyse image with EmotionDetector and update dict\n",
    "    if (\n",
    "        num % dump_every == 0 or num == len(image_dict) - 1\n",
    "    ):  # save results every dump_every to dump_file\n",
    "        image_df = ammico.get_dataframe(image_dict)\n",
    "        image_df.to_csv(dump_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`TextDetector`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for num, key in tqdm(\n",
    "    enumerate(image_dict.keys()), total=len(image_dict)\n",
    "):  # loop through all images\n",
    "    image_dict[key] = ammico.TextDetector(\n",
    "        image_dict[key]\n",
    "    ).analyse_image()  # analyse image with EmotionDetector and update dict\n",
    "\n",
    "    if (\n",
    "        num % dump_every == 0 | num == len(image_dict) - 1\n",
    "    ):  # save results every dump_every to dump_file\n",
    "        image_df = ammico.get_dataframe(image_dict)\n",
    "        image_df.to_csv(dump_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the computationally demanding `SummaryDetector`, it is best to initialize the model first and then analyze each image while passing the model explicitly. This can be done in a separate loop or in the same loop as for text and emotion detection."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initialize the models\n",
    "model = ammico.MultimodalSummaryModel()\n",
    "image_summary_detector = ammico.ImageSummaryDetector(\n",
    "    subdict=image_dict, summary_model=model\n",
    ")\n",
    "\n",
    "image_summary_detector.analyse_images_from_dict(analysis_type=\"summary\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or you can run all Detectors in one loop as for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initialize the models\n",
    "model = ammico.MultimodalSummaryModel()\n",
    "image_summary_detector = ammico.ImageSummaryDetector(\n",
    "    subdict=image_dict, summary_model=model\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for num, key in tqdm(\n",
    "    enumerate(image_dict.keys()), total=len(image_dict)\n",
    "):  # loop through all images\n",
    "    image_dict[key] = ammico.EmotionDetector(\n",
    "        image_dict[key]\n",
    "    ).analyse_image()  # analyse image with EmotionDetector and update dict\n",
    "    image_dict[key] = ammico.TextDetector(\n",
    "        image_dict[key]\n",
    "    ).analyse_image()  # analyse image with TextDetector and update dict\n",
    "    image_dict[key] = image_summary_detector.analyse_images_from_dict(\n",
    "        subdict=image_dict[key], analysis_type=\"summary\"\n",
    "    )  # analyse image with SummaryDetector and update dict\n",
    "\n",
    "    if (\n",
    "        num % dump_every == 0 | num == len(image_dict) - 1\n",
    "    ):  # save results every dump_every to dump_file\n",
    "        image_df = ammico.get_dataframe(image_dict)\n",
    "        image_df.to_csv(dump_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nested dictionary will be updated from containing only the file id's and paths to the image files, to containing all calculated image features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Convert analysis output to pandas dataframe and write csv\n",
    "The content of the nested dictionary can then conveniently be converted into a pandas dataframe for further analysis in Python, or be written as a csv file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_df = ammico.get_dataframe(image_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inspect the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_df.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or write to a csv file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_df.to_csv(\"data_out.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read in a csv file containing text and translating/analysing the text\n",
    "\n",
    "Instead of extracting text from an image, or to re-process text that was already extracted, it is also possible to provide a `csv` file containing text in its rows.\n",
    "Provide the path and name of the csv file with the keyword `csv_path`. The keyword `column_key` tells the Analyzer which column key in the csv file holds the text that should be analyzed. This defaults to \"text\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ta = ammico.TextAnalyzer(csv_path=\"test.csv\", column_key=\"text\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read the csv file\n",
    "ta.read_csv()\n",
    "# set up the dict containing all text entries\n",
    "text_dict = ta.mydict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the dump file\n",
    "# dump file name\n",
    "dump_file = \"dump_file.csv\"\n",
    "# dump every N images\n",
    "dump_every = 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# analyze the csv file\n",
    "for num, key in tqdm(\n",
    "    enumerate(text_dict.keys()), total=len(text_dict)\n",
    "):  # loop through all text entries\n",
    "    ammico.TextDetector(\n",
    "        text_dict[key], analyse_text=True, skip_extraction=True\n",
    "    ).analyse_image()  # analyse text with TextDetector and update dict\n",
    "    if (\n",
    "        num % dump_every == 0 | num == len(text_dict) - 1\n",
    "    ):  # save results every dump_every to dump_file\n",
    "        image_df = ammico.get_dataframe(text_dict)\n",
    "        image_df.to_csv(dump_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the results to a csv file\n",
    "text_df = ammico.get_dataframe(text_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# inspect\n",
    "text_df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write to csv\n",
    "text_df.to_csv(\"data_out.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The detector modules\n",
    "The different detector modules with their options are explained in more detail in this section.\n",
    "## Text detector\n",
    "Text on the images can be extracted using the `TextDetector` class (`text` module). The text is initally extracted using the Google Cloud Vision API and then translated into English with googletrans. The translated text is cleaned of whitespace, linebreaks, and numbers using Python syntax and spaCy. \n",
    "\n",
    "<img src=\"../../docs/source/_static/text_detector.png\" width=\"800\" />\n",
    "\n",
    "The user can set if the text should be further summarized, and analyzed for sentiment and named entity recognition, by setting the keyword `analyse_text` to `True` (the default is `False`). If set, the transformers pipeline is used for each of these tasks, with the default models as of 03/2023. Other models can be selected by setting the optional keyword `model_names` to a list of selected models, on for each task: `model_names=[\"sshleifer/distilbart-cnn-12-6\", \"distilbert-base-uncased-finetuned-sst-2-english\", \"dbmdz/bert-large-cased-finetuned-conll03-english\"]` for summary, sentiment, and ner. To be even more specific, revision numbers can also be selected by specifying the optional keyword `revision_numbers` to a list of revision numbers for each model, for example `revision_numbers=[\"a4f8f3e\", \"af0f99b\", \"f2482bf\"]`. \n",
    "\n",
    "Please note that for the Google Cloud Vision API (the TextDetector class) you need to set a key in order to process the images. This key is ideally set as an environment variable using for example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"<path_to_your_service_account_key>.json\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where you place the key on your Google Drive if running on colab, or place it in a local folder on your machine.\n",
    "\n",
    "Summarizing, the text detection is carried out using the following method call and keywords, where `analyse_text`, `model_names`, and `revision_numbers` are optional:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for num, key in tqdm(\n",
    "    enumerate(image_dict.keys()), total=len(image_dict)\n",
    "):  # loop through all images\n",
    "    image_dict[key] = ammico.TextDetector(\n",
    "        image_dict[key],  # analyse image with TextDetector and update dict\n",
    "    ).analyse_image()\n",
    "\n",
    "    if (\n",
    "        num % dump_every == 0 | num == len(image_dict) - 1\n",
    "    ):  # save results every dump_every to dump_file\n",
    "        image_df = ammico.get_dataframe(image_dict)\n",
    "        image_df.to_csv(dump_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write output to csv\n",
    "image_df = ammico.get_dataframe(image_dict)\n",
    "image_df.to_csv(\"data_out.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The models can be adapted interactively in the notebook interface and the best models can then be used in a subsequent analysis of the whole data set.\n",
    "\n",
    "A detailed description of the output keys and data types is given in the following table.\n",
    "\n",
    "| output key | output type | output value |\n",
    "| ---------- | ----------- | ------------ |\n",
    "| `text` | `str` | the extracted text in the original language |\n",
    "| `text_language` | `str` | the detected dominant language of the extracted text |\n",
    "| `text_english` | `str` | the text translated into English |\n",
    "| `text_clean` | `str` | the text after cleaning from numbers and unrecognizable words |\n",
    "| `text_summary` | `str` | the summary of the text, generated with a transformers model |\n",
    "| `sentiment` | `str` | the detected sentiment, generated with a transformers model |\n",
    "| `sentiment_score` | `float` | the confidence associated with the predicted sentiment |\n",
    "| `entity` | `list[str]` | the detected named entities, generated with a transformers model |\n",
    "| `entity_type` | `list[str]` | the detected entity type |\n",
    "\n",
    "## Image summary and query\n",
    "\n",
    "The `SummaryDetector` can be used to generate image captions (`summary`) as well as visual question answering (`VQA`). \n",
    "\n",
    "<img src=\"../../docs/source/_static/summary_detector.png\" width=\"800\" />\n",
    "\n",
    "### Multimodal Summary Model\n",
    "\n",
    "This module is built on the Qwen2.5-VL model family. In this project, two model variants are supported: \n",
    "\n",
    "1. `Qwen2.5-VL-3B-Instruct`, which requires approximately 3 GB of video memory to load.\n",
    "2. `Qwen2.5-VL-7B-Instruct`, which requires up to 8 GB of VRAM for initialization.\n",
    "\n",
    "Each version can be run on the CPU, but this will significantly increase the operating time, so we cannot recommend it, but we retain this option. \n",
    "The model type can be specified when initializing the `MultimodalSummaryModel` class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ammico.MultimodalSummaryModel(\n",
    "    model_id=\"Qwen/Qwen2.5-VL-7B-Instruct\"\n",
    ")  # or \"Qwen/Qwen2.5-VL-3B-Instruct\" respectively"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also define the preferred device type (\"cpu\" or \"cuda\") explicitly during initialization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ammico.MultimodalSummaryModel(\n",
    "    model_id=\"Qwen/Qwen2.5-VL-7B-Instruct\", device=\"cuda\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the initialization follows this logic:\n",
    "\n",
    "If a GPU is available, it is automatically detected and the model defaults to Qwen2.5-VL-7B-Instruct on \"cuda\".\n",
    "\n",
    "If no GPU is detected, the system falls back to the Qwen2.5-VL-3B-Instruct model on the \"cpu\" device."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ammico.MultimodalSummaryModel()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Image Summary and VQA module\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To instantiate class it is required to provide `MultimodalSummaryModel` and dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "image_summary_vqa = ammico.ImageSummaryDetector(summary_model=model, subdict=image_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To perform image analysis, use the analyse_images_from_dict() method.\n",
    "This function provides flexible options for generating summaries and performing visual question answering. \n",
    "1. `analysis_type` – defines the type of analysis to perform. Setting it to `summary` will generate a caption (summary), `questions` will prepare answers (VQA) to a list of questions as set by the user, `summary_and_questions` will do both.\n",
    "2. `list_of_questions` a list of text questions to be answered by the model. This parameter is required when analysis_type is set to \"questions\" or \"summary_and_questions\".\n",
    "3. `keys_batch_size` controls the number of images processed per batch. Increasing this value may slightly improve performance, depending on your system.\n",
    "The default is `16`, which provides a good balance between speed and stability on most setups.\n",
    "4. `is_concise_summary` – determines the level of detail in generated captions:\n",
    "    * `True` → produces short, concise summaries.\n",
    "    * `False` → produces longer, more descriptive captions that may include additional context or atmosphere, but take more time to compute.\n",
    "5. `is_concise_answer`– similar to the previous flag, but for controlling the level of detail in question answering responses."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example Usage**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To generate a concise image summary only:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary = ammico.analyse_images_from_dict(\n",
    "    analysis_type=\"summary\", is_concise_summary=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To generate detailed summaries and answer multiple questions:\n",
    "\n",
    "First, define a list of questions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "list_of_questions = [\n",
    "    \"How many persons on the picture?\",\n",
    "    \"Are there any politicians in the picture?\",\n",
    "    \"Does the picture show something from medicine?\",\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then call the function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "summary_and_answers = ammico.analyse_images_from_dict(\n",
    "    analysis_type=\"summary_and_questions\",\n",
    "    list_of_questions=list_of_questions,\n",
    "    is_concise_summary=False,\n",
    "    is_concise_answer=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to execute only the VQA module without captioning, just specify the `analysis_type` as `questions`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of the `analyse_images_from_dict()` method is a dictionary, where each key corresponds to an input image identifier. Each entry in this dictionary contains the processed results for that image.\n",
    "\n",
    "| output key | output type | output value |\n",
    "| ---------- | ----------- | ------------ |\n",
    "| `caption` | `str` | when `analysis_type=\"summary\"` or `\"summary_and_questions\"`, constant image caption |\n",
    "| `vqa` | `list[str]` | when `analysis_type=\"questions\"` or `summary_and_questions`, the answers to the user-defined input question |\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Video summary and VQA module\n",
    "This module is currently under development and will be demonstrated here as soon as it is ready."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detection of faces and facial expression analysis\n",
    "Faces and facial expressions are detected and analyzed using the `EmotionDetector` class from the `faces` module. Initially, it is detected if faces are present on the image using RetinaFace, followed by analysis if face masks are worn (Face-Mask-Detection). The probabilistic detection of age, gender, race, and emotions is carried out with deepface, but only if the disclosure statement has been accepted (see above).\n",
    "\n",
    "<img src=\"../../docs/source/_static/emotion_detector.png\" width=\"800\" />\n",
    "\n",
    "Depending on the features found on the image, the face detection module returns a different analysis content: If no faces are found on the image, all further steps are skipped and the result `\"face\": \"No\", \"multiple_faces\": \"No\", \"no_faces\": 0, \"wears_mask\": [\"No\"], \"age\": [None], \"gender\": [None], \"race\": [None], \"emotion\": [None], \"emotion (category)\": [None]` is returned. If one or several faces are found, up to three faces are analyzed if they are partially concealed by a face mask. If yes, only age and gender are detected; if no, also race, emotion, and dominant emotion are detected. In case of the latter, the output could look like this: `\"face\": \"Yes\", \"multiple_faces\": \"Yes\", \"no_faces\": 2, \"wears_mask\": [\"No\", \"No\"], \"age\": [27, 28], \"gender\": [\"Man\", \"Man\"], \"race\": [\"asian\", None], \"emotion\": [\"angry\", \"neutral\"], \"emotion (category)\": [\"Negative\", \"Neutral\"]`, where for the two faces that are detected (given by `no_faces`), some of the values are returned as a list with the first item for the first (largest) face and the second item for the second (smaller) face (for example, `\"emotion\"` returns a list `[\"angry\", \"neutral\"]` signifying the first face expressing anger, and the second face having a neutral expression).\n",
    "\n",
    "The emotion detection reports the seven facial expressions angry, fear, neutral, sad, disgust, happy and surprise. These emotions are assigned based on the returned confidence of the model (between 0 and 1), with a high confidence signifying a high likelihood of the detected emotion being correct. Emotion recognition is not an easy task, even for a human; therefore, we have added a keyword `emotion_threshold` signifying the % value above which an emotion is counted as being detected. The default is set to 50%, so that a confidence above 0.5 results in an emotion being assigned. If the confidence is lower, no emotion is assigned. \n",
    "\n",
    "From the seven facial expressions, an overall dominating emotion category is identified: negative, positive, or neutral emotion. These are defined with the facial expressions angry, disgust, fear and sad for the negative category, happy for the positive category, and surprise and neutral for the neutral category.\n",
    "\n",
    "A similar threshold as for the emotion recognition is set for the race/ethnicity and gender detection, `race_threshold` and `gender_threshold`, with the default set to 50% so that a confidence for race / gender above 0.5 only will return a value in the analysis. \n",
    "\n",
    "For age unfortunately no confidence value is accessible so that no threshold values can be set for this type of analysis. The [reported MAE of the model is &pm; 4.65](https://sefiks.com/2019/02/13/apparent-age-and-gender-prediction-in-keras/).\n",
    "\n",
    "You may also pass the name of the environment variable that determines if you accept or reject the ethical disclosure statement. By default, the variable is named `DISCLOSURE_AMMICO`.\n",
    "\n",
    "Summarizing, the face detection is carried out using the following method call and keywords, where `emotion_threshold`, `race_threshold`, `gender_threshold`, `accept_disclosure` are optional:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for key in image_dict.keys():\n",
    "    image_dict[key] = ammico.EmotionDetector(\n",
    "        image_dict[key],\n",
    "        emotion_threshold=50,\n",
    "        race_threshold=50,\n",
    "        gender_threshold=50,\n",
    "        accept_disclosure=\"DISCLOSURE_AMMICO\",\n",
    "    ).analyse_image()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write output to csv\n",
    "image_df = ammico.get_dataframe(image_dict)\n",
    "image_df.to_csv(\"data_out.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The thresholds can be adapted interactively in the notebook interface and the optimal value can then be used in a subsequent analysis of the whole data set.\n",
    "\n",
    "The output keys that are generated are\n",
    "\n",
    "| output key | output type | output value |\n",
    "| ---------- | ----------- | ------------ |\n",
    "| `face` | `str` | if a face is detected |\n",
    "| `multiple_faces` | `str` | if multiple faces are detected |\n",
    "| `no_faces` | `int` | the number of detected faces |\n",
    "| `wears_mask` | `list[str]` | if each of the detected faces wears a face covering, up to three faces |\n",
    "| `age` | `list[int]` | the detected age, up to three faces |\n",
    "| `gender` | `list[str]` | the detected gender, up to three faces |\n",
    "| `race` | `list[str]` | the detected race, up to three faces, if above the confidence threshold |\n",
    "| `emotion` | `list[str]` | the detected emotion, up to three faces, if above the confidence threshold |\n",
    "| `emotion (category)` | `list[str]` | the detected emotion category (positive, negative, or neutral), up to three faces, if above the confidence threshold |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Image Multimodal Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This module is currently under development and will be demonstrated here as soon as it is ready."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Color analysis of pictures\n",
    "\n",
    "This module shows primary color analysis of color image using K-Means algorithm.\n",
    "The output are N primary colors and their corresponding percentage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of inspecting each of the images, you can also directly carry out the analysis and export the result into a csv. This may take a while depending on how many images you have loaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for key in image_dict.keys():\n",
    "    image_dict[key] = ammico.colors.ColorDetector(image_dict[key]).analyse_image()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These steps are required to convert the dictionary of dictionarys into a dictionary with lists, that can be converted into a pandas dataframe and exported to a csv file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = ammico.get_dataframe(image_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the csv file - here you should provide a file path and file name for the csv file to be written."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv(\"data_out.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further detector modules\n",
    "Please get in touch or open an issue, if you would like to propose further detector modules."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ammico",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}