diff --git a/README.md b/README.md index 6653f8c..6714ea7 100644 --- a/README.md +++ b/README.md @@ -119,7 +119,7 @@ Place the data files and google cloud vision API key in your google drive to acc ## Features ### Text extraction -The text is extracted from the images using [google-cloud-vision](https://cloud.google.com/vision). For this, you need an API key. Set up your google account following the instructions on the google Vision AI website. +The text is extracted from the images using [google-cloud-vision](https://cloud.google.com/vision). For this, you need an API key. Set up your google account following the instructions on the google Vision AI website or as described [here](docs/google_Cloud_Vision_API/set_up_credentials.md). You then need to export the location of the API key as an environment variable: ``` export GOOGLE_APPLICATION_CREDENTIALS="location of your .json" diff --git a/ammico/notebooks/DemoNotebook_ammico.ipynb b/ammico/notebooks/DemoNotebook_ammico.ipynb index 8d4de55..1696764 100644 --- a/ammico/notebooks/DemoNotebook_ammico.ipynb +++ b/ammico/notebooks/DemoNotebook_ammico.ipynb @@ -20,14 +20,13 @@ "source": [ "# if running on google colab\n", "# flake8-noqa-cell\n", - "import os\n", "\n", "if \"google.colab\" in str(get_ipython()):\n", " # update python version\n", " # install setuptools\n", " # %pip install setuptools==61 -qqq\n", " # uninstall some pre-installed packages due to incompatibility\n", - " %pip uninstall tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint -y -qqq\n", + " %pip uninstall tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint flex-y -qqq\n", " # install ammico\n", " %pip install git+https://github.com/ssciwr/ammico.git -qqq\n", " # mount google drive for data and API key\n", @@ -36,6 +35,20 @@ " drive.mount(\"/content/drive\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can download a dataset for test purposes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": {}, @@ -49,7 +62,49 @@ "metadata": {}, "outputs": [], "source": [ - "import ammico" + "import os\n", + "import ammico\n", + "# for displaying a progress bar\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sometimes you may need to restart a session after installing the correct versions of packages, because `Tensorflow` and `EmotitionDetector` may not work and give an error. You can check it by running this code: \n", + "```\n", + "import tensorflow as tf\n", + "tf.ones([2, 2])\n", + "```\n", + "If this code generates an error, you need to restart session. For this please click `Runtime` -> `Restart session`. And rerun the notebook again. All required packages will already be installed, so the execution will be very fast. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 0: Create and set a Google Cloud Vision Key\n", + "\n", + "Please note that for the [Google Cloud Vision API](https://cloud.google.com/vision/docs/setup) (the TextDetector class) you need to set a key in order to process the images. 
This key is ideally set as an environment variable using for example\n", + "```\n", + "os.environ[\n", + " \"GOOGLE_APPLICATION_CREDENTIALS\"\n", + "] = \"/content/drive/MyDrive/misinformation-data/misinformation-campaign-981aa55a3b13.json\"\n", + "```\n", + "where you place the key on your Google Drive if running on colab, or place it in a local folder on your machine.\n", + "\n", + "To set up the key, see [here]()." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"/content/drive/MyDrive/misinformation-data/misinformation-campaign-981aa55a3b13.json\"\n", + "os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"../../data/misinformation-campaign-981aa55a3b13.json\"" ] }, { @@ -77,27 +132,28 @@ "outputs": [], "source": [ "image_dict = ammico.find_files(\n", - " path=\"/content/drive/MyDrive/misinformation-data/\",\n", - " #path=\"../../data/\",\n", - " limit=2,\n", + " # path=\"/content/drive/MyDrive/misinformation-data/\",\n", + " path=\"../../data/\",\n", + " limit=15,\n", ")" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_dict" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Inspect the input files using the graphical user interface\n", "A Dash user interface is to select the most suitable options for the analysis, before running a complete analysis on the whole data set. The options for each detector module are explained below in the corresponding sections; for example, different models can be selected that will provide slightly different results. This way, the user can interactively explore which settings provide the most accurate results. In the interface, the nested `image_dict` is passed through the `AnalysisExplorer` class. The interface is run on a specific port which is passed using the `port` keyword; if a port is already in use, it will return an error message, in which case the user should select a different port number. \n", - "The interface opens a dash app inside the Jupyter Notebook and allows selection of the input file in the top left dropdown menu, as well as selection of the detector type in the top right, with options for each detector type as explained below. The output of the detector is shown directly on the right next to the image. This way, the user can directly inspect how updating the options for each detector changes the computed results, and find the best settings for a production run.\n", - "\n", - "Please note that for the Google Cloud Vision API (the TextDetector class) you need to set a key in order to process the images. This key is ideally set as an environment variable using for example\n", - "```\n", - "os.environ[\n", - " \"GOOGLE_APPLICATION_CREDENTIALS\"\n", - "] = \"/content/drive/MyDrive/misinformation-data/misinformation-campaign-981aa55a3b13.json\"\n", - "```\n", - "where you place the key on your Google Drive if running on colab, or place it in a local folder on your machine." + "The interface opens a dash app inside the Jupyter Notebook and allows selection of the input file in the top left dropdown menu, as well as selection of the detector type in the top right, with options for each detector type as explained below. The output of the detector is shown directly on the right next to the image. 
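For orientation, the call that opens this interface looks like the following (a minimal sketch mirroring the `AnalysisExplorer` cell that appears later in this notebook; the port number is an arbitrary choice):
```
analysis_explorer = ammico.AnalysisExplorer(image_dict)
analysis_explorer.run_server(port=8055)  # pick another port if this one is already in use
```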
This way, the user can directly inspect how updating the options for each detector changes the computed results, and find the best settings for a production run." ] }, { @@ -115,8 +171,14 @@ "metadata": {}, "source": [ "## Step 3: Analyze all images\n", - "After having selected the best options for each detector module from the interactive GUI, the analysis can now be run in production on all images in the data set. Depending on the size of the data set and the computing resources available, this can take some time. Please note that you need to have set your Google Cloud Vision API key for the TextDetector to run.\n", - "The desired detector modules are called sequentially in any order, for example:" + "The analysis can be run in production on all images in the data set. Depending on the size of the data set and the computing resources available, this can take some time. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is also possible to set the dump file creation `dump_file` in order to save the calculated data every `dump_every` images. " ] }, { @@ -125,9 +187,61 @@ "metadata": {}, "outputs": [], "source": [ - "for key in image_dict.keys():\n", - " image_dict[key] = ammico.TextDetector(image_dict[key], analyse_text=True).analyse_image()\n", - " image_dict[key] = ammico.EmotionDetector(image_dict[key]).analyse_image()" + "# dump file name\n", + "dump_file = \"dump_file.csv\"\n", + "# dump every N images \n", + "dump_every = 10" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The desired detector modules are called sequentially in any order, for example the `EmotionDetector`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for num, key in tqdm(enumerate(image_dict.keys()), total=len(image_dict)): # loop through all images\n", + " image_dict[key] = ammico.EmotionDetector(image_dict[key]).analyse_image() # analyse image with EmotionDetector and update dict\n", + " \n", + " if num % dump_every == 0 or num == len(image_dict) - 1: # save results every dump_every to dump_file\n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`TextDetector`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for num, key in tqdm(enumerate(image_dict.keys()), total=len(image_dict)): # loop through all images\n", + " image_dict[key] = ammico.TextDetector(image_dict[key], analyse_text=True).analyse_image() # analyse image with EmotionDetector and update dict\n", + " \n", + " if num % dump_every == 0 | num == len(image_dict) - 1: # save results every dump_every to dump_file\n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "len(image_dict)" ] }, { @@ -143,13 +257,43 @@ "metadata": {}, "outputs": [], "source": [ + "# clear memory on cuda first? 
Faces seems to always not release GPU\n", "# initialize the models\n", - "summary_model, summary_vis_processors = ammico.SummaryDetector(image_dict).load_model(model_type=\"base\")\n", + "image_summary_detector = ammico.SummaryDetector(subdict = image_dict, analysis_type=\"summary\", model_type=\"base\")\n", + "\n", "# run the analysis without having to re-iniatialize the model\n", - "for key in image_dict.keys():\n", - " image_dict[key] = ammico.SummaryDetector(image_dict[key], analysis_type=\"summary\", \n", - " summary_model=summary_model, \n", - " summary_vis_processors=summary_vis_processors).analyse_image()" + "for num, key in tqdm(enumerate(image_dict.keys()),total=len(image_dict)): # loop through all images\n", + " image_dict[key] = image_summary_detector.analyse_image(subdict = image_dict[key], analysis_type=\"summary\") # analyse image with SummaryDetector and update dict\n", + " \n", + " if num % dump_every == 0 | num == len(image_dict) - 1: # save results every dump_every to dump_file\n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or you can run all Detectors in one loop as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# initialize the models\n", + "image_summary_detector = ammico.SummaryDetector(subdict = image_dict, analysis_type=\"summary\", model_type=\"base\")\n", + "\n", + "for num, key in tqdm(enumerate(image_dict.keys()),total=len(image_dict)): # loop through all images\n", + " image_dict[key] = ammico.EmotionDetector(image_dict[key]).analyse_image() # analyse image with EmotionDetector and update dict\n", + " image_dict[key] = ammico.TextDetector(image_dict[key], analyse_text=True).analyse_image() # analyse image with TextDetector and update dict\n", + " image_dict[key] = image_summary_detector.analyse_image(subdict = image_dict[key], analysis_type=\"summary\") # analyse image with SummaryDetector and update dict\n", + " \n", + " if num % dump_every == 0 | num == len(image_dict) - 1: # save results every dump_every to dump_file \n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" ] }, { @@ -165,9 +309,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This can be done in a separate loop or in the same loop as for text and emotion detection.\n", - "\n", - "The nested dictionary will be updated from containing only the file id's and paths to the image files, to containing also all the image data." + "The nested dictionary will be updated from containing only the file id's and paths to the image files, to containing all calculated image features." 
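A quick way to see what was added is to inspect a single entry of the updated dictionary (a minimal sketch; the exact keys depend on which detectors were run):
```
first_key = next(iter(image_dict))           # id of the first analysed image
print(sorted(image_dict[first_key].keys()))  # filename, path and the detector output keys
```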
] }, { @@ -259,12 +401,16 @@ "metadata": {}, "outputs": [], "source": [ - "for key in image_dict.keys():\n", - " image_dict[key] = ammico.TextDetector(image_dict[key], \n", + "for num, key in tqdm(enumerate(image_dict.keys()), total=len(image_dict)): # loop through all images\n", + " image_dict[key] = ammico.TextDetector(image_dict[key], # analyse image with TextDetector and update dict\n", " analyse_text=True, model_names=[\"sshleifer/distilbart-cnn-12-6\", \n", " \"distilbert-base-uncased-finetuned-sst-2-english\", \n", " \"dbmdz/bert-large-cased-finetuned-conll03-english\"], \n", - " revision_numbers=[\"a4f8f3e\", \"af0f99b\", \"f2482bf\"]).analyse_image()" + " revision_numbers=[\"a4f8f3e\", \"af0f99b\", \"f2482bf\"]).analyse_image()\n", + " \n", + " if num % dump_every == 0 | num == len(image_dict) - 1: # save results every dump_every to dump_file\n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" ] }, { @@ -297,15 +443,6 @@ "detector object, and not when running the analysis for each image; the same holds true for the selected model." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "image_summary_detector = ammico.SummaryDetector(image_dict, analysis_type=\"summary\", model_type=\"base\")" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -325,6 +462,44 @@ "| blip2_opt_caption_coco_opt2.7b | BLIP2 pretrained on OPT-2.7b, fine-tuned on COCO | \n", "| blip2_opt_caption_coco_opt6.7b | BLIP2 pretrained on OPT-6.7b, fine-tuned on COCO |\n", "\n", + "Please note that `base`, `large` and `vqa` models can be run on the base TPU video card in Google Colab.\n", + "To run any advanced `BLIP2` models you need more than 20 gb of video memory, so you need to connect a paid A100 in Google Colab." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First of all, we can run only the summary module `analysis_type`. You can choose a `base` or a `large` model_type. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_summary_detector = ammico.SummaryDetector(image_dict, analysis_type=\"summary\", model_type=\"base\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for num, key in tqdm(enumerate(image_dict.keys()),total=len(image_dict)):\n", + " image_dict[key] = image_summary_detector.analyse_image(subdict = image_dict[key], analysis_type=\"summary\")\n", + " \n", + " if num % dump_every == 0 | num == len(image_dict) - 1: \n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "For VQA, a list of questions needs to be passed when carrying out the analysis; these should be given as a list of strings." ] }, @@ -345,7 +520,32 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Summarizing, the detector is run as" + "If you want to execute only the VQA module without captioning, just specify the `analysis_type` as `questions` and `model_type` as `vqa`. 
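The `list_of_questions` used below is the plain list of strings defined earlier in the notebook; as a sketch (these two questions are only illustrative):
```
list_of_questions = [
    "How many people are in the picture?",
    "What is the picture about?",
]
```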
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_summary_vqa_detector = ammico.SummaryDetector(image_dict, analysis_type=\"questions\", \n", + " model_type=\"vqa\")\n", + "\n", + "for num, key in tqdm(enumerate(image_dict.keys()),total=len(image_dict)):\n", + " image_dict[key] = image_summary_vqa_detector.analyse_image(subdict=image_dict[key], \n", + " analysis_type=\"questions\", \n", + " list_of_questions = list_of_questions)\n", + " if num % dump_every == 0 | num == len(image_dict) - 1: \n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or you can specify the analysis type as `summary_and_questions`, then both caption creation and question answers will be generated for each image. In this case, you can choose a `base` or a `large` model_type. " ] }, { @@ -356,10 +556,13 @@ "source": [ "image_summary_vqa_detector = ammico.SummaryDetector(image_dict, analysis_type=\"summary_and_questions\", \n", " model_type=\"base\")\n", - "for key in image_dict.keys():\n", - " image_dict[key] = image_summary_vqa_detector.analyse_image(image_dict[key], \n", + "for num, key in tqdm(enumerate(image_dict.keys()),total=len(image_dict)):\n", + " image_dict[key] = image_summary_vqa_detector.analyse_image(subdict=image_dict[key], \n", " analysis_type=\"summary_and_questions\", \n", - " list_of_questions = list_of_questions)" + " list_of_questions = list_of_questions)\n", + " if num % dump_every == 0 | num == len(image_dict) - 1: \n", + " image_df = ammico.get_dataframe(image_dict)\n", + " image_df.to_csv(dump_file)" ] }, { @@ -372,8 +575,177 @@ "| ---------- | ----------- | ------------ |\n", "| `const_image_summary` | `str` | when `analysis_type=\"summary\"` or `\"summary_and_questions\"`, constant image caption (does not change upon re-running the analysis for the same model) |\n", "| `3_non-deterministic_summary` | `list[str]` | when `analysis_type=\"summary\"` or s`ummary_and_questions`, three different captions generated with different random seeds |\n", - "| *a user-defined input question* | `str` | when `analysis_type=\"questions\"` or `summary_and_questions`, the answer to the user-defined input question | \n", + "| *a user-defined input question* | `str` | when `analysis_type=\"questions\"` or `summary_and_questions`, the answer to the user-defined input question | \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### BLIP2 models\n", + "This is very heavy models. They requare approx 60GB of RAM and they can use more than 20GB GPUs memory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "obj = ammico.SummaryDetector(subdict=image_dict, analysis_type = \"summary_and_questions\", model_type = \"blip2_t5_caption_coco_flant5xl\")\n", + "# list of the new models that can be used:\n", + "# \"blip2_t5_pretrain_flant5xxl\",\n", + "# \"blip2_t5_pretrain_flant5xl\",\n", + "# \"blip2_t5_caption_coco_flant5xl\",\n", + "# \"blip2_opt_pretrain_opt2.7b\",\n", + "# \"blip2_opt_pretrain_opt6.7b\",\n", + "# \"blip2_opt_caption_coco_opt2.7b\",\n", + "# \"blip2_opt_caption_coco_opt6.7b\",\n", "\n", + "# You can use `pretrain_` model types for zero-shot image-to-text generation with prompts.\n", + "# Or you can use `caption_coco_`` model types to generate coco-style captions.\n", + "# `flant5` and `opt` means that the model equipped with FlanT5 and OPT LLMs respectively.\n", + "\n", + "#also you can perform all calculation on cpu if you set device_type= \"cpu\" or gpu if you set device_type= \"cuda\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key in image_dict:\n", + " image_dict[key] = obj.analyse_image(subdict = image_dict[key], analysis_type=\"summary_and_questions\")\n", + "\n", + "# analysis_type can be \n", + "# \"summary\",\n", + "# \"questions\",\n", + "# \"summary_and_questions\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also pass a list of questions to this cell if `analysis_type=\"summary_and_questions\"` or `analysis_type=\"questions\"`. But the format of questions has changed in new models. \n", + "\n", + "Here is an example of a list of questions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_of_questions = [\n", + " \"Question: Are there people in the image? Answer:\",\n", + " \"Question: What is this picture about? Answer:\",\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key in image_dict:\n", + " image_dict[key] = obj.analyse_image(subdict = image_dict[key], analysis_type=\"questions\", list_of_questions=list_of_questions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also pass a question with previous answers as context into this model and pass in questions like this one to get a more accurate answer:\n", + "\n", + "You can combine as many questions as you want in a single query as a list." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_of_questions = [\n", + " \"Question: What country is in the picture? Answer: USA. Question: Why? Answer: Because there is an American flag in the background . Question: Where it comes from? Answer:\",\n", + " \"Question: Which city is this? Answer: Frankfurt. 
Question: Why?\",\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key in image_dict:\n", + " image_dict[key] = obj.analyse_image(subdict = image_dict[key], analysis_type=\"questions\", list_of_questions=list_of_questions)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also ask sequential questions if you pass the argument `cosequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_of_questions = [\n", + " \"Question: Is this picture taken inside or outside? Answer:\",\n", + " \"Question: Why? Answer:\",\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key in image_dict:\n", + " image_dict[key] = obj.analyse_image(subdict = image_dict[key], analysis_type=\"questions\", list_of_questions=list_of_questions, consequential_questions=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "## Detection of faces and facial expression analysis\n", "Faces and facial expressions are detected and analyzed using the `EmotionDetector` class from the `faces` module. Initially, it is detected if faces are present on the image using RetinaFace, followed by analysis if face masks are worn (Face-Mask-Detection). The detection of age, gender, race, and emotions is carried out with deepface.\n", "\n", @@ -422,12 +794,450 @@ "| `emotion (category)` | `list[str]` | the detected emotion category (positive, negative, or neutral), up to three faces, if above the confidence threshold |" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Image Multimodal Search" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This module shows how to carry out an image multimodal search with the [LAVIS](https://github.com/salesforce/LAVIS) library. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Indexing and extracting features from images in selected folder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First you need to select a model. 
You can choose one of the following models: \n", + "- [blip](https://github.com/salesforce/BLIP)\n", + "- [blip2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) \n", + "- [albef](https://github.com/salesforce/ALBEF) \n", + "- [clip_base](https://github.com/openai/CLIP/blob/main/model-card.md)\n", + "- [clip_vitl14](https://github.com/mlfoundations/open_clip) \n", + "- [clip_vitl14_336](https://github.com/mlfoundations/open_clip)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_type = \"blip\"\n", + "# model_type = \"blip2\"\n", + "# model_type = \"albef\"\n", + "# model_type = \"clip_base\"\n", + "# model_type = \"clip_vitl14\"\n", + "# model_type = \"clip_vitl14_336\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To process the loaded images using the selected model, use the below code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_obj = ammico.MultimodalSearch(image_dict)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(\n", + " model,\n", + " vis_processors,\n", + " txt_processors,\n", + " image_keys,\n", + " image_names,\n", + " features_image_stacked,\n", + ") = my_obj.parsing_images(\n", + " model_type, \n", + " path_to_save_tensors=\"/content/drive/MyDrive/misinformation-data/\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The images are then processed and stored in a numerical representation, a tensor. These tensors do not change for the same image and same model - so if you run this analysis once, and save the tensors giving a path with the keyword `path_to_save_tensors`, a file with filename `.__saved_features_image.pt` will be placed there.\n", + "\n", + "This can save you time if you want to analyse same images with the same model but different questions. To run using the saved tensors, execute the below code giving the path and name of the tensor file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# (\n", + "# model,\n", + "# vis_processors,\n", + "# txt_processors,\n", + "# image_keys,\n", + "# image_names,\n", + "# features_image_stacked,\n", + "# ) = my_obj.parsing_images(\n", + "# model_type,\n", + "# path_to_load_tensors=\"/content/drive/MyDrive/misinformation-data/5_clip_base_saved_features_image.pt\",\n", + "# )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we already processed our image folder with 5 images and the `clip_base` model. So you need just to write the name `5_clip_base_saved_features_image.pt` of the saved file that consists of tensors of all images as keyword argument for `path_to_load_tensors`. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Formulate your search queries\n", + "\n", + "Next, you need to form search queries. You can search either by image or by text. You can search for a single query, or you can search for several queries at once, the computational time should not be much different. 
The format of the queries is as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import importlib_resources # only requare for image query example\n", + "image_example_query = str(importlib_resources.files(\"ammico\") / \"data\" / \"test-crop-image.png\") # creating the path to the image for the image query example\n", + "\n", + "search_query = [\n", + " {\"text_input\": \"politician press conference\"}, \n", + " {\"text_input\": \"a world map\"},\n", + " {\"text_input\": \"a dog\"}, # This is how looks text query\n", + " {\"image\": image_example_query}, # This is how looks image query, here `image_example_path` is the path to query image like \"data/test-crop-image.png\"\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can filter your results in 3 different ways:\n", + "- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n", + "- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n", + "- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "similarity, sorted_lists = my_obj.multimodal_search(\n", + " model,\n", + " vis_processors,\n", + " txt_processors,\n", + " model_type,\n", + " image_keys,\n", + " features_image_stacked,\n", + " search_query,\n", + " filter_number_of_images=20,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "similarity" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sorted_lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After launching `multimodal_search` function, the results of each query will be added to the source dictionary. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A special function was written to present the search results conveniently. 
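Referring back to the three filtering options described above, a hedged sketch of a call that combines them, assuming `filter_val_limit` and `filter_rel_error` are passed as keyword arguments in the same way as `filter_number_of_images` (the threshold values are only illustrative):
```
similarity, sorted_lists = my_obj.multimodal_search(
    model,
    vis_processors,
    txt_processors,
    model_type,
    image_keys,
    features_image_stacked,
    search_query,
    filter_number_of_images=10,  # keep only the 10 best matches per query
    filter_val_limit=0.2,        # discard matches with similarity below 0.2
    filter_rel_error=30,         # discard matches more than 30% below the top hit
)
```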
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_obj.show_results(\n", + " search_query[0], # you can change the index to see the results for other queries\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_obj.show_results(\n", + " search_query[3], # you can change the index to see the results for other queries\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Improve the search results\n", + "\n", + "For even better results, a slightly different approach has been prepared that can improve search results. It is quite resource-intensive, so it is applied after the main algorithm has found the most relevant images. This approach works only with text queries and it skips image queries. Among the parameters you can choose 3 models: `\"blip_base\"`, `\"blip_large\"`, `\"blip2_coco\"`. If you get an `Out of Memory` error, try reducing the batch_size value (minimum = 1), which is the number of images being processed simultaneously. With the parameter `need_grad_cam = True/False` you can enable the calculation of the heat map of each image to be processed and save them in `image_gradcam_with_itm`. Thus the `image_text_match_reordering()` function calculates new similarity values and new ranks for each image. The resulting values are added to the general dictionary." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "itm_model = \"blip_base\"\n", + "# itm_model = \"blip_large\"\n", + "# itm_model = \"blip2_coco\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "itm_scores, image_gradcam_with_itm = my_obj.image_text_match_reordering(\n", + " search_query,\n", + " itm_model,\n", + " image_keys,\n", + " sorted_lists,\n", + " batch_size=1,\n", + " need_grad_cam=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images querys, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_obj.show_results(\n", + " search_query[0], itm=True, image_gradcam_with_itm=image_gradcam_with_itm\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Save search results to csv" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Convert the dictionary of dictionarys into a dictionary with lists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "outdict = ammico.append_data_to_dict(image_dict)\n", + "df = ammico.dump_df(outdict)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Check the dataframe:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Write the csv file:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Color analysis of pictures\n", + "\n", + "This module shows primary color analysis of color image using K-Means algorithm.\n", + "The output are N primary colors and their corresponding percentage." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment, so please be patient. If you are sure of what you are doing, you can skip this and directly export a csv file in the step below.\n", + "Here, we display the color detection results provided by `colorgram` and `colour` libraries. Click on the tabs to see the results in the right sidebar. You may need to increment the `port` number if you are already running several notebook instances on the same server." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "analysis_explorer = ammico.AnalysisExplorer(image_dict)\n", + "analysis_explorer.run_server(port = 8057)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead of inspecting each of the images, you can also directly carry out the analysis and export the result into a csv. This may take a while depending on how many images you have loaded." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for key in image_dict.keys():\n", + " image_dict[key] = ammico.colors.ColorDetector(image_dict[key]).analyse_image()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These steps are required to convert the dictionary of dictionarys into a dictionary with lists, that can be converted into a pandas dataframe and exported to a csv file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = ammico.get_dataframe(image_dict)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Check the dataframe:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Write the csv file - here you should provide a file path and file name for the csv file to be written." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further detector modules\n", - "Further detector modules exist, such as `ColorDetector` and `MultimodalSearch`, also it is possible to carry out a topic analysis on the text data, as well as crop social media posts automatically. These are more experimental features and have their own demonstration notebooks." + "Further detector modules exist, also it is possible to carry out a topic analysis on the text data, as well as crop social media posts automatically. These are more experimental features and have their own demonstration notebooks." ] }, { @@ -452,7 +1262,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.9.16" } }, "nbformat": 4, diff --git a/ammico/test/data/example_faces.json b/ammico/test/data/example_faces.json index 2c984bd..24814eb 100644 --- a/ammico/test/data/example_faces.json +++ b/ammico/test/data/example_faces.json @@ -3,7 +3,6 @@ "multiple_faces": "Yes", "no_faces": 11, "wears_mask": ["No", "No", "Yes"], - "age": [36, 35, 33], "gender": ["Man", "Man", "Man"], "race": ["white", "white", null], "emotion": ["sad", "fear", null], diff --git a/ammico/test/test_display.py b/ammico/test/test_display.py index 5dc0e54..8b19073 100644 --- a/ammico/test/test_display.py +++ b/ammico/test/test_display.py @@ -42,24 +42,6 @@ def test_AnalysisExplorer(get_AE, get_options): assert get_AE.update_picture(None) is None -def test_right_output_analysis_emotions(get_AE, get_options): - get_AE._right_output_analysis( - 2, - get_options[3], - get_options[0], - "EmotionDetector", - True, - None, - None, - 50, - 50, - "CIE 1976", - "summary_and_questions", - "base", - "How many people are in the picture?", - ) - - def test_right_output_analysis_summary(get_AE, get_options): get_AE._right_output_analysis( 2, @@ -76,3 +58,21 @@ def test_right_output_analysis_summary(get_AE, get_options): "base", "How many people are in the picture?", ) + + +def test_right_output_analysis_emotions(get_AE, get_options): + get_AE._right_output_analysis( + 2, + get_options[3], + get_options[0], + "EmotionDetector", + True, + None, + None, + 50, + 50, + "CIE 1976", + "summary_and_questions", + "base", + "How many people are in the picture?", + ) diff --git a/ammico/test/test_faces.py b/ammico/test/test_faces.py index a1d10db..6dcf284 100644 --- a/ammico/test/test_faces.py +++ b/ammico/test/test_faces.py @@ -29,5 +29,7 @@ def test_analyse_faces(get_path): out_dict = json.load(file) # delete the filename key mydict.pop("filename", None) + # delete the age key, as this is conflicting - gives different results sometimes + mydict.pop("age", None) for key in mydict.keys(): assert mydict[key] == out_dict[key] diff --git 
a/docs/google_Cloud_Vision_API/img0.png b/docs/google_Cloud_Vision_API/img0.png new file mode 100644 index 0000000..9ad5bda Binary files /dev/null and b/docs/google_Cloud_Vision_API/img0.png differ diff --git a/docs/google_Cloud_Vision_API/img1.png b/docs/google_Cloud_Vision_API/img1.png new file mode 100644 index 0000000..6c5265e Binary files /dev/null and b/docs/google_Cloud_Vision_API/img1.png differ diff --git a/docs/google_Cloud_Vision_API/img10.png b/docs/google_Cloud_Vision_API/img10.png new file mode 100644 index 0000000..cfbb25d Binary files /dev/null and b/docs/google_Cloud_Vision_API/img10.png differ diff --git a/docs/google_Cloud_Vision_API/img11.png b/docs/google_Cloud_Vision_API/img11.png new file mode 100644 index 0000000..78c2464 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img11.png differ diff --git a/docs/google_Cloud_Vision_API/img12.png b/docs/google_Cloud_Vision_API/img12.png new file mode 100644 index 0000000..1c665d5 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img12.png differ diff --git a/docs/google_Cloud_Vision_API/img13.png b/docs/google_Cloud_Vision_API/img13.png new file mode 100644 index 0000000..02f095c Binary files /dev/null and b/docs/google_Cloud_Vision_API/img13.png differ diff --git a/docs/google_Cloud_Vision_API/img14.png b/docs/google_Cloud_Vision_API/img14.png new file mode 100644 index 0000000..107691f Binary files /dev/null and b/docs/google_Cloud_Vision_API/img14.png differ diff --git a/docs/google_Cloud_Vision_API/img15.png b/docs/google_Cloud_Vision_API/img15.png new file mode 100644 index 0000000..a26fca3 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img15.png differ diff --git a/docs/google_Cloud_Vision_API/img16.png b/docs/google_Cloud_Vision_API/img16.png new file mode 100644 index 0000000..1423082 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img16.png differ diff --git a/docs/google_Cloud_Vision_API/img17.png b/docs/google_Cloud_Vision_API/img17.png new file mode 100644 index 0000000..b8ebcfb Binary files /dev/null and b/docs/google_Cloud_Vision_API/img17.png differ diff --git a/docs/google_Cloud_Vision_API/img18.png b/docs/google_Cloud_Vision_API/img18.png new file mode 100644 index 0000000..3307c7e Binary files /dev/null and b/docs/google_Cloud_Vision_API/img18.png differ diff --git a/docs/google_Cloud_Vision_API/img19.png b/docs/google_Cloud_Vision_API/img19.png new file mode 100644 index 0000000..6515d96 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img19.png differ diff --git a/docs/google_Cloud_Vision_API/img2.png b/docs/google_Cloud_Vision_API/img2.png new file mode 100644 index 0000000..4be0b87 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img2.png differ diff --git a/docs/google_Cloud_Vision_API/img3.png b/docs/google_Cloud_Vision_API/img3.png new file mode 100644 index 0000000..3c61d62 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img3.png differ diff --git a/docs/google_Cloud_Vision_API/img4.png b/docs/google_Cloud_Vision_API/img4.png new file mode 100644 index 0000000..4fb16c5 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img4.png differ diff --git a/docs/google_Cloud_Vision_API/img5.png b/docs/google_Cloud_Vision_API/img5.png new file mode 100644 index 0000000..3c6b75f Binary files /dev/null and b/docs/google_Cloud_Vision_API/img5.png differ diff --git a/docs/google_Cloud_Vision_API/img6.png b/docs/google_Cloud_Vision_API/img6.png new file mode 100644 index 0000000..992b59a Binary files /dev/null and 
b/docs/google_Cloud_Vision_API/img6.png differ diff --git a/docs/google_Cloud_Vision_API/img7.png b/docs/google_Cloud_Vision_API/img7.png new file mode 100644 index 0000000..98b8a04 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img7.png differ diff --git a/docs/google_Cloud_Vision_API/img8.png b/docs/google_Cloud_Vision_API/img8.png new file mode 100644 index 0000000..f707070 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img8.png differ diff --git a/docs/google_Cloud_Vision_API/img9.png b/docs/google_Cloud_Vision_API/img9.png new file mode 100644 index 0000000..38cdee1 Binary files /dev/null and b/docs/google_Cloud_Vision_API/img9.png differ diff --git a/docs/google_Cloud_Vision_API/set_up_credentials.md b/docs/google_Cloud_Vision_API/set_up_credentials.md new file mode 100644 index 0000000..28c3ac8 --- /dev/null +++ b/docs/google_Cloud_Vision_API/set_up_credentials.md @@ -0,0 +1,44 @@ +# Instructions how to generate and enable a google Cloud Vision API key + +1. Go to [google-cloud-vision](https://cloud.google.com/vision) and click on "Console". Sign into your google account / create a new google account if prompted. This will bring you to the following page, where you click on "project" in the top of the screen. +![img0](img0.png) +2. Select "project" from the top left drop-down menu. +![img1](img1.png) +3. Click on "NEW PROJECT" on the left of the pop-up window. +![img2](img2.png) +4. Enter a project name and click on "CREATE". +![img3](img3.png) +5. Now you should be back on the dashboard. In the top right, click on the three vertical dots. +![img4](img4.png) +6. In the drop-down menu, select "Project settings". +![img5](img5.png) +7. In the menu on the left, click on "Service Accounts". +![img6](img6.png) +8. Click on "+ CREATE SERVICE ACCOUNT". +![img7](img7.png) +9. Select a service account ID (you can pick this as any name you wish). Click on "DONE". +![img8](img8.png) +10. Now your service account should show up in the list of service accounts. +![img9](img9.png) +11. Click on the three vertical dots to the right of your service account name and select "Manage keys". +![img10](img10.png) +12. Click on "Create new key". +![img11](img11.png) +13. In the pop-up window, select "JSON" and click "CREATE". +![img12](img12.png) +14. The private key is directly downloaded to your computer. It should be in your downloads folder. +![img13](img13.png) +15. The JSON key file will look something like this (any private information has been blanked out in the screenshot). +![img14](img14.png) +16. Now go back to your browser window. Click on "Google Cloud" in the top left corner. +![img15](img15.png) +17. Now select "APIs & Services". +![img16](img16.png) +18. From the selection of APIs, select "Cloud Vision API" or search for it and then select. +![img17](img17.png) +19. Click on "ENABLE". +![img18](img18.png) +20. Google Cloud Vision API is now enabled for your key. +![img19](img19.png) +21. Place the JSON key in a selected folder on your computer and reference this key in your Jupyter Notebook / Python console when running ammico. Or, upload it to your google Drive to use it on google Colaboratory. + diff --git a/pyproject.toml b/pyproject.toml index e1a078a..493547c 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -54,6 +54,7 @@ dependencies = [ "webcolors", "colour-science", "scikit-learn>1.3.0", + "tqdm" ] [project.scripts]
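As an optional companion to step 21 of the new `set_up_credentials.md` guide, a minimal sketch for checking that the downloaded key can be located by the google-cloud-vision client (it only verifies that credentials are found, not that the Vision API is enabled; the path is a placeholder):
```
import os
from google.cloud import vision

# adjust this path to wherever you stored the downloaded JSON key
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-google-vision-key.json"

# raises an error if no valid credentials can be located
client = vision.ImageAnnotatorClient()
print("Google Cloud Vision credentials found")
```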