{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "0",
"metadata": {},
"source": [
"# Image Multimodal Search"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1",
"metadata": {},
"source": [
"This notebooks shows how to carry out an image multimodal search with the [LAVIS](https://github.com/salesforce/LAVIS) library. \n",
"\n",
"The first cell is only run on google colab and installs the [ammico](https://github.com/ssciwr/AMMICO) package.\n",
"\n",
"After that, we can import `ammico` and read in the files given a folder path."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2",
"metadata": {},
"outputs": [],
"source": [
"# if running on google colab\n",
"# flake8-noqa-cell\n",
"import os\n",
"\n",
"if \"google.colab\" in str(get_ipython()):\n",
" # update python version\n",
" # install setuptools\n",
" # %pip install setuptools==61 -qqq\n",
" # uninstall some pre-installed packages due to incompatibility\n",
" %pip uninstall tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint -y -qqq\n",
" # install ammico\n",
" %pip install git+https://github.com/ssciwr/ammico.git -qqq\n",
" # mount google drive for data and API key\n",
" from google.colab import drive\n",
"\n",
" drive.mount(\"/content/drive\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import ammico"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict = ammico.find_files(\n",
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" limit=10,\n",
")\n",
"mydict"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5",
"metadata": {},
"source": [
"## Indexing and extracting features from images in selected folder"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6",
"metadata": {},
"source": [
"First you need to select a model. You can choose one of the following models: \n",
"- [blip](https://github.com/salesforce/BLIP)\n",
"- [blip2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) \n",
"- [albef](https://github.com/salesforce/ALBEF) \n",
"- [clip_base](https://github.com/openai/CLIP/blob/main/model-card.md)\n",
"- [clip_vitl14](https://github.com/mlfoundations/open_clip) \n",
"- [clip_vitl14_336](https://github.com/mlfoundations/open_clip)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_type = \"blip\"\n",
"# model_type = \"blip2\"\n",
"# model_type = \"albef\"\n",
"# model_type = \"clip_base\"\n",
"# model_type = \"clip_vitl14\"\n",
"# model_type = \"clip_vitl14_336\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8",
"metadata": {},
"source": [
"To process the loaded images using the selected model, use the below code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9",
"metadata": {},
"outputs": [],
"source": [
"my_obj = ammico.MultimodalSearch(mydict)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" image_keys,\n",
" image_names,\n",
" features_image_stacked,\n",
") = my_obj.parsing_images(\n",
" model_type, \n",
" path_to_save_tensors=\"/content/drive/MyDrive/misinformation-data/\",\n",
" )"
]
},
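{
"attachments": {},
"cell_type": "markdown",
"id": "10a",
"metadata": {},
"source": [
"As a quick sanity check, you can look at the shape of the stacked image features; assuming the returned object is a tensor whose first dimension indexes the images, there should be one feature vector per image:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10b",
"metadata": {},
"outputs": [],
"source": [
"# expected: one row per image; the feature dimension depends on the chosen model\n",
"features_image_stacked.shape"
]
},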
{
"attachments": {},
"cell_type": "markdown",
"id": "11",
"metadata": {},
"source": [
"The images are then processed and stored in a numerical representation, a tensor. These tensors do not change for the same image and same model - so if you run this analysis once, and save the tensors giving a path with the keyword `path_to_save_tensors`, a file with filename `.<Number_of_images>_<model_name>_saved_features_image.pt` will be placed there.\n",
"\n",
"This can save you time if you want to analyse same images with the same model but different questions. To run using the saved tensors, execute the below code giving the path and name of the tensor file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# (\n",
"# model,\n",
"# vis_processors,\n",
"# txt_processors,\n",
"# image_keys,\n",
"# image_names,\n",
"# features_image_stacked,\n",
"# ) = my_obj.parsing_images(\n",
"# model_type,\n",
"# path_to_load_tensors=\"/content/drive/MyDrive/misinformation-data/5_clip_base_saved_features_image.pt\",\n",
"# )"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "13",
"metadata": {},
"source": [
"Here we already processed our image folder with 5 images and the `clip_base` model. So you need just to write the name `5_clip_base_saved_features_image.pt` of the saved file that consists of tensors of all images as keyword argument for `path_to_load_tensors`. "
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "14",
"metadata": {},
"source": [
"## Formulate your search queries\n",
"\n",
"Next, you need to form search queries. You can search either by image or by text. You can search for a single query, or you can search for several queries at once, the computational time should not be much different. The format of the queries is as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import importlib_resources # only requare for image query example\n",
"image_example_path = str(importlib_resources.files(\"ammico\") / \"data\" / \"test-crop-image.png\") # creating the path to the image for the image query example\n",
"\n",
"search_query3 = [\n",
" {\"text_input\": \"politician press conference\"}, \n",
" {\"text_input\": \"a world map\"},\n",
" {\"text_input\": \"a dog\"}, # This is how looks text query\n",
" {\"image\": image_example_path}, # This is how looks image query, here `image_example_path` is the path to query image like \"data/test-crop-image.png\"\n",
"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "16",
"metadata": {},
"source": [
"You can filter your results in 3 different ways:\n",
"- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n",
"- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n",
"- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"similarity, sorted_lists = my_obj.multimodal_search(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" model_type,\n",
" image_keys,\n",
" features_image_stacked,\n",
" search_query3,\n",
" filter_number_of_images=20,\n",
")"
]
},
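{
"attachments": {},
"cell_type": "markdown",
"id": "17a",
"metadata": {},
"source": [
"The filters can also be combined. Below is a sketch of the same call using the other two filters from the list above; the threshold values are illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# similarity, sorted_lists = my_obj.multimodal_search(\n",
"#     model,\n",
"#     vis_processors,\n",
"#     txt_processors,\n",
"#     model_type,\n",
"#     image_keys,\n",
"#     features_image_stacked,\n",
"#     search_query3,\n",
"#     filter_val_limit=0.2,  # discard images with similarity below 0.2\n",
"#     filter_rel_error=30,  # discard images more than 30% below the best match\n",
"# )"
]
},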
{
"cell_type": "code",
"execution_count": null,
"id": "18",
"metadata": {},
"outputs": [],
"source": [
"similarity"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19",
"metadata": {},
"outputs": [],
"source": [
"sorted_lists"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "20",
"metadata": {},
"source": [
"After launching `multimodal_search` function, the results of each query will be added to the source dictionary. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21",
"metadata": {},
"outputs": [],
"source": [
"mydict"
]
},
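{
"attachments": {},
"cell_type": "markdown",
"id": "21a",
"metadata": {},
"source": [
"To see which fields the search added, you can, for example, print the keys stored for each image (a minimal sketch; the exact field names depend on your queries):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21b",
"metadata": {},
"outputs": [],
"source": [
"# list the fields stored for each image after the search\n",
"for image_key, image_entry in mydict.items():\n",
"    print(image_key, list(image_entry.keys()))"
]
},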
{
"attachments": {},
"cell_type": "markdown",
"id": "22",
"metadata": {},
"source": [
"A special function was written to present the search results conveniently. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"my_obj.show_results(\n",
" search_query3[0], # you can change the index to see the results for other queries\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "24",
"metadata": {},
"source": [
"## Improve the search results\n",
"\n",
"For even better results, a slightly different approach has been prepared that can improve search results. It is quite resource-intensive, so it is applied after the main algorithm has found the most relevant images. This approach works only with text queries and it skips image queries. Among the parameters you can choose 3 models: `\"blip_base\"`, `\"blip_large\"`, `\"blip2_coco\"`. If you get an `Out of Memory` error, try reducing the batch_size value (minimum = 1), which is the number of images being processed simultaneously. With the parameter `need_grad_cam = True/False` you can enable the calculation of the heat map of each image to be processed and save them in `image_gradcam_with_itm`. Thus the `image_text_match_reordering()` function calculates new similarity values and new ranks for each image. The resulting values are added to the general dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"itm_model = \"blip_base\"\n",
"# itm_model = \"blip_large\"\n",
"# itm_model = \"blip2_coco\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"itm_scores, image_gradcam_with_itm = my_obj.image_text_match_reordering(\n",
" search_query3,\n",
" itm_model,\n",
" image_keys,\n",
" sorted_lists,\n",
" batch_size=1,\n",
" need_grad_cam=True,\n",
")"
]
},
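{
"attachments": {},
"cell_type": "markdown",
"id": "26a",
"metadata": {},
"source": [
"You can inspect the returned `itm_scores` directly; the exact structure depends on your queries and the chosen model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26b",
"metadata": {},
"outputs": [],
"source": [
"itm_scores"
]
},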
{
"attachments": {},
"cell_type": "markdown",
"id": "27",
"metadata": {},
"source": [
"Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images querys, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"my_obj.show_results(\n",
" search_query3[0], itm=True, image_gradcam_with_itm=image_gradcam_with_itm\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "29",
"metadata": {
"tags": []
},
"source": [
"## Save search results to csv"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "30",
"metadata": {
"tags": []
},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = ammico.append_data_to_dict(mydict)\n",
"df = ammico.dump_df(outdict)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "32",
"metadata": {
"tags": []
},
"source": [
"Check the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.head(10)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "34",
"metadata": {},
"source": [
"Write the csv file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}