* feat: create space in demo notebook for multiple image search

* added multiple reference images, re #241. Separated notebooks for #248

* Update ci.yml

added nbval test for notebook

* Update ci.yml

more descriptive test name

* Update pyproject.toml

added typing_extensions

* Update pyproject.toml

typo

* Update pyproject.toml

fixing dependencies, hopefully

* Update ci.yml

fixing dependencies hopefully part 2

* Update ci.yml

typo (again)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update DemoNotebook_ammico.ipynb

cleanup

* Update DemoNotebook_ammico_MultimodalSearch.ipynb

added df to save

* Update DemoNotebook_ammico_MultimodalSearch.ipynb

typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update notebook to include proper paths for sample data

* update paths in demo notebook

---------

Co-authored-by: ChristineSchulz <aq354@uni-heidelberg.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This commit is contained in:
Inga Ulusoy 2025-04-10 22:18:39 +02:00 committed by GitHub
parent 1a629836db
commit efe1851fea
No key found matching this signature
GPG key ID: B5690EEEBB952194
4 changed files: 549 additions and 9 deletions

.github/workflows/ci.yml

@@ -31,6 +31,7 @@ jobs:
- name: Install dependencies
run: |
pip install -e .
pip install typing_extensions==4.13.0
# python -m pip install uv
# uv pip install --system -e .
- name: Run pytest test_colors
@@ -73,6 +74,11 @@
run: |
cd ammico
python -m pytest test/test_utils.py -svv --cov=. --cov-report=xml --cov-append
- name: Run Notebook tests
run: |
cd ammico/notebooks
python -m pytest --nbval DemoNotebook_ammico_MultimodalSearch.ipynb
- name: Upload coverage
if: matrix.os == 'ubuntu-24.04' && matrix.python-version == '3.11'
uses: codecov/codecov-action@v3

ammico/notebooks/DemoNotebook_ammico.ipynb

@@ -1037,6 +1037,26 @@
"To process the loaded images using the selected model, use the below code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_dict = ammico.find_files(\n",
" path=str(data_path),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_dict"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -1061,7 +1081,7 @@
" features_image_stacked,\n",
") = my_obj.parsing_images(\n",
" model_type, \n",
" path_to_save_tensors=\"/content/drive/MyDrive/misinformation-data/\",\n",
" path_to_save_tensors=str(data_path),\n",
" )"
]
},
@@ -1121,9 +1141,6 @@
"image_example_query = str(importlib_resources.files(\"ammico\") / \"data\" / \"test-crop-image.png\") # creating the path to the image for the image query example\n",
"\n",
"search_query = [\n",
" {\"text_input\": \"politician press conference\"}, \n",
" {\"text_input\": \"a world map\"},\n",
" {\"text_input\": \"a dog\"}, # This is how looks text query\n",
" {\"image\": image_example_query}, # This is how looks image query, here `image_example_path` is the path to query image like \"data/test-crop-image.png\"\n",
"]"
]
@@ -1208,22 +1225,29 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formulate your search queries: Search for the best match using multiple reference images, for example, of a person"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_obj.show_results(\n",
" search_query[3], # you can change the index to see the results for other queries\n",
")"
"# Here goes the code that reads in multiple images as reference\n",
"# then you will loop over these multiple images and find the best matches\n",
"# in the end, the best matches will be averaged over for each picture and a list of averaged best matches will be provided"
]
},
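{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch of the multi-reference search described above - an\n",
"# illustration only, assuming `img0.png` and `img1.png` exist under `data_path`.\n",
"# Uncomment and adjust the file names to your data:\n",
"# multi_query = [\n",
"#     {\"image\": str(data_path / \"img0.png\")},\n",
"#     {\"image\": str(data_path / \"img1.png\")},\n",
"# ]\n",
"# similarity, sorted_lists = my_obj.multimodal_search(\n",
"#     model,\n",
"#     vis_processors,\n",
"#     txt_processors,\n",
"#     model_type,\n",
"#     image_keys,\n",
"#     features_image_stacked,\n",
"#     multi_query,\n",
"# )\n",
"# # one row per image, one column per reference image - average over the references\n",
"# print(similarity.mean(dim=1))"
]
},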
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Improve the search results\n",
"### Improve the search results: Use only for text queries, not image search\n",
"\n",
"For even better results, a slightly different approach has been prepared that can improve search results. It is quite resource-intensive, so it is applied after the main algorithm has found the most relevant images. This approach works only with text queries and it skips image queries. Among the parameters you can choose 3 models: `\"blip_base\"`, `\"blip_large\"`, `\"blip2_coco\"`. If you get an `Out of Memory` error, try reducing the batch_size value (minimum = 1), which is the number of images being processed simultaneously. With the parameter `need_grad_cam = True/False` you can enable the calculation of the heat map of each image to be processed and save them in `image_gradcam_with_itm`. Thus the `image_text_match_reordering()` function calculates new similarity values and new ranks for each image. The resulting values are added to the general dictionary."
]
@@ -1452,7 +1476,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.11.11"
}
},
"nbformat": 4,

ammico/notebooks/DemoNotebook_ammico_MultimodalSearch.ipynb

@@ -0,0 +1,509 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AMMICO Demonstration Notebook\n",
"With ammico, you can analyze text on images and image content at the same time. This is a demonstration notebook to showcase the capabilities of ammico.\n",
"You can run this notebook on google colab or locally / on your own HPC resource. The analysis can be quite slow on the google colab default runtime. For production data processing, it is recommended to run the analysis locally on a GPU-supported machine. You can also make use of the colab GPU runtime, or purchase additional runtime. However, google colab comes with pre-installed libraries that can lead to dependency conflicts. The setting on google colab changes frequently, so it is only ensured that this demonstration notebook runs on the default runtime. \n",
"\n",
"This first cell only runs on google colab; on all other machines, you need to create a conda environment first and install ammico from the Python Package Index using \n",
"```pip install ammico``` \n",
"Alternatively you can install the development version from the GitHub repository \n",
"```pip install git+https://github.com/ssciwr/AMMICO.git```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if running on google colab\\\n",
"# PLEASE RUN THIS ONLY AS CPU RUNTIME\n",
"# for a GPU runtime, there are conflicts with pre-installed packages - \n",
"# you first need to uninstall them (prepare a clean environment with no pre-installs) and then install ammico\n",
"# flake8-noqa-cell\n",
"\n",
"if \"google.colab\" in str(get_ipython()):\n",
" # update python version\n",
" # install setuptools\n",
" # %pip install setuptools==61 -qqq\n",
" # uninstall some pre-installed packages due to incompatibility\n",
" %pip uninstall --yes tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint flex-y jax jaxlib -qqq\n",
" # install ammico\n",
" %pip install git+https://github.com/ssciwr/ammico.git -qqq\n",
" # install older version of jax to support transformers use of diffusers\n",
" # mount google drive for data and API key\n",
" from google.colab import drive\n",
"\n",
" drive.mount(\"/content/drive\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use a test dataset\n",
"\n",
"You can download this dataset for test purposes. Skip this step if you use your own data. If the data set on Hugging Face is gated or private, Hugging Face will ask you for a login token. However, for the default dataset in this notebook you do not need to provide one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"from pathlib import Path\n",
"\n",
"# If the dataset is gated/private, make sure you have run huggingface-cli login\n",
"dataset = load_dataset(\"iulusoy/test-images\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next you need to provide a path for the saved images - a folder where the data is stored locally. This directory is automatically created if it does not exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_path = \"./data-test\"\n",
"data_path = Path(data_path)\n",
"print(data_path)\n",
"data_path.mkdir(parents=True, exist_ok=True)\n",
"# now save the files from the Huggingface dataset as images into the data_path folder\n",
"for i, image in enumerate(dataset[\"train\"][\"image\"]):\n",
" filename = \"img\" + str(i) + \".png\"\n",
" image.save(data_path / filename)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import the ammico package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# NBVAL_IGNORE_OUTPUT\n",
"# ignore output of this cell for automated testing\n",
"import os\n",
"# jax also sometimes leads to problems on google colab\n",
"# if this is the case, try restarting the kernel and executing this \n",
"# and the above two code cells again\n",
"import ammico\n",
"# for displaying a progress bar\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes you may need to restart a session after installing the correct versions of packages, because `Tensorflow` and `EmotitionDetector` may not work and give an error. You can check it by running this code: \n",
"```\n",
"import tensorflow as tf\n",
"tf.ones([2, 2])\n",
"```\n",
"If this code generates an error, you need to restart session. For this please click `Runtime` -> `Restart session`. And rerun the notebook again. All required packages will already be installed, so the execution will be very fast. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Image Multimodal Search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This module shows how to carry out an image multimodal search with the [LAVIS](https://github.com/salesforce/LAVIS) library. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing and extracting features from images in selected folder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First you need to select a model. You can choose one of the following models: \n",
"- [blip](https://github.com/salesforce/BLIP)\n",
"- [blip2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) \n",
"- [albef](https://github.com/salesforce/ALBEF) \n",
"- [clip_base](https://github.com/openai/CLIP/blob/main/model-card.md)\n",
"- [clip_vitl14](https://github.com/mlfoundations/open_clip) \n",
"- [clip_vitl14_336](https://github.com/mlfoundations/open_clip)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_type = \"blip\"\n",
"# model_type = \"blip2\"\n",
"# model_type = \"albef\"\n",
"# model_type = \"clip_base\"\n",
"# model_type = \"clip_vitl14\"\n",
"# model_type = \"clip_vitl14_336\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To process the loaded images using the selected model, use the below code and substitute the path to your images:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_dict = ammico.find_files(\n",
" path = data_path,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_dict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_obj = ammico.MultimodalSearch(image_dict)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" image_keys,\n",
" image_names,\n",
" features_image_stacked,\n",
") = my_obj.parsing_images(\n",
" model_type, \n",
" path_to_save_tensors=data_path,\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The images are then processed and stored in a numerical representation, a tensor. These tensors do not change for the same image and same model - so if you run this analysis once, and save the tensors giving a path with the keyword `path_to_save_tensors`, a file with filename `.<Number_of_images>_<model_name>_saved_features_image.pt` will be placed there.\n",
"\n",
"This can save you time if you want to analyse the same images with the same model but different questions. To run using the saved tensors, execute the below code giving the path and name of the tensor file. Any subsequent query of the model will run in a fraction of the time than it run in initially."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# uncomment the code below if you want to load the tensors from the drive\n",
"# and just want to ask different questions for the same set of images\n",
"# (\n",
"# model,\n",
"# vis_processors,\n",
"# txt_processors,\n",
"# image_keys,\n",
"# image_names,\n",
"# features_image_stacked,\n",
"# ) = my_obj.parsing_images(\n",
"# model_type,\n",
"# path_to_load_tensors=\"/content/drive/MyDrive/misinformation-data/5_clip_base_saved_features_image.pt\",\n",
"# )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we already processed our image folder with 5 images and the `clip_base` model. So you need just to write the name `5_clip_base_saved_features_image.pt` of the saved file that consists of tensors of all images as keyword argument for `path_to_load_tensors`. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Formulate your search queries\n",
"\n",
"Next, you need to form search queries. You can search either by image or by text. You can search for a single query, or you can search for several queries at once, the computational time should not be much different. The format of the queries is as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_example_query = data_path / \"img0.png\" \n",
"\n",
"search_query = [\n",
" {\"image\": str(image_example_query)}, # This is how looks image query, here `image_example_path` is the path to query image like \"data/test-crop-image.png\"\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can filter your results in 3 different ways:\n",
"- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n",
"- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n",
"- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_similarity_value - best_similarity_value_in_current_search)/best_similarity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35."
]
},
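{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A sketch combining all three filters in one call. The keyword names follow\n",
"# the explanation above; uncomment to try, and check that your ammico version\n",
"# supports them.\n",
"# similarity, sorted_lists = my_obj.multimodal_search(\n",
"#     model,\n",
"#     vis_processors,\n",
"#     txt_processors,\n",
"#     model_type,\n",
"#     image_keys,\n",
"#     features_image_stacked,\n",
"#     search_query,\n",
"#     filter_number_of_images=10,  # keep only the ten best matches\n",
"#     filter_val_limit=0.2,  # discard images with similarity below 0.2\n",
"#     filter_rel_error=30,  # discard images more than 30% below the best match\n",
"# )"
]
},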
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"similarity, sorted_lists = my_obj.multimodal_search(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" model_type,\n",
" image_keys,\n",
" features_image_stacked,\n",
" search_query,\n",
" filter_number_of_images=20,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"similarity"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted_lists "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After launching `multimodal_search` function, the results of each query will be added to the source dictionary. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A special function was written to present the search results conveniently. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_obj.show_results(\n",
" search_query[0], # you can change the index to see the results for other queries\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formulate your search queries: Search for the best match using multiple reference images, for example, of a person"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Here goes the code that reads in multiple images as reference\n",
"# then you will loop over these multiple images and find the best matches\n",
"# in the end, the best matches will be averaged over for each picture and a list of averaged best matches will be provided"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_example_query = data_path / \"img0.png\" # creating the path to the image for the image query example\n",
"image_example_query2 = data_path / \"img1.png\"\n",
"\n",
"search_query = [\n",
" {\"image\": str(image_example_query)}, # This is how looks image query, here `image_example_path` is the path to query image like \"data/test-crop-image.png\"\n",
" {\"image\": str(image_example_query2)},\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"similarity, sorted_lists = my_obj.multimodal_search(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" model_type,\n",
" image_keys,\n",
" features_image_stacked,\n",
" search_query,\n",
" filter_number_of_images=20,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"similarity # now a 2D tensor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# average similarities\n",
"print(similarity.mean(dim = 1)) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# add the similarity average to the image_dict\n",
"for key in image_dict.keys():\n",
" # find the similarities for each image in the search query\n",
" similarities = [image_dict[key][query[\"image\"]] for query in search_query]\n",
" image_dict[key][\"similarity_average\"] = sum(similarities)/len(similarities)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Not yet compatible with show_results due to dictionary design"
]
},
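{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative workaround in plain Python (not an ammico API): rank the images\n",
"# by the averaged similarity computed above and print the best matches.\n",
"ranked = sorted(\n",
"    image_dict.items(),\n",
"    key=lambda item: item[1][\"similarity_average\"],\n",
"    reverse=True,\n",
")\n",
"for key, entry in ranked[:5]:\n",
"    print(key, entry[\"similarity_average\"])"
]
},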
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# convert to dataframe\n",
"df = ammico.get_dataframe(image_dict)\n",
"df.head(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# save to csv\n",
"df.to_csv(data_path / \"data_out.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "ammico",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

pyproject.toml

@@ -33,6 +33,7 @@ dependencies = [
"jupyter",
"jupyter_dash",
"matplotlib",
"nbval",
"numpy<=1.23.4",
"pandas",
"peft<=0.13.0",