{
"cells": [
{
"cell_type": "markdown",
"id": "22df2297-0629-45aa-b88c-6c61f1544db6",
"metadata": {},
"source": [
"# Image Multimodal Search"
]
},
{
"cell_type": "markdown",
"id": "9eeeb302-296e-48dc-86c7-254aa02f2b3a",
"metadata": {},
"source": [
"This notebooks shows some preliminary work on Image Multimodal Search with lavis library. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f10ad6c9-b1a0-4043-8c5d-ed660d77be37",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import misinformation\n",
"import misinformation.multimodal_search as ms"
]
},
{
"cell_type": "markdown",
"id": "acf08b44-3ea6-44cd-926d-15c0fd9f39e0",
"metadata": {},
"source": [
"Set an image path as input file path."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d3fe589-ff3c-4575-b8f5-650db85596bc",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"images = misinformation.utils.find_files(\n",
" path=\"data/\",\n",
" limit=10,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adf3db21-1f8b-4d44-bbef-ef0acf4623a0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images)"
]
},
{
"cell_type": "markdown",
"id": "987540a8-d800-4c70-a76b-7bfabaf123fa",
"metadata": {},
"source": [
"## Indexing and extracting features from images in selected folder"
]
},
{
"cell_type": "markdown",
"id": "66d6ede4-00bc-4aeb-9a36-e52d7de33fe5",
"metadata": {},
"source": [
"You can choose one of the following models: blip, blip2, albef, clip_base, clip_vitl14, clip_vitl14_336"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bbca1f0-d4b0-43cd-8e05-ee39d37c328e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_type = \"blip\"\n",
"# model_type = \"blip2\"\n",
"# model_type = \"albef\"\n",
"# model_type = \"clip_base\"\n",
"# model_type = \"clip_vitl14\"\n",
"# model_type = \"clip_vitl14_336\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca095404-57d0-4f5d-aeb0-38c232252b17",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" image_keys,\n",
" image_names,\n",
" features_image_stacked,\n",
") = ms.MultimodalSearch.parsing_images(mydict, model_type)"
]
},
{
"cell_type": "markdown",
"id": "9ff8a894-566b-4c4f-acca-21c50b5b1f52",
"metadata": {},
"source": [
"The tensors of all images `features_image_stacked` was saved in `<Number_of_images>_<model_name>_saved_features_image.pt`. If you run it once for current model and current set of images you do not need to repeat it again. Instead you can load this features with the command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56c6d488-f093-4661-835a-5c73a329c874",
"metadata": {},
"outputs": [],
"source": [
"# (\n",
"# model,\n",
"# vis_processors,\n",
"# txt_processors,\n",
"# image_keys,\n",
"# image_names,\n",
"# features_image_stacked,\n",
"# ) = ms.MultimodalSearch.parsing_images(mydict, model_type,\"18_clip_base_saved_features_image.pt\")"
]
},
{
"cell_type": "markdown",
"id": "309923c1-d6f8-4424-8fca-bde5f3a98b38",
"metadata": {},
"source": [
"Here we already processed our image folder with 18 images with `clip_base` model. So you need just write the name `18_clip_base_saved_features_image.pt` of the saved file that consists of tensors of all images as a 3rd argument to the previous function. "
]
},
{
"cell_type": "markdown",
"id": "162a52e8-6652-4897-b92e-645cab07aaef",
"metadata": {},
"source": [
"Next, you need to form search queries. You can search either by image or by text. You can search for a single query, or you can search for several queries at once, the computational time should not be much different. The format of the queries is as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4196a52-d01e-42e4-8674-5712f7d6f792",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"search_query3 = [\n",
" {\"text_input\": \"politician press conference\"},\n",
" {\"text_input\": \"a person wearing a mask\"},\n",
" {\"image\": \"data/106349S_por.png\"},\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "8bcf3127-3dfd-4ff4-b9e7-a043099b1418",
"metadata": {},
"source": [
"You can filter your results in 3 different ways:\n",
"- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n",
"- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n",
"- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f7dc52f-7ee9-4590-96b7-e0d9d3b82378",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"similarity = ms.MultimodalSearch.multimodal_search(\n",
" mydict,\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" model_type,\n",
" image_keys,\n",
" features_image_stacked,\n",
" search_query3,\n",
")"
]
},
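{
"cell_type": "markdown",
"id": "3f2c1a9e-7b4d-4e8a-9c6f-2d5b8a1e4c7d",
"metadata": {},
"source": [
"To apply the filters described above, you can pass them as additional arguments to `multimodal_search`. The following cell is a minimal sketch, assuming the three filter parameters are accepted as keyword arguments; it is commented out so that the unfiltered results from the previous cell are kept:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e5d2c7a-1f3b-4a69-b8d4-6c9e0a2f5b3e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# A sketch of a filtered search; assumes `multimodal_search` accepts the\n",
"# filter parameters described above as keyword arguments.\n",
"# similarity = ms.MultimodalSearch.multimodal_search(\n",
"#     mydict,\n",
"#     model,\n",
"#     vis_processors,\n",
"#     txt_processors,\n",
"#     model_type,\n",
"#     image_keys,\n",
"#     features_image_stacked,\n",
"#     search_query3,\n",
"#     filter_number_of_images=10,\n",
"#     filter_val_limit=0.2,\n",
"#     filter_rel_error=30,\n",
"# )"
]
},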
{
"cell_type": "markdown",
"id": "e1cf7e46-0c2c-4fb2-b89a-ef585ccb9339",
"metadata": {},
"source": [
"After launching `multimodal_search` function, the results of each query will be added to the source dictionary. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ad74b21-6187-4a58-9ed8-fd3e80f5a4ed",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict[\"106349S_por\"]"
]
},
{
"cell_type": "markdown",
"id": "cd3ee120-8561-482b-a76a-e8f996783325",
"metadata": {},
"source": [
"A special function was written to present the search results conveniently. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4324e4fd-e9aa-4933-bb12-074d54e0c510",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ms.MultimodalSearch.show_results(mydict, search_query3[0])"
]
},
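{
"cell_type": "markdown",
"id": "5a8b3c2d-9e1f-4d7a-8b6c-3e2f1a0d9c8b",
"metadata": {},
"source": [
"The same function can be used for the other queries as well; as a usage sketch (assuming image queries are presented the same way as text queries), the results of the image query can be shown like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c4d5e6f-2a1b-4c3d-9e8f-0a1b2c3d4e5f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Show the results of the image query (a sketch; assumes image queries are\n",
"# presented the same way as text queries)\n",
"ms.MultimodalSearch.show_results(mydict, search_query3[2])"
]
},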
{
"cell_type": "markdown",
"id": "d86ab96b-1907-4b7f-a78e-3983b516d781",
"metadata": {
"tags": []
},
"source": [
"## Save search results to csv"
]
},
{
"cell_type": "markdown",
"id": "4bdbc4d4-695d-4751-ab7c-d2d98e2917d7",
"metadata": {
"tags": []
},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c6ddd83-bc87-48f2-a8d6-1bd3f4201ff7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = misinformation.utils.append_data_to_dict(mydict)\n",
"df = misinformation.utils.dump_df(outdict)"
]
},
{
"cell_type": "markdown",
"id": "ea2675d5-604c-45e7-86d2-080b1f4559a0",
"metadata": {
"tags": []
},
"source": [
"Check the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e78646d6-80be-4d3e-8123-3360957bcaa8",
"metadata": {},
"outputs": [],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "05546d99-afab-4565-8f30-f14e1426abcf",
"metadata": {},
"source": [
"Write the csv file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "185f7dde-20dc-44d8-9ab0-de41f9b5734d",
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}