* add image summary notebook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pin deepface version to avoid bug with progress bar after update

* update actions version for checkout and python

* test ci without lavis

* no lavis for ci test

* merging

* return lavis

* change lavis to salesforce-lavis

* change pycocotools install method

* change pycocotools install method

* fix_pycocotools

* Downgrade Python

* back to 3.9 and remove pycocotools dependency

* instructions for windows

* missing comma after merge

* lavis only for ubuntu

* use lavis package name in install instead of git

* adding multimodal searching py and notebook

* exclude lavis on windows

* skip import on windows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reactivate lavis

* Revert "reactivate lavis"

This reverts commit ecdaf9d316e4b08816ba62da5e0482c8ff15b14e.

* Change input format for multimodal search

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix clip models

* account for new interface in init imports

* changed imports because of lavis/windows

* fix if-else, added clip ViT-L-14-336 model

* fix code smells

* add model change function to summary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed new model in summary.py

* fixed summary widget

* moved some functions to utils

* fixed import torch in utils

* added test_summary.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed opencv version

* added first test of multimodal_search.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed test

* removed windows in CI and added test in multimodal search

* change lavis dependency from pip to git

* fixed blip2 model in test_multimodal_search.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed test multimodal search on cpu and gpu machines

* added test, fixed dependencies

* add -vv to pytest command in CI

* added test_multimodal_search tests

* fixed tests in test_multimodal_search.py

* fixed tests in test_summary

* changed CI and fixed test_multimodal_search

* fixed ci

* fixed error in test multimodal search, changed ci

* added multimodal search test, added windows CI, added picture in test data

* CI debugging

* fixing tests in CI

* fixing test in CI 2

* fixing CI 3

* fixing CI

* added filtering function

* Brought back all tests after CI fixing

* changed CI from one pytest run to individual tests

* fixed opencv problem

* fix path for text, adjust result for new gcv

* remove opencv

* fixing cv2 error

* added opencv-contrib, change objects_cvlib

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing tests in CI

* fixing CI testing

* cleanup objects

* fixing codecov in CI

* fixing codecov in CI

* run tests together; install opencv last

* update requirements for opencv dependencies

* moved lavis functions from utils to summary

* Remove lavis from utils.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing jupyter

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: iulusoy <inga.ulusoy@uni-heidelberg.de>
This commit is contained in:
Petr Andriushchenko 2023-03-22 10:28:09 +01:00 committed by GitHub
parent a6578cfdf3
commit 2891c8a6ed
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
27 changed files with 2050 additions and 143 deletions

4
.github/workflows/ci.yml (vendored)

@ -14,7 +14,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-22.04, windows-latest]
os: [ubuntu-22.04,windows-latest]
python-version: [3.9]
steps:
- name: Checkout repository
@ -32,7 +32,7 @@ jobs:
- name: Run pytest
run: |
cd misinformation
python -m pytest -s -m "not gcv" --cov=. --cov-report=xml
python -m pytest --cov=. --cov-report=xml
- name: Upload coverage
if: matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.9'
uses: codecov/codecov-action@v3


@ -29,6 +29,20 @@ pip install .
```
This will install the package and its dependencies locally.
## Installation on Windows
Some modules use [lavis]() to analyse image content. To enable this functionality on Windows, you need to manually install some dependencies that are not available by default:
1. Download [Visual C++](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170) and install (see also [here](https://github.com/philferriere/cocoapi)).
1. Then install the COCO API from GitHub:
```
pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
```
1. Now you can install the package by navigating to the misinformation directory and typing
```
pip install .
```
in the command prompt.
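As a quick sanity check (optional; a minimal sketch assuming the steps above completed without errors), you can verify from a Python prompt that the COCO API and the package import cleanly:
```
import pycocotools
import misinformation

print(misinformation.__version__)
```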
# Usage
There are sample notebooks in the `misinformation/notebooks` folder for you to explore the package:


@ -8,11 +8,3 @@ except ImportError:
# Export the version defined in project metadata
__version__ = metadata.version(__package__)
del metadata
from misinformation.display import explore_analysis
from misinformation.utils import (
find_files,
initialize_dict,
append_data_to_dict,
dump_df,
)


@ -5,6 +5,8 @@ import misinformation.faces as faces
import misinformation.text as text
import misinformation.objects as objects
import misinformation.summary as summary
class JSONContainer:
"""Expose a Python dictionary as a JSON document in JupyterLab
@ -26,6 +28,7 @@ def explore_analysis(mydict, identify="faces"):
"faces": faces.EmotionDetector,
"text-on-image": text.TextDetector,
"objects": objects.ObjectDetector,
"summary": summary.SummaryDetector,
}
# create a list containing the image ids for the widget
# image_paths = [mydict[key]["filename"] for key in mydict.keys()]


@ -141,7 +141,7 @@ class EmotionDetector(utils.AnalysisMethod):
DeepFace.analyze(
img_path=face,
actions=actions,
prog_bar=False,
silent=True,
detector_backend="skip",
)
)

358
misinformation/multimodal_search.py (new file)

@ -0,0 +1,358 @@
from misinformation.utils import AnalysisMethod
import torch
import torch.nn.functional as Func
import requests
import lavis
from PIL import Image
from IPython.display import display
from lavis.models import load_model_and_preprocess
class MultimodalSearch(AnalysisMethod):
def __init__(self, subdict: dict) -> None:
super().__init__(subdict)
# self.subdict.update(self.set_keys())
multimodal_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def load_feature_extractor_model_blip2(self, device):
model, vis_processors, txt_processors = load_model_and_preprocess(
name="blip2_feature_extractor",
model_type="pretrain",
is_eval=True,
device=device,
)
return model, vis_processors, txt_processors
def load_feature_extractor_model_blip(self, device):
model, vis_processors, txt_processors = load_model_and_preprocess(
name="blip_feature_extractor",
model_type="base",
is_eval=True,
device=device,
)
return model, vis_processors, txt_processors
def load_feature_extractor_model_albef(self, device):
model, vis_processors, txt_processors = load_model_and_preprocess(
name="albef_feature_extractor",
model_type="base",
is_eval=True,
device=device,
)
return model, vis_processors, txt_processors
def load_feature_extractor_model_clip_base(self, device):
model, vis_processors, txt_processors = load_model_and_preprocess(
name="clip_feature_extractor",
model_type="base",
is_eval=True,
device=device,
)
return model, vis_processors, txt_processors
def load_feature_extractor_model_clip_vitl14(self, device):
model, vis_processors, txt_processors = load_model_and_preprocess(
name="clip_feature_extractor",
model_type="ViT-L-14",
is_eval=True,
device=device,
)
return model, vis_processors, txt_processors
def load_feature_extractor_model_clip_vitl14_336(self, device):
model, vis_processors, txt_processors = load_model_and_preprocess(
name="clip_feature_extractor",
model_type="ViT-L-14-336",
is_eval=True,
device=device,
)
return model, vis_processors, txt_processors
def read_img(self, filepath):
raw_image = Image.open(filepath).convert("RGB")
return raw_image
def read_and_process_images(self, image_paths, vis_processor):
raw_images = [MultimodalSearch.read_img(self, path) for path in image_paths]
images = [
vis_processor["eval"](r_img)
.unsqueeze(0)
.to(MultimodalSearch.multimodal_device)
for r_img in raw_images
]
images_tensors = torch.stack(images)
return raw_images, images_tensors
def extract_image_features_blip2(self, model, images_tensors):
with torch.cuda.amp.autocast(
enabled=(MultimodalSearch.multimodal_device != torch.device("cpu"))
):
features_image = [
model.extract_features({"image": ten, "text_input": ""}, mode="image")
for ten in images_tensors
]
features_image_stacked = torch.stack(
[feat.image_embeds_proj[:, 0, :].squeeze(0) for feat in features_image]
)
return features_image_stacked
def extract_image_features_clip(self, model, images_tensors):
features_image = [
model.extract_features({"image": ten}) for ten in images_tensors
]
features_image_stacked = torch.stack(
[Func.normalize(feat.float(), dim=-1).squeeze(0) for feat in features_image]
)
return features_image_stacked
def extract_image_features_basic(self, model, images_tensors):
features_image = [
model.extract_features({"image": ten, "text_input": ""}, mode="image")
for ten in images_tensors
]
features_image_stacked = torch.stack(
[feat.image_embeds_proj[:, 0, :].squeeze(0) for feat in features_image]
)
return features_image_stacked
def save_tensors(
self, model_type, features_image_stacked, name="saved_features_image.pt"
):
with open(
str(len(features_image_stacked)) + "_" + model_type + "_" + name, "wb"
) as f:
torch.save(features_image_stacked, f)
return name
def load_tensors(self, name="saved_features_image.pt"):
features_image_stacked = torch.load(name)
return features_image_stacked
def extract_text_features(self, model, text_input):
sample_text = {"text_input": [text_input]}
features_text = model.extract_features(sample_text, mode="text")
return features_text
def parsing_images(self, model_type, path_to_saved_tensors=None):
if model_type in ("clip_base", "clip_vitl14_336", "clip_vitl14"):
path_to_lib = lavis.__file__[:-11] + "models/clip_models/"
url = "https://raw.githubusercontent.com/salesforce/LAVIS/main/lavis/models/clip_models/bpe_simple_vocab_16e6.txt.gz"
r = requests.get(url, allow_redirects=False)
open(path_to_lib + "bpe_simple_vocab_16e6.txt.gz", "wb").write(r.content)
image_keys = sorted(self.keys())
image_names = [self[k]["filename"] for k in image_keys]
select_model = {
"blip2": MultimodalSearch.load_feature_extractor_model_blip2,
"blip": MultimodalSearch.load_feature_extractor_model_blip,
"albef": MultimodalSearch.load_feature_extractor_model_albef,
"clip_base": MultimodalSearch.load_feature_extractor_model_clip_base,
"clip_vitl14": MultimodalSearch.load_feature_extractor_model_clip_vitl14,
"clip_vitl14_336": MultimodalSearch.load_feature_extractor_model_clip_vitl14_336,
}
select_extract_image_features = {
"blip2": MultimodalSearch.extract_image_features_blip2,
"blip": MultimodalSearch.extract_image_features_basic,
"albef": MultimodalSearch.extract_image_features_basic,
"clip_base": MultimodalSearch.extract_image_features_clip,
"clip_vitl14": MultimodalSearch.extract_image_features_clip,
"clip_vitl14_336": MultimodalSearch.extract_image_features_clip,
}
if model_type in select_model.keys():
(model, vis_processors, txt_processors,) = select_model[
model_type
](self, MultimodalSearch.multimodal_device)
else:
raise SyntaxError(
"Please, use one of the following models: blip2, blip, albef, clip_base, clip_vitl14, clip_vitl14_336"
)
raw_images, images_tensors = MultimodalSearch.read_and_process_images(
self, image_names, vis_processors
)
if path_to_saved_tensors is None:
with torch.no_grad():
features_image_stacked = select_extract_image_features[model_type](
self, model, images_tensors
)
MultimodalSearch.save_tensors(self, model_type, features_image_stacked)
else:
features_image_stacked = MultimodalSearch.load_tensors(
self, str(path_to_saved_tensors)
)
return (
model,
vis_processors,
txt_processors,
image_keys,
image_names,
features_image_stacked,
)
def querys_processing(
self, search_query, model, txt_processors, vis_processors, model_type
):
select_extract_image_features = {
"blip2": MultimodalSearch.extract_image_features_blip2,
"blip": MultimodalSearch.extract_image_features_basic,
"albef": MultimodalSearch.extract_image_features_basic,
"clip_base": MultimodalSearch.extract_image_features_clip,
"clip_vitl14": MultimodalSearch.extract_image_features_clip,
"clip_vitl14_336": MultimodalSearch.extract_image_features_clip,
}
for query in search_query:
if not (len(query) == 1) and (query in ("image", "text_input")):
raise SyntaxError(
'Each query must contain either an "image" or a "text_input"'
)
multi_sample = []
for query in search_query:
if "text_input" in query.keys():
text_processing = txt_processors["eval"](query["text_input"])
images_tensors = ""
elif "image" in query.keys():
_, images_tensors = MultimodalSearch.read_and_process_images(
self, [query["image"]], vis_processors
)
text_processing = ""
multi_sample.append(
{"image": images_tensors, "text_input": text_processing}
)
multi_features_query = []
for query in multi_sample:
if query["image"] == "":
if model_type in ("clip_base", "clip_vitl14_336", "clip_vitl14"):
features = model.extract_features(
{"text_input": query["text_input"]}
)
features_squeeze = features.squeeze(0).to(
MultimodalSearch.multimodal_device
)
multi_features_query.append(
Func.normalize(features_squeeze, dim=-1)
)
else:
features = model.extract_features(query, mode="text")
features_squeeze = (
features.text_embeds_proj[:, 0, :]
.squeeze(0)
.to(MultimodalSearch.multimodal_device)
)
multi_features_query.append(features_squeeze)
if query["text_input"] == "":
multi_features_query.append(
select_extract_image_features[model_type](
self, model, query["image"]
)
)
multi_features_stacked = torch.stack(
[query.squeeze(0) for query in multi_features_query]
).to(MultimodalSearch.multimodal_device)
return multi_features_stacked
def multimodal_search(
self,
model,
vis_processors,
txt_processors,
model_type,
image_keys,
features_image_stacked,
search_query,
filter_number_of_images=None,
filter_val_limit=None,
filter_rel_error=None,
):
if filter_number_of_images is None:
filter_number_of_images = len(self)
if filter_val_limit is None:
filter_val_limit = 0
if filter_rel_error is None:
filter_rel_error = 1e10
features_image_stacked.to(MultimodalSearch.multimodal_device)
with torch.no_grad():
multi_features_stacked = MultimodalSearch.querys_processing(
self, search_query, model, txt_processors, vis_processors, model_type
)
similarity = features_image_stacked @ multi_features_stacked.t()
# similarity_soft_max = torch.nn.Softmax(dim=0)(similarity / 0.01)
sorted_lists = [
sorted(range(len(similarity)), key=lambda k: similarity[k, i], reverse=True)
for i in range(len(similarity[0]))
]
places = [[item.index(i) for i in range(len(item))] for item in sorted_lists]
for q in range(len(search_query)):
max_val = similarity[sorted_lists[q][0]][q].item()
print(max_val)
for i, key in zip(range(len(image_keys)), sorted_lists[q]):
if (
i < filter_number_of_images
and similarity[key][q].item() > filter_val_limit
and 100 * abs(max_val - similarity[key][q].item()) / max_val
< filter_rel_error
):
self[image_keys[key]][
"rank " + list(search_query[q].values())[0]
] = places[q][key]
self[image_keys[key]][
list(search_query[q].values())[0]
] = similarity[key][q].item()
else:
self[image_keys[key]][
"rank " + list(search_query[q].values())[0]
] = None
self[image_keys[key]][list(search_query[q].values())[0]] = 0
return similarity, sorted_lists
def show_results(self, query):
if "image" in query.keys():
pic = Image.open(query["image"]).convert("RGB")
pic.thumbnail((400, 400))
display(
"Your search query: ",
pic,
"--------------------------------------------------",
"Results:",
)
elif "text_input" in query.keys():
display(
"Your search query: " + query["text_input"],
"--------------------------------------------------",
"Results:",
)
for s in sorted(
self.items(), key=lambda t: t[1][list(query.values())[0]], reverse=True
):
if s[1]["rank " + list(query.values())[0]] is None:
break
p1 = Image.open(s[1]["filename"]).convert("RGB")
p1.thumbnail((400, 400))
display(
"Rank: "
+ str(s[1]["rank " + list(query.values())[0]])
+ " Val: "
+ str(s[1][list(query.values())[0]]),
s[0],
p1,
)
display(
"--------------------------------------------------",
)
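A minimal usage sketch of the new class, mirroring how the test added further down in this diff drives it (assumes the listed image files exist locally and that lavis can download the chosen model weights):
```
import misinformation.multimodal_search as ms

# mydict maps image ids to {"filename": path}, as produced by utils.initialize_dict
mydict = {
    "IMG_2746": {"filename": "./test/data/IMG_2746.png"},
    "IMG_2809": {"filename": "./test/data/IMG_2809.png"},
}

# extract and cache image features with one of the supported models, e.g. "blip"
(
    model,
    vis_processors,
    txt_processors,
    image_keys,
    image_names,
    features_image_stacked,
) = ms.MultimodalSearch.parsing_images(mydict, "blip")

# rank all images against a text query and an image query
search_query = [
    {"text_input": "a busy city street"},
    {"image": "./test/data/IMG_2746.png"},
]
similarity, sorted_lists = ms.MultimodalSearch.multimodal_search(
    mydict,
    model,
    vis_processors,
    txt_processors,
    "blip",
    image_keys,
    features_image_stacked,
    search_query,
)

# display the ranked images for the first (text) query
ms.MultimodalSearch.show_results(mydict, search_query[0])
```
The ranks and similarity values are written back into `mydict` under keys derived from each query, which is what `show_results` reads.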


@ -1,5 +1,7 @@
import cv2
import cvlib as cv
import numpy as np
from PIL import Image
def objects_from_cvlib(objects_list: list) -> dict:
@ -50,7 +52,11 @@ class ObjectCVLib(ObjectsMethod):
image_path: The path to the local file.
"""
img = cv2.imread(image_path)
bbox, label, conf = cv.detect_common_objects(img)
# preimg = Image.open(image_path).convert("RGB")
# preimg2 = np.asarray(preimg)
# img = cv2.cvtColor(preimg2, cv2.COLOR_BGR2RGB)
_, label, _ = cv.detect_common_objects(img)
# output_image = draw_bbox(im, bbox, label, conf)
objects = objects_from_cvlib(label)
return objects

104
misinformation/summary.py (new file)

@ -0,0 +1,104 @@
from misinformation.utils import AnalysisMethod
from torch import device, cuda, no_grad
from PIL import Image
from lavis.models import load_model_and_preprocess
class SummaryDetector(AnalysisMethod):
def __init__(self, subdict: dict) -> None:
super().__init__(subdict)
summary_device = device("cuda" if cuda.is_available() else "cpu")
summary_model, summary_vis_processors, _ = load_model_and_preprocess(
name="blip_caption",
model_type="base_coco",
is_eval=True,
device=summary_device,
)
def load_model_base(self):
summary_device = device("cuda" if cuda.is_available() else "cpu")
summary_model, summary_vis_processors, _ = load_model_and_preprocess(
name="blip_caption",
model_type="base_coco",
is_eval=True,
device=summary_device,
)
return summary_model, summary_vis_processors
def load_model_large(self):
summary_device = device("cuda" if cuda.is_available() else "cpu")
summary_model, summary_vis_processors, _ = load_model_and_preprocess(
name="blip_caption",
model_type="large_coco",
is_eval=True,
device=summary_device,
)
return summary_model, summary_vis_processors
def load_model(self, model_type):
select_model = {
"base": SummaryDetector.load_model_base,
"large": SummaryDetector.load_model_large,
}
summary_model, summary_vis_processors = select_model[model_type](self)
return summary_model, summary_vis_processors
def analyse_image(self, summary_model=None, summary_vis_processors=None):
if summary_model is None and summary_vis_processors is None:
summary_model = SummaryDetector.summary_model
summary_vis_processors = SummaryDetector.summary_vis_processors
path = self.subdict["filename"]
raw_image = Image.open(path).convert("RGB")
image = (
summary_vis_processors["eval"](raw_image)
.unsqueeze(0)
.to(self.summary_device)
)
with no_grad():
self.subdict["const_image_summary"] = summary_model.generate(
{"image": image}
)[0]
self.subdict["3_non-deterministic summary"] = summary_model.generate(
{"image": image}, use_nucleus_sampling=True, num_captions=3
)
return self.subdict
(
summary_VQA_model,
summary_VQA_vis_processors,
summary_VQA_txt_processors,
) = load_model_and_preprocess(
name="blip_vqa", model_type="vqav2", is_eval=True, device=summary_device
)
def analyse_questions(self, list_of_questions):
if len(list_of_questions) > 0:
path = self.subdict["filename"]
raw_image = Image.open(path).convert("RGB")
image = (
self.summary_VQA_vis_processors["eval"](raw_image)
.unsqueeze(0)
.to(self.summary_device)
)
question_batch = []
for quest in list_of_questions:
question_batch.append(self.summary_VQA_txt_processors["eval"](quest))
batch_size = len(list_of_questions)
image_batch = image.repeat(batch_size, 1, 1, 1)
with no_grad():
answers_batch = self.summary_VQA_model.predict_answers(
samples={"image": image_batch, "text_input": question_batch},
inference_method="generate",
)
for q, a in zip(list_of_questions, answers_batch):
self.subdict[q] = a
else:
print("Please, enter list of questions")
return self.subdict
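A minimal usage sketch, mirroring how test_summary.py and the image_summary notebook call this class (assumes the image path exists and that lavis can download the BLIP weights):
```
import misinformation.summary as sm

mydict = {"filename": "./test/data/IMG_2809.png"}

# caption the image with the default (base_coco) model loaded at class level
mydict = sm.SummaryDetector(mydict).analyse_image()
print(mydict["const_image_summary"])
print(mydict["3_non-deterministic summary"])

# or load the large captioning model once and reuse it across images
summary_model, summary_vis_processors = sm.SummaryDetector(mydict).load_model("large")
mydict = sm.SummaryDetector(mydict).analyse_image(summary_model, summary_vis_processors)

# visual question answering on the same image
mydict = sm.SummaryDetector(mydict).analyse_questions(
    ["How many persons on the picture?"]
)
print(mydict["How many persons on the picture?"])
```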

18
misinformation/test/conftest.py (new file)

@ -0,0 +1,18 @@
import os
import pytest
@pytest.fixture
def get_path(request):
mypath = os.path.dirname(request.module.__file__)
mypath = mypath + "/data/"
return mypath
@pytest.fixture
def set_environ(request):
mypath = os.path.dirname(request.module.__file__)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = (
mypath + "/../../data/seismic-bonfire-329406-412821a70264.json"
)
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

Binary data
misinformation/test/data/IMG_2809.png (binary file not shown; before: 1.2 MiB, after: 1.2 MiB)

Binary data
misinformation/test/data/IMG_3758.png (new binary file, not shown; size: 1.9 MiB)


@ -1 +1 @@
{"filename": "./test/data/IMG_2809.png", "person": "yes", "bicycle": "no", "car": "yes", "motorcycle": "no", "airplane": "no", "bus": "yes", "train": "no", "truck": "no", "boat": "no", "traffic light": "no", "cell phone": "no"}
{"filename": "IMG_2809.png", "person": "yes", "bicycle": "no", "car": "yes", "motorcycle": "no", "airplane": "no", "bus": "yes", "train": "no", "truck": "no", "boat": "no", "traffic light": "no", "cell phone": "no"}


@ -3,10 +3,10 @@ The Quantum Theory of
Nonrelativistic Collisions
JOHN R. TAYLOR
University of Colorado
ostaliga Lanbidean
postaldia Lanbidean
1 ilde
ballenger stor goin
gdĐOL, SIVI 23 TL 02
gd OOL, STVÍ 23 TL 02
de in obl
och yd badalang
a


@ -3,12 +3,12 @@ The Quantum Theory of
Nonrelativistic Collisions
JOHN R. TAYLOR
University of Colorado
ostaliga Lanbidean
postaldia Lanbidean
1 ilde
balloons big goin
gdĐOL, SIVI 23 TL
there in obl
och yd change
ballenger stor goin
gd OOL, STVÍ 23 TL 02
de in obl
och yd badalang
a
Ber
ook Sy-RW isn't going anywhere
ook Sy-RW enot go baldus


@ -1,11 +1,14 @@
import json
from misinformation.display import explore_analysis
from pytest import approx
# import misinformation.display as misinf_display
import pytest
misinf_display = pytest.importorskip("misinformation.display")
def test_explore_analysis_faces():
mydict = {"IMG_2746": {"filename": "./test/data/IMG_2746.png"}}
explore_analysis(mydict, identify="faces")
misinf_display.explore_analysis(mydict, identify="faces")
with open("./test/data/example_faces.json", "r") as file:
outs = json.load(file)
@ -17,7 +20,7 @@ def test_explore_analysis_faces():
def test_explore_analysis_objects():
mydict = {"IMG_2746": {"filename": "./test/data/IMG_2809.png"}}
explore_analysis(mydict, identify="objects")
misinf_display.explore_analysis(mydict, identify="objects")
with open("./test/data/example_analysis_objects.json", "r") as file:
outs = json.load(file)

605
misinformation/test/test_multimodal_search.py (new file)

@ -0,0 +1,605 @@
import pytest
import math
from PIL import Image
import numpy
from torch import device, cuda
import misinformation.multimodal_search as ms
testdict = {
"d755771b-225e-432f-802e-fb8dc850fff7": {
"filename": "./test/data/d755771b-225e-432f-802e-fb8dc850fff7.png"
},
"IMG_2746": {"filename": "./test/data/IMG_2746.png"},
"IMG_2750": {"filename": "./test/data/IMG_2750.png"},
"IMG_2805": {"filename": "./test/data/IMG_2805.png"},
"IMG_2806": {"filename": "./test/data/IMG_2806.png"},
"IMG_2807": {"filename": "./test/data/IMG_2807.png"},
"IMG_2808": {"filename": "./test/data/IMG_2808.png"},
"IMG_2809": {"filename": "./test/data/IMG_2809.png"},
"IMG_3755": {"filename": "./test/data/IMG_3755.jpg"},
"IMG_3756": {"filename": "./test/data/IMG_3756.jpg"},
"IMG_3757": {"filename": "./test/data/IMG_3757.jpg"},
"pic1": {"filename": "./test/data/pic1.png"},
}
related_error = 1e-3
gpu_is_not_available = not cuda.is_available()
cuda.empty_cache()
def test_read_img():
my_dict = {}
test_img = ms.MultimodalSearch.read_img(my_dict, testdict["IMG_2746"]["filename"])
assert list(numpy.array(test_img)[257][34]) == [70, 66, 63]
pre_proc_pic_blip2_blip_albef = [
-1.0039474964141846,
-1.0039474964141846,
-0.8433647751808167,
-0.6097899675369263,
-0.5951915383338928,
-0.6243883967399597,
-0.6827820539474487,
-0.6097899675369263,
-0.7119789123535156,
-1.0623412132263184,
]
pre_proc_pic_clip_vitl14 = [
-0.7995694875717163,
-0.7849710583686829,
-0.7849710583686829,
-0.7703726291656494,
-0.7703726291656494,
-0.7849710583686829,
-0.7849710583686829,
-0.7703726291656494,
-0.7703726291656494,
-0.7703726291656494,
]
pre_proc_pic_clip_vitl14_336 = [
-0.7995694875717163,
-0.7849710583686829,
-0.7849710583686829,
-0.7849710583686829,
-0.7849710583686829,
-0.7849710583686829,
-0.7849710583686829,
-0.9163569211959839,
-1.149931788444519,
-1.0039474964141846,
]
pre_proc_text_blip2_blip_albef = (
"the bird sat on a tree located at the intersection of 23rd and 43rd streets"
)
pre_proc_text_clip_clip_vitl14_clip_vitl14_336 = (
"The bird sat on a tree located at the intersection of 23rd and 43rd streets."
)
pre_extracted_feature_img_blip2 = [
0.04566730558872223,
-0.042554520070552826,
-0.06970272958278656,
-0.009771779179573059,
0.01446065679192543,
0.10173682868480682,
0.007092420011758804,
-0.020045937970280647,
0.12923966348171234,
0.006452132016420364,
]
pre_extracted_feature_img_blip = [
-0.02480311505496502,
0.05037587881088257,
0.039517853409051895,
-0.06994109600782394,
-0.12886561453342438,
0.047039758414030075,
-0.11620642244815826,
-0.003398326924070716,
-0.07324369996786118,
0.06994668394327164,
]
pre_extracted_feature_img_albef = [
0.08971136063337326,
-0.10915573686361313,
-0.020636577159166336,
0.048121627420186996,
-0.05943416804075241,
-0.129856139421463,
-0.0034469354432076216,
0.017888527363538742,
-0.03284582123160362,
-0.1037328764796257,
]
pre_extracted_feature_img_clip = [
0.01621132344007492,
-0.004035486374050379,
-0.04304071143269539,
-0.03459808602929115,
0.016922621056437492,
-0.025056276470422745,
-0.04178355261683464,
0.02165347896516323,
-0.003224249929189682,
0.020485712215304375,
]
pre_extracted_feature_img_parsing_clip = [
0.01621132344007492,
-0.004035486374050379,
-0.04304071143269539,
-0.03459808602929115,
0.016922621056437492,
-0.025056276470422745,
-0.04178355261683464,
0.02165347896516323,
-0.003224249929189682,
0.020485712215304375,
]
pre_extracted_feature_img_clip_vitl14 = [
-0.023943455889821053,
-0.021703708916902542,
0.035043686628341675,
0.019495919346809387,
0.014351222664117813,
-0.008634116500616074,
0.01610446907579899,
-0.003426523646339774,
0.011931191198527813,
0.0008691544644534588,
]
pre_extracted_feature_img_clip_vitl14_336 = [
-0.009511193260550499,
-0.012618942186236382,
0.034754861146211624,
0.016356879845261574,
-0.0011549904011189938,
-0.008054453879594803,
0.0011990377679467201,
-0.010806051082909107,
0.00140204350464046,
0.0006861367146484554,
]
pre_extracted_feature_text_blip2 = [
-0.1384204626083374,
-0.008662976324558258,
0.006269007455557585,
0.03151319921016693,
0.060558050870895386,
-0.03230040520429611,
0.015861615538597107,
-0.11856459826231003,
-0.058296192437410355,
0.03699290752410889,
]
pre_extracted_feature_text_blip = [
0.0118643119931221,
-0.01291718054562807,
-0.0009687161073088646,
0.01428765058517456,
-0.05591396614909172,
0.07386433333158493,
-0.11475936323404312,
0.01620068959891796,
0.0062415082938969135,
0.0034833091776818037,
]
pre_extracted_feature_text_albef = [
-0.06229640915989876,
0.11278597265481949,
0.06628583371639252,
0.1649140566587448,
0.068987175822258,
0.006291372701525688,
0.03244050219655037,
-0.049556829035282135,
0.050752390176057816,
-0.0421440489590168,
]
pre_extracted_feature_text_clip = [
0.018169036135077477,
0.03634127229452133,
0.025660742074251175,
0.009149895049631596,
-0.035570453852415085,
0.033126577734947205,
-0.004808237310498953,
-0.0031453112605959177,
-0.02194291725754738,
0.024019461125135422,
]
pre_extracted_feature_text_clip_vitl14 = [
-0.0055463071912527084,
0.006908962037414312,
-0.019450219348073006,
-0.018097277730703354,
0.017567576840519905,
-0.03828490898013115,
-0.03781530633568764,
-0.023951737210154533,
0.01365653332322836,
-0.02341713197529316,
]
pre_extracted_feature_text_clip_vitl14_336 = [
-0.008720514364540577,
0.005284308455884457,
-0.021116750314831734,
-0.018112430348992348,
0.01685470901429653,
-0.03517491742968559,
-0.038612402975559235,
-0.021867064759135246,
0.01685977540910244,
-0.023832324892282486,
]
simularity_blip2 = [
[0.05826476216316223, -0.03215287625789642],
[0.12869958579540253, 0.005234059877693653],
[0.11073512583971024, 0.12327003479003906],
[0.08743024617433548, 0.05598106235265732],
[0.04591086134314537, 0.48981112241744995],
[0.06297147274017334, 0.4728018641471863],
[0.18486255407333374, 0.635167121887207],
[0.015356295742094517, 0.015282897278666496],
[-0.008485622704029083, 0.010882291942834854],
[-0.04328630864620209, -0.13117870688438416],
[-0.025470387190580368, 0.13175423443317413],
[-0.05090826004743576, 0.05902523919939995],
]
sorted_blip2 = [
[6, 1, 2, 3, 5, 0, 4, 7, 8, 10, 9, 11],
[6, 4, 5, 10, 2, 11, 3, 7, 8, 1, 0, 9],
]
simularity_blip = [
[0.15640679001808167, 0.752173662185669],
[0.15139800310134888, 0.7804810404777527],
[0.13010388612747192, 0.755257248878479],
[0.13746635615825653, 0.7618774175643921],
[0.1756758838891983, 0.8531903624534607],
[0.17233705520629883, 0.8448910117149353],
[0.1970970332622528, 0.8916105628013611],
[0.11693969368934631, 0.5833531618118286],
[0.12386563420295715, 0.5981853604316711],
[0.08427951484918594, 0.4962371587753296],
[0.14193706214427948, 0.7613846659660339],
[0.12051936239004135, 0.6492202281951904],
]
sorted_blip = [
[6, 4, 5, 0, 1, 10, 3, 2, 8, 11, 7, 9],
[6, 4, 5, 1, 3, 10, 2, 0, 11, 8, 7, 9],
]
simularity_albef = [
[0.12321824580430984, 0.35511350631713867],
[0.09512615948915482, 0.27168408036231995],
[0.09053325653076172, 0.20215675234794617],
[0.06335515528917313, 0.15055638551712036],
[0.09604836255311966, 0.4658776521682739],
[0.10870333760976791, 0.5143978595733643],
[0.11748822033405304, 0.6542638540267944],
[0.05688793584704399, 0.22170542180538177],
[0.05597608536481857, 0.11963296681642532],
[0.059643782675266266, 0.14969395101070404],
[0.06690303236246109, 0.3149859607219696],
[0.07909377664327621, 0.11911341547966003],
]
sorted_albef = [
[0, 6, 5, 4, 1, 2, 11, 10, 3, 9, 7, 8],
[6, 5, 4, 0, 10, 1, 7, 2, 3, 9, 8, 11],
]
simularity_clip = [
[0.23923014104366302, 0.5325412750244141],
[0.20101115107536316, 0.5112978219985962],
[0.17522737383842468, 0.49811851978302],
[0.20062290132045746, 0.5415266156196594],
[0.22865726053714752, 0.5762109756469727],
[0.2310466319322586, 0.5910375714302063],
[0.2644523084163666, 0.7851459383964539],
[0.21474510431289673, 0.4135811924934387],
[0.16407863795757294, 0.1474374681711197],
[0.19819433987140656, 0.26493316888809204],
[0.19545596837997437, 0.5007457137107849],
[0.1647854745388031, 0.45705708861351013],
]
sorted_clip = [
[6, 0, 5, 4, 7, 1, 3, 9, 10, 2, 11, 8],
[6, 5, 4, 3, 0, 1, 10, 2, 11, 7, 9, 8],
]
simularity_clip_vitl14 = [
[0.1051270067691803, 0.5184808373451233],
[0.09705893695354462, 0.49574509263038635],
[0.11964304000139236, 0.5424358248710632],
[0.13881900906562805, 0.5909714698791504],
[0.12728188931941986, 0.6758255362510681],
[0.1277746558189392, 0.6841973662376404],
[0.18026694655418396, 0.803142786026001],
[0.13977059721946716, 0.45957139134407043],
[0.11180847883224487, 0.24822194874286652],
[0.12296056002378464, 0.35143694281578064],
[0.11596094071865082, 0.5704031586647034],
[0.10174489766359329, 0.44422751665115356],
]
sorted_clip_vitl14 = [
[6, 7, 3, 5, 4, 9, 2, 10, 8, 0, 11, 1],
[6, 5, 4, 3, 10, 2, 0, 1, 7, 11, 9, 8],
]
simularity_clip_vitl14_336 = [
[0.09391091763973236, 0.49337542057037354],
[0.11103834211826324, 0.4881117343902588],
[0.12891019880771637, 0.5501476526260376],
[0.13288410007953644, 0.5498673915863037],
[0.12357455492019653, 0.6749162077903748],
[0.13700757920742035, 0.7003108263015747],
[0.1788637489080429, 0.7713702321052551],
[0.13260436058044434, 0.4300197660923004],
[0.11666625738143921, 0.2334875613451004],
[0.1316065937280655, 0.3291645646095276],
[0.12374477833509445, 0.5632147192955017],
[0.10333051532506943, 0.43023794889450073],
]
sorted_clip_vitl14_336 = [
[6, 5, 3, 7, 9, 2, 10, 4, 8, 1, 11, 0],
[6, 5, 4, 10, 2, 3, 0, 1, 11, 7, 9, 8],
]
@pytest.mark.parametrize(
(
"pre_multimodal_device",
"pre_model",
"pre_proc_pic",
"pre_proc_text",
"pre_extracted_feature_img",
"pre_extracted_feature_text",
"pre_simularity",
"pre_sorted",
),
[
pytest.param(
device("cuda"),
"blip2",
pre_proc_pic_blip2_blip_albef,
pre_proc_text_blip2_blip_albef,
pre_extracted_feature_img_blip2,
pre_extracted_feature_text_blip2,
simularity_blip2,
sorted_blip2,
marks=pytest.mark.skipif(
gpu_is_not_available, reason="gpu_is_not_availible"
),
),
(
device("cpu"),
"blip",
pre_proc_pic_blip2_blip_albef,
pre_proc_text_blip2_blip_albef,
pre_extracted_feature_img_blip,
pre_extracted_feature_text_blip,
simularity_blip,
sorted_blip,
),
pytest.param(
device("cuda"),
"blip",
pre_proc_pic_blip2_blip_albef,
pre_proc_text_blip2_blip_albef,
pre_extracted_feature_img_blip,
pre_extracted_feature_text_blip,
simularity_blip,
sorted_blip,
marks=pytest.mark.skipif(
gpu_is_not_available, reason="gpu_is_not_availible"
),
),
(
device("cpu"),
"albef",
pre_proc_pic_blip2_blip_albef,
pre_proc_text_blip2_blip_albef,
pre_extracted_feature_img_albef,
pre_extracted_feature_text_albef,
simularity_albef,
sorted_albef,
),
pytest.param(
device("cuda"),
"albef",
pre_proc_pic_blip2_blip_albef,
pre_proc_text_blip2_blip_albef,
pre_extracted_feature_img_albef,
pre_extracted_feature_text_albef,
simularity_albef,
sorted_albef,
marks=pytest.mark.skipif(
gpu_is_not_available, reason="gpu_is_not_availible"
),
),
(
device("cpu"),
"clip_base",
pre_proc_pic_clip_vitl14,
pre_proc_text_clip_clip_vitl14_clip_vitl14_336,
pre_extracted_feature_img_clip,
pre_extracted_feature_text_clip,
simularity_clip,
sorted_clip,
),
pytest.param(
device("cuda"),
"clip_base",
pre_proc_pic_clip_vitl14,
pre_proc_text_clip_clip_vitl14_clip_vitl14_336,
pre_extracted_feature_img_clip,
pre_extracted_feature_text_clip,
simularity_clip,
sorted_clip,
marks=pytest.mark.skipif(
gpu_is_not_available, reason="gpu_is_not_availible"
),
),
(
device("cpu"),
"clip_vitl14",
pre_proc_pic_clip_vitl14,
pre_proc_text_clip_clip_vitl14_clip_vitl14_336,
pre_extracted_feature_img_clip_vitl14,
pre_extracted_feature_text_clip_vitl14,
simularity_clip_vitl14,
sorted_clip_vitl14,
),
pytest.param(
device("cuda"),
"clip_vitl14",
pre_proc_pic_clip_vitl14,
pre_proc_text_clip_clip_vitl14_clip_vitl14_336,
pre_extracted_feature_img_clip_vitl14,
pre_extracted_feature_text_clip_vitl14,
simularity_clip_vitl14,
sorted_clip_vitl14,
marks=pytest.mark.skipif(
gpu_is_not_available, reason="gpu_is_not_availible"
),
),
(
device("cpu"),
"clip_vitl14_336",
pre_proc_pic_clip_vitl14_336,
pre_proc_text_clip_clip_vitl14_clip_vitl14_336,
pre_extracted_feature_img_clip_vitl14_336,
pre_extracted_feature_text_clip_vitl14_336,
simularity_clip_vitl14_336,
sorted_clip_vitl14_336,
),
pytest.param(
device("cuda"),
"clip_vitl14_336",
pre_proc_pic_clip_vitl14_336,
pre_proc_text_clip_clip_vitl14_clip_vitl14_336,
pre_extracted_feature_img_clip_vitl14_336,
pre_extracted_feature_text_clip_vitl14_336,
simularity_clip_vitl14_336,
sorted_clip_vitl14_336,
marks=pytest.mark.skipif(
gpu_is_not_available, reason="gpu_is_not_availible"
),
),
],
)
def test_parsing_images(
pre_multimodal_device,
pre_model,
pre_proc_pic,
pre_proc_text,
pre_extracted_feature_img,
pre_extracted_feature_text,
pre_simularity,
pre_sorted,
):
ms.MultimodalSearch.multimodal_device = pre_multimodal_device
(
model,
vis_processor,
txt_processor,
image_keys,
image_names,
features_image_stacked,
) = ms.MultimodalSearch.parsing_images(testdict, pre_model)
for i, num in zip(range(10), features_image_stacked[0, 10:20].tolist()):
assert (
math.isclose(num, pre_extracted_feature_img[i], rel_tol=related_error)
is True
)
test_pic = Image.open(testdict["IMG_2746"]["filename"]).convert("RGB")
test_querry = (
"The bird sat on a tree located at the intersection of 23rd and 43rd streets."
)
processed_pic = (
vis_processor["eval"](test_pic).unsqueeze(0).to(pre_multimodal_device)
)
processed_text = txt_processor["eval"](test_querry)
for i, num in zip(range(10), processed_pic[0, 0, 0, 25:35].tolist()):
assert math.isclose(num, pre_proc_pic[i], rel_tol=related_error) is True
assert processed_text == pre_proc_text
search_query = [
{"text_input": test_querry},
{"image": testdict["IMG_2746"]["filename"]},
]
multi_features_stacked = ms.MultimodalSearch.querys_processing(
testdict, search_query, model, txt_processor, vis_processor, pre_model
)
for i, num in zip(range(10), multi_features_stacked[0, 10:20].tolist()):
assert (
math.isclose(num, pre_extracted_feature_text[i], rel_tol=related_error)
is True
)
for i, num in zip(range(10), multi_features_stacked[1, 10:20].tolist()):
assert (
math.isclose(num, pre_extracted_feature_img[i], rel_tol=related_error)
is True
)
search_query2 = [
{"text_input": "A bus"},
{"image": "../misinformation/test/data/IMG_3758.png"},
]
similarity, sorted_list = ms.MultimodalSearch.multimodal_search(
testdict,
model,
vis_processor,
txt_processor,
pre_model,
image_keys,
features_image_stacked,
search_query2,
)
for i, num in zip(range(12), similarity.tolist()):
for j, num2 in zip(range(len(num)), num):
assert (
math.isclose(num2, pre_simularity[i][j], rel_tol=100 * related_error)
is True
)
for i, num in zip(range(2), sorted_list):
for j, num2 in zip(range(2), num):
assert num2 == pre_sorted[i][j]
del model, vis_processor, txt_processor
cuda.empty_cache()


@ -6,8 +6,8 @@ import misinformation.objects_cvlib as ob_cvlib
OBJECT_1 = "cell phone"
OBJECT_2 = "motorcycle"
OBJECT_3 = "traffic light"
TEST_IMAGE_1 = "./test/data/IMG_2809.png"
JSON_1 = "./test/data/example_objects_cvlib.json"
TEST_IMAGE_1 = "IMG_2809.png"
JSON_1 = "example_objects_cvlib.json"
@pytest.fixture()
@ -25,11 +25,11 @@ def test_objects_from_cvlib(default_objects):
assert str(objects) == str(out_objects)
def test_analyse_image_cvlib():
mydict = {"filename": TEST_IMAGE_1}
def test_analyse_image_cvlib(get_path):
mydict = {"filename": get_path + TEST_IMAGE_1}
ob_cvlib.ObjectCVLib().analyse_image(mydict)
with open(JSON_1, "r") as file:
with open(get_path + JSON_1, "r") as file:
out_dict = json.load(file)
for key in mydict.keys():
assert mydict[key] == out_dict[key]
@ -54,37 +54,37 @@ def test_init_default_objects():
assert init_objects[obj] == "no"
def test_analyse_image_from_file_cvlib():
file_path = TEST_IMAGE_1
objs = ob_cvlib.ObjectCVLib().analyse_image_from_file(file_path)
def test_analyse_image_from_file_cvlib(get_path):
file_path = get_path + TEST_IMAGE_1
objs = ob_cvlib.ObjectCVLib().analyse_image_from_file(get_path + file_path)
with open(JSON_1, "r") as file:
with open(get_path + JSON_1, "r") as file:
out_dict = json.load(file)
for key in objs.keys():
assert objs[key] == out_dict[key]
def test_detect_objects_cvlib():
file_path = TEST_IMAGE_1
def test_detect_objects_cvlib(get_path):
file_path = get_path + TEST_IMAGE_1
objs = ob_cvlib.ObjectCVLib().detect_objects_cvlib(file_path)
with open(JSON_1, "r") as file:
with open(get_path + JSON_1, "r") as file:
out_dict = json.load(file)
for key in objs.keys():
assert objs[key] == out_dict[key]
def test_set_keys(default_objects):
mydict = {"filename": TEST_IMAGE_1}
def test_set_keys(default_objects, get_path):
mydict = {"filename": get_path + TEST_IMAGE_1}
key_objs = ob.ObjectDetector(mydict).set_keys()
assert str(default_objects) == str(key_objs)
def test_analyse_image():
mydict = {"filename": TEST_IMAGE_1}
def test_analyse_image(get_path):
mydict = {"filename": get_path + TEST_IMAGE_1}
ob.ObjectDetector.set_client_to_cvlib()
ob.ObjectDetector(mydict).analyse_image()
with open(JSON_1, "r") as file:
with open(get_path + JSON_1, "r") as file:
out_dict = json.load(file)
assert str(mydict) == str(out_dict)

166
misinformation/test/test_summary.py (new file)

@ -0,0 +1,166 @@
import os
from torch import device, cuda
from lavis.models import load_model_and_preprocess
import misinformation.summary as sm
images = [
"./test/data/d755771b-225e-432f-802e-fb8dc850fff7.png",
"./test/data/IMG_2746.png",
"./test/data/IMG_2750.png",
"./test/data/IMG_2805.png",
"./test/data/IMG_2806.png",
"./test/data/IMG_2807.png",
"./test/data/IMG_2808.png",
"./test/data/IMG_2809.png",
"./test/data/IMG_3755.jpg",
"./test/data/IMG_3756.jpg",
"./test/data/IMG_3757.jpg",
"./test/data/pic1.png",
]
def test_analyse_image():
mydict = {}
for img_path in images:
id_ = os.path.splitext(os.path.basename(img_path))[0]
mydict[id_] = {"filename": img_path}
for key in mydict:
mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image()
keys = list(mydict.keys())
assert len(mydict) == 12
for key in keys:
assert len(mydict[key]["3_non-deterministic summary"]) == 3
const_image_summary_list = [
"a river running through a city next to tall buildings",
"a crowd of people standing on top of a tennis court",
"a crowd of people standing on top of a field",
"a room with a desk and a chair",
"a table with plastic containers on top of it",
"a view of a city with mountains in the background",
"a view of a city street from a window",
"a busy city street with cars and pedestrians",
"a close up of an open book with writing on it",
"a book that is open on a table",
"a yellow book with green lettering on it",
"a person running on a beach near a rock formation",
]
for i in range(len(const_image_summary_list)):
assert mydict[keys[i]]["const_image_summary"] == const_image_summary_list[i]
del sm.SummaryDetector.summary_model, sm.SummaryDetector.summary_vis_processors
cuda.empty_cache()
summary_device = device("cuda" if cuda.is_available() else "cpu")
summary_model, summary_vis_processors, _ = load_model_and_preprocess(
name="blip_caption",
model_type="base_coco",
is_eval=True,
device=summary_device,
)
for key in mydict:
mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image(
summary_model, summary_vis_processors
)
keys = list(mydict.keys())
assert len(mydict) == 12
for key in keys:
assert len(mydict[key]["3_non-deterministic summary"]) == 3
const_image_summary_list2 = [
"a river running through a city next to tall buildings",
"a crowd of people standing on top of a tennis court",
"a crowd of people standing on top of a field",
"a room with a desk and a chair",
"a table with plastic containers on top of it",
"a view of a city with mountains in the background",
"a view of a city street from a window",
"a busy city street with cars and pedestrians",
"a close up of an open book with writing on it",
"a book that is open on a table",
"a yellow book with green lettering on it",
"a person running on a beach near a rock formation",
]
for i in range(len(const_image_summary_list2)):
assert mydict[keys[i]]["const_image_summary"] == const_image_summary_list2[i]
del summary_model, summary_vis_processors
cuda.empty_cache()
summary_model, summary_vis_processors, _ = load_model_and_preprocess(
name="blip_caption",
model_type="large_coco",
is_eval=True,
device=summary_device,
)
for key in mydict:
mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image(
summary_model, summary_vis_processors
)
keys = list(mydict.keys())
assert len(mydict) == 12
for key in keys:
assert len(mydict[key]["3_non-deterministic summary"]) == 3
const_image_summary_list3 = [
"a river running through a town next to tall buildings",
"a crowd of people standing on top of a track",
"a group of people standing on top of a track",
"a desk and chair in a small room",
"a table that has some chairs on top of it",
"a view of a city from a window of a building",
"a view of a city from a window",
"a city street filled with lots of traffic",
"an open book with german text on it",
"a close up of a book on a table",
"a book with a green cover on a table",
"a person running on a beach near the ocean",
]
for i in range(len(const_image_summary_list2)):
assert mydict[keys[i]]["const_image_summary"] == const_image_summary_list3[i]
def test_analyse_questions():
mydict = {}
for img_path in images:
id_ = os.path.splitext(os.path.basename(img_path))[0]
mydict[id_] = {"filename": img_path}
list_of_questions = [
"How many persons on the picture?",
"What happends on the picture?",
]
for key in mydict:
mydict[key] = sm.SummaryDetector(mydict[key]).analyse_questions(
list_of_questions
)
keys = list(mydict.keys())
assert len(mydict) == 12
list_of_questions_ans = [2, 100, "many", 0, 0, "none", "two", 5, 0, 0, 0, 1]
list_of_questions_ans2 = [
"flood",
"festival",
"people are flying kites",
"no one's home",
"chair is being moved",
"traffic jam",
"day time",
"traffic jam",
"nothing",
"nothing",
"nothing",
"running",
]
for i in range(len(list_of_questions_ans)):
assert mydict[keys[i]][list_of_questions[1]] == str(list_of_questions_ans2[i])


@ -2,31 +2,30 @@ import os
import pytest
import spacy
import misinformation.text as tt
import misinformation
import pandas as pd
TESTDICT = {
"IMG_3755": {
"filename": "./test/data/IMG_3755.jpg",
},
"IMG_3756": {
"filename": "./test/data/IMG_3756.jpg",
},
"IMG_3757": {
"filename": "./test/data/IMG_3757.jpg",
},
}
LANGUAGES = ["de", "om", "en"]
os.environ[
"GOOGLE_APPLICATION_CREDENTIALS"
] = "../data/seismic-bonfire-329406-412821a70264.json"
def test_TextDetector():
for item in TESTDICT:
test_obj = tt.TextDetector(TESTDICT[item])
@pytest.fixture
def set_testdict(get_path):
testdict = {
"IMG_3755": {
"filename": get_path + "IMG_3755.jpg",
},
"IMG_3756": {
"filename": get_path + "IMG_3756.jpg",
},
"IMG_3757": {
"filename": get_path + "IMG_3757.jpg",
},
}
return testdict
LANGUAGES = ["de", "en", "en"]
def test_TextDetector(set_testdict):
for item in set_testdict:
test_obj = tt.TextDetector(set_testdict[item])
assert test_obj.subdict["text"] is None
assert test_obj.subdict["text_language"] is None
assert test_obj.subdict["text_english"] is None
@ -34,30 +33,30 @@ def test_TextDetector():
@pytest.mark.gcv
def test_analyse_image():
for item in TESTDICT:
test_obj = tt.TextDetector(TESTDICT[item])
def test_analyse_image(set_testdict, set_environ):
for item in set_testdict:
test_obj = tt.TextDetector(set_testdict[item])
test_obj.analyse_image()
test_obj = tt.TextDetector(TESTDICT[item], analyse_text=True)
test_obj = tt.TextDetector(set_testdict[item], analyse_text=True)
test_obj.analyse_image()
@pytest.mark.gcv
def test_get_text_from_image():
for item in TESTDICT:
test_obj = tt.TextDetector(TESTDICT[item])
def test_get_text_from_image(set_testdict, get_path, set_environ):
for item in set_testdict:
test_obj = tt.TextDetector(set_testdict[item])
test_obj.get_text_from_image()
ref_file = "./test/data/text_" + item + ".txt"
ref_file = get_path + "text_" + item + ".txt"
with open(ref_file, "r", encoding="utf8") as file:
reference_text = file.read()
assert test_obj.subdict["text"] == reference_text
def test_translate_text():
for item, lang in zip(TESTDICT, LANGUAGES):
test_obj = tt.TextDetector(TESTDICT[item])
ref_file = "./test/data/text_" + item + ".txt"
trans_file = "./test/data/text_translated_" + item + ".txt"
def test_translate_text(set_testdict, get_path):
for item, lang in zip(set_testdict, LANGUAGES):
test_obj = tt.TextDetector(set_testdict[item])
ref_file = get_path + "text_" + item + ".txt"
trans_file = get_path + "text_translated_" + item + ".txt"
with open(ref_file, "r", encoding="utf8") as file:
reference_text = file.read()
with open(trans_file, "r", encoding="utf8") as file:
@ -77,9 +76,9 @@ def test_remove_linebreaks():
assert test_obj.subdict["text_english"] == "This is another test."
def test_run_spacy():
test_obj = tt.TextDetector(TESTDICT["IMG_3755"], analyse_text=True)
ref_file = "./test/data/text_IMG_3755.txt"
def test_run_spacy(set_testdict, get_path):
test_obj = tt.TextDetector(set_testdict["IMG_3755"], analyse_text=True)
ref_file = get_path + "text_IMG_3755.txt"
with open(ref_file, "r") as file:
reference_text = file.read()
test_obj.subdict["text_english"] = reference_text
@ -87,10 +86,10 @@ def test_run_spacy():
assert isinstance(test_obj.doc, spacy.tokens.doc.Doc)
def test_clean_text():
def test_clean_text(set_testdict):
nlp = spacy.load("en_core_web_md")
doc = nlp("I like cats and fjejg")
test_obj = tt.TextDetector(TESTDICT["IMG_3755"])
test_obj = tt.TextDetector(set_testdict["IMG_3755"])
test_obj.doc = doc
test_obj.clean_text()
result = "I like cats and"
@ -117,30 +116,35 @@ def test_sentiment_analysis():
assert test_obj.subdict["subjectivity"] == 0.6
def test_PostprocessText():
def test_PostprocessText(set_testdict, get_path):
reference_dict = "THE\nALGEBRAIC\nEIGENVALUE\nPROBLEM\nDOM\nNVS TIO\nMINA\nMonographs\non Numerical Analysis\nJ.. H. WILKINSON"
reference_df = "Mathematische Formelsammlung\nfür Ingenieure und Naturwissenschaftler\nMit zahlreichen Abbildungen und Rechenbeispielen\nund einer ausführlichen Integraltafel\n3., verbesserte Auflage"
obj = tt.PostprocessText(mydict=TESTDICT)
# make sure test works on windows where end-of-line character is \r\n
img_numbers = ["IMG_3755", "IMG_3756", "IMG_3757"]
for image_ref in img_numbers:
ref_file = get_path + "text_" + image_ref + ".txt"
with open(ref_file, "r") as file:
reference_text = file.read()
set_testdict[image_ref]["text_english"] = reference_text
obj = tt.PostprocessText(mydict=set_testdict)
test_dict = obj.list_text_english[2].replace("\r", "")
assert test_dict == reference_dict
for key in TESTDICT.keys():
TESTDICT[key].pop("text_english")
for key in set_testdict.keys():
set_testdict[key].pop("text_english")
with pytest.raises(ValueError):
tt.PostprocessText(mydict=TESTDICT)
obj = tt.PostprocessText(use_csv=True, csv_path="./test/data/test_data_out.csv")
tt.PostprocessText(mydict=set_testdict)
obj = tt.PostprocessText(use_csv=True, csv_path=get_path + "test_data_out.csv")
# make sure test works on windows where end-of-line character is \r\n
test_df = obj.list_text_english[0].replace("\r", "")
assert test_df == reference_df
with pytest.raises(ValueError):
tt.PostprocessText(use_csv=True, csv_path="./test/data/test_data_out_nokey.csv")
tt.PostprocessText(use_csv=True, csv_path=get_path + "test_data_out_nokey.csv")
with pytest.raises(ValueError):
tt.PostprocessText()
def test_analyse_topic():
def test_analyse_topic(get_path):
_, topic_df, most_frequent_topics = tt.PostprocessText(
use_csv=True, csv_path="./test/data/topic_analysis_test.csv"
use_csv=True, csv_path=get_path + "topic_analysis_test.csv"
).analyse_topic()
# since this is not deterministic we cannot be sure we get the same result twice
assert len(topic_df) == 2


@ -3,38 +3,36 @@ import pandas as pd
import misinformation.utils as ut
def test_find_files():
result = ut.find_files(
path="./test/data/", pattern="*.png", recursive=True, limit=10
)
def test_find_files(get_path):
result = ut.find_files(path=get_path, pattern="*.png", recursive=True, limit=10)
assert len(result) > 0
def test_initialize_dict():
def test_initialize_dict(get_path):
result = [
"./test/data/image_faces.jpg",
"./test/data/image_objects.jpg",
]
mydict = ut.initialize_dict(result)
with open("./test/data/example_utils_init_dict.json", "r") as file:
with open(get_path + "example_utils_init_dict.json", "r") as file:
out_dict = json.load(file)
assert mydict == out_dict
def test_append_data_to_dict():
with open("./test/data/example_append_data_to_dict_in.json", "r") as file:
def test_append_data_to_dict(get_path):
with open(get_path + "example_append_data_to_dict_in.json", "r") as file:
mydict = json.load(file)
outdict = ut.append_data_to_dict(mydict)
print(outdict)
with open("./test/data/example_append_data_to_dict_out.json", "r") as file:
with open(get_path + "example_append_data_to_dict_out.json", "r") as file:
example_outdict = json.load(file)
assert outdict == example_outdict
def test_dump_df():
with open("./test/data/example_append_data_to_dict_out.json", "r") as file:
def test_dump_df(get_path):
with open(get_path + "example_append_data_to_dict_out.json", "r") as file:
outdict = json.load(file)
df = ut.dump_df(outdict)
out_df = pd.read_csv("./test/data/example_dump_df.csv", index_col=[0])
out_df = pd.read_csv(get_path + "example_dump_df.csv", index_col=[0])
pd.testing.assert_frame_equal(df, out_df)

20
notebooks/facial_expressions.ipynb (generated)

@ -46,7 +46,9 @@
"metadata": {},
"outputs": [],
"source": [
"import misinformation"
"import misinformation\n",
"from misinformation import utils as mutils\n",
"from misinformation import display as mdisplay"
]
},
{
@ -64,7 +66,7 @@
"metadata": {},
"outputs": [],
"source": [
"images = misinformation.find_files(\n",
"images = mutils.find_files(\n",
" path=\"drive/MyDrive/misinformation-data/\",\n",
" limit=1000,\n",
")"
@ -85,7 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"?misinformation.find_files"
"?mutils.find_files"
]
},
{
@ -103,7 +105,7 @@
"metadata": {},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images)"
"mydict = mutils.initialize_dict(images[0:4])"
]
},
{
@ -122,7 +124,7 @@
"metadata": {},
"outputs": [],
"source": [
"misinformation.explore_analysis(mydict, identify=\"faces\")"
"mdisplay.explore_analysis(mydict, identify=\"faces\")"
]
},
{
@ -159,8 +161,8 @@
"metadata": {},
"outputs": [],
"source": [
"outdict = misinformation.utils.append_data_to_dict(mydict)\n",
"df = misinformation.utils.dump_df(outdict)"
"outdict = mutils.append_data_to_dict(mydict)\n",
"df = mutils.dump_df(outdict)"
]
},
{
@ -210,7 +212,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "misinf",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -224,7 +226,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]"
"version": "3.9.5"
},
"vscode": {
"interpreter": {

14
notebooks/get-text-from-image.ipynb (generated)

@ -42,6 +42,8 @@
"import os\n",
"from IPython.display import Image, display\n",
"import misinformation\n",
"from misinformation import utils as mutils\n",
"from misinformation import display as mdisplay\n",
"import tensorflow as tf\n",
"\n",
"print(tf.config.list_physical_devices(\"GPU\"))"
@ -66,7 +68,7 @@
"metadata": {},
"outputs": [],
"source": [
"images = misinformation.find_files(path=\"../data/all/\", limit=1000)"
"images = mutils.find_files(path=\"../data/all/\", limit=1000)"
]
},
{
@ -87,7 +89,7 @@
"metadata": {},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images[0:3])"
"mydict = mutils.initialize_dict(images[0:3])"
]
},
{
@ -126,7 +128,7 @@
"metadata": {},
"outputs": [],
"source": [
"misinformation.explore_analysis(mydict, identify=\"text-on-image\")"
"mdisplay.explore_analysis(mydict, identify=\"text-on-image\")"
]
},
{
@ -166,8 +168,8 @@
"metadata": {},
"outputs": [],
"source": [
"outdict = misinformation.utils.append_data_to_dict(mydict)\n",
"df = misinformation.utils.dump_df(outdict)"
"outdict = mutils.append_data_to_dict(mydict)\n",
"df = mutils.dump_df(outdict)"
]
},
{
@ -347,7 +349,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.5"
},
"vscode": {
"interpreter": {

292
notebooks/image_summary.ipynb (generated, new file)

@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Image summary and visual question answering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebooks shows some preliminary work on Image Captioning and Visual question answering with lavis. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import misinformation\n",
"from misinformation import utils as mutils\n",
"from misinformation import display as mdisplay\n",
"import misinformation.summary as sm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set an image path as input file path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"images = mutils.find_files(\n",
" path=\"../misinformation/test/data/\",\n",
" limit=1000,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images[0:10])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mydict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create captions for images and directly write to csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here you can choose between two models: \"base\" or \"large\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"summary_model, summary_vis_processors = sm.SummaryDetector.load_model(mydict, \"base\")\n",
"# summary_model, summary_vis_processors = mutils.load_model(\"large\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key in mydict:\n",
" mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image(\n",
" summary_model, summary_vis_processors\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = mutils.append_data_to_dict(mydict)\n",
"df = mutils.dump_df(outdict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Write the csv file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Manually inspect the summaries\n",
"\n",
"To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment, so please be patient. If you are sure of what you are doing.\n",
"\n",
"`const_image_summary` - the permanent summarys, which does not change from run to run (analyse_image).\n",
"\n",
"`3_non-deterministic summary` - 3 different summarys examples that change from run to run (analyse_image). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mdisplay.explore_analysis(mydict, identify=\"summary\")"
]
},
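{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check without the widget, the sketch below prints the two fields named above for each image. It assumes that `analyse_image` stored them under exactly these keys in each image's sub-dictionary; if the keys differ in your version, adjust them accordingly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: print the deterministic and the non-deterministic captions\n",
"# per image (key names taken from the explanation above; assumed, not verified).\n",
"for key in mydict:\n",
"    print(key)\n",
"    print(\"  const_image_summary:\", mydict[key].get(\"const_image_summary\"))\n",
"    print(\"  3_non-deterministic summary:\", mydict[key].get(\"3_non-deterministic summary\"))"
]
},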
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate answers to free-form questions about images written in natural language. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the list of questions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list_of_questions = [\n",
" \"How many persons on the picture?\",\n",
" \"Are there any politicians in the picture?\",\n",
" \"Does the picture show something from medicine?\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key in mydict:\n",
" mydict[key] = sm.SummaryDetector(mydict[key]).analyse_questions(list_of_questions)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mdisplay.explore_analysis(mydict, identify=\"summary\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"outdict2 = mutils.append_data_to_dict(mydict)\n",
"df2 = mutils.dump_df(outdict2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df2.head(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df2.to_csv(\"./data_out2.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
},
"vscode": {
"interpreter": {
"hash": "f1142466f556ab37fe2d38e2897a16796906208adb09fea90ba58bdf8a56f0ba"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

notebooks/multimodal_search.ipynb (new generated file)

@@ -0,0 +1,336 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "22df2297-0629-45aa-b88c-6c61f1544db6",
"metadata": {},
"source": [
"# Image Multimodal Search"
]
},
{
"cell_type": "markdown",
"id": "9eeeb302-296e-48dc-86c7-254aa02f2b3a",
"metadata": {},
"source": [
"This notebooks shows some preliminary work on Image Multimodal Search with lavis library. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f10ad6c9-b1a0-4043-8c5d-ed660d77be37",
"metadata": {},
"outputs": [],
"source": [
"import misinformation\n",
"import misinformation.multimodal_search as ms"
]
},
{
"cell_type": "markdown",
"id": "acf08b44-3ea6-44cd-926d-15c0fd9f39e0",
"metadata": {},
"source": [
"Set an image path as input file path."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d3fe589-ff3c-4575-b8f5-650db85596bc",
"metadata": {},
"outputs": [],
"source": [
"images = misinformation.utils.find_files(\n",
" path=\"../data/images/\",\n",
" limit=1000,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adf3db21-1f8b-4d44-bbef-ef0acf4623a0",
"metadata": {},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d98b6227-886d-41b8-a377-896dd8ab3c2a",
"metadata": {},
"outputs": [],
"source": [
"mydict"
]
},
{
"cell_type": "markdown",
"id": "987540a8-d800-4c70-a76b-7bfabaf123fa",
"metadata": {},
"source": [
"## Indexing and extracting features from images in selected folder"
]
},
{
"cell_type": "markdown",
"id": "66d6ede4-00bc-4aeb-9a36-e52d7de33fe5",
"metadata": {},
"source": [
"You can choose one of the following models: blip, blip2, albef, clip_base, clip_vitl14, clip_vitl14_336"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7bbca1f0-d4b0-43cd-8e05-ee39d37c328e",
"metadata": {},
"outputs": [],
"source": [
"model_type = \"blip\"\n",
"# model_type = \"blip2\"\n",
"# model_type = \"albef\"\n",
"# model_type = \"clip_base\"\n",
"# model_type = \"clip_vitl14\"\n",
"# model_type = \"clip_vitl14_336\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca095404-57d0-4f5d-aeb0-38c232252b17",
"metadata": {},
"outputs": [],
"source": [
"(\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" image_keys,\n",
" image_names,\n",
" features_image_stacked,\n",
") = ms.MultimodalSearch.parsing_images(mydict, model_type)"
]
},
{
"cell_type": "markdown",
"id": "9ff8a894-566b-4c4f-acca-21c50b5b1f52",
"metadata": {},
"source": [
"The tensors of all images `features_image_stacked` was saved in `<Number_of_images>_<model_name>_saved_features_image.pt`. If you run it once for current model and current set of images you do not need to repeat it again. Instead you can load this features with the command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56c6d488-f093-4661-835a-5c73a329c874",
"metadata": {},
"outputs": [],
"source": [
"# (\n",
"# model,\n",
"# vis_processors,\n",
"# txt_processors,\n",
"# image_keys,\n",
"# image_names,\n",
"# features_image_stacked,\n",
"# ) = ms.MultimodalSearch.parsing_images(mydict, model_type,\"18_clip_base_saved_features_image.pt\")"
]
},
{
"cell_type": "markdown",
"id": "309923c1-d6f8-4424-8fca-bde5f3a98b38",
"metadata": {},
"source": [
"Here we already processed our image folder with 18 images with `clip_base` model. So you need just write the name `18_clip_base_saved_features_image.pt` of the saved file that consists of tensors of all images as a 3rd argument to the previous function. "
]
},
{
"cell_type": "markdown",
"id": "162a52e8-6652-4897-b92e-645cab07aaef",
"metadata": {},
"source": [
"Next, you need to form search queries. You can search either by image or by text. You can search for a single query, or you can search for several queries at once, the computational time should not be much different. The format of the queries is as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4196a52-d01e-42e4-8674-5712f7d6f792",
"metadata": {},
"outputs": [],
"source": [
"search_query3 = [\n",
" {\"text_input\": \"politician press conference\"},\n",
" {\"text_input\": \"a world map\"},\n",
" {\"image\": \"../data/haos.png\"},\n",
" {\"image\": \"../data/image-34098-800.png\"},\n",
" {\"image\": \"../data/LeonPresserMorocco20032015_600.png\"},\n",
" {\"text_input\": \"a dog\"},\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "8bcf3127-3dfd-4ff4-b9e7-a043099b1418",
"metadata": {},
"source": [
"You can filter your results in 3 different ways:\n",
"- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n",
"- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n",
"- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f7dc52f-7ee9-4590-96b7-e0d9d3b82378",
"metadata": {},
"outputs": [],
"source": [
"similarity = ms.MultimodalSearch.multimodal_search(\n",
" mydict,\n",
" model,\n",
" vis_processors,\n",
" txt_processors,\n",
" model_type,\n",
" image_keys,\n",
" features_image_stacked,\n",
" search_query3,\n",
")"
]
},
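{
"cell_type": "markdown",
"id": "filter-example-sketch-md",
"metadata": {},
"source": [
"The commented cell below is a minimal sketch of how the three filter parameters described above could be passed as keyword arguments to `multimodal_search`. The keyword names and the exact signature are assumptions here, so verify them against the package before uncommenting."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "filter-example-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical filtered call (keyword arguments assumed, not verified):\n",
"# keep the 10 best matches per query, drop results with similarity below 0.2,\n",
"# and discard anything more than 30% worse than the best hit.\n",
"# similarity = ms.MultimodalSearch.multimodal_search(\n",
"#     mydict,\n",
"#     model,\n",
"#     vis_processors,\n",
"#     txt_processors,\n",
"#     model_type,\n",
"#     image_keys,\n",
"#     features_image_stacked,\n",
"#     search_query3,\n",
"#     filter_number_of_images=10,\n",
"#     filter_val_limit=0.2,\n",
"#     filter_rel_error=30,\n",
"# )"
]
},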
{
"cell_type": "markdown",
"id": "e1cf7e46-0c2c-4fb2-b89a-ef585ccb9339",
"metadata": {},
"source": [
"After launching `multimodal_search` function, the results of each query will be added to the source dictionary. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ad74b21-6187-4a58-9ed8-fd3e80f5a4ed",
"metadata": {},
"outputs": [],
"source": [
"mydict[\"100127S_ara\"]"
]
},
{
"cell_type": "markdown",
"id": "cd3ee120-8561-482b-a76a-e8f996783325",
"metadata": {},
"source": [
"A special function was written to present the search results conveniently. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4324e4fd-e9aa-4933-bb12-074d54e0c510",
"metadata": {},
"outputs": [],
"source": [
"ms.MultimodalSearch.show_results(mydict, search_query3[4])"
]
},
{
"cell_type": "markdown",
"id": "d86ab96b-1907-4b7f-a78e-3983b516d781",
"metadata": {
"tags": []
},
"source": [
"## Save searhing results to csv"
]
},
{
"cell_type": "markdown",
"id": "4bdbc4d4-695d-4751-ab7c-d2d98e2917d7",
"metadata": {
"tags": []
},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c6ddd83-bc87-48f2-a8d6-1bd3f4201ff7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = misinformation.utils.append_data_to_dict(mydict)\n",
"df = misinformation.utils.dump_df(outdict)"
]
},
{
"cell_type": "markdown",
"id": "ea2675d5-604c-45e7-86d2-080b1f4559a0",
"metadata": {
"tags": []
},
"source": [
"Check the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e78646d6-80be-4d3e-8123-3360957bcaa8",
"metadata": {},
"outputs": [],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "05546d99-afab-4565-8f30-f14e1426abcf",
"metadata": {},
"source": [
"Write the csv file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "185f7dde-20dc-44d8-9ab0-de41f9b5734d",
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

notebooks/objects_expression.ipynb (generated)

@@ -21,6 +21,8 @@
"outputs": [],
"source": [
"import misinformation\n",
"from misinformation import utils as mutils\n",
"from misinformation import display as mdisplay\n",
"import misinformation.objects as ob"
]
},
@@ -37,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
"images = misinformation.find_files(\n",
"images = mutils.find_files(\n",
" path=\"../data/images-little-text/\",\n",
" limit=1000,\n",
")"
@@ -49,7 +51,7 @@
"metadata": {},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images)"
"mydict = mutils.initialize_dict(images)"
]
},
{
@@ -91,8 +93,8 @@
"metadata": {},
"outputs": [],
"source": [
"outdict = misinformation.utils.append_data_to_dict(mydict)\n",
"df = misinformation.utils.dump_df(outdict)"
"outdict = mutils.append_data_to_dict(mydict)\n",
"df = mutils.dump_df(outdict)"
]
},
{
@@ -142,7 +144,7 @@
"metadata": {},
"outputs": [],
"source": [
"misinformation.explore_analysis(mydict, identify=\"objects\")"
"mdisplay.explore_analysis(mydict, identify=\"objects\")"
]
},
{
@@ -213,7 +215,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
"version": "3.9.5"
},
"vscode": {
"interpreter": {

pyproject.toml

@@ -12,6 +12,7 @@ readme = "README.md"
maintainers = [
{ name = "Inga Ulusoy", email = "ssc@iwr.uni-heidelberg.de" },
{ name = "Dominic Kempf", email = "ssc@iwr.uni-heidelberg.de" },
{ name = "Petr Andriushchenko", email = "ssc@iwr.uni-heidelberg.de" },
]
requires-python = ">=3.8"
license = { text = "MIT" }
@@ -21,33 +22,34 @@ classifiers = [
"License :: OSI Approved :: MIT License",
]
dependencies = [
"google-cloud-vision",
"bertopic",
"cvlib",
"deepface <= 0.0.75",
"deepface @ git+https://github.com/iulusoy/deepface.git",
"googletrans==3.1.0a0",
"grpcio",
"importlib_metadata",
"ipython",
"ipywidgets",
"jupyterlab",
"matplotlib",
"numpy<=1.23.4",
"opencv_python",
"pandas",
"Pillow",
"pooch",
"protobuf",
"retina_face",
"setuptools",
"tensorflow",
"keras",
"openpyxl",
"pytest",
"pytest-cov",
"matplotlib",
"pytest",
"opencv-contrib-python",
"googletrans==3.1.0a0",
"requests",
"retina_face @ git+https://github.com/iulusoy/retinaface.git",
"salesforce-lavis @ git+https://github.com/iulusoy/LAVIS.git",
"spacy",
"jupyterlab",
"spacytextblob",
"tensorflow",
"textblob",
"bertopic",
"grpcio",
"pandas",
"torch",
"google-cloud-vision",
"setuptools",
"opencv-contrib-python",
]
[project.scripts]

requirements.txt

@@ -23,6 +23,6 @@ jupyterlab
spacytextblob
textblob
git+https://github.com/sloria/TextBlob.git@dev
salesforce-lavis
bertopic
grpcio
pandas