Image summary and visual question answering

This notebook shows how to generate image captions and use visual question answering with LAVIS.

The first cell only needs to run on Google Colab, where it installs the ammico package.

After that, we can import ammico and read in the files given a folder path.

[1]:

# if running on google colab
# flake8-noqa-cell
import os

if "google.colab" in str(get_ipython()):
    # optionally pin setuptools before installing
    # %pip install setuptools==61 -qqq
    # install ammico
    %pip install git+https://github.com/ssciwr/ammico.git -qqq
    # mount google drive for data and API key
    from google.colab import drive

    drive.mount("/content/drive")

[2]:

import ammico
from ammico import utils as mutils
from ammico import display as mdisplay
import ammico.summary as sm

[3]:

# Here you need to provide the path to your google drive folder
# or local folder containing the images
images = mutils.find_files(
    path="data/",
    limit=10,
)
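Before running the analysis, it can help to check which files a folder actually contains. The following is a minimal sketch using only the standard library; it is not how `find_files` is implemented, and the helper name `list_images`, the `*.png` pattern and the temporary demo folder are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def list_images(path, pattern="*.png", limit=10):
    """Return up to `limit` sorted file paths under `path` matching `pattern`.

    Hypothetical helper for illustration, not part of ammico.
    """
    files = sorted(str(p) for p in Path(path).glob(pattern))
    return files[:limit]


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.png", "notes.txt"):
        (Path(tmp) / name).touch()
    found = list_images(tmp, limit=10)
    found_names = [Path(f).name for f in found]

print(found_names)
```

Only the two PNG files are listed; the text file is ignored because it does not match the pattern.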

[4]:

mydict = mutils.initialize_dict(images)

Create captions for images and directly write to csv

You can choose between two models, “base” and “large”. The analysis generates a caption for each image and stores the results in a dataframe, which can then be exported as a csv file.

The results are written into two columns. The column const_image_summary always contains the same result, because the same seed is always used. The column 3_non-deterministic_summary contains three answers generated with different seeds; these will most likely differ when you run the analysis again.
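The difference between the two columns comes down to how the random seed is handled. The sketch below illustrates the idea with Python's standard `random` module only. It does not call the captioning model, and `CANDIDATES` and `sample_caption` are hypothetical names made up for this illustration.

```python
import random

# Made-up candidate captions, standing in for what a model might produce.
CANDIDATES = [
    "a man in a suit",
    "a person giving a speech",
    "a crowd at a rally",
    "a doctor with a patient",
]


def sample_caption(seed=None):
    """Pick one candidate; seed=None lets Python seed from the operating system."""
    rng = random.Random(seed)
    return rng.choice(CANDIDATES)


# Fixed seed: the same choice on every call and every run.
const_summary = sample_caption(seed=42)
repeat_summary = sample_caption(seed=42)

# No fixed seed: three draws that may change from run to run.
varied = [sample_caption() for _ in range(3)]

print(const_summary == repeat_summary)
print(varied)
```

With a fixed seed the two calls agree, while the unseeded draws are free to differ between runs, which mirrors the behaviour described for the two columns.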

[5]:

obj = sm.SummaryDetector(mydict)
summary_model, summary_vis_processors = obj.load_model(model_type="base")
# summary_model, summary_vis_processors = obj.load_model(model_type="large")

100%|██████████| 2.50G/2.50G [00:20<00:00, 130MB/s]
100%|██████████| 1.35G/1.35G [00:09<00:00, 154MB/s]

[6]:

for key in mydict:
    mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image(
        summary_model=summary_model, summary_vis_processors=summary_vis_processors
    )

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 2
      1 for key in mydict:
----> 2     mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image(
      3         summary_model=summary_model, summary_vis_processors=summary_vis_processors
      4     )

TypeError: analyse_image() got an unexpected keyword argument 'summary_model'

Convert the dictionary of dictionaries into a dictionary with lists:

[7]:

outdict = mutils.append_data_to_dict(mydict)
df = mutils.dump_df(outdict)
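To make the conversion step concrete, here is a rough sketch of turning a dictionary of dictionaries into a dictionary with lists in plain Python. It is not the ammico implementation; the function name `to_dict_of_lists` and the sample entries are assumptions for illustration.

```python
def to_dict_of_lists(nested):
    """Turn {image: {column: value}} into {column: [value per image]}.

    Hypothetical helper for illustration, not part of ammico.
    """
    # Collect column names in first-seen order.
    columns = []
    for entry in nested.values():
        for name in entry:
            if name not in columns:
                columns.append(name)
    # Missing values become None so that all lists have the same length.
    return {name: [entry.get(name) for entry in nested.values()] for name in columns}


nested = {
    "img1": {"filename": "img1.png", "const_image_summary": "caption A"},
    "img2": {"filename": "img2.png", "const_image_summary": "caption B"},
}
flat = to_dict_of_lists(nested)
print(flat)
```

Each key of the result corresponds to one column, and each list holds one value per image, which is the shape a dataframe constructor expects.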

Check the dataframe:

[8]:

df.head(10)

[8]:

	filename
0	102141_2_eng
1	102730_eng
2	106349S_por

Write the csv file:

[9]:

df.to_csv("data_out.csv")
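If you want to see what such a csv export looks like without touching the disk, the following sketch writes a dictionary with lists to an in-memory buffer and reads it back. It uses the standard `csv` module rather than pandas, and the column contents are made-up sample values.

```python
import csv
import io

# Made-up sample data in the "dictionary with lists" layout.
columns = {
    "filename": ["img1", "img2"],
    "const_image_summary": ["caption A", "caption B"],
}

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns.keys())            # header row
writer.writerows(zip(*columns.values()))   # one row per image

# Read the text back to confirm the round trip.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)
```

Note that pandas `to_csv` additionally writes the dataframe index as the first column unless `index=False` is passed.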

Manually inspect the summaries

To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment, so please be patient. If you are sure of what you are doing, you can skip this step.

const_image_summary - the permanent summary, which does not change from run to run (analyse_image).

3_non-deterministic_summary - three different example summaries that change from run to run (analyse_image).

[10]:

analysis_explorer = mdisplay.AnalysisExplorer(mydict, identify="summary")
analysis_explorer.run_server(port=8055)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 analysis_explorer = mdisplay.AnalysisExplorer(mydict, identify="summary")
      2 analysis_explorer.run_server(port=8055)

TypeError: __init__() got an unexpected keyword argument 'identify'

Generate answers to free-form questions about images written in natural language.

Set the list of questions as a list of strings:

[11]:

list_of_questions = [
    "How many persons on the picture?",
    "Are there any politicians in the picture?",
    "Does the picture show something from medicine?",
]

Explore the analysis using the interface:

[12]:

analysis_explorer = mdisplay.AnalysisExplorer(mydict, identify="summary")
analysis_explorer.run_server(port=8055)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 analysis_explorer = mdisplay.AnalysisExplorer(mydict, identify="summary")
      2 analysis_explorer.run_server(port=8055)

TypeError: __init__() got an unexpected keyword argument 'identify'

Or directly analyze for further processing

Instead of inspecting each of the images, you can also directly carry out the analysis and export the result into a csv. This may take a while depending on how many images you have loaded.

[13]:

for key in mydict:
    mydict[key] = sm.SummaryDetector(mydict[key]).analyse_questions(list_of_questions)

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[13], line 2
      1 for key in mydict:
----> 2     mydict[key] = sm.SummaryDetector(mydict[key]).analyse_questions(list_of_questions)

File ~/work/AMMICO/AMMICO/ammico/summary.py:368, in SummaryDetector.analyse_questions(self, list_of_questions, consequential_questions)
    366 if len(list_of_questions) > 0:
    367     path = self.subdict["filename"]
--> 368     raw_image = Image.open(path).convert("RGB")
    369     image = (
    370         vis_processors["eval"](raw_image).unsqueeze(0).to(self.summary_device)
    371     )
    372     question_batch = []

File /opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/PIL/Image.py:3243, in open(fp, mode, formats)
   3240     filename = fp
   3242 if filename:
-> 3243     fp = builtins.open(filename, "rb")
   3244     exclusive_fp = True
   3246 try:

FileNotFoundError: [Errno 2] No such file or directory: '102141_2_eng'
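The traceback shows that `analyse_questions` opens the value stored under `"filename"` as a path, and that `'102141_2_eng'` does not exist as a file. A possible way to catch this before starting a long run is sketched below. The helper `find_missing_files` is hypothetical and not part of ammico; it only assumes that each entry stores its path under the key `"filename"`, as the traceback suggests.

```python
import os
import tempfile


def find_missing_files(image_dict):
    """Return the keys whose 'filename' entry does not point to an existing file."""
    missing = []
    for key, entry in image_dict.items():
        path = entry.get("filename", "")
        if not os.path.isfile(path):
            missing.append(key)
    return missing


# Demo with one real file and one entry that lacks a valid path.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "present.png")
    open(real, "wb").close()
    demo = {
        "present": {"filename": real},
        "102141_2_eng": {"filename": os.path.join(tmp, "102141_2_eng")},
    }
    missing = find_missing_files(demo)

print(missing)
```

An empty list means every entry points to a file that can be opened; any key that is listed needs its `"filename"` value corrected before the analysis.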

Convert to dataframe and write csv

These steps are required to convert the dictionary of dictionaries into a dictionary with lists that can be converted into a pandas dataframe and exported to a csv file.

[14]:

outdict2 = mutils.append_data_to_dict(mydict)
df2 = mutils.dump_df(outdict2)

[15]:

df2.head(10)

[15]:

	filename
0	102141_2_eng
1	102730_eng
2	106349S_por

[16]:

df2.to_csv("data_out2.csv")
