# Image summary and visual question answering

This notebook shows some preliminary work on image captioning and visual question answering with LAVIS. It is mainly meant to explore the library's capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:

In [1]:
import misinformation
import misinformation.summary as sm


Set the path to the folder that contains the input images:

In [2]:
images = misinformation.find_files(
    path="../data/images/",
    limit=1000,
)

In [3]:
mydict = misinformation.utils.initialize_dict(images)

In [4]:
mydict

{'100132S_ara': {'filename': '../data/images/100132S_ara.png'},
 '100447_ind': {'filename': '../data/images/100447_ind.png'},
 '100127S_ara': {'filename': '../data/images/100127S_ara.png'},
 '100134S_ara': {'filename': '../data/images/100134S_ara.png'},
 '109257_1_spa': {'filename': '../data/images/109257_1_spa.png'},
 '100130S_ara': {'filename': '../data/images/100130S_ara.png'},
 '100131S_ara': {'filename': '../data/images/100131S_ara.png'},
 '102135S_eng': {'filename': '../data/images/102135S_eng.png'},
 '102435S_2_eng': {'filename': '../data/images/102435S_2_eng.png'},
 '100368_asm': {'filename': '../data/images/100368_asm.png'},
 '100361_asm': {'filename': '../data/images/100361_asm.png'},
 '102141_1_eng': {'filename': '../data/images/102141_1_eng.png'},
 '106958S_por': {'filename': '../data/images/106958S_por.png'},
 '102134S_eng': {'filename': '../data/images/102134S_eng.png'},
 '102133S_eng': {'filename': '../data/images/102133S_eng.png'},
 '100450_ind': {'filename': '../data/i
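
The output above suggests that each image is keyed by its file name without the extension and stored with a `filename` entry. As a rough illustration of that structure, here is a minimal sketch using only the standard library. The helper names `find_image_files` and `build_image_dict` are hypothetical and are not the `misinformation` package's own functions; the `*.png` pattern and the sorted order are assumptions as well.

```python
from pathlib import Path


def find_image_files(path, pattern="*.png", limit=None):
    """Hypothetical file finder: list files under `path` that match `pattern`."""
    files = sorted(str(p) for p in Path(path).glob(pattern))
    # Keep at most `limit` files when a limit is given.
    return files if limit is None else files[:limit]


def build_image_dict(filenames):
    """Map each file's stem (name without extension) to a small record dict."""
    return {Path(name).stem: {"filename": name} for name in filenames}


example = build_image_dict(["../data/images/100132S_ara.png"])
# example == {'100132S_ara': {'filename': '../data/images/100132S_ara.png'}}
```

Keying the records by stem keeps the identifiers short while the full path stays available for loading the image later.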

## Create captions for images and directly write to csv

In [5]:
for key in mydict:
    mydict[key] = sm.SummaryDetector(mydict[key]).analyse_image()

Convert the dictionary of dictionaries into a dictionary of lists:

In [6]:
outdict = misinformation.utils.append_data_to_dict(mydict)
df = misinformation.utils.dump_df(outdict)
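
To make the conversion step concrete, the sketch below shows one way a dictionary of per-image dictionaries can be turned into a dictionary of equal-length lists, one list per column. The function name `dict_of_dicts_to_lists` is hypothetical, and filling missing entries with `None` is an assumption; the package's own helpers may behave differently.

```python
def dict_of_dicts_to_lists(records):
    """Collect the values of each inner key into one list per key.

    Keys missing from an inner dict are filled with None so that all
    lists end up with the same length.
    """
    # Gather column names in first-seen order.
    columns = []
    for inner in records.values():
        for key in inner:
            if key not in columns:
                columns.append(key)
    # Build one list per column, with one entry per record.
    return {col: [inner.get(col) for inner in records.values()] for col in columns}


records = {
    "img_a": {"filename": "a.png", "const_image_summary": "a cat"},
    "img_b": {"filename": "b.png"},
}
table = dict_of_dicts_to_lists(records)
# table == {'filename': ['a.png', 'b.png'], 'const_image_summary': ['a cat', None]}
```

A dictionary of equal-length lists like `table` can be passed directly to `pandas.DataFrame(...)` to obtain one row per image.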

Check the dataframe:

In [7]:
df.head(10)

| | filename | const_image_summary | 3_non-deterministic summary |
|---|---|---|---|
| 0 | ../data/images/100132S_ara.png | a white car parked in front of a building cove... | [someone has wrapped up a large plastic bag ov... |
| 1 | ../data/images/100447_ind.png | a woman drinking from a bottle while standing ... | [a woman drinks out of a bottle and stands nex... |
| 2 | ../data/images/100127S_ara.png | a map of the world with arabic writing | [a map of the world with a message in arabic, ... |
| 3 | ../data/images/100134S_ara.png | a woman is standing in front of a sign | [two women walking and talking to each other, ... |
| 4 | ../data/images/109257_1_spa.png | a man in a suit and tie making a face | [a man is smiling and making a funny face, man... |
| 5 | ../data/images/100130S_ara.png | a group of people walking down a street next t... | [two people on the street in front of a big tr... |
| 6 | ../data/images/100131S_ara.png | a group of people standing in front of a tv | [the president is addressing his nation of the... |
| 7 | ../data/images/102135S_eng.png | a woman standing in front of a store filled wi... | [people in a supermarket standing in front of ... |
| 8 | ../data/images/102435S_2_eng.png | a man in a suit and glasses is talking | [the man is speaking about his favorite tv sho... |
| 9 | ../data/images/100368_asm.png | a group of people standing next to each other | [people doing a job next to a line of men, men... |


Write the csv file:

In [8]:
df.to_csv("./data_out.csv")

## Manually inspect the summaries

To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment, so please be patient. If you are sure of what you are doing, you can skip this step.

`const_image_summary` - the constant summary, which does not change from run to run (produced by `analyse_image`).

`3_non-deterministic summary` - three example summaries that change from run to run (produced by `analyse_image`).
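
The notebook does not state how the two kinds of summaries are produced. A common reason for such a split is deterministic decoding (always pick the most likely output) versus random sampling. The toy sketch below illustrates that difference under this assumption. The candidate captions and their probabilities are made up, and the function names are hypothetical; no captioning model is involved.

```python
import random

# Toy candidates with made-up probabilities; this is not real model output.
CANDIDATES = {
    "a map of the world": 0.5,
    "a world map with text": 0.3,
    "a poster showing a map": 0.2,
}


def pick_deterministic(candidates):
    """Always return the most probable candidate (same result on every run)."""
    return max(candidates, key=candidates.get)


def pick_sampled(candidates, n=3, rng=None):
    """Draw n candidates at random, weighted by probability (varies per run)."""
    rng = rng or random.Random()
    texts = list(candidates)
    weights = [candidates[t] for t in texts]
    return rng.choices(texts, weights=weights, k=n)


constant_summary = pick_deterministic(CANDIDATES)
sampled_summaries = pick_sampled(CANDIDATES, n=3)
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes the sampled results repeatable, which can help when comparing runs.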

In [9]:
misinformation.explore_analysis(mydict, identify="summary")


## Generate answers to free-form natural-language questions about images

Set the list of questions:

In [10]:
list_of_questions = [
    "How many persons on the picture?",
    "Are there any politicians in the picture?",
    "Does the picture show something from medicine?",
]

In [11]:
for key in mydict:
    mydict[key] = sm.SummaryDetector(mydict[key]).analyse_questions(list_of_questions)
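
The loop above adds one answer per question to every image record. The following sketch shows that pattern in isolation. Both `add_answers` and `dummy_answer` are hypothetical names; `dummy_answer` merely stands in for a visual question answering model and never looks at the image. The keys are lowercased only to mirror the column names that appear in the results table; whether the package lowercases them itself is an assumption.

```python
def add_answers(record, questions, answer_fn):
    """Return a copy of `record` with one new entry per question.

    `answer_fn(filename, question)` is a hypothetical callable that stands in
    for a visual question answering model.
    """
    updated = dict(record)
    for question in questions:
        # Store each answer under the lowercased question text.
        updated[question.lower()] = answer_fn(record["filename"], question)
    return updated


def dummy_answer(filename, question):
    """Placeholder answer function; it ignores both arguments."""
    return "unknown"


questions = ["How many persons on the picture?"]
record = {"filename": "../data/images/100132S_ara.png"}
answered = add_answers(record, questions, dummy_answer)
```

Returning a copy instead of modifying `record` in place keeps the original entry available if the questions are changed and the step is rerun.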

In [12]:
misinformation.explore_analysis(mydict, identify="summary")


Convert the dictionary of dictionaries into a dictionary of lists:

In [13]:
outdict2 = misinformation.utils.append_data_to_dict(mydict)
df2 = misinformation.utils.dump_df(outdict2)

In [14]:
df2.head(10)

| | filename | const_image_summary | 3_non-deterministic summary | how many persons on the picture? | are there any politicians in the picture? | does the picture show something from medicine? |
|---|---|---|---|---|---|---|
| 0 | ../data/images/100132S_ara.png | a white car parked in front of a building cove... | [the man is sitting on a car near a large bann... | 1 | no | no |
| 1 | ../data/images/100447_ind.png | | | 2 | no | yes |
| 2 | ../data/images/100127S_ara.png | | | 0 | no | no |
| 3 | ../data/images/100134S_ara.png | | | 2 | no | yes |
| 4 | ../data/images/109257_1_spa.png | | | 1 | yes | no |
| 5 | ../data/images/100130S_ara.png | | | 3 | no | no |
| 6 | ../data/images/100131S_ara.png | | | many | yes | no |
| 7 | ../data/images/102135S_eng.png | | | 6 | no | no |
| 8 | ../data/images/102435S_2_eng.png | | | 1 | yes | no |
| 9 | ../data/images/100368_asm.png | | | 15 | yes | no |


In [15]:
df2.to_csv("./data_out2.csv")