# Image summary and visual question answering

This notebook shows how to generate image captions and use visual question answering with the [LAVIS](https://github.com/salesforce/LAVIS) library.

The first cell only runs on Google Colab and installs the [ammico](https://github.com/ssciwr/AMMICO) package.

After that, you can import `ammico` and read in the files given a folder path.

In [None]:
# if running on google colab
# flake8-noqa-cell
import os

if "google.colab" in str(get_ipython()):
    # update python version
    # install setuptools
    # %pip install setuptools==61 -qqq
    # install ammico
    %pip install git+https://github.com/ssciwr/ammico.git -qqq
    # mount google drive for data and API key
    from google.colab import drive

    drive.mount("/content/drive")

In [None]:
import ammico

In [None]:
# Here you need to provide the path to your google drive folder
# or local folder containing the images
images = ammico.find_files(
    path="/content/drive/MyDrive/misinformation-data/",
    limit=10,
)
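
To make the idea of "read in the files given a folder path" with a `limit` concrete, here is a minimal standalone sketch. The helper `list_images` and the suffix set are hypothetical and only illustrate the concept; this is not how `ammico.find_files` is implemented, and the recursive search is an assumption.

```python
from pathlib import Path

# Hypothetical suffix list for illustration; ammico may accept other formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def list_images(path, limit=10):
    """Return up to `limit` image paths found under `path`, sorted by name.

    Illustrative stand-in only, not ammico's implementation.
    """
    folder = Path(path)
    found = sorted(
        str(p)
        for p in folder.rglob("*")  # assumption: search subfolders as well
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # limit=None means "return everything that was found"
    return found if limit is None else found[:limit]
```

Keeping `limit` small while testing is useful because the analysis runs once per image.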

In [None]:
mydict = ammico.initialize_dict(images)

## Create captions for images and directly write to csv

Here you can choose between two models: `"base"` or `"large"`. The analysis generates a caption for each image and writes the results directly into your dictionary `mydict`. You can then convert the dictionary into a dataframe and export it as a .csv file.

The results are written to the following columns:
- `const_image_summary`: the permanent summary, which does not change from run to run (`analyse_image`).
- `3_non-deterministic summary`: three different summaries generated with different seeds, which change from run to run (`analyse_image`).
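
The difference between the two columns can be illustrated with a toy example. The candidate captions and both functions below are invented for illustration; the sketch only shows why a deterministic choice stays constant while seeded sampling depends on the seeds. It does not reproduce how ammico or LAVIS generate text.

```python
import random

# Invented candidate captions, standing in for what a captioning model might produce.
CANDIDATES = [
    "a dog on a beach",
    "a dog running on sand",
    "a brown dog near the sea",
    "an animal by the water",
]


def constant_summary():
    """Deterministic stand-in: always returns the same caption."""
    return CANDIDATES[0]


def sampled_summaries(seeds):
    """Seeded stand-in: draws one caption per seed."""
    return [random.Random(seed).choice(CANDIDATES) for seed in seeds]


# The same seeds reproduce the same draws; new seeds may give different ones.
first = sampled_summaries([1, 2, 3])
again = sampled_summaries([1, 2, 3])
```

If the seeds differ between runs, the sampled captions can differ too, which is the behaviour described for the non-deterministic column.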

You can also specify the kind of analysis to perform with `analysis_type`: `"summary"` generates a summary for every picture in your dictionary `mydict`, `"questions"` answers your questions for every picture, and `"summary_and_questions"` does both.
If you load the models (`summary_model`, `summary_vis_processors` for `"summary"` and `summary_vqa_model`, `summary_vqa_vis_processors`, `summary_vqa_txt_processors` for `"questions"`) into memory beforehand and pass them to the function, the analysis can run many times faster.
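
The reason preloading helps can be sketched without any real model. `FakeModel` and `analyse` below are hypothetical stand-ins, not ammico classes; the sketch assumes that a detector created without a model has to load one itself, and counts how often that happens.

```python
class FakeModel:
    """Stand-in for an expensive model; counts how often it is constructed."""

    loads = 0

    def __init__(self):
        FakeModel.loads += 1

    def caption(self, image_name):
        return f"caption for {image_name}"


def analyse(image_name, model=None):
    """Analyse one image, loading a model only if none was passed in."""
    model = model if model is not None else FakeModel()
    return model.caption(image_name)


images = ["a.png", "b.png", "c.png"]

# Without reuse: every call loads its own model.
FakeModel.loads = 0
results_slow = [analyse(name) for name in images]
loads_without_reuse = FakeModel.loads

# With reuse: load once, then pass the same model to every call.
FakeModel.loads = 0
shared = FakeModel()
results_fast = [analyse(name, model=shared) for name in images]
loads_with_reuse = FakeModel.loads
```

Both variants return the same results, but the second one constructs the model once instead of once per image.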



In [None]:
obj = ammico.SummaryDetector(mydict)
# Load the base model into memory once; reusing it can dramatically speed up the analysis.
summary_model, summary_vis_processors = obj.load_model(model_type="base")
# summary_model, summary_vis_processors = obj.load_model(model_type="large")

In [None]:
for key in mydict:
    mydict[key] = ammico.SummaryDetector(
        mydict[key],  # the dictionary entry for this image
        analysis_type="summary",  # one of "summary", "questions", "summary_and_questions"
        summary_model=summary_model,  # the preloaded summary model
        summary_vis_processors=summary_vis_processors,  # the preloaded visual processors
    ).analyse_image()

### Convert to dataframe and write csv

Convert the dictionary of dictionaries into a dictionary with lists:

In [None]:
outdict = ammico.append_data_to_dict(mydict)
df = ammico.dump_df(outdict)
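
To show what "a dictionary with lists" means here, the following standalone sketch performs the same kind of reshaping in plain Python. The function `to_columns` and the example keys are hypothetical and are not ammico's implementation; the sketch assumes every image entry has the same fields.

```python
def to_columns(nested):
    """Turn {image_id: {field: value}} into {field: [value, ...]}.

    Assumes every entry has the same fields, so all lists end up equally long.
    """
    columns = {}
    for entry in nested.values():
        for field, value in entry.items():
            columns.setdefault(field, []).append(value)
    return columns


# Invented example data with two images and two fields each.
nested = {
    "img1": {"filename": "img1.png", "const_image_summary": "a dog on a beach"},
    "img2": {"filename": "img2.png", "const_image_summary": "a city at night"},
}
columns = to_columns(nested)
```

Each key of `columns` becomes one column and each list position one row, which is the layout a dataframe constructor expects.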

Check the dataframe:

In [None]:
df.head(10)

Write the csv file:

In [None]:
df.to_csv("/content/drive/MyDrive/misinformation-data/data_out.csv")

## Generate answers to free-form, natural-language questions about images

Define your questions as a list of strings `list_of_questions`, load the models into memory, and pass them to the function.

In [None]:
list_of_questions = [
    "How many persons are in the picture?",
    "Are there any politicians in the picture?",
    "Does the picture show something from medicine?",
]

In [None]:
# Load the VQA model into memory once; reusing it can dramatically speed up the analysis.
(
    summary_vqa_model,
    summary_vqa_vis_processors,
    summary_vqa_txt_processors,
) = obj.load_vqa_model()


In [None]:
for key in mydict:
    mydict[key] = ammico.SummaryDetector(
        mydict[key],
        analysis_type="questions",
        summary_vqa_model=summary_vqa_model,
        summary_vqa_vis_processors=summary_vqa_vis_processors,
        summary_vqa_txt_processors=summary_vqa_txt_processors,         
        ).analyse_questions(list_of_questions)

Alternatively, you can perform both types of analysis at once with `analysis_type="summary_and_questions"`.

In [None]:
for key in mydict:
    mydict[key] = ammico.SummaryDetector(
        mydict[key],
        analysis_type="summary_and_questions",
        summary_model=summary_model,                   
        summary_vis_processors=summary_vis_processors,
        summary_vqa_model=summary_vqa_model,
        summary_vqa_vis_processors=summary_vqa_vis_processors,
        summary_vqa_txt_processors=summary_vqa_txt_processors,         
        ).analyse_questions(list_of_questions)

### Convert to dataframe and write csv
These steps convert the dictionary of dictionaries into a dictionary with lists, which can then be converted into a pandas dataframe and exported to a csv file.

In [None]:
outdict2 = ammico.append_data_to_dict(mydict)
df2 = ammico.dump_df(outdict2)

In [None]:
df2.head(10)

In [None]:
df2.to_csv("/content/drive/MyDrive/misinformation-data/data_out2.csv")

## Manually inspect the summaries and visual question answering

To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment because the large model is loaded into memory for every picture, so please be patient. In this widget you can select the picture, the type of analysis, and the question.


In [None]:
analysis_explorer = ammico.AnalysisExplorer(mydict)
analysis_explorer.run_server(port=8055)