# Image summary and visual question answering

This notebook shows how to generate image summaries (captions) and answer questions about images (visual question answering, VQA) with AMMICO.

The first cell imports `ammico`.


In [None]:
import ammico

The cell below loads the model used for image summaries and VQA. By default, it loads a large model on the GPU if your device supports CUDA; otherwise it loads a smaller model on the CPU. You can override these defaults, for example to run a small model on the GPU.

In [None]:
model = ammico.MultimodalSummaryModel()
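
To see which branch of that default is likely to apply on your machine, you can check for CUDA yourself. This is a minimal sketch that assumes the check is equivalent to PyTorch's `torch.cuda.is_available()`, which the notebook does not state; `pick_device` is a hypothetical helper and not part of `ammico`.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to probe
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Device the default configuration would likely target: {device}")
```

If this prints `cpu`, expect the smaller model to be loaded.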

Next, provide the path to the Google Drive folder or local folder that contains your images.

In [None]:
image_dict = ammico.find_files(
    path="../../data/in",
    limit=-1,  # -1 means no limit on the number of files; the default is 20
)
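
To make the `limit` semantics concrete, the sketch below mimics the behavior described in the comment using only the standard library. It is not the `ammico.find_files` implementation: `collect_images`, the set of file suffixes, and the shape of the returned dictionary (image ID mapped to `{"filename": ...}`) are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image suffixes; the real function may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def collect_images(path: str, limit: int = 20) -> dict:
    """Hypothetical stand-in: map each image's file stem to {"filename": path}."""
    files = sorted(
        p
        for p in Path(path).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if limit != -1:  # -1 disables the cap, as in the cell above
        files = files[:limit]
    # Files that share a stem overwrite each other in this simple sketch.
    return {p.stem: {"filename": str(p)} for p in files}
```

Calling `collect_images(folder, limit=-1)` returns every matching file under `folder`, while the default returns at most 20.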

The cell below creates the detector, which uses the loaded model to analyze the images in `image_dict`.

In [None]:
img = ammico.ImageSummaryDetector(summary_model=model, subdict=image_dict)

## Image summary 

To start working with the images, call the `analyse_images_from_dict` method.

You can specify what kind of analysis you want to perform with `analysis_type`. `"summary"` will generate a summary for all pictures in your dictionary, `"questions"` will prepare answers to your questions for all pictures, and `"summary_and_questions"` will do both.

The `is_concise_summary` parameter controls the length of the summary.

Here we request a long summary for each image in the dictionary.

In [None]:
summaries = img.analyse_images_from_dict(
    analysis_type="summary", is_concise_summary=False
)
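
Once the call returns, you will probably want to look at the generated text. The sketch below assumes that the result is a dictionary keyed by image ID and that each entry stores its summary under a `"caption"` key; both are assumptions, so inspect `summaries` and adjust the key if it differs. `summary_rows` is a hypothetical helper, and the example data are placeholders rather than model output.

```python
def summary_rows(results: dict, key: str = "caption") -> list:
    """Return (image_id, text) pairs for entries that carry a summary under `key`."""
    rows = []
    for image_id, entry in results.items():
        text = entry.get(key)
        if text is None:
            continue  # skip entries without a summary
        if isinstance(text, list):  # tolerate a list of strings
            text = " ".join(text)
        rows.append((image_id, text))
    return rows


# Placeholder data standing in for `summaries`; not real model output.
example = {
    "img_001": {"filename": "img_001.png", "caption": "placeholder summary text"},
    "img_002": {"filename": "img_002.png"},  # no summary, so it is skipped
}
for image_id, text in summary_rows(example):
    print(f"{image_id}: {text}")
```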

## VQA

Besides generating summaries, the same model can answer questions about images (VQA mode). To do this, define the questions to ask about every image in your dictionary.

In [None]:
questions = ["Are there any visible signs of violence?", "Is it safe to be there?"]

The cell below runs VQA mode. Set `is_concise_answer=True` to receive short answers (the recommended option) or `False` for longer ones.

In [None]:
vqa_results = img.analyse_images_from_dict(
    analysis_type="questions",
    list_of_questions=questions,
    is_concise_answer=True,
)
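
Because every question is applied to every image, it helps to line the answers up with their questions when reading the results. The sketch below assumes the answers for one image come back as a list in the same order as the questions; that ordering is an assumption, as is the helper name `pair_answers`. The answers shown are placeholders, not model output.

```python
def pair_answers(questions: list, answers: list) -> dict:
    """Map each question to its answer, assuming both lists share one order."""
    if len(questions) != len(answers):
        raise ValueError(
            f"expected {len(questions)} answers, got {len(answers)}"
        )
    return dict(zip(questions, answers))


example_questions = [
    "Are there any visible signs of violence?",
    "Is it safe to be there?",
]
# Placeholder answers for a single image; not real model output.
example_answers = ["<answer 1>", "<answer 2>"]

for question, answer in pair_answers(example_questions, example_answers).items():
    print(f"Q: {question}\nA: {answer}")
```

The length check makes a mismatch visible right away instead of silently dropping a question.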