{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Image summary and visual question answering" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "This notebook shows how to generate image captions and use the visual question answering with AMMICO. \n", "\n", "The first cell imports `ammico`.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "import ammico" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "The cell below loads the model for VQA tasks. By default, it loads a large model on the GPU (if your device supports CUDA), otherwise it loads a relatively smaller model on the CPU. But you can specify other settings (e.g., a small model on the GPU) if you want." ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": {}, "outputs": [], "source": [ "model = ammico.MultimodalSummaryModel()" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "Here you need to provide the path to your google drive folder or local folder containing the images" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "image_dict = ammico.find_files(\n", " path=str(\"../../data/in\"),\n", " limit=-1, # -1 means no limit on the number of files, by default it is set to 20\n", ")" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "The cell below creates an object that analyzes images and generates a summary using a specific model and image data." ] }, { "cell_type": "code", "execution_count": null, "id": "8", "metadata": {}, "outputs": [], "source": [ "img = ammico.ImageSummaryDetector(summary_model=model, subdict=image_dict)" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "## Image summary " ] }, { "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "To start your work with images, you should call the `analyse_images` method.\n", "\n", "You can specify what kind of analysis you want to perform with `analysis_type`. `\"summary\"` will generate a summary for all pictures in your dictionary, `\"questions\"` will prepare answers to your questions for all pictures, and `\"summary_and_questions\"` will do both.\n", "\n", "Parameter `\"is_concise_summary\"` regulates the length of an answer.\n", "\n", "Here we want to get a long summary on each object in our image dictionary." ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "summaries = img.analyse_images_from_dict(\n", " analysis_type=\"summary\", is_concise_summary=False\n", ")" ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "## VQA" ] }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "In addition to analyzing images in `ammico`, the same model can be used in VQA mode. To do this, you need to define the questions that will be applied to all images from your dict." ] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "questions = [\"Are there any visible signs of violence?\", \"Is it safe to be there?\"]" ] }, { "cell_type": "markdown", "id": "15", "metadata": {}, "source": [ "Here is an example of VQA mode usage. You can specify whether you want to receive short answers (recommended option) or not." 
] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "vqa_results = img.analyse_images_from_dict(\n", " analysis_type=\"questions\",\n", " list_of_questions=questions,\n", " is_concise_answer=True,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "ammico", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 5 }