{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Image summary and visual question answering" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows some preliminary work on image captioning and visual question answering with LAVIS. It is mainly meant to explore the library's capabilities and to decide on future research directions. We package our code into the `ammico` package, which is imported here:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:25:33.335353Z", "iopub.status.busy": "2023-05-05T07:25:33.334707Z", "iopub.status.idle": "2023-05-05T07:25:47.294805Z", "shell.execute_reply": "2023-05-05T07:25:47.293740Z" }, "tags": [] }, "outputs": [], "source": [ "from ammico import utils as mutils\n", "from ammico import display as mdisplay\n", "import ammico.summary as sm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the path to the input images." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:25:47.299092Z", "iopub.status.busy": "2023-05-05T07:25:47.298422Z", "iopub.status.idle": "2023-05-05T07:25:47.305577Z", "shell.execute_reply": "2023-05-05T07:25:47.304412Z" }, "tags": [] }, "outputs": [], "source": [ "images = mutils.find_files(\n", " path=\"data/\",\n", " limit=10,\n", ")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:25:47.309146Z", "iopub.status.busy": "2023-05-05T07:25:47.308610Z", "iopub.status.idle": "2023-05-05T07:25:47.312885Z", "shell.execute_reply": "2023-05-05T07:25:47.311970Z" }, "tags": [] }, "outputs": [], "source": [ "mydict = mutils.initialize_dict(images)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create captions for images and directly write to csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here you can choose between two models: \"base\" or \"large\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:25:47.316783Z", "iopub.status.busy": "2023-05-05T07:25:47.316523Z", "iopub.status.idle": "2023-05-05T07:26:42.776280Z", "shell.execute_reply": "2023-05-05T07:26:42.775161Z" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r", " 0%| | 0.00/2.50G [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
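<table>
<thead>
<tr><th></th><th>filename</th><th>const_image_summary</th><th>3_non-deterministic summary</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>data/102141_2_eng.png</td><td>a collage of images including a corona sign, a...</td><td>[a person with glasses on holding a pipe, a pe...</td></tr>
<tr><th>1</th><td>data/106349S_por.png</td><td>a man wearing a face mask while looking at a c...</td><td>[a man holding a microphone and wearing a face...</td></tr>
<tr><th>2</th><td>data/102730_eng.png</td><td>two people in blue coats spray disinfection a van</td><td>[a couple of people that are spraying some kin...</td></tr>
</tbody>
</table>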
\n", "" ], "text/plain": [ " filename const_image_summary \n", "0 data/102141_2_eng.png a collage of images including a corona sign, a... \\\n", "1 data/106349S_por.png a man wearing a face mask while looking at a c... \n", "2 data/102730_eng.png two people in blue coats spray disinfection a van \n", "\n", " 3_non-deterministic summary \n", "0 [a person with glasses on holding a pipe, a pe... \n", "1 [a man holding a microphone and wearing a face... \n", "2 [a couple of people that are spraying some kin... " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write the csv file:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:27:51.151351Z", "iopub.status.busy": "2023-05-05T07:27:51.150899Z", "iopub.status.idle": "2023-05-05T07:27:51.157158Z", "shell.execute_reply": "2023-05-05T07:27:51.156081Z" }, "tags": [] }, "outputs": [], "source": [ "df.to_csv(\"./data_out.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manually inspect the summaries\n", "\n", "To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment, so please be patient. If you are sure of what you are doing, you can skip this step.\n", "\n", "`const_image_summary` - the deterministic summary, which does not change from run to run (`analyse_image`).\n", "\n", "`3_non-deterministic summary` - three example summaries that change from run to run (`analyse_image`). 
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:27:51.160728Z", "iopub.status.busy": "2023-05-05T07:27:51.160276Z", "iopub.status.idle": "2023-05-05T07:27:52.175322Z", "shell.execute_reply": "2023-05-05T07:27:52.173921Z" }, "tags": [] }, "outputs": [ { "ename": "AttributeError", "evalue": "module 'ammico.display' has no attribute 'explore_analysis'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[9], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mmdisplay\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexplore_analysis\u001b[49m(mydict, identify\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124msummary\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", "\u001b[0;31mAttributeError\u001b[0m: module 'ammico.display' has no attribute 'explore_analysis'" ] } ], "source": [ "mdisplay.explore_analysis(mydict, identify=\"summary\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate answers to free-form questions about images written in natural language. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the list of questions" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:27:52.180084Z", "iopub.status.busy": "2023-05-05T07:27:52.179734Z", "iopub.status.idle": "2023-05-05T07:27:52.184640Z", "shell.execute_reply": "2023-05-05T07:27:52.183748Z" } }, "outputs": [], "source": [ "list_of_questions = [\n", " \"How many persons on the picture?\",\n", " \"Are there any politicians in the picture?\",\n", " \"Does the picture show something from medicine?\",\n", "]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:27:52.188241Z", "iopub.status.busy": "2023-05-05T07:27:52.187913Z", "iopub.status.idle": "2023-05-05T07:29:23.432971Z", "shell.execute_reply": "2023-05-05T07:29:23.429673Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\r", " 0%| | 0.00/1.35G [00:00 1\u001b[0m \u001b[43mmdisplay\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexplore_analysis\u001b[49m(mydict, identify\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124msummary\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", "\u001b[0;31mAttributeError\u001b[0m: module 'ammico.display' has no attribute 'explore_analysis'" ] } ], "source": [ "mdisplay.explore_analysis(mydict, identify=\"summary\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the dictionary of dictionarys into a dictionary with lists:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:29:23.944483Z", "iopub.status.busy": "2023-05-05T07:29:23.943772Z", "iopub.status.idle": "2023-05-05T07:29:23.954499Z", "shell.execute_reply": "2023-05-05T07:29:23.953487Z" } }, "outputs": [], "source": [ "outdict2 = mutils.append_data_to_dict(mydict)\n", "df2 = mutils.dump_df(outdict2)" ] }, { "cell_type": "code", 
"execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:29:23.959267Z", "iopub.status.busy": "2023-05-05T07:29:23.958517Z", "iopub.status.idle": "2023-05-05T07:29:23.981117Z", "shell.execute_reply": "2023-05-05T07:29:23.980178Z" } }, "outputs": [ { "data": { "text/html": [ "
<table>
<thead>
<tr><th></th><th>filename</th><th>const_image_summary</th><th>3_non-deterministic summary</th><th>How many persons on the picture?</th><th>Are there any politicians in the picture?</th><th>Does the picture show something from medicine?</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>data/102141_2_eng.png</td><td>a collage of images including a corona sign, a...</td><td>[a person with glasses on holding a pipe, a pe...</td><td>1</td><td>no</td><td>yes</td></tr>
<tr><th>1</th><td>data/106349S_por.png</td><td>a man wearing a face mask while looking at a c...</td><td>[a man holding a microphone and wearing a face...</td><td>1</td><td>yes</td><td>yes</td></tr>
<tr><th>2</th><td>data/102730_eng.png</td><td>two people in blue coats spray disinfection a van</td><td>[a couple of people that are spraying some kin...</td><td>2</td><td>no</td><td>yes</td></tr>
</tbody>
</table>
" ], "text/plain": [ " filename const_image_summary \n", "0 data/102141_2_eng.png a collage of images including a corona sign, a... \\\n", "1 data/106349S_por.png a man wearing a face mask while looking at a c... \n", "2 data/102730_eng.png two people in blue coats spray disinfection a van \n", "\n", " 3_non-deterministic summary \n", "0 [a person with glasses on holding a pipe, a pe... \\\n", "1 [a man holding a microphone and wearing a face... \n", "2 [a couple of people that are spraying some kin... \n", "\n", " How many persons on the picture? Are there any politicians in the picture? \n", "0 1 no \\\n", "1 1 yes \n", "2 2 no \n", "\n", " Does the picture show something from medicine? \n", "0 yes \n", "1 yes \n", "2 yes " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.head(10)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-05-05T07:29:23.985003Z", "iopub.status.busy": "2023-05-05T07:29:23.984485Z", "iopub.status.idle": "2023-05-05T07:29:23.993360Z", "shell.execute_reply": "2023-05-05T07:29:23.992313Z" } }, "outputs": [], "source": [ "df2.to_csv(\"./data_out2.csv\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "vscode": { "interpreter": { "hash": "f1142466f556ab37fe2d38e2897a16796906208adb09fea90ba58bdf8a56f0ba" } } }, "nbformat": 4, "nbformat_minor": 4 }