зеркало из
https://github.com/ssciwr/AMMICO.git
synced 2025-10-29 13:06:04 +02:00
586 строки
15 KiB
Plaintext
586 строки
15 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Image summary and visual question answering"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"This notebooks shows how to generate image captions and use the visual question answering with [LAVIS](https://github.com/salesforce/LAVIS) library. \n",
|
|
"\n",
|
|
"The first cell is only run on google colab and installs the [ammico](https://github.com/ssciwr/AMMICO) package.\n",
|
|
"\n",
|
|
"After that, you can import `ammico` and read in the files given a folder path."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# if running on google colab\n",
|
|
"# flake8-noqa-cell\n",
|
|
"import os\n",
|
|
"\n",
|
|
"if \"google.colab\" in str(get_ipython()):\n",
|
|
" # update python version\n",
|
|
" # install setuptools\n",
|
|
" # %pip install setuptools==61 -qqq\n",
|
|
" # install ammico\n",
|
|
" %pip install git+https://github.com/ssciwr/ammico.git -qqq\n",
|
|
" # mount google drive for data and API key\n",
|
|
" from google.colab import drive\n",
|
|
"\n",
|
|
" drive.mount(\"/content/drive\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import ammico"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Here you need to provide the path to your google drive folder\n",
|
|
"# or local folder containing the images\n",
|
|
"mydict = ammico.find_files(\n",
|
|
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
|
|
" #path=\"../../data/images/\",\n",
|
|
" limit=2,\n",
|
|
")\n",
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Create captions for images and directly write to csv"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Here you can choose between models: `\"base\"` or `\"large\"`. This will generate the caption for each image and directly put the results in your dictionary `mydict`. Then you can transform it into the dataframe and this dataframe can be exported as a .csv file.\n",
|
|
"\n",
|
|
"The results are written in the columns: \n",
|
|
"- `const_image_summary` - the permanent summaries, which do not change from run to run (analyse_image).\n",
|
|
"- `3_non-deterministic summary` displays three different summaries generated with different seeds that change from run to run (analyse_image). \n",
|
|
"\n",
|
|
"You can also specify what kind of analysis you want to perform with `analysis_type`. `\"summary\"` will generate a summary for all pictures in your dictionary `mydict`, `\"questions\"` will prepare answers to your questions for all pictures, and `\"summary_and_questions\"` will do both. \n",
|
|
"If you load the models (`summary_model`, `summary_vis_processors` for `\"summary\"` and `summary_vqa_model`, `summary_vqa_vis_processors`, `summary_vqa_txt_processors` for `\"questions\"`) into memory beforehand and pass them to the function, it can speed up the analysis many times. \n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"obj = ammico.SummaryDetector(mydict, analysis_type=\"summary\", model_type=\"base\") # here we load the base model to the memory. This can dramatically speed up the calculation process then.\n",
|
|
"#obj = ammico.SummaryDetector(mydict, analysis_type=\"summary\", model_type=\"large\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(analysis_type=\"summary\", subdict = mydict[key])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"source": [
|
|
"### Convert to dataframe and write csv\n",
|
|
"\n",
|
|
"Convert the dictionary of dictionarys into a dictionary with lists:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"outdict = ammico.append_data_to_dict(mydict)\n",
|
|
"df = ammico.dump_df(outdict)"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Check the dataframe:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.head(10)"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Write the csv file:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Generate answers to free-form questions about images written in natural language. "
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Load the model to the memory through object creation with parameters `analysis_type=\"questions\"` and `model_type=\"vqa\"`. Set the list of questions as a list of strings `list_of_questions`, and pass them to the function"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"list_of_questions = [\n",
|
|
" \"How many persons on the picture?\",\n",
|
|
" \"Are there any politicians in the picture?\",\n",
|
|
" \"Does the picture show something from medicine?\",\n",
|
|
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"obj = ammico.SummaryDetector(mydict, analysis_type=\"questions\", model_type=\"vqa\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(subdict = mydict[key], analysis_type=\"questions\", list_of_questions = list_of_questions)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Or you can perform two types of analysis at a time `analysis_type=\"summary_and_questions\"`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"obj = ammico.SummaryDetector(mydict, analysis_type=\"summary_and_questions\", model_type=\"base\", list_of_questions = list_of_questions)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(subdict = mydict[key], analysis_type=\"summary_and_questions\", list_of_questions = list_of_questions)"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Convert to dataframe and write csv\n",
|
|
"These steps are required to convert the dictionary of dictionarys into a dictionary with lists, that can be converted into a pandas dataframe and exported to a csv file."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"outdict2 = ammico.append_data_to_dict(mydict)\n",
|
|
"df2 = ammico.dump_df(outdict2)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df2.head(10)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df2.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out2.csv\")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Manually inspect the summaries and visual question answering\n",
|
|
"\n",
|
|
"To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment since it loads the big model to memory for every picture, so please be patient. If you are sure of what you are doing. In this widget you can select the picture, the type of analysis and the question.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"analysis_explorer = ammico.AnalysisExplorer(mydict)\n",
|
|
"analysis_explorer.run_server(port=8055)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# New models\n",
|
|
"This is very heavy models. They requare approx 60GB of RAM and they can use 20+GB memory GPUs for acceleration."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"obj = ammico.SummaryDetector(mydict, analysis_type = \"summary_and_questions\", model_type = \"blip2_t5_caption_coco_flant5xl\", device_type= \"cpu\")\n",
|
|
"# list of the new models that can be used:\n",
|
|
"# \"blip2_t5_pretrain_flant5xxl\",\n",
|
|
"# \"blip2_t5_pretrain_flant5xl\",\n",
|
|
"# \"blip2_t5_caption_coco_flant5xl\",\n",
|
|
"# \"blip2_opt_pretrain_opt2.7b\",\n",
|
|
"# \"blip2_opt_pretrain_opt6.7b\",\n",
|
|
"# \"blip2_opt_caption_coco_opt2.7b\",\n",
|
|
"# \"blip2_opt_caption_coco_opt6.7b\",\n",
|
|
"\n",
|
|
"# You can use `pretrain_` model types for zero-shot image-to-text generation with prompts.\n",
|
|
"# Or you can use `caption_coco_`` model types to generate coco-style captions.\n",
|
|
"# `flant5` and `opt` means that the model equipped with FlanT5 and OPT LLMs respectively.\n",
|
|
"\n",
|
|
"#also you can perform all calculation on cpu if you set device_type= \"cpu\" or gpu if you set device_type= \"cuda\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(subdict = mydict[key], analysis_type=\"summary_and_questions\")\n",
|
|
"\n",
|
|
"# analysis_type can be \n",
|
|
"# \"summary\",\n",
|
|
"# \"questions\",\n",
|
|
"# \"summary_and_questions\"."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can also pass a list of questions to this cell if `analysis_type=\"summary_and_questions\"` or `analysis_type=\"questions\"`. But the format of questions has changed in new models. \n",
|
|
"\n",
|
|
"Here is an example of a list of questions:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"list_of_questions = [\n",
|
|
" \"Question: Are there people in the image? Answer:\",\n",
|
|
" \"Question: What is this picture about? Answer:\",\n",
|
|
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(subdict = mydict[key], analysis_type=\"questions\", list_of_questions=list_of_questions)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can also pass a question with previous answers as context into this model and pass in questions like this one to get a more accurate answer:\n",
|
|
"\n",
|
|
"You can combine as many questions as you want in a single query as a list."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"list_of_questions = [\n",
|
|
" \"Question: What country is in the picture? Answer: USA. Question: Why? Answer: Because there is an American flag in the background . Question: Where it comes from? Answer:\",\n",
|
|
" \"Question: Which city is this? Answer: Frankfurt. Question: why?\",\n",
|
|
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(subdict = mydict[key], analysis_type=\"questions\", list_of_questions=list_of_questions)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can also ask sequential questions if you pass the argument `cosequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"list_of_questions = [\n",
|
|
" \"Question: Is this picture taken inside or outside? Answer:\",\n",
|
|
" \"Question: Why? Answer:\",\n",
|
|
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for key in mydict:\n",
|
|
" mydict[key] = obj.analyse_image(subdict = mydict[key], analysis_type=\"questions\", list_of_questions=list_of_questions, consequential_questions=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Convert to dataframe and write csv\n",
|
|
"\n",
|
|
"Convert the dictionary of dictionarys into a dictionary with lists:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"outdict = ammico.append_data_to_dict(mydict)\n",
|
|
"df = ammico.dump_df(outdict)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"mydict"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Check the dataframe:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.head(10)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Write the csv file:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.13"
|
|
},
|
|
"vscode": {
|
|
"interpreter": {
|
|
"hash": "f1142466f556ab37fe2d38e2897a16796906208adb09fea90ba58bdf8a56f0ba"
|
|
}
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|