diff --git a/Dockerfile b/Dockerfile
index af5c26f..5727154 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -11,9 +11,6 @@ COPY --chown=${NB_UID} . /opt/misinformation
 # Install the Python package
 RUN python -m pip install /opt/misinformation
 
-# Install additional dependencies for running the notebooks
-RUN python -m pip install -r /opt/misinformation/requirements.txt
-
 # Make JupyterLab the default for this application
 ENV JUPYTER_ENABLE_LAB=yes
 
diff --git a/README.md b/README.md
index 72c4493..16ce16e 100644
--- a/README.md
+++ b/README.md
@@ -44,15 +44,20 @@ This will install the package and its dependencies locally.
 ## Usage
 
 There are sample notebooks in the `misinformation/notebooks` folder for you to explore the package:
-1. Text analysis: Use the notebook `get-text-from-image.ipynb` to extract any text from the images. The text is directly translated into English. If the text should be further analysed, set the keyword `analyse_text` to `True` as demonstrated in the notebook.\
+1. Text extraction: Use the notebook `get-text-from-image.ipynb` to extract any text from the images. The text is directly translated into English. If the text should be further analysed, set the keyword `analyse_text` to `True` as demonstrated in the notebook.\
 **You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/get-text-from-image.ipynb)**
 Place the data files and google cloud vision API key in your google drive to access the data.
-1. Facial analysis: Use the notebook `facial_expressions.ipynb` to identify if there are faces on the image, if they are wearing masks, and if they are not wearing masks also the race, gender and dominant emotion.
+1. Emotion recognition: Use the notebook `facial_expressions.ipynb` to identify if there are faces on the image, if they are wearing masks, and if they are not wearing masks also the race, gender and dominant emotion.
 **You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/facial_expressions.ipynb)**
-Place the data files in your google drive to access the data.**
+Place the data files in your google drive to access the data.
+1. Content extraction: Use the notebook `image_summary.ipynb` to create captions for the images and ask questions about the image content.
+**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/image_summary.ipynb)**
+1. Multimodal content: Use the notebook `multimodal_search.ipynb` to find the best fitting images to an image or text query.
+**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/multimodal_search.ipynb)**
 1. Object analysis: Use the notebook `ojects_expression.ipynb` to identify certain objects in the image. Currently, the following objects are being identified: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, cell phone.
+**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/objects_expression.ipynb)**
 
-There are further notebooks that are currently of exploratory nature (`colors_expression.ipynb` to identify certain colors on the image).
+There are further notebooks that are currently of exploratory nature (`colors_expression.ipynb` to identify certain colors on the image). To crop social media posts use the `cropposts.ipynb` notebook.
 
 ## Features
 ### Text extraction
diff --git a/docs/source/notebooks/Example faces.ipynb b/docs/source/notebooks/Example faces.ipynb
index 7da9ac1..71e8798 100644
--- a/docs/source/notebooks/Example faces.ipynb
+++ b/docs/source/notebooks/Example faces.ipynb
@@ -83,7 +83,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "b37c0c91",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mydict = mutils.initialize_dict(images)"
@@ -102,7 +104,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "992499ed-33f1-4425-ad5d-738cf565d175",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mdisplay.explore_analysis(mydict, identify=\"faces\")"
@@ -120,7 +124,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "6f97c7d0",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "for key in mydict.keys():\n",
@@ -139,7 +145,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "604bd257",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "outdict = mutils.append_data_to_dict(mydict)\n",
@@ -158,7 +166,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "aa4b518a",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.head(10)"
@@ -176,7 +186,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "4618decb",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.to_csv(\"data/data_out.csv\")"
@@ -193,7 +205,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
diff --git a/docs/source/notebooks/Example multimodal.ipynb b/docs/source/notebooks/Example multimodal.ipynb
index c091b84..7d0abf0 100644
--- a/docs/source/notebooks/Example multimodal.ipynb
+++ b/docs/source/notebooks/Example multimodal.ipynb
@@ -292,7 +292,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "e78646d6-80be-4d3e-8123-3360957bcaa8",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.head(10)"
@@ -310,16 +312,26 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "185f7dde-20dc-44d8-9ab0-de41f9b5734d",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.to_csv(\"./data_out.csv\")"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2ef1132f-eb2a-43d7-be1f-69e879490f33",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
diff --git a/docs/source/notebooks/Example objects.ipynb b/docs/source/notebooks/Example objects.ipynb
index 567ba77..ab6b621 100644
--- a/docs/source/notebooks/Example objects.ipynb
+++ b/docs/source/notebooks/Example objects.ipynb
@@ -72,7 +72,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mdisplay.explore_analysis(mydict, identify=\"objects\")"
@@ -88,7 +90,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "for key in mydict:\n",
@@ -105,7 +109,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "outdict = mutils.append_data_to_dict(mydict)\n",
@@ -122,7 +128,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.head(10)"
@@ -138,16 +146,25 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.to_csv(\"./data_out.csv\")"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
diff --git a/docs/source/notebooks/Example summary.ipynb b/docs/source/notebooks/Example summary.ipynb
index 1cf40c1..58a8688 100644
--- a/docs/source/notebooks/Example summary.ipynb
+++ b/docs/source/notebooks/Example summary.ipynb
@@ -17,7 +17,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "from misinformation import utils as mutils\n",
@@ -35,7 +37,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "images = mutils.find_files(\n",
@@ -47,7 +51,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mydict = mutils.initialize_dict(images)"
@@ -70,18 +76,22 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "obj = sm.SummaryDetector(mydict)\n",
     "summary_model, summary_vis_processors = obj.load_model(\"base\")\n",
-    "# summary_model, summary_vis_processors = mutils.load_model(\"large\")"
+    "# summary_model, summary_vis_processors = obj.load_model(\"large\")"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "for key in mydict:\n",
@@ -121,7 +131,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.head(10)"
@@ -137,7 +149,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df.to_csv(\"./data_out.csv\")"
@@ -159,7 +173,9 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mdisplay.explore_analysis(mydict, identify=\"summary\")"
diff --git a/docs/source/notebooks/Example text.ipynb b/docs/source/notebooks/Example text.ipynb
index c31ffea..27717bc 100644
--- a/docs/source/notebooks/Example text.ipynb
+++ b/docs/source/notebooks/Example text.ipynb
@@ -1,7 +1,6 @@
 {
  "cells": [
   {
-   "attachments": {},
    "cell_type": "markdown",
    "id": "dcaa3da1",
    "metadata": {},
@@ -14,7 +13,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "f43f327c",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# if running on google colab\n",
@@ -37,22 +38,23 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "cf362e60",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
-    "import os\n",
-    "from IPython.display import Image, display\n",
     "import misinformation\n",
     "from misinformation import utils as mutils\n",
-    "from misinformation import display as mdisplay\n",
-    "import tensorflow as tf"
+    "from misinformation import display as mdisplay"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "27675810",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# download the models if they are not there yet\n",
@@ -64,35 +66,27 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "6da3a7aa",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "images = mutils.find_files(path=\"data\", limit=10)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bf811ce0",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "for i in images:\n",
-    "    display(Image(filename=i))"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "8b32409f",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mydict = mutils.initialize_dict(images)"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "id": "7b8b929f",
    "metadata": {},
@@ -113,7 +107,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "7c6ecc88",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "mdisplay.explore_analysis(mydict, identify=\"text-on-image\")"
@@ -131,11 +127,12 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "365c78b1-7ff4-4213-86fa-6a0a2d05198f",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "for key in mydict:\n",
-    "    print(key)\n",
     "    mydict[key] = misinformation.text.TextDetector(\n",
     "        mydict[key], analyse_text=True\n",
     "    ).analyse_image()"
@@ -153,7 +150,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "5709c2cd",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "outdict = mutils.append_data_to_dict(mydict)\n",
@@ -164,7 +163,9 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "c4f05637",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# check the dataframe\n",
@@ -175,17 +176,27 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "bf6c9ddb",
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# Write the csv\n",
     "df.to_csv(\"./data_out.csv\")"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9012544e-f818-46ea-b087-3e150850a5d5",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -199,7 +210,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.5"
+   "version": "3.9.16"
   },
   "vscode": {
    "interpreter": {
diff --git a/misinformation/test/test_text.py b/misinformation/test/test_text.py
index a102ed5..c0ec545 100644
--- a/misinformation/test/test_text.py
+++ b/misinformation/test/test_text.py
@@ -122,12 +122,30 @@ def test_text_summary(get_path):
     ref_file = get_path + "example_summary.txt"
     with open(ref_file, "r", encoding="utf8") as file:
         reference_text = file.read()
-    test_obj.subdict["text_english"] = reference_text
+    mydict["text_english"] = reference_text
     test_obj.text_summary()
     reference_summary = " I’m sorry, but I don’t want to be an emperor. That’s not my business. I should like to help everyone - if possible - Jew, Gentile - black man - white . We all want to help one another. In this world there is room for everyone. The way of life can be free and beautiful, but we have lost the way ."
     assert mydict["summary_text"] == reference_summary
 
 
+def test_text_sentiment_transformers():
+    mydict = {}
+    test_obj = tt.TextDetector(mydict, analyse_text=True)
+    mydict["text_english"] = "I am happy that the CI is working again."
+    test_obj.text_sentiment_transformers()
+    assert mydict["sentiment"] == "POSITIVE"
+    assert mydict["sentiment_score"] == pytest.approx(0.99, 0.01)
+
+
+def test_text_ner():
+    mydict = {}
+    test_obj = tt.TextDetector(mydict, analyse_text=True)
+    mydict["text_english"] = "Bill Gates was born in Seattle."
+    test_obj.text_ner()
+    assert mydict["entity"] == ["Bill", "Gates", "Seattle"]
+    assert mydict["entity_type"] == ["I-PER", "I-PER", "I-LOC"]
+
+
 def test_PostprocessText(set_testdict, get_path):
     reference_dict = "THE\nALGEBRAIC\nEIGENVALUE\nPROBLEM\nDOM\nNVS TIO\nMINA\nMonographs\non Numerical Analysis\nJ.. H. WILKINSON"
     reference_df = "Mathematische Formelsammlung\nfür Ingenieure und Naturwissenschaftler\nMit zahlreichen Abbildungen und Rechenbeispielen\nund einer ausführlichen Integraltafel\n3., verbesserte Auflage"
diff --git a/misinformation/text.py b/misinformation/text.py
index 9f3c776..a3f927c 100644
--- a/misinformation/text.py
+++ b/misinformation/text.py
@@ -122,26 +122,57 @@ class TextDetector(utils.AnalysisMethod):
 
     def text_summary(self):
         # use the transformers pipeline to summarize the text
-        pipe = pipeline("summarization")
+        # use the current default model - 03/2023
+        model_name = "sshleifer/distilbart-cnn-12-6"
+        model_revision = "a4f8f3e"
+        pipe = pipeline("summarization", model=model_name, revision=model_revision)
         self.subdict.update(pipe(self.subdict["text_english"])[0])
 
-    # def text_sentiment_transformers(self):
-    #     pipe = pipeline("text-classification")
+    def text_sentiment_transformers(self):
+        # use the transformers pipeline for text classification
+        # use the current default model - 03/2023
+        model_name = "distilbert-base-uncased-finetuned-sst-2-english"
+        model_revision = "af0f99b"
+        pipe = pipeline(
+            "text-classification", model=model_name, revision=model_revision
+        )
+        result = pipe(self.subdict["text_english"])
+        self.subdict["sentiment"] = result[0]["label"]
+        self.subdict["sentiment_score"] = result[0]["score"]
+
+    def text_ner(self):
+        # use the transformers pipeline for named entity recognition
+        # use the current default model - 03/2023
+        model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
+        model_revision = "f2482bf"
+        pipe = pipeline(
+            "token-classification", model=model_name, revision=model_revision
+        )
+        result = pipe(self.subdict["text_english"])
+        self.subdict["entity"] = []
+        self.subdict["entity_type"] = []
+        for entity in result:
+            self.subdict["entity"].append(entity["word"])
+            self.subdict["entity_type"].append(entity["entity"])
 
 
 class PostprocessText:
     def __init__(
-        self, mydict: dict = None, use_csv: bool = False, csv_path: str = None
+        self,
+        mydict: dict = None,
+        use_csv: bool = False,
+        csv_path: str = None,
+        analyze_text: str = "text_english",
     ) -> None:
         self.use_csv = use_csv
         if mydict:
             print("Reading data from dict.")
             self.mydict = mydict
-            self.list_text_english = self.get_text_dict()
+            self.list_text_english = self.get_text_dict(analyze_text)
         elif self.use_csv:
             print("Reading data from df.")
             self.df = pd.read_csv(csv_path, encoding="utf8")
-            self.list_text_english = self.get_text_df()
+            self.list_text_english = self.get_text_df(analyze_text)
         else:
             raise ValueError(
                 "Please provide either dictionary with textual data or \
@@ -177,24 +208,28 @@ class PostprocessText:
             most_frequent_topics.append(self.topic_model.get_topic(i))
         return self.topic_model, topic_df, most_frequent_topics
 
-    def get_text_dict(self):
-        # use dict to put text_english in list
+    def get_text_dict(self, analyze_text):
+        # use dict to put text_english or text_summary in list
         list_text_english = []
         for key in self.mydict.keys():
-            if "text_english" not in self.mydict[key]:
+            if analyze_text not in self.mydict[key]:
                 raise ValueError(
                     "Please check your provided dictionary - \
-                    no english text data found."
+                    no {} text data found.".format(
+                        analyze_text
+                    )
                 )
-            list_text_english.append(self.mydict[key]["text_english"])
+            list_text_english.append(self.mydict[key][analyze_text])
         return list_text_english
 
-    def get_text_df(self):
-        # use csv file to obtain dataframe and put text_english in list
-        # check that "text_english" is there
-        if "text_english" not in self.df:
+    def get_text_df(self, analyze_text):
+        # use csv file to obtain dataframe and put text_english or text_summary in list
+        # check that "text_english" or "text_summary" is there
+        if analyze_text not in self.df:
             raise ValueError(
                 "Please check your provided dataframe - \
-                no english text data found."
+                no {} text data found.".format(
+                    analyze_text
+                )
             )
-        return self.df["text_english"].tolist()
+        return self.df[analyze_text].tolist()
diff --git a/notebooks/facial_expressions.ipynb b/notebooks/facial_expressions.ipynb
index fe00584..a35b1a4 100644
--- a/notebooks/facial_expressions.ipynb
+++ b/notebooks/facial_expressions.ipynb
@@ -66,9 +66,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Here you need to provide the path to your google drive folder\n",
+    "# or local folder containing the images\n",
     "images = mutils.find_files(\n",
-    "    path=\"drive/MyDrive/misinformation-data/\",\n",
-    "    limit=1000,\n",
+    "    path=\"/content/drive/MyDrive/misinformation-data/\",\n",
+    "    limit=10,\n",
     ")"
    ]
   },
@@ -105,7 +107,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "mydict = mutils.initialize_dict(images[0:4])"
+    "mydict = mutils.initialize_dict(images)"
    ]
   },
   {
diff --git a/notebooks/get-text-from-image.ipynb b/notebooks/get-text-from-image.ipynb
index 0542220..9acb869 100644
--- a/notebooks/get-text-from-image.ipynb
+++ b/notebooks/get-text-from-image.ipynb
@@ -40,13 +40,9 @@
    "outputs": [],
    "source": [
     "import os\n",
-    "from IPython.display import Image, display\n",
     "import misinformation\n",
     "from misinformation import utils as mutils\n",
-    "from misinformation import display as mdisplay\n",
-    "import tensorflow as tf\n",
-    "\n",
-    "print(tf.config.list_physical_devices(\"GPU\"))"
+    "from misinformation import display as mdisplay"
    ]
   },
   {
@@ -56,30 +52,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# download the models if they are not there yet\n",
-    "!python -m spacy download en_core_web_md\n",
-    "!python -m textblob.download_corpora"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6da3a7aa",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "images = mutils.find_files(path=\"../data/all/\", limit=1000)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bf811ce0",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "for i in images[0:3]:\n",
-    "    display(Image(filename=i))"
+    "# Here you need to provide the path to your google drive folder\n",
+    "# or local folder containing the images\n",
+    "images = mutils.find_files(\n",
+    "    path=\"/content/drive/MyDrive/misinformation-data/\",\n",
+    "    limit=10,\n",
+    ")"
    ]
   },
   {
@@ -89,7 +67,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "mydict = mutils.initialize_dict(images[0:3])"
+    "mydict = mutils.initialize_dict(images)"
    ]
   },
   {
@@ -110,7 +88,7 @@
    "source": [
     "os.environ[\n",
     "    \"GOOGLE_APPLICATION_CREDENTIALS\"\n",
-    "] = \"../data/misinformation-campaign-981aa55a3b13.json\""
+    "] = \"/content/drive/MyDrive/misinformation-data/misinformation-campaign-981aa55a3b13.json\""
    ]
   },
   {
diff --git a/notebooks/image_summary.ipynb b/notebooks/image_summary.ipynb
index 04d53cc..4f2b884 100644
--- a/notebooks/image_summary.ipynb
+++ b/notebooks/image_summary.ipynb
@@ -14,6 +14,28 @@
     "This notebooks shows some preliminary work on Image Captioning and Visual question answering with lavis. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# if running on google colab\n",
+    "# flake8-noqa-cell\n",
+    "import os\n",
+    "\n",
+    "if \"google.colab\" in str(get_ipython()):\n",
+    "    # update python version\n",
+    "    # install setuptools\n",
+    "    !pip install setuptools==61 -qqq\n",
+    "    # install misinformation\n",
+    "    !pip install git+https://github.com/ssciwr/misinformation.git -qqq\n",
+    "    # mount google drive for data and API key\n",
+    "    from google.colab import drive\n",
+    "\n",
+    "    drive.mount(\"/content/drive\")"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -43,9 +65,11 @@
    },
    "outputs": [],
    "source": [
+    "# Here you need to provide the path to your google drive folder\n",
+    "# or local folder containing the images\n",
     "images = mutils.find_files(\n",
-    "    path=\"../misinformation/test/data/\",\n",
-    "    limit=1000,\n",
+    "    path=\"/content/drive/MyDrive/misinformation-data/\",\n",
+    "    limit=10,\n",
     ")"
    ]
   },
@@ -57,18 +81,7 @@
    },
    "outputs": [],
    "source": [
-    "mydict = mutils.initialize_dict(images[0:10])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "mydict"
+    "mydict = mutils.initialize_dict(images)"
    ]
   },
   {
diff --git a/notebooks/multimodal_search.ipynb b/notebooks/multimodal_search.ipynb
index 8f40d15..d7b0abb 100644
--- a/notebooks/multimodal_search.ipynb
+++ b/notebooks/multimodal_search.ipynb
@@ -16,6 +16,29 @@
     "This notebooks shows some preliminary work on Image Multimodal Search with lavis library. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0b0a6bdf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# if running on google colab\n",
+    "# flake8-noqa-cell\n",
+    "import os\n",
+    "\n",
+    "if \"google.colab\" in str(get_ipython()):\n",
+    "    # update python version\n",
+    "    # install setuptools\n",
+    "    !pip install setuptools==61 -qqq\n",
+    "    # install misinformation\n",
+    "    !pip install git+https://github.com/ssciwr/misinformation.git -qqq\n",
+    "    # mount google drive for data and API key\n",
+    "    from google.colab import drive\n",
+    "\n",
+    "    drive.mount(\"/content/drive\")"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -25,7 +48,7 @@
    },
    "outputs": [],
    "source": [
-    "import misinformation\n",
+    "import misinformation.utils as mutils\n",
     "import misinformation.multimodal_search as ms"
    ]
   },
@@ -46,9 +69,11 @@
    },
    "outputs": [],
    "source": [
-    "images = misinformation.utils.find_files(\n",
-    "    path=\"../data/images/\",\n",
-    "    limit=1000,\n",
+    "# Here you need to provide the path to your google drive folder\n",
+    "# or local folder containing the images\n",
+    "images = mutils.find_files(\n",
+    "    path=\"/content/drive/MyDrive/misinformation-data/\",\n",
+    "    limit=10,\n",
     ")"
    ]
   },
@@ -61,19 +86,7 @@
    },
    "outputs": [],
    "source": [
-    "mydict = misinformation.utils.initialize_dict(images)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d98b6227-886d-41b8-a377-896dd8ab3c2a",
-   "metadata": {
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "mydict"
+    "mydict = mutils.initialize_dict(images)"
    ]
   },
   {
diff --git a/notebooks/objects_expression.ipynb b/notebooks/objects_expression.ipynb
index 4aa1431..7fc0070 100644
--- a/notebooks/objects_expression.ipynb
+++ b/notebooks/objects_expression.ipynb
@@ -14,6 +14,28 @@
     "This notebooks shows some preliminary work on detecting objects expressions with cvlib. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# if running on google colab\n",
+    "# flake8-noqa-cell\n",
+    "import os\n",
+    "\n",
+    "if \"google.colab\" in str(get_ipython()):\n",
+    "    # update python version\n",
+    "    # install setuptools\n",
+    "    !pip install setuptools==61 -qqq\n",
+    "    # install misinformation\n",
+    "    !pip install git+https://github.com/ssciwr/misinformation.git -qqq\n",
+    "    # mount google drive for data and API key\n",
+    "    from google.colab import drive\n",
+    "\n",
+    "    drive.mount(\"/content/drive\")"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -39,9 +61,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Here you need to provide the path to your google drive folder\n",
+    "# or local folder containing the images\n",
     "images = mutils.find_files(\n",
-    "    path=\"../data/images-little-text/\",\n",
-    "    limit=1000,\n",
+    "    path=\"/content/drive/MyDrive/misinformation-data/\",\n",
+    "    limit=10,\n",
     ")"
    ]
   },
@@ -54,15 +78,6 @@
     "mydict = mutils.initialize_dict(images)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "mydict"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/requirements.txt b/requirements.txt
deleted file mode 100644
index 6fe56b6..0000000
--- a/requirements.txt
+++ /dev/null
@@ -1,28 +0,0 @@
-google-cloud-vision
-cvlib
-deepface<=0.0.75
-ipywidgets
-numpy<=1.23.4
-opencv_python
-pandas
-pooch
-protobuf
-retina_face
-setuptools
-tensorflow
-keras
-openpyxl
-pytest
-pytest-cov
-matplotlib
-opencv-contrib-python
-googletrans==3.1.0a0
-spacy
-https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.4.1/en_core_web_md-3.4.1.tar.gz
-jupyterlab
-spacytextblob
-textblob
-git+https://github.com/sloria/TextBlob.git@dev
-salesforce-lavis
-bertopic
-grpcio