Merge branch 'main' into add_itm

This commit is contained in:
Petr Andriushchenko 2023-04-02 14:32:18 +02:00, committed by GitHub
parents 502683f420 3b1c3ef1ed
commit 0fd6962dd3
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
15 changed files with 302 additions and 182 deletions

View file

@ -11,9 +11,6 @@ COPY --chown=${NB_UID} . /opt/misinformation
# Install the Python package
RUN python -m pip install /opt/misinformation
# Install additional dependencies for running the notebooks
RUN python -m pip install -r /opt/misinformation/requirements.txt
# Make JupyterLab the default for this application
ENV JUPYTER_ENABLE_LAB=yes

View file

@ -44,15 +44,20 @@ This will install the package and its dependencies locally.
## Usage
There are sample notebooks in the `misinformation/notebooks` folder for you to explore the package:
1. Text analysis: Use the notebook `get-text-from-image.ipynb` to extract any text from the images. The text is directly translated into English. If the text should be further analysed, set the keyword `analyse_text` to `True` as demonstrated in the notebook.\
1. Text extraction: Use the notebook `get-text-from-image.ipynb` to extract any text from the images. The text is directly translated into English. If the text should be further analysed, set the keyword `analyse_text` to `True` as demonstrated in the notebook.\
**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/get-text-from-image.ipynb)**
Place the data files and google cloud vision API key in your google drive to access the data.
1. Facial analysis: Use the notebook `facial_expressions.ipynb` to identify whether there are faces in the image, whether they are wearing masks, and, for faces without masks, also the race, gender and dominant emotion.
1. Emotion recognition: Use the notebook `facial_expressions.ipynb` to identify whether there are faces in the image, whether they are wearing masks, and, for faces without masks, also the race, gender and dominant emotion.
**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/facial_expressions.ipynb)**
Place the data files in your google drive to access the data.**
Place the data files in your google drive to access the data.
1. Content extraction: Use the notebook `image_summary.ipynb` to create captions for the images and ask questions about the image content.
**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/image_summary.ipynb)**
1. Multimodal content: Use the notebook `multimodal_search.ipynb` to find the images that best match an image or text query.
**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/multimodal_search.ipynb)**
1. Object analysis: Use the notebook `objects_expression.ipynb` to identify certain objects in the image. Currently, the following objects are identified: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, cell phone.
**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/misinformation/blob/main/notebooks/objects_expression.ipynb)**
There are further notebooks that are currently exploratory in nature (`colors_expression.ipynb`, which identifies certain colors in the image).
There are further notebooks that are currently exploratory in nature (`colors_expression.ipynb`, which identifies certain colors in the image). To crop social media posts, use the `cropposts.ipynb` notebook (see the workflow sketch below).
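The notebooks above all follow the same basic pattern: collect the images, initialize the nested results dictionary, run a detector on each image, inspect the results interactively, and flatten them for export. Below is a minimal, non-authoritative sketch of that pattern for the text notebook, using only calls that appear in the notebook diffs of this commit; the data path and the credentials file name are placeholders.

```python
# Minimal sketch of the common notebook workflow (text example).
# The Google credentials file name and the data path are placeholders.
import os

import misinformation
from misinformation import utils as mutils
from misinformation import display as mdisplay

# Google Cloud Vision API key (needed for text extraction only)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "misinformation-campaign-key.json"

# collect the images and set up the nested results dictionary
images = mutils.find_files(path="data", limit=10)
mydict = mutils.initialize_dict(images)

# extract and further analyse the text on every image
for key in mydict:
    mydict[key] = misinformation.text.TextDetector(
        mydict[key], analyse_text=True
    ).analyse_image()

# inspect the results interactively, then flatten them for export
mdisplay.explore_analysis(mydict, identify="text-on-image")
outdict = mutils.append_data_to_dict(mydict)
```

The other notebooks differ mainly in the detector they run and the keyword passed to `explore_analysis` (`"faces"`, `"objects"`, `"summary"`).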
## Features
### Text extraction

View file

@ -83,7 +83,9 @@
"cell_type": "code",
"execution_count": null,
"id": "b37c0c91",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images)"
@ -102,7 +104,9 @@
"cell_type": "code",
"execution_count": null,
"id": "992499ed-33f1-4425-ad5d-738cf565d175",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mdisplay.explore_analysis(mydict, identify=\"faces\")"
@ -120,7 +124,9 @@
"cell_type": "code",
"execution_count": null,
"id": "6f97c7d0",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for key in mydict.keys():\n",
@ -139,7 +145,9 @@
"cell_type": "code",
"execution_count": null,
"id": "604bd257",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = mutils.append_data_to_dict(mydict)\n",
@ -158,7 +166,9 @@
"cell_type": "code",
"execution_count": null,
"id": "aa4b518a",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.head(10)"
@ -176,7 +186,9 @@
"cell_type": "code",
"execution_count": null,
"id": "4618decb",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.to_csv(\"data/data_out.csv\")"
@ -193,7 +205,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},

View file

@ -294,7 +294,9 @@
"cell_type": "code",
"execution_count": null,
"id": "e78646d6-80be-4d3e-8123-3360957bcaa8",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.head(10)"
@ -312,11 +314,21 @@
"cell_type": "code",
"execution_count": null,
"id": "185f7dde-20dc-44d8-9ab0-de41f9b5734d",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2ef1132f-eb2a-43d7-be1f-69e879490f33",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View file

@ -72,7 +72,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mdisplay.explore_analysis(mydict, identify=\"objects\")"
@ -88,7 +90,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for key in mydict:\n",
@ -105,7 +109,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = mutils.append_data_to_dict(mydict)\n",
@ -122,7 +128,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.head(10)"
@ -138,16 +146,25 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},

View file

@ -17,7 +17,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from misinformation import utils as mutils\n",
@ -35,7 +37,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"images = mutils.find_files(\n",
@ -47,7 +51,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images)"
@ -70,18 +76,22 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"obj = sm.SummaryDetector(mydict)\n",
"summary_model, summary_vis_processors = obj.load_model(\"base\")\n",
"# summary_model, summary_vis_processors = mutils.load_model(\"large\")"
"# summary_model, summary_vis_processors = obj.load_model(\"large\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for key in mydict:\n",
@ -121,7 +131,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.head(10)"
@ -137,7 +149,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
@ -159,7 +173,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mdisplay.explore_analysis(mydict, identify=\"summary\")"

View file

@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "dcaa3da1",
"metadata": {},
@ -14,7 +13,9 @@
"cell_type": "code",
"execution_count": null,
"id": "f43f327c",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# if running on google colab\n",
@ -37,22 +38,23 @@
"cell_type": "code",
"execution_count": null,
"id": "cf362e60",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from IPython.display import Image, display\n",
"import misinformation\n",
"from misinformation import utils as mutils\n",
"from misinformation import display as mdisplay\n",
"import tensorflow as tf"
"from misinformation import display as mdisplay"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27675810",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# download the models if they are not there yet\n",
@ -64,35 +66,27 @@
"cell_type": "code",
"execution_count": null,
"id": "6da3a7aa",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"images = mutils.find_files(path=\"data\", limit=10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf811ce0",
"metadata": {},
"outputs": [],
"source": [
"for i in images:\n",
" display(Image(filename=i))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b32409f",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7b8b929f",
"metadata": {},
@ -113,7 +107,9 @@
"cell_type": "code",
"execution_count": null,
"id": "7c6ecc88",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mdisplay.explore_analysis(mydict, identify=\"text-on-image\")"
@ -131,11 +127,12 @@
"cell_type": "code",
"execution_count": null,
"id": "365c78b1-7ff4-4213-86fa-6a0a2d05198f",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for key in mydict:\n",
" print(key)\n",
" mydict[key] = misinformation.text.TextDetector(\n",
" mydict[key], analyse_text=True\n",
" ).analyse_image()"
@ -153,7 +150,9 @@
"cell_type": "code",
"execution_count": null,
"id": "5709c2cd",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"outdict = mutils.append_data_to_dict(mydict)\n",
@ -164,7 +163,9 @@
"cell_type": "code",
"execution_count": null,
"id": "c4f05637",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# check the dataframe\n",
@ -175,17 +176,27 @@
"cell_type": "code",
"execution_count": null,
"id": "bf6c9ddb",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Write the csv\n",
"df.to_csv(\"./data_out.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9012544e-f818-46ea-b087-3e150850a5d5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@ -199,7 +210,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
"version": "3.9.16"
},
"vscode": {
"interpreter": {

View file

@ -122,12 +122,30 @@ def test_text_summary(get_path):
    ref_file = get_path + "example_summary.txt"
    with open(ref_file, "r", encoding="utf8") as file:
        reference_text = file.read()
    test_obj.subdict["text_english"] = reference_text
    mydict["text_english"] = reference_text
    test_obj.text_summary()
    reference_summary = " Im sorry, but I dont want to be an emperor. Thats not my business. I should like to help everyone - if possible - Jew, Gentile - black man - white . We all want to help one another. In this world there is room for everyone. The way of life can be free and beautiful, but we have lost the way ."
    assert mydict["summary_text"] == reference_summary


def test_text_sentiment_transformers():
    mydict = {}
    test_obj = tt.TextDetector(mydict, analyse_text=True)
    mydict["text_english"] = "I am happy that the CI is working again."
    test_obj.text_sentiment_transformers()
    assert mydict["sentiment"] == "POSITIVE"
    assert mydict["sentiment_score"] == pytest.approx(0.99, 0.01)


def test_text_ner():
    mydict = {}
    test_obj = tt.TextDetector(mydict, analyse_text=True)
    mydict["text_english"] = "Bill Gates was born in Seattle."
    test_obj.text_ner()
    assert mydict["entity"] == ["Bill", "Gates", "Seattle"]
    assert mydict["entity_type"] == ["I-PER", "I-PER", "I-LOC"]


def test_PostprocessText(set_testdict, get_path):
    reference_dict = "THE\nALGEBRAIC\nEIGENVALUE\nPROBLEM\nDOM\nNVS TIO\nMINA\nMonographs\non Numerical Analysis\nJ.. H. WILKINSON"
    reference_df = "Mathematische Formelsammlung\nfür Ingenieure und Naturwissenschaftler\nMit zahlreichen Abbildungen und Rechenbeispielen\nund einer ausführlichen Integraltafel\n3., verbesserte Auflage"

View file

@ -122,26 +122,57 @@ class TextDetector(utils.AnalysisMethod):
    def text_summary(self):
        # use the transformers pipeline to summarize the text
        pipe = pipeline("summarization")
        # use the current default model - 03/2023
        model_name = "sshleifer/distilbart-cnn-12-6"
        model_revision = "a4f8f3e"
        pipe = pipeline("summarization", model=model_name, revision=model_revision)
        self.subdict.update(pipe(self.subdict["text_english"])[0])

    # def text_sentiment_transformers(self):
    #     pipe = pipeline("text-classification")
    def text_sentiment_transformers(self):
        # use the transformers pipeline for text classification
        # use the current default model - 03/2023
        model_name = "distilbert-base-uncased-finetuned-sst-2-english"
        model_revision = "af0f99b"
        pipe = pipeline(
            "text-classification", model=model_name, revision=model_revision
        )
        result = pipe(self.subdict["text_english"])
        self.subdict["sentiment"] = result[0]["label"]
        self.subdict["sentiment_score"] = result[0]["score"]

    def text_ner(self):
        # use the transformers pipeline for named entity recognition
        # use the current default model - 03/2023
        model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
        model_revision = "f2482bf"
        pipe = pipeline(
            "token-classification", model=model_name, revision=model_revision
        )
        result = pipe(self.subdict["text_english"])
        self.subdict["entity"] = []
        self.subdict["entity_type"] = []
        for entity in result:
            self.subdict["entity"].append(entity["word"])
            self.subdict["entity_type"].append(entity["entity"])


class PostprocessText:
    def __init__(
        self, mydict: dict = None, use_csv: bool = False, csv_path: str = None
        self,
        mydict: dict = None,
        use_csv: bool = False,
        csv_path: str = None,
        analyze_text: str = "text_english",
    ) -> None:
        self.use_csv = use_csv
        if mydict:
            print("Reading data from dict.")
            self.mydict = mydict
            self.list_text_english = self.get_text_dict()
            self.list_text_english = self.get_text_dict(analyze_text)
        elif self.use_csv:
            print("Reading data from df.")
            self.df = pd.read_csv(csv_path, encoding="utf8")
            self.list_text_english = self.get_text_df()
            self.list_text_english = self.get_text_df(analyze_text)
        else:
            raise ValueError(
                "Please provide either dictionary with textual data or \
@ -177,24 +208,28 @@ class PostprocessText:
            most_frequent_topics.append(self.topic_model.get_topic(i))
        return self.topic_model, topic_df, most_frequent_topics

    def get_text_dict(self):
        # use dict to put text_english in list
    def get_text_dict(self, analyze_text):
        # use dict to put text_english or text_summary in list
        list_text_english = []
        for key in self.mydict.keys():
            if "text_english" not in self.mydict[key]:
            if analyze_text not in self.mydict[key]:
                raise ValueError(
                    "Please check your provided dictionary - \
                    no english text data found."
                    no {} text data found.".format(
                        analyze_text
                    )
                )
            list_text_english.append(self.mydict[key]["text_english"])
            list_text_english.append(self.mydict[key][analyze_text])
        return list_text_english

    def get_text_df(self):
        # use csv file to obtain dataframe and put text_english in list
        # check that "text_english" is there
        if "text_english" not in self.df:
    def get_text_df(self, analyze_text):
        # use csv file to obtain dataframe and put text_english or text_summary in list
        # check that "text_english" or "text_summary" is there
        if analyze_text not in self.df:
            raise ValueError(
                "Please check your provided dataframe - \
                no english text data found."
                no {} text data found.".format(
                    analyze_text
                )
            )
        return self.df["text_english"].tolist()
        return self.df[analyze_text].tolist()
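A minimal usage sketch of the methods added here, mirroring the calls in the tests above; the input sentence and the dictionary layout are illustrative only.

```python
# Sketch of the new TextDetector methods and the analyze_text option (illustrative input).
from misinformation import text as tt

mydict = {"text_english": "Bill Gates was born in Seattle."}
detector = tt.TextDetector(mydict, analyse_text=True)

detector.text_summary()                 # adds "summary_text" (pipeline output) to mydict
detector.text_sentiment_transformers()  # adds "sentiment" and "sentiment_score"
detector.text_ner()                     # adds "entity" and "entity_type"

# PostprocessText can now gather any per-image key, e.g. the summaries produced above,
# for the downstream topic analysis instead of the full English text
post = tt.PostprocessText(mydict={"img0": mydict}, analyze_text="summary_text")
```

Pinning `model` and `revision` in each `pipeline(...)` call keeps the transformers models reproducible even if the library's defaults change.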

notebooks/facial_expressions.ipynb (generated, 8 changes)
View file

@ -66,9 +66,11 @@
"metadata": {},
"outputs": [],
"source": [
"# Here you need to provide the path to your google drive folder\n",
"# or local folder containing the images\n",
"images = mutils.find_files(\n",
" path=\"drive/MyDrive/misinformation-data/\",\n",
" limit=1000,\n",
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" limit=10,\n",
")"
]
},
@ -105,7 +107,7 @@
"metadata": {},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images[0:4])"
"mydict = mutils.initialize_dict(images)"
]
},
{

notebooks/get-text-from-image.ipynb (generated, 40 changes)
View file

@ -40,13 +40,9 @@
"outputs": [],
"source": [
"import os\n",
"from IPython.display import Image, display\n",
"import misinformation\n",
"from misinformation import utils as mutils\n",
"from misinformation import display as mdisplay\n",
"import tensorflow as tf\n",
"\n",
"print(tf.config.list_physical_devices(\"GPU\"))"
"from misinformation import display as mdisplay"
]
},
{
@ -56,30 +52,12 @@
"metadata": {},
"outputs": [],
"source": [
"# download the models if they are not there yet\n",
"!python -m spacy download en_core_web_md\n",
"!python -m textblob.download_corpora"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6da3a7aa",
"metadata": {},
"outputs": [],
"source": [
"images = mutils.find_files(path=\"../data/all/\", limit=1000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf811ce0",
"metadata": {},
"outputs": [],
"source": [
"for i in images[0:3]:\n",
" display(Image(filename=i))"
"# Here you need to provide the path to your google drive folder\n",
"# or local folder containing the images\n",
"images = mutils.find_files(\n",
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" limit=10,\n",
")"
]
},
{
@ -89,7 +67,7 @@
"metadata": {},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images[0:3])"
"mydict = mutils.initialize_dict(images)"
]
},
{
@ -110,7 +88,7 @@
"source": [
"os.environ[\n",
" \"GOOGLE_APPLICATION_CREDENTIALS\"\n",
"] = \"../data/misinformation-campaign-981aa55a3b13.json\""
"] = \"/content/drive/MyDrive/misinformation-data/misinformation-campaign-981aa55a3b13.json\""
]
},
{

notebooks/image_summary.ipynb (generated, 41 changes)
View file

@ -14,6 +14,28 @@
"This notebooks shows some preliminary work on Image Captioning and Visual question answering with lavis. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if running on google colab\n",
"# flake8-noqa-cell\n",
"import os\n",
"\n",
"if \"google.colab\" in str(get_ipython()):\n",
" # update python version\n",
" # install setuptools\n",
" !pip install setuptools==61 -qqq\n",
" # install misinformation\n",
" !pip install git+https://github.com/ssciwr/misinformation.git -qqq\n",
" # mount google drive for data and API key\n",
" from google.colab import drive\n",
"\n",
" drive.mount(\"/content/drive\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -43,9 +65,11 @@
},
"outputs": [],
"source": [
"# Here you need to provide the path to your google drive folder\n",
"# or local folder containing the images\n",
"images = mutils.find_files(\n",
" path=\"../misinformation/test/data/\",\n",
" limit=1000,\n",
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" limit=10,\n",
")"
]
},
@ -57,18 +81,7 @@
},
"outputs": [],
"source": [
"mydict = mutils.initialize_dict(images[0:10])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict"
"mydict = mutils.initialize_dict(images)"
]
},
{

notebooks/multimodal_search.ipynb (generated, 49 changes)
View file

@ -16,6 +16,29 @@
"This notebooks shows some preliminary work on Image Multimodal Search with lavis library. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b0a6bdf",
"metadata": {},
"outputs": [],
"source": [
"# if running on google colab\n",
"# flake8-noqa-cell\n",
"import os\n",
"\n",
"if \"google.colab\" in str(get_ipython()):\n",
" # update python version\n",
" # install setuptools\n",
" !pip install setuptools==61 -qqq\n",
" # install misinformation\n",
" !pip install git+https://github.com/ssciwr/misinformation.git -qqq\n",
" # mount google drive for data and API key\n",
" from google.colab import drive\n",
"\n",
" drive.mount(\"/content/drive\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -25,7 +48,7 @@
},
"outputs": [],
"source": [
"import misinformation\n",
"import misinformation.utils as mutils\n",
"import misinformation.multimodal_search as ms"
]
},
@ -46,8 +69,12 @@
},
"outputs": [],
"source": [
"images = misinformation.utils.find_files(\n",
" path=\"../data/images/\",\n",
"# Here you need to provide the path to your google drive folder\n",
"# or local folder containing the images\n",
"images = mutils.find_files(\n",
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" limit=10,\n",
")"
]
@ -61,19 +88,9 @@
},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c66aec87-ede7-4985-912e-3ca29245ebf2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"mydict"
"mydict = mutils.initialize_dict(images)"
]
},
{

notebooks/objects_expression.ipynb (generated, 37 changes)
View file

@ -14,6 +14,28 @@
"This notebooks shows some preliminary work on detecting objects expressions with cvlib. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if running on google colab\n",
"# flake8-noqa-cell\n",
"import os\n",
"\n",
"if \"google.colab\" in str(get_ipython()):\n",
" # update python version\n",
" # install setuptools\n",
" !pip install setuptools==61 -qqq\n",
" # install misinformation\n",
" !pip install git+https://github.com/ssciwr/misinformation.git -qqq\n",
" # mount google drive for data and API key\n",
" from google.colab import drive\n",
"\n",
" drive.mount(\"/content/drive\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -39,9 +61,11 @@
"metadata": {},
"outputs": [],
"source": [
"# Here you need to provide the path to your google drive folder\n",
"# or local folder containing the images\n",
"images = mutils.find_files(\n",
" path=\"../data/images-little-text/\",\n",
" limit=1000,\n",
" path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" limit=10,\n",
")"
]
},
@ -54,15 +78,6 @@
"mydict = mutils.initialize_dict(images)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mydict"
]
},
{
"cell_type": "markdown",
"metadata": {},

View file

@ -1,28 +0,0 @@
google-cloud-vision
cvlib
deepface<=0.0.75
ipywidgets
numpy<=1.23.4
opencv_python
pandas
pooch
protobuf
retina_face
setuptools
tensorflow
keras
openpyxl
pytest
pytest-cov
matplotlib
opencv-contrib-python
googletrans==3.1.0a0
spacy
https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.4.1/en_core_web_md-3.4.1.tar.gz
jupyterlab
spacytextblob
textblob
git+https://github.com/sloria/TextBlob.git@dev
salesforce-lavis
bertopic
grpcio