* colors expression by K-Means algorithm

* object detection by imageai

* object detection by cvlib

* add encapsulation of object detection

* remove encapsulation of objdetect v0

* objects expression to dict

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added imageai to requirements

* add objects to dictionary

* update for AnalysisMethod baseline

* add object detection support to explore_analysis display

* extend python version of misinf to allow imageai

* account for older python

* use global functionality for dict to csv convert

* update for docker build

* docker will build now but ipywidgets still not working

* test code

* include test data folder in repo

* add some sample images

* load csv labels to dict

* add test data

* retrigger checks

* add map to human coding

* get orders from dict, missing dep

* add module to test accuracy

* retrigger checks

* retrigger checks

* now removing imageai

* removed imageai

* move labelmanager to analyse

* multiple faces in mydict

* fix pre-commit issues

* map mydict

* hide imageai

* objects default using cvlib, isolate and disable imageai

* correct python version

* refactor faces tests

* refactor objects tests

* sonarcloud issues

* refactor utils tests

* address code smells

* update readme

* update notebook without imageai

Co-authored-by: Ma Xianghe <825074348@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: iulusoy <inga.ulusoy@uni-heidelberg.de>
This commit is contained in:
xiaohemaikoo authored 2022-10-04 11:34:44 +02:00, committed by GitHub
parent f5d24a1b1d
commit fdcb228294
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
45 changed files with 1504 additions and 140 deletions

.gitignore

@@ -1,5 +1,3 @@
-data
-
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

.pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
 - repo: https://github.com/kynan/nbstripout
-  rev: 0.6.0
+  rev: 0.6.1
   hooks:
   - id: nbstripout
     files: ".ipynb"
@@ -17,6 +17,6 @@ repos:
   hooks:
   - id: flake8
 - repo: https://github.com/s-weigand/flake8-nb
-  rev: v0.5.2
+  rev: v0.5.0
   hooks:
   - id: flake8-nb

Dockerfile
@@ -1,8 +1,8 @@
-FROM jupyter/base-notebook:2022-06-06
+FROM jupyter/base-notebook
 # Install system dependencies for computer vision packages
 USER root
-RUN apt update && apt install -y build-essential libgl1 libglib2.0-0 libsm6 libxrender1 libxext6 tesseract-ocr
+RUN apt update && apt install -y build-essential libgl1 libglib2.0-0 libsm6 libxrender1 libxext6
 USER $NB_USER
 # Copy the repository into the container
@@ -24,6 +24,9 @@ ENV XDG_DATA_HOME=/opt/misinformation/data
 RUN rm -rf $HOME/work
 RUN cp /opt/misinformation/notebooks/*.ipynb $HOME
+ARG GOOGLE_CREDS
+ENV GOOGLE_APPLICATION_CREDENTIALS=credentials.json
+RUN echo ${GOOGLE_CREDS} > $GOOGLE_APPLICATION_CREDENTIALS
 # Bundle the pre-built models (that are downloaded on demand) into the
 # Docker image.
 RUN misinformation_prefetch_models

README.md
@@ -13,3 +13,19 @@
 This development will serve the fight to combat misinformation, by providing more comprehensive data about its content and techniques.
 The ultimate goal of this project is to develop a computer-assisted toolset to investigate the content of disinformation campaigns worldwide.
+
+# Installation
+The `misinformation` package can be installed using pip: navigate into your package folder `misinformation/` and execute
+```
+pip install .
+```
+This will install the package and its dependencies locally.
+
+# Usage
+There are sample notebooks in the `misinformation/notebooks` folder for you to explore the package usage:
+1. Facial analysis: Use the notebook `facial_expressions.ipynb` to identify whether there are faces in the image, whether they are wearing masks, and, if they are not wearing masks, also their race, gender and dominant emotion.
+1. Object analysis: Use the notebook `objects_expression.ipynb` to identify certain objects in the image. Currently, the following objects are detected: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, cell phone.
+There are further notebooks that are currently of an exploratory nature (`colors_expression` to identify the dominant colors in an image, `get-text-from-image` to extract text contained in an image).
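In code, the usage described in the README boils down to a few calls. A minimal sketch based on the notebooks added in this commit (the input path is a placeholder):

```python
import misinformation
import misinformation.objects as ob

# gather image files and set up one sub-dictionary per image
images = misinformation.find_files(path="./data/", limit=1000)
mydict = misinformation.initialize_dict(images)

# run object detection on every image
for key in mydict:
    mydict[key] = ob.ObjectDetector(mydict[key]).analyse_image()

# flatten the nested dict and write a csv file
outdict = misinformation.append_data_to_dict(mydict)
df = misinformation.dump_df(outdict)
df.to_csv("./data_out.csv")
```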

misinformation/__init__.py
@@ -1,4 +1,8 @@
-from importlib import metadata
+try:
+    from importlib import metadata
+except ImportError:
+    # Running on pre-3.8 Python; use importlib-metadata package
+    import importlib_metadata as metadata  # type: ignore

 # Export the version defined in project metadata
@@ -6,4 +10,9 @@ __version__ = metadata.version(__package__)
 del metadata

 from misinformation.display import explore_analysis
-from misinformation.utils import find_files
+from misinformation.utils import (
+    find_files,
+    initialize_dict,
+    append_data_to_dict,
+    dump_df,
+)

misinformation/accuracy.py (new file)
@@ -0,0 +1,109 @@
import pandas as pd
import json

from misinformation import utils
from misinformation import faces


class LabelManager:
    def __init__(self):
        self.labels_code = None
        self.labels = None
        self.f_labels = None
        self.f_labels_code = None
        self.load()

    def load(self):
        self.labels_code = pd.read_excel(
            "./misinformation/test/data/EUROPE_APRMAY20_data_variable_labels_coding.xlsx",
            sheet_name="variable_labels_codings",
        )
        self.labels = pd.read_csv(
            "./misinformation/test/data/Europe_APRMAY20data190722.csv",
            sep=",",
            decimal=".",
        )
        self.map = self.read_json("./misinformation/data/map_test_set.json")

    def read_json(self, name):
        with open("{}".format(name)) as f:
            mydict = json.load(f)
        return mydict

    def get_orders(self):
        return [i["order"] for i in self.map.values()]

    def filter_from_order(self, orders: list):
        cols = []
        for order in orders:
            col = self.labels_code.iloc[order - 1, 1]
            cols.append(col.lower())
        self.f_labels_code = self.labels_code.loc[
            self.labels_code["order"].isin(orders)
        ]
        self.f_labels = self.labels[cols]

    def gen_dict(self):
        labels_dict = {}
        if self.f_labels is None:
            print("No filtered labels found")
            return labels_dict
        cols = self.f_labels.columns.tolist()
        for index, row in self.f_labels.iterrows():
            row_dict = {}
            for col in cols:
                row_dict[col] = row[col]
            labels_dict[row["pic_id"]] = row_dict
        return labels_dict

    def map_dict(self, mydict):
        mapped_dict = {}
        for id, subdict in mydict.items():
            mapped_subdict = {}
            mapped_subdict["id"] = id[0:-2]
            mapped_subdict["pic_order"] = id[-1] if id[-2] == "0" else id[-2::]
            mapped_subdict["pic_id"] = id
            for key in self.map.keys():
                # get the key name
                mydict_name = self.map[key]["variable_mydict"]
                mydict_value = self.map[key]["value_mydict"]
                # find out which value was set
                mydict_current = subdict[mydict_name]
                # now map to new key-value pair
                mapped_subdict[key] = 1 if mydict_current == mydict_value else 0
                # substitute the values that are not boolean
                if self.map[key]["variable_coding"] != "Bool":
                    mapped_subdict[key] = mydict_current
            mapped_dict[id] = mapped_subdict
        return mapped_dict


if __name__ == "__main__":
    files = utils.find_files(
        path="/home/inga/projects/misinformation-project/misinformation/misinformation/test/data/Europe APRMAY20 visual data/cropped images"
    )
    mydict = utils.initialize_dict(files)
    # analyze faces
    image_ids = [key for key in mydict.keys()]
    for i in image_ids:
        mydict[i] = faces.EmotionDetector(mydict[i]).analyse_image()

    outdict = utils.append_data_to_dict(mydict)
    df = utils.dump_df(outdict)
    # print(df.head(10))
    df.to_csv("mydict_out.csv")

    # example of LabelManager for loading csv data to dict
    lm = LabelManager()
    # get the desired label numbers automatically
    orders = lm.get_orders()
    # map mydict to the specified variable names and values
    mydict_map = lm.map_dict(mydict)
    print(mydict_map)
    lm.filter_from_order([1, 2, 3] + orders)
    labels = lm.gen_dict()
    print(labels)
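To make the mapping concrete, a small hypothetical example (the image id and detector values are invented, and `LabelManager()` also loads the Excel/CSV test data on construction, so those files must be present):

```python
from misinformation.accuracy import LabelManager

# hypothetical EmotionDetector output for one image; the id is assumed to
# encode the post id plus a zero-padded picture order ("...01" -> order 1)
mydict = {
    "10234101": {
        "face": "Yes",
        "multiple_faces": "No",
        "no_faces": 1,
        "wears_mask": "No",
        "emotion": "happy",
        "emotion (category)": "Positive",
    }
}

lm = LabelManager()
mapped = lm.map_dict(mydict)
# boolean map entries compare the detector value against "value_mydict":
# V9_4 checks face == "Yes" -> 1; non-boolean codings ("Int", "Str")
# carry the raw detector value instead, so V9_5b holds the face count
print(mapped["10234101"]["V9_4"], mapped["10234101"]["V9_5b"])  # 1 1
```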

misinformation/data/map_test_set.json (new file)
@@ -0,0 +1,127 @@
{
"V9_4": {
"order": 169,
"variable_label": "4=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Person visible",
"variable_coding": "Bool",
"variable_comment": "Yes if there's someone shown",
"variable_mydict": "face",
"value_mydict": "Yes"
},
"V9_5a": {
"order": 170,
"variable_label": "5a=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "More than one person shown",
"variable_coding": "Bool",
"variable_comment": "Yes if there are several individuals who appear in the post (do not count profile pictures)",
"variable_mydict": "multiple_faces",
"value_mydict": "Yes"
},
"V9_5b": {
"order": 171,
"variable_label": "5b=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "How many people shown?",
"variable_coding": "Int",
"variable_comment": "If more than 15, put 99",
"variable_mydict": "no_faces",
"value_mydict": "0"
},
"V9_6": {
"order": 172,
"variable_label": "6=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Face fully visible",
"variable_coding": "Bool",
"variable_comment": "Yes if you can see all their face (no mask on)",
"variable_mydict": "wears_mask",
"value_mydict": "No"
},
"V9_7": {
"order": 173,
"variable_label": "7=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Face ONLY partially visible",
"variable_coding": "Bool",
"variable_comment": "Yes if you can only see part of their face, including when they are wearing a mask",
"variable_mydict": "wears_mask",
"value_mydict": "Yes"
},
"V9_8": {
"order": 174,
"variable_label": "8=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Facial positive expression",
"variable_coding": "Bool",
"variable_comment": "Yes if they display some kind of positive facial expression (smiling, happy, relieved, hopeful etc.)",
"variable_mydict": "emotion (category)",
"value_mydict": "Positive"
},
"V9_8a": {
"order": 175,
"variable_label": "8a=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Positive expression: happiness",
"variable_coding": "Bool",
"variable_comment": "Yes if they display happiness",
"variable_mydict": "emotion",
"value_mydict": "happy"
},
"V9_9": {
"order": 176,
"variable_label": "9=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Facial negative expression",
"variable_coding": "Bool",
"variable_comment": "Yes if they display some kind of negative facial expression (crying, showing ager, fear, disgust etc.)",
"variable_mydict": "emotion (category)",
"value_mydict": "Negative"
},
"V9_10": {
"order": 177,
"variable_label": "10=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Negative expression: anxiety",
"variable_coding": "Bool",
"variable_comment": "Yes if they show fear or anxiety. If you can't tell, choose No=0",
"variable_mydict": "emotion",
"value_mydict": "fear"
},
"V9_11": {
"order": 178,
"variable_label": "11=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Negative expression: anger",
"variable_coding": "Bool",
"variable_comment": "Yes if they show anger or outrage. If you can't tell, choose No=0",
"variable_mydict": "emotion",
"value_mydict": "angry"
},
"V9_12": {
"order": 179,
"variable_label": "12=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Negative expression: disgust",
"variable_coding": "Bool",
"variable_comment": "Yes if they show disgust. If you can't tell, choose No=0",
"variable_mydict": "emotion",
"value_mydict": "disgust"
},
"V9_13": {
"order": 180,
"variable_label": "13=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Negative expression: other, specify",
"variable_coding": "Bool",
"variable_comment": "Yes if they show any other negative emotion, please specify. If you can't tell, choose No=0",
"variable_mydict": "emotion",
"value_mydict": "sad"
},
"V9_13_text": {
"order": 181,
"variable_label": "13=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Negative expression: other, specify",
"variable_coding": "Str",
"variable_mydict": "emotion",
"value_mydict": ""
},
"V11_3": {
"order": 189,
"variable_label": "111_3=PICTURE_SPECIFIC_VisualONLY",
"variable_explanation": "Respect of the rules",
"variable_coding": "Bool",
"variable_comment": "Yes if the post shows mask wearing, vaccine taking, social distancing, any proof of respecting the rules",
"variable_mydict": "wears_mask",
"value_mydict": "Yes"
}
}

misinformation/display.py
@@ -3,6 +3,7 @@
 from IPython.display import display
 import misinformation.faces as faces
 import misinformation.text as text
+import misinformation.objects as objects


 class JSONContainer:
@@ -22,6 +23,7 @@ def explore_analysis(mydict, identify="faces"):
     identify_dict = {
         "faces": faces.EmotionDetector,
         "text-on-image": text.TextDetector,
+        "objects": objects.ObjectDetector,
     }
     # create a list containing the image ids for the widget
     # image_paths = [mydict[key]["filename"] for key in mydict.keys()]

misinformation/faces.py
@@ -86,6 +86,8 @@ class EmotionDetector(utils.AnalysisMethod):
     def set_keys(self) -> dict:
         params = {
             "face": "No",
+            "multiple_faces": "No",
+            "no_faces": 0,
             "wears_mask": ["No"],
             "age": [None],
             "gender": [None],
@@ -145,7 +147,9 @@
         # Sort the faces by size to prioritize prominent faces
         faces = list(reversed(sorted(faces, key=lambda f: f.shape[0] * f.shape[1])))
-        self.subdict["face"] = "yes"
+        self.subdict["face"] = "Yes"
+        self.subdict["multiple_faces"] = "Yes" if len(faces) > 1 else "No"
+        self.subdict["no_faces"] = len(faces) if len(faces) <= 15 else 99
         # note number of faces being identified
         result = {"number_faces": len(faces) if len(faces) <= 3 else 3}
         # We limit ourselves to three faces

misinformation/objects.py (new file)
@@ -0,0 +1,52 @@
from misinformation.utils import AnalysisMethod
from misinformation.objects_cvlib import ObjectCVLib
from misinformation.objects_cvlib import init_default_objects

# from misinformation.objects_imageai import ObjectImageAI


class ObjectDetectorClient(AnalysisMethod):
    def __init__(self):
        # the detector defaults to cvlib
        self.detector = ObjectCVLib()

    def set_client_to_imageai(self):
        # imageai is temporarily disabled
        # self.detector = ObjectImageAI()
        # maybe reactivate if a new imageai release comes out
        pass

    def set_client_to_cvlib(self):
        self.detector = ObjectCVLib()

    def analyse_image(self, subdict=None):
        """Localize objects in the local image.

        Args:
            subdict: The dictionary for an image expression instance.
        """
        return self.detector.analyse_image(subdict)


class ObjectDetector(AnalysisMethod):
    od_client = ObjectDetectorClient()

    def __init__(self, subdict: dict):
        super().__init__(subdict)
        self.subdict.update(self.set_keys())

    def set_keys(self):
        return init_default_objects()

    def analyse_image(self):
        self.subdict = ObjectDetector.od_client.analyse_image(self.subdict)
        return self.subdict

    @staticmethod
    def set_client_to_cvlib():
        ObjectDetector.od_client.set_client_to_cvlib()

    @staticmethod
    def set_client_to_imageai():
        ObjectDetector.od_client.set_client_to_imageai()
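A short usage sketch for this wrapper (the image path is hypothetical; `set_client_to_imageai` is currently a no-op while imageai is disabled):

```python
import misinformation.objects as ob

mydict = {"filename": "./test/data/IMG_2809.png"}
# cvlib is already the default backend, so this call is optional
ob.ObjectDetector.set_client_to_cvlib()
mydict = ob.ObjectDetector(mydict).analyse_image()
print(mydict["person"], mydict["car"])  # "yes"/"no" per object class
```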

misinformation/objects_cvlib.py (new file)
@@ -0,0 +1,77 @@
import cv2
import cvlib as cv


def objects_from_cvlib(objects_list: list) -> dict:
    objects = init_default_objects()
    for key in objects:
        if key in objects_list:
            objects[key] = "yes"
    return objects


def init_default_objects():
    objects = {
        "person": "no",
        "bicycle": "no",
        "car": "no",
        "motorcycle": "no",
        "airplane": "no",
        "bus": "no",
        "train": "no",
        "truck": "no",
        "boat": "no",
        "traffic light": "no",
        "cell phone": "no",
    }
    return objects


class ObjectsMethod:
    """Base class to be inherited by all objects methods."""

    def __init__(self):
        # initialize in child class
        pass

    def analyse_image(self, subdict):
        raise NotImplementedError()


class ObjectCVLib(ObjectsMethod):
    def __init__(self, client_type=1):
        # as long as imageai is not activated this remains empty
        pass

    def detect_objects_cvlib(self, image_path):
        """Localize objects in the local image.

        Args:
            image_path: The path to the local file.
        """
        img = cv2.imread(image_path)
        bbox, label, conf = cv.detect_common_objects(img)
        # output_image = draw_bbox(im, bbox, label, conf)
        objects = objects_from_cvlib(label)
        return objects

    def analyse_image_from_file(self, image_path):
        """Localize objects in the local image.

        Args:
            image_path: The path to the local file.
        """
        objects = self.detect_objects_cvlib(image_path)
        return objects

    def analyse_image(self, subdict):
        """Localize objects in the local image.

        Args:
            subdict: The dictionary for an image expression instance.
        """
        objects = self.analyse_image_from_file(subdict["filename"])
        for key in objects:
            subdict[key] = objects[key]
        return subdict
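For reference, the underlying cvlib call can also be used directly; `detect_common_objects` returns parallel lists of bounding boxes, labels and confidences (the path and printed values below are illustrative):

```python
import cv2
import cvlib as cv

img = cv2.imread("./test/data/IMG_2809.png")
bbox, label, conf = cv.detect_common_objects(img)
for box, name, score in zip(bbox, label, conf):
    # box is [x_min, y_min, x_max, y_max] in pixel coordinates
    print(name, round(score, 2), box)
```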

misinformation/objects_imageai.py (new file)
@@ -0,0 +1,114 @@
from misinformation.utils import DownloadResource
from misinformation.objects_cvlib import ObjectsMethod
from misinformation.objects_cvlib import init_default_objects

from imageai.Detection import ObjectDetection

import cv2
import os
import pathlib


def objects_from_imageai(detections: list) -> dict:
    objects = init_default_objects()
    for obj in detections:
        obj_name = obj["name"]
        objects[obj_name] = "yes"
    return objects


def objects_symlink_processor(name):
    def _processor(fname, action, pooch):
        if not os.path.exists(os.path.dirname(name)):
            os.makedirs(os.path.dirname(name))
        if not os.path.exists(name):
            os.symlink(fname, name)
        return fname

    return _processor


pre_model_path = pathlib.Path.home().joinpath(
    ".misinformation", "objects", "resnet50_coco_best_v2.1.0.h5"
)

retina_objects_model = DownloadResource(
    url="https://github.com/OlafenwaMoses/ImageAI/releases/download/essentials-v5/resnet50_coco_best_v2.1.0.h5/",
    known_hash="sha256:6518ad56a0cca4d1bd8cbba268dd4e299c7633efe7d15902d5acbb0ba180027c",
    processor=objects_symlink_processor(pre_model_path),
)


class ObjectImageAI(ObjectsMethod):
    def __init__(self):
        # init the imageai client and fetch the pre-trained RetinaNet model
        retina_objects_model.get()
        if not os.path.exists(pre_model_path):
            print("Download retina objects model failed.")
            return
        self.imgai_client = ObjectDetection()
        self.imgai_client.setModelTypeAsRetinaNet()
        self.imgai_client.setModelPath(pre_model_path)
        self.imgai_client.loadModel()
        self.custom = self.imgai_client.CustomObjects(
            person=True,
            bicycle=True,
            car=True,
            motorcycle=True,
            airplane=True,
            bus=True,
            train=True,
            truck=True,
            boat=True,
            traffic_light=True,
            cell_phone=True,
        )

    def detect_objects_imageai(self, image_path, custom=True, min_prob=30):
        """Localize objects in the local image.

        Args:
            image_path: The path to the local file.
            custom: Whether to detect only the user-defined objects.
            min_prob: Minimum probability below which detections are discarded.
        """
        img = cv2.imread(image_path)
        if custom:
            box_img, detections = self.imgai_client.detectCustomObjectsFromImage(
                custom_objects=self.custom,
                input_type="array",
                input_image=img,
                output_type="array",
                minimum_percentage_probability=min_prob,
            )
        else:
            box_img, detections = self.imgai_client.detectObjectsFromImage(
                input_type="array",
                input_image=img,
                output_type="array",
                minimum_percentage_probability=min_prob,
            )
        objects = objects_from_imageai(detections)
        return objects

    def analyse_image_from_file(self, image_path):
        """Localize objects in the local image.

        Args:
            image_path: The path to the local file.
        """
        objects = self.detect_objects_imageai(image_path)
        return objects

    def analyse_image(self, subdict):
        """Localize objects in the local image.

        Args:
            subdict: The dictionary for an image expression instance.
        """
        objects = self.analyse_image_from_file(subdict["filename"])
        for key in objects:
            subdict[key] = objects[key]
        return subdict

Binary files added:
misinformation/test/data/IMG_1730.png (new file, 10 MiB)
misinformation/test/data/IMG_2746.png (new file, 1005 KiB)
misinformation/test/data/IMG_2750.png (new file, 801 KiB)
misinformation/test/data/IMG_2805.png (new file, 758 KiB)
misinformation/test/data/IMG_2806.png (new file, 788 KiB)
misinformation/test/data/IMG_2807.png (new file, 1.4 MiB)
misinformation/test/data/IMG_2808.png (new file, 1.3 MiB)
misinformation/test/data/IMG_2809.png (new file, 1.2 MiB)
misinformation/test/data/d755771b-225e-432f-802e-fb8dc850fff7.png (new file, 1.3 MiB)

misinformation/test/data/example_append_data_to_dict_in.json (new file)
@@ -0,0 +1,37 @@
{"image01":
{
"filename": "./data/image01.jpg",
"person": "yes",
"bicycle": "no",
"car": "no",
"motorcycle": "no",
"airplane": "no",
"bus": "no",
"train": "no",
"truck": "no",
"boat": "no",
"traffic light": "no",
"cell phone": "yes",
"gender": "male",
"wears_mask": "no",
"race": "asian"
},
"image02":
{
"filename": "./data/image02.jpg",
"person": "no",
"bicycle": "no",
"car": "yes",
"motorcycle": "no",
"airplane": "no",
"bus": "yes",
"train": "no",
"truck": "yes",
"boat": "no",
"traffic light": "yes",
"cell phone": "no",
"gender": "male",
"wears_mask": "no",
"race": "asian"
}
}

misinformation/test/data/example_append_data_to_dict_out.json (new file)
@@ -0,0 +1,17 @@
{
"filename": ["./data/image01.jpg", "./data/image02.jpg"],
"person": ["yes", "no"],
"bicycle": ["no", "no"],
"car": ["no", "yes"],
"motorcycle": ["no", "no"],
"airplane": ["no", "no"],
"bus": ["no", "yes"],
"train": ["no", "no"],
"truck": ["no", "yes"],
"boat": ["no", "no"],
"traffic light": ["no", "yes"],
"cell phone": ["yes", "no"],
"gender": ["male", "male"],
"wears_mask": ["no", "no"],
"race": ["asian", "asian"]
}

misinformation/test/data/example_dump_df.csv (new file)
@@ -0,0 +1,3 @@
,filename,person,bicycle,car,motorcycle,airplane,bus,train,truck,boat,traffic light,cell phone,gender,wears_mask,race
0,./data/image01.jpg,yes,no,no,no,no,no,no,no,no,no,yes,male,no,asian
1,./data/image02.jpg,no,no,yes,no,no,yes,no,yes,no,yes,no,male,no,asian
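These fixtures pin down the dict-of-dicts → dict-of-lists → DataFrame pipeline. A minimal sketch of what `append_data_to_dict` and `dump_df` presumably do (the actual implementations live in `misinformation/utils.py`):

```python
import pandas as pd


def append_data_to_dict_sketch(mydict: dict) -> dict:
    # collect each per-image value into a list keyed by column name
    outdict = {}
    for subdict in mydict.values():
        for key, value in subdict.items():
            outdict.setdefault(key, []).append(value)
    return outdict


def dump_df_sketch(outdict: dict) -> pd.DataFrame:
    # one row per image, one column per key
    return pd.DataFrame(outdict)
```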

misinformation/test/data/example_faces.json (new file)
@@ -0,0 +1,12 @@
{
"filename": "./test/data/IMG_2746.png",
"face": "Yes",
"multiple_faces": "Yes",
"no_faces": 11,
"wears_mask": ["No", "No", "Yes"],
"age": [36, 35, 33],
"gender": ["Man", "Man", "Man"],
"race": ["white", "white", null],
"emotion": [["sad", 73.24264486090212], ["fear", 84.20093247879356], null],
"emotion (category)": ["Negative", "Negative", null]
}

misinformation/test/data/example_objects_cvlib.json (new file)
@@ -0,0 +1,14 @@
{
"filename": "./test/data/IMG_2809.png",
"person": "yes",
"bicycle": "no",
"car": "yes",
"motorcycle": "no",
"airplane": "no",
"bus": "yes",
"train": "no",
"truck": "no",
"boat": "no",
"traffic light": "no",
"cell phone": "no"
}

(new file)
@@ -0,0 +1 @@
{'image_objects': {'filename': './misinformation/test/data/IMG_2809.png', 'person': 'yes', 'bicycle': 'yes', 'car': 'yes', 'motorcycle': 'no', 'airplane': 'no', 'bus': 'yes', 'train': 'no', 'truck': 'no', 'boat': 'no', 'traffic light': 'no', 'cell phone': 'no'}}

misinformation/test/data/example_utils_init_dict.json (new file)
@@ -0,0 +1,6 @@
{
"image_faces": {
"filename": "/test/data/image_faces.jpg"},
"image_objects":
{"filename": "/test/data/image_objects.jpg"}
}

misinformation/test/test_faces.py (new file)
@@ -0,0 +1,25 @@
import misinformation.faces as fc
import json


def test_analyse_faces():
    mydict = {
        "filename": "./test/data/IMG_2746.png",
    }
    mydict = fc.EmotionDetector(mydict).analyse_image()
    print(mydict)

    with open("./test/data/example_faces.json", "r") as file:
        out_dict = json.load(file)
    for key in mydict.keys():
        if key != "emotion":
            assert mydict[key] == out_dict[key]
    # json can't handle tuples natively
    for i in range(0, len(mydict["emotion"])):
        temp = (
            list(mydict["emotion"][i])
            if type(mydict["emotion"][i]) == tuple
            else mydict["emotion"][i]
        )
        assert temp == out_dict["emotion"][i]

misinformation/test/test_objects.py (new file)
@@ -0,0 +1,31 @@
import json
import pytest
import misinformation
import misinformation.objects as ob
import misinformation.objects_cvlib as ob_cvlib


@pytest.fixture()
def default_objects():
    return ob.init_default_objects()


def test_objects_from_cvlib(default_objects):
    objects_list = ["cell phone", "motorcycle", "traffic light"]
    objects = ob_cvlib.objects_from_cvlib(objects_list)
    out_objects = default_objects
    for obj in objects_list:
        out_objects[obj] = "yes"
    assert str(objects) == str(out_objects)


def test_analyse_image_cvlib():
    mydict = {"filename": "./test/data/IMG_2809.png"}
    ob_cvlib.ObjectCVLib().analyse_image(mydict)

    with open("./test/data/example_objects_cvlib.json", "r") as file:
        out_dict = json.load(file)
    for key in mydict.keys():
        print(key)
        assert mydict[key] == out_dict[key]

misinformation/test/test_utils.py (new file)
@@ -0,0 +1,40 @@
import json
import pandas as pd
import misinformation.utils as ut


def test_find_files():
    result = ut.find_files(
        path="./test/data/", pattern="*.png", recursive=True, limit=10
    )
    assert len(result) > 0


def test_initialize_dict():
    result = [
        "/test/data/image_faces.jpg",
        "/test/data/image_objects.jpg",
    ]
    mydict = ut.initialize_dict(result)
    with open("./test/data/example_utils_init_dict.json", "r") as file:
        out_dict = json.load(file)
    assert mydict == out_dict


def test_append_data_to_dict():
    with open("./test/data/example_append_data_to_dict_in.json", "r") as file:
        mydict = json.load(file)
    outdict = ut.append_data_to_dict(mydict)
    print(outdict)
    with open("./test/data/example_append_data_to_dict_out.json", "r") as file:
        example_outdict = json.load(file)

    assert outdict == example_outdict


def test_dump_df():
    with open("./test/data/example_append_data_to_dict_out.json", "r") as file:
        outdict = json.load(file)
    df = ut.dump_df(outdict)
    out_df = pd.read_csv("./test/data/example_dump_df.csv", index_col=[0])
    pd.testing.assert_frame_equal(df, out_df)

notebooks/colors_expression.ipynb (new file)
@@ -0,0 +1,152 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows primary color analysis of color image using K-Means algorithm.\n",
"The output are N primary colors and their corresponding percentage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import KMeans\n",
"import matplotlib.pyplot as plt\n",
"import cv2\n",
"import numpy as np\n",
"import requests"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def centroid_histogram(clt):\n",
" # grab the number of different clusters and create a histogram\n",
" # based on the number of pixels assigned to each cluster\n",
" numLabels = np.arange(0, len(np.unique(clt.labels_)) + 1)\n",
" (hist, _) = np.histogram(clt.labels_, bins=numLabels)\n",
"\n",
" # normalize the histogram, such that it sums to one\n",
" hist = hist.astype(\"float\")\n",
" hist /= hist.sum()\n",
"\n",
" # return the histogram\n",
" return hist\n",
"\n",
"\n",
"def plot_colors(hist, centroids):\n",
" # initialize the bar chart representing the relative frequency\n",
" # of each of the colors\n",
" bar = np.zeros((50, 300, 3), dtype=\"uint8\")\n",
" startX = 0\n",
" # loop over the percentage of each cluster and the color of\n",
" # each cluster\n",
" for (percent, color) in zip(hist, centroids):\n",
" # plot the relative percentage of each cluster\n",
" endX = startX + (percent * 300)\n",
" cv2.rectangle(\n",
" bar, (int(startX), 0), (int(endX), 50), color.astype(\"uint8\").tolist(), -1\n",
" )\n",
" startX = endX\n",
"\n",
" # return the bar chart\n",
" return bar"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# load the image and convert it from BGR to RGB so that\n",
"# we can dispaly it with matplotlib\n",
"# image_path = './data/blue.jpg'\n",
"# image = cv2.imread(image_path)\n",
"\n",
"file = requests.get(\n",
" \"https://heibox.uni-heidelberg.de/thumbnail/537e6da0a8b44069bc96/1024/images/100361_asm.png\"\n",
")\n",
"image = cv2.imdecode(np.fromstring(file.content, np.uint8), 1)\n",
"\n",
"# BGR-->RGB cv to matplotlib show\n",
"image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
"\n",
"# show our image\n",
"plt.figure()\n",
"plt.axis(\"off\")\n",
"plt.imshow(image)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# reshape the image to be a list of pixels\n",
"image = image.reshape((image.shape[0] * image.shape[1], 3))\n",
"\n",
"# cluster the pixel intensities\n",
"clt = KMeans(n_clusters=8)\n",
"clt.fit(image)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# build a histogram of clusters and then create a figure\n",
"# representing the number of pixels labeled to each color\n",
"hist = centroid_histogram(clt)\n",
"bar = plot_colors(hist, clt.cluster_centers_)\n",
"\n",
"# show our color bart\n",
"plt.figure()\n",
"plt.axis(\"off\")\n",
"plt.imshow(bar)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for (percent, color) in zip(hist, clt.cluster_centers_):\n",
" print(\"color:\", color, \" percentage:\", percent)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

(modified notebook file)
@@ -42,7 +42,7 @@
    "outputs": [],
    "source": [
     "images = misinformation.find_files(\n",
-    "    path=\"/home/jovyan/shared/data/test_no_text/\",\n",
+    "    path=\"/home/inga/projects/misinformation-project/misinformation/data/test_no_text/\",\n",
     "    limit=1000,\n",
     ")"
    ]

Binary files added:
notebooks/obj_dect_cvlib/image.jpg (new file, 121 KiB)
notebooks/obj_dect_cvlib/image02.jpg (new file, 792 KiB)

notebooks/obj_dect_cvlib/objdect-cvlib.ipynb (new file)
@@ -0,0 +1,103 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style =\" color : green ;font - weight : bold \">ImageAI for Object Detection</span>\n",
"http://imageai.org/#features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A simple, high level, easy-to-use open source Computer Vision library for Python.\n",
"\n",
"It was developed with a focus on enabling easy and fast experimentation. Being able to go from an idea to prototype with least amount of delay is key to doing good research.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>cvlib detect_common_objects pretrained on coco dataset.</p>\n",
"Underneath it uses YOLOv3 model trained on COCO dataset capable of detecting 80 common objects in context."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"import matplotlib.pyplot as plt\n",
"import cvlib as cv\n",
"from cvlib.object_detection import draw_bbox"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im = cv2.imread(\"image.jpg\")\n",
"\n",
"bbox, label, conf = cv.detect_common_objects(im)\n",
"\n",
"output_image = draw_bbox(im, bbox, label, conf)\n",
"\n",
"plt.imshow(output_image)\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"im = cv2.imread(\"image02.jpg\")\n",
"\n",
"bbox, label, conf = cv.detect_common_objects(im)\n",
"\n",
"output_image = draw_bbox(im, bbox, label, conf)\n",
"\n",
"plt.imshow(output_image)\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

notebooks/obj_dect_cvlib/yolov3.txt (new file)
@@ -0,0 +1,80 @@
person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush

Binary files added:
notebooks/obj_dect_imageai/image.jpg (new file, 121 KiB)
notebooks/obj_dect_imageai/imagenew.jpg (new file, 126 KiB)

notebooks/obj_dect_imageai/objdect-imageai.ipynb (new file)
@@ -0,0 +1,147 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<span style =\" color : green ;font - weight : bold \">ImageAI for Object Detection</span>\n",
"http://imageai.org/#features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ImageAI provides API to recognize 1000 different objects in a picture using pre-trained models that were trained on the ImageNet-1000 dataset. The model implementations provided are SqueezeNet, ResNet, InceptionV3 and DenseNet.\n",
"</p>\n",
"ImageAI provides API to detect, locate and identify 80 most common objects in everyday life in a picture using pre-trained models that were trained on the COCO Dataset. The model implementations provided include RetinaNet, YOLOv3 and TinyYOLOv3."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are 80 possible objects that you can detect with the\n",
"ObjectDetection class, and they are as seen below.\n",
"\n",
" person, bicycle, car, motorcycle, airplane,\n",
" bus, train, truck, boat, traffic light, fire hydrant, stop_sign,\n",
" parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra,\n",
" giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard,\n",
" sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket,\n",
" bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange,\n",
" broccoli, carrot, hot dog, pizza, donot, cake, chair, couch, potted plant, bed,\n",
" dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave,\n",
" oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer,\n",
" toothbrush."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>requirements</p>\n",
"<p>tensorflow==1.15.0</p>\n",
"<p>numpy==1.19.5</p>\n",
"<p>scipy==1.4.1</p>\n",
"<p>keras==2.1.0</p>\n",
"<p>imageai==2.0.2</p>\n",
"\n",
"<p>Or update to newest version, see https://github.com/OlafenwaMoses/ImageAI</p>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download the RetinaNet model file for object detection\n",
"\n",
"https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/resnet50_coco_best_v2.0.1.h5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from imageai.Detection import ObjectDetection\n",
"import matplotlib.pyplot as plt\n",
"import skimage.io\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"execution_path = os.getcwd()\n",
"\n",
"detector = ObjectDetection()\n",
"detector.setModelTypeAsRetinaNet()\n",
"detector.setModelPath(os.path.join(execution_path, \"resnet50_coco_best_v2.0.1.h5\"))\n",
"detector.loadModel()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"detections = detector.detectObjectsFromImage(\n",
" input_image=os.path.join(execution_path, \"image.jpg\"),\n",
" output_image_path=os.path.join(execution_path, \"imagenew.jpg\"),\n",
")\n",
"\n",
"for eachObject in detections:\n",
" print(eachObject[\"name\"], \" : \", eachObject[\"percentage_probability\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image = skimage.io.imread(\"image.jpg\")\n",
"imagenew = skimage.io.imread(\"imagenew.jpg\")\n",
"\n",
"_, axis = plt.subplots(1, 2)\n",
"axis[0].imshow(image, cmap=\"gray\")\n",
"axis[1].imshow(imagenew, cmap=\"gray\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

notebooks/objects_expression.ipynb (new file)
@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Objects Expression recognition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebooks shows some preliminary work on detecting objects expressions with cvliv and imageai. It is mainly meant to explore its capabilities and to decide on future research directions. We package our code into a `misinformation` package that is imported here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import misinformation\n",
"import misinformation.objects as ob"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ObjectDetector currently support 2 clinet types: CLIENT_CVLIB and CLIENT_IMAGEAI, default is CLIENT_CVLIB."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set an image path as input file path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"images = misinformation.find_files(\n",
" path=\"/home/inga/projects/misinformation-project/misinformation/data/test_no_text/\",\n",
" limit=1000,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mydict = misinformation.utils.initialize_dict(images)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detect objects with default client type: CLIENT_CVLIB."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key in mydict:\n",
" mydict[key] = ob.ObjectDetector(mydict[key]).analyse_image()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"outdict = misinformation.utils.append_data_to_dict(mydict)\n",
"df = misinformation.utils.dump_df(outdict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Write the csv file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"./data_out.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check the analysis, you can inspect the analyzed elements here. Loading the results takes a moment, so please be patient. If you are sure of what you are doing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"misinformation.explore_analysis(mydict, identify=\"objects\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
},
"vscode": {
"interpreter": {
"hash": "f1142466f556ab37fe2d38e2897a16796906208adb09fea90ba58bdf8a56f0ba"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

pyproject.toml
@@ -1,6 +1,6 @@
 [build-system]
 requires = [
-    "setuptools>=61",
+    "setuptools==61",
 ]
 build-backend = "setuptools.build_meta"
@@ -13,7 +13,7 @@ maintainers = [
     { name = "Inga Ulusoy", email = "ssc@iwr.uni-heidelberg.de" },
     { name = "Dominic Kempf", email = "ssc@iwr.uni-heidelberg.de" },
 ]
-requires-python = ">=3.8"
+requires-python = ">=3.9"
 license = { text = "MIT" }
 classifiers = [
     "Programming Language :: Python :: 3",
@@ -21,11 +21,20 @@ classifiers = [
     "License :: OSI Approved :: MIT License",
 ]
 dependencies = [
-    "deepface",
-    "ipywidgets >=8",
-    "pooch",
-    "retina-face",
     "google-cloud-vision",
+    "cvlib",
+    "deepface",
+    "ipywidgets",
+    "numpy",
+    "opencv_python",
+    "pandas",
+    "pooch",
+    "protobuf",
+    "retina_face",
+    "setuptools",
+    "tensorflow",
+    "keras",
+    "openpyxl",
 ]

 [project.scripts]

(deleted file)
@@ -1,11 +0,0 @@
-deepface
-ipywidgets>=8
-pooch
-retina-face
-opencv-python
-matplotlib
-numpy
-keras-ocr
-tensorflow
-google-cloud-vision
-pytesseract

(modified file)
@@ -1,5 +1,14 @@
-deepface
-ipywidgets>=8
-pooch
-retina-face
 google-cloud-vision
+cvlib
+deepface
+ipywidgets
+numpy
+opencv_python
+pandas
+pooch
+protobuf
+retina_face
+setuptools
+tensorflow
+keras
+openpyxl