<!DOCTYPE html>
<html class="writer-html5" lang="en" data-content_root="./">
<head>
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>AMMICO - AI-based Media and Misinformation Content Analysis Tool &mdash; AMMICO 0.2.2 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=b86133f3" />
<link rel="stylesheet" type="text/css" href="_static/css/theme.css?v=e59714d7" />
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=000c92bf"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
<script src="_static/js/theme.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="FAQ" href="faq_link.html" />
<link rel="prev" title="Welcome to AMMICOs documentation!" href="index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home">
AMMICO
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">AMMICO - AI-based Media and Misinformation Content Analysis Tool</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#installation">Installation</a></li>
<li class="toctree-l2"><a class="reference internal" href="#usage">Usage</a></li>
<li class="toctree-l2"><a class="reference internal" href="#features">Features</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#text-extraction">Text extraction</a></li>
<li class="toctree-l3"><a class="reference internal" href="#content-extraction">Content extraction</a></li>
<li class="toctree-l3"><a class="reference internal" href="#emotion-recognition">Emotion recognition</a></li>
<li class="toctree-l3"><a class="reference internal" href="#color-hue-detection">Color/hue detection</a></li>
<li class="toctree-l3"><a class="reference internal" href="#cropping-of-posts">Cropping of posts</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="faq_link.html">FAQ</a></li>
<li class="toctree-l1"><a class="reference internal" href="create_API_key_link.html">Instructions how to generate and enable a google Cloud Vision API key</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebooks/DemoNotebook_ammico.html">AMMICO Demonstration Notebook</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebooks/DemoNotebook_ammico.html#Step-0:-Create-and-set-a-Google-Cloud-Vision-Key">Step 0: Create and set a Google Cloud Vision Key</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebooks/DemoNotebook_ammico.html#Step-1:-Read-your-data-into-AMMICO">Step 1: Read your data into AMMICO</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebooks/DemoNotebook_ammico.html#The-detector-modules">The detector modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="modules.html">AMMICO package modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="license_link.html">License</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">AMMICO</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item active">AMMICO - AI-based Media and Misinformation Content Analysis Tool</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/ssciwr/AMMICO/blob/main/docs/source/readme_link.md" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="ammico-ai-based-media-and-misinformation-content-analysis-tool">
<h1>AMMICO - AI-based Media and Misinformation Content Analysis Tool<a class="headerlink" href="#ammico-ai-based-media-and-misinformation-content-analysis-tool" title="Link to this heading"></a></h1>
<p><img alt="License: MIT" src="https://img.shields.io/github/license/ssciwr/AMMICO" />
<img alt="GitHub Workflow Status" src="https://img.shields.io/github/actions/workflow/status/ssciwr/AMMICO/ci.yml?branch=main" />
<img alt="codecov" src="https://img.shields.io/codecov/c/github/ssciwr/AMMICO" />
<img alt="Quality Gate Status" src="https://sonarcloud.io/api/project_badges/measure?project=ssciwr_ammico&amp;metric=alert_status" />
<img alt="Language" src="https://img.shields.io/github/languages/top/ssciwr/AMMICO" />
<a class="reference external" href="https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/DemoNotebook_ammico.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
<p>This package extracts data from images such as social media posts that contain an image part and a text part. The analysis can generate a very large number of features, depending on the user input. See <a class="reference external" href="https://dx.doi.org/10.31235/osf.io/v8txj">our paper</a> for a more in-depth description.</p>
<p><strong><em>This project is currently under development!</em></strong></p>
<p>Use pre-processed image files such as social media posts with comments and process to collect information:</p>
<ol class="arabic simple">
<li><p>Text extraction from the images</p>
<ol class="arabic simple">
<li><p>Language detection</p></li>
<li><p>Translation into English or other languages</p></li>
<li><p>Cleaning of the text, spell-check</p></li>
<li><p>Sentiment analysis</p></li>
<li><p>Named entity recognition</p></li>
<li><p>Topic analysis</p></li>
</ol>
</li>
<li><p>Content extraction from the images</p>
<ol class="arabic simple">
<li><p>Textual summary of the image content (“image caption”) that can be analyzed further using the above tools</p></li>
<li><p>Feature extraction from the images: User inputs query and images are matched to that query (both text and image query)</p></li>
<li><p>Question answering</p></li>
</ol>
</li>
<li><p>Performing person and face recognition in images</p>
<ol class="arabic simple">
<li><p>Face mask detection</p></li>
<li><p>Probabilistic detection of age, gender and race</p></li>
<li><p>Emotion recognition</p></li>
</ol>
</li>
<li><p>Color analysis</p>
<ol class="arabic simple">
<li><p>Analyse hue and percentage of color on image</p></li>
</ol>
</li>
<li><p>Multimodal analysis</p>
<ol class="arabic simple">
<li><p>Find best matches for image content or image similarity</p></li>
</ol>
</li>
<li><p>Cropping images to remove comments from posts</p></li>
</ol>
<section id="installation">
<h2>Installation<a class="headerlink" href="#installation" title="Link to this heading"></a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">AMMICO</span></code> package can be installed using pip:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">ammico</span>
</pre></div>
</div>
<p>This will install the package and its dependencies locally. If after installation you get some errors when running some modules, please follow the instructions in the <a class="reference external" href="https://ssciwr.github.io/AMMICO/build/html/faq_link.html">FAQ</a>.</p>
</section>
<section id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Link to this heading"></a></h2>
<p>The main demonstration notebook can be found in the <code class="docutils literal notranslate"><span class="pre">notebooks</span></code> folder and also on google colab: <a class="reference external" href="https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/DemoNotebook_ammico.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
<p>There are further sample notebooks in the <code class="docutils literal notranslate"><span class="pre">notebooks</span></code> folder for the more experimental features:</p>
<ol class="arabic simple">
<li><p>Topic analysis: Use the notebook <code class="docutils literal notranslate"><span class="pre">get-text-from-image.ipynb</span></code> to analyse the topics of the extraced text.<br />
<strong>You can run this notebook on google colab: <a class="reference external" href="https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/get-text-from-image.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></strong><br />
Place the data files and google cloud vision API key in your google drive to access the data.</p></li>
<li><p>To crop social media posts use the <code class="docutils literal notranslate"><span class="pre">cropposts.ipynb</span></code> notebook.
<strong>You can run this notebook on google colab: <a class="reference external" href="https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/cropposts.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></strong></p></li>
</ol>
</section>
<section id="features">
<h2>Features<a class="headerlink" href="#features" title="Link to this heading"></a></h2>
<section id="text-extraction">
<h3>Text extraction<a class="headerlink" href="#text-extraction" title="Link to this heading"></a></h3>
<p>The text is extracted from the images using <a class="reference external" href="https://cloud.google.com/vision">google-cloud-vision</a>. For this, you need an API key. Set up your google account following the instructions on the google Vision AI website or as described <a class="reference external" href="https://ssciwr.github.io/AMMICO/build/html/create_API_key_link.html">here</a>.
You then need to export the location of the API key as an environment variable:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">export</span> <span class="n">GOOGLE_APPLICATION_CREDENTIALS</span><span class="o">=</span><span class="s2">&quot;location of your .json&quot;</span>
</pre></div>
</div>
<p>The extracted text is then stored under the <code class="docutils literal notranslate"><span class="pre">text</span></code> key (column when exporting a csv).</p>
<p><a class="reference external" href="https://py-googletrans.readthedocs.io/en/latest/">Googletrans</a> is used to recognize the language automatically and translate into English. The text language and translated text is then stored under the <code class="docutils literal notranslate"><span class="pre">text_language</span></code> and <code class="docutils literal notranslate"><span class="pre">text_english</span></code> key (column when exporting a csv).</p>
<p>If you further want to analyse the text, you have to set the <code class="docutils literal notranslate"><span class="pre">analyse_text</span></code> keyword to <code class="docutils literal notranslate"><span class="pre">True</span></code>. In doing so, the text is then processed using <a class="reference external" href="https://spacy.io/">spacy</a> (tokenized, part-of-speech, lemma, …). The English text is cleaned from numbers and unrecognized words (<code class="docutils literal notranslate"><span class="pre">text_clean</span></code>), spelling of the English text is corrected (<code class="docutils literal notranslate"><span class="pre">text_english_correct</span></code>), and further sentiment and subjectivity analysis are carried out (<code class="docutils literal notranslate"><span class="pre">polarity</span></code>, <code class="docutils literal notranslate"><span class="pre">subjectivity</span></code>). The latter two steps are carried out using <a class="reference external" href="https://textblob.readthedocs.io/en/dev/index.html">TextBlob</a>. For more information on the sentiment analysis using TextBlob see <a class="reference external" href="https://towardsdatascience.com/my-absolute-go-to-for-sentiment-analysis-textblob-3ac3a11d524">here</a>.</p>
<p>The <a class="reference external" href="https://huggingface.co/">Hugging Face transformers library</a> is used to perform another sentiment analysis, a text summary, and named entity recognition, using the <code class="docutils literal notranslate"><span class="pre">transformers</span></code> pipeline.</p>
</section>
<section id="content-extraction">
<h3>Content extraction<a class="headerlink" href="#content-extraction" title="Link to this heading"></a></h3>
<p>The image content (“caption”) is extracted using the <a class="reference external" href="https://github.com/salesforce/LAVIS">LAVIS</a> library. This library enables vision intelligence extraction using several state-of-the-art models such as BLIP and BLIP2, depending on the task and user selection. Further, it allows feature extraction from the images, where users can input textual and image queries, and the images in the database are matched to that query (multimodal search). Another option is question answering, where the user inputs a text question and the library finds the images that match the query.</p>
</section>
<section id="emotion-recognition">
<h3>Emotion recognition<a class="headerlink" href="#emotion-recognition" title="Link to this heading"></a></h3>
<p>Emotion recognition is carried out using the <a class="reference external" href="https://github.com/serengil/deepface">deepface</a> and <a class="reference external" href="https://github.com/serengil/retinaface">retinaface</a> libraries. These libraries detect the presence of faces, as well as provide probabilistic assessment of their age, gender, race, and emotion based on several state-of-the-art models. It is also detected if the person is wearing a face mask - if they are, then no further detection is carried out as the mask affects the assessment acuracy. Because the detection of gender, race and age is carried out in simplistic categories (e.g., for gender, using only “male” and “female”), and because of the ethical implications of such assessments, users can only access this part of the tool if they agree with an ethical disclosure statement (see FAQ). Moreover, once users accept the disclosure, they can further set their own detection confidence threshholds.</p>
</section>
<section id="color-hue-detection">
<h3>Color/hue detection<a class="headerlink" href="#color-hue-detection" title="Link to this heading"></a></h3>
<p>Color detection is carried out using <a class="reference external" href="https://github.com/obskyr/colorgram.py">colorgram.py</a> and <a class="reference external" href="https://github.com/vaab/colour">colour</a> for the distance metric. The colors can be classified into the main named colors/hues in the English language, that are red, green, blue, yellow, cyan, orange, purple, pink, brown, grey, white, black.</p>
</section>
<section id="cropping-of-posts">
<h3>Cropping of posts<a class="headerlink" href="#cropping-of-posts" title="Link to this heading"></a></h3>
<p>Social media posts can automatically be cropped to remove further comments on the page and restrict the textual content to the first comment only.</p>
</section>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="index.html" class="btn btn-neutral float-left" title="Welcome to AMMICOs documentation!" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="faq_link.html" class="btn btn-neutral float-right" title="FAQ" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2022, Scientific Software Center, Heidelberg University.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>