Mirror of
https://github.com/ssciwr/AMMICO.git
synced 2025-10-29 21:16:06 +02:00
Deploying to gh-pages from @ ssciwr/AMMICO@19f33c3177 🚀
This commit is contained in:
parent
985204502d
commit
db8487d7e4
Binary data
build/doctrees/ammico.doctree
Binary file not shown.
Binary data
build/doctrees/environment.pickle
Binary file not shown.
Binary data
build/doctrees/faq_link.doctree
Binary file not shown.
Binary data
build/doctrees/notebooks/DemoNotebook_ammico.doctree
Binary file not shown.
Binary data
build/doctrees/readme_link.doctree
Binary file not shown.
@@ -101,6 +101,12 @@
<li class="toctree-l4"><a class="reference internal" href="#utils.AnalysisMethod.set_keys"><code class="docutils literal notranslate"><span class="pre">AnalysisMethod.set_keys()</span></code></a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#utils.AnalysisType"><code class="docutils literal notranslate"><span class="pre">AnalysisType</span></code></a><ul>
<li class="toctree-l4"><a class="reference internal" href="#utils.AnalysisType.QUESTIONS"><code class="docutils literal notranslate"><span class="pre">AnalysisType.QUESTIONS</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#utils.AnalysisType.SUMMARY"><code class="docutils literal notranslate"><span class="pre">AnalysisType.SUMMARY</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#utils.AnalysisType.SUMMARY_AND_QUESTIONS"><code class="docutils literal notranslate"><span class="pre">AnalysisType.SUMMARY_AND_QUESTIONS</span></code></a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#utils.DownloadResource"><code class="docutils literal notranslate"><span class="pre">DownloadResource</span></code></a><ul>
<li class="toctree-l4"><a class="reference internal" href="#utils.DownloadResource.get"><code class="docutils literal notranslate"><span class="pre">DownloadResource.get()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#utils.DownloadResource.resources"><code class="docutils literal notranslate"><span class="pre">DownloadResource.resources</span></code></a></li>
@@ -177,7 +183,7 @@
<dl class="py class">
<dt class="sig sig-object py" id="text.TextDetector">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">text.</span></span><span class="sig-name descname"><span class="pre">TextDetector</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">subdict</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">analyse_text</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">skip_extraction</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">accept_privacy</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'PRIVACY_AMMICO'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#text.TextDetector" title="Link to this definition"></a></dt>
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">text.</span></span><span class="sig-name descname"><span class="pre">TextDetector</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">subdict</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">dict</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">skip_extraction</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">accept_privacy</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'PRIVACY_AMMICO'</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#text.TextDetector" title="Link to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">AnalysisMethod</span></code></p>
<dl class="py method">
<dt class="sig sig-object py" id="text.TextDetector.analyse_image">
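The hunk above drops the `analyse_text` argument from the `TextDetector` constructor. A minimal usage sketch against the new signature; the `ammico.text` import path and the per-image record dictionary with a `"filename"` key are assumptions, not shown in this diff:

```python
# Hedged sketch against the new TextDetector signature; the import path and
# the contents of the per-image dictionary ("subdict") are assumptions.
from ammico.text import TextDetector

subdict = {"filename": "example_image.png"}  # hypothetical image record
detector = TextDetector(
    subdict,
    skip_extraction=False,            # let Google Cloud Vision extract the text
    accept_privacy="PRIVACY_AMMICO",  # name of the privacy-disclosure variable
)
subdict = detector.analyse_image()    # assumed to return the enriched record
```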
@@ -410,6 +416,27 @@ These colors are: “red”, “green”, “blue”, “yellow”,”cyan”,
</dd></dl>
<dl class="py class">
<dt class="sig sig-object py" id="utils.AnalysisType">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">utils.</span></span><span class="sig-name descname"><span class="pre">AnalysisType</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#utils.AnalysisType" title="Link to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">Enum</span></code></p>
<dl class="py attribute">
<dt class="sig sig-object py" id="utils.AnalysisType.QUESTIONS">
<span class="sig-name descname"><span class="pre">QUESTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'questions'</span></em><a class="headerlink" href="#utils.AnalysisType.QUESTIONS" title="Link to this definition"></a></dt>
<dd></dd></dl>
<dl class="py attribute">
<dt class="sig sig-object py" id="utils.AnalysisType.SUMMARY">
<span class="sig-name descname"><span class="pre">SUMMARY</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'summary'</span></em><a class="headerlink" href="#utils.AnalysisType.SUMMARY" title="Link to this definition"></a></dt>
<dd></dd></dl>
<dl class="py attribute">
<dt class="sig sig-object py" id="utils.AnalysisType.SUMMARY_AND_QUESTIONS">
<span class="sig-name descname"><span class="pre">SUMMARY_AND_QUESTIONS</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">'summary_and_questions'</span></em><a class="headerlink" href="#utils.AnalysisType.SUMMARY_AND_QUESTIONS" title="Link to this definition"></a></dt>
<dd></dd></dl>
</dd></dl>
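Because the new `AnalysisType` subclasses both `str` and `Enum`, its members behave as plain strings in comparisons. A self-contained sketch of that behavior, mirroring the documented members rather than importing the package:

```python
from enum import Enum

class AnalysisType(str, Enum):
    SUMMARY = "summary"
    QUESTIONS = "questions"
    SUMMARY_AND_QUESTIONS = "summary_and_questions"

# str/Enum members compare equal to their raw values and round-trip by value.
assert AnalysisType.SUMMARY == "summary"
assert AnalysisType("questions") is AnalysisType.QUESTIONS
```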
<dl class="py class">
<dt class="sig sig-object py" id="utils.DownloadResource">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">utils.</span></span><span class="sig-name descname"><span class="pre">DownloadResource</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">kwargs</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#utils.DownloadResource" title="Link to this definition"></a></dt>
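How `DownloadResource(**kwargs)` is called is not visible in this diff; the sketch below is an assumption pieced together from the documented `get()` method and `resources` attribute, with pooch-style `url`/`known_hash` keywords as hypothetical parameters:

```python
from ammico.utils import DownloadResource

# Hypothetical keywords -- the diff only shows **kwargs, get(), and resources.
weights = DownloadResource(
    url="https://example.com/model-weights.pt",  # placeholder URL
    known_hash=None,                             # assumed pooch-style kwarg
)
local_path = weights.get()         # fetch the file (or reuse a cached copy)
print(DownloadResource.resources)  # registry of declared resources
```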
@@ -53,6 +53,7 @@
<li class="toctree-l3"><a class="reference internal" href="#after-we-prepared-right-environment-we-can-install-the-ammico-package">3. After we have prepared the right environment, we can install the <code class="docutils literal notranslate"><span class="pre">ammico</span></code> package</a></li>
<li class="toctree-l3"><a class="reference internal" href="#micromamba">Micromamba</a></li>
<li class="toctree-l3"><a class="reference internal" href="#windows">Windows</a></li>
<li class="toctree-l3"><a class="reference internal" href="#version-clashes-between-tensorflow-and-numpy">Version clashes between tensorflow and numpy</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#what-happens-to-the-images-that-are-sent-to-google-cloud-vision">What happens to the images that are sent to Google Cloud Vision?</a></li>
@@ -160,6 +161,10 @@ source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
<p>Be careful, it requires around 7 GB of disk space.</p>
<p><img alt="Screenshot 2023-06-01 165712" src="https://github.com/ssciwr/AMMICO/assets/8105097/3dfb302f-c390-46a7-a700-4e044f56c30f" /></p>
</section>
<section id="version-clashes-between-tensorflow-and-numpy">
<h3>Version clashes between tensorflow and numpy<a class="headerlink" href="#version-clashes-between-tensorflow-and-numpy" title="Link to this heading"></a></h3>
<p>Due to the <code class="docutils literal notranslate"><span class="pre">faces</span></code> module, the tensorflow version is currently fixed to at most <code class="docutils literal notranslate"><span class="pre">2.14.0</span></code>. This requires that <code class="docutils literal notranslate"><span class="pre">numpy</span></code> is restricted to <code class="docutils literal notranslate"><span class="pre">numpy==1.23.5</span></code>. If you experience compatibility issues between tensorflow and numpy, try pinning numpy to this version.</p>
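A quick way to check that an environment satisfies these constraints; the pins themselves come from the FAQ text above, while the `packaging` dependency is an assumption of this sketch:

```python
from packaging.version import Version

import numpy as np
import tensorflow as tf

# Pins from the FAQ: tensorflow at most 2.14.0, numpy exactly 1.23.5.
assert Version(tf.__version__) <= Version("2.14.0"), tf.__version__
assert Version(np.__version__) == Version("1.23.5"), np.__version__
```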
</section>
</section>
<section id="what-happens-to-the-images-that-are-sent-to-google-cloud-vision">
<h2>What happens to the images that are sent to Google Cloud Vision?<a class="headerlink" href="#what-happens-to-the-images-that-are-sent-to-google-cloud-vision" title="Link to this heading"></a></h2>
@@ -90,6 +90,7 @@
| <a href="#I"><strong>I</strong></a>
| <a href="#M"><strong>M</strong></a>
| <a href="#P"><strong>P</strong></a>
| <a href="#Q"><strong>Q</strong></a>
| <a href="#R"><strong>R</strong></a>
| <a href="#S"><strong>S</strong></a>
| <a href="#T"><strong>T</strong></a>
@@ -117,6 +118,8 @@
<li><a href="ammico.html#display.AnalysisExplorer">AnalysisExplorer (class in display)</a>
</li>
<li><a href="ammico.html#utils.AnalysisMethod">AnalysisMethod (class in utils)</a>
</li>
<li><a href="ammico.html#utils.AnalysisType">AnalysisType (class in utils)</a>
</li>
<li><a href="ammico.html#faces.EmotionDetector.analyze_single_face">analyze_single_face() (faces.EmotionDetector method)</a>
</li>
@@ -255,6 +258,14 @@
</ul></td>
</tr></table>
<h2 id="Q">Q</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="ammico.html#utils.AnalysisType.QUESTIONS">QUESTIONS (utils.AnalysisType attribute)</a>
</li>
</ul></td>
</tr></table>
<h2 id="R">R</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
@@ -287,6 +298,12 @@
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="ammico.html#utils.AnalysisType.SUMMARY">SUMMARY (utils.AnalysisType attribute)</a>
</li>
<li><a href="ammico.html#utils.AnalysisType.SUMMARY_AND_QUESTIONS">SUMMARY_AND_QUESTIONS (utils.AnalysisType attribute)</a>
</li>
</ul></td>
</tr></table>
<h2 id="T">T</h2>
@@ -142,6 +142,12 @@
<li class="toctree-l3"><a class="reference internal" href="ammico.html#utils.AnalysisMethod.set_keys"><code class="docutils literal notranslate"><span class="pre">AnalysisMethod.set_keys()</span></code></a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="ammico.html#utils.AnalysisType"><code class="docutils literal notranslate"><span class="pre">AnalysisType</span></code></a><ul>
<li class="toctree-l3"><a class="reference internal" href="ammico.html#utils.AnalysisType.QUESTIONS"><code class="docutils literal notranslate"><span class="pre">AnalysisType.QUESTIONS</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="ammico.html#utils.AnalysisType.SUMMARY"><code class="docutils literal notranslate"><span class="pre">AnalysisType.SUMMARY</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="ammico.html#utils.AnalysisType.SUMMARY_AND_QUESTIONS"><code class="docutils literal notranslate"><span class="pre">AnalysisType.SUMMARY_AND_QUESTIONS</span></code></a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="ammico.html#utils.DownloadResource"><code class="docutils literal notranslate"><span class="pre">DownloadResource</span></code></a><ul>
<li class="toctree-l3"><a class="reference internal" href="ammico.html#utils.DownloadResource.get"><code class="docutils literal notranslate"><span class="pre">DownloadResource.get()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="ammico.html#utils.DownloadResource.resources"><code class="docutils literal notranslate"><span class="pre">DownloadResource.resources</span></code></a></li>
@@ -560,7 +560,7 @@ text_df.to_csv("data_out.csv")
<section id="The-detector-modules">
<h1>The detector modules<a class="headerlink" href="#The-detector-modules" title="Link to this heading"></a></h1>
<p>The different detector modules with their options are explained in more detail in this section.</p>
<h2>Text detector</h2>
<p>Text on the images can be extracted using the <code class="docutils literal notranslate"><span class="pre">TextDetector</span></code> class (<code class="docutils literal notranslate"><span class="pre">text</span></code> module). The text is initially extracted using the Google Cloud Vision API and then translated into English with googletrans. The translated text is cleaned of whitespace, linebreaks, and numbers using Python syntax and spaCy.</p>
<p><img alt="fa9661e35a864a6989f950e5186bb570" class="no-scaled-link" src="../_images/text_detector.png" style="width: 800px;" /></p>
|
||||
<p><img alt="f40c056cdd6749149c1b183dbe0662e5" class="no-scaled-link" src="../_images/text_detector.png" style="width: 800px;" /></p>
|
||||
<p>The user can choose whether the text should be further summarized and analyzed for sentiment and named entity recognition by setting the keyword <code class="docutils literal notranslate"><span class="pre">analyse_text</span></code> to <code class="docutils literal notranslate"><span class="pre">True</span></code> (the default is <code class="docutils literal notranslate"><span class="pre">False</span></code>). If set, the transformers pipeline is used for each of these tasks, with the default models as of 03/2023. Other models can be selected by setting the optional keyword <code class="docutils literal notranslate"><span class="pre">model_names</span></code> to a list of selected models, one for each task:
<code class="docutils literal notranslate"><span class="pre">model_names=["sshleifer/distilbart-cnn-12-6",</span> <span class="pre">"distilbert-base-uncased-finetuned-sst-2-english",</span> <span class="pre">"dbmdz/bert-large-cased-finetuned-conll03-english"]</span></code> for summary, sentiment, and NER. To be even more specific, revision numbers can also be selected by specifying the optional keyword <code class="docutils literal notranslate"><span class="pre">revision_numbers</span></code> as a list of revision numbers, one for each model, for example <code class="docutils literal notranslate"><span class="pre">revision_numbers=["a4f8f3e",</span> <span class="pre">"af0f99b",</span> <span class="pre">"f2482bf"]</span></code>.</p>
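Put together as a call, the keywords from this paragraph would look roughly like the following. The exact constructor signature is assumed from the prose, not from this diff; note that the paragraph still documents `analyse_text`, even though the API hunk earlier in this commit removes it from the signature:

```python
from ammico.text import TextDetector

detector = TextDetector(
    {"filename": "example_image.png"},  # hypothetical image record
    analyse_text=True,                  # enable summary/sentiment/NER
    model_names=[
        "sshleifer/distilbart-cnn-12-6",                     # summary
        "distilbert-base-uncased-finetuned-sst-2-english",   # sentiment
        "dbmdz/bert-large-cased-finetuned-conll03-english",  # NER
    ],
    revision_numbers=["a4f8f3e", "af0f99b", "f2482bf"],      # one per model
)
```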
<p>Please note that for the Google Cloud Vision API (the TextDetector class) you need to set a key in order to process the images. This key is ideally set as an environment variable using for example</p>
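The diff truncates the example here. The standard Google client convention is to point `GOOGLE_APPLICATION_CREDENTIALS` at the service-account key file; treating that as an assumption about what the elided example shows:

```python
import os

# Standard Google Cloud convention (assumed; the concrete example is cut off
# in the diff): point the client library at your service-account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-key.json"
```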
@@ -661,7 +661,7 @@ image_df.to_csv("/content/drive/MyDrive/misinformation-data/data_out.csv")
<section id="Image-summary-and-query">
<h2>Image summary and query<a class="headerlink" href="#Image-summary-and-query" title="Link to this heading"></a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">SummaryDetector</span></code> can be used to generate image captions (<code class="docutils literal notranslate"><span class="pre">summary</span></code>) as well as visual question answering (<code class="docutils literal notranslate"><span class="pre">VQA</span></code>).</p>
<p><img alt="c1bb5284d8da452db91b3ed56781bca5" class="no-scaled-link" src="../_images/summary_detector.png" style="width: 800px;" /></p>
<p><img alt="3c0880fbcb6c41e188e97e8df958f889" class="no-scaled-link" src="../_images/summary_detector.png" style="width: 800px;" /></p>
<p>This module is based on the <a class="reference external" href="https://github.com/salesforce/LAVIS">LAVIS</a> library. Since the models can be quite large, an initial object is created which will load the necessary models into RAM/VRAM and then use them in the analysis. The user can specify the type of analysis to be performed using the <code class="docutils literal notranslate"><span class="pre">analysis_type</span></code> keyword. Setting it to <code class="docutils literal notranslate"><span class="pre">summary</span></code> will generate a caption (summary), <code class="docutils literal notranslate"><span class="pre">questions</span></code> will prepare answers (VQA) to a list of questions as set by the user, and <code class="docutils literal notranslate"><span class="pre">summary_and_questions</span></code> will do both. Note that the desired analysis type needs to be set here in the initialization of the detector object, and not when running the analysis for each image; the same holds true for the selected model.</p>
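A sketch of that initialize-once, analyze-many pattern; the `ammico.summary` import path, the constructor keywords, and the `analyse_image()` call are assumptions pieced together from the paragraph above:

```python
from ammico.summary import SummaryDetector

# Load the (large) models once, fixing analysis type and model at init time.
detector = SummaryDetector(
    {"filename": "example_image.png"},      # hypothetical image record
    analysis_type="summary_and_questions",  # or "summary" / "questions"
    list_of_questions=["How many people are in the picture?"],  # assumed kwarg
)
result = detector.analyse_image()  # caption plus answers for this image
```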
<p>The implemented models are listed below.</p>
@@ -951,7 +951,7 @@ image_df.to_csv("/content/drive/MyDrive/misinformation-data/data_out.csv")
<section id="Detection-of-faces-and-facial-expression-analysis">
<h2>Detection of faces and facial expression analysis<a class="headerlink" href="#Detection-of-faces-and-facial-expression-analysis" title="Link to this heading"></a></h2>
<p>Faces and facial expressions are detected and analyzed using the <code class="docutils literal notranslate"><span class="pre">EmotionDetector</span></code> class from the <code class="docutils literal notranslate"><span class="pre">faces</span></code> module. Initially, it is detected if faces are present on the image using RetinaFace, followed by analysis if face masks are worn (Face-Mask-Detection). The probabilistic detection of age, gender, race, and emotions is carried out with deepface, but only if the disclosure statement has been accepted (see above).</p>
<p><img alt="1d72c51aea934efea31bf149f207463f" class="no-scaled-link" src="../_images/emotion_detector.png" style="width: 800px;" /></p>
<p><img alt="157f731f8b5e49b797ba4dd2a4fab868" class="no-scaled-link" src="../_images/emotion_detector.png" style="width: 800px;" /></p>
<p>Depending on the features found on the image, the face detection module returns a different analysis content: If no faces are found on the image, all further steps are skipped and the result <code class="docutils literal notranslate"><span class="pre">"face":</span> <span class="pre">"No",</span> <span class="pre">"multiple_faces":</span> <span class="pre">"No",</span> <span class="pre">"no_faces":</span> <span class="pre">0,</span> <span class="pre">"wears_mask":</span> <span class="pre">["No"],</span> <span class="pre">"age":</span> <span class="pre">[None],</span> <span class="pre">"gender":</span> <span class="pre">[None],</span> <span class="pre">"race":</span> <span class="pre">[None],</span> <span class="pre">"emotion":</span> <span class="pre">[None],</span> <span class="pre">"emotion</span> <span class="pre">(category)":</span> <span class="pre">[None]</span></code> is returned. If one or several faces are found, up to three faces are analyzed if they are partially concealed by a face mask. If
yes, only age and gender are detected; if no, also race, emotion, and dominant emotion are detected. In case of the latter, the output could look like this: <code class="docutils literal notranslate"><span class="pre">"face":</span> <span class="pre">"Yes",</span> <span class="pre">"multiple_faces":</span> <span class="pre">"Yes",</span> <span class="pre">"no_faces":</span> <span class="pre">2,</span> <span class="pre">"wears_mask":</span> <span class="pre">["No",</span> <span class="pre">"No"],</span> <span class="pre">"age":</span> <span class="pre">[27,</span> <span class="pre">28],</span> <span class="pre">"gender":</span> <span class="pre">["Man",</span> <span class="pre">"Man"],</span> <span class="pre">"race":</span> <span class="pre">["asian",</span> <span class="pre">None],</span> <span class="pre">"emotion":</span> <span class="pre">["angry",</span> <span class="pre">"neutral"],</span> <span class="pre">"emotion</span> <span class="pre">(category)":</span> <span class="pre">["Negative",</span> <span class="pre">"Neutral"]</span></code>, where for the two faces that are detected (given by <code class="docutils literal notranslate"><span class="pre">no_faces</span></code>), some of the values are returned as a list
with the first item for the first (largest) face and the second item for the second (smaller) face (for example, <code class="docutils literal notranslate"><span class="pre">"emotion"</span></code> returns a list <code class="docutils literal notranslate"><span class="pre">["angry",</span> <span class="pre">"neutral"]</span></code> signifying the first face expressing anger, and the second face having a neutral expression).</p>
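For reference, the two-face output quoted in this paragraph, reassembled as a Python dictionary (values exactly as given in the text):

```python
result = {
    "face": "Yes",
    "multiple_faces": "Yes",
    "no_faces": 2,
    "wears_mask": ["No", "No"],
    "age": [27, 28],
    "gender": ["Man", "Man"],
    "race": ["asian", None],
    "emotion": ["angry", "neutral"],
    "emotion (category)": ["Negative", "Neutral"],
}
```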
Binary data
build/html/objects.inv
Binary file not shown.
@@ -118,7 +118,13 @@
<ol class="arabic simple">
<li><p>Textual summary of the image content (“image caption”) that can be analyzed further using the above tools</p></li>
<li><p>Feature extraction from the images: User inputs query and images are matched to that query (both text and image query)</p></li>
<li><p>Question answering</p></li>
<li><p>Question answering about image content</p></li>
</ol>
</li>
<li><p>Content extraction from the videos</p>
<ol class="arabic simple">
<li><p>Textual summary of the video content that can be analyzed further</p></li>
<li><p>Question answering about video content</p></li>
</ol>
</li>
<li><p>Performing person and face recognition in images</p>
@@ -176,7 +182,8 @@ You then need to export the location of the API key as an environment variable:
</section>
<section id="content-extraction">
<h3>Content extraction<a class="headerlink" href="#content-extraction" title="Link to this heading"></a></h3>
<p>The image content (“caption”) is extracted using the <a class="reference external" href="https://github.com/salesforce/LAVIS">LAVIS</a> library. This library enables vision intelligence extraction using several state-of-the-art models such as BLIP and BLIP2, depending on the task and user selection. Further, it allows feature extraction from the images, where users can input textual and image queries, and the images in the database are matched to that query (multimodal search). Another option is question answering, where the user inputs a text question and the library finds the images that match the query.</p>
<p>The image and video content (“caption”) is now extracted using the Qwen2.5-VL model. Qwen2.5-VL is a multimodal large language model capable of understanding and generating content from both images and videos. With its help, AMMICO supports tasks such as image/video summarization and image/video visual question answering, where the model answers users’ questions about the content of a media file.</p>
</section>
<section id="emotion-recognition">
<h3>Emotion recognition<a class="headerlink" href="#emotion-recognition" title="Link to this heading"></a></h3>
Some file diffs are hidden because one or more lines are too long.