{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "1 - Topic modeling:\n", "\n", "This technique enables automated assessment of text content and semantics. Further reading: https://en.wikipedia.org/wiki/Topic_model\n", "You can use it for large-scale screening to detect anomalies in website content, emails, tweets, discussion forums, or social networks.\n", "Each of these data sources requires its own data-mining approach.\n", "\n", "In this notebook, you will analyze data obtained from the Twitter Firehose API:\n", "https://developer.twitter.com/en/docs/twitter-api/enterprise/compliance-firehose-api/overview\n", "The advantage of this API is that research or government bodies can request access to it and obtain much more data than with private API access (https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api).\n", "\n", "However, you can use this Jupyter Notebook to process any text you need.\n", "Depending on the context of your data collection, you can spot the first 3 phases of the disinformation kill chain from the topic visualization alone:\n", "\n", "- Recon: A certain topic suddenly resonates within the sampling space. When sampling is repeated to include the increments, minor clusters appear around the initial structures.\n", "- Build: The clusters grow larger and new entities appear and interact.\n", "- Seed: Similar cluster structures start to appear in data from multiple sources.\n", "\n", "(Copy: Significant growth in cluster sizes and in the number of entities per monitored information space. 
Note that the visibility of this phase depends on the sampling method; it may not appear at all if the sampling rate is too low.)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use your own datasets by changing the directory path in the cell below:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "#----\n", "# Path to the directory containing the input data; change it to analyze your own dataset\n", "directory=\"./dataset-kherson/kherson-11-2022/all-lang/\"\n", "#----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below is already set up, so you do not need to change anything.\n", "Simply run each cell and inspect the output." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/73 [00:00