The present document is the deliverable “D4.1 – Social stream conversion report” (henceforth referred to as D4.1) of the RED-Alert project.
The main objective of the document is to describe the work done in Task 4.1 (Convert social streams in readable event streams compatible with ontology analysis), i.e. how the information that flows between all the modules of the project has been structured. These structures are the basis for defining the rules by which the Complex Event Processing (CEP) engine creates alerts.
During this task, a common structure for transferring and storing all the information was defined. JSON is the format best suited to structuring this information, since it is flexible enough to cover every module and at the same time simple to store.
For each of the methodologies, a structure representing each of its functionalities has been established.
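As an illustration of such a common structure, the following is a minimal sketch of a JSON envelope shared by all modules. The field names (`module`, `source`, `created_at`, `payload`) and the sample values are assumptions made for this example only; they are not the structures defined in this deliverable.

```python
import json
from datetime import datetime, timezone


def make_event(module: str, source: str, payload: dict, created_at: datetime) -> dict:
    """Wrap one module's output in a common envelope (hypothetical field names)."""
    return {
        "module": module,                      # producing component, e.g. "NLP", "SNA" or "SMA"
        "source": source,                      # originating social network or upload
        "created_at": created_at.isoformat(),  # ISO 8601 timestamp as a string
        "payload": payload,                    # module-specific content
    }


event = make_event(
    module="NLP",
    source="example-network",
    payload={"message_id": "m-1", "sentiment": "negative", "topics": ["example"]},
    created_at=datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc),
)

# Serialize for transfer or storage, then read the same structure back.
encoded = json.dumps(event, sort_keys=True)
decoded = json.loads(encoded)
```

Keeping the module-specific content under a single key would let each module evolve its own fields without changing the envelope that the other components rely on.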
The objective of the project is to create real-time alerts about suspicious content on social networks. To achieve this, we need to define how the system will create these alerts and how the data will be transformed from raw data into alerts.
The main steps of the data transformation are:
1. Raw data: The project will acquire data from the social media channels.
2. Data Collection: Data will be acquired from social networks by two main methods:
• Data directly extracted from social networks: By means of social media APIs. Each social media platform has its own API for retrieving data, usually through query searches (content searches based on keywords).
• Data uploaded by the LEAs: Data uploaded by the users in CSV format, which may come from one of the previous social networks or from different ones.
3. Data analysis: Data is analyzed by several methods. In summary:
• Message text, using Natural Language Processing (NLP) technologies to extract concepts, sentiment, topics, etc. from the content.
• Relations between users, through Social Network Analysis (SNA), to obtain communities, the most influential users, etc.
• Video, audio and images (SMA), using analysis technologies for these types of content, with the aim of recognizing symbols, images, actions, etc. classified as suspicious and of converting audio to text.
4. Alerts: The alerts are created by means of Complex Event Processing (CEP) technology. CEP makes it possible to define a set of patterns (rules) that identify suspicious messages, from the point of view of either the content or the author. The main advantage of the CEP engine is that it ingests data from multiple sources in real time. Within RED-Alert, these sources are the outputs of the other RED-Alert components, namely SNA, NLP and SMA.
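The CSV upload path in the data collection step can be sketched as follows. This is only an illustration under stated assumptions: the column names (`author`, `text`, `created_at`), the `lea-upload` source label and the sample rows are hypothetical and are not taken from the project.

```python
import csv
import io
import json
from typing import Dict, List


def csv_to_events(csv_text: str, source: str = "lea-upload") -> List[Dict[str, str]]:
    """Turn each CSV row into one event dict (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    events = []
    for row in reader:
        events.append({
            "source": source,                 # marks the record as user-uploaded
            "author": row["author"],
            "text": row["text"],
            "created_at": row["created_at"],
        })
    return events


sample = (
    "author,text,created_at\n"
    "user_a,first message,2018-01-01T12:00:00Z\n"
    'user_b,"second, with comma",2018-01-01T12:05:00Z\n'
)

events = csv_to_events(sample)

# One JSON document per line, ready to be passed on to the analysis modules.
json_lines = "\n".join(json.dumps(e) for e in events)
```

Using `csv.DictReader` rather than splitting on commas keeps quoted fields intact, which matters for message text that itself contains commas.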
The rules that describe the patterns for creating alerts from the input JSON will be defined during the rest of WP4 (Task 4.2 Develop an event driven inference mechanism to cope with uncertainty and Task 4.3 Implement the inference mechanism).
These rules will be built by defining:
• A time window over which the information is extracted.
• A set of logical rules that define alerts based on the knowledge of the content, i.e. whether it is suspected of being related to radicalization propaganda.
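The combination of a time window and a logical rule can be sketched in plain Python as shown here. It is a simplified stand-in for a CEP engine, not the project's inference mechanism: the rule itself (at least three suspicious events by the same author within ten minutes), the event keys (`author`, `timestamp`, `suspicious`) and the sample stream are all assumptions for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # assumed window length
THRESHOLD = 3                   # assumed number of suspicious events per author


def detect_alerts(events):
    """Return an alert each time an author has THRESHOLD or more suspicious events within WINDOW.

    `events` must be ordered by timestamp; each event is a dict with the
    hypothetical keys "author", "timestamp" (datetime) and "suspicious" (bool).
    """
    recent = defaultdict(deque)  # author -> timestamps of suspicious events still in the window
    alerts = []
    for event in events:
        if not event["suspicious"]:
            continue
        window = recent[event["author"]]
        window.append(event["timestamp"])
        # Drop timestamps that have fallen out of the time window.
        while event["timestamp"] - window[0] > WINDOW:
            window.popleft()
        if len(window) >= THRESHOLD:
            alerts.append({
                "author": event["author"],
                "at": event["timestamp"],
                "count": len(window),
            })
    return alerts


t0 = datetime(2018, 1, 1, 12, 0)
stream = [
    {"author": "user_a", "timestamp": t0, "suspicious": True},
    {"author": "user_b", "timestamp": t0 + timedelta(minutes=2), "suspicious": True},
    {"author": "user_a", "timestamp": t0 + timedelta(minutes=3), "suspicious": False},
    {"author": "user_a", "timestamp": t0 + timedelta(minutes=4), "suspicious": True},
    {"author": "user_a", "timestamp": t0 + timedelta(minutes=8), "suspicious": True},
    {"author": "user_a", "timestamp": t0 + timedelta(minutes=30), "suspicious": True},
]

alerts = detect_alerts(stream)
```

In this sample stream, only the third suspicious event from `user_a` (eight minutes after the first) satisfies the rule; the event at thirty minutes starts a new window on its own.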