Design & Implementation
The main part of this work was the design and implementation of the components that Crowd Frame needs to
perform behavioural logging. The process presented in this chapter is divided
into two main parts. In the first, Crowd Frame itself is modified to implement the
logging mechanisms, to structure each HTTP request payload, and to add the event listeners
to the DOM elements. In the second, two centralized log collection solutions are built, one using
AWS services and one using a dedicated server.
4.1 Event capturing and client-side log management
Despite the differences among the various events, a common management strategy is needed to handle log messages:
each log message must be built from the raw event data, an HTTP POST request must be built and sent, and
the status of the current context must be taken into account.
To this end, two new services were implemented: the section service and the user actions logging service.
4.1.1 Section service
The section service comes from a refactoring of the skeleton component of Crowd Frame (Section 3.1.3).
The skeleton contained the logic needed to keep track of which part of the task
the worker was working on. This information is also needed by the user actions logging service,
so that each log message can be tied to a precise section. To make it available to both the skeleton component and the logger
service, a new class was implemented. It maintains and updates a group of variables whose overall state
identifies the section.
4.1.2 User action logger service
This new class provides all the functionality needed to manage event data, build log messages, build HTTP
POST requests, and send them. Essentially, this service sends to the collection server POST requests whose
payload contains the data from an event.
The logger keeps a state of the environment in which the task is performed:
• Worker ID;
• Task name;
• Batch name;
• Unit ID;
• Current section.
It also maintains three variables used for the operation of the service and for log tracking:
• Next sequence number for the message;
• Initialization time;
• Collection server endpoint.
Two additional variables are needed to inject the directives correctly into the custom
elements of the skeleton: the activation status of the logger and the set of
events that must be logged.
For each monitored event (Section 3.2.2), a function has been implemented in which the event data,
passed by the event listener in the custom element directive, is used to build a JSON object that
will be sent in the payload of the POST request. Once the JSON object has been built, the data handling
function calls the log function, which builds the request payload and sends the POST
request to the collection server.
The request payload is built from the JSON object of the event and the state variables. At each log request,
the sequence number is set and then incremented, and a client time variable is added to the payload.
This class is initialized by the skeleton with the necessary context information, and it sends a special
context log message, containing the IP address of the worker, to the server.
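The payload-building step described above can be sketched as follows. This is a minimal illustration, not the Crowd Frame implementation: the class name `ActionLoggerSketch`, its method names, the injected `send` and `now` functions, and the choice to store the current section inside `details` are assumptions made for the example. The field names follow the base body of Section 4.1.3.1.

```typescript
// Hypothetical sketch of the logger state and payload building (not the Crowd Frame source).
type Details = Record<string, unknown>;

interface LogRequestBody {
  worker: string;
  task: string;
  batch: string;
  unitID: string;
  type: string;
  sequence: number;
  client_time: string;
  details: Details;
}

interface LoggerContext {
  worker: string;
  task: string;
  batch: string;
  unitID: string;
  endpoint: string; // collection server endpoint
}

// The transport is injected so that the sketch does not depend on a real HTTP client.
type Sender = (endpoint: string, body: LogRequestBody) => void;

class ActionLoggerSketch {
  private sequence = 0; // next sequence number for the message
  private section = ""; // current section of the task
  private readonly context: LoggerContext;
  private readonly send: Sender;
  private readonly now: () => number;

  constructor(context: LoggerContext, send: Sender, now: () => number = () => Date.now()) {
    this.context = context;
    this.send = send;
    this.now = now;
  }

  setSection(section: string): void {
    this.section = section;
  }

  // Combines the state variables with the event data. The sequence number is
  // read and then incremented, and the client time is added.
  private buildPayload(type: string, details: Details): LogRequestBody {
    return {
      worker: this.context.worker,
      task: this.context.task,
      batch: this.context.batch,
      unitID: this.context.unitID,
      type,
      sequence: this.sequence++,
      client_time: String(this.now()),
      details,
    };
  }

  // Context message: the only one without a section field.
  logContext(ua: string, ip: string): void {
    this.send(this.context.endpoint, this.buildPayload("context", { ua, ip }));
  }

  // Generic event message: the current section is stored with the event details.
  log(type: string, eventData: Details): void {
    const details: Details = { section: this.section, ...eventData };
    this.send(this.context.endpoint, this.buildPayload(type, details));
  }
}
```

Injecting the sender and the clock keeps the sketch independent of any HTTP client and makes the sequence numbering easy to check in isolation.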
4.1.3 Requests’ structure
By analyzing the events listed in Section 3.2.2, a generic JSON object was defined as the base for every
log request, and a specific one was designed for each event.
As a general note, each event request contains a ‘section‘ field, which stores the task section in
which the event was generated.
4.1.3.1 Base body
The base request body is structured as follows:
{
  "worker": "string",
  "task": "string",
  "batch": "string",
  "unitID": "string",
  "type": "string",
  "sequence": "integer",
  "client_time": "string",
  "details": {}
}
Where: ‘worker‘ is the worker ID, ‘task‘ is the task name, ‘batch‘ is the batch name, ‘unitID‘ is the
unit identification code, ‘type‘ identifies the event, ‘sequence‘ is the progressive sequence number of the request,
‘client_time‘ is the epoch time of the event on the client, and ‘details‘ contains more specific information
about the event.
4.1.3.2 Context log
The first log to be sent (sequence number = 0) is referred to as ‘context‘, and its ‘details‘ field contains
information about the user agent and the IP address. This is the only log message that does not contain the
‘section‘ field.
{
  "ua": "string",
  "ip": "string"
}
After being processed by the server (see Section 4.2.2), the ‘details‘ field contains additional information
from the IP lookup: country code, region, city, zip code, latitude, longitude, ISP, “is a mobile
connection?”, and “is using a proxy or VPN?”.
4.1.3.3 Mouse movements
In the base body the field "type": "movements" is set.
While the mouse is moving, the timestamp and the (x, y) coordinates are sampled every 100 ms, mapped to
a dictionary, and buffered. When a dwell time of 500 ms occurs, the dictionaries contained in the buffer
are stored as an array in the ‘points‘ field of the ‘details‘ dictionary.
{
  "section": "string",
  "points": [
    {
      "timeStamp": "string",
      "x": "integer",
      "y": "integer"
    },
    {
      "timeStamp": "string",
      "x": "integer",
      "y": "integer"
    },
    {},
    {}
  ]
}
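The sampling and dwell logic for mouse movements can be sketched on a recorded list of events, assuming they are sorted by time. The function name `batchMovements` and its parameters are hypothetical, and timestamps are kept as numbers (milliseconds) so that intervals can be computed. In Crowd Frame this behaviour is obtained with event listeners rather than with a pure function.

```typescript
// Hypothetical sketch: sample one point per interval and close a batch after a dwell.
interface MovePoint {
  timeStamp: number; // milliseconds
  x: number;
  y: number;
}

const SAMPLE_MS = 100; // sampling interval stated in the text
const DWELL_MS = 500;  // dwell time stated in the text

function batchMovements(
  events: MovePoint[],
  sampleMs: number = SAMPLE_MS,
  dwellMs: number = DWELL_MS,
): MovePoint[][] {
  const batches: MovePoint[][] = [];
  let buffer: MovePoint[] = [];
  let lastSeen: number | null = null;    // time of the previous raw event
  let lastSampled: number | null = null; // time of the last buffered point

  for (const ev of events) {
    // A pause of at least dwellMs closes the current batch.
    if (lastSeen !== null && ev.timeStamp - lastSeen >= dwellMs && buffer.length > 0) {
      batches.push(buffer);
      buffer = [];
      lastSampled = null;
    }
    // Keep at most one point per sampling interval.
    if (lastSampled === null || ev.timeStamp - lastSampled >= sampleMs) {
      buffer.push(ev);
      lastSampled = ev.timeStamp;
    }
    lastSeen = ev.timeStamp;
  }
  // Flush whatever is left when the recording ends.
  if (buffer.length > 0) {
    batches.push(buffer);
  }
  return batches;
}
```

Each returned batch corresponds to the ‘points‘ array of one log request.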
4.1.3.4 Mouse click
In the base body the field "type": "click" is set.
A mouse click event is produced when the left or right mouse button is pressed. A worker may
generate many consecutive clicks in a short interval; to avoid logging a large number of useless events,
a debounce time was introduced, and the timestamps of the first and last clicks in the “click chain” are logged.
Additionally, the (x, y) coordinates, the DOM target, and the number of clicks are logged.
{
  "section": "string",
  "mouseButton": "right || left",
  "startTime": "string",
  "endTime": "string",
  "x": "integer",
  "y": "integer",
  "target": "string",
  "clicks": "integer"
}
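The grouping of consecutive clicks into a chain can be illustrated on a recorded, time-ordered list of clicks. The sketch below is an assumption-laden example: the function name `groupClicks` is hypothetical, the debounce time is left as a parameter because its value is not stated here, and the coordinates, target, and button are taken from the last click of the chain, which is only one possible choice.

```typescript
// Hypothetical sketch: collapse clicks separated by less than debounceMs into one chain.
interface RawClick {
  timeStamp: number; // milliseconds
  x: number;
  y: number;
  target: string;
  mouseButton: "left" | "right";
}

interface ClickChain {
  mouseButton: "left" | "right";
  startTime: number;
  endTime: number;
  x: number;
  y: number;
  target: string;
  clicks: number;
}

function groupClicks(clicks: RawClick[], debounceMs: number): ClickChain[] {
  const chains: ClickChain[] = [];
  let current: ClickChain | null = null;

  for (const click of clicks) {
    if (current !== null && click.timeStamp - current.endTime < debounceMs) {
      // Still inside the debounce window: extend the current chain.
      current.endTime = click.timeStamp;
      current.x = click.x;
      current.y = click.y;
      current.target = click.target;
      current.mouseButton = click.mouseButton;
      current.clicks += 1;
    } else {
      // The window has elapsed (or this is the first click): start a new chain.
      current = {
        mouseButton: click.mouseButton,
        startTime: click.timeStamp,
        endTime: click.timeStamp,
        x: click.x,
        y: click.y,
        target: click.target,
        clicks: 1,
      };
      chains.push(current);
    }
  }
  return chains;
}
```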
4.1.3.5 Button click
In the base body the field "type": "button" is set.
This is a special type of mouse click that targets a button DOM element. The event is debounced like the
previous one, but the information extracted from it is: the targeted button, the timestamp of the first click,
and the (x, y) coordinates.
{
  "section": "string",
  "timestamp": "string",
  "button": "string",
  "x": "integer",
  "y": "integer"
}
4.1.3.6 Shortcuts
In the base body the field "type": "shortcut" is set.
Key combinations corresponding to shortcuts are monitored. The following values are extracted from the
key event of the shortcut:
• ‘ctrl‘: a boolean set to ‘true‘ if the ctrl key, or the command key, was pressed;
• ‘alt‘: a boolean set to ‘true‘ if the alt key was pressed;
• ‘key‘: a string containing the value of the key pressed for the shortcut.
{
  "section": "string",
  "timestamp": "string",
  "ctrl": "boolean",
  "alt": "boolean",
  "key": "string"
}
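As an illustration, the extraction of these fields can be sketched as follows. The interface `KeyEventLike` mirrors the standard `KeyboardEvent` properties `key`, `ctrlKey`, `metaKey`, `altKey`, and `timeStamp`; the function name `toShortcutDetails` is hypothetical, and mapping the command key to `metaKey` is an assumption about how the command key is detected.

```typescript
// Hypothetical sketch: build the shortcut details from a keyboard event.
interface KeyEventLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean; // command key on macOS keyboards
  altKey: boolean;
  timeStamp: number;
}

interface ShortcutDetails {
  section: string;
  timestamp: string;
  ctrl: boolean;
  alt: boolean;
  key: string;
}

function toShortcutDetails(ev: KeyEventLike, section: string): ShortcutDetails {
  return {
    section,
    timestamp: String(ev.timeStamp),
    ctrl: ev.ctrlKey || ev.metaKey, // ctrl and command are logged in the same field
    alt: ev.altKey,
    key: ev.key,
  };
}
```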
4.1.3.7 Keypress
In the base body the field "type": "keySequence" is set.
Every keypress is registered in a buffer as a dictionary containing the timestamp and the key
pressed. As with mouse movements, keypresses are buffered to form a list and, additionally, the full
sentence is reconstructed. The dwell time before event handling completes is set to 1 s.
{
  "section": "string",
  "keySequence": [
    {
      "timeStamp": "string",
      "key": "string"
    },
    {},
    {}
  ],
  "sentence": "string"
}
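How the sentence is reconstructed is not detailed here. One possible approach, given only as an assumption, is to append single-character key values and to let Backspace remove the last character, ignoring every other named key. The function name `reconstructSentence` is hypothetical.

```typescript
// Hypothetical sketch: rebuild the typed sentence from a buffered key sequence.
interface KeyStroke {
  timeStamp: number;
  key: string; // value of KeyboardEvent.key, e.g. "a", "Shift", "Backspace"
}

function reconstructSentence(keys: KeyStroke[]): string {
  const chars: string[] = [];
  for (const stroke of keys) {
    if (stroke.key === "Backspace") {
      chars.pop(); // removes the last character, if any
    } else if (stroke.key.length === 1) {
      chars.push(stroke.key); // printable key
    }
    // Other named keys (Shift, Enter, arrows, ...) are ignored in this sketch.
  }
  return chars.join("");
}
```

This simplification ignores cursor movements and text selections, so it only approximates the content of the input field.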
4.1.3.8 Selection
In the base body the field "type": "selection" is set.
A custom-made event is used to detect text selection. The selection start timestamp, the selection end
timestamp, and the content of the selection are logged.
{
  "section": "string",
  "startTime": "string",
  "endTime": "string",
  "selected": "string"
}
4.1.3.9 Before unload, focus, blur
In the base body the field "type": "unload || window_focus || window_blur" is set.
When the web page is closed, a last log request containing the section and the timestamp is sent. An
analogous request is sent on window focus or blur.
{
  "section": "string",
  "timestamp": "string"
}
4.1.3.10 Scroll
In the base body the field "type": "scroll" is set.
Scroll has a specific event listener and, like mouse movements, it needs debouncing to prevent
“spamming”. For this event the debounce time is set to 300 ms, and the start timestamp, the end timestamp, and
the (x, y) coordinates of the top left corner are saved for logging.
{
  "section": "string",
  "startTimestamp": "string",
  "endTimestamp": "string",
  "x": "integer",
  "y": "integer"
}
4.1.3.11 Resize
In the base body the field "type": "resize" is set.
When the window is resized, or the section is changed, new sizes are logged.
{
  "section": "string",
  "width": "integer",
  "height": "integer",
  "scrollWidth": "integer",
  "scrollHeight": "integer",
  "timestamp": "string"
}
4.1.3.12 Copy, cut, paste
In the base body the field "type": "copy || cut || paste" is set.
{
  "section": "string",
  "timestamp": "string",
  "target": "string"
}
“target” refers to the DOM element to which copy or cut is applied. If the event is paste, “target”
is replaced with “text”, which contains the text being pasted.
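The two shapes of the details object can be expressed as a union type. The sketch below is illustrative only: the type and function names are hypothetical, and the single `value` parameter stands for either the DOM element identifier or the pasted text.

```typescript
// Hypothetical sketch: copy and cut carry "target", paste carries "text".
type ClipboardEventType = "copy" | "cut" | "paste";

interface CopyCutDetails {
  section: string;
  timestamp: string;
  target: string;
}

interface PasteDetails {
  section: string;
  timestamp: string;
  text: string;
}

function clipboardDetails(
  type: ClipboardEventType,
  section: string,
  timestamp: string,
  value: string,
): CopyCutDetails | PasteDetails {
  // For paste, `value` is the pasted text; for copy and cut it identifies the DOM element.
  return type === "paste"
    ? { section, timestamp, text: value }
    : { section, timestamp, target: value };
}
```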
4.1.3.13 Text input Backspace & Blur
In the base body the field "type": "text" is set.
When backspace is pressed inside a text input, or the input loses focus (blur), the text contained in the input is
logged.
{
  "section": "string",
  "timestamp": "string",
  "text": "string"
}
4.1.3.14 Radio group input
In the base body the field "type": "radioChange" is set.
A radio button change triggers an event that logs the group and the value of the radio button.
{
  "section": "string",
  "timestamp": "string",
  "group": "string",
  "value": "string"
}
4.1.3.15 Crowd Xplorer query and results
In the base body the field "type": "query || queryResults" is set.
A custom event listener for queries and query results is attached to Crowd Xplorer. When a query
is made, the query itself is logged.
{
  "section": "string",
  "query": "string"
}
When the results are retrieved, an array of URLs is created and logged.
{
  "section": "string",
  "urlArray": []
}
4.1.4 Event listener injection
As stated in Section 3.2, Angular uses Directives to inject new functionality into custom elements. The elements
to be targeted depend on the set of events of interest and, as previously identified in Section 3.2.2,
there are several events to be monitored.
The directives for these events are built from the same template. Each event is monitored on
one custom element, which is selected through its CSS selector. The element and the user actions
logging service are passed to the directive constructor. Finally, each event listener is attached to
the element only if the logger is active and the task is configured to monitor that specific event. This
is possible thanks to the variables exposed by the logging service.
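The conditional attachment can be sketched independently of Angular. In the example below, `ListenerTarget` is a minimal stand-in for a DOM element, and `LoggerFlags`, `isActive`, `enabledEvents`, and `attachIfEnabled` are hypothetical names for the two variables exposed by the logging service and for the check performed by each directive.

```typescript
// Hypothetical sketch: attach a listener only if logging is active and the event is enabled.
interface ListenerTarget {
  addEventListener(type: string, listener: (event: unknown) => void): void;
}

interface LoggerFlags {
  isActive: boolean;                      // activation status of the logger
  enabledEvents: Record<string, boolean>; // which events must be logged
}

function attachIfEnabled(
  target: ListenerTarget,
  flags: LoggerFlags,
  loggedEvent: string, // name of the event in the task configuration
  domEvent: string,    // name of the DOM event to listen for
  listener: (event: unknown) => void,
): boolean {
  if (!flags.isActive || flags.enabledEvents[loggedEvent] !== true) {
    return false; // nothing is attached, so the event costs nothing at run time
  }
  target.addEventListener(domEvent, listener);
  return true;
}
```

With this check in place, disabling an event in the task configuration means that no listener for it is ever registered.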
Event listeners use event debouncing and buffering to limit function calls and to reduce the load caused
by high-frequency events such as clicks, mouse movements, and scrolling.
Using the RxJS library, event listeners are added to the skeleton’s elements, and for each one a pipe of
operations is configured to extract useful data and manipulate the event. The library
also allows events to be composed, as for the text selection event, which starts when the mouse button is pressed
and held, and ends when it is released.
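The composition of the selection event can be illustrated without RxJS by a small state machine over a recorded list of mouse events. This plain TypeScript version replaces the RxJS pipe only for the sake of a self-contained example; the names `PointerSample`, `composeSelections`, and the `selectedText` field (the text selected at the time of the event) are hypothetical.

```typescript
// Hypothetical sketch: pair each mousedown with the following mouseup
// and emit a selection only when some text is selected on release.
interface PointerSample {
  type: "mousedown" | "mouseup";
  timeStamp: number;
  selectedText: string; // text selected in the page when the event fired
}

interface SelectionDetails {
  startTime: number;
  endTime: number;
  selected: string;
}

function composeSelections(events: PointerSample[]): SelectionDetails[] {
  const selections: SelectionDetails[] = [];
  let start: number | null = null; // time of the pending mousedown, if any

  for (const ev of events) {
    if (ev.type === "mousedown") {
      start = ev.timeStamp;
    } else if (start !== null) {
      // mouseup closing a pending mousedown
      if (ev.selectedText.length > 0) {
        selections.push({ startTime: start, endTime: ev.timeStamp, selected: ev.selectedText });
      }
      start = null;
    }
    // A mouseup without a pending mousedown is ignored.
  }
  return selections;
}
```

A plain click produces a mousedown and a mouseup with no selected text, so it does not generate a selection log.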
Each event listener calls a function of the user actions logging service, which performs the
operations described in Section 4.1.2.
4.2 Centralized logging
The characteristics described in Section 3.3 for a centralized logging server can be implemented on a
dedicated server as well as with a cloud-based solution on AWS.
4.2.1 Centralized logging using an AWS solution
AWS, or Amazon Web Services, is a cloud platform offering over 200 web services for cloud computing,
from 25 data centers all around the world [18].
Building a cloud-based solution required a further phase of analysis of
the specific services available on AWS that could be combined to offer the service described above.
The outcome was a combination of services, shown in Figure 4.2, with a linear structure in which each
component has a specific role.
As illustrated in Figure 4.2, an HTTP endpoint is exposed. It receives requests from clients and
passes each body to the enqueuing service, which in turn passes batches of requests to a serverless function.
Each function invocation processes the set of requests retrieved from the queue and sends them to a database for
storage.
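The linear flow from endpoint to queue to function to database can be modelled in memory to make the role of each stage explicit. The sketch below does not use any AWS SDK call and does not represent the actual services: the class `InMemoryQueue`, the functions `receiveRequest`, `processBatch`, and `drainQueue`, the returned status codes, and the batch size are all assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the pipeline: endpoint -> queue -> function -> database.
type LogRecord = Record<string, unknown>;

class InMemoryQueue {
  private items: LogRecord[] = [];

  enqueue(item: LogRecord): void {
    this.items.push(item);
  }

  // Removes and returns up to batchSize items, oldest first.
  takeBatch(batchSize: number): LogRecord[] {
    return this.items.splice(0, batchSize);
  }

  size(): number {
    return this.items.length;
  }
}

// Stand-in for the HTTP endpoint: validates the body and enqueues it.
function receiveRequest(queue: InMemoryQueue, rawBody: string): number {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return 400; // malformed JSON
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return 400; // a log body must be a JSON object
  }
  queue.enqueue(parsed as LogRecord);
  return 200;
}

// Stand-in for the serverless function: stores every record of the batch.
function processBatch(batch: LogRecord[], database: LogRecord[]): void {
  for (const record of batch) {
    database.push(record);
  }
}

// Delivers the queue content in batches and returns the number of invocations.
function drainQueue(queue: InMemoryQueue, database: LogRecord[], batchSize: number): number {
  if (batchSize < 1) {
    throw new RangeError("batchSize must be at least 1");
  }
  let invocations = 0;
  while (queue.size() > 0) {
    processBatch(queue.takeBatch(batchSize), database);
    invocations += 1;
  }
  return invocations;
}
```

Batching means that the number of function invocations grows with the number of batches rather than with the number of individual requests.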
The whole infrastructure can be launched through the initialization script included in Crowd Frame.
An analysis of each component was conducted to better understand its behaviour and functioning.