COMP226 Assignment 1: Reconstruct a Limit Order Book

Continuous Assessment Number: 1 (of 2)
Weighting: 15%
Assignment Circulated: Monday 1 March 2021
Deadline: 17:00 Friday 19 March 2021
Submission Mode: Submit a single R file "solution.R" to the CodeGrade Assignment on Canvas
Learning Outcomes Assessed: Have an understanding of market microstructure and its impact on trading.
Goal of Assignment: Reconstruct a limit order book from order messages
Marking Criteria: Code correctness (85%); Code readability (15%)
Submission necessary in order to satisfy module requirements:
Late Submission Penalty: Standard UoL policy; resubmissions after the deadline may not be considered.
Expected time taken: Roughly 8-12 hours

Warning

Submissions are automatically put through a plagiarism and collusion detection

system. Students found to have plagiarized or colluded will likely receive a mark of

zero. Do not discuss or show your work to others. In previous years, students have

had their studies terminated and left without a degree because of plagiarism.

Rscript from RStudio

In this assignment, we use Rscript (which is provided by R) to run our code, e.g.,

Rscript main.R template.R input/book_1.csv input/empty.txt

In RStudio, you can call Rscript from the "Terminal" tab (as opposed to the "Console" tab).

On Windows, use Rscript.exe not Rscript:

Rscript.exe main.R template.R input/book_1.csv input/empty.txt

Distributed code and sample input and output data

As a first step, please download comp226_a1.zip from:

https://liverpool.instructure.com/courses/17934/files/3438715/download?download_frd=1

Then unzip comp226_a1.zip, which will yield the following contents in the directory

comp226_a1:

comp226_a1

├── common.R

├── input

│ ├── book_1.csv

│ ├── book_2.csv

│ ├── book_3.csv

│ ├── empty.txt

│ ├── message_a.txt

│ ├── message_ar.txt

│ ├── message_arc.txt

│ ├── message_ex_add.txt

│ ├── message_ex_cross.txt

│ ├── message_ex_reduce.txt

│ └── message_ex_same_price.txt

├── main.R

├── output

│ ├── book_1-message_a.out

│ ├── book_1-message_ar.out

│ ├── book_1-message_arc.out

│ ├── book_2-message_a.out

│ ├── book_2-message_ar.out

│ ├── book_2-message_arc.out

│ ├── book_3-message_a.out

│ ├── book_3-message_ar.out

│ └── book_3-message_arc.out

└── template.R

2 directories, 23 files

Brief summary

You are provided with three .R files. Two are complete and should not be edited:

• main.R is the file that you will run, e.g., with Rscript, by specifying several command line arguments described below (with an example shown above in the box "Rscript from RStudio");

• common.R contains complete working functions that are used by main.R in conjunction with the incomplete functions in template.R.

The third is incomplete and contains 6 empty functions, which you should complete for this assignment:

• template.R is the file that you will edit; the distributed version contains empty functions.

If you run main.R using template.R as it is distributed, it runs without error, but does not

produce the desired output because the 6 functions in template.R are provided empty. To

complete the assignment you will need to correctly complete these 6 functions.

You should submit a single R file that contains your implementation of some or ideally all of

these 6 functions. Your submission will be marked via a combination of:

• automated tests (for code correctness, 85%, breakdown by function given below);

and

• human visual inspection (for code readability, 15%, in particular, for appropriate

naming of variables and functions (5%), good use of comments (5%), and sensible,

consistent code formatting (5%)).

Correct sample output is provided so that you can check whether your implementation produces the correct output. You can check this either "offline", for example with a tool like diff (https://en.wikipedia.org/wiki/Diff) to compare the output that you produce with the sample output, or via CodeGrade by submitting your code on Canvas, where you will receive feedback on a number of relevant test cases.
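If diff is not available, the same offline check can be sketched in R. The following is only an illustration: the helper name outputs_match and the temporary files are assumptions made for the example and are not part of the distributed code.

```r
# Hypothetical helper: TRUE when two text files contain identical lines.
outputs_match <- function(actual_path, expected_path) {
    identical(readLines(actual_path), readLines(expected_path))
}

# Self-contained demonstration: temporary files stand in for real output files.
expected_path <- tempfile(fileext = ".out")
actual_path <- tempfile(fileext = ".out")
writeLines(c("Mid-price: 100", "Spread: 10"), expected_path)
writeLines(c("Mid-price: 100", "Spread: 11"), actual_path)

same <- outputs_match(actual_path, expected_path)

# Index of the first differing line (this simple check assumes equal line counts).
actual <- readLines(actual_path)
expected <- readLines(expected_path)
first_diff <- which(actual != expected)[1]
```

In practice you would redirect the output of main.R to a file of your choosing and compare it with the matching file in the output directory, e.g., output/book_1-message_a.out.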

template.R versus solution.R

You are given template.R, which you should extend by implementing 6 functions.

Throughout this handout, we also generate example output using a file solution.R that

contains a correct implementation of all 6 of these functions. Obviously, you are not

given the file solution.R, however the example output will be helpful for checking that

your function implementations work correctly.

Two sets of functions to implement

As described in detail in the rest of this document, you are required to implement the following 6 functions. The percentages in square brackets correspond to the breakdown of the correctness marks by function.

Limit order book stats:

1. book.total_volume <- function(book) [10%]

2. book.best_prices <- function(book) [10%]

3. book.midprice <- function(book) [10%]

4. book.spread <- function(book) [10%]

Updating the limit order book:

5. book.reduce <- function(book, message) [15%]

6. book.add <- function(book, message) [30%]

Running main.R with template.R

An example of calling main.R with template.R is as follows.

Rscript main.R template.R input/book_1.csv input/empty.txt

As seen in this example, main.R takes as arguments the paths to three input files:

1. an R file with the 6 functions (template.R in the example)

2. initial order book (input/book_1.csv in the example)

3. order messages to be processed (input/empty.txt in the example)

Note: the order of the arguments matters.

Let's see part of the source code and the output that it produces.

options(warn=-1)

args <- commandArgs(trailingOnly = TRUE); nargs = length(args)
log <- (nargs == 4) # TRUE if there are exactly 4 arguments

arg_format <- "<--log> "
if (nargs < 3 || nargs > 4) # check that there are 3 or 4 arguments
    stop(paste("main.R has 3 required arguments and 1 optional flag:", arg_format))
if (nargs == 4 && args[1] != "--log") # if 4 check that --log is the first
    stop(paste("Bad arguments format, expected:", arg_format))

solution_path <- args[nargs-2]
book_path <- args[nargs-1]
messages_path <- args[nargs]

if (!all(file.exists(c(solution_path, book_path, messages_path))))
    stop("File does not exist at path provided.")

source(solution_path); source("common.R") # source common.R from pwd

book <- book.load(book_path)
book <- book.reconstruct(data.load(messages_path), init=book, log=log)
book.summarise(book)

So in short, this part of the code:

• checks that the command line arguments are ok

• assigns them to variables (solution_path, book_path, and messages_path respectively)

• sources common.R and the file at solution_path

• loads the initial book

• reconstructs the book according to the messages

• prints out the book

• prints out the book stats

Let's see the output for the example above:

$ Rscript main.R template.R input/book_1.csv input/empty.txt

$ask

oid price size

1 a 105 100

$bid

oid price size

1 b 95 100

Total volume:

Best prices:

Mid-price:

Spread:

Now let's see what the output would look like for a correct implementation:

$ Rscript main.R solution.R input/book_1.csv input/empty.txt

$ask

oid price size

1 a 105 100

$bid

oid price size

1 b 95 100

Total volume: 100 100

Best prices: 95 105

Mid-price: 100

Spread: 10

You will see that now the order book stats have been included in the output, because the

four related functions that are empty in template.R have been implemented in solution.R.

The initial order book

Here are the contents of input/book_1.csv, which is one of the 3 provided examples of an initial book:

oid,side,price,size

a,S,105,100

b,B,95,100

Let's justify the columns to help parse this input:

oid side price size

a S 105 100

b B 95 100

The first row is a header row. Every subsequent row contains a limit order, which is

described by the following fields:

• oid (order id) is stored in the book and used to process (partial) cancellations of orders

that arise in "reduce" messages, described below;

• side identifies whether this is a bid ('B' for buy) or an ask ('S' for sell);

• price and size are self-explanatory.

Existing code in common.R will read in a file like input/book_1.csv and create the corresponding two (possibly empty) order books as two data frames that will be stored in the list book, a version of which will be passed to all of the six functions that you are required to implement.

Note that if we now change the message file to a non-empty one, template.R will produce

the same output (since it doesn't parse the messages; you need to write the code, functions

5 and 6, to do that):

$ Rscript main.R template.R input/book_1.csv input/message_a.txt

$ask

oid price size

1 a 105 100

$bid

oid price size

1 b 95 100

Total volume:

Best prices:

Mid-price:

Spread:

If correct message parsing and book updating are implemented, the book would be updated according to input/message_a.txt to give the following output:

$ Rscript main.R solution.R input/book_1.csv input/message_a.txt

$ask

oid price size

8 a 105 100

7 o 104 292

6 r 102 194

5 k 99 71

4 q 98 166

3 m 98 88

2 j 97 132

1 n 96 375

$bid

oid price size

1 b 95 100

2 l 95 29

3 p 94 87

4 s 91 102

Total volume: 318 1418

Best prices: 95 96

Mid-price: 95.5

Spread: 1

Before we go into details on the message format and reconstructing the order book, let's

discuss the first four functions that compute the book stats, which we also see correctly

computed in this example.

Computing limit order book stats

The first four of the functions that you need to implement compute limit order book stats,

and can be developed and tested without parsing the order messages at all. In particular,

you can develop and test the first four functions using an empty message file,

input/empty.txt, as in the first example above.

The return values of the four functions should be as follows (where, as usual in R, single

numbers are actually numeric vectors of length 1):

• book.total_volumes should return a list with two named elements, bid, which should

contain the total volume in the bid book, and ask, which should contain the total volume

in the ask book;

• book.best_prices should return a list with two named elements, bid, which should contain the best bid price, and ask, which should contain the best ask price;

• book.midprice should return the midprice of the book, i.e., the average of the best bid and best ask prices;

• book.spread should return the spread of the book, i.e., the best ask price minus the best bid price.

You should check that the outputs of these functions in the example above that uses solution.R are what you expect them to be.
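As a rough illustration of these return values, here is a minimal sketch on a toy book that mirrors input/book_1.csv. The sketch_* function names are hypothetical, chosen so that they do not clash with the template, and the choice to report NA for an empty side is an assumption based on the sample output in this handout.

```r
# Toy book with the same layout as described above: a list of two data frames.
book <- list(
    ask = data.frame(oid = "a", price = 105, size = 100, stringsAsFactors = FALSE),
    bid = data.frame(oid = "b", price = 95, size = 100, stringsAsFactors = FALSE)
)

# Total size resting on each side; sum() of an empty column is 0.
sketch_total_volumes <- function(book) {
    list(bid = sum(book$bid$size), ask = sum(book$ask$size))
}

# Highest bid and lowest ask; an empty side is reported as NA (assumption).
sketch_best_prices <- function(book) {
    best_bid <- if (nrow(book$bid) > 0) max(book$bid$price) else NA
    best_ask <- if (nrow(book$ask) > 0) min(book$ask$price) else NA
    list(bid = best_bid, ask = best_ask)
}

# Average of the best prices; NA propagates if either side is empty.
sketch_midprice <- function(book) {
    best <- sketch_best_prices(book)
    (best$bid + best$ask) / 2
}

# Best ask minus best bid; NA propagates if either side is empty.
sketch_spread <- function(book) {
    best <- sketch_best_prices(book)
    best$ask - best$bid
}
```

On this toy book the sketch gives total volumes of 100 and 100, best prices of 95 and 105, a midprice of 100 and a spread of 10, in line with the solution.R output for input/book_1.csv shown earlier.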

We now move on to reconstructing the order book from the messages in the input message file.

Reconstructing the order book from messages

You do not need to look into the details of the (fully implemented) functions

book.reconstruct or book.handle in common.R that manage the reconstruction of the book

from the starting initial book according to the messages (but you can if you want).

In the next section, we describe that there are two types of message, "Add" messages and

"Reduce" messages. All you need to know to complete the assignment is that messages in

the input file are processed in order, i.e., line by line, with "Add" messages passed to

book.add and "Reduce" messages passed to book.reduce, along with the current book in

both cases.

Message Format

The message file contains one message per line (terminated by a single linefeed character,

'\n'), and each message is a series of fields separated by spaces.

There are two types of messages: "Add" and "Reduce" messages. Here's an example,

which contains an "Add" message followed by a "Reduce" message:

A c S 97 36

R a 50

An "Add" message looks like this:

'A' oid side price size

• 'A': fixed string identifying this as an "Add" message;

• oid: "order id" used by subsequent "Reduce" messages;

• side: 'B' for a buy order (a bid), and an 'S' for a sell order (an ask);

• price: limit price of this order;

• size: size of this order.

A "Reduce" message looks like this:

'R' oid size

• 'R': fixed string identifying this as a "Reduce" message;

• oid: "order id" identifies the order to be reduced;

• size: amount by which to reduce the size of the order (not the new size of the order); if

size is equal to or greater than the existing size of the order, the order is removed from

the book.
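To make the two formats concrete, the following sketch splits a single message line into named fields. It is purely illustrative: in the assignment, loading the message file is handled by data.load in common.R, and both the function name parse_message and the shape of the returned list are assumptions, not the structure that book.add and book.reduce actually receive.

```r
# Hypothetical parser for one line of a message file.
parse_message <- function(line) {
    fields <- strsplit(line, " ", fixed = TRUE)[[1]]
    if (fields[1] == "A") {
        # 'A' oid side price size
        list(type = "A", oid = fields[2], side = fields[3],
             price = as.numeric(fields[4]), size = as.numeric(fields[5]))
    } else if (fields[1] == "R") {
        # 'R' oid size
        list(type = "R", oid = fields[2], size = as.numeric(fields[3]))
    } else {
        stop("Unknown message type: ", fields[1])
    }
}

add_message <- parse_message("A c S 97 36")
reduce_message <- parse_message("R a 50")
```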

Processing messages

"Reduce" messages will affect at most one existing limit order in the book.

"Add" messages will either:

• not cross the spread and then add a single row to the book (orders at the same price

are stored separately to preserve their distinct "oid"s);

• cross the spread and in that case can affect any number of orders on the other side of

the book (and may or may not result in a remaining limit order for residual volume).

The provided example message files are split into cases that include crosses and those that don't. This allows you to develop your code incrementally and test it on inputs of differing difficulty.

We do an example of each case, one by one. In each example we start from

input/book_1.csv; we only show this initial book in the first case.

Example of processing a reduce message

$ Rscript main.R solution.R input/book_1.csv input/empty.txt

$ask

oid price size

1 a 105 100

$bid

oid price size

1 b 95 100

Total volume: 100 100

Best prices: 95 105

Mid-price: 100

Spread: 10

$ cat input/message_ex_reduce.txt

R a 50

$ Rscript main.R solution.R input/book_1.csv input/message_ex_reduce.txt

$ask

oid price size

1 a 105 50

$bid

oid price size

1 b 95 100

Total volume: 100 50

Best prices: 95 105

Mid-price: 100

Spread: 10
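The reduce rule can be sketched on a single side of the book as follows. This is an illustration under assumptions rather than the required implementation: the helper sketch_reduce is hypothetical, it works on one data frame only, and it assumes order ids are unique. A "Reduce" message does not state the side, so a full book.reduce would need to look for the oid in both book$ask and book$bid.

```r
# Hypothetical helper: reduce (or remove) one order in a single side of the book.
sketch_reduce <- function(orders, oid, amount) {
    row <- which(orders$oid == oid)
    if (length(row) == 0) return(orders)      # oid not on this side: no change
    if (amount >= orders$size[row]) {
        orders <- orders[-row, , drop = FALSE]  # reduction covers the whole order
    } else {
        orders$size[row] <- orders$size[row] - amount
    }
    orders
}

asks <- data.frame(oid = "a", price = 105, size = 100, stringsAsFactors = FALSE)

partly_reduced <- sketch_reduce(asks, "a", 50)    # as in message_ex_reduce.txt
fully_removed <- sketch_reduce(asks, "a", 100)
unchanged <- sketch_reduce(asks, "zz", 10)
```

Here partly_reduced holds order a with size 50, matching the ask book printed above.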

Example of processing an add (non-crossing) message

$ cat input/message_ex_add.txt

A c S 97 36

$ Rscript main.R solution.R input/book_1.csv input/message_ex_add.txt

$ask

oid price size

2 a 105 100

1 c 97 36

$bid

oid price size

1 b 95 100

Total volume: 100 136

Best prices: 95 97

Mid-price: 96

Spread: 2
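A non-crossing add amounts to appending one row and re-sorting. The sketch below shows this for the ask side only, with the ordering written out by hand so that the example is self-contained; the variable names are illustrative, and in your own code the provided book.sort can do the sorting.

```r
# Ask side of input/book_1.csv.
asks <- data.frame(oid = "a", price = 105, size = 100, stringsAsFactors = FALSE)

# The order described by the message "A c S 97 36".
new_order <- data.frame(oid = "c", price = 97, size = 36, stringsAsFactors = FALSE)

# Append the new row, keeping oid as character.
asks <- rbind(asks, new_order, stringsAsFactors = FALSE)

# Best (lowest) ask first; ties by oid length, then oid, as in book.sort.
asks <- asks[order(asks$price, nchar(asks$oid), asks$oid), ]
row.names(asks) <- NULL

best_ask <- asks$price[1]
```

After these steps the best ask is 97 and the ask volume is 136, as in the output above.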

Example of processing a crossing add message

$ cat input/message_ex_cross.txt

A c B 106 101

$ Rscript main.R solution.R input/book_1.csv input/message_ex_cross.txt

$ask

[1] oid price size

<0 rows> (or 0-length row.names)

$bid

oid price size

1 c 106 1

2 b 95 100

Total volume: 101 0

Best prices: 106 NA

Mid-price: NA

Spread: NA
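One way to think about a crossing buy order is to walk the ask side from the best price upwards, trading until either the incoming size is used up or no remaining ask is priced at or below the limit price. The following is a sketch of that idea under assumptions: sketch_match_buy is a hypothetical helper that handles the buy side only and returns the leftover size rather than inserting it into the bid book.

```r
# Hypothetical helper: match an incoming buy order against the ask side.
sketch_match_buy <- function(asks, price, size) {
    # Best (lowest) ask first; ties by oid length, then oid, as in book.sort.
    asks <- asks[order(asks$price, nchar(asks$oid), asks$oid), ]
    remaining <- size
    while (remaining > 0 && nrow(asks) > 0 && asks$price[1] <= price) {
        traded <- min(remaining, asks$size[1])
        remaining <- remaining - traded
        if (traded == asks$size[1]) {
            asks <- asks[-1, , drop = FALSE]   # resting order fully executed
        } else {
            asks$size[1] <- asks$size[1] - traded
        }
    }
    row.names(asks) <- NULL
    list(asks = asks, remaining = remaining)
}

asks <- data.frame(oid = "a", price = 105, size = 100, stringsAsFactors = FALSE)

# The message "A c B 106 101" from message_ex_cross.txt.
crossed <- sketch_match_buy(asks, price = 106, size = 101)
```

For this message the ask book is emptied and a residual size of 1 is left over, which corresponds to the bid order c at price 106 with size 1 in the output above. A sell order that crosses would be handled symmetrically against the bid side.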

Sample output

We provide sample output for 9 cases, namely all combinations of the following 3 initial

books and 3 message files.

The 3 initial books are found in the input subdirectory and are called:

• book_1.csv

• book_2.csv

• book_3.csv

The 3 message files are also found in the input subdirectory and are called:

• message_a.txt: add messages only, i.e., requires book.add but not book.reduce; for all three initial books, none of the messages cross the spread

• message_ar.txt: add and reduce messages, but for the initial book book_3.csv, no add message crosses the spread

• message_arc.txt: add and reduce messages, with some adds that cross the spread for all three initial books

The 9 output files can be found in the output subdirectory of the comp226_a1 directory.

output

├── book_1-message_a.out

├── book_1-message_ar.out

├── book_1-message_arc.out

├── book_2-message_a.out

├── book_2-message_ar.out

├── book_2-message_arc.out

├── book_3-message_a.out

├── book_3-message_ar.out

└── book_3-message_arc.out

0 directories, 9 files

Hints for order book stats

For book.spread and book.midprice a nice implementation would use book.best_prices,

which you should then implement first.

Hints for book.add and book.reduce

A possible way to implement book.add and book.reduce that makes use of the different

example message files is the following:

• First, do a partial implementation of book.add, namely implement add messages that do not cross. Check your implementation with message_a.txt.

• Next, implement book.reduce fully. Check your combined (partial) implementation of book.add and book.reduce with message_ar.txt and book_3.csv (only this combination with message_ar.txt has no crosses).

• Finally, complete the implementation of book.add to deal with crosses. Check your

implementation with message_arc.txt and any initial book or with message_ar.txt and

book_1.csv or book_2.csv.

Hint on book.sort

$ Rscript main.R solution.R input/book_1.csv input/message_ex_same_price.txt

$ask

oid price size

2 j 105 132

1 a 105 100

$bid

oid price size

1 b 95 100

2 k 95 71

Total volume: 171 232

Best prices: 95 105

Mid-price: 100

Spread: 10

Note that earlier messages are closer to the top of the book. This is due to price-time

precedence.

Price-time precedence

In this assignment, orders are executed according to price-time precedence:

• Best price first, but when two orders have the same price, the earlier one is executed first.

We provide book.sort that respects price-time precedence. It relies on the fact that the

order ids increase as follows:

a < k < ab < ba

where < is indicating "comes before" in the message files.

book.sort <- function(book, sort_bid=T, sort_ask=T) {
    if (sort_ask && nrow(book$ask) >= 1) {
        book$ask <- book$ask[order(book$ask$price,
                                   nchar(book$ask$oid),
                                   book$ask$oid,
                                   decreasing=F),]
        row.names(book$ask) <- 1:nrow(book$ask)
    }
    if (sort_bid && nrow(book$bid) >= 1) {
        book$bid <- book$bid[order(-book$bid$price,
                                   nchar(book$bid$oid),
                                   book$bid$oid,
                                   decreasing=F),]
        row.names(book$bid) <- 1:nrow(book$bid)
    }
    book
}

This method will ensure that limit orders are sorted first by price and second by time of

arrival (so that for two orders at the same price, the older one is nearer the top of the

book).
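The tie-break used by book.sort can be seen in isolation with a small sketch: ordering by the length of the order id first and by the id itself second reproduces the sequence a < k < ab < ba. The vector of ids below is made up for the illustration.

```r
# Order ids in a deliberately shuffled order.
oids <- c("ba", "a", "ab", "k")

# Shorter ids come first; ids of equal length are ordered alphabetically.
sorted <- oids[order(nchar(oids), oids)]
```

A plain alphabetical sort would instead place ab before k, which is why the length of the id is used as the first tie-break key.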

You are welcome (and encouraged) to use book.sort in your own implementations. In particular, by using it you can avoid having to find exactly where to place an order in the book.

Hint on using logging in book.reconstruct

In common.R a logging option has been added to book.reconstruct:

book.reconstruct <- function(data, init=NULL, log=F) {
    if (nrow(data) == 0) return(book)
    if (is.null(init)) init <- book.init()

    # process the messages in order, passing the updated book along
    book <- Reduce(
        function(b, i) {
            new_book <- book.handle(b, data[i,])
            if (log) {
                cat("Step", i, "\n\n")
                book.summarise(new_book, with_stats=F)
                cat("====================\n\n")
            }
            new_book
        },
        1:nrow(data), init,
    )
    book.sort(book)
}

If you want to use this for debugging, you can turn it on with the --log flag as in the

following example:

Rscript main.R --log solution.R input/book_1.csv input/message_arc.txt

Then book.summarise will be used to give output after each message is processed by

book.reconstruct.

Hint on stringsAsFactors=FALSE

Notice the use of stringsAsFactors=FALSE in the book.load function (similarly in

data.load) from common.R.

book.load <- function(path) {
    df <- read.table(
        path, fill=NA, stringsAsFactors=FALSE, header=TRUE, sep=','
    )
    book.sort(list(
        ask=df[df$side == "S", c("oid", "price", "size")],
        bid=df[df$side == "B", c("oid", "price", "size")]
    ))
}

Its use here is not optional: it is what ensures that the oid columns of book$bid and book$ask have type character.

It is also crucial that the oid columns in your books remain character vectors rather than factors. The following examples explain the use of stringsAsFactors and will help you to achieve this.

First we introduce a function that will check the type of this column on different data frames

that we will construct:

check <- function(df) {
    checks <- c("is.character(df$oid)",
                "is.factor(df$oid)")
    for (check in checks)
        cat(sprintf("%20s: %5s",
                    check,
                    eval(parse(text=check))),
            '\n')
}

Now let's use this function to explore different cases. First we look at the case of reading a

csv.

> check(read.csv('input/book_1.csv'))

is.character(df$oid): FALSE

is.factor(df$oid): TRUE

> check(read.csv('input/book_1.csv', stringsAsFactors=FALSE))

is.character(df$oid): TRUE

is.factor(df$oid): FALSE

What about creating a data.frame?

> check(data.frame(oid="a", price=1))

is.character(df$oid): FALSE

is.factor(df$oid): TRUE

> check(data.frame(oid="a", price=1, stringsAsFactors=FALSE))

is.character(df$oid): TRUE

is.factor(df$oid): FALSE

What about using rbind?

> empty_df <- data.frame(oid=character(0), price=numeric(0))

> non_empty_df <- data.frame(oid="a", price=1, stringsAsFactors=FALSE)

> check(rbind(empty_df, data.frame(oid="a", price=1)))

is.character(df$oid): FALSE

is.factor(df$oid): TRUE

> check(rbind(empty_df, non_empty_df))

is.character(df$oid): TRUE

is.factor(df$oid): FALSE

> check(rbind(non_empty_df, data.frame(oid="a", price=1)))

is.character(df$oid): TRUE

is.factor(df$oid): FALSE

Note that with a non-empty data frame, the existing type persists! However, when the data.frame is empty the type of the oid column is malleable and it is crucial to use stringsAsFactors=FALSE. We see the same behaviour when we rbind a list with a data.frame.

> check(rbind(empty_df, list(oid="a", price=1)))

is.character(df$oid): FALSE

is.factor(df$oid): TRUE

> check(rbind(empty_df, list(oid="a", price=1), stringsAsFactors=FALSE))

is.character(df$oid): TRUE

is.factor(df$oid): FALSE

> check(rbind(non_empty_df, list(oid="a", price=1)))

is.character(df$oid): TRUE

is.factor(df$oid): FALSE

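Again, it is crucial to use stringsAsFactors=FALSE when the data.frame is empty. I suggest using it in every case. (From R 4.0.0 onwards the default is already stringsAsFactors=FALSE, so the factor results shown above arise only on earlier versions of R; passing the argument explicitly is safe on any version.)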

Hints for code readability

• Good variable names clearly describe what is being stored, e.g., best_bid,

executed_so_far, rather than x and y. Good function names describe clearly what the

function does.

• Write informative comments, e.g., "# Check if the order id is in the book already".

• Use consistent spacing.

Submission

Submission is via CodeGrade on Canvas. Remember to call the file that you submit

"solution.R".

Note that, as shown in the video, you can get feedback on whether your code is passing

some automated tests even before the submission deadline.
