COMP24011 Lab3:
Features for Estimating Autonomous Vehicle Poses
Riza Batista-Navarro and Francisco Lobo Academic session: 2023-24
Introduction
In this exercise, you will investigate matching visual features in a series of images captured during the navigation of an autonomous vehicle (AV). These features are used in estimating poses (i.e., camera trajectories) based on a visual odometry algorithm. Let’s start by introducing some terms and their definitions to help clarify concepts in autonomous robot navigation.
Odometry is the use of sensors to estimate a robot’s change in position relative to a known posi- tion. Visual odometry (VO) is a specific type of odometry where only cameras are used as sensors, as opposed to using, e.g., global positioning system (GPS) sensors or light detection and ranging (LIDAR) sensors. It is based on the analysis of a sequence of camera images. Simultaneous loc- alisation and mapping (SLAM) is a task whereby a robot needs to build a map of its current environment while at the same time trying to determine its position relative to that map.
In this exercise, you will explore a monocular (single-camera) VO solution to the SLAM Evaluation 2012 challenge, which made use of the KITTI data set. However, it is worth noting that VO is limited in that it can only perform trajectory estimation after each pose, and hence trajectory optimisation is achieved only locally. In contrast, global optimisation is achieved through loop closure: the correction of the trajectory upon revisiting an already encountered location.
The original KITTI data set consists of 4541 images and occupies several gigabytes. To reduce disk quota and running time requirements, this lab exercise will use a simplified version of the data set that we refer to as MyKITTI. This contains only the first 501 images of the original, but these still occupy close to 150 MB and so cannot be included in your GitLab repo. For this reason, you will need to download MyKITTI.zip from Dropbox and extract it.
The examples in this manual assume that you extract MyKITTI.zip in your home directory, and hence that the path to the data set is ~/MyKITTI. The command-line tool for this lab accepts an optional parameter allowing you to set a different path for the MyKITTI images.
In the lab3 branch of your COMP24011_2023 GitLab repo you will find a fully functional monocular VO system. The implementation provided to you builds upon another library available on GitHub, which was ported to Python 3 and recent OpenCV 4 versions, and has additional functionality specific to this lab exercise. To use our VO system, you will need to install the Python bindings for OpenCV by issuing the following command
$ pip install opencv-contrib-python
OpenCV is a comprehensive computer vision framework. It is very popular in both academic and
industrial contexts, and thus benefits from active development.
The lab code depends on OpenCV 4 functionality which has changed in recent years. Indeed, the continuous development of OpenCV has involved changes in data structures as well as changes in the API due to patent issues. The above command installs version 4.8.0.x on the Kilburn lab machines which will be the only supported version for this assignment.
In this exercise you will use OpenCV to complete computer vision tasks, which will involve getting familiar with its 2D Features Framework. The VO system for the lab uses the Scale-Invariant Feature Transform (SIFT) algorithm for feature detection and matching, so you will find the OpenCV-Python tutorials on these topics very useful.
1
The code we provide already implements feature tacking, which identifies feature correspondences between adjacent images. Your job is to implement feature matching and allow for inspecting any feature matches. This will require coding the following feature matching strategies:
1) distance thresholding,
2) nearest neighbour, and
3) nearest neighbour distance ratio.
You will also need to extract relevant information about the resulting feature matches; namely the coordinates of all the features involved and the distance for each matching pair.
The VO System
Once you refresh the lab3 branch of your GitLab repo you will find the following Python files.
run_odometry.py
visual_odometry.py
vision_tasks_base.py
vision_tasks.py
This is the command-line tool that runs the visual odometry accord- ing to the subcommand (and the parameters) provided by the user. It contains the RunOdometry class.
This is the main module of the visual odometry system implementing the PinholeCamera and VisualOdometry classes.
This module contains the VisionTasksBase “abstract” class that spe- cifies the signatures of four methods you need to implement, and imple- ments the interface used in RunOdometry and VisualOdometry.
This is the module that you need to complete for this exercise. It contains the VisionTasks class that is derived from VisionTasksBase, and must implement its abstract methods in order to complete feature matching and retrieve their details.
In order to complete this exercise you will need to understand both vision_tasks_base.py and vision_tasks.py but you do not need to know the details of how run_odometry.py or visual_odometry.py are coded.
The VO tool provides comprehensive help messages. To get started run the command
$ ./run_odometry.py -h
usage: run_odometry.py [-h] -d DATASET
{view_trajectory,view_feature,view_info,get_info} ...
options:
-h, --help show this help message and exit
-d DATASET, --dataset DATASET
path to KITTI dataset directory (option required
unless dataset is located at ~/MyKITTI)
subcommands:
select which odometry command to run
{view_trajectory,view_feature,view_info,get_info}
view_trajectory show the car camera view and calculated trajectory
view_feature
view_info
get_info
show the matches for a frame feature using OpenCV (and
save this image as opencv_visual.png)
show the matches for a frame feature using both OpenCV
and the info algorithm (and save this composite image
as custom_visual.png)
use the info algorithm to get details of matches for a
frame feature
2
As noted in the introduction, you do not need to specify where the MyKITTI data set is located if you extract it to ~/MyKITTI, otherwise every time you run run_odometry.py you will need to add the appropriate -d option. The VO tool supports four subcommands: view_trajectory, view_feature, view_info and get_info. Each of these subcommands has its own help message which you can access with commands like
$ ./run_odometry.py view_trajectory -h
usage: run_odometry.py view_trajectory [-h] [frame_id]
positional arguments:
frame_id index of frame to stop visualisation (1 to 500)
options:
-h, --help show this help message and exit
Thus, if you extracted MyKITTI at /tmp/MyKITTI and want to visualise the trajectory of the AV up to frame 150, you should run the command
You should see the trajectory and camera view in real time (depending on processing power). The VO tool will open a window that sequentially displays the first 151 images in the data set (with IDs 0 to 150), and will show in another window the true trajectory (drawn in green) and the estimated trajectory (drawn in red) of the vehicle. While it is running you will get on the shell progress indication showing the frame index being processed.
You need to use a graphical environment that the OpenCV library supports, in order to run the view_trajectory, view_feature and view_info subcommands. These will work as expected if you use the Ubuntu graphical login on Kilburn lab machines.
Determining the vehicle’s trajectory only involves feature tracking, which has been implemented for you in VisionTasksBase. Once you start implementing the feature matching algorithms, you can begin using the subcommands to show and report on feature matches. You will probably use view_feature first, as this requires you to code only a particular matching algorithm. To get the command-line help run
$ ./run_odometry.py -d /tmp/MyKITTI view_trajectory 150 vo params: (None, None, ’~/MyKITTI’)
debug run: view_trajectory(150,)
Processing frame 150
ret value: None
$ ./run_odometry.py view_feature -h
usage: run_odometry.py view_feature [-h] -a {dt,nn,nndr} -t THRESHOLD
positional arguments:
frame_id feature_id
frame_id
feature_id
options:
-h, --help
index of chosen frame (1 to 500)
index of chosen feature (0 to 1499)
show this help message and exit
-a {dt,nn,nndr}, --algorithm {dt,nn,nndr}
name of the matching algorithm used to match features
-t THRESHOLD, --threshold THRESHOLD
decimal value of threshold for feature matching
(option required except for nn algorithm)
3
The subcommands view_info and get_info have similar command-line options. The possible choices for algorithm correspond to the feature matching strategies that you’ll implement in Tasks 1 to 3 below. These are:
dt to use your implementation of distance thresholding,
nn to use your implementation of nearest neighbour, and
nndr to use your implementation of nearest neighbours distance ratio.
The valid ranges of threshold value depend on the choice of matching algorithm, as follows:
• between 0 and 500 (inclusive1) for the dt and nn,
• between 0 and 1 (inclusive) for the nndr algorithm.
The threshold value is optional when using nn as described in Task 2.
As an example, once you completed Task 1 below, you can view the matches of the feature with index 127 of frame 100 according to the distance thresholding algorithm for a threshold value of 232.5 by running
This will display frames 100 and 101 side-by-side and OpenCV, using its own cv2.drawMatchesKnn function, will draw lines from feature 127 in the first frame to its matches in the second, as shown in Figure 1.
Figure 1: Viewing a feature
Because of the small size of Figure 1, it is difficult to see that feature 127 is the bottom right corner of the lower window in frame 100, and that it is matched with the bottom right corners of the upper and lower windows in frame 101, but this will be easier on the screen.
While frame indices correspond to images in the MyKITTI data set, feature indices are generated by OpenCV at runtime. This means that there is no guarantee that feature 127 is as shown in Figure 1 if you run the above example. There will be a feature corresponding to the bottom right corner of the lower window in frame 100, but depending on the C libraries installed on the lab machine this feature may be assigned a different index.
This issue affects all of the examples below (though hopefully feature indices won’t be too unreliable when using Kilburn lab machines). However, please note that this will not affect the marking as both your solution and the reference implementation will be running on the same machine!
You will be able to use the view_info and get_info subcommands, once you’ve completed Task 4 below. Using view_info instead of view_feature will produce the image in Figure 2. It combines the OpenCV visualisation of feature matches with a rendering of the coordinates that your imple- mentation of VisionTasks.matching_info() returns (in green). This will allow you to visually inspect if your code calculates the correct coordinate values.
1. As this value is applied on Euclidean distances, in principle the threshold can be any positive number. However, for this exercise, only threshold values between 0 and 500 (inclusive) will be tested.
$ ./run_odometry.py view_feature -a dt -t 232.5 101 127 vo params: (’dt’, 232.5, ’~/MyKITTI’)
debug run: view_feature(101, 127)
Processing frame 101
ret value: None
4
Figure 2: Viewing feature information
To get the details of the matches that your function returns, you need to run
$ ./run_odometry.py get_info -a dt -t 232.5 101 127 vo params: (’dt’, 232.5, ’../MyKITTI’)
debug run: get_info(101, 127)
Processing frame 101
ret value: ((1066, 152),
[(1032, 152), (1033, 9)],
ret count: 3
[70.21395874023438, 172.14529418945312])
As the example shows, this will be a tuple with the following three elements:
• a tuple of two integers,
giving the 𝑥 and 𝑦 coordinates of the chosen feature in the previous frame,
• a list of tuples,
each with 𝑥 and 𝑦 coordinates in the current frame matching the chosen feature; and
• a list of floats,
each giving the distance between the chosen feature and the corresponding match.
For the example, in frame 100 the bottom right corner of the lower window is located (1066, 152) and its matches in frame 101 have coordinates (1032,152) and (1033,9) on the lower and upper window, respectively.
Assignment
For this lab exercise, the only Python file that you need to modify is vision_tasks.py. You will develop your own version of this script, henceforth referred to as “your solution” in this document, following the tasks outlined below.
Although you do not need to make any changes to the vision_tasks_base.py script, it is advisable that you familiarise yourself with its contents before you proceed to developing your solution. In particular, it is useful to take note of the following details in relation to the VisionTasksBase class:
• Its constructor creates a detector object by calling the OpenCV function cv2.SIFT_create, which firstly creates a feature detector based on the Scale-Invariant Feature Transform (SIFT) algorithm that is configured to extract 1500 features from each image (as specified by the NUM_FEATURES constant).
• Its featureMatching function takes two images as arguments: cur_image, the image cor- responding to the given as input to drive_car; and prev_image, the image immediately preceding cur_image in the frame sequence.
5
• By calling the detectAndCompute function of the detector object, a list of keypoints and a list of descriptors are computed for each of prev_image and cur_image.
Note that the size of the list of keypoints and list of descriptors returned, is the same as NUM_FEATURES.
• A feature matching strategy — based on your own implementation of the dt, nn and nndr functions in vision_tasks.py — is then applied through a call to matching_algo, taking as parameters the list of descriptors extracted for each of prev_image and cur_image, as well as a threshold value (if provided).
This lab exercise requires you to implement three feature matching algorithms (which were ex- plained in the Week 7 lectures) and one function for determining the coordinates of the matches. Stub code has been provided in vision_tasks.py, which you must modify to turn it into your own solution.
In your implementation, you can make use of a class called BFMatcher (short for brute-force matcher). This is an OpenCV class that implements some functions for computing the similarity (e.g., in terms of Euclidean distance) between image descriptors (i.e., features).
bf = cv2.BFMatcher()
Without supplying any input arguments to cv2.BFMatcher(), Euclidean distance is used as the distance metric by default (which is what we need).
For each of the tasks below, you should use the knnMatch function of the BFMatcher object to compute matches between two sets of descriptors. For example:
matches = bf.knnMatch(des1, des2, k=100)
will return the closest 100 matches for every descriptor in an image, in order of increasing Euclidean distance.
The arguments des1 and des2 correspond to the descriptors computed for the query (preceding or previous) image and reference (current) image, respectively. Meanwhile, the value of k specifies the maximum number of closest matches that should be returned for each query descriptor.
The above call to knnMatch will return a list matches which is of the same size as the number of descriptors in the query image. Each element in matches is a list itself that contains k elements which are of type DMatch.
The return type of the feature matching functions that you are required to implement should similarly be a list of lists of DMatch objects; however, you need to write code to remove any matches that should be eliminated depending on the specified algorithm and/or threshold value.
Any DMatch object has the following attributes, which you will likely find useful:
distance
queryIdx
the Euclidean distance between the query descriptor and the matched descriptor,
the index of the query keypoint in the list of keypoints detected from the query (i.e. preceding) image,
the index of the reference keypoint in the list of keypoints detected from the reference (i.e. current) image.
trainIdx
For example, if m is a DMatch object, one can obtain the value of the Euclidean distance between
a query descriptor and the matched descriptor with:
dist_value = m.distance
6
A DMatch object contains information on which keypoints from the query and reference images have been considered as a match. The first step towards obtaining these keypoints is by determining their indices, as follows:
In the above, each of query_kp_index and ref_kp_index is an integer which is an index. Thus, one can obtain the actual keypoints with:
assuming that prev_kp and cur_kp are the list of keypoints detected from the query (preceding or previous) and reference (current) images, respectively. Here, each of query_kp and ref_kp is of type KeyPoint. It has an attribute called pt which is a tuple of two elements, corresponding to the 𝑥 and 𝑦 coordinates of the keypoint. However, the coordinates are provided as floating point values, thus one needs to cast them into integers, as in the following example:
Bearing in mind the above notes on OpenCV functions and classes, you can proceed to completing the tasks below.
Task 1: In your solution, write a function called dt that implements feature matching based on the distance thresholding algorithm. It returns a list of lists of DMatch objects.
You can verify that your function behaves correctly on the command line. For example, you should obtain output that is similar to the following.
query_kp_index = m.queryIdx
ref_kp_index = m.trainIdx
query_kp = prev_keypoints[query_kp_index]
ref_kp = cur_keypoints[ref_kp_index]
x_coord_query_kp = int(query_kp.pt[0])
y_coord_query_kp = int(query_kp.pt[1])
$ ./run_odometry.py get_info -a dt 150 209 -t 100 vo params: (’dt’, 100.0, ’~/MyKITTI’)
debug run: get_info(150, 209)
Processing frame 150
ret value: ((226, 119), [(208, 115)], [72.02082824707031])
ret count: 3
$ ./run_odometry.py get_info -a dt 150 209 -t 300 vo params: (’dt’, 300.0, ’~/MyKITTI’)
debug run: get_info(150, 209)
Processing frame 150
ret value: ((226, 119), [(208, 115), (394, 52), (519, 213), \
(872, 160), (402, 48)], [72.02082824707031, 242.6643829345703, \
282.074462890625, 290.75762939453125, 294.5997314453125])
ret count: 3
Task 2: In your solution, write a function called nn that implements feature matching based on the nearest neighbours algorithm. It returns a list of lists of DMatch objects.
Recall that the nearest neighbours algorithm can be used with or without a threshold value; hence the --threshold argument is optional if --algorithm is set to nn. In this case your function will get the value None as the threshold argument.
Again, you can verify that your function behaves correctly on the command line. For example, you should obtain output that is similar to the following.
7
$ ./run_odometry.py get_info -a nn 150 209 vo params: (’nn’, None, ’~/MyKITTI’)
debug run: get_info(150, 209)
Processing frame 150
ret value: ((226, 119), [(208, 115)], [72.02082824707031])
ret count: 3
$ ./run_odometry.py get_info -a nn 150 209 -t 100 vo params: (’nn’, 100.0, ’~/MyKITTI’)
debug run: get_info(150, 209)
Processing frame 150
ret value: ((226, 119), [(208, 115)], [72.02082824707031]) ret count: 3
$ python run_odometry.py get_info -a nn 150 209 -t 50
vo params: (’nn’, 50.0, ’~/MyKITTI’)
debug run: get_info(150, 209)
Processing frame 150
ret value: ((0, 0), [], [])
ret count: 3
Task 3: In your solution, write a function called nndr that implements feature matching based on the nearest neighbours distance ratio algorithm. It returns a list of lists of DMatch objects.
As above, you can verify that your function behaves correctly on the command line. For example, you should obtain output that is similar to the following.
$ ./run_odometry.py get_info -a nndr 120 45 -t 0.97 vo params: (’nndr’, 0.97, ’~/MyKITTI’)
debug run: get_info(120, 45)
Processing frame 120
ret value: ((22 , 14), [(61 , 163)], [156.46725463867188])
ret count: 3
$ ./run_odometry.py get_info -a nndr 120 45 -t 0.93 vo params: (’nndr’, 0.93, ’~/MyKITTI’)
debug run: get_info(120, 45)
Processing frame 120
ret value: ((0, 0), [], [])
ret count: 3
Task 4:
In your solution, write a function called matching_info that takes the matches obtained by any of the above-described feature matching functions, and determines the image coordinates of the query keypoint and of every matching reference keypoint. It returns a tuple with the following three elements:
• a tuple of two integers, which corresponds to the 𝑥 and 𝑦 coordinates of the query keypoint;
• a list of tuples, where each tuple has two integers corresponding to the 𝑥 and 𝑦 coordinates of a matching reference keypoint; and
• a list of floats, which corresponds to the distances between the query keypoint and the matching keypoints.
If the matches passed on the function is an empty list, the return value should be a tuple whose first element is (0,0) and whose second and third elements are empty lists: ((0, 0), [], []).
8
You can verify that your function behaves correctly on the command line by inspecting the return values of of the get_info subcommand, as exemplified in Tasks 1, 2 and 3 above. Additionally, you can use the view_info subcommand to visualise the coordinates that you have obtained and check if they are the same as what OpenCV’s own cv2.drawMatchesKnn function returns.
Submission
Please follow the README.md instructions in your COMP24011_2023 GitLab repo. Refresh the files of your lab3 branch and develop your solution to the lab exercise. The solution consists of a single file called vision_tasks.py which must be submitted to your GitLab repo and tagged as lab3_sol. The README.md instructions that accompany the lab files include the git commands necessary to commit, tag, and then push both the commit and the tag to your COMP24011_2023 GitLab repo. Further instructions on coursework submission using GitLab can be found in the CS Handbook, including how to change a git tag after pushing it.
The deadline for submission is 18:00 on Friday 24th November. In addition, no work will be considered for assessment and/or feedback if submitted more than 2 weeks after the deadline. (Of course, these rules will be subject to any mitigating procedures that you have in place.)
The lab exercise will be auto-marked offline. The automarker program will download your sub- mission from GitLab and test it against our reference implementation. For each task the return value of your function will be checked on a random set of valid arguments. A time limit of 10 seconds will be imposed on every function call, and exceeding this time limit will count as a runtime error. If your function does not return values of the correct type, this will also count as a runtime error.
A total of 20 marks is available in this exercise, distributed as shown in the following table. Task Function Marks
1 VisionTasks.dt() 5
2 VisionTasks.nn() 5
3 VisionTasks.nndr() 5
4 VisionTasks.matching_info() 5
The marking scheme for all tasks is as follows:
• You obtain the first 0.5 marks if all tests complete without runtime errors.
• The proportion of tests with fully correct return values determines the remaining 4.5 marks.
Important Clarifications
• It will be very difficult for you to circumvent time limits during testing. If you try to do this, the most likely outcome is that the automarker will fail to receive return values from your implementation, which will have the same effect as not completing the call. In any case, an additional time limit of 300 seconds for all tests of each task will be enforced.
• This lab exercise is fully auto-marked. If you submit code which the Python interpreter does not accept, you will score 0 marks. The Python setup of the automarker is the same as the one on the department’s Ubuntu image, but only a minimal set of Python modules are available. If you choose to add import statements to the sample code, it is your responsibility to ensure these are part of the default Python package available on the lab machines.
• It doesn’t matter how you organise your lab3 branch, but you should avoid having multiple files with the same name. The automarker will sort your directories alphabetically (more specifically, in ASCII ascending order) and find submission files using breadth-first search. It will mark the first vision_tasks.py file it finds and ignore all others.
• Every file in your submission should only contain printable ASCII characters. If you include other Unicode characters, for example by copying and then pasting code from the PDF of the lab manuals, then the automarker is likely to reject your files.