Exercise 3 - Object Detection
Published: Monday Nov. 30.
Setup
Follow all the same setup steps from Exercise 1 for setting up your robot and developing through the dts exercises interface.
Finally, make sure your shell is up to date, and the first time you run things through the dts exercises interface, add the --pull flag once locally and once on the robot:
$ dts exercises test -b ![ROBOT_NAME] --pull
$ dts exercises test --sim --pull
You may choose to copy over any of the other lane following modules that we tuned for Exercises 1 and 2, or borrow them from someone else in the class. If you borrow the work of someone else, you must acknowledge them, and they will receive a small bonus.
Additionally, in this case you will need more things installed locally, since you will be generating annotated data and training a deep neural network for object detection. It is recommended that you do so in a virtual environment.
To do so, you should install virtualenv (pip install virtualenv) and then run from the object_detection folder:
$ virtualenv venv
$ source venv/bin/activate
We will be working in the object_detection exercises folder in dt-exercises. Enter that directory, and run
$ ./setup.sh
This will clone the simulator locally into the utils/data_collection folder, and do a number of other things such as install requirements, download the real robot dataset, and process it for you.
Data Collection
In order for your agent to work both in simulation and on the real robot, you will construct an annotated dataset that comprises labels from both modalities. In the case of real data, you should be grateful that two students from last year annotated a huge dataset of real images. When you ran setup.sh, these were downloaded into a folder called dataset. We also wrote the transformation function for you in data_collection/data_transfer.py, which prepares the data for you.
For the simulated images, you will have to create them (and their annotations) yourself, automatically.
Do note that in this exercise, we don’t want to differentiate individual objects from one another: all objects of a given type share a single class. Our images will include duckies, busses, trucks, and cones.
We thus have five classes:
- 0: background
- 1: duckie
- 2: cone
- 3: truck
- 4: bus
To collect our data, we’ll use the segmented flag in the simulator. Try running
$ ./utils/data_collection.py
which cycles between the segmented simulator and the normal one.
You will have to modify the code in ./utils/data_collection.py in order to get a good dataset.
First, you will notice that there is noise that needs to be removed (it appears as “snow”). You could consider using some basic image processing technique covered in class or some other OpenCV functions to remove this.
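One possible approach (just a sketch, not the required solution) is to treat the “snow” as speckle noise and clean it with a median blur or a small morphological opening; the kernel sizes below are guesses that you will want to tune:

import cv2
import numpy as np

def clean_segmented_image(seg_img):
    # seg_img: HxWx3 uint8 segmented frame from the simulator
    # A median blur removes isolated "snow" pixels while preserving
    # the solid coloured blobs we actually care about.
    denoised = cv2.medianBlur(seg_img, 5)  # kernel size is a guess; tune it
    # Optionally, a morphological opening cleans up remaining speckles.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(denoised, cv2.MORPH_OPEN, kernel)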
Notice that when we’re in the segmented simulator, all the objects we’re interested in have the exact same color, and the lighting and domain randomization are turned off. Just like the data_collection.py file does, we can also turn the segmentation back off for the same position of the agent. In other words, we can essentially produce two 100% identical images, save for the fact that one is segmented and the other is not.
Then, to collect the dataset:
- We want as many images as reasonable. The more data you have, the better your model, but also, the longer your training time.
- We want to remove all non-class pixels in the segmented images. You’ll have to identify the white lines, the yellow lines, the stop lines, etc., and remove them from the masks. Do the same for the coloured “snow” that appears in the segmented images.
- We want to identify each class by the numbers mentioned above.
- We also want the bounding boxes, and the corresponding classes (a sketch of one way to extract these follows this list).
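As a starting point for the last two bullets, here is a rough sketch of how you could turn a cleaned segmented image into boxes and class numbers. It assumes you can build a binary mask per class from the segmented image; the CLASS_COLORS values are placeholders, not the simulator’s real colours, so check the segmented frames yourself:

import cv2
import numpy as np

# Placeholder colours: replace with the values you actually observe.
CLASS_COLORS = {
    1: (100, 117, 226),  # duckie (hypothetical value)
    2: (226, 111, 101),  # cone (hypothetical value)
    3: (116, 114, 117),  # truck (hypothetical value)
    4: (216, 171, 15),   # bus (hypothetical value)
}

def extract_boxes(seg_img, min_area=25):
    # seg_img: cleaned segmented image, same channel order as CLASS_COLORS
    boxes, classes = [], []
    for cls, color in CLASS_COLORS.items():
        mask = cv2.inRange(seg_img, np.array(color), np.array(color))
        # OpenCV 4 return signature: (contours, hierarchy)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for cnt in contours:
            x, y, w, h = cv2.boundingRect(cnt)
            if w * h < min_area:  # drop tiny leftover speckles
                continue
            boxes.append([x, y, x + w, y + h])  # [xmin, ymin, xmax, ymax]
            classes.append(cls)
    return np.array(boxes), np.array(classes)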
Your dataset must respect a certain format. The images must be 224x224x3 images. The boxes must be in [xmin, ymin, xmax, ymax] format. The labels must be an np.array indexed the same way as the boxes (so labels[i] is the label of boxes[i]).
We want to be able to read your .npz, so you must respect this format:
img = data[f"arr_{0}"]
boxes = data[f"arr_{1}"]
classes = data[f"arr_{2}"]
Additionally, each .npz file must be identified by a number. So, if your dataset contains 1000 items, you’ll have npzs ranging from 0.npz to 999.npz.
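For reference, a sample that satisfies the reader above can be produced with np.savez, whose positional arguments become arr_0, arr_1 and arr_2 (the save_dir below is just an example):

import numpy as np

def save_sample(save_dir, index, img, boxes, classes):
    # Positional arguments map to arr_0 (image), arr_1 (boxes), arr_2 (classes).
    np.savez(f"{save_dir}/{index}.npz", img, boxes, classes)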
Do note that even though your dataset images have to be of size 224x224, you are allowed to feed smaller or bigger images to your model. If you wish to do so, simply resize the images at train/test/validation time.
Model Training
Now we can train our model using the dataset that we have created.
The two files that you will need to modify are exercise_ws/src/object_detection/include/object_detection/model.py, which defines your model, and utils/train.py, which will execute the training for you.
The way that you would like to do this is really up to you. You can use TensorFlow or PyTorch. As an example, we provide this PyTorch object detection tutorial.
If you follow the tutorial, in utils/train.py you will have to define your Dataset class. It should provide the following (a minimal sketch follows this list):
- The bounding boxes for each class in each image (contrary to the tutorial, you calculated this earlier in the Data Collection part of this exercise)
- The class labels for each bounding box
- The normal, non-segmented image
- An ID for the image (you should just use the index of the .npz)
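Here is a minimal sketch of such a Dataset, assuming the .npz layout described in the Data Collection section; the directory layout and the paired transforms call follow the tutorial’s convention and are otherwise illustrative only:

import os
import numpy as np
import torch
from torch.utils.data import Dataset

class DuckietownDetectionDataset(Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # files are named 0.npz ... N.npz, so sort them numerically
        self.files = sorted(os.listdir(root), key=lambda f: int(f.split(".")[0]))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        data = np.load(os.path.join(self.root, self.files[idx]))
        img = data["arr_0"]      # 224x224x3 non-segmented image
        boxes = data["arr_1"]    # [xmin, ymin, xmax, ymax] per object
        labels = data["arr_2"]   # class number per box
        target = {
            "boxes": torch.as_tensor(boxes, dtype=torch.float32),
            "labels": torch.as_tensor(labels, dtype=torch.int64),
            "image_id": torch.tensor([idx]),
            # the tutorial's evaluate step also wants "area" and "iscrowd";
            # add them only if you keep that step
        }
        img = torch.as_tensor(img, dtype=torch.float32).permute(2, 0, 1) / 255.0
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target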
You are also free to experiment with the model.py that you define. Note that here we don’t have masks annotated, so we could use FasterRCNN instead of MaskRCNN:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one (5 classes: background + 4 object types)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 5)
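For orientation, a bare-bones training loop over such a model could look like the sketch below. It skips the tutorial’s engine.py helpers entirely, reuses the hypothetical DuckietownDetectionDataset sketched above, and its batch size, learning rate, epoch count, and output path are placeholders:

import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

dataset = DuckietownDetectionDataset("dataset")
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

model.train()
for epoch in range(10):  # placeholder epoch count
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        # In training mode, torchvision detection models return a dict of losses.
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.save(model.state_dict(), "model.pt")  # output path is an example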
A note on the tutorial
Make sure to carefully read the tutorial. Blindly copying it won’t directly work. The training data it expects is very specific, and you should make sure that you follow its structure exactly.
Additionally, weirdly enough, the tutorial expects you to have some files that it does not link to.
Perhaps having a look (and a download) at these links might save you some time:
- https://github.com/pytorch/vision/blob/master/references/detection/engine.py
- https://github.com/pytorch/vision/blob/master/references/detection/coco_utils.py
- https://github.com/pytorch/vision/blob/master/references/detection/transforms.py
You can also safely remove the evaluate call that the tutorial uses, and it will save you the headache of installing most of the coco_utils and coco_evaluate dependencies.
Verifying your Model
You should probably write a way to visually evaluate the performance of your model.
Something like displaying the input image and overlaying the bounding boxes (colored by class) would be simple but very effective.
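One way to do this with OpenCV (a sketch; the colours and the score threshold are arbitrary):

import cv2

# Arbitrary BGR colour per class, just for display
DISPLAY_COLORS = {1: (0, 255, 255), 2: (0, 165, 255), 3: (255, 0, 0), 4: (0, 0, 255)}

def show_detections(img_rgb, boxes, labels, scores=None, min_score=0.5):
    # img_rgb: HxWx3 uint8 image in RGB order
    vis = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)
    for i, (box, label) in enumerate(zip(boxes, labels)):
        if scores is not None and scores[i] < min_score:
            continue
        xmin, ymin, xmax, ymax = [int(v) for v in box]
        color = DISPLAY_COLORS.get(int(label), (255, 255, 255))
        cv2.rectangle(vis, (xmin, ymin), (xmax, ymax), color, 2)
    cv2.imshow("detections", vis)
    cv2.waitKey(0)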
You should also carefully read model.py, as there are comments in it that describe the API your wrapper should respect.
Training Hardware
But how should you actually train your model? If you have a recent-ish NVIDIA GPU, you can directly train on your computer. For reference, using a dataset with 2000 items, training on a GTX 960 was very doable.
If you don’t have a GPU, or if your GPU is too slow, you can use Google Colab. We included a .ipynb in the notebooks directory. You can open it with Google Colab, upload the root of this exercise to your Google Drive, and the provided notebook will mount the folder from your drive into the Colab runtime and then call your training script. To access the saved weights, simply download them from your Google Drive.
Integration
Finally, we will use the exercises infrastructure that we have used in Ex. 1 and 2 to integrate our model into a functional end-to-end robot capability.
You will see quite a few packages in the exercise_ws, but you should only need to worry about one (or maybe two if you attempt the Bonus): the object_detection package.
The only real decision you will have to make in the object_detection_node is under what conditions to consider your detections worthy of reporting, and thus trigger a transition to the pedestrian avoidance behavior.
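How you define “worthy of reporting” is up to you. One plausible heuristic, sketched below with placeholder thresholds that you would have to tune, is to trigger only on a confident detection whose box is large enough (i.e. close enough) to matter:

def should_trigger_avoidance(boxes, scores, min_score=0.7, min_area=1500):
    # boxes: [xmin, ymin, xmax, ymax] per detection; scores: confidence per box
    for (xmin, ymin, xmax, ymax), score in zip(boxes, scores):
        area = (xmax - xmin) * (ymax - ymin)
        if score >= min_score and area >= min_area:
            return True
    return False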
Optional Reading: What are the Other Packages Doing?
In order to integrate your object detector into the lane following stack, we have slightly modified the “Finite State Machine” of the system to include a new state DUCKIE_AVOID. If you look at the lane_following_pedestrians.yaml in the fsm/config directory, you can probably figure out how it works. When you publish your object detection to be True, it triggers a state change. In turn this causes a different controller to become active (this is regulated through the car_cmd_switch, which basically acts as a multiplexer selecting between the different controllers - if you take a look at lane_following_pedestrians.yaml in the car_cmd_switch.config directory, you can probably figure out how it works). In this case the active controller is the pedestrian_avoidance_controller_node, which is defined in the pedestrian_avoidance package. At present this controller just outputs 0 linear and angular velocity (it makes the robot stop). The duckietown_demos package is included because we defined a new demo in the launch folder, lane_following_pedestrians.launch, which we are now launching instead of lane_following.launch.
Bonus (max 5%)
As mentioned in the optional reading, the pedestrian_avoidance_controller_node just outputs zeroes. A fun challenge could be to publish more detailed information about the whereabouts of the pedestrian in the object_detection_node and then use that information to do planning to navigate around the pedestrian. This will take some work though.
Deliverables
To submit your assignment:
- You should make two different submissions to the aido5-LFP-sim-validation challenge (notice it’s LFP and not LF this time). One of them should be optimized to run in the simulator, and the other to run on the real robot. You should change your submission’s label in the file submission.yaml to be user-label: sim-exercise-3 before submitting for the simulation, and user-label: real-exercise-3 for the real robot. The output that you get on the challenge server for the real-exercise-3 submission does not matter; we will run that submission on our own robots for the evaluation. Note that you can use the same code for both submissions, but having two different submissions will allow you to tune parameters for the real robot and the simulator separately.
- You should also submit a video of your code running on your robot with dts exercises test -b ![YOUR_ROBOT]. This will be useful in case a problem happens when we try to run your code on our robot. You can submit the video here.
- Please send Liam, Anthony, and Charlie a link to your GitHub repo through a private message on Slack. Also please mention whether you borrowed any of the other lane_following packages from someone else.
Grading
This assignment is worth 15% of your final grade. Compared with Exercise 1, more weight will be given to the intermediate outputs and less to the end-to-end performance of the agent.
- 4% creation of your dataset
- 5% training of your model
- 3% integration in simulation
- 3% integration on the Duckiebot
Please report problems quickly on Discord, in class, on Slack, as GitHub issues, etc. If you are not sure whether something is working properly, ask. Don’t spend hours trying to fix it first.
Other Pointers, Helpers, and Things of Interest
You may find it helpful to look at what an existing dataset comprises. Try downloading the PennFudanPed dataset, a sample pedestrian detection dataset.
You’ll notice that if you try opening the masks in that dataset, your computer will display a black image. That’s because each segmented pedestrian’s mask is a single digit and the image only has one channel, even though the mask was saved as a .jpg.
Try scaling the masks from 0 to 255, using something like np.floor(mask / np.max(mask) * 255).astype(np.uint8). This will make the masks into something akin to a .bmp. Then, use OpenCV’s applyColorMap feature on that to visualize the results. Try looking at the two display functions found in utils.py for inspiration.
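Putting those two steps together, a quick visualization could look like this (the mask path is just an example file from the dataset):

import cv2
import numpy as np

mask_path = "PennFudanPed/PedMasks/FudanPed00001_mask.png"  # example path; point at any mask file
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
scaled = np.floor(mask / np.max(mask) * 255).astype(np.uint8)  # spread the ids over 0-255
colored = cv2.applyColorMap(scaled, cv2.COLORMAP_JET)          # now each pedestrian stands out
cv2.imshow("mask", colored)
cv2.waitKey(0)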
You’ll also notice that the dataset doesn’t include any bounding boxes. That’s okay. For training with PennFudanPed, we have to compute them with NumPy and OpenCV, just as we will for our own dataset.
Actually, for our own training, we won’t need the masks! All we want are the bounding boxes. But PennFudanPed is a useful example, as it shows how we can extract bounding boxes from masks, something we will also do for our own dataset. To see how to do this, you may skip ahead to the tutorial linked in the Model Training section.
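For completeness, the gist of how the tutorial turns an instance mask into boxes (each pedestrian has its own integer id in the mask) is just a per-id min/max over the pixel coordinates; this is essentially what you will replicate with your class masks:

import numpy as np

def boxes_from_instance_mask(mask):
    # mask: HxW array where 0 is background and each object has a unique id
    boxes = []
    for obj_id in np.unique(mask):
        if obj_id == 0:
            continue
        ys, xs = np.where(mask == obj_id)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])  # [xmin, ymin, xmax, ymax]
    return np.array(boxes)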
The deadline is officially set for Dec. 23, but no points will be deducted if it is received before Dec. 31.