I built a LEGO box around a Raspberry Pi 3A+, its power source, a Pi camera and a servomotor. The box can be used to keep small valuable items (e.g. a wedding ring), and will only open if it detects and recognizes a face that I've saved to the database.
Presenting... the finished product.
Small valuables like this ring can be kept in the box.
Click above or here to open the YouTube video.
Of course, being built out of LEGO, a would-be thief could just dismantle the box... hehe! Still, this was a fun project and my first foray into combining LEGO with AI.
This project draws from the disciplines of mechanical, electronic and software engineering. The rest of this post explains how the box works and also some of the design decisions I've made. You may use the table of contents below to skip ahead if you'd like. Enjoy!
Table of Contents
My goal was to design the box to be compact enough to fit in the palm of my hand, which was a challenge because of all the electronics required. The only external electronic component is the camera, which I mounted on one side of the box, and hid under the golden circular bricks. Sometimes, I'm reminded of Iron Man's chestplate when I see this...
The box encapsulates all the electronics, yet manages to be small enough to fit in the palm of my hand.
I wanted to build something Oriental-looking this time, and decided to go with red as the main colour, with a dash of gold and brown. I added gold Asian window screens on the other three sides of the box to accentuate the Oriental look.
The gold Asian window screens accentuate the Oriental look.
The main mechanical feature of this project is the opening and closing of the box. The objective was for a single servomotor to open and close both covers simultaneously. I tried different gear layouts and four-bar linkages, and decided to go with the bevel gear mechanism shown in the animation below. Here, I've uncoupled the bevel gear from the servo, and manually rotated the shaft to illustrate how it works.
Animation of the bevel gear opening mechanism.
The advantages of this solution are:
- The servo and geartrain can be placed within space constraints.
- Minimal freeplay of covers. If freeplay were present, even if the covers were shut, you would be able to pry them open slightly - definitely undesirable for this project. To minimise freeplay in LEGO gears, the trick is to minimise the number of gears used, thereby reducing gear backlash.
To avoid having to dismantle the box in order to charge the battery or to troubleshoot, I added removable panels on 4 sides of the box. The panels serve mainly aesthetic purposes, and once removed, the ports of the Raspberry Pi and HAT are exposed as shown below:
- On/off switch on the battery HAT.
- USB Mini-B and Micro-B charging ports, HDMI port, and camera.
- USB-A port.
What was the main factor in selecting electronic components? SPACE!!! I wanted everything to be contained within the box, while keeping the overall dimensions of the box as petite as possible. I was also adamant that the cross-section of the box should be a square instead of a rectangle (a rectangular cross-section would just look weird).
The electronics, contained within the LEGO box.
The 5 main electronic components.
The design choices for these components are summarised in the table below:
Component | Manufacturer/Model | Rationale for Choice |
---|---|---|
Computer | Raspberry Pi 3 Model A+ | I used a Pi 3 Model B+ initially, but changed to the Model A+ because of its smaller size and square shape. It only has 512MB of RAM compared to 1GB in the B+, but this did not seem to adversely affect performance. |
Servomotor | LOBOT LGS-01 Micro Anti-block Servo 270° Rotation | The main requirement was that the servo had to be able to interface with LEGO bricks. Once I found one that could, I ordered it and designed the rest of the box around it. |
Camera | Raspberry Pi Camera Module v2 | A straightforward choice, with the added bonus that it connects to the Pi via a compact ribbon cable. |
Battery | WaveShare HAT-LI-01 (Li-Ion Battery HAT) + Generic 14500 Li-Ion rechargeable battery | This was tricky - I really wanted the box to be portable and thus it had to have its own power source. However, most powerbanks were too bulky to fit in the box. This component fit the bill nicely, with the only downside being that it needs recharging every half an hour. |
USB Micro-B to Mini-B adapter | Generic | Connects to the battery HAT, and juts out to the edge of the box. This allows the battery to be recharged by connecting a USB Mini-B cable, without having to dismantle the box to reach the HAT. |
The wiring is pretty straightforward. First, attach the battery HAT to the Raspberry Pi. Next, connect the servo to 3 GPIO pins of the HAT (since the HAT sits on top of the Pi's pins and breaks them out) following this guide. Lastly, connect the ribbon cable of the Pi camera to the Raspberry Pi's CSI port. That's it!
If you're not familiar with servomotors, here is an excellent guide on how they work, and how to calibrate them, if required. I found some differences between the LOBOT servo specs and actual performance, and had to calibrate the servo.
The servo is hidden below this piece of paper, and rotates the pointer at different duty cycles.
I recorded the rotation angles for different duty cycles as shown above, and found that the angle varied linearly with duty cycle over the range in the table below.
 | Angle | Duty Cycle |
---|---|---|
min | 0 deg | 2.5% |
max | 300 deg | 12.5% |
Using linear interpolation:
(duty_cycle - 2.5)/(12.5 - 2.5) = (angle - 0)/(300 - 0)
The equation above reduces to this, and now we have a direct relationship between the angle ('human language') and the duty cycle ('servo language'). This equation is used in servomotor.py.
duty_cycle = angle/30.0 + 2.5
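As a rough sketch (not the actual servomotor.py), here is how this mapping could be used to drive the servo with RPi.GPIO; the GPIO pin number and the target angles below are assumptions, while 50 Hz is the standard servo control frequency:

```python
import time
import RPi.GPIO as GPIO

SERVO_PIN = 18       # assumed GPIO pin - match this to your own wiring
PWM_FREQUENCY = 50   # standard 50 Hz servo control signal

def angle_to_duty_cycle(angle):
    # Linear mapping derived above: 0 deg -> 2.5%, 300 deg -> 12.5%
    return angle / 30.0 + 2.5

GPIO.setmode(GPIO.BCM)
GPIO.setup(SERVO_PIN, GPIO.OUT)
pwm = GPIO.PWM(SERVO_PIN, PWM_FREQUENCY)
pwm.start(angle_to_duty_cycle(0))

try:
    # Sweep to 150 degrees ('open'), then back to 0 degrees ('closed')
    pwm.ChangeDutyCycle(angle_to_duty_cycle(150))
    time.sleep(1)
    pwm.ChangeDutyCycle(angle_to_duty_cycle(0))
    time.sleep(1)
finally:
    pwm.stop()
    GPIO.cleanup()
```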
This section explains how to install the required libraries on the Raspberry Pi. Basic knowledge of using Raspberry Pi such as flashing the Raspbian OS is assumed. Do note that some of the following installation steps can be time consuming.
- Follow Adrian Rosebrock's guide to:
- Install pip
- Create a virtual environment (called cv in our case)
- Install numpy
- Build and install OpenCV from source (this will take a few hours)
- Run the following commands in a terminal. (A quick import check to verify the installation is sketched after the list below.)
workon <virtual env name>
pip install dlib
pip install face_recognition
pip install PyYAML
pip install "picamera[array]"
pip install RPi.GPIO
- dlib contains the core facial recognition functionalities
- face_recognition is built on top of dlib, and makes it simpler to use
- PyYAML allows the reading of the settings.yml file
- "camerapi[array]" allows the Raspberry Pi camera to be used
- RPi.GPIO allows the Raspberry Pi to control the servomotor using GPIO pins
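Once the installs finish, a quick sanity check like the sketch below can confirm everything imports cleanly inside the cv virtual environment (the script name is hypothetical):

```python
# Run inside the virtual environment, e.g. python check_install.py (hypothetical file name)
import cv2
import dlib
import face_recognition
import yaml
import picamera
import RPi.GPIO as GPIO

print("OpenCV:", cv2.__version__)
print("dlib:", dlib.__version__)
print("All libraries imported successfully.")
```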
- Clone this repository.
git clone https://github.com/leeping-ng/LEGO-Facial-Recognition-Box.git
If you've been connecting a keyboard, mouse and monitor to the Pi, it's time to disconnect them and remote into the Pi. After all, the box was designed to work as a standalone product (physically, at least). I'd recommend using VNC Viewer to remote into the Pi instead of SSH. Although VNC has more latency, it allows the video stream from the camera to be viewed remotely.
- In settings.yml, configure the following settings and place the face images in the designated folder (a minimal example of loading these settings is sketched after this list):
- faces_folder: set the path of the directory where you will store the face images. Note that, for example, putting an image named 'Harry Potter.jpg' in this folder is only the first step to grant access.
- grant_access: set a list of whitelisted names (whitelisted means you grant them access). Continuing on the previous example, the second step is to add 'Harry Potter' to this whitelist.
- use_accurate_detector: set to True to use the more accurate but slower HOG detector, or set to False to use the faster but less accurate Haar Cascade detector. More on this topic here.
- open_angle: set the angle for the servomotor to rotate to in order to open the box.
- close_angle: set the angle for the servomotor to rotate to in order to close the box.
- tolerance: the lower the tolerance, the stricter the face matching. 0.6 is the default value for best performance.
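For illustration, here is a minimal sketch of how these settings could be read with PyYAML; the keys come straight from the list above, but how the repo's code actually consumes them may differ:

```python
import yaml

# Load the configuration file from the repo root
with open("settings.yml", "r") as f:
    settings = yaml.safe_load(f)

faces_folder = settings["faces_folder"]
whitelist = settings["grant_access"]            # e.g. ["Harry Potter"]
use_accurate = settings["use_accurate_detector"]
open_angle = settings["open_angle"]
close_angle = settings["close_angle"]
tolerance = settings["tolerance"]               # 0.6 is the recommended default

print("Whitelisted people:", whitelist)
print("Detector:", "HOG" if use_accurate else "Haar Cascades")
```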
- To start the program:
cd <directory containing LEGO-Facial-Recognition-Box>
bash scan.sh
The program will keep running in a loop. If a whitelisted face is detected, the box opens and the program ends. Click on the image window and press 'q' to return to terminal. Do note the limited (~0.5h) battery life!
The LEGO box with a stream from the Raspberry Pi.
- To close the box:
bash close.sh
You might want to take this a step further, and execute the scan.sh script on startup. This way, once you switch on the Pi, the program starts running without requiring you to run any commands. However, you would still have to remote into the Pi to close the covers.
So, how does facial recognition work? Adam Geitgey, the creator of the face_recognition library which we're using, explained it nicely in this article.
At a high level, the facial recognition used in this repo can be broken down into 4 steps:
- Face detection: Detect all the faces in a given image, and return the bounding box coordinates around each face.
  - Adam's article explains a method called HOG, or Histogram of Oriented Gradients, and this is the default method used by face_recognition.face_locations().
  - I've included the option to use a faster but less accurate method called Haar Cascades.
- Correcting facial poses: For faces that are turned slightly away from the camera, use face landmark estimation to centre the face as much as possible. This is done automatically for us when we call face_recognition.face_encodings() in the next step.
- Encoding faces: For each detected face, convert the area of the face into a 128-dimension representative vector (a.k.a. embedding), using face_recognition.face_encodings().
- Comparing embeddings: Compare the embedding of the detected face with the embeddings of the whitelisted faces in our database, using face_recognition.compare_faces(). If there's a match, grant access to the person.
  - A Support Vector Machine (SVM) classifier is used to compare the 128-dim vectors. This only took milliseconds on my Pi.
You can check out the full API for face_recognition here.
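To make these four steps concrete, here is a minimal sketch using the face_recognition API on two still images; the file names are hypothetical, and the actual program runs the same steps on camera frames in a loop:

```python
import face_recognition

# Build the 'database': encode the whitelisted face once
known_image = face_recognition.load_image_file("faces/Harry Potter.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Step 1: detect faces in a new image (HOG is the default model)
unknown_image = face_recognition.load_image_file("snapshot.jpg")
face_locations = face_recognition.face_locations(unknown_image, model="hog")

# Steps 2 & 3: pose correction happens internally, then each detected face
# is converted into a 128-dimension embedding
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)

# Step 4: compare each embedding against the whitelist
for encoding in face_encodings:
    matches = face_recognition.compare_faces([known_encoding], encoding, tolerance=0.6)
    if matches[0]:
        print("Whitelisted face found - open the box!")
```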
I ran a series of experiments to assess the frame rate, averaging the results for about a minute each. 'With Face' means that I put my face in front of the camera to allow face detection at all times, while for 'No Face' I just pointed the camera at a wall.
We can draw the following conclusions from the results below:
- The steps of detecting faces and encoding faces are similarly computationally expensive.
- We can't do much about the encoding face step, as we're using a pre-trained model, and it happens to be a pretty deep network at 29 convolutional layers.
- However, we could try to speed up the face detection step by using Haar Cascades instead of HOG.
 | With Face | No Face |
---|---|---|
Detect face (HOG) | 570ms | 570ms |
Encode face | 520ms | ~0ms |
Frame Rate | 0.91 FPS | 1.70 FPS |
Swapping out HOG for Haar Cascades, we see below a three-fold improvement in detection speed!
 | HOG | Haar Cascades |
---|---|---|
Face detection time | 570ms | 180ms |
Frame Rate (With Face) | 0.91 FPS | 1.34 FPS |
Frame Rate (No Face) | 1.70 FPS | 5.40 FPS |
However, this increased speed comes at the expense of accuracy. The Haar Cascades detector isn't as accurate as the HOG detector, especially if its parameters aren't tuned right. The choice of detector really depends on your requirements, so I decided to add an option to choose the detector in settings.yml.
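For reference, this is roughly what the Haar Cascades route looks like with OpenCV, including the conversion from (x, y, w, h) rectangles to the (top, right, bottom, left) format that face_recognition expects. The cascade file path is an assumption and depends on how OpenCV was installed:

```python
import cv2

# Load the pre-trained frontal-face cascade that ships with OpenCV
# (adjust the path if OpenCV was built from source)
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

def detect_faces_haar(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are the parameters that need tuning
    rects = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(30, 30))
    # Convert (x, y, w, h) to (top, right, bottom, left) for face_recognition
    return [(y, x + w, y + h, x) for (x, y, w, h) in rects]
```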
The other, and probably more obvious, method to improve frame rate is to lower the image resolution. Prior to this test, I was using 320x240 as the default resolution for all other tests. The results below (with face, HOG detector) show that:
- Lowering the image resolution improved the face detection time, as expected.
- However, it didn't have noticeable impact on the time to encode faces. This is because the ResNet network that's used to encode faces has a fixed input size of 150x150 pixels. Thus, regardless of the image resolution, all detected faces are resized to 150x150 and passed to the network.
Resolution (WxH, in pixels) | 100x75 | 320x240 | 500x375 |
---|---|---|---|
Detect face (HOG) | 60ms | 570ms | 1,360ms |
Encode face | 530ms | 520ms | 530ms |
Frame Rate | 1.68 FPS | 0.91 FPS | 0.50 FPS |
Additionally, dropping the resolution down to 100x75 had the following detrimental effects:
- My face had to be very close (~20cm) to the camera before it was detected.
- Higher prediction errors, as the input image to the face encoding network is of very poor quality.
As a result of these trade-offs, I decided to keep the original 320x240 resolution.
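For completeness, this is roughly how the resolution is set when grabbing frames with the picamera library; it's a sketch with the 320x240 value hard-coded, whereas the repo may well handle this differently:

```python
import time
from picamera import PiCamera
from picamera.array import PiRGBArray

RESOLUTION = (320, 240)   # the default resolution used for the tests above

camera = PiCamera()
camera.resolution = RESOLUTION
raw_capture = PiRGBArray(camera, size=RESOLUTION)
time.sleep(2)  # give the sensor a moment to warm up

# Grab frames continuously as BGR arrays (the format OpenCV expects)
for frame in camera.capture_continuous(raw_capture, format="bgr", use_video_port=True):
    image = frame.array
    # ... run face detection / encoding on `image` here ...
    raw_capture.truncate(0)  # clear the stream buffer for the next frame
    break  # a single frame is enough for this sketch
```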
Adrian Rosebrock's tutorial on deploying facial recognition software on Raspberry Pi was my starting point for this project. Thanks to Adrian's tutorial, I was able to tackle steps such as building and installing OpenCV from source on a Pi without much difficulty. Also, thanks to Davis King and Adam Geitgey for their respective dlib and face_recognition libraries!