If I were starting from scratch today, how would I get my first job in Machine Learning? Or how would I build a portfolio project that would really stand out? The short answer is: I would make a video. In practice, that means documenting my whole project development process on camera.
From explaining the problem, through implementing possible solutions, to the final presentation, focusing on showcasing as many skills as possible. The project choice comes down to either finding a problem to solve or replicating an interesting project. For example, an anomaly detection system, which is a process for finding patterns in data that don't align with expected behavior.
This Machine Learning technique is widely used by credit card companies and fintechs in general to detect anomalies, helping combat fraud. I could grab a dataset on Kaggle about this and start my project. But in this example, the plan is to stand out, so I want to do something different.
About two years ago I found a startup called AISight, which provided a system for anomaly detection to monitor industrial machinery. They developed a sensor that collected vibration data from machines and based on that a trained Machine Learning model analyzed the collected data to detect anomalies. And this, besides being very interesting, is something quite different.
Imagine a system that's capable of predicting problems in industrial machinery before they actually happen. Well, now looking for more details about the startup, the site is down because the company was acquired. But in my search I ended up finding Tractian, a company that provides exactly this service.
The cool thing is that on their page there are several details describing the features of the sensor. So this is my plan. Use Tractian's sensor description as inspiration, reproduce some behaviors of the sensor, collect and process the data, extract features, train and test the model to detect anomalies and make everything work.
It's a quite different project that allows exploring many possibilities and displaying a good set of skills. Here's an important point. Remember this part.
Showcasing skills. That should be the final goal of the video. So every extra explanation counts.
To create an anomaly detection system, we first need to collect data that characterize the normality of the system or normal operation. And then, we can simulate an anomaly and continue collecting, this time to record data from the anomaly. Once we have the data, we can use the dataset to train the machine learning model that can detect anomalies in a machine.
In practice, we'll have a sensor collecting data from the air conditioner and sending it to an API that will execute a trained model. There's an infinite number of data types we can collect from an electric motor, like sound, current consumption, temperature, vibration. But for this project, we'll focus on detecting anomalies through vibration only.
For this, we'll use a simple three-axis accelerometer. And here we have the first novelty. How does an accelerometer work?
Well, an accelerometer detects linear acceleration, that is, acceleration along an axis. In the case of the sensor I'll use, it measures on three axes. This sensor is an IMU, or inertial measurement unit, which is an electronic device that measures and reports specific force, angular rate, and sometimes the surrounding magnetic field, using a combination of accelerometers and gyroscopes, and sometimes magnetometers. These are implemented at a microscopic level using a technology called MEMS, or micro-electromechanical systems.
The interesting part is how this technology is implemented inside the chip, which basically combines mechanical and electrical components in a structure only a few micrometers in size. Inside the sensor there's a seismic mass, an H-shaped structure with sensing fingers at its extremities. The seismic mass is anchored to the substrate at the ends, which allows a back-and-forth movement where the fingers approach fixed electrodes during motion, enabling capacitive detection: the change in capacitance between the fixed electrodes and the seismic mass is used to determine the acceleration of the body.
That's how the accelerometer measures acceleration, or in other words, the rate of change in an object's velocity. It detects static forces like gravity, at 9.8 m/s², or, for our example, dynamic forces like vibrations and movements.
Cool, right? Now, back to the project. Well, in this prototype, we'll use just one sensor, but keep in mind that, in a real project, various different sensors are combined, including temperature.
As for the microcontroller I'll connect the sensor to, I'll use an ESP32 connected to Wi-Fi. This way it's possible to collect data from the accelerometer, establish a connection with the Python API and make a request to check if the server is ready to receive data. If the ESP32 receives a response with the ready signal, the sensor sends the measurements in JSON format as an HTTP POST request to the API.
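Just to make the flow concrete, here's a minimal Python simulation of that handshake. The server address, retry delay and JSON field names here are my assumptions for illustration, not the exact firmware behavior:

```python
# A rough Python sketch of the handshake the ESP32 performs; endpoint
# address and JSON field names are assumptions, not the exact firmware.
import time
import random
import requests

SERVER = "http://192.168.0.10:4242"  # hypothetical server address

def server_ready() -> bool:
    # GET request: ask the server if it is ready to receive data
    try:
        return requests.get(SERVER, timeout=2).status_code == 200
    except requests.RequestException:
        return False

def send_cycle():
    # Retry until the server answers the readiness check
    while not server_ready():
        time.sleep(0.1)  # same 100 ms retry delay as the firmware
    # Collect ~200 samples over one second (fake data in this sketch)
    samples = [{"x": random.gauss(0, 0.1),
                "y": random.gauss(0, 0.1),
                "z": 9.8 + random.gauss(0, 0.1)} for _ in range(200)]
    # POST the measurements as JSON, as the ESP32 does
    requests.post(SERVER, json={"samples": samples}, timeout=5)

if __name__ == "__main__":
    send_cycle()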
Let's go to the lab. Ok, these are the components we'll use: the ESP32, the MPU6050, which is an accelerometer and gyroscope. But here we'll work only with the accelerometer.
And I'll connect these components on a breadboard to test and validate. Checking the sensor's pinout, we have VCC, which powers the sensor with 3.3V or 5V, depending on the sensor. In this case, it's 3.3V. Then ground, and SCL, which is the clock line for I2C communication.
I2C stands for Inter-Integrated Circuit, a serial communication protocol that allows different devices to communicate over a shared bus. To connect the accelerometer to the ESP32, we only need two pins. SCL sets the rhythm of communication, as I mentioned before, the clock.
And SDA is the data line, responsible for transmitting and receiving information. On the ESP32 we have GPIO pins, General Purpose Input/Output, and these are the pins I'll connect the sensor to.
This is the final scheme to connect everything, and I'll do this now on the breadboard. Ok, now I'll talk a bit about the code that will run on the microcontroller. I found it easier to program this in the Arduino IDE, because of the communication libraries and drivers.
Among the libraries we have WiFi for the internet connection, HTTPClient for making HTTP requests, ArduinoJson for handling data in JSON format, Adafruit MPU6050 to interact with the sensor and, finally, Wire, which is used for I2C communication with the sensor. In the constants we have some configuration: samples per second and total samples per cycle, which means the sensor will collect about 200 motion records per second.
And the I2C pins, 21 and 22. The initialization turns an LED on for visual feedback; we have the I2C configuration and the initialization of the MPU6050: if it fails, the LED blinks 3 times repeatedly; if it works, it blinks twice. Then comes the basic configuration of the sensor, with the accelerometer range at plus or minus 4G and the filter bandwidth at 260 Hz.
The acceleration range means the size of the force of a movement that the sensor can measure. In this case, since we're configuring plus or minus 4G, it can measure accelerations between negative 4 times Earth's gravity and positive 4 times Earth's gravity; the sensor itself goes up to plus or minus 16G. The bandwidth represents how fast the sensor can respond to changes in movement.
If the bandwidth were 100 Hz, it would mean the sensor can measure variations in acceleration happening up to 100 times per second. In this case, we're at 260 Hz, expecting to collect about 200 motion records per second. In professional sensors this value would be much higher, reaching 32,000 Hz, for example.
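As a side note, here's roughly what the plus or minus 4G setting means for the raw numbers. This conversion is only illustrative, since the Adafruit library already returns values in m/s²; the 8192 LSB per G figure is the MPU6050 datasheet sensitivity for this range:

```python
# Illustration only: what the +/-4g range setting means for the MPU6050's
# raw 16-bit readings. The Adafruit library does this conversion for you.
GRAVITY = 9.80665          # m/s² per g
LSB_PER_G_AT_4G = 8192.0   # datasheet sensitivity for the +/-4g range

def raw_to_ms2(raw: int) -> float:
    # A signed 16-bit value of +32767 maps to roughly +4g
    return raw / LSB_PER_G_AT_4G * GRAVITY

print(raw_to_ms2(8192))   # ~9.81 m/s², i.e. 1 g
print(raw_to_ms2(32767))  # ~39.2 m/s², the +4g ceiling
```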
In the end, the combination of these two is almost a sensitivity adjustment. Next we have the Wi-Fi: during the connection the LED blinks once slowly, and after it connects it blinks 5 times quickly. Then I have extra functions that will be used inside the main loop.
In the loop, I call this extra function, which makes a GET request to check if the server is available and waits for the response to continue. If the server isn't ready, the loop waits 100 milliseconds and tries again. Next comes the preparation of a JSON document for the data, and then, over one second, the collection of 200 samples, that is, one sample every 5 milliseconds.
We have the x, y and z acceleration values; these are printed to the serial monitor every 50 samples, to help with feedback in the Arduino IDE and see if it's working well. When sending the data, the LED lights up during the send, the JSON document goes via HTTP POST to the server, the LED turns off right after sending, we wait 10 milliseconds and the next cycle starts.
On the HTTP server side, which receives the data transmitted by the sensor, we have a Python script responsible for creating CSV files with the accelerometer readings. The GET method responds to the ESP32's GET requests, returning OK to inform it that the server is ready to receive data. The POST method receives the sensor data in JSON format, processes it, creates a CSV with a timestamp, and returns success, or error in case of failure.
In save data to CSV, it converts the JSON to CSV format, one x, y, z triple per line. In the rest of the script we have complementary functions like create server, which creates the data directory if it doesn't exist, configures the HTTP handler and initializes the server on the specified port, in this case 4242.
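A minimal sketch of a server like this, using only the Python standard library, could look like the following; the JSON field names and file naming are my assumptions:

```python
# A minimal sketch of the collection server described above, using only
# the standard library. Field names and file naming are assumptions.
import csv
import json
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DATA_DIR = "data"
PORT = 4242

class SensorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Readiness check: tell the ESP32 we can receive data
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def do_POST(self):
        # Receive the JSON payload and persist it as a timestamped CSV
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        path = os.path.join(DATA_DIR, f"samples_{int(time.time())}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["x", "y", "z"])
            for s in payload["samples"]:  # one x, y, z row per sample
                writer.writerow([s["x"], s["y"], s["z"]])
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    os.makedirs(DATA_DIR, exist_ok=True)  # create directory if missing
    HTTPServer(("", PORT), SensorHandler).serve_forever()
```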
Ok, let's go to the tests. I'll connect the ESP32 to USB. Here in the IDE, first we'll compile the program to check there are no errors. Ok, now I'll upload it to the microcontroller.
It worked successfully, so I'll start the Python server. Here in the IDE there's a serial monitor to track the data being sent, and you can see it working. A cool feature is the serial plotter, which plots a graph with the collected sample data.
I'll activate the serial plotter and leave the sensor still on the table so we can see the stability; let me zoom in. And if I cause any vibration, we see a change in the graphs. With everything working, I'll wrap up and mount all of this on a prototype board.
Connect it to the battery and put it on top of the air conditioner, which will serve as our guinea pig. The plan is to collect data at the normal usage speed and then at various speeds that cause larger vibrations. And finally, I'll introduce a deliberate anomaly and collect data from a possible problem.
After that, we'll move to the next phase, with Machine Learning. After a few minutes soldering, hunched over like a shrimp, the prototype was ready. For the finish, to insulate the joints and hide my lack of soldering practice, I used these heat-shrink tubes, which shrink with hot air.
Now it's just a matter of plugging in the battery, putting it on top of the air conditioner and starting to collect data. During collection, I save the samples in a directory structure that indicates which is the normality baseline and which are the different speeds, as I adjust the air conditioner. A short pause to add our anomaly: a magnet fixed to the metal cylinder that circulates the air.
Because of the magnet on the cylinder, we get a deliberate imbalance, making it wobble. With the data collected, I tried to remove the magnet. After a few minutes, I lost patience and disassembled the air conditioner.
And after a few more attempts, I finally removed it. With everything in order, let's move on to the analysis. For the analysis, I'll load the script; first, let's check the graphs to understand the data.
First there's a comparison of the raw data. You can see that in normal operation the Z axis maintains a constant level around 10, which here is the acceleration in m/s², about 1G. G means G-force, a measure of acceleration that uses Earth's gravity as a reference, where one G equals 9.8 m/s². So that constant level on the Z axis is simply the sensor at rest reading Earth's gravity. The X and Y axes stay close to zero, with smooth lines and little variation. In operation with the anomaly, we have more intense oscillations on the Z axis, and the X and Y axes also show more variation, that is, a more irregular pattern across all axes.
In this other graph, with noise removal, you can see a small variation between -0.2 and 0.2 in normal operation, and -0.6 to 0.6 in operation with the anomaly, showing the larger oscillations.
Next, we have this 3D plot with the distribution of the accelerometer data across the 3 axes, X, Y and Z. The green points are concentrated near the center and represent normal operation. The orange ones, in larger volume, are more spread out, occupying a larger area, and represent the anomaly.
And of course, you can see an overlap. Remember that points farther from the center represent more intense vibrations. In this other visualization, we have the average to show the central value of the measurements over time.
Now you can see a clear separation between normal operation and anomalies. This indicates that the average vibration level during anomaly events is consistently different from normal operation. In this other plot, we have the variance, which measures how the data disperses around the average.
It's easy to see that the anomalies show much larger values than normal operation. Another way to interpret this is that the vibrations are more intense and irregular than in normal operation, while in normal operation the vibrations are more contained, and this tight grouping reflects stability.
Finally, we have kurtosis, which indicates how much the data concentrates around the average, that is, whether or not we have more extreme values. Here you can see a more scattered distribution of points, with some overlap between normal and anomaly. The higher values suggest the presence of more intense and frequent vibration peaks.
Each of these characteristics gives us a different perspective on the data. The average gives an overall view of the vibration, the variance reveals the intensity of the oscillations and the kurtosis indicates the presence of extreme events. Putting this together, we have a very interesting set of indicators for detecting anomalies.
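For reference, here's one way these windowed statistics could be computed, assuming CSVs with x, y, z columns at the 200 Hz sample rate; the file name is hypothetical:

```python
# One way to compute the windowed statistics behind those plots,
# assuming a CSV with x, y, z columns at ~200 samples per second.
import pandas as pd

df = pd.read_csv("data/normal_baseline.csv")  # hypothetical file name
window = 200  # one-second windows at the 200 Hz sample rate

stats = pd.DataFrame({
    "mean": df["z"].rolling(window).mean(),      # central vibration level
    "variance": df["z"].rolling(window).var(),   # intensity of oscillation
    "kurtosis": df["z"].rolling(window).kurt(),  # presence of extreme peaks
}).dropna()
print(stats.describe())
```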
In the last plot of this analysis we have the Fast Fourier Transform, or FFT, a mathematical tool that decomposes a signal into its constituent frequencies. In the context of machine vibration analysis, the FFT is very useful because different mechanical problems generate distinct vibration patterns at specific frequencies. For example, a defective bearing can generate vibration at one frequency, while an imbalance generates it at another.
In the graphs we can clearly see this difference. The blue line represents normal operation and shows a smooth, low-magnitude frequency profile, while the red anomaly line shows peaks at certain frequencies, especially on the Z axis, where the magnitude reaches 16 times the normal value.
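A short numpy sketch of this kind of spectrum comparison, assuming one-second windows sampled at about 200 Hz:

```python
# Decompose a vibration window into its constituent frequencies.
import numpy as np

def spectrum(signal: np.ndarray, rate: float = 200.0):
    # Remove the mean so the DC bin doesn't dominate the plot
    signal = signal - signal.mean()
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    return freqs, magnitude

# e.g. compare spectrum(normal_z) against spectrum(anomaly_z):
# a defect shows up as peaks at specific frequencies.
```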
Ok, let's go to the model training script. Well, here I start by loading the CSV data for each type of operation, normal and anomalous. Then I remove the DC component to normalize the data. About direct current, or DC, in digital signal processing: DC represents the mean value, or offset, of the signal over time. In the case of the accelerometer, removing the DC means subtracting the signal's average to center the data around zero. This is very important to eliminate constant biases that come from the sensors.
Additionally, this step lets us focus on the dynamic variations of the signal and removes some of the effect of gravity on certain axes of the accelerometer. Next, I add random noise to increase robustness during training. Then we have the extraction of five characteristics, or features, for each axis.
Standard deviation, which captures variability; kurtosis, which focuses on the shape of the distribution; maximum absolute amplitude; RMS, the root mean square; and the range, which is the difference between the maximum and minimum values.
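Here's a rough sketch of the preprocessing and these five per-axis features; the exact implementation in the original script may differ:

```python
# A sketch of the DC removal and the five per-axis features described
# above; details of the original script may differ.
import numpy as np
from scipy.stats import kurtosis

def remove_dc(axis: np.ndarray) -> np.ndarray:
    # Subtract the mean to center the signal around zero
    return axis - axis.mean()

def extract_features(axis: np.ndarray) -> list:
    axis = remove_dc(axis)
    return [
        axis.std(),                   # standard deviation: variability
        kurtosis(axis),               # shape of the distribution
        np.abs(axis).max(),           # maximum absolute amplitude
        np.sqrt(np.mean(axis ** 2)),  # RMS
        axis.max() - axis.min(),      # range: max minus min
    ]

# Concatenating the three axes gives a 15-value feature vector per window.
```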
After that, the datasets are created for training and validation testing, limiting the number of samples to avoid bias, and the features are extracted from each selected file. Then comes the chosen algorithm, which calculates how far a point is from the average of the normal data.
In this case, that's the Mahalanobis distance. This algorithm produces a measure of how unusual a point is relative to the normal distribution of the data.
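A minimal version of this calculation looks like the following, where the mean and covariance are estimated from the normal training data:

```python
# Mahalanobis distance: how far a feature vector x is from the mean of
# the normal data, in units that account for covariance between features.
import numpy as np

def mahalanobis(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    diff = x - mean
    inv_cov = np.linalg.inv(cov)
    return float(np.sqrt(diff @ inv_cov @ diff))

# "Training" here is just estimating mean and covariance from normal data:
# mean = normal_features.mean(axis=0); cov = np.cov(normal_features.T)
```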
That done, we have the next function, whose objective is to find the best threshold to separate what is normal from what is an anomaly. This function uses cross-validation and aggressively penalizes false positives: in anomaly detection, the threshold defines the boundary between normal and anomalous behavior, and in this implementation the approach was quite conservative, so I penalize false positives five times more than false negatives. The plan is to avoid false alarms.
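A sketch of what such a weighted threshold search can look like; the candidate grid and exact cost function are my assumptions, only the five-to-one penalty comes from the description above:

```python
# Threshold search with an asymmetric penalty: false positives (normal
# flagged as anomaly) cost five times more than false negatives.
import numpy as np

def find_threshold(normal_d: np.ndarray, anomaly_d: np.ndarray,
                   fp_weight: float = 5.0) -> float:
    best_t, best_cost = None, float("inf")
    for t in np.linspace(normal_d.min(), anomaly_d.max(), 500):
        fp = np.sum(normal_d > t)    # normal cases above the threshold
        fn = np.sum(anomaly_d <= t)  # anomalies below the threshold
        cost = fp_weight * fp + fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```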
After that, there's the model validation, which calculates the false positive rates and generates a performance report. Ok, I'll run everything and we'll analyze the results.
Let's start with the plot of the distribution of Mahalanobis distances. For the normal cases, in blue, most of the distances are concentrated between 2 and 6, with a peak near 3.5, which indicates that this is the most common value for normal cases.
Few normal cases exceed the threshold. For the anomalies, in red, the distances fall mainly between 8.5 and 14, with a peak around 10, indicating that typical anomalies sit at that distance; there's a small overlap between roughly 3 and 5.5. The dashed threshold at 5.71 separates most normal cases from anomalies.
Some normal cases are above the threshold, these here, that is, false positives, and some anomalies are below the threshold, indicating false negatives. Well, this distribution shows that the model can reasonably separate normal cases from anomalies while maintaining a realistic margin of error, which is expected in practical applications. There's a tendency to be conservative with anomalies, preferring to classify doubtful cases as normal, but in the end it has a good hit rate for normal cases, 47 out of 50, and a rate I'd call moderate for anomalies, 39 out of 50.
To close, we have the classification report. For normal cases the precision was 0.81, that is, when the model says it's normal, it's right 81% of the time. For recall we have 0.94, meaning that of the samples that are actually normal, the model detects 94% of them.
And the F1 score, the balanced average between precision and recall, was 87%. For the anomaly class, the precision says that when the model flags an anomaly it's right 93% of the time, and the recall says that of the real anomalies the model detects 78% of them.
And the F1 score was 85%. In the overall metrics, the accuracy was 0.86, which means the model gets 86% of all predictions right. The AUC score was 0.87, which indicates a good ability to separate the classes.
And in this case, the closer this value is to 1.0, the better. Ok, now that we have the trained model we can proceed to the next phase and perform inference. For inference, I'll first update the sensor software, so here I have another version of the program that will run on the ESP32.
In this version we have the same initial configuration: the sensor is initialized on the same I2C pins, it connects to Wi-Fi, and we have the same accelerometer configuration, so almost nothing changes. The data collection is a little different, though: we'll collect only 100 samples.
We organize the data in JSON, but this time as a 2D matrix, which matches exactly the input format expected by the model. And then the data is sent to the anomaly detection API. Ok, let's check the API.
Well, I built this structure using FastAPI and kept everything in a single file for didactic purposes. This API receives the accelerometer data, sent via a POST request with vibration samples. Internally, the API loads the model trained in the previous step, which contains the mean and covariance of the normal data and the threshold for classification.
So here we have the preprocessing of the new data: we remove the DC and calculate the per-axis statistics. Another calculation is how different a sample is from the normal pattern, via the Mahalanobis distance, and we also compute a confidence value that considers the recent history, for stability. Finally, we have the prediction, classifying whether the received sample is normal or an anomaly.
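A compressed sketch of an API along these lines is below. The endpoint name, field names and model file format are my assumptions, and the feature step is simplified compared to the real script, which also tracks the recent-history confidence:

```python
# Compressed sketch of the inference API. Endpoint, field names and
# model file format are assumptions; the real script keeps more state.
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# hypothetical model file: a dict with mean, inv_cov and threshold
model = joblib.load("model.joblib")

class Payload(BaseModel):
    samples: list[list[float]]  # 2D matrix: rows of [x, y, z]

@app.post("/predict")
def predict(payload: Payload):
    data = np.array(payload.samples)
    data = data - data.mean(axis=0)  # remove DC per axis
    feats = data.std(axis=0)         # simplified stand-in for the
                                     # per-axis statistics used in training
    diff = feats - model["mean"]
    dist = float(np.sqrt(diff @ model["inv_cov"] @ diff))
    # Compare against the threshold defined during training
    return {"is_anomaly": dist > model["threshold"], "distance": dist}
```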
You may have noticed that we do these calculations both in the training phase and in inference, but it's worth highlighting that they serve different purposes in each phase. In training, we use the Mahalanobis distance to define the threshold; this was done with normal data and known, labeled anomalous data, and the goal was to find a threshold value that separates the classes well.
In the inference phase, the Mahalanobis distance is computed for each new sample and compared against the threshold saved with the model. The goal this time is to classify new samples as normal or anomalous; we no longer need to define a threshold, just use the one defined in training.
It's as if, in training, we were calibrating the system to define what is normal and what is not, and in inference we're using that calibration to take measurements. Ok, let's run everything and look at the logs.
And here we already have information in real time showing that we're operating normally: the is_anomaly field returns false. To visualize this information better, I prepared a dashboard.
I made a small application in React just to display the classification status. Remember what I said at the beginning of the video? Details: whenever you can, pay attention to details.
Well, here in the dashboard we have the current status panel, which shows 3 important metrics. The confidence, which represents how confident the model is in the current classification, in my case around 62-63%. The Mahalanobis distance calculated for the current data.
And the threshold, which is the limit that defines when something is considered an anomaly. In the graph we have 3 lines showing the evolution over time. The blue line is the distance, which here is around 20.
The dashed green line is the threshold, showing the limit at 5.71, a constant value. And the pink line shows the model's confidence, in this case close to 62, sometimes 63%. Even with the distance value above the threshold, the system indicates normal operation.
And this happens because the model isn't using the instantaneous distance alone: as mentioned earlier, it also weighs the confidence computed over the recent history. So in the end we have a dashboard to monitor the state of the system in real time. I'll do a live test and give the side of the air conditioner a few hits.
And we get a change in the graphs as well as the anomaly detected notification. For this example, with the data samples that were collected, I achieved a result that I consider satisfactory. Of course, with more data I could train a neural network and maybe do something more sophisticated.
I could try to map the internal parts of the air conditioner to identify the origin of the anomaly, or do a more robust implementation of the sensor, combining accelerometer and gyroscope data to increase accuracy, or use some other, more sophisticated algorithm for this task. The point is that the objective, as I said at the beginning of the video, was to demonstrate as many skills as possible while explaining and solving a problem.
And of course, depending on the job opening or your interest, the focus of these demonstrations can be adjusted. For example, I could show product engineering skills by prototyping a case for the sensor in Fusion 360 and 3D printing it afterwards. That would even give the project a nice finish.
But the 3D printer didn't arrive in time. You know, to really impress, I could do a hardware project from scratch: choosing the sensor, the microcontroller and all the PCB components in a design using KiCad, for example.
And in the end, send it off to be produced in a factory like JLCPCB or PCBWay or any other of that type, and only then carry out the other steps. Or, simply as a software engineer, I could have focused on the API in a more professional way, separating the components and functions in a more organized manner.
Maybe in a version 2.0 of this project I can make some of these upgrades, or even use Rust on the ESP32. I know the number of possibilities is enormous and it's impossible to do everything, but being meticulous with the project and the presentation will make the difference.
To conclude, it's worth remembering that a project like this won't guarantee a job, but it counts as excellent training and can catch attention and make you stand out among thousands of applications. That's all for today, the links are in the description, thank you very much for watching and see you next time.