Find out why proper data collection and analysis are so important when building a Machine Learning-powered solution for EV charging stations.
In this article, we discuss one of our recent projects in the Automotive industry. We saw an opportunity to implement a Machine Learning-powered solution for Predictive Maintenance and Anomaly Detection for our partner, a global EV charging company, and the company decided to give this Data Science project a chance and let us implement it.
Our partner offers modern Electric Vehicle charging solutions with premium 24/7 customer service, including turnkey certified charging stations, a software platform with a wide variety of management features for businesses and organizations, and a trusted EV driver mobile application. Since its inception, the company has gained the trust of its customers, who are proven market leaders in the Automotive industry. Our partner offers a proven 3-step process to design and deploy the perfect EV charging solution for each unique scenario. The steps include assessing the most fitting solution for the customer’s site, facilitating the installation of charging stations, and offering consultations on ensuring optimal performance and a top experience for end users. Drivers can use a custom mobile application that displays the location of the closest EV charging station, the station ID, the provided power level, whether the station is occupied or free at the moment, and much more.
The EV charging company we partnered with always aims to keep the user experience at the highest level, which is why it constantly works on maintaining EV charging stations in top condition and reducing possible downtime. Unfortunately, like any other device, an EV charger can break down at any moment, which inconveniences end users.
All EV charging stations operate through an open standard that lets them communicate with the network and report on charging sessions and the overall condition of each station. The standard is OCPP (Open Charge Point Protocol), an application protocol for communication between an EV charging station and the central management system or a charging station network. The principle is the same as with cell phones and cell phone networks: charging station owners can switch between OCPP-based networks, while the protocol always remains the same.
Machine Learning technology is able to detect anomalies in the behavior of these stations, and it is possible to build a Predictive Maintenance solution that will minimize downtime and prevent serious damage to the devices. Finding anomalies would eventually lead us to the causes of failures and help us to predict and prevent them.
To learn how Machine Learning might be applied to keep assets in top shape, read our article "Predictive Maintenance Part 1: The Domain Overview". If you want to dive deeper into the Predictive Maintenance challenge and learn how Machine Learning can help businesses and organizations in any industry, check our article "Predictive Maintenance Part 2: Machine Learning techniques to solve a maintenance problem".
There are several possible options for achieving this goal with ML algorithms, so the first step was to frame the problem correctly. There are two main types of ML algorithms to deal with problems:
Supervised ML algorithms learn from a set of labeled examples and are a great fit for making predictions. They can help with classification and regression problems.
Unsupervised ML algorithms work with unlabeled data and suit cases where we need to find hidden characteristics in a dataset. They can help to find anomalous data.
Problem framing for a Machine Learning project is very important to the success of an entire solution. There is a set of recommendations for business and tech experts, which you can learn here, if you are interested in more details.
In this case, we began with problem framing, which consisted of the following steps:
Articulating the problem (using Predictive Maintenance to determine when the station will break down).
Looking for data that was already labeled (finding out whether our partner already had labeled data suitable for a Machine Learning solution).
Finding out that data comes directly from charging devices in EV Stations.
Determining quantifiable outputs (as an output, we agreed to expect the date when the station will go out of order).
Together with our partner, we decided that the ultimate goal of the ML project would be to implement a Predictive Maintenance solution for the different EV charging stations they install for their customers. Only after precise problem framing did we move on to data analysis.
After a series of meetings with Intelliarts, our partner made the strategically correct decision to perform a thorough data analysis first. For every project that relies on data, we use the CRISP-DM methodology, which we find more useful than alternatives like KDD and SEMMA because it has a very important “Business Understanding” phase. We have previously compared all three methodologies, so if you want to know more about why we picked CRISP-DM, you can check out this article for a deeper breakdown.
Here is what the stages of the CRISP-DM methodology look like:
So, after defining the task that was possible to solve with Machine Learning, we moved on to understanding the data, according to this methodology.
There are three types of EV charging stations, named L1, L2, and L3, which differ in charging speed. We were given access to the database with the latest production updates. The data investigation was split into a high-level overview and a deep investigation. After these two steps, we found that about a dozen collections out of more than a hundred were suitable for Machine Learning.
The EV charging company had data records covering 2015 through 2020:
25,000 records in 2015
50,000 records in 2016
150,000 records in 2017
300,000 records in 2018
550,000 records in 2019
1,000,000 records in 2020
After some research, we found out that the data from the beginning of 2020 was the most suitable for this project, so we decided to use it.
Our partner has business relationships with numerous station vendors, so as a first step we decided to focus exclusively on Level 2 BTC stations. We made this decision because of the popularity of this vendor and because its stations had the best data coverage for ML needs in the database. However, we still faced some major data labeling issues, which are common for projects of this kind:
Some stations had mismatching identifiers
In some cases, it was hard to distinguish between real sessions and fake or test ones
Charging session status was flagged as "Invalid" in a variety of situations instead of having a distinct status for each one, so it was sometimes hard to tell what had really caused a charging session failure
If the power value was less than a certain number, the car was marked as fully charged
Two or more charging sessions were sometimes combined into one
The information about station reboots looked unrealistic in some cases
Some charging sessions had a start time later than the stop time
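Several of the issues listed above can be caught with simple validation rules that run before any modeling. The following is a minimal sketch rather than the project's actual code: the `Session` fields, the `validate_session` helper, the station IDs, and the one-minute threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class Session:
    # Hypothetical record layout; the real schema is not shown in the article.
    station_id: str
    start: datetime
    stop: datetime
    energy_kwh: float
    status: str


def validate_session(s: Session, known_stations: Set[str],
                     min_duration: timedelta = timedelta(minutes=1)) -> List[str]:
    """Return the data-quality issues found in one session record."""
    issues = []
    if s.station_id not in known_stations:
        issues.append("unknown_station_id")
    if s.start > s.stop:
        issues.append("start_after_stop")
    elif s.stop - s.start < min_duration and s.energy_kwh == 0:
        # Assumption: very short sessions with no energy delivered are likely tests.
        issues.append("possible_test_session")
    if s.status == "Invalid":
        issues.append("ambiguous_invalid_status")
    return issues


sessions = [
    Session("ST-1", datetime(2020, 1, 5, 10, 0), datetime(2020, 1, 5, 12, 30), 14.2, "Completed"),
    Session("ST-1", datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 8, 0), 3.1, "Completed"),
    Session("ST-9", datetime(2020, 1, 7, 9, 0), datetime(2020, 1, 7, 9, 0, 20), 0.0, "Invalid"),
]
known = {"ST-1", "ST-2"}
report = {i: validate_session(s, known) for i, s in enumerate(sessions)}
for index, issues in report.items():
    print(index, issues or "ok")
```

Keeping each rule as a named issue, instead of silently dropping rows, makes it possible to count how often every problem occurs and to decide per rule whether a record should be repaired or excluded.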
To sum up, the quality of those data labels was not sufficient for creating the effective ML-powered Predictive Maintenance solution that was initially planned. The biggest challenge was that issues were reported manually, at an arbitrary time after the actual event (charging, breakdown, or maintenance) occurred. Because of that, the Remaining Useful Life (RUL), a subjective estimate of the remaining years and days of each component of the system, couldn’t be calculated precisely enough. In addition, labels showing the maintenance mode of a station didn’t always mean that the station was actually broken, and in some cases it was impossible to find out what actually caused a failure.
As a result of the initial data analysis, we understood that the labeling process required significant improvement so that failed charging sessions have clear flags and distinct failure types, because ambiguous data will not work for an efficient Predictive Maintenance solution. Ideally, the labeling process should be automated to minimize human interaction and reduce potential errors in the data collection flow. Another key factor in making a solution like this a reality is collecting properly labeled data for at least one to two years, and improving the data collection pipeline is a great place to start.
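To illustrate why late manual reports hurt RUL labels, here is a small sketch under stated assumptions. It treats RUL as the time between an observation and the next failure, which is one common way to build training labels and not necessarily how the project defined it. The dates and the four-day reporting delay are invented for the example.

```python
from datetime import datetime, timedelta


def rul_days(observed_at: datetime, failure_at: datetime) -> float:
    """Remaining useful life in days, measured from an observation to the failure."""
    return (failure_at - observed_at).total_seconds() / 86400


observation = datetime(2020, 3, 1, 8, 0)
true_failure = datetime(2020, 3, 10, 8, 0)

# Hypothetical: the breakdown is entered manually four days after it happened.
reported_failure = true_failure + timedelta(days=4)

true_rul = rul_days(observation, true_failure)
reported_rul = rul_days(observation, reported_failure)
label_error = reported_rul - true_rul

print(f"true RUL: {true_rul:.1f} days, RUL from the report: {reported_rul:.1f} days")
print(f"label error introduced by the delay: {label_error:.1f} days")
```

Every observation before the failure inherits the same error, so a model trained on such labels would learn to predict breakdowns later than they really occur.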
The second goal was to implement an Anomaly Detection solution to detect abnormal behavior of the stations. We decided that the same data selection (Level 2 BTC stations in 2020) would be a good place to start. The only features that described session behavior and could be used for Anomaly Detection were power, charging time, parking time, and information about the charging periods within a charging session. Let's take a look at the data from one of the charging stations and see what insights we might get out of it:
The most interesting thing we spotted is that some charging sessions have a really long duration, sometimes up to 30,130 minutes (almost 21 days), which is not normal, may be considered an anomaly, and requires further investigation. A power level of 10 kW (10 kWh per hour) is also a huge value for that kind of EV, so it requires deeper investigation too. There were other suspicious charging station behaviors that we decided to investigate as well.
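Before reaching for ML, a single feature such as session duration can be screened with a simple statistical rule. The sketch below uses Tukey's upper fence (third quartile plus 1.5 times the interquartile range), which is a generic technique and not something the project is known to have used. Apart from the 30,130-minute value mentioned above, the durations are made up.

```python
import statistics

# Hypothetical session durations in minutes; only 30130 comes from the article.
durations_min = [45, 60, 75, 90, 120, 150, 180, 210, 240, 30130]


def upper_fence(values, k=1.5):
    """Tukey's upper fence: Q3 + k * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


fence = upper_fence(durations_min)
suspicious = [d for d in durations_min if d > fence]

for d in suspicious:
    print(f"{d} min (about {d / 1440:.1f} days) exceeds the fence of {fence:.0f} min")
```

A rule like this looks at only one feature at a time, which is why multivariate methods are still needed to catch sessions that look normal on each feature separately but unusual in combination.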
We used the following unsupervised anomaly detection algorithms to find anomalies:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is useful for finding arbitrarily shaped clusters in data that contains noise, that is, outliers. In this algorithm, a point that is close to many points of a particular cluster belongs to that cluster, while points in sparse regions are labeled as noise. DBSCAN determines the number of clusters while detecting the outliers, is very robust to outliers, performs well with arbitrarily shaped clusters, is very effective when the distribution of values in the feature space cannot be assumed, and works well for outlier search in a multidimensional feature space. On the other hand, this algorithm requires powerful computing resources and is very sensitive to its parameters, such as the neighborhood radius and the minimum number of points in a neighborhood.
The Isolation Forest algorithm structures data points as nodes of isolation trees, assuming that anomalies are rare events with feature values that differ a lot from expected data points. It is a precise, easy-to-optimize algorithm with few parameters, and it is very effective when the distribution of values in the feature space cannot be assumed. However, when the algorithm isn’t tuned correctly, you can easily waste time on training and money on computing power.
The Local Outlier Factor (LOF) gives an anomaly score to each data point by measuring the local density deviation of that point with respect to the data points around it. This algorithm can provide great results out of the box in various domains; however, its detection accuracy degrades in higher dimensions.
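The three algorithms above are all available in scikit-learn, so a side-by-side comparison can be sketched in a few lines. The example below is an illustration on synthetic data, not a reproduction of the project: the two "normal" clusters, the three injected oddities, and every parameter value (`eps`, `min_samples`, `contamination`, `n_neighbors`) are assumptions picked for the demo.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic sessions, NOT project data: [average power in kW, charging time in minutes].
cluster_a = rng.normal(loc=[3.3, 240.0], scale=[0.2, 20.0], size=(200, 2))
cluster_b = rng.normal(loc=[6.6, 120.0], scale=[0.3, 15.0], size=(200, 2))
injected = np.array([[10.0, 1500.0], [0.2, 5.0], [9.5, 600.0]])  # obvious oddities

# Scale features so that minutes do not dominate kilowatts in distance computations.
X = StandardScaler().fit_transform(np.vstack([cluster_a, cluster_b, injected]))

# DBSCAN labels points that belong to no dense region as -1 (noise).
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Isolation Forest and LOF return -1 for outliers and 1 for inliers.
# contamination is a guess at the share of anomalies and directly sets the cutoff.
iso_pred = IsolationForest(n_estimators=100, contamination=0.02, random_state=0).fit_predict(X)
lof_pred = LocalOutlierFactor(n_neighbors=20, contamination=0.02).fit_predict(X)

results = [
    ("DBSCAN", db_labels == -1),
    ("IsolationForest", iso_pred == -1),
    ("LOF", lof_pred == -1),
]
for name, flags in results:
    print(f"{name}: {flags.sum()} flagged, injected points caught: {flags[-3:].sum()} of 3")
```

Note that a fixed `contamination` forces Isolation Forest and LOF to flag roughly that share of points even when fewer are truly abnormal, so flagged sessions still need review by domain experts.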
After the Anomaly Detection analysis, we found two clusters with normal behavior, one for each of two types of cars at the Level 2 EV charging stations, and one cluster with abnormal behavior. These stations also report a number of short charging sessions that can be considered abnormal. Additionally, there is information about station reboots, but it is unclear what exactly caused them. The next step is to understand the data from the business side. This task lies with the product management team, which collaborates closely with the station technical experts. The team of station experts will validate our findings, and a decision about the next steps will be made: whether we continue the investigation or start building software with our findings in mind.
In case you're interested in another case study on manufacturing, check this post on the problem of false defect detection and how we solved it.
An effective Predictive Maintenance solution requires clear, properly labeled data collected over at least a few years. Ambiguous data for many of the parameters is a poor basis for building correct assumptions. It is quite common for companies to collect historical data, but to build an effective Machine Learning solution the collection process should be adjusted appropriately. Our partner faced challenges similar to those of any company that sets out to introduce a solution like this.
We advised our partner to update the data collection pipeline to get all necessary information from the OCPP communication protocol. You can learn more about the creation of a data collection pipeline for ML-powered Anomaly Detection solutions in this article. This is how the data pipeline might look:
A proper data warehouse architecture should also be implemented for better results. In our case, the data was transformed during the collection phase and saved in an aggregated state, but the raw OCPP data was never stored as is, and this could be improved. Storing raw data is very important for an organization because it keeps all the information in its original state. With raw data, the organization can transform the data into different forms, perform deep analysis, generate reports, and merge it with other data sources to get more insights.
A data warehouse architecture suitable for an ML project might look like this:
Data analysis is a crucial step before building a powerful ML-based Predictive Maintenance solution. Organizations face some common issues, which we discussed in this article, but ultimately knowing and understanding the data makes a business stronger. The EV charging company got data analysis and consultations from Intelliarts at the right moment to improve its data collection strategy.
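As a rough illustration of storing raw data as is, the sketch below writes every incoming message to a table unchanged and extracts only the action name to make later filtering cheaper. It rests on several assumptions: the messages use OCPP 1.6 JSON framing, where a call looks like `[2, uniqueId, action, payload]`, the table layout and the `store_raw` helper are hypothetical, and SQLite stands in for whatever storage a real warehouse would use.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """CREATE TABLE raw_ocpp (
           id INTEGER PRIMARY KEY,
           received_at TEXT NOT NULL,
           station_id TEXT NOT NULL,
           action TEXT,
           raw_message TEXT NOT NULL
       )"""
)


def store_raw(station_id: str, raw_message: str) -> None:
    """Persist one message unchanged; a parsing failure must never drop data."""
    action = None
    try:
        frame = json.loads(raw_message)
        # Assumption: OCPP 1.6 JSON call frames look like [2, id, action, payload].
        if isinstance(frame, list) and len(frame) == 4 and frame[0] == 2:
            action = frame[2]
    except json.JSONDecodeError:
        pass  # keep the message anyway, just without an extracted action
    conn.execute(
        "INSERT INTO raw_ocpp (received_at, station_id, action, raw_message) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), station_id, action, raw_message),
    )


# Hypothetical meter reading sent by a station.
sample = json.dumps([2, "msg-001", "MeterValues", {
    "connectorId": 1,
    "transactionId": 42,
    "meterValue": [{
        "timestamp": "2020-01-05T10:00:00Z",
        "sampledValue": [{"value": "7.2", "measurand": "Power.Active.Import", "unit": "kW"}],
    }],
}])
store_raw("ST-1", sample)
store_raw("ST-1", "not json at all")

rows = conn.execute("SELECT action, raw_message FROM raw_ocpp ORDER BY id").fetchall()
print(rows[0][0], "|", rows[1][0])
```

Aggregated tables for the application can then be derived from this store at any time, while the original messages stay available for new features or corrected transformations.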
Outcomes of the project so far:
Our partner now knows that raw data should be collected in a proper format for at least one year. They know exactly what type of information is needed for the desired solution and are going to increase the variety of sensors in their stations.
The data labeling should be automated with manual input minimized, while the labels themselves should have only one meaning, and more label categories should be added.
The raw data should be stored separately from application storage. That storage could be used as a source of data for Machine Learning tasks.
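The labeling recommendation above can be sketched as a fixed set of labels plus a rule that derives them from machine-reported fields. Everything here is hypothetical: the label names, the `label_session` helper, and the mapping are invented for illustration. The input strings resemble stop reasons and error codes from OCPP 1.6, but which codes a given station actually reports is an assumption.

```python
from enum import Enum


class SessionOutcome(Enum):
    """Hypothetical label set: every value has exactly one meaning."""
    STOPPED_BY_USER = "stopped_by_user"
    STATION_HARDWARE_FAULT = "station_hardware_fault"
    VEHICLE_COMMUNICATION_FAULT = "vehicle_communication_fault"
    POWER_LOSS = "power_loss"
    TEST_SESSION = "test_session"
    UNKNOWN = "unknown"  # explicit bucket instead of a catch-all "Invalid"


# Assumed grouping of station error codes that point to a hardware problem.
HARDWARE_ERRORS = {"GroundFailure", "HighTemperature", "PowerMeterFailure", "PowerSwitchFailure"}


def label_session(stop_reason: str, error_code: str, is_test: bool) -> SessionOutcome:
    """Derive one label from machine-reported fields, without manual input."""
    if is_test:
        return SessionOutcome.TEST_SESSION
    if error_code in HARDWARE_ERRORS:
        return SessionOutcome.STATION_HARDWARE_FAULT
    if error_code == "EVCommunicationError":
        return SessionOutcome.VEHICLE_COMMUNICATION_FAULT
    if stop_reason == "PowerLoss":
        return SessionOutcome.POWER_LOSS
    if stop_reason in {"Local", "Remote", "EVDisconnected"}:
        return SessionOutcome.STOPPED_BY_USER
    return SessionOutcome.UNKNOWN


print(label_session("Local", "NoError", False).value)
print(label_session("Other", "GroundFailure", False).value)
```

Sessions that fall into `UNKNOWN` are worth tracking separately, because a growing share of them signals that the label set needs another category.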
At the moment, our partner is collecting more data for the project, and we remain business partners and continue to work on it. The data analysis revealed aspects that could be improved to build an ML-powered Predictive Maintenance and Anomaly Detection solution faster and eventually gain a strategic advantage in the competitive EV charger market.
We at Intelliarts AI love helping companies solve challenges with data strategy design and implementation, so if you have any questions related to ML pipelines in particular or other areas of Data Science, feel free to reach out.