Fighting Crime and Saving the Environment through Anomaly Detection

A few years ago, if someone stole your credit card information, you might not know about it until you noticed that your bill was $200,000 higher than you expected. Now, credit card companies automatically email you if they suspect identity theft. How do they know that someone else might be using your account? The answer lies in a method called anomaly detection.

What is an anomaly? An anomaly is something that stands out because it is unusual or unexpected. For example, let’s say you have a group of numbers [1, 2, 3, 4, 5, -136, 3, 2, 1]. We can tell that -136 stands out: it’s an anomaly. In the case of identifying stolen credit cards, banks use anomaly detection to check for unusual account activity. For example, if you’ve spent your entire life in Princeton and the bank suddenly sees that you’re buying a yacht in Switzerland, this will raise flags that someone else may be using your card (this is also why banks often ask you to notify them before you travel abroad). With the ever-increasing amount of information produced daily, detecting these anomalies has become essential to many disciplines, ranging from fraud detection and crime prevention to environmental preservation. Here we look at two common methods of anomaly detection and how they are used to solve some of today’s most challenging problems.
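To make the number example concrete, here is a minimal sketch of how a program might pick out -136 from that list. It uses one simple rule of thumb (a "modified z-score" built on the median), which is an assumption chosen for illustration and not the method any particular bank uses. The function name find_anomalies and the cutoff of 3.5 are likewise hypothetical choices.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Return the values that sit unusually far from the median.

    Illustrative rule only: a modified z-score based on the median
    absolute deviation (MAD). The threshold of 3.5 is an assumed default.
    """
    center = median(values)
    # MAD: the typical distance of a value from the median.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # More than half the values equal the median, so this rule cannot rank them.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]

numbers = [1, 2, 3, 4, 5, -136, 3, 2, 1]
print(find_anomalies(numbers))  # prints [-136]
```

The median is used instead of the average because a single extreme value like -136 drags the average (and the standard deviation) toward itself, which can hide the very anomaly we are looking for in a list this short.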

Cluster Analysis

Figure: Results of a cluster analysis

Given a set of data, cluster analysis separates the data into groups, or “clusters,” such that the data points in the same group are more similar to one another than to the points in other groups. After creating different groups within the data, we can find the data points that do not clearly belong in any group and mark them as unusual. Cluster analysis is used extensively in law enforcement to decide where to position officers. Through a cluster analysis of previous crimes, agencies can identify the areas that are prone to dangerous criminal activity and concentrate manpower there. By ignoring the “anomalies” and focusing on areas where crime most often occurs, police departments have been able to reduce crime through efficient allocation of their officers.
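The idea of grouping points and then flagging the ones that do not fit can be sketched in a few lines. The example below assumes k-means as the clustering method, which is only one of many and is not necessarily what law enforcement agencies use. The coordinates are made up, the starting centroids are picked by hand to keep the run repeatable, and the "three times the median distance" cutoff is an arbitrary illustrative rule.

```python
from math import dist
from statistics import median

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Plain k-means: returns the final centroids and each point's cluster index."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the average of its members.
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]

def find_outliers(points, initial_centroids, factor=3.0):
    """Flag points much farther from their cluster center than is typical."""
    centroids, labels = k_means(points, initial_centroids)
    distances = [dist(p, centroids[label]) for p, label in zip(points, labels)]
    cutoff = factor * median(distances)  # assumed rule of thumb
    return [p for p, d in zip(points, distances) if d > cutoff]

# Made-up locations: two tight groups plus one stray point.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.3),
    (1.0, 9.0),
]
outliers = find_outliers(points, [points[0], points[4]])
print(outliers)  # prints [(1.0, 9.0)]
```

Notice that the stray point still gets assigned to a cluster, since k-means places every point somewhere. It is only the distance check afterward that marks it as unusual.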

Density-based Analysis

Figure: Results of a density analysis

Unlike cluster analysis, which divides data into groups, density-based analysis finds areas with the highest density of data points. Points that fall in sparse regions, far from any dense area, can then be treated as unusual. Density-based analysis has been particularly useful in environmental applications. A 2010 study on the concentrations of dangerous pollutants in Taiwan used density-based analysis to find areas with high densities of pollutants without needing to collect extensive amounts of data. Using this method, the researchers were able to identify not only “hazard zones” with high amounts of pollutants but also the original sources of the pollutants.
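As a rough illustration of what "density" means here, the sketch below counts how many neighbors each point has within a fixed radius. Points with many neighbors sit in a dense area, and points with few sit in a sparse one. This is a simplified neighbor-counting idea in the spirit of algorithms such as DBSCAN, not the method used in the 2010 study. The coordinates, the radius, and the min_neighbors setting are all made-up values.

```python
from math import dist

def neighbor_counts(points, radius):
    """For each point, count the other points that lie within the radius."""
    counts = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points)
            if i != j and dist(p, q) <= radius
        )
        counts.append(count)
    return counts

def split_by_density(points, radius, min_neighbors):
    """Separate points in dense areas from points in sparse areas."""
    counts = neighbor_counts(points, radius)
    dense = [p for p, c in zip(points, counts) if c >= min_neighbors]
    sparse = [p for p, c in zip(points, counts) if c < min_neighbors]
    return dense, sparse

# Made-up measurement sites: four close together and one far away.
sites = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (0.6, 0.5), (9.0, 0.0)]
dense, sparse = split_by_density(sites, radius=1.0, min_neighbors=2)
print(dense)   # prints the four sites near the origin
print(sparse)  # prints [(9.0, 0.0)]
```

The result depends heavily on the radius and the neighbor cutoff. A radius that is too large makes everything look dense, and one that is too small makes everything look like an anomaly.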

Although methods such as cluster analysis and density-based analysis have proven to be very effective at detecting anomalies, they are by no means perfect. However, with increasing amounts of data and improving algorithms, anomaly detection has evolved to become a key tool in solving many of the problems we face today. Next time you get a notification from your credit card company about fraud, be sure to thank anomaly detection for noticing that you’re not the one buying a $200,000 yacht.

About The Author

Eugene Tang

Eugene Tang is currently a junior studying computer science at Princeton University. He loves science, but unfortunately could only choose one to major in. When he is not hacking away, he enjoys catching a good game of cards, going for a swim, or, of course, reading articles about awesome new innovations in science.