This blog post is an excerpt from our solution tutorial, “Gather, visualize, and analyze IoT data”. The tutorial walks you through setting up an IoT device, gathering mobile sensor data in the Watson IoT Platform, exploring the data and creating visualizations, and then using advanced machine learning services to analyze the data and detect anomalies in the historical data.
So, what is Anomaly Detection?
Anomaly detection is a technique used to identify unusual patterns, called outliers, that do not conform to expected behavior. It has many applications in business, from intrusion detection (identifying strange patterns in network traffic that could signal a hack) to system health monitoring (spotting a malignant tumor in an MRI scan), and from fraud detection in credit card transactions to fault detection in operating environments.
In our day-to-day life, knowingly or unknowingly, we carry an IoT device: our mobile phone, whose built-in sensors provide accelerometer and gyroscope data. How about saving this sensor data somewhere and detecting anomalies in it?
That sounds like a cool idea. How can we achieve this? Do I need to code an app and ask users to download it from the store? Not required. A simple Node.js application running in a mobile browser will provide us with the sensor data.
This tutorial uses the following IBM Cloud products:
Here’s the flow or architecture diagram,
Here’s where IBM Data Science Experience comes in handy. You will use the Jupyter Notebook that is available in the IBM Data Science Experience service to load your historical data and detect anomalies using the z-score. You will start by creating a new project and then import the Jupyter notebook (.ipynb) from a URL.
Anomaly detection will be performed using the z-score. A z-score is a standard score that indicates how many standard deviations an element is from the mean. A z-score can be calculated from the following formula:
z = (X - µ) / σ where z is the z-score, X is the value of the element, µ is the population mean, and σ is the standard deviation.
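As a small worked example of the formula above, the following sketch computes z-scores for a short list of numbers using only the Python 3 standard library. The helper name `z_scores` and the sample values are made up for illustration and are not taken from the tutorial notebook.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value: (x - mean) / population std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Illustrative values only: the mean is 5.0 and the standard deviation is 2.0.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scores = z_scores(readings)
print(scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Here the value 9.0 sits two standard deviations above the mean, so its z-score is 2.0.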
Create a new project
- Go to the IBM Cloud Catalog and select Data Science Experience.
- Create the service and launch its dashboard by clicking Get Started.
- Create a New Project and enter Detect Anomaly as the Name.
- Create and select Object Storage and Spark services. Refresh
Connect to the Cloudant DB for data
- Click on Assets > + Add to Project > Connection
- Select the iot-db Cloudant DB where the device data is stored.
- Check the Credentials then click Create
Create a Jupyter (.ipynb) notebook
- Click New notebook > From URL.
- Enter Anomaly-detection-sample for the Name.
- Enter https://raw.githubusercontent.com/IBM-Cloud/iot-device-phone-simulator/master/anomaly-detection/Anomaly-detection-DSX.ipynb in the URL.
Check that the notebook is created with metadata and code.
The recommended version for this notebook is Python 2 with Spark 2.1. To update it, select Kernel > Change kernel. To trust the notebook, select File > Trust Notebook.
Run the notebook and detect anomalies
- Select the cell that starts with !pip install --upgrade pixiedust, and then click Run or press Ctrl + Enter to execute the code.
- When the installation is complete, restart the Spark kernel by clicking the Restart Kernel icon.
- In the next code cell, import your Cloudant credentials by completing the following steps:
- Select the Connections tab.
- Click Insert to code. A dictionary called credentials_1 is created with your Cloudant credentials. If the dictionary has a different name, rename it to credentials_1: that name is used in the remaining cells and is required for the notebook code to run.
In the cell with the database name (dbName), enter the name of the Cloudant database that is the source of data, for example, iotp_yourWatsonIoTPorgId_DBName_Year-month-day. To visualize data of different devices, change the values of
You can find the exact database name by navigating to the iot-db Cloudant DB instance you created earlier > Launch Dashboard.
Save the notebook and execute each code cell one after another, or run them all (Cell > Run All). By the end of the notebook, you should see anomalies for the device movement data (oa, ob, and og).
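To make the idea of flagging anomalies in the movement data more concrete, here is a minimal sketch that applies a z-score cutoff to each of the oa, ob, and og fields. It is not the notebook's code: the function name `find_anomalies`, the threshold of 3.0, and the sample records are all assumptions chosen for illustration, and it uses the Python 3 standard library rather than Spark or Pandas.

```python
from statistics import mean, pstdev

SENSOR_FIELDS = ("oa", "ob", "og")  # movement fields named in the tutorial
THRESHOLD = 3.0  # assumed cutoff; the notebook may use a different value


def find_anomalies(records, fields=SENSOR_FIELDS, threshold=THRESHOLD):
    """Return (record index, field, z-score) for readings with |z| > threshold."""
    anomalies = []
    for field in fields:
        values = [record[field] for record in records]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant signal has no outliers
        for index, value in enumerate(values):
            z = (value - mu) / sigma
            if abs(z) > threshold:
                anomalies.append((index, field, round(z, 2)))
    return anomalies


# Made-up sample: 30 steady readings with one spike injected into "oa".
records = [
    {"oa": 10.0 + (i % 3), "ob": 1.0, "og": -5.0 + (i % 2)}
    for i in range(30)
]
records[12]["oa"] = 90.0

flagged = find_anomalies(records)
print(flagged)  # only the injected spike at index 12 in "oa" is reported
```

A fixed cutoff is the simplest choice. With very few readings, no single value can reach a large z-score, so the threshold only becomes meaningful once enough data has been collected.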
You can change the time interval of interest to the desired time of day. Look for
- Along with anomaly detection, the key findings or takeaways from this section are:
- Usage of Spark to prepare the data for visualization.
- Usage of Pandas for data visualization.
- Bar charts and histograms for device data.
- Correlation between two sensors through a correlation matrix.
- A box plot for each device sensor, produced with the Pandas plot function.
- Density plots through kernel density estimation (KDE).
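The correlation matrix and box plot takeaways can be sketched with a few Pandas calls. This is an illustrative example under stated assumptions, not the notebook's code: the DataFrame below holds made-up values for oa, ob, and og, whereas the notebook builds its data from the Cloudant database.

```python
import pandas as pd

# Made-up sensor values; "ob" is deliberately a linear function of "oa".
df = pd.DataFrame({
    "oa": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ob": [3.0, 5.0, 7.0, 9.0, 11.0],
    "og": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Pairwise Pearson correlation between the sensor columns.
corr = df.corr()
print(corr)

# The quartiles that a box plot draws for each sensor column.
quartiles = df.quantile([0.25, 0.5, 0.75])
print(quartiles)
```

With matplotlib installed, `df.plot.box()` draws the box plots and `df.plot.kde()` draws the density plots; the KDE plot also needs SciPy.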
via IBM Cloud Blog https://ibm.co/2pQcNaA
February 22, 2018 at 11:45AM