Datasets Deep Learning Client App Tutorial
Video Tutorial
The purpose of this sample project is to show you how to use datasets and data science to predict whether a customer loan will be approved or denied.
The video tutorial below summarizes the steps you need to take to build the deep-learning model using the datasets.
In the next sections you will find the detailed transcript of the video and the code lines.
Two copies of datasets from FusionCreator will be used: Customer Loans and Demographics. You download the copies from your Azure Blob Storage.
Get it from GitHub
If you have a GitHub account, here’s the link to the sample app repository: https://github.com/FusionFabric/ffdc-sample-deeplearning
Clone it and follow the instructions from the README.md file.
Prerequisites
To start building this client application:
You must register an application on FusionCreator that includes the two previously mentioned datasets from the Dataset Catalog. After registration, download the datasets from Azure. See step 6 from the Application Wizard and Datasets for more information.
You need a Python installation on your machine. Because of the dependency on the keras library, it must be a compatible Python version, such as 2.7 or 3.6, or a previous version. See the library repository Readme.md for details.
Install Jupyter Notebook, which allows you to run the deep-learning program found in the GitHub repository:
pip install notebook
Install the Python libraries that you will use for this sample application:
Create a Python model file and name it datasets-neural-network.py. You will add all the required code for your client application to this file.
Copy the following code to your model file to import the libraries that handle and manipulate the datasets.
datasets-neural-network.py
Load Data
In this section you load the Customer Loans and Demographics datasets you downloaded from your Azure Data Share. Create a folder named Data in your working directory. Copy the datasets to this folder.
Load the customer_loans.csv dataset to your model and see its top 5 entries:
datasets-neural-network.py
Load the customer_demographics.csv dataset to your model and see its top 5 entries:
datasets-neural-network.py
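The loading steps above can be sketched as follows, assuming pandas is installed. The tutorial reads the files from the Data folder; to stay self-contained, this sketch reads a small in-memory CSV instead. The column names and values are hypothetical placeholders, not the real dataset schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for Data/customer_loans.csv; the real file has its own columns.
sample_csv = io.StringIO(
    "CustomerId,LoanAmount,LoanStatus\n"
    "1,12000,Approved\n"
    "2,5000,Denied\n"
    "3,30000,Approved\n"
    "4,7500,Denied\n"
    "5,15000,Approved\n"
    "6,9000,Approved\n"
)

# With the real data you would pass the file path instead, for example
# pd.read_csv("Data/customer_loans.csv")
loans = pd.read_csv(sample_csv)

# head() returns the first 5 rows by default
top5 = loans.head()
print(top5)
```

The same pattern would apply to customer_demographics.csv.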
Examine Missing Data
The next code lines check if there are any columns with null values in the datasets. A zero value means that there are no null values in that column.
datasets-neural-network.py
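As an illustration of the null check described above, here is a minimal sketch using pandas. The DataFrame is a hypothetical sample with placeholder column names (only LoanStatus is named in this tutorial); one value is deliberately missing so that the count is not zero everywhere.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names other than LoanStatus are placeholders.
loans = pd.DataFrame(
    {
        "LoanAmount": [12000.0, np.nan, 30000.0, 7500.0],
        "InterestRate": [3.5, 4.1, 2.9, 5.0],
        "LoanStatus": ["Approved", "Denied", "Approved", "Denied"],
    }
)

# Count null values per column; 0 means the column has no nulls
null_counts = loans.isnull().sum()
print(null_counts)
```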
Examine Outliers
In this section you check if the loaded data has any outliers on the numerical columns beyond three standard deviations.
datasets-neural-network.py
datasets-neural-network.py
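One possible way to express the three-standard-deviation check is a z-score filter, sketched below with pandas. This is an assumption about the approach, not necessarily the code used in the sample repository, and the LoanAmount column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical numerical column: 19 ordinary values and one extreme value.
loans = pd.DataFrame({"LoanAmount": list(range(90, 109)) + [1000]})

# Restrict the check to numerical columns
numeric = loans.select_dtypes(include="number")

# Distance of each value from its column mean, in standard deviations
z_scores = (numeric - numeric.mean()) / numeric.std()

# Rows where any numerical column lies beyond three standard deviations
outliers = loans[(z_scores.abs() > 3).any(axis=1)]
print(outliers)
```

An empty result would indicate that no numerical value lies beyond three standard deviations from its column mean.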
If the check finds no outliers, the numerical columns need no outlier cleaning.
Descriptive Statistics
In this section you see the descriptive statistics for the numerical columns of the data loaded into your model, such as count, mean, standard deviation, minimum value, maximum value and the 25%, 50% and 75% quantiles.
datasets-neural-network.py
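The statistics listed above match what the pandas describe() method reports. A minimal sketch on a hypothetical sample (the LoanAmount column and its values are placeholders):

```python
import pandas as pd

# Hypothetical sample with one numerical and one text column
loans = pd.DataFrame(
    {
        "LoanAmount": [1000, 2000, 3000, 4000, 5000],
        "LoanStatus": ["Approved", "Denied", "Approved", "Denied", "Approved"],
    }
)

# describe() summarizes only the numerical columns by default:
# count, mean, std, min, 25%, 50%, 75%, max
stats = loans.describe()
print(stats)
```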
Data Histograms
Use the code lines below to plot a histogram with the distribution of the data:
datasets-neural-network.py
datasets-neural-network.py
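A histogram per numerical column can be produced with the pandas hist() method, which relies on matplotlib. The sketch below assumes both libraries are installed and uses hypothetical column names and values. The Agg backend is selected only so that the example runs without a display; in Jupyter Notebook you would normally omit that line.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook

import pandas as pd

# Hypothetical numerical columns
loans = pd.DataFrame(
    {
        "LoanAmount": [1000, 2500, 2700, 4000, 4200, 4300, 9000],
        "InterestRate": [2.9, 3.1, 3.5, 3.6, 4.1, 5.0, 5.2],
    }
)

# Draws one histogram per numerical column and returns the matplotlib Axes
axes = loans.hist(bins=5, figsize=(8, 3))
titles = {ax.get_title() for ax in axes.ravel()}
print(titles)
```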
Customer Location Heatmap
In this section you generate a heatmap that shows the location of the customers, based on the latitude and longitude columns from the Demographics dataset. Red areas on the heatmap indicate a higher density of customers.
datasets-neural-network.py
Join Datasets Together
Until now, you analyzed the datasets independently. Using the following code lines, you join the Customer Loans and Demographics datasets together and view the top 5 entries from the joined dataset.
datasets-neural-network.py
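A join of this kind can be sketched with the pandas merge() method. The key column CustomerId and the other column names below are hypothetical; the real datasets may share a differently named key.

```python
import pandas as pd

# Hypothetical extracts of the two datasets, sharing a CustomerId key
loans = pd.DataFrame(
    {
        "CustomerId": [1, 2, 3],
        "LoanAmount": [12000, 5000, 30000],
        "LoanStatus": ["Approved", "Denied", "Approved"],
    }
)
demographics = pd.DataFrame(
    {
        "CustomerId": [1, 2, 3, 4],
        "Age": [34, 51, 28, 45],
    }
)

# Inner join keeps only the customers present in both datasets
joined = loans.merge(demographics, on="CustomerId", how="inner")
print(joined.head())
```

Customer 4 has no loan record in this sample, so the inner join drops that row.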
Create Features
This section demonstrates how to create a base feature set for your deep learning model. You use some of the columns from the previously joined data.
You build a neural network classifier which predicts “Approved” or “Denied” based on the input features.
The target variable is the column LoanStatus, which takes the values “Approved” or “Denied”.
datasets-neural-network.py
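As a rough sketch of this step, the code below selects a few feature columns and encodes LoanStatus as a binary target. The feature column names are hypothetical, and mapping “Approved” to 1 and “Denied” to 0 is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical joined data; only LoanStatus is named in this tutorial
joined = pd.DataFrame(
    {
        "LoanAmount": [12000, 5000, 30000, 7500],
        "Age": [34, 51, 28, 45],
        "Income": [52000, 31000, 88000, 40000],
        "LoanStatus": ["Approved", "Denied", "Approved", "Denied"],
    }
)

# Hypothetical base feature set
feature_columns = ["LoanAmount", "Age", "Income"]
X = joined[feature_columns].to_numpy(dtype=float)

# Binary target: 1 for "Approved", 0 for "Denied" (assumed encoding)
y = (joined["LoanStatus"] == "Approved").astype(int).to_numpy()
print(X.shape, y)
```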
Scale Data
Scale your data using MinMaxScaler from the sklearn library. By default, MinMaxScaler rescales each feature to the range 0 to 1.
datasets-neural-network.py
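A minimal sketch of the scaling step, assuming scikit-learn is installed. The feature matrix is a hypothetical example with two columns on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical features: a loan amount and an age per row
X = np.array(
    [
        [1000.0, 20.0],
        [2000.0, 40.0],
        [3000.0, 60.0],
    ]
)

# Learn each column's minimum and maximum, then map the values to [0, 1]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

After scaling, each column runs from 0 to 1, so no single feature dominates because of its units.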
Neural Network Model Creation
Start building your neural network model by defining two functions.
- create_model - builds a feed forward neural network for binary classification
datasets-neural-network.py
- train_and_evaluate_model - trains the network and returns the validation accuracy for the K-Fold cross validation
datasets-neural-network.py
If you run this model on a macOS machine, replace
with
K-Fold Cross Validation
In this section you set up K-Fold Cross Validation, which trains and tests the model so that every data point is used for both training and testing across the folds. You use 3 different splits for training the model. After the run, the closer the Estimated Accuracy is to 1, the better the model.
datasets-neural-network.py
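The 3-split procedure can be sketched with the scikit-learn KFold class. To keep the example lightweight, it uses scikit-learn's MLPClassifier as a stand-in for the Keras model built by create_model; that substitution, the synthetic data, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the scaled features and the binary target
rng = np.random.default_rng(0)
X = rng.random((60, 3))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# 3 splits: each data point lands in the test fold exactly once
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kfold.split(X):
    # Stand-in feed-forward network; the tutorial uses a Keras model here
    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    # score() returns the accuracy on the held-out fold
    scores.append(model.score(X[test_idx], y[test_idx]))

estimated_accuracy = float(np.mean(scores))
print(f"Estimated accuracy: {estimated_accuracy:.2f}")
```

Averaging the per-fold accuracies gives a single estimate that depends less on one particular train and test split.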
Model Creation After K-Fold Cross Validation
If K-Fold Cross Validation accuracy is high, you can use your model for prediction.
Retrain your model with more train and test data, to fit the final model.
datasets-neural-network.py
Model Performance
The code lines below plot the loss and the accuracy for the train and test sets.
datasets-neural-network.py
datasets-neural-network.py
If you run this model on a macOS machine, replace:
with
and
with
Final Code Review
Here are the code files discussed on this page.
datasets-neural-network.py