“You essentially have software writing software,” says Jen-Hsun Huang, CEO of graphics processing leader Nvidia.
The focus of this article is getting started with building a model using Deep Learning algorithms. In this piece, I will not compare deep learning with the human brain or our intelligence (there are plenty of great theories and articles on those concepts); rather, I’ll touch upon how deep learning is helping us solve some of the major problems of Computer Science and early Artificial Intelligence. We will also touch upon the advantages and applications of deep learning, followed by answers to some questions we had while creating a deep learning model. Along the way, there are references and pointers to the tips and tricks, presentations, and articles of top Deep Learning practitioners. There will be several sequels on this topic to go deeper into Deep Learning. For instance, part 2 of this series will focus on creating a Deep Learning model using H2O.ai’s Deep Learning package.
To begin with, I am not a Deep Learning champion. Rather, I’m a learner of this technique, sharing my thoughts and knowledge through this article to encourage others to learn it. If you find any omissions or errors, or have points you would like covered, please drop a line in the comments section. It will help me learn and will help other readers too.
As a Data Science professional, what drew me to Deep Learning is its expanding use in solving a range of hard problems. Deep Learning is now a trend, and its applications are immense. It is one of the fastest-growing and most exciting branches of Machine Learning. What makes it so interesting is its ability to learn hidden features from the data, which lets a machine learn a task without being explicitly programmed (as in rule-based systems) and without being supplied handcrafted features (which other Machine Learning algorithms rely on to improve their learning capability).
With the arrival of Deep Learning, those who benefit the most are researchers and practitioners in the fields of image, speech, and video recognition, who are seeing some actionable results. It has helped AI get closer to its original goal of becoming the brains of the robot. It also has a role to play in the growing field of the Internet of Things (IoT). Ajit Jaokar and I wrote a previous article where we described how Deep Learning plays a role in IoT and why H2O’s deep learning algorithm is suitable for it. For enterprises, deep learning is already playing an important role in streamlining customer service and assisting in the automation of many human-intensive tasks. You might have already encountered bots powered by deep learning trying to answer your product queries or helping you order your favorite pizza. We are already enjoying speaking to Apple’s Siri and Microsoft’s Cortana, where Deep Learning is a crucial component. Deep learning is also unleashing improvements in the field of medicine and health care. Don’t be surprised if computers start diagnosing your diseases by reading X-rays and MRI scans. If this inspires you, then take a look at the 13 companies that use deep learning to produce actionable results.
The most remarkable thing about deep learning is that we don’t program it to perform any of the tasks described above. Rather, we feed the deep learning algorithm large amounts of data, such as images or speech, to train it, and the algorithm figures out for itself how to recognize the desired targets. The ability of Deep Learning methods to learn complex nonlinear relations by churning through large amounts of data, creating features by themselves, makes them stand out from traditional Machine Learning techniques.
To know how a standard Deep Learning algorithm works, we have to look at its predecessor, the neural network. Some practitioners also refer to Deep Learning as Deep Neural Networks. In short, a neural network is made up of three kinds of layers: an input layer, one or more hidden layers, and an output layer, as discussed below.
- The input layer consists of neurons that accept the input values. The output from these neurons is the same as the input predictors.
- The output layer is the final layer of a neural network that returns the result back to the user environment. Based on the design of a neural network, it also signals the previous layers on how they have performed in learning the information, so that they can improve their functions accordingly.
- Hidden layers sit between the input and output layers. Typically, the number of hidden layers ranges from one to many. They are the central computation layers, holding the functions that map the input to the output of a node.
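The three kinds of layers described above can be sketched as a tiny forward pass in plain Python. This is only an illustration under stated assumptions: the sigmoid activation, the random weight initialization, and the helper names (`dense`, `init_layer`, `forward`) are choices made for this example, not something prescribed by any particular package.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (an assumed, simplistic initialization)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Pass the input through every hidden layer and the output layer in turn."""
    activations = x  # the input layer hands the predictors through unchanged
    for weights, biases in layers:
        activations = dense(activations, weights, biases, sigmoid)
    return activations


rng = random.Random(0)
# 3 input neurons -> 4 hidden neurons -> 2 output neurons
layers = [init_layer(3, 4, rng), init_layer(4, 2, rng)]
output = forward([0.5, -1.2, 3.0], layers)
print(output)
```

Training, where the output layer signals the earlier layers how well they did, would adjust the weights after each pass; that step is left out here to keep the sketch short.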
I have explained a bit more about neural networks in my upcoming book, Data Science using Oracle Data Miner and Oracle R Enterprise, published by Apress. To understand neural networks better, you can refer to some of the books below:
What Led From Neural Networks to Deep Learning?
- The introduction of a ‘Deep’ architecture that supports multiple hidden layers. This creates multiple levels of representation, that is, a learned hierarchy of features, which was absent in early neural networks.
- Improvements and changes to support a variety of architectures (DBN, RBM, CNN, and RNN) to suit different kinds of problems.
- Optimized algorithms that can handle computation on large-scale data.
- The introduction of optimization and regularization parameters, such as dropout, to reduce overfitting to the training data.
- Availability of Deep Learning packages in open source and widely used programming languages, which brings innovations into the field.
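To make the dropout idea from the list above concrete, here is a minimal sketch in plain Python. It uses the common “inverted dropout” variant, which is an assumption for this example; the function name `dropout` and its parameters are illustrative and do not come from any of the packages discussed in this article.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of activations during training.

    Surviving activations are scaled by 1 / (1 - rate) so that the expected
    value of each activation stays the same (the "inverted dropout" variant).
    At prediction time the activations pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
hidden = [0.2, 0.9, 0.4, 0.7, 0.1, 0.6]
print(dropout(hidden, rate=0.5, rng=rng))                   # some values zeroed, the rest doubled
print(dropout(hidden, rate=0.5, rng=rng, training=False))   # unchanged at prediction time
```

Because different neurons are silenced on every pass, the network cannot lean too heavily on any single one, which is what helps against overfitting.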
Simply adding more hidden layers doesn’t make a network more productive. So, how can you make your Deep Learning more productive?
- More data and even more data: The more data you supply to train the Deep Learning algorithm, the better it becomes. The data should also be a good mixture of positive and negative cases, so that the model acquires the knowledge it needs to distinguish different cases correctly. The growth of Deep Learning and its accuracy is linked with the rise of Big Data. Digitalization and Big Data processing frameworks have given deep learning all the data it could wish for.
- The art of tuning the model: Deep learning models expose many hyperparameters, and knowing what they do is necessary to tune a model well. Tuning optimizes the Deep Learning algorithm’s performance on a data set and helps it learn accurately. In our next article, we will discuss these parameters when we run through creating a deep learning model using H2O.
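One simple way to approach tuning is an exhaustive grid search: try every combination of candidate hyperparameter values and keep the best-scoring one. The sketch below is a generic illustration, not H2O’s API. The hyperparameter names in `grid` are hypothetical, and `toy_score` is a placeholder standing in for a real training run scored on validation data.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Evaluate every combination in `grid` and return the best one.

    `grid` maps a hyperparameter name to its candidate values; `score_fn`
    takes a dict of chosen values and returns a score (higher is better).
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical hyperparameters, chosen only for illustration.
grid = {
    "hidden_layers": [1, 2, 3],
    "neurons_per_layer": [8, 16, 32],
    "dropout_rate": [0.0, 0.2, 0.5],
}


def toy_score(params):
    """Placeholder score; a real one would train a model and validate it."""
    return -(
        abs(params["hidden_layers"] - 2)
        + abs(params["neurons_per_layer"] - 16) / 8
        + abs(params["dropout_rate"] - 0.2)
    )


best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The number of combinations grows quickly with every hyperparameter added (27 here), so in practice the grid is kept small or sampled at random.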
This brings us to a problem: sophisticated deep neural networks require a massive amount of computing power for training. This obstacle is now manageable thanks to the arrival of high-powered GPUs at a reasonable price from players like Nvidia and Intel. A deep learning model can be trained much faster on a GPU than on a CPU. The Nvidia blog notes that in a benchmark study, with GPU acceleration, neural net training is 10-20 times faster than with CPUs. That means training time is reduced from days or even weeks to just hours.
R users can create Deep Learning models using the packages below. The package names and some of the descriptions are taken directly from CRAN. We will discuss and create Deep Learning models using H2O and R in the next article.
- h2o: R Interface for H2O. This offers R scripting functionality for H2O, the open source math engine for big data that computes parallel distributed machine learning algorithms, such as generalized linear models, gradient boosting machines, random forests, and neural networks (Deep Learning) within various cluster environments.
- mxnet: brings flexible and efficient GPU computing and state-of-art deep learning to R.
- deepnet: Deep Learning toolkit in R.
- neuralnet: Training of neural networks. Train neural networks using back propagation, resilient backpropagation with (Riedmiller, 1994) or without weight backtracking (Riedmiller and Braun, 1993) or the modified globally convergent version by Anastasiadis et al. (2005). The package allows adjustable settings through a custom choice of error and activation function. Furthermore, the calculation of generalized weights (Intrator O & Intrator N, 1993) is achieved.
- rnn: Recurrent Neural Network. Implementation of a Recurrent Neural Network in R.
- darch: Package for Deep Architectures and Restricted Boltzmann Machines. The darch package is built on the basis of the code from G. E. Hinton and R. R. Salakhutdinov (available under Matlab Code for deep belief nets). This method includes pre-training with the contrastive divergence method published by G. E. Hinton and fine-tuning with commonly known training algorithms like backpropagation or conjugate gradients. Additionally, supervised fine-tuning can be improved with maxout and dropout, two recently developed techniques to improve fine-tuning for deep learning.
- autoencoder: Sparse autoencoder for automatic learning of representative features from unlabeled data.
Download your favorite Deep Learning package and start coding. To help you get started, below are some possible answers to a few of the fundamental questions that come to mind when we start creating a deep learning model.
- How many neurons should we use in the input layer? The number of inputs or features.
- How many hidden layers should we use? To figure out how many hidden layers to use, you have to rely on standard machine learning cross validation.
- How many neurons should we use in each hidden layer? There is no miraculous technique for selecting the optimum number of hidden neurons. The most commonly relied-upon rule of thumb is that the optimal size of the hidden layer usually lies between the size of the input layer and the size of the output layer. A rough approximation can also be taken from the geometric pyramid rule proposed by Masters: for a three-layer network with n input and m output neurons, the hidden layer would have sqrt(n∗m) neurons.
- Can a Deep Learning network work without hidden layers? Yes, it can, but please don’t call it a “deep” learning network. It is shallow, and so it can work well only on linearly separable data.
- How many output layer neurons should we have? The number of target classes we hold.
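The sizing rules of thumb above can be put into a few lines of Python. This is a rough sketch for a three-layer network only; the function name `suggest_layer_sizes` is made up for this example, and the result is a starting point to validate with cross validation, not a guaranteed optimum.

```python
import math


def suggest_layer_sizes(n_features, n_classes):
    """Starting-point layer sizes for a three-layer network.

    Input neurons  = number of features.
    Output neurons = number of target classes.
    Hidden neurons = sqrt(n * m), the geometric pyramid rule, rounded to
    the nearest whole neuron (and never fewer than one).
    """
    hidden = max(1, round(math.sqrt(n_features * n_classes)))
    return {"input": n_features, "hidden": hidden, "output": n_classes}


sizes = suggest_layer_sizes(64, 4)
print(sizes)  # {'input': 64, 'hidden': 16, 'output': 4}
```

For 64 features and 4 classes, sqrt(64∗4) = 16, which also falls between the input and output sizes, as the first rule of thumb suggests.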
Here are a few useful resources that will come in handy while creating a Deep Learning model:
- Top 13 Deep Learning Tips and Tricks by Arno Candel, Chief Architect, H2O.ai
- Deeplearning4j’s Beginners Guide to Deep Learning.
In part 2 of this series, we will work through a hands-on exercise where we create a deep learning model using the aforementioned h2o package.