13.1 Introduction to Artificial Neural Networks
Artificial Neural Networks (ANNs) are models used for classification and for predicting continuous numeric values. Although they are sometimes considered "black boxes" because their inner workings are difficult to interpret, they are also known to be very accurate.
It is called a neural network because of its similarities to biological activity in the brain, where neurons are connected by networks of nerve fibers called axons. Axons carry electrical impulses between neurons. The more a specific pathway is used through deliberate attention and practice, the more it is reinforced, so that it conducts synaptic activity faster and more easily. In this way, we learn from experience, and complex sets of inputs can be translated into appropriate actions and decisions.
Neural networks are commonly used in a wide variety of applications, such as currency market trading, bankruptcy prediction, stock picking and commodity trading, fraud detection in credit card and monetary transactions, and customer relationship management (CRM). The main strength of ANNs is their high predictive performance, which comes from a structure that can capture complex relationships between predictors and responses, something many other machine learning algorithms struggle to do.
The main concept behind ANNs is to combine the input variables in a flexible way that captures the relationships among those variables and between them and the response variable. Contrast this with a linear regression model, where the form of the relationship between the response and the predictors is specified by the user.
In many cases the true form of the relationship is too complicated to be captured accurately by simpler algorithms. For example, with multiple linear regression (MLR) we can include different input variables, but the specified form of the relationship remains linear. ANNs, however, can recognize more complex interactions between variables and can model non-linear relationships between the input and output variables.
Structure of Neural Networks
Layers, Nodes, and Linkages
An ANN consists of a series of networked linkages. There is an input layer consisting of nodes that accept the input values of the variables. Successive layers are called hidden layers; nodes in the hidden layer(s) receive inputs from the previous layer. The outputs of the nodes in each layer serve as inputs to the nodes in the next layer. The last layer is called the output layer.
An ANN uses a number of interconnected computational elements referred to as nodes or neurons. In diagrams, nodes are represented by circles, and connections from one node to another are represented by arrows. The values on the connecting arrows are called weights; like regression coefficients, the weights are subject to iterative adjustment.
The basic idea behind these components is that the network combines the input information in a complex model whose coefficients are tweaked in an iterative process. If increasing a weight results in better model performance, the weight is increased; if increasing it results in poorer performance, the weight is reduced. The network's performance on classification and numeric prediction informs successive tweaks.
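To make the tweak-and-evaluate idea concrete, here is a minimal Python sketch that adjusts a single weight by trial perturbation, keeping each change only if it reduces the prediction error. This is a deliberately simplified illustration with made-up data and step size; real neural networks adjust all weights simultaneously using gradient-based methods such as backpropagation.

# Toy illustration: tweak a weight, keep the change if error improves.
# (Hypothetical one-input, one-weight "network"; real ANNs use backpropagation.)

xs = [1.0, 2.0, 3.0, 4.0]      # made-up input values
ys = [2.1, 3.9, 6.2, 7.8]      # made-up targets (roughly y = 2x)

def error(w):
    # Sum of squared errors for predictions w * x
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

w, step = 0.5, 0.1             # arbitrary starting weight and step size
for _ in range(100):
    for delta in (step, -step):
        if error(w + delta) < error(w):
            w += delta         # keep the tweak that improves performance
print(round(w, 2))             # settles near 2.0 on this data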
The figure below shows a diagram of this architecture with one input layer, two hidden layers, and one node in the output layer depicting the numeric target value to be predicted.
The neural network below has only one hidden layer. For simplicity, the arrowheads on the lines are omitted. There are two output nodes in this example: one represents one possible value of a binary outcome (e.g., TRUE), and the other represents the other possible value (FALSE).
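The short Python sketch below shows how values flow through a network like the one just described: an input layer, one hidden layer, and two output nodes. The layer sizes, weights, and bias values are invented for illustration; a trained network would have learned these from data.

import math

def logistic(n):
    # Logistic (sigmoid) transfer function, bounded between 0 and 1
    return 1.0 / (1.0 + math.exp(-n))

def layer(inputs, weights, biases):
    # Each node: weighted sum of its inputs plus a bias, passed through the transfer function
    return [logistic(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

inputs = [0.5, 0.9]                                  # two input nodes (made-up values)
hidden_w = [[0.8, -0.2], [0.4, 0.6], [-0.5, 0.3]]    # three hidden nodes (made-up weights)
hidden_b = [0.1, -0.1, 0.0]
output_w = [[1.2, -0.7, 0.5], [-1.2, 0.7, -0.5]]     # two output nodes (TRUE, FALSE)
output_b = [0.0, 0.0]

hidden = layer(inputs, hidden_w, hidden_b)
outputs = layer(hidden, output_w, output_b)
print(outputs)   # two values between 0 and 1, one per possible outcome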
Neurons
The basic unit in a neural network is the neuron, sometimes called a node. A neuron receives input from other nodes and computes an output.
An elementary neuron with R inputs is shown below. Each input is weighted with an appropriate weight w. The sum of the weighted inputs and the bias forms the input n to the transfer function f. Neurons may use a variety of transfer functions to generate their output.
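In symbols, writing the inputs as x1 through xR, the neuron computes n = w1*x1 + w2*x2 + ... + wR*xR + b and outputs a = f(n). The minimal Python sketch below computes this for a single neuron; the inputs, weights, and bias are arbitrary values chosen for the example, and the transfer function is passed in as a parameter.

import math

def neuron(inputs, weights, bias, f):
    # n is the weighted sum of the inputs plus the bias; the output is a = f(n)
    n = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(n)

# Example with R = 3 inputs and a tanh transfer function (all values made up)
a = neuron([0.2, -0.4, 0.7], [0.5, 0.1, -0.3], 0.05, math.tanh)
print(a)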
Transfer Functions
The function f is also known as the transfer function or activation function. Transfer functions operate in the hidden layer(s) of the neural network and at the output layer. A transfer function can be chosen to produce either a linear or a nonlinear output. Common transfer functions you may encounter are linear, logistic, and tanh functions.
The choice of transfer function for the output layer depends on the type of output desired. A linear output function makes sense when the expected output is essentially linear, such as when predicting continuous numeric outcomes, as in multiple linear regression (MLR). A logistic transfer function is useful when the outcome needs to be a probability, which falls within the range of zero to one.
Linear activation functions produce straight lines and place no bound on how large or small the output value can be.
Below is an image of the logistic function. Regardless of the value of x, the output is bounded between zero and one. While the logistic function is useful as an output function, it is rarely used as a transfer function in the hidden layer because it can cause neural networks to get stuck during training.
The tanh activation function has several properties that make it a better choice for a transfer function in the hidden layer. It does not tend to get stuck during training. It also translates negative inputs into negative outputs and inputs near zero into outputs near zero.
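The three transfer functions mentioned above are easy to define directly. The Python sketch below implements them and evaluates each at a few points so the differences in range and behavior are visible; the sample inputs are arbitrary.

import math

def linear(n):
    # Unbounded: the output can be any size
    return n

def logistic(n):
    # Bounded between 0 and 1; useful when the output is a probability
    return 1.0 / (1.0 + math.exp(-n))

def tanh(n):
    # Bounded between -1 and 1; negative inputs map to negative outputs,
    # and inputs near zero map to outputs near zero
    return math.tanh(n)

for n in (-3.0, -0.1, 0.0, 0.1, 3.0):    # arbitrary sample inputs
    print(n, linear(n), round(logistic(n), 3), round(tanh(n), 3))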
Types of Effects
Simple, additive effects
When we worked with multiple linear regression (MLR) models in this class, we learned that they are simple, additive models. They are additive in the sense that, for example, a car's price could equal the sum of the effect of the car's age on price and the effect of its mileage on price. The effect of age on price is independent of the effect of mileage on price.
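As a concrete sketch of additivity, the hypothetical model below prices a used car as an intercept plus an age effect plus a mileage effect. The coefficients are invented for illustration; the point is that each input contributes its effect independently of the other.

def price(age_years, mileage_thousands):
    # Hypothetical additive model: each effect is independent of the other
    base = 25000.0                  # made-up intercept
    age_effect = -1500.0 * age_years
    mileage_effect = -80.0 * mileage_thousands
    return base + age_effect + mileage_effect

# The age effect is the same -1500 per year regardless of mileage:
print(price(5, 60) - price(4, 60))   # -1500.0
print(price(5, 20) - price(4, 20))   # -1500.0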
Interaction effects
Sometimes the effect of one input variable on the outcome depends on the value of another input variable. When this happens, it is called an interaction: the outcome depends on the interaction between multiple variables.
Consider the example of baking batches of cookies. Both the oven temperature and the length of time in the oven influence the yield of usable cookies. When the oven temperature is low and the cookies are left in for a short time, some of the cookies will not be fully cooked. When the oven is low and the cookies are left in for a long time, the cookies can fully cook, so many of them will be usable. When the oven is hot and the cookies are left in for a short time, they will be fully cooked but not overcooked, so many will be usable. Finally, if the oven is hot and the cookies are left in for a long time, many of the cookies will be overcooked and therefore not usable. The table below shows this example for the cookies.
The table above shows the interaction in tabular form. The plot below, called an interaction plot, shows the same interaction in a chart. Does adding heat improve yield? It depends on whether the time is short or long. Does adding time improve yield? It depends on whether the heat is low or high.
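A small Python sketch makes the interaction concrete. Coding heat and time each as 0 (low/short) or 1 (high/long), a hypothetical model with an interaction term reproduces the cookie pattern: either factor alone raises yield, but both together lower it. The coefficients and yield values are invented to match the pattern described above.

def yield_pct(heat, time):
    # Hypothetical model with an interaction term (made-up coefficients).
    # heat: 0 = low, 1 = high; time: 0 = short, 1 = long.
    return 30 + 50 * heat + 50 * time - 90 * heat * time

for heat in (0, 1):
    for time in (0, 1):
        print(f"heat={heat} time={time} yield={yield_pct(heat, time)}")

# Pattern: low/short = 30, low/long = 80, high/short = 80, high/long = 40.
# The effect of adding time is +50 when heat is low but -40 when heat is high:
# the effect of one variable depends on the value of the other.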
An advantage of neural networks over simpler models like MLR and CART is that neural networks can automatically include the effects of interactions among variables. When such interactions exist, the network can detect and include them to produce more accurate models. When no interaction effects exist, the neural network will model the simpler additive effects.