Neural Networks
In order to get into the workings behind a neural network, let's build up from the concepts of logistic regression.
Remember this flow chart from logistic regression?
Logistic regression can be visualized as a very basic neural network:
Here, a single neuron computes $ Z = w^TX+b $ and then applies a non-linear activation function $ A = g(Z)$ (where $g(z) = \sigma(z)$ in this case).
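To make that concrete, here is a minimal NumPy sketch of that single neuron. The numbers for `x`, `w` and `b` are made up purely for illustration; they are not from any trained model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for one example with 3 features.
x = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1)
w = np.array([[0.1], [-0.2], [0.3]])  # shape (3, 1)
b = 0.5

z = (w.T @ x).item() + b   # linear step: z = w^T x + b
a = sigmoid(z)             # activation step: a = sigma(z), a value between 0 and 1

print(z, a)
```

The neuron is just these two steps: a weighted sum, followed by a squashing function.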
A neural network is basically a logistic regression on steroids!
This is what a typical neural network looks like :
It contains three kinds of layers: the input layer, the hidden layer(s), and the output layer.
In a neural network, the input layer contains the values x, and the output layer contains the predicted value of y.
The layers in between are called "hidden layers" because the true values of these nodes are not observed.
Each neuron in a neural network does the exact same thing here as well (calculates $ Z = w^TX+b $ and then applies a non-linear activation function $ A = g(Z)$)
If I mark everything in a very simple neural net, this is what it would look like :
This here is a 2-layer neural network. (We don't count the input layer, so the $1^{st}$ layer is $A^{[1]}$ and the $2^{nd}$ layer is $A^{[2]}$.)
(The one above it is a 3-layer one.)
Here we've written terms like $w^{[1]}$ and $a^{[1]}$.
But here, [1] isn't referring to the $1^{st}$ example, it refers to the $1^{st}$ layer of neurons, so $w^{[1]}$ refers to the weights associated with the $1^{st}$ layer of neurons.
By this same convention, we sometimes refer to the input layer as $A^{[0]}$.
Also!
Important part!
$w^{[l]}_{jk}$ refers to the weight from the $k^{th}$ neuron in the $(l-1)^{th}$ layer to the $j^{th}$ neuron in the $l^{th}$ layer.
(Remember this, it's important.)
We see here that an example x is a (3x1) matrix like $$ x= \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = a^{[0]}$$
$a^{[1]}$ and $b^{[1]}$ will be 3 dimensional since we have 3 nodes.
Therefore,
$$a^{[1]} = \begin{bmatrix}a^{[1]}_1 \\ a^{[1]}_2 \\ a^{[1]}_3 \end{bmatrix}, \quad b^{[1]} = \begin{bmatrix}b^{[1]}_1 \\ b^{[1]}_2 \\ b^{[1]}_3 \end{bmatrix}$$
and $W^{[1]}$ will be a 3x3 matrix:
$$W^{[1]} = \begin{bmatrix} w^{[1]}_{11} & w^{[1]}_{12} & w^{[1]}_{13} \\ w^{[1]}_{21} & w^{[1]}_{22} & w^{[1]}_{23} \\ w^{[1]}_{31} & w^{[1]}_{32} & w^{[1]}_{33} \end{bmatrix}$$
And $a^{[2]}$ would be a number (1x1 matrix) since that gives the prediction.
$b^{[2]}$ would also be a number (1x1 matrix)
And $W^{[2]}$ will be a 1x3 matrix: $$W^{[2]} = \begin{bmatrix} w^{[2]}_{11} & w^{[2]}_{12} & w^{[2]}_{13} \end{bmatrix}$$
To help remember these, notice that the nodes of a layer are stacked vertically, so the expressions corresponding to those nodes are also stacked vertically in that layer.
Therefore, no. of rows of $A^{[l]}$/$b^{[l]}$ = no. of nodes in layer $l$
Also, the matrix of weights $W^{[l]}$ will have the dimension ($n^{[l]}, n^{[l-1]}$), where $n^{[l]}$ is the no. of neurons in layer $l$ and $n^{[l-1]}$ is the no. of neurons in layer $l-1$.
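A quick way to check these shape rules is to build the parameters for the 3-3-1 network above and print their shapes. This is only a sketch: the helper name `init_params`, the dictionary keys, and the small random initial values are my own choices for illustration.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Build W[l] with shape (n[l], n[l-1]) and b[l] with shape (n[l], 1).

    layer_sizes lists the number of nodes per layer, input layer first.
    (Hypothetical helper, used here only to illustrate the shapes.)
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_sizes)):
        n_l, n_prev = layer_sizes[l], layer_sizes[l - 1]
        params[f"W{l}"] = rng.standard_normal((n_l, n_prev)) * 0.01
        params[f"b{l}"] = np.zeros((n_l, 1))
    return params

# 3 inputs, 3 hidden nodes, 1 output node.
params = init_params([3, 3, 1])
for name, value in params.items():
    print(name, value.shape)
# W1 (3, 3)
# b1 (3, 1)
# W2 (1, 3)
# b2 (1, 1)
```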
Let's see what each node of the $1^{st}$ layer does:
$$z^{[1]}_1 = w^{[1]T}_1x + b^{[1]}_1, \quad a^{[1]}_1 = \sigma (z^{[1]}_1)$$
$$z^{[1]}_2 = w^{[1]T}_2x + b^{[1]}_2, \quad a^{[1]}_2 = \sigma (z^{[1]}_2)$$
$$z^{[1]}_3 = w^{[1]T}_3x + b^{[1]}_3, \quad a^{[1]}_3 = \sigma (z^{[1]}_3)$$
We can see that stacking the row vectors $w^{[1]T}_j$ gives us the matrix:
$$W^{[1]} = \begin{bmatrix}-&w^{[1]T}_1 &- \\-&w^{[1]T}_2 &-\\-&w^{[1]T}_3 &- \end{bmatrix}$$
(You might be tempted to call it $W^{[1]T}$ but that just messes up the whole notation and the whole dimension uniformity and what not)
To get the expression for $z^{[1]}$, we can do a matrix multiplication like this:
$$W^{[1]}x = \begin{bmatrix}-&w^{[1]T}_1 &- \\-&w^{[1]T}_2 &-\\-&w^{[1]T}_3 &- \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\x_3 \end{bmatrix} = \begin{bmatrix}w^{[1]T}_1x \\ w^{[1]T}_2x \\ w^{[1]T}_3x \end{bmatrix}$$
and now, adding the vector $b^{[1]} = \begin{bmatrix}b^{[1]}_1 \\ b^{[1]}_2 \\ b^{[1]}_3 \end{bmatrix}$, we get our desired equations:
$$W^{[1]}x + b^{[1]} = \begin{bmatrix}w^{[1]T}_1x + b^{[1]}_1 \\ w^{[1]T}_2x + b^{[1]}_2 \\ w^{[1]T}_3x + b^{[1]}_3 \end{bmatrix} = \begin{bmatrix}z^{[1]}_1 \\ z^{[1]}_2 \\ z^{[1]}_3 \end{bmatrix} = z^{[1]}$$
And then $a^{[1]}$ is simply $\sigma (z^{[1]})$
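If you want to convince yourself that the matrix form really is the same as the three per-neuron equations, a small NumPy check like the one below can help. The weights, biases and input are random placeholders, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 3))   # row j holds w_j^T for neuron j
b1 = rng.standard_normal((3, 1))
x = rng.standard_normal((3, 1))

# One neuron at a time: z_j = w_j^T x + b_j
z_rows = np.array([[W1[j] @ x[:, 0] + b1[j, 0]] for j in range(3)])

# All three neurons at once: z = W1 x + b1
z_matrix = W1 @ x + b1

print(z_rows.shape, z_matrix.shape)    # (3, 1) (3, 1)
print(np.allclose(z_rows, z_matrix))   # True
```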
Therefore, a forward pass is something like this :
$z^{[1]} = W^{[1]}a^{[0]} + b^{[1]}$ (because input x, is referred to as $a^{[0]}$)
$a^{[1]} = \sigma (z^{[1]})$
$z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma (z^{[2]})$
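These four lines translate almost one-to-one into NumPy. The sketch below assumes the 3-3-1 network from above; the function name `forward_single` and the all-zero parameters are only there for illustration (with zero weights and biases every $z$ is 0, so every $\sigma(z)$ is 0.5, which makes the result easy to verify by hand).

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_single(x, W1, b1, W2, b2):
    """Forward pass for one example x of shape (3, 1)."""
    z1 = W1 @ x + b1      # (3, 3) @ (3, 1) + (3, 1) -> (3, 1)
    a1 = sigmoid(z1)      # (3, 1)
    z2 = W2 @ a1 + b2     # (1, 3) @ (3, 1) + (1, 1) -> (1, 1)
    a2 = sigmoid(z2)      # (1, 1), the prediction
    return a2

# Hypothetical parameters: all zeros, chosen only so the answer is obvious.
W1, b1 = np.zeros((3, 3)), np.zeros((3, 1))
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))
x = np.array([[1.0], [2.0], [3.0]])

y_hat = forward_single(x, W1, b1, W2, b2)
print(y_hat)  # [[0.5]]
```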
We can extend the same set of equations to vectorize the feed forward over all $m$ training examples:
$Z^{[1]} = W^{[1]}A^{[0]} + b^{[1]}$ dimensions : (3xm) = (3x3)(3xm) + (3x1)
$A^{[1]} = \sigma (Z^{[1]})$ dimensions : (3xm) = $\sigma$(3xm)
$Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}$ dimensions : (1xm) = (1x3)(3xm) + (1x1)
$A^{[2]} = \sigma (Z^{[2]})$ dimensions : (1xm) = $\sigma$(1xm)
(The bias vectors $b^{[1]}$ and $b^{[2]}$ are added to every column, i.e. to every example.)
Here, the different examples are just stacked horizontally (as they were in logistic regression)
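Here is a rough NumPy sketch of the vectorized version, again for the 3-3-1 network. The random parameters and inputs are placeholders, and `forward` is just a name I picked. It also checks that pushing all $m$ examples through at once gives the same answer as pushing them through one column at a time.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(A0, W1, b1, W2, b2):
    """Forward pass for A0 of shape (3, m); each column is one example."""
    Z1 = W1 @ A0 + b1     # (3, 3) @ (3, m) -> (3, m); b1 (3, 1) broadcasts over columns
    A1 = sigmoid(Z1)      # (3, m)
    Z2 = W2 @ A1 + b2     # (1, 3) @ (3, m) -> (1, m); b2 (1, 1) broadcasts over columns
    A2 = sigmoid(Z2)      # (1, m), one prediction per example
    return A2

rng = np.random.default_rng(0)
m = 5
W1, b1 = rng.standard_normal((3, 3)), rng.standard_normal((3, 1))
W2, b2 = rng.standard_normal((1, 3)), rng.standard_normal((1, 1))
X = rng.standard_normal((3, m))   # m examples stacked horizontally

A2 = forward(X, W1, b1, W2, b2)

# Same computation, one example (column) at a time.
A2_loop = np.hstack([forward(X[:, [i]], W1, b1, W2, b2) for i in range(m)])

print(A2.shape, np.allclose(A2, A2_loop))  # (1, 5) True
```

Note that NumPy's broadcasting is what lets us add a (3x1) bias to a (3xm) matrix without copying it $m$ times.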
Even in deep neural networks, you will notice that the only thing that changes is the number of layers; essentially the same set of calculations is done there as well.
That's all for feed forward in a simple neural network.
In the following articles, we'll see how trying out different non-linear activation functions can improve the network, and after that we'll discuss how to train this neural network with a more efficient activation function and backpropagation.
Hope you understood this article.
Until next one.
Ciao
Satwik