MATRIX EQUATIONS IN DEEP LEARNING: RESOLUTION FOR M DATA WITH N PARAMETERS

This article on the vectorization of neural network learning equations aims to give the matrix equations [1-3]: first for the model Z [8, 9] of the perceptron [6], which combines the inputs X, the weights W and the bias; second for the quantification function [10, 11], called the loss function [6-8]; and finally for the gradient descent algorithm, which maximizes the likelihood and minimizes the errors of Z [4, 5]. These equations can be applied to the classification of emotions by facial recognition.


Introduction
The important equations are based on the neural network, more precisely on the perceptron. Our concern in this paper is to implement a generalized solution based on the model $Z = w_1 x_1 + w_2 x_2 + b$, the cost function $L = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$, and the gradient descent algorithm. These equations must be vectorized so that they cover not only 2 types of emotions with 2 parameters, but yield the matrix equations for 8 ($P$) emotions with $K$ parameters each. The article continues with a review of the literature, the methodology used and, finally, the presentation of the experimental results.

Machine learning equations

2.1. The model
The perceptron model is a basic unit of neural networks [2]. It is a binary classification model, capable of linearly separating 2 classes of points by a decision boundary, such as positive and negative emotions. This model can be expressed by the following equation [13]: $Z = w_1 x_1 + w_2 x_2 + b$, with $x_1, x_2$ the data associated with the weights $w_1, w_2$, and an additional coefficient $b$, called the bias. To improve the model, it is necessary to accompany it with a probability [14]: the further the face of an individual is from the decision boundary [15], the more probable it is that it belongs to a given class. The logistic activation function is a sigmoid function [16] of the form $a(Z) = \frac{1}{1 + e^{-Z}}$, which converts the output Z into a single probability following a Bernoulli distribution [17], that of a face belonging to a single class. Later, in CNN learning, we use the ReLU function, defined as [3]: $f(x) = \max(0, x)$; if $x < 0$, then $f(x) = 0$; if $x \ge 0$, then $f(x) = x$. ReLU increases the chances of the network converging and does not saturate the neurons, unlike the tanh and sigmoid functions. The objective is to adjust the parameters W and b to obtain an efficient model that makes the smallest errors between the outputs a(Z) and the real data Y [16, 18].
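As an illustration, the forward pass of this perceptron can be sketched in NumPy; the function names and the toy values below are our own assumptions, not code from the paper:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation a(Z) = 1 / (1 + e^(-Z)): maps Z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(x):
    # ReLU f(x) = max(0, x): returns 0 for x < 0 and x for x >= 0
    return np.maximum(0.0, x)

def perceptron(x1, x2, w1, w2, b):
    # Model Z = w1*x1 + w2*x2 + b, converted to a probability by the sigmoid
    z = w1 * x1 + w2 * x2 + b
    return sigmoid(z)

# Toy example: one face described by two features, with arbitrary weights and bias
print(perceptron(0.5, -1.2, w1=0.8, w2=0.3, b=0.1))
```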

2.2. The cost function [18]
A function that quantifies the errors made by the model in a classification: it measures the distances between the outputs $a(Z)$ and the data $Y$, with the aim of maximizing the likelihood $L$ by minimizing the function $-\log(L)$. This function is of the form:

$$L = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{(i)}\right) \right]$$

2.3. The Gradient Descent Algorithm [19]
Starting from our model $Z$ and the function $L$, we need an automatic mechanism that adjusts the parameters $W$ and $b$ of the function to minimize the errors of the model, that is, to minimize the cost function (Log Loss); hence the need to determine how this function varies with the different parameters. The gradient is simply the derivative of the cost function and is computed as follows [6, 16]:

$$\frac{\partial L}{\partial w_1} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right) x_1^{(i)}, \qquad \frac{\partial L}{\partial w_2} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right) x_2^{(i)}, \qquad \frac{\partial L}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right)$$
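As a reference point before vectorization, one update step of this rule can be sketched in plain Python; the learning rate `alpha`, the loop structure and the variable names are our illustration, not the paper's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(data, w1, w2, b, alpha=0.1):
    """One gradient descent step for the 2-parameter perceptron.
    data is a list of (x1, x2, y) tuples, with y in {0, 1}."""
    m = len(data)
    dw1 = dw2 = db = 0.0
    for x1, x2, y in data:
        a = sigmoid(w1 * x1 + w2 * x2 + b)   # model output a(Z)
        dw1 += (a - y) * x1                  # contribution to dL/dw1
        dw2 += (a - y) * x2                  # contribution to dL/dw2
        db  += (a - y)                       # contribution to dL/db
    # Average the gradients over the m examples and step against them
    return (w1 - alpha * dw1 / m,
            w2 - alpha * dw2 / m,
            b  - alpha * db  / m)
```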
2.4. The vectorization of equations [20, 21]
Machine learning by neural network requires certain elementary operations that we recover as matrix equations: addition and subtraction, transposition and multiplication [22]. We adopt the convention that $m$ is the number of data points and $n$ the number of variables in our DataSet (positive emotion and negative emotion).

The X to Z Matrix Transformation
With $X$ of shape $(m, 2)$, $W$ of shape $(2, 1)$ and $b$ of shape $(m, 1)$, rewriting in matrix form gives $Z = XW + b$. The bias can also be a single real number $b \in \mathbb{R}$: by the principle of broadcasting, it is added to every row of $XW$. In the same way, we transform the activation $a$ into a matrix.

Thus, we can replace each term by its matrix form and factor out $\sigma$: $A = \sigma(Z) = \sigma(XW + b)$, where $Z$ is a vector.
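A short NumPy sketch of this transformation; the shapes match the convention above, while the sample values are illustrative assumptions:

```python
import numpy as np

m = 4                               # number of data points
X = np.random.rand(m, 2)            # inputs, shape (m, 2)
W = np.random.randn(2, 1)           # weights, shape (2, 1)
b = 0.5                             # scalar bias, broadcast over all m rows

Z = X @ W + b                       # Z = XW + b, shape (m, 1)
A = 1.0 / (1.0 + np.exp(-Z))        # A = sigma(Z), elementwise sigmoid
print(Z.shape, A.shape)             # (4, 1) (4, 1)
```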

Vectorization of the Gradient Descent functions
The model has three parameters, namely $w_1$, $w_2$ and $b$, where $w_1, w_2 \in W$. In vectorized form, the gradients become:

$$\frac{\partial L}{\partial W} = \frac{1}{m} X^{T} (A - Y), \qquad \frac{\partial L}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right)$$
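A minimal sketch of the vectorized descent under these gradients; the learning rate `alpha`, the iteration count `n_iter` and the zero initialization are illustrative choices, not the paper's settings:

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def train(X, Y, alpha=0.1, n_iter=1000):
    """Vectorized gradient descent for the perceptron.
    X: (m, 2) inputs, Y: (m, 1) labels in {0, 1}."""
    m = X.shape[0]
    W = np.zeros((X.shape[1], 1))   # weights w1, w2 stacked in a column
    b = 0.0
    for _ in range(n_iter):
        A = sigmoid(X @ W + b)      # forward pass, shape (m, 1)
        dW = X.T @ (A - Y) / m      # dL/dW = (1/m) X^T (A - Y)
        db = np.sum(A - Y) / m      # dL/db = (1/m) sum(A - Y)
        W -= alpha * dW             # update weights
        b -= alpha * db             # update bias
    return W, b
```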

Vectorization on a DataSet of P data with K parameters
We then have: $X$ of shape $(m, K)$, $W$ of shape $(K, P)$ and $b$ of shape $(1, P)$, so that $Z = XW + b$ has shape $(m, P)$ and $A = \sigma(Z)$ gives, for each of the $m$ data points, one probability per emotion.
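A sketch of these generalized shapes, keeping the sigmoid activation used above (a softmax would be the usual multiclass alternative); all dimensions and the random data are illustrative:

```python
import numpy as np

m, K, P = 32, 10, 8                            # m faces, K parameters, P emotions
X = np.random.rand(m, K)                       # inputs, shape (m, K)
Y = np.eye(P)[np.random.randint(0, P, m)]      # one-hot labels, shape (m, P)
W = np.random.randn(K, P)                      # one weight column per emotion
b = np.zeros((1, P))                           # one bias per emotion, broadcast over m rows

Z = X @ W + b                                  # shape (m, P)
A = 1.0 / (1.0 + np.exp(-Z))                   # one probability per face and per emotion
dW = X.T @ (A - Y) / m                         # same gradient form as before, now (K, P)
db = np.sum(A - Y, axis=0, keepdims=True) / m  # shape (1, P)
print(Z.shape, A.shape, dW.shape, db.shape)
```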

Conclusion
Our approach has been to bring out the matrix equations of three essential points in automatic learning by neural network, namely the model Z with its inputs (X1…Xn) associated with the weights W and the bias b, and to generalize into matrix form the cost function and the gradient descent algorithm. This procedure, based on the vectorization of the equations and their generalization, can be applied to our dataset FER13 [23], dealing with facial emotional effects, which may in the long run be used in the classification of emotions by facial recognition.