Comparison of BPA and LMA Methods for Takagi-Sugeno Type MIMO Neuro-Fuzzy Network to Forecast Electrical Load Time Series

This paper describes an accelerated backpropagation algorithm (BPA) that can be used to train a Takagi-Sugeno (TS) type multi-input multi-output (MIMO) neuro-fuzzy network efficiently. An accelerated Levenberg-Marquardt algorithm (LMA) is also described and compared with BPA. The training algorithms are efficient in the sense that they bring the performance index of the network, such as the sum squared error (SSE), mean squared error (MSE), or root mean squared error (RMSE), down to the desired error goal much faster than the simple BPA or LMA. Finally, the training algorithms are tested on a neuro-fuzzy modeling and forecasting application for an electrical load time series.


INTRODUCTION
This paper proposes two extended training algorithms for a multi-input multi-output (MIMO) Takagi-Sugeno type neuro-fuzzy (NF) network, and a neuro-fuzzy approach to modeling and forecasting an electrical load time series. The neuro-fuzzy approach attempts to exploit the merits of both neural network and fuzzy logic based modeling techniques. Fuzzy models are based on fuzzy if-then rules and are, to a certain degree, transparent to interpretation and analysis, whereas neural network based models have the ability to learn from data. Here, the TS type MIMO neuro-fuzzy network is constructed as a multilayer feedforward network representation of the fuzzy logic system, as described in Section 2, and its training algorithms are described in Section 3. Simulation experiments and results are shown in Section 4, and brief concluding remarks are presented in Section 5.

NEURO-FUZZY SYSTEMS SELECTION FOR FORECASTING
The common approach to numerical data driven neuro-fuzzy modeling and identification is to use a Takagi-Sugeno (TS) type fuzzy model along with continuously differentiable membership functions, such as the Gaussian function, differentiable operators for constructing the fuzzy inference mechanism, and defuzzification of the output data using the weighted average defuzzifier. The corresponding output inference can then be represented in a multilayer feedforward network structure. In principle, the neuro-fuzzy network architecture used here is identical to the multi-input single-output architecture of ANFIS [1], which is shown in Figure 1a. Jang introduced a two-input, one-output architecture with a Takagi-Sugeno type fuzzy model with two rules. The ANFIS neuro-fuzzy model incorporates a five-layer network to implement a Takagi-Sugeno type fuzzy system. Figure 1a shows the ANFIS model, which can process a large number of fuzzy rules. It uses the least-squares method to determine the linear consequents of the Takagi-Sugeno rules. As a continuation of the ANFIS structure, a feedforward multi-input multi-output network was proposed by Palit and Babuška [2] and Palit and Popovic [3], as shown in Figure 1b. The product nodes (×) represent the antecedent conjunction operator, and the output of each such node is the degree of fulfillment, or firing strength, of the corresponding rule. The division sign (/), together with the summation nodes (+), produces the normalized degree of fulfillment ($z^l / b$) of the corresponding rule, which, after multiplication with the corresponding TS rule consequent ($y_j^l$), is used as input to the last summation node (+) to give the defuzzified output value, which, being crisp, is directly compatible with the actual data.

Note: Discussion is expected before December 1st, 2007, and will be published in the "Jurnal Teknik Elektro" volume 8, number 1, March 2008.

Forecasting of time series is based on numerical input-output data. To apply this to NF networks, a TS type model with linear rule consequents (which can also be a singleton model as a special case) is selected [2]. In this model, the number of membership functions ($M$) implemented for the fuzzy partitioning of each input is equal to the number of a priori selected rules. In Section 3, accelerated BPA and LMA are applied to speed up the convergence of the training and to avoid other inconveniences.
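The input-output data used for time series forecasting can be built with a sliding window over the series. The following is only a minimal sketch: the function name `make_io_pairs`, the window lengths, and the placeholder series are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def make_io_pairs(series, n_inputs, n_outputs):
    """Slice a 1-D series into (past window, future window) training pairs."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_inputs - n_outputs + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested windows")
    # Each row of X holds n_inputs consecutive past values.
    X = np.stack([series[k:k + n_inputs] for k in range(n_samples)])
    # Each row of Y holds the n_outputs values that follow that window.
    Y = np.stack([series[k + n_inputs:k + n_inputs + n_outputs]
                  for k in range(n_samples)])
    return X, Y

# Placeholder values only, not real electrical load data.
load = np.arange(10.0)
X, Y = make_io_pairs(load, n_inputs=4, n_outputs=3)
print(X.shape, Y.shape)
```

Each row of `X` would feed the n network inputs and each row of `Y` would serve as the target for the m network outputs.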

Neuro implementation of Fuzzy Logic System
The fuzzy logic system (FLS) shown in Figure 1b can easily be reduced to a MISO NF network by setting the number of outputs m = 1, which makes the structure similar to that of Figure 1a. A Takagi-Sugeno (TS) type fuzzy model with Gaussian membership functions (GMFs), the product inference rule, and a weighted average defuzzifier can be defined as in equations (1)-(4) (see [4]).
The corresponding $l$-th rule from the above fuzzy logic system (FLS) can be written as a fuzzy if-then rule with a linear consequent, where $x_i$, with $i = 1, 2, \ldots, n$, are the $n$ system inputs. The parameters of the Gaussian membership functions, together with the parameters from the rule consequents, are the equivalent adjustable parameters of the network. If all these parameters of the NF network are properly selected, then the FLS can correctly approximate any nonlinear system based on the given data.
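The forward computation of such a network (Gaussian membership, product inference, normalization, linear consequents, weighted average) can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the Gaussian form exp(-((x - c)/sigma)^2), the array layout, and the names `c`, `sigma`, `theta0`, and `theta` are all assumed here.

```python
import numpy as np

def ts_mimo_forward(x, c, sigma, theta0, theta):
    """Forward pass of a TS-type MIMO neuro-fuzzy network for one sample.

    x      : (n,)      input vector
    c      : (M, n)    Gaussian centres, one row per rule (assumed layout)
    sigma  : (M, n)    Gaussian widths, one row per rule (assumed layout)
    theta0 : (M, m)    constant terms of the rule consequents
    theta  : (M, n, m) linear coefficients of the rule consequents
    """
    mu = np.exp(-((x - c) / sigma) ** 2)            # membership degrees, (M, n)
    z = mu.prod(axis=1)                             # firing strength per rule, (M,)
    b = z.sum()                                     # sum of firing strengths
    h = z / b                                       # normalized degree of fulfillment
    y = theta0 + np.einsum("i,lij->lj", x, theta)   # rule consequents, (M, m)
    f = h @ y                                       # weighted-average outputs, (m,)
    return f, h

# Two rules, two inputs, three outputs; constant consequents for easy checking.
c = np.array([[0.0, 0.0], [2.0, 2.0]])
sigma = np.ones((2, 2))
theta0 = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
theta = np.zeros((2, 2, 3))
f, h = ts_mimo_forward(np.array([1.0, 1.0]), c, sigma, theta0, theta)
print(f, h)
```

With the input placed midway between the two rule centres, both rules fire equally, so each output is the average of the two rule consequents. Setting m = 1 gives the MISO case.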

TRAINING ALGORITHM FOR FEEDFORWARD NEURAL NETWORK
The MIMO feedforward NF network represented in Figure 1b can be trained using any suitable training algorithm. Two standard training algorithms are the backpropagation algorithm (BPA) and the Levenberg-Marquardt algorithm (LMA). BPA, the simplest algorithm for neural network (NN) training, is a supervised learning technique for training artificial neural networks. It was first described by Paul Werbos in 1974, and further developed by David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams in 1986. It is a learning rule for multi-layered neural networks and is most useful for feedforward networks (networks without feedback, that is, with no connections that loop). The term is an abbreviation of "backwards propagation of errors". BPA is used to calculate the gradient of the network error with respect to the network's modifiable weights. This gradient is almost always used in a simple stochastic gradient descent algorithm to find weights that minimize the error.
In forecasting applications, pure BPA converges slowly in comparison with second-order training methods [2]. This is why the algorithm needs to be improved using momentum and a modified error index. Alternatively, a more efficient training algorithm, such as the Levenberg-Marquardt algorithm (LMA), can also be used for training MIMO feedforward NN systems [3].

BPA and Accelerated BPA
The network is trained so that the performance function, the sum squared error (SSE), is minimized.
The question is how BPA adjusts the free parameters of the network. The steepest-descent rule for training a feedforward neural network is based on the recursive expressions (9-12), in which SSE is the performance function at the $k$-th iteration step and the free parameters are updated at the next iteration step; their starting values are chosen randomly.
The corresponding gradients are obtained by the chain rule (equations 21-24). Substituting equations (21-24) into (9-12) gives the update rules for the free parameters, in which the term $h^l$ is the normalized degree of fulfillment of the $l$-th rule.
To improve the training performance of the feedforward NN, a modified error index should be added to the modified BPA with momentum, as proposed by Xiaosong et al. [5]. This approach can be seen in equations (36-41).
The constant γ must be chosen appropriately.
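The effect of a momentum term on steepest descent can be illustrated on a toy problem. This sketch assumes the common update form delta(k) = -eta * gradient + mo * delta(k-1); the paper's exact update equations and its modified error index term are not reproduced, and the names `train_with_momentum`, `eta`, and `mo`, as well as the quadratic test problem, are illustrative assumptions.

```python
import numpy as np

def train_with_momentum(grad_fn, w0, eta=0.1, mo=0.1, n_epochs=200):
    """Steepest descent with a momentum term (assumed standard form)."""
    w = np.asarray(w0, dtype=float).copy()
    delta_prev = np.zeros_like(w)
    for _ in range(n_epochs):
        # New step = plain gradient step plus a fraction of the previous step.
        delta = -eta * grad_fn(w) + mo * delta_prev
        w += delta
        delta_prev = delta
    return w

# Toy performance function 0.5 * ||A w - t||^2 with gradient A^T (A w - t).
A = np.array([[1.0, 0.0], [0.0, 2.0]])
t = np.array([1.0, 4.0])
w_fit = train_with_momentum(lambda w: A.T @ (A @ w - t), w0=[0.0, 0.0])
print(w_fit)
```

In the NF network, `grad_fn` would return the gradient of SSE with respect to the free parameters obtained from the chain rule.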

Accelerated LMA
To overcome the slow convergence of BPA in neuro-fuzzy network training, the Levenberg-Marquardt algorithm (LMA) was proposed and shown to be effective [4].

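If a function $V(w)$ is to be minimized with respect to the parameter vector $w$ using Newton's method, the update of the parameter vector $w$ is defined as:

$$\Delta w = -\left[\nabla^2 V(w)\right]^{-1} \nabla V(w) \qquad (42)$$

$$w(k+1) = w(k) + \Delta w \qquad (43)$$

where $\nabla^2 V(w)$ is the Hessian matrix and $\nabla V(w)$ is the gradient of $V(w)$. If $V(w)$ is taken to be the SSE function as follows:

$$V(w) = \frac{1}{2} \sum_{r=1}^{N} e_r^2(w)$$

then the gradient and the Hessian are generally defined as:

$$\nabla V(w) = J^T(w)\, e(w)$$

$$\nabla^2 V(w) = J^T(w)\, J(w) + \sum_{r=1}^{N} e_r(w)\, \nabla^2 e_r(w) \qquad (46)$$

where $J(w)$ is the Jacobian matrix of the error vector $e(w)$ with respect to $w$. For the Gauss-Newton method, the second term in (46) is assumed to be zero. Therefore, the update according to (42) will be:

$$\Delta w = -\left[J^T(w)\, J(w)\right]^{-1} J^T(w)\, e(w)$$

Now let us see the LMA modification of the Gauss-Newton method:

$$\Delta w = -\left[J^T(w)\, J(w) + \mu I\right]^{-1} J^T(w)\, e(w)$$

where $I$ is the identity matrix of the same size as $J^T(w)\, J(w)$, and the parameter $\mu$ is multiplied or divided by some factor whenever an iteration step increases or decreases the value of $V(w)$. Here, the update according to (43) is:

$$w(k+1) = w(k) - \left[J^T(w)\, J(w) + \mu I\right]^{-1} J^T(w)\, e(w)$$

It is important to note that for large $\mu$ the algorithm becomes the steepest descent algorithm with step size $1/\mu$, and for small $\mu$ it becomes the Gauss-Newton method.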
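The Levenberg-Marquardt iteration, including the rule of multiplying or dividing µ depending on whether a step increases or decreases the error, can be sketched on a small curve-fitting problem. This is a hedged illustration, not the paper's accelerated LMA: the function name `lm_fit`, the factor `beta = 10`, the starting value of `mu`, and the exponential test model are all assumptions.

```python
import numpy as np

def lm_fit(residual_fn, jac_fn, w0, mu=0.01, beta=10.0, n_iter=50):
    """Basic Levenberg-Marquardt loop with multiplicative mu adaptation."""
    w = np.asarray(w0, dtype=float).copy()
    e = residual_fn(w)
    sse = 0.5 * e @ e
    for _ in range(n_iter):
        J = jac_fn(w)
        # Damped Gauss-Newton step: -(J^T J + mu I)^-1 J^T e
        step = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
        e_new = residual_fn(w + step)
        sse_new = 0.5 * e_new @ e_new
        if sse_new < sse:
            # Error decreased: accept the step and move toward Gauss-Newton.
            w, e, sse = w + step, e_new, sse_new
            mu /= beta
        else:
            # Error increased: reject the step and move toward steepest descent.
            mu *= beta
    return w, sse

# Toy model a * exp(b * x) fitted to noise-free data generated with a=2, b=-1.5.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)
residual = lambda w: w[0] * np.exp(w[1] * x) - y
jacobian = lambda w: np.column_stack([np.exp(w[1] * x),
                                      w[0] * x * np.exp(w[1] * x)])
w_lm, sse_lm = lm_fit(residual, jacobian, w0=[1.0, 0.0])
print(w_lm, sse_lm)
```

For the NF network, the residual vector would hold the output errors over all training samples and the Jacobian would hold their derivatives with respect to the free parameters.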
For faster convergence, and also to avoid becoming trapped in local minima and to reduce oscillation during training [6], a small momentum term mo can be added, as in BPA (in practice, for electrical load forecasting, a momentum of around 5% to 10% gives better results), so that the final update (50) includes an additional momentum term. Furthermore, as in BPA, Xiaosong et al. [5] also proposed adding a modified error index (MEI) term in order to improve training convergence. Similar to equations (39-41), the corresponding gradient with MEI can be defined using a Jacobian matrix, as in equation (70). This can also be written in matrix form using the pseudo-inverse. The terms $E_{eqv}$ (the equivalent error vector), $D_{eqv}$ and $A$ are matrices of size (N×1), (M×N), …

From Figure 3a, it can be seen that LMA trains much faster than BPA. LMA needs only around 100 epochs to reach the performance goal, whereas BPA needs 1000 epochs. See Table 1 for a detailed comparison of the two methods.
In the results, SSE1, SSE2 and SSE3 denote the sum squared error at outputs 1, 2 and 3, respectively, of the Takagi-Sugeno type MIMO NF network, and SSE denotes the sum of the SSE values contributed by all outputs. MSE1, MSE2, MSE3, RMSE1, RMSE2 and RMSE3 are defined analogously.
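The per-output error indices can be computed as in the following sketch. It assumes that SSE is the plain sum of squared errors, MSE is SSE divided by the number of samples, and RMSE is the square root of MSE; the paper's definitions may include a constant factor such as 1/2. The function name `error_indices` and the sample values are illustrative only.

```python
import numpy as np

def error_indices(targets, outputs):
    """Per-output and total SSE, plus per-output MSE and RMSE."""
    err = np.asarray(targets, dtype=float) - np.asarray(outputs, dtype=float)
    n_samples = err.shape[0]
    sse = (err ** 2).sum(axis=0)     # SSE1, SSE2, ... one value per output
    mse = sse / n_samples            # MSE1, MSE2, ...
    rmse = np.sqrt(mse)              # RMSE1, RMSE2, ...
    return {"sse": sse, "sse_total": sse.sum(), "mse": mse, "rmse": rmse}

# Two samples and three outputs; placeholder numbers, not results from the paper.
targets = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
outputs = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
idx = error_indices(targets, outputs)
print(idx)
```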

CONCLUDING REMARKS
In this paper, neuro-fuzzy approaches with two types of training algorithm have been presented for short-term forecasting of electrical load. The performance results in Section 4 show that the NF network trained using LMA is very efficient in modeling and predicting the various nonlinear dynamics, compared with BPA. An efficient training algorithm based on a combination of LMA with the modified error index (MEI) extension and an adaptive version of the learning rate (momentum) has been developed to train the Takagi-Sugeno type multi-input multi-output (MIMO) and multi-input single-output (MISO) neuro-fuzzy networks, improving the training performance. Several issues need to be addressed in future work, mainly the transparency and interpretability of the generated fuzzy model (rules). For the latter issue, set-theoretic similarity measures [7] should be computed for each pair of fuzzy sets, and fuzzy sets that are highly similar should be merged into a single one.