
Radial Basis Function Networks (RBFN)

by

Ulf Ekblad

Essay for the course Artificial Neural Networks, 5A5435, 5 p.

Examiner: Thomas Lindblad Spring semester 2002


Contents

1. Introduction
2. Biological systems
2.1 Biological neurons
2.2 Biological model for artificial neuron networks
3. Artificial neural networks
3.1 Definitions
3.2 Perceptrons
4. Radial Basis Functions (RBF)
4.1 Introduction
4.2 Weight determination
4.3 Function generalisation and approximation
5. Radial Basis Function Networks (RBFN)
5.1 Introduction
5.2 Parameter determinations
5.3 Training
5.3.1 Model complexity
5.3.2 Supervised training
5.3.3 Two-stage training
5.3.4 Unsupervised training
5.3.5 Training methods comparisons
5.4 Regularisation
5.5 Pruning and growing
5.5.1 Basis function selection: the forward-selection method
5.5.2 Basis function elimination: the backward-elimination method
5.6 Scaling and the width parameter
5.7 Normalisation
6. Applications of RBFN
6.1 Introduction
6.2 Radar target recognition
6.2.1 Automatic target recognition
6.2.2 Target classification
6.3 Interference cancellation
6.3.1 Introduction
6.3.2 RBFNs used for simulations
6.3.3 Simulations
6.3.4 Network evaluations
6.4 Chaotic processes
6.4.1 Introduction
6.4.2 Simulations
6.4.3 Conclusions
6.5 Classification
6.5.1 Theory
6.5.2 Classification of images
6.5.3 Conclusions
6.6 English pronunciation learning
6.6.1 NETtalk
6.6.2 Simulations
7. Conclusion
8. Acronyms
9. Bibliography


1. Introduction

Radial basis function networks (RBFNs) are special cases of artificial neural networks. Biological systems are used as models for artificial neural networks in the sense that artificial neural networks simulate the internal structure of the brain with its neurons and the interconnections between them. Outside stimuli are the input to the system (i.e. the network). Within the network the signals propagate in ways that are controlled by weighting the connections between the nodes (i.e. the neurons). In artificial neural networks, the weights are adjusted by a process called training.1 The training procedure can either be automatic (which is called unsupervised training or training without a teacher) or require a response from a teacher (which is called supervised training or training with a teacher). RBFNs are special cases of artificial neural networks in the senses that each neuron consists of a radial basis function (RBF), that there is only one hidden layer of neurons, that there is only one output node, and that a two-stage training procedure can be used. Unfortunately, the terminology in the area of artificial neural networks is not standardised. It is, however, the hope of the author that the terminology used in this essay will be neither inconsistent nor confusing. We will here give a short description of the biological systems that are the models of artificial neural networks before giving a general but brief overview of artificial neural networks. After these fundamental background concepts, we describe radial basis functions and then turn to the main part of this essay, which is radial basis function networks. We start the description of RBFNs by presenting the theory and end it with some brief accounts of applications of RBFNs. The applications often consist of simulations of real applications. These simulations, which come from many different areas of application, then show to what extent RBFNs may be useful and whether they present any advantages over artificial neural networks that are not of the RBFN type.

1 G. Binning, M. Baatz, J. Klenk, and G. Schmidt, ”Will machines start to think like humans? Artificial versus natural Intelligence.”, Europhysics News, Vol. 33, No. 2, March/April 2002, p. 45.


2. Biological systems

2.1 Biological neurons

The idea behind artificial neural networks is the functioning of the brain. The human brain consists of neurons, elementary nerve cells, that communicate through a network of axons and synapses. The axons (as they were later to be called) were first described in 1718 by van Leeuwenhoek and, in 1824, the first observations of neurons were made by Dutrochet. The present-day image of the nerve cell is due to Deiters in 1865 and, in 1897, Sherrington discovered the synapses.2 There are about $10^{11}$ neurons in the human brain and $10^4$–$10^5$ synapses per neuron. The communication is performed by electrical impulses, which propagate at a speed of about 100 m/s. The total number of connections is estimated to be around $10^{15}$. The neuron system is very flexible, especially in childhood, but this flexibility seems to persist during the whole lifetime. For instance, in the cat brain, the number of synaptic contacts has been observed to increase from a few hundred to 12 000 between the 10th and the 35th day. However, the long-stated opinion that the number of neurons decreases after birth during the whole lifetime has recently been questioned.3 The neuron network gets its input from sensory receptors. These stimuli, in the form of electrical signals, can be either from external or internal receptors. The results of the brain's information processing are handled by effectors, resulting in human responses. The human information processing system can thus be said to consist of three stages, or layers: the receptors (the sensory input system), the neural network (the brain), and the effectors (the muscular output system or the motor organs). In the brain the information is processed, evaluated, and compared, through internal feedback, with stored information (the memory). The output to the effectors is controlled through external feedback by monitoring the motor organs.

2.2 Biological model for artificial neuron networks

The main element in the biological network is the neuron. The electrical impulses, or signals, between the neurons are carried through the axons. After the reception of incoming signals, the neuron can generate a pulse in response to them. The time between this firing and the reception of the signals is called the period of latent summation. In order for the neuron to fire, certain conditions have to be fulfilled. 2 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage ). 3 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage ).


The incoming signals can either be excitatory or inhibitory: excitatory signals are those signals that stimulate the neuron to fire; inhibitory signals hinder the neuron from firing. The excitation must exceed the inhibition by a certain amount, the threshold. In the model, the excitatory connections are assigned positive weight values and the inhibitory ones are assigned negative weight values. Hence we conclude that the neuron fires when the sum of the weights exceeds the threshold value during the period of latent summation. The neuron only receives signals from neighbouring neurons and from itself. The probability for a neuron to fire is high when the signals arrive closely spaced in time and synchronously. It has been discovered that the signals are binary and that there is a time interval, the refractory period, between any two signals passing the axon. This makes it suitable to discretise time. These time units are of the order of one millisecond. The refractory periods are not equal all over the brain, so we may say that the biological neuron network consists of a set of interconnected neurons communicating via discrete asynchronous signals.


3. Artificial neural networks

3.1 Definitions

The first definition of an artificial neuron was made in 1943 by McCulloch and Pitts. In 1949, the first representation of an ensemble of simultaneously working neurons as a model for the brain was put together by Donald Hebb. In 1958, Frank Rosenblatt introduced the concept of the perceptron. It was the first model in which a learning process could be applied.4 One of the many definitions of artificial neural systems, or neural networks, is that they are “physical cellular systems which can acquire, store, and utilize experiential knowledge.”5 Another is that they are “an interconnection of neurons such that neuron outputs are connected, through weights, to all other neurons including themselves”6. The artificial neurons are the basic processing elements of the network and can be considered as the nodes in the network. These neurons are organised in layers and operate in parallel. Feedback connections both within a layer and to adjacent layers are allowed. The strength of each of these connections is expressed by weights. One of the fundamental differences between ordinary computers and neural networks is that neural networks have to be taught, which is called training or learning. Instead of programming an algorithm to solve a problem, an architecture with initial weights is chosen. Then knowledge is acquired from various sets of input data. The learning can be either supervised or unsupervised (learning without supervision). Neural networks are used as classifiers and to perform function approximations. Detection of data clustering is an important usage of neural networks.

3.2 Perceptrons

The perceptron is an artificial neuron model with learning, invented by Frank Rosenblatt in 1958. It can be defined as a linear step-function which takes n integer values $x_1, x_2, \ldots, x_n$ as input and calculates an output o according to

$$ o = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} w_i x_i > \theta \\ 0, & \text{otherwise} \end{cases} \qquad (1) $$

4 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage ). 5 Jacek M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992, p. XV. 6 Jacek M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992, p. 37.


where $w_i$, i = 1, 2, …, n, are synaptic coefficients (the weights) and θ the threshold (or the bias).7 This one-neuron perceptron is not well adapted to many real problems, which often are not linear. By adding more neurons, the computational power increases. These so-called multi-layered perceptron (MLP) models have become the competitors of RBFNs. A multi-layered perceptron network (MLPN) is a network of hidden neurons and can be defined as an architecture fulfilling the following properties8:

• The neurons are distributed in several layers.
• The first layer is the input layer, corresponding to the input variables.
• The input to one layer (apart from the first layer) consists of the output from the neurons of the previous layer.

One disadvantage of MLPNs is that they converge slowly and are often trapped in local minima in the parameter space.9

7 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage ). 8 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage ). 9 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 166.
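As an illustration of Eq. (1), the following minimal Python (NumPy) sketch implements the step-function perceptron. The weights, threshold and inputs are illustrative values chosen here (they realise a logical AND) and are not taken from the essay.

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Step-function perceptron of Eq. (1): fires (1) if the weighted sum exceeds the threshold."""
    return 1 if np.dot(w, x) > theta else 0

# Illustrative values (not from the essay): a two-input perceptron realising a logical AND.
w = np.array([1.0, 1.0])
theta = 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, theta))
```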


4. Radial Basis Functions (RBF)

4.1 Introduction

Radial Basis Functions (RBFs) were first introduced in 1985 by M.J.D. Powell10 for use in multivariable interpolations and function approximations. It was, however, Broomhead and Lowe who first used them in neural networks. An RBF $\phi_c$ is a function with a symmetric output around a centre $\mu_c$, i.e.

$$ \phi_c(x) = \phi(\|x - \mu_c\|) \qquad (2) $$

where $\|\cdot\|$ denotes the vector norm. One symmetric function that is often used as an RBF is the Gaussian function

$$ \phi(r) = e^{-r^2/\sigma^2} \qquad (3) $$

with the Euclidean norm. As is well-known, Gaussian functions are characterised by a width or scale parameter $\sigma$. RBFs are also in many cases dependent on a scale parameter called the width of the function. A set of RBFs can be used to construct a basis of functions. Linear combinations of RBFs, as

$$ y(x) = \sum_{j=1}^{M} w_j\, \phi(\|x - \mu_j\|) \qquad (4) $$

where the $w_j$ are called weights, can be used for representing a wide class of functions. An RBFN is a feedforward network with three layers: the inputs, a hidden layer (also called a kernel layer), and the output node(s). In the RBFN, each hidden node is represented by an RBF, where each RBF, $\phi_j$, has a centre position, $\mu_j$, and a width, $\sigma_j$, associated with it.

10 M.J.D. Powell, ”Radial basis functions for multivariable interpolation: A review”, in IMA Conf. on Algorithms for the approximation of functions and data, pp. 143-167, 1985.
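The following short Python sketch illustrates Eqs. (3) and (4): a Gaussian RBF evaluated on Euclidean distances to a set of centres, combined linearly with weights. The centres, weights and width are arbitrary illustrative values, not taken from the essay.

```python
import numpy as np

def gaussian_rbf(r, sigma):
    """Eq. (3): phi(r) = exp(-r^2 / sigma^2)."""
    return np.exp(-(r ** 2) / sigma ** 2)

def rbf_expansion(x, centres, weights, sigma):
    """Eq. (4): y(x) = sum_j w_j phi(||x - mu_j||)."""
    r = np.linalg.norm(centres - x, axis=1)      # distances ||x - mu_j||
    return np.dot(weights, gaussian_rbf(r, sigma))

# Illustrative values (not from the essay): three centres in 2-D, a common width, arbitrary weights.
centres = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.5, -1.0, 2.0])
print(rbf_expansion(np.array([0.2, 0.3]), centres, weights, sigma=0.7))
```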


4.2 Weight determination

Suppose there is a one-dimensional continuous function

$$ h(x^n) = t^n, \qquad n = 1, 2, \ldots, N \qquad (5) $$

where the $x^n$ are d-dimensional vectors, i.e.

$$ \begin{aligned} x^1 &= (x^1_1, x^1_2, \ldots, x^1_d) \\ x^2 &= (x^2_1, x^2_2, \ldots, x^2_d) \\ &\;\;\vdots \\ x^N &= (x^N_1, x^N_2, \ldots, x^N_d) \end{aligned} \qquad (6) $$

then we can write

$$ \sum_{k=1}^{N} w_k\, \phi_{ik} = t_i \qquad (7) $$

where

$$ \phi_{ik} = \phi(\|x^i - x^k\|), \qquad i, k = 1, 2, \ldots, N \qquad (8) $$

or in matrix form

$$ \Phi W = T \qquad (9) $$

Solving for $W$, we get

$$ W = \Phi^{-1} T \qquad (10) $$

Generalisation to m-dimensional space gives

$$ h(x^n) = t^n, \qquad n = 1, 2, \ldots, N \qquad (11) $$

or in component form

$$ h_i(x^n) = t^n_i, \qquad n = 1, 2, \ldots, N; \; i = 1, 2, \ldots, m \qquad (12) $$

We can then write

$$ w_{ik} = \sum_{n=1}^{N} (\Phi^{-1})_{in}\, t^n_k \qquad (13) $$

These weights then give the function h through exact interpolation.
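A minimal sketch of this exact-interpolation procedure (Eqs. 7–10) in Python/NumPy, assuming a Gaussian basis function and a small illustrative data set; with one basis function per data point, solving $\Phi w = t$ reproduces every training target.

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    return np.exp(-(r ** 2) / sigma ** 2)

# Illustrative data (not from the essay): N points x^n in d = 2 dimensions with targets t^n.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # N x d
t = np.array([0.0, 1.0, 1.0, 0.0])                               # N targets

# Design matrix of Eq. (8): Phi_ik = phi(||x^i - x^k||), one basis function per data point.
Phi = gaussian_rbf(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))

# Eq. (10): W = Phi^{-1} T, giving exact interpolation of the N targets.
w = np.linalg.solve(Phi, t)

# The interpolant reproduces every training target (up to numerical precision).
print(np.allclose(Phi @ w, t))
```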

4.3 Function generalisation and approximation

The above procedure can be modified in order to give a radial function model for function generalisation and approximation. In practical applications, one is not generally interested in interpolating a function passing through every data point, but rather in obtaining a smooth fit. Also, this does not necessitate the same number of basis functions. We can write

$$ y_k(x) = \sum_{j=1}^{M} w_{kj}\, \phi_j(x) + w_{k0}, \qquad \text{with } M \le N \qquad (14) $$

where the $w_{k0}$ are the biases and M the number of basis functions (usually M < N). The purpose of adding bias parameters is to compensate for the difference between the average value over the data set of the basis function activations and the corresponding values of the targets. By adding an extra constant basis function

$$ \phi_0 = 1 \qquad (15) $$

we can write

$$ y_k(x) = \sum_{j=0}^{M} w_{kj}\, \phi_j(x), \qquad \text{with } M \le N \qquad (16) $$

which in matrix form is written as

$$ Y = W \Phi \qquad (17) $$

The existence of linear superpositions of Gaussian basis functions for function approximations was proven in 1990.11 The existence of other radial basis functions for function approximations has also been proven.

11 J. Kowalski, E. Hartman, and J. Keeler, ”Layered neural networks with gaussian hidden units as universal approximators”, Neural Computation, 2:210-215, 1990.
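The following sketch illustrates the approximation setting of Eqs. (14)–(17): fewer centres than data points, a constant bias column $\phi_0 = 1$, and a least-squares solution instead of exact interpolation. The data, centres and width are illustrative assumptions, not from the essay.

```python
import numpy as np

def gaussian_rbf(r, sigma=0.5):
    return np.exp(-(r ** 2) / sigma ** 2)

# Illustrative one-dimensional data (not from the essay): N noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# M < N centres, plus the constant basis function phi_0 = 1 of Eq. (15).
centres = np.linspace(0.0, 1.0, 8)
Phi = gaussian_rbf(np.abs(x[:, None] - centres[None, :]))   # N x M
Phi = np.hstack([np.ones((x.size, 1)), Phi])                # prepend the bias column

# Least-squares solution of Eq. (16)/(17): a smooth fit rather than exact interpolation.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print(w.shape)   # (M + 1,) weights, including the bias w_0
```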


5. Radial Basis Function Networks (RBFN)

5.1 Introduction

Neural networks are non-parametric models in the sense that their weights have no special meaning in relation to the problems to which they are applied. When training the network, the estimates obtained for the weights are not the primary goal. Radial basis function networks (RBFNs) may to some extent be characterised as linear networks, although, for obtaining good generalisation, non-linear optimisation has to be used.12 The non-linear models used in statistics can be applied to RBFNs. RBFNs may be said to be artificial neural networks with an input layer, one hidden layer, one output layer, and RBFs at the nodes (the hidden units), as is shown in Fig. 1. They are artificial neural networks of multilayer feedforward type. RBFNs converge fast to the optimum points and have very high fitting capabilities. These facts make RBFNs very suitable for chaos control.13

5.2 Parameter determinations

Designing an RBFN involves selecting a number of parameters, namely:

• The type of basis functions ($\phi$).
• The number of basis functions ($M$).
• The centres of the basis functions ($\mu_j$).
• The widths of the basis functions ($\sigma_j$).
• The weights of the basis functions ($w_j$).

The types of basis functions and the number of them are set from the beginning. All the others are determined by the training, and the determination is performed in such a way as to minimise a suitable cost function. Usually Gaussian or, more generally, other types of bell-shaped functions with compact support are chosen. The number M of basis functions depends on the model selection. The widths of the basis functions can be determined so that each basis function gets its own individual value $\sigma_j$ instead of a global value $\sigma$ common to all basis functions.

12 Mark J.L. Orr, Introduction to Radial Basis Function Networks, Technical Report, Centre for Cognitive Science, Univ. of Edinburgh, 1996. 13 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 166.

Figure 1: The general architecture of an RBFN.14

5.3 Training

5.3.1 Model complexity

Artificial neural networks, including RBFNs, do not have as their goal to find the best fit to the existing data. Instead, the purpose of training is to find a general approach, a model, of how to find the best fit for data outside the training set. This problem of model selection involves a trade-off between the two components of the generalisation error: bias and variance. Choosing a too simple model will result in a high bias, whereas choosing a too complex model may result in a high variance and may thus depend too heavily on the specific choice of training set. A too complex model may, however, have a low bias. Finding the best mix of bias and variance is a problem of finding the right number of free parameters. In the case of RBFNs, this means, among other things, determining the optimal number of hidden units. One way of doing this would be to train several networks, each with a different number of hidden units. 14 B. Walczak and D.L. Massart, ”Local modelling with radial basis function networks”, Chemometrics and Intelligent Laboratory Systems, 50, 2000, p. 181.



One advantage of RBFNs is that this brute-force method does not have to be used, due to the localised nature of RBFs and to two-stage training. There exist mainly two ways of doing this for RBFNs. The first is to choose a number of basis functions and then add a regularisation term to the cost function. This can be used in order to increase, e.g., smoothness. An alternative is to interrupt the training process prematurely, in order not to adjust the model too closely to the training data. The second method consists of either adding or deleting hidden units during the training process. This leads to so-called pruning or growing algorithms.

5.3.2 Supervised training

The minimisation of the cost function is performed by iteratively updating the parameters once per training sample. Choosing the cost function as

$$ E = \sum_{n} E^n \qquad (18) $$

with

$$ E^n = \frac{1}{2} \sum_{k} \left\{ t_k^n - y_k(x^n) \right\}^2 \qquad (19) $$

where $t_k^n$ is the target value of the output unit k resulting from the input vector $x^n$. The update equations can readily be obtained15 as

$$ \Delta w_{kj} = \eta_1 \left\{ t_k^n - y_k(x^n) \right\} \phi_j(x^n) \qquad (20) $$

$$ \Delta \mu_j = \eta_2\, \phi_j(x^n)\, \frac{x^n - \mu_j}{\sigma_j^2} \sum_{k} \left\{ t_k^n - y_k(x^n) \right\} w_{kj} \qquad (21) $$

$$ \Delta \sigma_j = \eta_3\, \phi_j(x^n)\, \frac{\|x^n - \mu_j\|^2}{\sigma_j^3} \sum_{k} \left\{ t_k^n - y_k(x^n) \right\} w_{kj} \qquad (22) $$

where $\eta_1$, $\eta_2$, and $\eta_3$ are the learning rates.

5.3.3 Two-stage training

All three sets of parameters can be updated simultaneously. For non-stationary environments and on-line settings, this procedure may be suitable. However, there are

15 S. Geman, E. Bienenstock, and R. Doursat, ”Neural networks and the bias/variance dilemma”, Neural Computation, 4 (1): 1 – 58, 1992.


situations when the parameter updating can be separated into a two-stage procedure. This occurs e.g. for static mappings. The two-stage procedure consists of:

• Determining the centres of the basis functions ($\mu_j$) and the widths of the basis functions ($\sigma_j$).
• Thereafter, with the use of the centres ($\mu_j$) and widths ($\sigma_j$) thus obtained, determining the weights of the basis functions ($w_j$).

The most used training algorithms are the K-means clustering method for determining the centre positions, the K-nearest-neighbour method for determining the widths, and the gradient-descent method for the weight determination.16 It turns out that this two-stage procedure results in little loss in quality compared to determining all three sets of parameters simultaneously. On the contrary, it can even give better solutions, especially for finite training data and computational resources. The first stage of learning is unsupervised in that it only uses the input values $\{x^n\}$. In the next stage supervised learning, i.e. training where target information is used, can be performed. With the basic parameters of the RBFs determined, we obtain a single-layer network (an RBFN). This network has linear output units.

5.3.4 Unsupervised training

The basic idea of the RBFN is to locate the centres at regions of high data concentration and to have the widths related to the spread of the data at the corresponding positions and to the distances to nearby data concentrations. This can be obtained by various methods:

• Selecting subsets randomly.
• Using clustering algorithms.
• Determining the widths.

Selecting subsets randomly

From the training data set, choose randomly a subset of data points. Then locate the RBF centres ($\mu_j$) there. This gives a starting point from which one can further tune the centre locations.

16 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 169.


Using clustering algorithms

There are various clustering techniques that can be used for locating the RBF centres ($\mu_j$). The K-means clustering algorithm is described here. Partition the N data points $x^n$ into K clusters, or subsets, $S_j$. Then minimise the sum-of-squares clustering function

$$ J = \sum_{j=1}^{K} \sum_{n \in S_j} \|x^n - \mu_j\|^2 \qquad (23) $$

where $\mu_j$, given by

$$ \mu_j = \frac{1}{N_j} \sum_{n \in S_j} x^n \qquad (24) $$

is the mean (centroid) of the $N_j$ data points in subset $S_j$.
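A plain K-means sketch in Python corresponding to Eqs. (23)–(24); the data and the number of clusters are illustrative assumptions, and no convergence test is included.

```python
import numpy as np

def kmeans_centres(X, K, n_iter=20, seed=0):
    """Plain K-means (Eqs. 23-24): alternate assignment to the nearest centre and centre re-estimation."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]        # initial centres drawn from the data
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2), axis=1)
        for j in range(K):
            if np.any(labels == j):
                mu[j] = X[labels == j].mean(axis=0)          # Eq. (24): mean of subset S_j
    return mu

# Illustrative data (not from the essay): two clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(kmeans_centres(X, K=2))
```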

Determining the widths

The widths of the RBFs can either be chosen equal for all of the units or different for each one. If a common, or global, width is chosen, it can be set as a multiple of the average distance between the basis centres. The choice determines the degree of smoothing, with a small width leading to less smooth functions. Different, or local, widths can be obtained by determining, for each unit, the average distance to its L nearest neighbours and then setting the width equal to a multiple of this mean. Usually a value of 1.5 to 2 is chosen for this multiple.

Matrix formulation

Minimising the cost function (Eq. 18), we obtain the weights as (cf. Eq. 10)

$$ W^{T} = \Phi^{\dagger} T \qquad (25) $$

where $(T)_{nk} = t_k^n$ and $(\Phi)_{nj} = \phi_j(x^n)$. $\Phi$ is the design matrix and

$$ \Phi^{\dagger} = (\Phi^{T} \Phi)^{-1} \Phi^{T} \qquad (26) $$

denotes the pseudo-inverse of $\Phi$, in which

$$ A = \Phi^{T} \Phi \qquad (27) $$

is called the variance matrix.
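A short sketch of the pseudo-inverse weight solution of Eqs. (25)–(26), using numpy.linalg.pinv; the data set, centres and width are illustrative assumptions, not from the essay.

```python
import numpy as np

def gaussian_rbf(r, sigma):
    return np.exp(-(r ** 2) / sigma ** 2)

def rbf_weights(X, t, centres, sigma):
    """Eqs. (25)-(26): w = pinv(Phi) t with design matrix Phi_nj = phi_j(x^n)."""
    Phi = gaussian_rbf(np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), sigma)
    return np.linalg.pinv(Phi) @ t       # pseudo-inverse (Phi^T Phi)^{-1} Phi^T applied to the targets

# Illustrative data (not from the essay).
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (100, 2))
t = np.sin(X[:, 0]) + np.cos(X[:, 1])
centres = X[rng.choice(100, size=10, replace=False)]
print(rbf_weights(X, t, centres, sigma=0.8).shape)   # (10,) weights, one per basis function
```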


5.3.5 Training methods comparisons

In unsupervised training the output data is neglected. This leads to a sub-optimisation of the RBFN parameters, the centres of the basis functions ($\mu_j$) and the widths of the basis functions ($\sigma_j$). Supervised training, where the output data is not neglected, may lead to more optimal values of the centres and the widths. However, supervised training has some disadvantages, e.g.:

• The non-linear optimisation techniques used in supervised training are often computationally expensive.
• Supervised training may result in non-localised RBFs.
• With supervised training, the advantage of the RBFN's two-stage learning possibility is lost.

5.4 Regularisation

Favouring certain solutions over others can be achieved by use of the regularisation technique. The technique can be said to be a sort of Occam's razor in the sense that it penalises complex and non-smooth solutions. The regularisation is carried out by adding a penalty function to the cost function (Eqs. 18 and 19). The modified cost function often used in RBFNs is

$$ E = \sum_{n=1}^{N} \left\{ t^n - y(x^n) \right\}^2 + \lambda \sum_{j=1}^{M} w_j^2 \qquad (28) $$

where λ is the regularisation parameter. From Eq. 28, we see that RBFNs with large weights are penalised. As λ increases, the network function gets smoother. Introduction of the penalty function results in a change of the variance matrix. Instead of Eq. 27, we now have

$$ A = \Phi^{T} \Phi + \lambda I_M \qquad (29) $$

and the weight vector (Eq. 25 with Eq. 26) becomes

$$ W^{T} = (\Phi^{T} \Phi + \lambda I_M)^{-1} \Phi^{T} T \qquad (30) $$
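A minimal sketch of the regularised weight solution of Eqs. (29)–(30); the design matrix, targets and value of λ are illustrative, not from the essay.

```python
import numpy as np

def ridge_rbf_weights(Phi, t, lam):
    """Regularised weights of Eq. (30): w = (Phi^T Phi + lambda I)^{-1} Phi^T t."""
    M = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(M)    # variance matrix of Eq. (29)
    return np.linalg.solve(A, Phi.T @ t)

# Illustrative design matrix and targets (not from the essay).
rng = np.random.default_rng(3)
Phi = rng.standard_normal((50, 10))
t = rng.standard_normal(50)
print(ridge_rbf_weights(Phi, t, lam=0.1))
```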


The regularisation parameter λ can most conveniently be obtained by using the generalised cross validation (GCV) method, which gives the following iterative solution for λ 17:

$$ \hat{\lambda} = \frac{t^{T} P^{2} t \;\operatorname{trace}\!\left(A^{-1} - \hat{\lambda} A^{-2}\right)}{\hat{W}^{T} A^{-1} \hat{W} \;\operatorname{trace}(P)} \qquad (31) $$

where P is the projection matrix

$$ P = I_N - \Phi A^{-1} \Phi^{T} \qquad (32) $$

with $I_N$ being the identity matrix of dimension N. N is the number of training samples for an RBFN with M hidden units and one output unit. We have sufficient training samples, making N > M. The RBFN's output is a linear combination of the hidden units' outputs. The N-dimensional target vector t, where each component is the response of one training sample, lies in an N-dimensional space. Since N > M, the fitted output vector closest to t lies in the M-dimensional subspace spanned by the M N-dimensional hidden-unit vectors, and it is the projection of t onto this M-dimensional subspace.
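A sketch of the iterative GCV re-estimation of λ, following Eqs. (31)–(32) as reconstructed above; the synthetic design matrix and targets, and the fixed number of iterations, are assumptions made purely for illustration.

```python
import numpy as np

def gcv_lambda(Phi, t, lam=1.0, n_iter=10):
    """Iterate the re-estimation formula of Eq. (31) with P from Eq. (32)."""
    N, M = Phi.shape
    for _ in range(n_iter):
        A_inv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(M))
        w = A_inv @ Phi.T @ t                        # regularised weights
        P = np.eye(N) - Phi @ A_inv @ Phi.T          # projection matrix, Eq. (32)
        num = (t @ P @ P @ t) * np.trace(A_inv - lam * A_inv @ A_inv)
        den = (w @ A_inv @ w) * np.trace(P)
        lam = num / den
    return lam

# Illustrative data (not from the essay).
rng = np.random.default_rng(4)
Phi = rng.standard_normal((40, 8))
t = Phi @ rng.standard_normal(8) + 0.1 * rng.standard_normal(40)
print(gcv_lambda(Phi, t))
```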

5.5 Pruning and growing

5.5.1 Basis function selection: the forward-selection method

One way of growing an RBFN is by use of forward selection, where, from a set of candidate basis functions, you select the hidden unit (i.e. basis function) that decreases the error the most and add it to the network configuration. This adding of hidden units increases the model complexity. The process is interrupted when some criterion, e.g. the GCV, stops decreasing. This forward-selection technique of optimisation is non-linear, but it has some advantages. It does not, for instance, require a fixed number of units to start with, and it is, from a computational viewpoint, efficient. If a hidden unit (i.e. a basis function) is added, then the new projection matrix is

$$ P_{M+1} = P_M - \frac{P_M f_{M+1} f_{M+1}^{T} P_M}{f_{M+1}^{T} P_M f_{M+1}} \qquad (33) $$

where M is the number of basis functions and $f_{M+1}$ is the column of the design matrix $\Phi$ corresponding to the last added hidden unit. 17 Mark J.L. Orr, Introduction to Radial Basis Function Networks, Technical Report, Centre for Cognitive Science, Univ. of Edinburgh, 1996.

The effect on the sum-squared error of adding a hidden unit to the network is given by

$$ E_M - E_{M+1} = \frac{\left(f_{M+1}^{T} P_M t\right)^2}{f_{M+1}^{T} P_M f_{M+1}} \qquad (34) $$

The hidden unit to be chosen is the one that reduces the sum-squared error the most. Using orthogonal least squares, a variant of this forward-selection method can be constructed.18

5.5.2 Basis function elimination: the backward-elimination method

One way of pruning an RBFN is by use of backward elimination, where you start with all basis functions and then successively remove the hidden unit (i.e. basis function) whose removal increases the error the least. The process is interrupted when some model selection criterion stops decreasing.
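A greedy forward-selection sketch based on Eqs. (33)–(34): starting from an empty model, the candidate column giving the largest error reduction is added and the projection matrix is updated. The candidate matrix and targets are synthetic illustrations; a real implementation would stop when a criterion such as the GCV stops decreasing, rather than after a fixed number of additions.

```python
import numpy as np

def forward_select(F, t, n_select):
    """Greedy forward selection: repeatedly add the candidate column f that maximises
    the error reduction (f^T P t)^2 / (f^T P f) of Eq. (34), updating P as in Eq. (33)."""
    N, n_candidates = F.shape
    P = np.eye(N)                       # projection matrix for the empty model
    chosen = []
    for _ in range(n_select):
        best, best_gain = None, -np.inf
        for j in range(n_candidates):
            if j in chosen:
                continue
            f = F[:, j]
            denom = f @ P @ f
            if denom <= 1e-12:
                continue
            gain = (f @ P @ t) ** 2 / denom
            if gain > best_gain:
                best, best_gain = j, gain
        f = F[:, best]
        P = P - (P @ np.outer(f, f) @ P) / (f @ P @ f)   # Eq. (33)
        chosen.append(best)
    return chosen

# Illustrative candidate design matrix (not from the essay): each column is one candidate basis function.
rng = np.random.default_rng(5)
F = rng.standard_normal((60, 20))
t = F[:, 3] - 2 * F[:, 7] + 0.05 * rng.standard_normal(60)
print(forward_select(F, t, n_select=2))    # typically picks columns 3 and 7 first
```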

5.6 Scaling and the width parameter

The width parameter σ in an RBFN plays the role of a scale factor. This is quite evident when thinking of clustering. If the width of a bell-shaped function is very small, each single point becomes a cluster, but if it is very large, the entire data set becomes a cluster. Neither situation is satisfactory. What one wants to find is some sort of true clusters, i.e. scale-independent clusters. The obtained clusters should thus be non-varying over a large scale interval, in the sense that the true clusters should emerge independently of the particular width parameter chosen within a large range. If we train the RBFN with a constant output value t and use constant values for the weights w and the widths σ, then the only variables to be estimated are the centre locations $\mu_j$. When using a Gaussian RBF

$$ \phi = \exp\!\left( -\frac{\|x^n - \mu_j\|^2}{\sigma^2} \right) \qquad (35) $$

Eq. 21 becomes

$$ \Delta \mu_j = \eta_4\, \frac{x^n - \mu_j}{\sigma^2}\, \exp\!\left( -\frac{\|x^n - \mu_j\|^2}{\sigma^2} \right) \qquad (36) $$

where $\eta_4 = \eta_2 w$.

18 Mark J.L. Orr, Introduction to Radial Basis Function Networks, Technical Report, Centre for Cognitive Science, Univ. of Edinburgh, 1996.


It can be shown that at equilibrium we have

$$ \nabla\!\left[ \phi(x; \sigma) * p(x) \right] = 0 \qquad (37) $$

where p(x) is the input probability density and * denotes the convolution operation. When the clustering has been obtained using a large range of width values, solving Eq. 37 then gives the cluster centres.

5.7 Normalisation

In many cases the use of normalised RBFs is of great value. They can, for instance, be used as probability values, since the range of the basis functions is between 0 and 1, but they can also be used for noisy data interpretation and regression. A normalised basis function is obtained from Eq. 2 by the introduction of a normalisation factor, and hence we have

$$ \phi_i(x) = \frac{\phi(\|x - \mu_i\|)}{\displaystyle\sum_{k=1}^{M} \phi(\|x - \mu_k\|)} \qquad (38) $$

where M is the number of basis functions or kernels. RBFNs using normalised RBFs are called normalised RBFNs.
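A small sketch of Eq. (38), normalising Gaussian basis functions so that the activations lie between 0 and 1 and sum to one; the centres and width are illustrative assumptions.

```python
import numpy as np

def normalised_rbf(x, centres, sigma):
    """Eq. (38): each basis function divided by the sum over all M kernels, so the
    outputs lie between 0 and 1 and sum to one."""
    phi = np.exp(-np.linalg.norm(centres - x, axis=1) ** 2 / sigma ** 2)
    return phi / phi.sum()

# Illustrative centres (not from the essay).
centres = np.array([[0.0], [1.0], [2.0]])
print(normalised_rbf(np.array([0.8]), centres, sigma=0.5))   # three values summing to 1
```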


6. Applications of RBFN

6.1 Introduction

RBFNs are used for many different applications. Chaotic time series prediction, speech pattern classification, image processing, medical diagnosis, non-linear system identification, adaptive equalisation in communication systems, non-linear feature extraction, hand gesture recognition, face recognition, classification of ECGs, mapping hand gestures to speech, object recognition, minefield detection, and design of adaptive equalisers are some examples of RBFN applications. These applications use the function approximation capabilities of RBFNs, based on their local structure and efficient training algorithms. However, it is often possible to use an MLP (Multi-Layered Perceptron) instead. Depending on the problem, MLPs may sometimes perform better, but there are numerous examples where RBFNs perform better or at least obtain similar performance levels. The advantage of an RBFN may often be that it is quicker to train, while its disadvantage is its higher memory demands. Below follow some examples and simulations of RBFN applications.

6.2 Radar target recognition

6.2.1 Automatic target recognition

In automatic target recognition (ATR) systems, targets are to be recognised quickly and correctly from the radar echo. RBFNs have been used as a tool for classification of radar echoes using range profiles.19 Target recognition is based on the concept of scattering centres. These centres are the main sources of the radar cross section (RCS), but they do not necessarily correspond to physically significant points of the target. The radar range profile, obtained from these scattering centres, is very sensitive to changes in the aspect angle.20 This sensitivity and the low SNR (Signal-to-Noise Ratio) make target recognition a difficult task. One way of reducing these difficulties and obtaining stable and shift-

19 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, pp. 709-720, 1996. 20 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 709.


invariant patterns is to perform a non-coherent amplitude averaging of a series of range profiles measured within a small angle around various aspect angles.21

6.2.2 Target classification

Having performed the pre-processing, consisting of the above-mentioned averaging and a Fourier transformation, the classification by an RBFN can be carried out. The architecture of the RBFN is illustrated in Fig. 2. The output from the classifier is22

$$ e_k(x) = \begin{cases} 1, & \text{if } x \in X_k \\ 0, & \text{otherwise} \end{cases} \qquad (39) $$

where $X_k$ is the sample set of the k-th class.

Figure 2: The architecture of the RBFN used in classifying radar targets23. There are 96 nodes in the hidden layer, with 32 nodes for each of the three classes24. The function to be minimised by the learning process is the error function25

$$ E = \sum_{x \in X} \sum_{i=1}^{M} \left( e_i(x) - y_i(x) \right)^2 \qquad (40) $$

where X is the set of all training samples. Models of three military aircraft on a rotating platform in an anechoic chamber were used for obtaining radar range profiles in the Ku-band. Since it has been proven that

21 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 710. 22 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 712. 23 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 712. 24 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 716. 25 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 712.


RBFNs with an equal smoothing factor are suitable for universal approximation,26 all hidden units had the same width. The results suggest that RBFNs have better performance than conventional kernel classifiers and hence that RBFNs are candidates for radar target recognition.27

6.3 Interference cancellation

6.3.1 Introduction

It has been demonstrated28, through simulations, that RBFNs can be used for interference cancellation where traditional linear cancellers often fail, since interference cancellation usually requires non-linear signal processing. In interference cancellation methods, interference in the observed signal is minimised or, preferably, completely cancelled. It is carried out by estimating the interference generated from a reference signal, the reference input $\{r_n\}$ (see Fig. 3).

Figure 3: An adaptive interference cancellation system operating on discrete-time signals.29

The observed signal, i.e. the corrupted signal, $\{x_n\}$, is given by

$$ x_n = s_n + i_n \qquad (41) $$

26 J. Park and J.W. Sandberg, ”Universal approximation using radial basis functions network”, Neural Computation, No. 1, pp. 246-257, 1991, referenced in Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 715. 27 Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, 1996, p. 719. 28 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, pp. 247-268, 1995. 29 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, 1995, p. 248.


where $\{s_n\}$ is the desired signal and $\{i_n\}$ the interference. The estimate of the desired signal, the canceller output $\hat{s}_n$, is given by

$$ \hat{s}_n = s_n + i_n - \hat{i}_n \qquad (42) $$

where $\hat{i}_n$ is an estimate of the interference $i_n$. An interference canceller can be implemented using a class of neural networks consisting of normalised Gaussian basis functions.30

6.3.2 RBFNs used for simulations

Simulations of adaptive interference cancellers using RBF-based neural networks have been published.31 Several networks were simulated:

• GRBFN (Gaussian RBFN, as in Eq. 4),
• NGBFN (Normalised Gaussian Basis Function Network),
• CNLS (Connectionist Normalised Local Spline) network (which is an extension of the NGBFN), and
• GMBFN (Gaussian Mixture Basis Function Network).

An NGBFN is described by32:

$$ f(r) = \sum_{j=1}^{M} w_j\, \phi_j^{\mathrm{norm}}(r) \qquad (43) $$

where

$$ \phi_j^{\mathrm{norm}}(r) = \frac{\exp\!\left( -\|r - c_j\|^2 / 2\sigma_j^2 \right)}{\displaystyle\sum_{k=1}^{M} \exp\!\left( -\|r - c_k\|^2 / 2\sigma_k^2 \right)} \qquad (44) $$

By adding a set of N-dimensional vector weights $\{g_j\}$, j = 1, 2, …, M, we obtain the CNLS network33:

30 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, 1995, p. 254. 31 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, pp. 247-268, 1995. 32 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, 1995, p. 251. 33 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, 1995, p. 252.


$$ f(r) = \sum_{j=1}^{M} \left( w_j + g_j \cdot (r - c_j) \right) \phi_j^{\mathrm{norm}}(r) \qquad (45) $$
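A sketch of the NGBFN (Eqs. 43–44) and CNLS (Eq. 45) outputs for a single reference input; all parameters are random illustrative values, not those of the cited simulations, and the Gaussian is written with the 2σ² convention used in Eq. (44) above.

```python
import numpy as np

def ngbf(r, centres, sigmas):
    """Normalised Gaussian basis functions of Eq. (44)."""
    act = np.exp(-np.linalg.norm(r - centres, axis=1) ** 2 / (2 * sigmas ** 2))
    return act / act.sum()

def ngbfn(r, centres, sigmas, w):
    """NGBFN output, Eq. (43)."""
    return w @ ngbf(r, centres, sigmas)

def cnls(r, centres, sigmas, w, g):
    """CNLS output, Eq. (45): the NGBFN plus local linear terms g_j . (r - c_j)."""
    phi = ngbf(r, centres, sigmas)
    return np.sum((w + (g * (r - centres)).sum(axis=1)) * phi)

# Illustrative parameters (not from the cited paper): M = 3 units over 2-dimensional reference inputs.
rng = np.random.default_rng(6)
centres = rng.uniform(-1, 1, (3, 2))
sigmas = np.full(3, 0.5)
w = rng.standard_normal(3)
g = rng.standard_normal((3, 2))
r = np.array([0.2, -0.4])
print(ngbfn(r, centres, sigmas, w), cnls(r, centres, sigmas, w, g))
```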

The GMBFN is based on a Gaussian-mixture statistical data model and is described by34:

$$ f_{GM}(r) = \sum_{j=1}^{M} \left( w_j + \Sigma_{ir,j}\, \Sigma_{rr,j}^{-1} (r - c_j) \right) \phi_j^{GM}(r) \qquad (46) $$

where

$$ \phi_j^{GM}(r) = \frac{p_j\, |\Sigma_{rr,j}|^{-0.5} \exp\!\left( -\tfrac{1}{2} (r - c_j)^{T} \Sigma_{rr,j}^{-1} (r - c_j) \right)}{\displaystyle\sum_{k=1}^{M} p_k\, |\Sigma_{rr,k}|^{-0.5} \exp\!\left( -\tfrac{1}{2} (r - c_k)^{T} \Sigma_{rr,k}^{-1} (r - c_k) \right)} \qquad (47) $$

The parameters M, $\{p_j\}$, $\{w_j\}$, $\{c_j\}$, $\{\Sigma_{ir,j}\}$, and $\{\Sigma_{rr,j}\}$, with j = 1, 2, …, M for all, have statistical meanings based on the mixture model.

6.3.3 Simulations

In the first group of simulations, where 2000 samples were used for training, the 4-node GMBFN obtained the best approximation results. The GMBFN canceller was, however, computationally very demanding due to its algorithm. The GRBFN canceller obtained considerably poorer results than the three normalised networks. In another group of simulations, the cancellers based on RBFs performed much better than other types of cancellers. Here the GRBFN canceller had the best results, about 3 dB better than the GMBFN canceller. The network sizes varied from 5 to 40 nodes, and even for RBF-based non-linear networks of quite small size the cancellers were effective in reducing interferences. Also when the noise power was significant, the non-linear cancellers showed much better results than linear cancellers. In the last group of simulations, the CNLS cancellers obtained the best performance. The explanation for this is that it is not only the normalisation of the basis functions that has a positive effect on the performance but also the extra linear weight terms of the CNLS.

6.3.4 Network evaluations

The three different types of simulations above have shown that even though the non-linear RBFNs were of moderate sizes (the number of nodes not exceeding 40), they estimated and suppressed the interferences rather well.

34 Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, 1995, p. 256.


The GRBFN, with a stochastic gradient learning algorithm, was robust against ill-initialised parameters, was relatively fast to train, and had moderate computational requirements. Another feature of the GRBFN is that it tends to compensate for a small network size by changing the centres and the widths of the Gaussian basis functions. One disadvantage, however, was that the GRBFN did not seem to be good at approximating globally linear mappings. It turned out that the NGBFN was less useful than the GRBFN in many of the simulations. An alternative to the NGBFN is to use the CNLS (since an NGBFN is essentially a CNLS network with zero gradient weights). Advantages of the CNLS network are its flexible function representation capability and its fast learning. Disadvantages are its difficulty in adjusting the values of the centres and spread parameters. Both linear and non-linear mappings can be approximated using the GMBFN. The GMBFN can even exactly implement linear mappings, while the GRBFN and the NGBFN can only approximate such mappings. The MLPs with sigmoidal functions that were tested were neither as useful nor as reliable as the RBF-based networks. For instance, it took much longer for MLPs with two layers to converge than it did for the RBF-based networks. Complex extensions of the real RBFN show promising results in complex channel equalisation and are expected to be useful in cancellation of interferences in the areas of narrowband spatial array processing and beam-forming.

6.4 Chaotic processes

6.4.1 Introduction

Chaos has random-like behaviour, as statistical systems have, although it results from deterministic dynamics. These systems are complex, parametric, and non-linear. The future behaviour of such systems is usually not possible to predict, at least not to any high degree of accuracy. One of the many methods and techniques used for controlling chaos is to use RBFNs.35 The irregular and chaotic behaviour of solutions to deterministic equations of motion is a consequence of the non-linearity of the equations. The dynamics of the process is highly sensitive to the initial conditions. In spite of its chaotic character, the dynamics of the process is predictable over short time intervals. This property of short-term predictability is partly due to the deterministic nature of the equations of motion of the system.

35 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 166.


Short-term predictability of chaotic time series using regularized RBFNs has been investigated.36 RBFNs were used to model the chaotic process. Chaotic invariants of the actual and the reconstructed time series were compared, and it was found that the reconstructed signal approximated the actual signal.

6.4.2 Simulations

That the regularized RBFNs efficiently approximated the dynamics of the chaotic process was demonstrated through the use of two case studies: one where the physical laws of the chaotic process are known, and a second concerning a real-life chaotic process, sea clutter, where the physical laws are unknown.37 In another simulation38, a set of 1000 points representing a chaotic time series was obtained using

$$ x_{t+1} = k\, x_t (1 - x_t) \qquad (48) $$

where k = 4 and x(0) = 0.1. The chaotic time series is shown in Fig. 4. For training, 200 points were used; for monitoring, 200 points; and for testing, 700 points. Different points were used for each task. The predicted points versus the observed ones are shown in Fig. 5.
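For concreteness, the logistic-map series of Eq. (48) can be generated as below; the contiguous split used here is only illustrative and does not reproduce the cited simulation's exact selection of points.

```python
import numpy as np

# Logistic map of Eq. (48) with k = 4 and x(0) = 0.1, giving a 1000-point chaotic series.
k, x = 4.0, 0.1
series = []
for _ in range(1000):
    series.append(x)
    x = k * x * (1.0 - x)
series = np.asarray(series)

# Illustrative contiguous split (the cited study used 200/200/700 points selected differently).
train, monitor, test = series[:200], series[200:400], series[400:]
print(train.shape, monitor.shape, test.shape)   # (200,) (200,) (600,)
```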

Figure 4: The training set, consisting of 200 points, of the chaotic time series.39

Figure 5: Values of the predicted points versus the values of the observed plots. All the 1000 points are used.40

36 Simon Haykin, Sadasivan Puthusserypady, and Paul Yee, “Dynamic Reconstruction of a Chaotic Process using Regularized RBF Networks”, McMaster University Communications Research Laboratory Technical Report 353, September 1997. 37 Simon Haykin, Sadasivan Puthusserypady, and Paul Yee, “Dynamic Reconstruction of a Chaotic Process using Regularized RBF Networks”, McMaster University Communications Research Laboratory Technical Report 353, September 1997. 38 B. Walczak and D.L. Massart, ”Local modelling with radial basis function networks”, Chemometrics and Intelligent Laboratory Systems, 50, pp. 179-198, 2000. 39 B. Walczak and D.L. Massart, ”Local modelling with radial basis function networks”, Chemometrics and Intelligent Laboratory Systems, 50, 2000, p. 189.


There are chaotic processes that can be decomposed into a sum of a linear and a non-linear part. It turns out that RBFNs can be used to approximate the non-linear part.41 Consider a chaotic dynamical system described by

$$ \dot{x} = f(x) = f_L(x) + f_N(x) = Ax + f_N(x) \qquad (49) $$

where $f_L(x) = Ax$ is the linear part and $f_N(x)$ the non-linear part of the dynamical system. It is assumed that the linear part is known and the non-linear part is unknown.

If $\hat{f}_N(x)$ is the approximation of the non-linear part and if

$$ \tilde{f}_N(x) = f_N(x) - \hat{f}_N(x) \approx 0 \qquad (50) $$

then the system is dominated by the linear part, and a simple linear feedback controller for stabilisation and tracking control can be constructed.42 The RBFN used in this simulation is different from the usual ones in the respect that it has more than one output node (see Fig. 6). With Gaussian RBFs $\phi_i$ as activation functions of the hidden nodes, the outputs are

$$ y_k = \sum_{i=1}^{L} w_{ki} \exp\!\left( -\frac{\|x - \mu_i\|^2}{\sigma_i^2} \right), \qquad k = 1, 2, \ldots, m \qquad (51) $$

where L is the number of hidden nodes, $w_{ki}$ the weight between the i-th hidden node and the k-th output, x the input vector, $\mu_i$ the i-th centre position, and $\sigma_i$ the i-th width parameter.

6.4.3 Conclusions

The control system is shown in Fig. 7. The simulations of two continuous-time chaotic systems, the Duffing and the Lorenz systems, show that this control scheme is very effective in stabilising the chaotic systems.43

40 B. Walczak and D.L. Massart, ”Local modelling with radial basis function networks”, Chemometrics and Intelligent Laboratory Systems, 50, 2000, p. 189. 41 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, pp. 165-183, 2000. 42 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 167. 43 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 180.


Figure 6: The RBFN with more than one output node.44

Figure 7: The control system with the RBFN approximator, where v(t) is an external input (reference signal), u(t) a scalar control input, and k a constant feedback gain vector.45

44 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 169. 45 Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, 2000, p. 170.


6.5 Classification

6.5.1 Theory

In MLPNs different classes of data are separated by hyper-planes. In RBFNs, however, localised kernels are placed around each group of data belonging to a class. The classification rule is to choose the class with the highest posterior probability. The posterior probability $P(C_k|x)$ of pattern x belonging to class $C_k$ is

$$ P(C_k|x) = P(C_k)\, \phi_k(x) \qquad (52) $$

where

$$ \phi_k(x) = \frac{p(x|C_k)}{\displaystyle\sum_{i} p(x|C_i)\, P(C_i)} \qquad (53) $$

Here $p(x|C_k)$ is the class conditional density (the conditional probability density function). Using a common set of M basis functions for all class conditional distributions, a normalised RBFN is obtained:

$$ P(C_k|x) = \sum_{j=1}^{M} w_{kj}\, \phi_j(x) \qquad (54) $$

where

$$ \phi_j(x) = \frac{p(x|j)\, P(j)}{\displaystyle\sum_{n=1}^{M} p(x|n)\, P(n)} = P(j|x) \qquad (55) $$

The weights are

$$ w_{kj} = P(C_k|j) = \frac{P(j|C_k)\, P(C_k)}{P(j)} \qquad (56) $$

$P(j|x)$, and thus the basis function $\phi_j(x)$, is the posterior probability of feature j being present in the input space, with the unit centres representing feature vectors. Given these features, the RBFN output gives the posterior probability of class membership. The prior probability of each feature is given by the weights.
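A sketch of the classification rule of Eqs. (54)–(56): normalised basis activations are combined with weights $w_{kj} = P(C_k|j)$ to give class posteriors, and the class with the highest posterior is chosen. The centres, width and weight values are illustrative assumptions, not from the essay.

```python
import numpy as np

def class_posteriors(x, centres, sigma, W):
    """Eqs. (54)-(55): normalised basis activations phi_j(x) combined with weights
    w_kj to give posterior class probabilities P(C_k | x)."""
    act = np.exp(-np.linalg.norm(centres - x, axis=1) ** 2 / sigma ** 2)
    phi = act / act.sum()                 # phi_j(x), interpreted as P(j | x)
    return W @ phi                        # one posterior per class

# Illustrative setup (not from the essay): M = 4 basis functions, 2 classes.
centres = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
W = np.array([[0.9, 0.8, 0.1, 0.2],      # w_kj = P(C_k | j); each column sums to 1 over classes
              [0.1, 0.2, 0.9, 0.8]])
x = np.array([0.9, 0.1])
p = class_posteriors(x, centres, sigma=0.5, W=W)
print(p, "-> class", int(np.argmax(p)))
```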


6.5.2 Classification of images

Image classification can be obtained by using the concentration histogram image (CHI) of two images as a density function.46 The CHIs are approximated with RBFs. In this approach, the density distributions of the CHIs are interpreted as three-dimensional density functions. These are approximated with Gaussian-shaped RBFs. The RBFNs approximate a three-dimensional response function for the CHI density distribution. This is done by using a certain number of Gaussian RBFs at given locations with adequate weights. Performing adaptive learning from CHI data samples, a simplified representation of the CHI density distribution is obtained through weight values characterising fixed RBFs. The number of these RBFs is nothing other than the number of clusters for the classification. On the one hand, since the number of initial basis functions is limited and since they cannot move, too few RBFs will not result in a high enough resolution of the cluster centres. On the other hand, since computation time rises with the square of the number of initial basis functions, too many RBFs will make the computations too expensive. Thus, as a compromise, in the experiment of Stubbings and Hutter47, a 10-pixel grid spacing is chosen. The z-values at the grid locations are taken as training values. In a 256 x 256 CHI, this gives 25 x 25 = 625 training points, which will also be the number of initial basis functions. The radius of the basis functions has to be greater than or equal to 10 and is in this example chosen to be 12, which is known to be a good value for most classification tasks. When the cluster centre positions have been determined with the RBFN, each pixel in the CHI is assigned to the closest cluster. Secondary ion mass spectrometry (SIMS) is a method for surface and interface analysis. The SIMS images used in this experiment are shown in Fig. 8 and Fig. 9, before and after, respectively, Anscombe transformation and Wiener filtering were performed. The Anscombe variance-stabilising transformation

$$ y_{i_1,i_2} = 2 \sqrt{P_{i_1,i_2} + \tfrac{3}{8}} \qquad (57) $$

where $P_{i_1,i_2}$ is the intensity at the point $i_1, i_2$ of the original images, is used in order

to transform the original peaks in the CHI to Gaussian-shaped peaks so that they can be approximated with Gaussian basis functions. The images were then de-noised by using the Wiener filter. This results in more compact and distinct pixel clusters. MATLAB was used for the implementation of the algorithm. Since the images produced a CHI with many cluster centres over the whole area, some with large 46 T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Systems, No. 49, pp. 163-172, 1999. 47 T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Systems, No. 49, 1999, p. 167.


distances and some with very short distances between them, classification was very computer demanding.
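The Anscombe step of Eq. (57) is a one-liner; the Poisson test image below is an illustrative stand-in for a SIMS intensity image, not data from the cited experiment.

```python
import numpy as np

def anscombe(P):
    """Variance-stabilising transform of Eq. (57): y = 2*sqrt(P + 3/8),
    turning Poisson-like counts into approximately Gaussian-shaped peaks."""
    return 2.0 * np.sqrt(P + 3.0 / 8.0)

# Illustrative intensity image (not from the cited paper).
rng = np.random.default_rng(7)
image = rng.poisson(5.0, size=(256, 256)).astype(float)
print(anscombe(image).mean())
```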

Figure 8: Scatter diagram of the original images 1 and 2.48

Figure 9: Scatter diagram of images 1 and 2 after Anscombe-transformation and Wiener filtering have been performed.49

A three-dimensional sample grid of the 25 x 25 points was produced (see Fig. 10). On each of these grid points an RBF was placed. Forward-selection techniques reduced this number to eight (see Fig. 11) by using a stopping criterion of eight basis functions. The choice of eight comes not only from additional analysis and chemical considerations but also from experimental experience in interpreting CHIs. The information in these eight most important basis functions contains the centre location and the height of the peak. This peak height is a measure of the number of pixels assigned to the cluster. The co-ordinates of the RBF centres are taken as the cluster centres. The last step is to assign every image pixel to one of the clusters. This is done by using a minimum-distance measure weighted by the height of the RBFs. A comparison between Fig. 11 and Fig. 9 shows that the cluster centres determined by forward selection (Fig. 11) correspond well with the cluster centres identifiable in the CHI (Fig. 9). This novel RBF classification scheme presented in Stubbings and Hutter was also compared with two other classification methods, namely the minimum-distance classification method and the fuzzy c-means clustering method. The minimum-distance algorithm produces some false placements of cluster centres that are not representative of the phases, and consequently these phases are not correctly classified. The fuzzy c-means clustering algorithm results in bad positioning of the centres even though the membership assignment is fairly good.

48 T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Laboratory Systems, No. 49, 1999, p. 168.
49 T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Laboratory Systems, No. 49, 1999, p. 168.


Figure 10: Three-dimensional plot of CHIs of SIMS images 1 and 2 showing the original CHI as a three-dimensional density distribution of the intensity values.50

Figure 11: RBF approximation of Fig. 9 showing the centres of the eight RBFs at the locations of the cluster centres in the CHI.51

50 T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Laboratory Systems, No. 49, 1999, p. 169.


6.5.3 Conclusions

The classification method based on RBFs and forward selection presented in Stubbings and Hutter works well for images of density distributions of CHIs. In the example chosen, the most important peaks, corresponding to cluster centres, were assigned correctly. Well-classified images were obtained by assigning the surrounding pixels in the CHI, weighted by the height of the corresponding RBF (here a Gaussian function), to these peaks. A major advantage of the RBF classification method is that only two parameters are needed, namely the number of classes expected and the radius of the RBFs. Usually this radius can be assumed to be constant.

6.6 English pronunciation learning

6.6.1 NETtalk

RBFs and generalised RBFs have been tested52 for their capability of learning to pronounce English text, the so-called NETtalk task53, the goal of which is to learn to pronounce English words by studying a dictionary of correct pronunciations. In the test, the task was to map each individual letter in a word to a phoneme and a stress. NETtalk is an artificial neural network that has been taught to read an English text aloud. The input comes from a scanner with an OCR and the output is sent to a speech synthesiser. The architecture of NETtalk is as follows54 (cf. Fig. 12):

• The input layer consists of 7 groups of 29 neurons. Each group corresponds to a character.

• The hidden layer consists of 80 neurons.
• The output layer consists of 26 neurons and gives the characteristics of the phonemes.

The network thus consists of 309 neurons and up to 18 320 connections.55 A sketch of one possible input encoding is given below.
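As a rough illustration of how such an input layer can be fed, the fragment below one-hot encodes a 7-character window into a 7 x 29 = 203-dimensional vector. The choice of the three extra symbols beyond the 26 letters is an assumption made for illustration; it does not reproduce the original NETtalk encoding.

```python
import numpy as np

# 26 letters plus three extra symbols (here: space, full stop, comma) give 29 per group.
ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", ".", ","]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_window(window):
    """One-hot encode a 7-character window into a 7 x 29 = 203-dimensional input vector."""
    assert len(window) == 7
    x = np.zeros(7 * 29)
    for pos, ch in enumerate(window):
        x[pos * 29 + INDEX[ch]] = 1.0
    return x

x = encode_window(" a cat ")   # the centre character 'c' is the letter to be pronounced
print(x.shape)                 # (203,)
```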

51 T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Laboratory Systems, No. 49, 1999, p. 169.
52 Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.
53 See T.J. Sejnowski and C.R. Rosenberg, ”Parallel networks that learn to pronounce English text”, Complex Systems, 1, pp. 145-168, 1987.
54 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage).
55 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage).


6.6.2 Simulations

In the simulation, one set of 1000 words for training and one set of 1000 words for testing were chosen randomly from the 20 002 words in the NETtalk dictionary. From the 1000 training words, 200 were set aside for cross-validation. The features describing the letter to be pronounced consist of a 7-letter window centred on that letter. The surrounding letters provide the context, which is needed since a letter is pronounced differently depending on the context (cf. Fig. 12).

Part of the results is shown in Table 1, where the figure in the stress column indicates the percentage of stress assignments correctly classified; the phoneme column indicates the percentage of phonemes correctly classified; the letter column indicates the percentage of letters for which both phoneme and stress are correctly classified; and the word column indicates the percentage of words correctly classified, i.e. words in which all letters are correctly classified. The back-propagation entry refers to a sigmoidal network trained via back propagation.
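To show what "centred on a 7-letter window" means in practice, here is a small sketch that builds one training example per letter of a dictionary word. The padding symbol and the phoneme/stress labels are hypothetical placeholders, not the actual NETtalk dictionary encoding.

```python
def letter_windows(word, phonemes, stresses, width=7, pad=" "):
    """Yield one (window, phoneme, stress) training example per letter of the word."""
    half = width // 2
    padded = pad * half + word + pad * half
    for i, (ph, st) in enumerate(zip(phonemes, stresses)):
        yield padded[i:i + width], ph, st

# Hypothetical aligned labels: one phoneme symbol and one stress mark per letter.
examples = list(letter_windows("cat", ["k", "@", "t"], ["1", "<", "<"]))
# -> [('   cat ', 'k', '1'), ('  cat  ', '@', '<'), (' cat   ', 't', '<')]
```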

Figure 12: Schematic structure of NETtalk56 with the input “a cat”.

It can clearly be seen in Table 1 that the RBF network performs better than the nearest-neighbour method but worse than the back-propagation algorithm. Table 2 shows the improvement obtained by using a generalised RBF method, in which the centre locations are also learned by gradient descent, i.e. in a supervised manner. The generalised RBF method results in much better performance than the RBF method.

56 Apprentissage automatique: les réseaux de neurons, Internet (www.grappa.univ-lille3.fr/polys/apprentissage ).


Table 1: Percentage of correctly classified stresses, phonemes, letters, and words.57

Algorithm            Stress   Phoneme   Letter   Word
Nearest neighbour     74.0     61.1      53.1     3.3
RBF                   80.3     65.6      57.0     3.7
Back propagation      81.3     80.8      70.6    13.6

Table 2: Percentage of correctly classified stresses, phonemes, letters, and words.58

Algorithm            Stress   Phoneme   Letter   Word
RBF                   80.3     65.6      57.0     3.7
Generalised RBF       82.4     84.1      73.8    19.8

6.6.3 Conclusions

The tests of RBF learning consisted of unsupervised learning of the centre locations and supervised learning of the output-layer weights. For NETtalk it was found that such RBFNs did not perform as well as sigmoidal networks.59 If, however, the learning of the centre locations was also supervised, as in the generalised RBF method, then the performance exceeded that of sigmoidal networks. By using supervised learning of feature weights the performance of RBFNs could also be improved, although to a lesser extent than with the learning of centre locations.60
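For concreteness, the sketch below shows what supervised learning of the centre locations amounts to for a Gaussian RBF network: one stochastic gradient step on both the output weights and the centres for a squared-error loss. The network sizes, learning rate, and fixed width are arbitrary illustrative choices and do not reproduce the setup of Wettschereck and Dietterich.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 203, 20, 26      # arbitrary sizes for illustration
sigma, lr = 1.0, 0.01                    # fixed width and learning rate

C = rng.normal(size=(n_hidden, n_in))    # centre locations (learned here)
W = rng.normal(size=(n_out, n_hidden))   # output-layer weights

def forward(x):
    d2 = np.sum((C - x) ** 2, axis=1)            # squared distances to the centres
    phi = np.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian basis-function activations
    return phi, W @ phi

def sgd_step(x, t):
    """One gradient step on W and C for the error E = 0.5 * ||y - t||^2."""
    global W, C
    phi, y = forward(x)
    err = y - t                                                 # dE/dy
    grad_W = np.outer(err, phi)                                 # dE/dW
    grad_phi = W.T @ err                                        # dE/dphi
    grad_C = (grad_phi * phi)[:, None] * (x - C) / sigma ** 2   # dE/dC via dphi/dC
    W -= lr * grad_W
    C -= lr * grad_C

x = rng.normal(size=n_in)                # a dummy input pattern
t = np.zeros(n_out); t[3] = 1.0          # a dummy one-hot target
sgd_step(x, t)
```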

57 Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.
58 Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.
59 Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.
60 Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.


7. Conclusion

We have here given a brief introduction to the subject of RBFNs. The overview started with a background on the biological systems that lie behind the concept of artificial neural networks and the models of how the brain works that they in turn have given us. This was followed by an attempt to define artificial neural networks, and a few definitions were given. One basic element of RBFNs is the radial basis function, which we described before turning to the theory of RBFNs. The last part described some applications and simulations of RBFNs.

RBFNs possess many attractive features, such as centred basis functions with a width parameter. Apart from training techniques they share with MLPNs, they also permit the use of two-stage training. The simulations of RBFNs found in the literature are generally very favourable to RBFNs. Many comparisons between RBFNs and other artificial neural networks (MLPNs) have shown various advantages for RBFNs. This does not, however, necessarily mean that the cases where MLPNs are to be preferred are in a minority; in the literature61, cases can be found indicating that the performance of RBFNs can be inferior to that of other types of artificial neural networks.

The examples of simulations given here (radar target recognition, interference cancellation, chaotic processes, classification, and English pronunciation learning) are indicative of the wide range of areas in which RBFNs can be used. Even these few examples have shown that RBFNs should not be seen as a competitor to other artificial neural networks but rather as a complement, since one and the same method is not suitable for solving all problems. The following attractive features62 may summarise the usefulness of RBFNs:

• RBFNs provide excellent approximations to smooth functions.
• RBFN “centres” are interpretable as “prototypes”.
• RBFNs can be learned very quickly.

61 See e.g. J. Moody and C.J. Darken, ”Fast learning in networks of locally-tuned processing units”, Neural Computation, 1 (2), pp. 281-294, 1989, referenced in Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.
62 Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.


It has been said that “RBFNs are expected to continue to enjoy popularity as a versatile and practical non-linear function approximator for a variety of engineering applications.”63

63 Joydeep Ghosh and Arindam Nag, An Overview of Radial Basis Function Networks, Dept. of Electrical and Computer Engineering, Univ. of Texas, (Internet).


8. Acronyms

ATR     Automatic Target Recognition
CHI     Concentration Histogram Image
CNLS    Connectionist Normalised Local Spline
GCV     Generalised Cross Validation
GMBFN   Gaussian Mixture Basis Function Network
GRBFN   Gaussian Radial Basis Function Network
MLP     Multi-Layered Perceptron
MLPN    Multi-Layered Perceptron Network
NGBFN   Normalised Gaussian Basis Function Network
RBF     Radial Basis Function
RBFN    Radial Basis Function Network
RCS     Radar Cross Section
SIMS    Secondary Ion Mass Spectrometry
SNR     Signal-to-Noise Ratio


9. Bibliography

”Apprentissage automatique: les réseaux de neurons”, Internet (www.grappa.univ-lille3.fr/polys/apprentissage).

G. Binning, M. Baatz, J. Klenk, and G. Schmidt, ”Will machines start to think like humans? Artificial versus natural Intelligence.”, Europhysics News, Vol. 33, No. 2, March/April 2002, pp. 44-47.

Inhyok Cha and Saleem Kassam, ”Interference cancellation using radial basis function networks”, Signal Processing, No. 47, pp. 247-268, 1995.

S. Geman, E. Bienenstock, and R. Doursat, ”Neural networks and the bias/variance dilemma”, Neural Computation, 4 (1): 1-58, 1992.

Joydeep Ghosh and Arindam Nag, An Overview of Radial Basis Function Networks, Dept. of Electrical and Computer Engineering, Univ. of Texas, (Internet). (This is the main source for the theoretical parts of this essay.)

Simon Haykin, Sadasivan Puthusserypady, and Paul Yee, “Dynamic Reconstruction of a Chaotic Process using Regularized RBF Networks”, McMaster University Communications Research Laboratory Technical Report 353, September 1997.

Keun Bum Kim, Jin Bae Park, Yoon Ho Choi, and Guanrong Chen, ”Control of chaotic dynamical systems using radial basis function network approximators”, Information Sciences, 130, pp. 165-183, 2000.

J. Kowalski, E. Hartman, and J. Keeler, ”Layered neural networks with gaussian hidden units as universal approximators”, Neural Computation, 2:210-215, 1990.

Mark J.L. Orr, Introduction to Radial Basis Function Networks, Technical Report, Centre for Cognitive Science, Univ. of Edinburgh, 1996.

J. Moody and C.J. Darken, ”Fast learning in networks of locally-tuned processing units”, Neural Computation, 1 (2), pp. 281-294, 1989.

J. Park and J.W. Sandberg, ”Universal approximation using radial basis functions network”, Neural Computation, No. 1, pp. 246-257, 1991.

Qun Zhao and Zheng Bao, “Radar Target Recognition Using a Radial Basis Function Neural Network”, Neural Networks, Vol. 9, No. 4, pp. 709-720, 1996.

T.J. Sejnowski and C.R. Rosenberg, ”Parallel networks that learn to pronounce English text”, Complex Systems, 1, pp. 145-168, 1987.


T. Stubbings and H. Hutter, “Classification of analytical images with radial basis function networks and forward selection”, Chemometrics and Intelligent Laboratory Systems, No. 49, pp. 163-172, 1999.

B. Walczak and D.L. Massart, ”Local modelling with radial basis function networks”, Chemometrics and Intelligent Laboratory Systems, 50, pp. 179-198, 2000.

Dietrich Wettschereck and Thomas Dietterich, “Improving the Performance of Radial Basis Function Networks by Learning Center Locations”, Advances in Neural Information Processing Systems 4, Ed. by J.E. Moody, S.J. Hanson, and R.P. Lippmann, Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.

Jacek M. Zurada, Introduction to Artificial Neural Systems, West Publishing Company, 1992.

