IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 43, NO. 4, APRIL 1996 351

Parametric Signal Restoration Using Artificial Neural Networks

Andrzej Materka,* Senior Member, IEEE, and Shizuo Mizushina

Abstract-The problem of parametric signal restoration given its blurred/nonlinearly distorted version contaminated by additive noise is discussed. It is postulated that feedforward artificial neural networks can be used to find a solution to this problem. The proposed estimator does not require iterative calculations that are normally performed using numerical methods for signal parameter estimation. Thus high speed is the main advantage of this approach. A two-stage neural network-based estimator architecture is considered in which the vector of measurements is projected on the signal subspace and the resulting features form the input to a feedforward neural network. The effect of noise on the estimator performance is analyzed and compared to the least-squares technique. It is shown, for low and moderate noise levels, that the two estimators are similar to each other in terms of their noise performance, provided the neural network approximates the inverse mapping from the measurement space to the parameter space with a negligible error. However, if the neural network is trained on noisy signal observations, the proposed technique is superior to least-squares estimate (LSE) model fitting. Numerical examples are presented to support the analytical results. Problems for future research are addressed.

I. INTRODUCTION

Very often, when medical diagnosis or biological study requires estimation of the value of a physical quantity,

there is no direct access to the variable which carries the information of interest. The signal that can be measured is dependent on this variable and depends also on other unknown variables. This applies to most of the noninvasive diagnostic techniques and is a characteristic of any measurement method which features a significant signal blurring and/or contamination by random noise. In some cases, the signal also becomes distorted due to the nonlinearity of biological/physical media. Examples include on-line identification of arterial circulation parameters [1], [2], evoked potential parameter estimation from noisy signals [3], signal and image analysis in laboratory instrumentation, e.g., related to spectrometry [4], [5] or electrophoresis [6], parameter estimation of multicompartmental responses of biological systems [7], [8], and many others.

In these examples, a model of the signal of interest is known. The model often incorporates electrical signals and systems [1], [2]. In [1], a step change in the aorta peripheral

Manuscript received April 13, 1994; revised September 28, 1995. This work was supported by the Polish Scientific Research Committee under Grant 8T11F01010. Asterisk indicates corresponding author.

*A. Materka was with the Department of Electrical and Computer Systems Engineering, Monash University, Melbourne (Caulfield East), Australia. He is now with the Institute of Electronics, Technical University of Lodz, Stefanowskiego 18, Lodz, 90-924 Poland (e-mail [email protected]).

S. Mizushina is with the Research Institute of Electronics, Shizuoka University, Hamamatsu 432, Japan.

Publisher Item Identifier S 0018-9294(96)02430-5.

0018-9294/96$05.00 © 1996 IEEE

resistance causes a measurable change in the aortic flow. The shape of the signal waveform is thus known (a step in the resistance value), along with a differential equation that describes the relationship between the signal and the aortic flow. However, the actual value of the step is unknown and needs to be estimated given the measured flow waveform. On-line estimation of the signal parameter is essential to clinical applications of the methodology that require tracking of time-varying parameters [1]. Least-squares curve fitting is normally used for parameter estimation [1], [2]. However, due to its iterative nature, this technique is not suitable for real-time parameter tracking at a reasonable cost of computing equipment. Even for a simple, three-parameter signal, a typical PC386/387 machine does not make on-line estimation possible [2].

Delay time of an evoked potential is another example of a signal parameter of diagnostic value [3]. Estimation of this parameter by the method of least squares is in general even more time consuming than for other parameters [9]-[11]. This is caused by the fact that iterative minimization of the sum-of-error-squares function is often terminated at a local minimum of the function when the time delay is one of the optimized parameters. For a global minimum to be achieved, a good initialization is required. To find a good starting point, least-squares fitting is performed for a number of fixed values of the time delay and the value giving the lowest error is accepted as the final solution [9], [11]. Such a procedure significantly increases the time of calculations. Thus there is a need for a fast time-delay estimation technique that would not be iterative.
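The grid-initialized least-squares procedure described above can be sketched as follows. The Gaussian pulse shape, the sampling grid, and all constants below are illustrative assumptions, not the evoked-potential models of [3], [9], [11]: for each fixed candidate delay the amplitude has a closed-form LS solution, and the delay with the lowest residual is kept as the starting point.

```python
import math, random

def pulse(t):
    # Hypothetical unit pulse shape (a Gaussian); stands in for the
    # known evoked-potential waveform model.
    return math.exp(-0.5 * t * t)

def fit_delay(ts, ys, delay_grid):
    """For each fixed delay d, the amplitude a of y = a*pulse(t - d) has a
    closed-form least-squares solution; keep the delay with the lowest SSE."""
    best = None
    for d in delay_grid:
        s = [pulse(t - d) for t in ts]
        ss = sum(v * v for v in s)
        a = sum(v * y for v, y in zip(s, ys)) / ss
        sse = sum((y - a * v) ** 2 for v, y in zip(s, ys))
        if best is None or sse < best[2]:
            best = (d, a, sse)
    return best

ts = [i * 0.5 for i in range(21)]          # observation points t = 0 .. 10
true_d, true_a = 4.0, 2.0
random.seed(0)
ys = [true_a * pulse(t - true_d) + random.gauss(0, 0.01) for t in ts]
d_hat, a_hat, _ = fit_delay(ts, ys, [0.5 * k for k in range(21)])
```

The grid over delays replaces the iterative search for a global minimum; in practice the best grid point would then seed a local least-squares refinement of all parameters.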

The radiant flux spectrum, as recorded by a spectrometer, is given by the convolution of the "true" radiant flux spectrum (as it would be recorded by a perfect instrument) with the spectrometer response function [4]. The observed spectrum is a blurred version of the radiant flux spectrum. This blurring phenomenon takes place in any measuring instrument, not only a spectrometer. Let t be the independent variable, which may represent time, wavelength, wave number, or another physical quantity, depending on the nature of the measurement. The signal y(t) observed at the output of a linear measuring instrument is described by the convolution integral

$$y(t) = \int_{-\infty}^{\infty} h(t-\tau)\,x(\tau)\,d\tau + \varepsilon(t) \qquad (1)$$

where $x(t)$ is the undistorted signal of interest, $h(t)$ is the instrument impulse response function, $\tau$ is the variable of integration, and $\varepsilon(t)$ denotes the observation noise. Depending


on the function h(t), two narrow pulse components of the signal x(t) that are sufficiently close to each other in the t domain cannot be distinguished at the output of the instrument, since each pulse is spread over a range of the independent variable. Thus convolution involves a loss of resolution. Many methods have been proposed to process the instrument response y(t) in order to reconstruct the original waveform x(t), e.g., [4], [12], and [13]. Most of them involve iterative calculations which offer certain advantages [13] at the expense of a relatively long computational time.

It has been demonstrated [14] that feedforward artificial neural networks (ANNs) can be used for model parameter estimation. The advantage of this approach is that the estimation process becomes relatively fast. This is because no iterative calculations are involved in generating the estimated parameter values. They are obtained at the output of a feedforward neural network, e.g., electrical or optical. The response time is limited only by signal propagation through such a network. Thus the proposed estimation technique is highly suitable for real-time applications. A neural network has to be trained before being applied to the estimation task. The training process involves adjustment of the network internal connections (weights) in order to minimize the estimation error over a set of examples. This can be a computationally intensive process. However, it has to be performed only once for a given class of signals whose parameters are to be estimated. The weights become fixed once the training has been completed. Training examples are calculated using mathematical equations which model the signal and the underlying mechanism of its generation. Thus a priori knowledge about the problem at hand is incorporated into the proposed technique.

The method is discussed in more theoretical detail in this paper. Its properties are compared to the well-known technique of least-squares model fitting. Three numerical examples are presented to better illustrate the discussion. Problems for further research are addressed.

II. MATERIALS AND METHODS

All the signal examples mentioned in Section I can be represented by a response of a general continuous dynamical system, either linear or nonlinear, to the signal of interest x(t). The dynamical system either represents the biological organ that generates the signal y(t), e.g., the vascular system [2], or corresponds to a laboratory instrument, e.g., a spectrometer [4]. This system distorts the shape of the signal x(t) and contaminates it by noise. From this point of view it is a degrading system. A parameterized model of the signal x(t) is known, and the actual values of the parameters that relate to a particular system response y(t) are to be estimated given this response. The response is measured at some n values of the independent variable t to form the measurement (observation) vector

$$\mathbf{y} = [y_1\; y_2\; \cdots\; y_n]^T \qquad (2)$$

where $y_i = f(t_i, \theta) + \varepsilon_i$, $i = 1, 2, \ldots, n$, and $\varepsilon_i$ denotes the random noise term. The vector quantity $\theta = [\theta_1\; \theta_2\; \cdots\; \theta_p]^T$, $\theta \in \Theta$, represents the signal parameters which remain constant within the observation interval $[t_1, t_n]$. The parameter space $\Theta$ is a subset of $\mathbb{R}^p$. The continuous function $f(\cdot)$ is the response of the degrading system to the signal of interest. This function can either be expressed analytically or evaluated numerically using the model equations. The vector

$$\mathbf{f} = [f(t_1,\theta)\; f(t_2,\theta)\; \cdots\; f(t_n,\theta)]^T \qquad (3)$$

can be used for the neural network training if the system equations are known. If they are unknown, the neural network can be trained using a set of vectors (2) measured for known parameter values. In the latter case, the training examples are noisy. One can show that this leads to biased parameter estimates which, however, have lower variance when compared to those obtained by noiseless training [15]. There is a possibility of trading bias for variance by introducing certain smoothness constraints into the training equations. This interesting and useful property is proven in the Appendix and illustrated by numerical examples in the following section.

In the proposed approach, the n-element measurement vector $\mathbf{y}$ is applied to the input of a trained neural network of $p$ outputs. The output variables form the estimated parameter vector $\hat{\theta}$. There are $n$ measurements available and $p$ parameters which are unknown. It can be shown that $p$ observations are sufficient to uniquely define the mapping $\theta(\mathbf{y})$ in the noiseless case, provided the related Jacobian determinant is nonzero. On the other hand, an increased number of measurements, $n > p$, can lead to a smaller parameter variance [16]. This, however, means a larger number of input variables to the neural network. Thus the neural network would become more complex for larger $n$. The increased ANN complexity would make the training process duration prohibitively long for some applications [17]. One then faces the well-known problem of dimensionality reduction [18]. In the classical approach to finding a solution to this problem, the measurement vector is transformed into a lower-dimensional space while optimizing a certain performance index. In statistics, the principal component model and the factorial model are used for this purpose [19]. These two models are closely related to each other. They can be defined using eigenvectors of the measurements covariance matrix. Dimensionality reduction by principal component analysis (PCA) has already been found useful in biomedical engineering, e.g., [18], [20], among many other applications, and will be adopted for the purpose of this study as well.

The PCA analysis can be interpreted in terms of a linear transformation that can be performed by the so-called linear combiner neural network [21]. Following this, a two-stage neural network architecture is proposed to estimate signal parameters. It is shown schematically in Fig. 1. The signal of interest excites the input of a degrading system which either represents a biological system under investigation or an imperfect measuring instrument. The measurements $y_1, y_2, \ldots, y_n$ form the input to the PCA network. This network transforms the measurements as follows:

$$\mathbf{z} = E^T(\mathbf{y} - \bar{\mathbf{f}}) \qquad (4)$$

where $\mathbf{z} = [z_1\; z_2\; \cdots\; z_p]^T$ is the feature vector, and $E$ is an $n \times p$ matrix whose columns are those eigenvectors of the

MATERKA AND MIZUSHINA: PARAMETRIC SIGNAL RESTORATION USING ARTIFICIAL NEURAL NETWORKS

data covariance matrix that correspond to the $p$ largest eigenvalues of this matrix [19]. The vector $\bar{\mathbf{f}}$ in (4) is the mean value of the vector $\mathbf{f}$ over the parameter space $\Theta$. In the case of noiseless training, i.e., when the model equations are available, the data covariance matrix is described by

$$R = E[(\mathbf{f} - \bar{\mathbf{f}})(\mathbf{f} - \bar{\mathbf{f}})^T] \qquad (5)$$

where $E[\cdot]$ denotes the expectation operator. If the model equations are not available, the sample data covariance matrix

$$\hat{R} = \frac{1}{N}\sum_{h=1}^{N}(\mathbf{y}_h - \bar{\mathbf{y}})(\mathbf{y}_h - \bar{\mathbf{y}})^T \qquad (6)$$

can be used to construct the matrix $E$ in (4), with $\bar{\mathbf{f}}$ replaced by the mean value $\bar{\mathbf{y}}$. In either case, if $\lambda_1 > \lambda_2 > \cdots > \lambda_p$ are the largest eigenvalues of the dispersion matrix and $e_1, e_2, \ldots, e_p$ are its respective eigenvectors, $e_i = [e_{i1}\; e_{i2}\; \cdots\; e_{in}]^T$, $i = 1, 2, \ldots, p$, then the projection matrix is defined as follows:

$$E = [e_1\; e_2\; \cdots\; e_p]. \qquad (7)$$

Eigenvalue analysis can be used to find the elements of the matrix E. In an alternative approach, an adaptive neural network can perform the PCA [22], [23], robust PCA [24], or nonlinear PCA [25], whichever gives better feature representation for the particular signal. The transformation (4) based on (7) is assumed in this paper to define the signal features.
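A minimal sketch of this PCA feature-extraction stage, assuming a hypothetical two-parameter Gaussian response model in place of f(t, θ) and a power-iteration eigensolver in place of a full eigenvalue analysis; all constants are illustrative:

```python
import math, random

def model(theta, ts):
    # Hypothetical degrading-system response: a Gaussian bump whose
    # amplitude and location are the unknown parameters (stand-in for f(t, θ)).
    a, c = theta
    return [a * math.exp(-0.5 * ((t - c) / 2.0) ** 2) for t in ts]

def top_eigvecs(R, p, iters=1000):
    """Leading eigenvectors of a symmetric PSD matrix by power iteration
    with deflation (a minimal stand-in for a full eigensolver)."""
    n = len(R)
    R = [row[:] for row in R]
    vecs = []
    for _ in range(p):
        v = [1.0 / math.sqrt(n)] * n
        lam = 0.0
        for _ in range(iters):
            w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))
            v = [x / lam for x in w]
        vecs.append(v)
        for i in range(n):                 # deflate: R <- R - lam * v v^T
            for j in range(n):
                R[i][j] -= lam * v[i] * v[j]
    return vecs

random.seed(1)
ts = list(range(10))
samples = [model((random.uniform(0.5, 1.5), random.uniform(3, 6)), ts)
           for _ in range(400)]
n = len(ts)
mean = [sum(s[i] for s in samples) / len(samples) for i in range(n)]
R = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / len(samples)
      for j in range(n)] for i in range(n)]
E = top_eigvecs(R, p=2)                    # rows: the two leading eigenvectors
y = model((1.0, 4.5), ts)                  # a new "measurement" (noiseless here)
z = [sum(e[i] * (y[i] - mean[i]) for i in range(n)) for e in E]   # eq. (4)
```

Here the ten-dimensional measurement vector is compressed into a two-element feature vector z, which would then be fed to the estimating network.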

Fig. 1 shows that the features are applied to the input of an artificial neural network (ANN). The neural network considered here is a static system, e.g., an electronic analog/digital circuit, comprising a number of nonlinear processing units arranged in layers [26], [27]. The lowest layer is formed by the neural network input nodes whereas the uppermost layer is formed by its output nodes. Each processing unit in the internal (hidden) layers realizes a simple nonlinear function, most often sigmoidal (squashing), e.g., tanh(a). The input a

to any processing unit is a weighted sum of outputs from the layer below it. Referring back to Fig. 1, it is assumed that the output of the neural network is a continuous multivariable function of the signal features

$$\hat{\theta} = \mathbf{g}(\mathbf{z}, \mathbf{w}) \qquad (8)$$

where $\mathbf{w} = [w_1\; w_2\; \cdots\; w_q]^T$ is the vector of weights which represent internal connections between the layers. The network is thus a feedforward nonlinear system which approximates the mapping from the feature space to the parameter space. This inverse mapping is normally unknown; however, its input-output examples can be calculated using models of the signal and of the degrading system. In other words, for any particular parameter vector $\theta \in \Theta$ one can calculate the vector $\mathbf{f} = \mathbf{f}(\theta)$ and then, by using (4) with $\mathbf{y} = \mathbf{f}$, obtain $\mathbf{z} = \mathbf{z}(\theta)$, which is the neural network input. The network response is then obtained from (8). The desired network response is equal to $\theta$, and the estimated parameter

Fig. 1. Neural network-based estimation of signal parameters. (The original block diagram shows the processing chain, bottom to top: parameterised signal, measurements, signal features, estimated parameters.)

vector $\hat{\theta}$ should be as close to this target value $\theta$ as possible. Therefore the aim of the training process is to minimize a selected norm of the differences $\|\hat{\theta} - \theta\|$ over a set of examples $\{\theta_h \in \Theta,\; h = 1, 2, \ldots, N\}$. The minimization can be accomplished by the adjustment of the neural network weights. Finding an efficient training algorithm is one of the hottest research topics in the area of artificial neural network theory and application, see, e.g., [28]-[30]. In the numerical experiment described in the next section, the well-known backpropagation algorithm [26]-[30] was used to find initial weight estimates, followed by faster and more accurate training using nonlinear programming [14].

It has been proven that three-layer artificial neural networks are capable of approximating any multivariable function to any desired degree of accuracy, provided certain mild assumptions are satisfied regarding the transfer function of their processing units [26], [28], [31]. However, except for some simple cases, a neural network of a given size is unable to provide zero error of parameter estimation for every vector $\theta$ belonging to the parameter space, even if there is no measurement noise. This is caused by the dissimilarity between the approximated and approximating functions. There is therefore a bias involved with the proposed method of parameter estimation, caused by the inability of a finite-size network to exactly reconstruct the mapping of interest. Nevertheless, it has been demonstrated in [14] and [15], and will be shown later on as well, that the bias can be made practically negligible by increasing the number of processing units in the network.
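The training loop described above can be sketched on a toy problem. The one-parameter "feature" map z = θ³, the network size, and the plain stochastic-gradient loop are illustrative assumptions; the paper itself uses backpropagation followed by nonlinear programming:

```python
import math, random

def net(z, w):
    # Single-hidden-layer feedforward network with tanh units, one input
    # and one output: g(z) = u0 + sum_j u[j] * tanh(v0[j] + v1[j] * z).
    u0, u, v0, v1 = w
    return u0 + sum(uj * math.tanh(v0j + v1j * z)
                    for uj, v0j, v1j in zip(u, v0, v1))

def train(examples, m=8, lr=0.02, epochs=3000, seed=2):
    """Minimize the squared error |g(z) - theta|^2 over the training
    examples by stochastic gradient descent (plain backpropagation)."""
    random.seed(seed)
    u0 = 0.0
    u = [random.uniform(-0.5, 0.5) for _ in range(m)]
    v0 = [random.uniform(-0.5, 0.5) for _ in range(m)]
    v1 = [random.uniform(-0.5, 0.5) for _ in range(m)]
    for _ in range(epochs):
        for z, theta in examples:
            h = [math.tanh(v0[j] + v1[j] * z) for j in range(m)]
            err = u0 + sum(u[j] * h[j] for j in range(m)) - theta
            u0 -= lr * err
            for j in range(m):
                dh = err * u[j] * (1.0 - h[j] ** 2)   # backprop through tanh
                u[j] -= lr * err * h[j]
                v0[j] -= lr * dh
                v1[j] -= lr * dh * z
    return (u0, u, v0, v1)

# Toy inverse problem: the scalar "feature" is z = theta**3 on [-1, 1];
# the network learns the inverse mapping theta = g(z) (a cube root).
examples = [((k / 10.0) ** 3, k / 10.0) for k in range(-10, 11)]
w = train(examples)
mean_err = sum(abs(net(z, w) - theta) for z, theta in examples) / len(examples)
```

Once the weights w are fixed, each estimate is a single feedforward pass, which is the source of the speed advantage claimed above.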


Suppose $\mathbf{g}(\mathbf{z}^*) = \theta^*$ where $\mathbf{z}^*$ is defined by (4) for an observation vector $\mathbf{y} = \mathbf{f}^* = \mathbf{f}(\theta^*)$, and $\theta^*$ is an interior point of $\Theta$. Consider a small deviation $\Delta\theta$ of the parameter vector, such that $(\theta^* + \Delta\theta) \in \Theta$. It causes corresponding deviations of both $\mathbf{z}$ and $\mathbf{g}$. In particular, for the output node $g_k$, $k = 1, 2, \ldots, p$, one obtains using the Taylor series expansion [32]

Ask = g k - G P .c""1 8% z* 4

,=1 = Gkl A21 + GkzAZz + . . . + GkpAjZP (9)

where

az, = Z, - 2;

and

r l 0 . . . 0 1

Assume the Jacobian IZI exists. Premultiplying both sides of (15) by ( Z T ) - I , one obtains

G = (Z')-'. (17)

It follows from this first-order perturbation analysis that the function $\mathbf{g}(\cdot)$ can uniquely be determined if the feature sensitivity matrix $Z$ is nonsingular in the neighborhood of each point $\theta^* \in \Theta$. It should be noted that this is a necessary condition for the existence of this function. Whether a neural network of a given architecture can approximate it with a prescribed error is another issue.

The following sensitivity matrix can be defined for the measurements

$$F = [F_{ij}], \qquad F_{ij} = \left.\frac{\partial f(t_i, \theta)}{\partial \theta_j}\right|_{\theta^*}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p. \qquad (18)$$

By definition, the function $g_k(\cdot)$ should be equal to $\theta_k$, $g_k(\mathbf{z}^* + \Delta\mathbf{z}) = \theta_k^* + \Delta\theta_k$, and be independent of all the other parameters. Under the assumption of $\mathbf{g}(\mathbf{z}^*) = \theta^*$, the following condition holds:

$$\Delta g_k = \Delta\theta_k. \qquad (13)$$

Using (13) and (12) one can formulate the system of $p$ linear equations with $p$ unknowns $G_{k1}, G_{k2}, \ldots, G_{kp}$. Combining (18) with (4) gives the feature sensitivity matrix $Z = F^T E$, which substituted into (17) gives

$$G = (E^T F)^{-1}. \qquad (21)$$

Equation (21) will be used to derive the covariance matrix for the neural-network-based parameter estimator.

The additive noise $\varepsilon_i$ in (2) is modeled as a stationary, white, zero-mean Gaussian process, i.e., $E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T] = \sigma^2 I$, where $\boldsymbol{\varepsilon} = [\varepsilon_1\; \cdots\; \varepsilon_n]^T$ and $\sigma^2$ is the noise variance. The noise is uncorrelated with the signal. Consider the observation vector $\mathbf{y}^* = \mathbf{f}^* + \boldsymbol{\varepsilon}$ which, when substituted into (4), produces the following input to the neural network:

$$\mathbf{z} = \mathbf{z}^* + E^T\boldsymbol{\varepsilon}. \qquad (22)$$

Repeating the above discussion for $k = 1, 2, \ldots, p$ gives $p$ equations similar to (14). All of them can be combined together to obtain

$$Z^T G = I. \qquad (15)$$

Suppose that $\mathbf{g}(\mathbf{z}^*) = \theta^*$ at a point $\theta^* \in \Theta$. For a sufficiently small noise variance one can approximate the parameter estimate by the following expression:

$$\hat{\theta} = \mathbf{g}(\mathbf{z}^* + E^T\boldsymbol{\varepsilon}) \approx \theta^* + G^T E^T \boldsymbol{\varepsilon}. \qquad (23)$$

The covariance matrix of the estimate $\hat{\theta}$ can be obtained as

$$\mathrm{cov}[\hat{\theta}] = E[(\hat{\theta} - \theta^*)(\hat{\theta} - \theta^*)^T] \approx E[(G^T E^T \boldsymbol{\varepsilon})(G^T E^T \boldsymbol{\varepsilon})^T] = \sigma^2\, G^T E^T E\, G \qquad (24)$$


where

$$A = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p). \qquad (25)$$

Using (21) one obtains from (24)

$$\mathrm{cov}[\hat{\theta}] = \sigma^2 (E^T F A F^T E)^{-1}. \qquad (26)$$

This result can be compared to the covariance matrix of a least-squares parameter estimate $\hat{\theta}_{LS}$ based on the same measurement vector. That matrix, for large $n$, is described by the following expression [7]:

$$\mathrm{cov}[\hat{\theta}_{LS}] = \sigma^2 (F^T F)^{-1}. \qquad (27)$$

The matrix $F^T F$ is called the information matrix for $\theta$ [33]. Since most of the information carried by the measurements is preserved by the transformation (4), one can expect that the proposed estimator is not much different from a least-squares estimator in terms of their parameter covariance matrices. This is indeed the case in practice, provided the inverse mapping of interest is accurately approximated by the neural network. This problem will be further discussed in the following section. (It should be noted here that (26) is valid for any number of observations $n > p$, whereas (27) is derived as a limit for $n \to \infty$. Thus their similarity is approximate only. More investigation is planned into the comparison of the proposed estimator to the least-squares [7] and total-least-squares [41] parameter estimators, including the high noise level case.)
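The covariance formula (27) can be checked by Monte Carlo simulation on its simplest scalar instance; the sensitivity vector f and the noise level below are arbitrary choices for illustration. For $y_i = \theta f_i + \varepsilon_i$ the LS estimate is $\hat{\theta} = (f^T y)/(f^T f)$ and (27) reduces to $\sigma^2 / (f^T f)$:

```python
import random

# Scalar instance of (27): y_i = theta * f_i + eps_i, with hypothetical
# model sensitivities f_i and noise standard deviation sigma.
f = [0.2 * i for i in range(1, 11)]
theta_true, sigma = 1.5, 0.1
ftf = sum(v * v for v in f)
predicted_var = sigma ** 2 / ftf           # the right-hand side of (27)

random.seed(3)
est = []
for _ in range(20000):
    y = [theta_true * v + random.gauss(0, sigma) for v in f]
    est.append(sum(v * yi for v, yi in zip(f, y)) / ftf)
mean_est = sum(est) / len(est)
emp_var = sum((e - mean_est) ** 2 for e in est) / (len(est) - 1)
```

The empirical variance of the 20000 estimates should match the predicted value to within a few percent, and the sample mean should show no bias.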

The formula (26) can also be written as

$$\mathrm{cov}[\hat{\theta}] = \sigma^2 (Z^T A Z)^{-1}. \qquad (28)$$

The larger the determinant $|Z^T A Z|$, the smaller the volume of the confidence ellipsoids [33] and the lower the uncertainty of the parameter estimates. The maximization of this determinant by a proper selection of observation points can be used for experiment design [33], [34].

III. RESULTS

To support the analytical results of the previous section

and to demonstrate the practical usefulness of the proposed approach to the parameter estimation problem, three numerical examples are presented. The examples are selected to cover various aspects of biomedical engineering. They range from laboratory measurements using a spectrophotometer, through human electroretinogram modeling by a nonlinear dynamical system impulse response, to subcutaneous tissue temperature measurement using a multifrequency microwave radiometer.

A. Spectrometric Signal Deconvolution

This particular example has already been discussed in a

number of papers [13], [35], [36] to illustrate the performance of iterative methods of signal deconvolution. Its selection here will give a better understanding of the properties of the neural network-based method of signal parameter estimation in the context of the previous related work performed by other research groups. Fig. 2 shows the two signals of interest: x(t)

Fig. 2. Computer simulated spectrophotometric signals: the input x(t) and the blurred, noisy output y(t).

and y( t ) , calculated using (1). The ideal signal z( t ) comprises two Gaussian components

$$x(t) = \frac{10\,\theta_1}{\sqrt{2\pi}\,\theta_2} \exp\left[-\frac{1}{2}\left(\frac{t - 10\,\theta_3}{\theta_2}\right)^2\right] + \cdots \qquad (29)$$

each dependent on three parameters. The parameters of the second component are fixed for the purpose of this demonstration, whereas the parameters of the first component are unknown. Parameter $\theta_1$ represents one-tenth of the amplitude of the first spectral component, $\theta_2$ is the component width, and $\theta_3$ corresponds to the spectral component location on the axis of the independent variable t. Their actual values belong to the following parameter space:

$$\Theta = \{\,0.5 \le \theta_1 \le 1.5,\; 2 \le \theta_2 \le 4,\; 4 \le \theta_3 \le 5\,\}. \qquad (30)$$

The signal x(t) is plotted in Fig. 2 for the nominal values of its parameters, $\theta_0 = [1.0,\; 3.0,\; 4.5]^T$. The number of parameters is limited to p = 3 in this example, to make the presentation simpler and the computer calculations shorter. The degrading system impulse response h(t) is also assumed in the form of a Gaussian function in this example

$$h(t) = \frac{3}{8} \exp\left[-\frac{1}{2}\left(\frac{t}{8}\right)^2\right]. \qquad (31)$$

Equations (1), (29), and (31) were used to simulate the signal observations with normally distributed pseudorandom noise added. The signal y(t) plotted in Fig. 2 corresponds to a noise standard deviation of $\sigma = 0.02$. The noiseless signal measurements, obtained for a set of different parameter values spanning the range (30), are shown in Fig. 3.

The problem is to design the neural network to estimate the signal parameters given a set of noisy measurements y(t). Observation points have to be selected first. Although other approaches are possible, it is assumed that equidistant samples of y(t) are taken at the following values of the variable t:

$$t_i = C_0 + C_1\,\bigl(i - \mathrm{trunc}(n/2 + 1)\bigr), \qquad i = 1, 2, \ldots, n. \qquad (32)$$
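The observation model (1) sampled according to (32) can be sketched numerically for a single Gaussian spectral line blurred by a Gaussian instrument response. The kernel width and normalization below are illustrative assumptions rather than the paper's exact constants in (29) and (31):

```python
import math, random

def gaussian(t, amp, width, center):
    return amp * math.exp(-0.5 * ((t - center) / width) ** 2)

def observe(theta, n=9, c0=45.0, c1=3.0, sigma=0.02, seed=4):
    """Discrete sketch of (1) and (32): blur a Gaussian spectral line with a
    unit-area Gaussian instrument response and sample n equidistant noisy
    points. Parameterization (amplitude 10*a, width w, location 10*loc) and
    the kernel width are illustrative assumptions."""
    a, w, loc = theta
    random.seed(seed)
    dt = 0.25
    grid = [k * dt for k in range(0, 400)]         # fine grid for the integral
    kernel_w = 2.0                                  # assumed h(t) width
    ts = [c0 + c1 * (i - (n // 2 + 1)) for i in range(1, n + 1)]   # eq. (32)
    ys = []
    for t in ts:
        acc = sum(gaussian(t - tau, 1.0, kernel_w, 0.0) *
                  gaussian(tau, 10 * a, w, 10 * loc) for tau in grid) * dt
        acc /= kernel_w * math.sqrt(2 * math.pi)    # normalize h to unit area
        ys.append(acc + random.gauss(0, sigma))
    return ts, ys

ts, ys = observe((1.0, 3.0, 4.5))
```

With the nominal parameters and C0 = 45, C1 = 3, n = 9, the samples fall on t = 33, 36, ..., 57, with the blurred peak near the middle of the window.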

Suppose n = 9 is a sufficient number of observations, i.e., it is the number that gives a sufficient reduction of the noise-induced estimation error as compared to the error that could be observed for n = p = 3. As discussed, e.g., in [33], the


Fig. 3. Signal y(t) simulated for σ = 0, for parameter vectors (θ1, θ2, θ3) = (0.5, 3.0, 4.5), (1.5, 3.0, 4.5), (1.0, 3.0, 4.0), (1.0, 3.0, 5.0), (1.0, 2.0, 4.5), and (1.0, 4.0, 4.5).

determinant of the information matrix can be maximized in the search for the optimum observation points. The following criterion is suggested for the purpose

$$\phi_0(F) = \sqrt{|F^T F|}. \qquad (33)$$

Fig. 4 shows the dependence of the function (33) on the constants $C_0$ and $C_1$ in (32), calculated for the nominal parameter vector $\theta_0$. The criterion function reaches its maximum at $C_0 = 45$ and $C_1 = 3$. It follows then that the optimum observation interval covers the range from t = 33 to t = 57, for the nominal parameter values and n = 9. One can expect that this range may not be optimal for other values of the unknown parameters and for different numbers n. Nevertheless, the values $C_0 = 45$ and $C_1 = 3$ were used to find the measurement covariance matrix R using (1)-(3), (5), and (32). The matrix R was estimated assuming that the elements of the parameter vector were random variables, uniformly distributed over the parameter space (30). The largest eigenvalues of the covariance matrix were equal, respectively, to

λ1 = 0.345, λ2 = 0.243, λ3 = 0.048. (34)

The corresponding eigenvectors are plotted in Fig. 5. They were used to construct the projection matrix (7) for the PCA network shown in Fig. 1.
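The covariance estimation and eigenvector projection described above can be sketched as follows. The three-parameter signal model here is a hypothetical stand-in, since the paper's (1)-(3) are not reproduced in this excerpt; only the PCA recipe itself is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signal standing in for the paper's model (1)-(3).
def signal(theta, t):
    th1, th2, th3 = theta
    return th1 * np.exp(-((t - 10.0 * th2) / (2.0 * th3)) ** 2)

t = 45 + 3 * (np.arange(1, 10) - 5)          # n = 9 points from (32)

# Monte Carlo estimate of the measurement covariance matrix R, drawing
# the parameters uniformly over a box-shaped parameter space as in (30).
lo = np.array([0.5, 2.0, 4.0])
hi = np.array([1.5, 4.0, 5.0])
Y = np.array([signal(rng.uniform(lo, hi), t) for _ in range(2000)])
R = np.cov(Y, rowvar=False)

# The eigenvectors belonging to the p largest eigenvalues of R form the
# columns of the projection matrix E; the features are z = E^T y.
w, V = np.linalg.eigh(R)
E = V[:, np.argsort(w)[::-1][:3]]            # n x p projection matrix
z = E.T @ Y[0]                               # feature vector of one record
```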

Next, the feature sensitivity matrix Z was calculated on a dense grid of points covering the parameter space. The corresponding surface plots showing the dependence of the criterion function φD(Z) on the signal parameters are presented in Fig. 6. The determinant |Z| appears to be a smooth function of the parameters. It does not reach zero within the parameter space. One concludes that the inverse function g(z) exists. The calculated examples {θ, z(θ)} can be used for the neural network training. Fig. 7 demonstrates how the signal feature z1 varies with the parameters in the present example. For the signal and parameter space under consideration, the features span the following range of values:

z1 ∈ [-1.7, 1.6], z2 ∈ [-2.3, 2.3], and z3 ∈ [-0.7, 2.8]. (35)

A single-hidden-layer feedforward neural network with sigmoidal processing units was used to verify the proposed parameter estimation technique. This is a well-known neural network architecture, discussed by many authors, e.g., [26]-[30].

Fig. 4. Determinant criterion φD(F) as a function of C0 and C1 in (32).


Fig. 5. The eigenvectors of the data covariance matrix corresponding to its three largest eigenvalues.

The output of this network at node k, k = 1, 2, ..., p, is described by

g_k(z) = u0 + Σ_{j=1..m} uj ζ(vj0 + Σ_{i=1..p} vji zi)   (36)

where w = [w1, w2, ..., wq]^T = [u0, u1, ..., um, v10, v11, ..., vmp]^T is the q-element weight vector associated with the output node k, q = (p + 2)m + 1, m is the number of processing units in the hidden layer, and ζ(a) denotes a sigmoidal function of the variable a. There is generally a freedom in selecting the particular form of this "squashing" function [31]. The functions 1/(1 + exp(-a)) or tanh(a) are perhaps the most popular ones in the literature. However, calculation of their values involves evaluation of the exponential function, which is computationally intensive. The following piecewise function is proposed in this study:

ζ(a) = { -1,             a < -1
       { (3a - a^3)/2,   -1 ≤ a ≤ 1        (37)
       { 1,              a > 1.
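A direct implementation of the squashing function (37) and of the network output (36) can be sketched as follows; the function names are ours.

```python
import numpy as np

def zeta(a):
    """Piecewise sigmoid of (37): saturation at -1 and +1, with the
    cubic segment (3a - a**3)/2 in between (reconstructed form)."""
    a = np.asarray(a, dtype=float)
    return np.where(a < -1, -1.0,
                    np.where(a > 1, 1.0, 0.5 * (3.0 * a - a ** 3)))

def network_output(z, u, V):
    """Single-hidden-layer output (36):
    g_k(z) = u0 + sum_j u_j * zeta(v_j0 + sum_i v_ji * z_i),
    with u of length m+1 and V of shape (m, p+1), biases in column 0."""
    hidden = zeta(V[:, 0] + V[:, 1:] @ np.asarray(z, dtype=float))
    return u[0] + u[1:] @ hidden
```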

The time of calculation of a value of 1/(1 + exp ( -a)) is approximately three times longer than the time needed to


Fig. 6. Determinant criterion φD(Z) as a function of the signal parameters: (a) θ3 = 4.5, (b) θ2 = 3.0, and (c) θ1 = 1.0.

Fig. 7. Feature z1 as a function of the signal parameters: (a) θ3 = 4.5, (b) θ2 = 3.0, and (c) θ1 = 1.0.

obtain a value of ζ(a) based on (37), for a PC386 computer. No significant difference in the neural network capability to approximate a continuous nonlinear mapping was observed when the two functions were tested on a number of selected benchmarks. In the numerical example discussed, the following sum:

E_k(w) = Σ_{l=1..N} [g_k(z^(l)) - θk^(l)]^2   (38)

was minimized in w, for each of the unknown parameters, k = 1, 2, 3, respectively. A regular lattice of points was defined in the parameter space (30) with K_r points for parameter θr, r = 1, 2, 3:

θ1i = 0.5 + (i - 1)/(K1 - 1), i = 1, 2, ..., K1
θ2j = 2.0 + 2.0(j - 1)/(K2 - 1), j = 1, 2, ..., K2    (39)
θ3k = 4.0 + (k - 1)/(K3 - 1), k = 1, 2, ..., K3.

For the training, K1 = 6, K2 = 12, and K3 = 12 were assumed, giving a total of N = K1K2K3 = 864 example vectors

θ^(l) = [θ1i, θ2j, θ3k]^T, l = K2K3(i - 1) + K3(j - 1) + k. (40)

More examples were required for parameters θ2 and θ3 than for θ1. This reflects the fact that parameter θ1 is a smooth function of the features z, unlike the other two parameters. Surface plots of θ2 as a function of the features are shown in Fig. 8. The

problem of example point selection for the neural network training has been discussed in more detail in [38]. To test the neural network performance after training, a denser grid of N = 27000 points was used, corresponding to K1 = K2 = K3 = 30 in (39) and (40).
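The training and testing lattices defined by (39) and (40) can be generated as follows; the helper name is ours.

```python
import numpy as np

def parameter_lattice(K1, K2, K3):
    """Regular grid over the parameter space (30) following (39); the
    row index corresponds to l = K2*K3*(i-1) + K3*(j-1) + k of (40)."""
    th1 = 0.5 + np.arange(K1) / (K1 - 1)         # theta1 in [0.5, 1.5]
    th2 = 2.0 + 2.0 * np.arange(K2) / (K2 - 1)   # theta2 in [2.0, 4.0]
    th3 = 4.0 + np.arange(K3) / (K3 - 1)         # theta3 in [4.0, 5.0]
    return np.array([[a, b, c] for a in th1 for b in th2 for c in th3])

train = parameter_lattice(6, 12, 12)   # N = 864 training examples
test = parameter_lattice(30, 30, 30)   # N = 27000 testing points
```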

As expected, the approximation error decreased with the number of processing units m in the hidden layer. Table I compares results obtained for networks with m = 5 and m = 10 sigmoidal units. The time necessary for the training increased with m and was equal to about one hour and four hours of calculations using a PC486/66 MHz computer, respectively, for m = 5 and m = 10. The maximum absolute error of parameter estimation obtained for m = 10 ranged from 0.18% of the nominal parameter value for θ1, through 0.35% for parameter θ3, to 5.2% for parameter θ2. This can be considered a very good accuracy for many practical applications. Nevertheless, looking for other neural network architectures which would provide better approximation accuracy at a lower number of processing units and adjustable weights is a worthwhile research topic.

For m = 5 and a number of inputs equal to three, the total number of mathematical operations required to obtain the estimate of a signal parameter is 4m = 20 multiplications and additions. The value of the sigmoid function has to be evaluated m times as well. This operation can, however, be made very fast using a look-up table ROM, so that the time necessary to perform it can be neglected. On top of that, n



Fig. 8. Surface plots of the functions g2(z) realized by neural networks of m = 10 processing units: (a) z3 = 0, (b) z2 = 0, and (c) z1 = 0.

TABLE I
ROOT MEAN SQUARE (δrms) AND MAXIMUM ABSOLUTE (δmax) ESTIMATION ERROR FOR THE PARAMETERS OF SIGNAL (29), CALCULATED ON THE TESTING SET FOR NOISELESS MEASUREMENTS

parameter | m  | δrms   | δmax
θ1        | 5  | 0.0021 | 0.0092
θ2        | 5  | 0.0983 | 0.5440
θ3        | 5  | 0.0063 | 0.0337
θ1        | 10 | 0.0003 | 0.0018
θ2        | 10 | 0.0345 | 0.1570
θ3        | 10 | 0.0037 | 0.0158

multiplications and n - 1 additions are needed to calculate the signal features. The total estimation time could then be of the order of a few microseconds if a DSP chip is used for the neural network implementation. It still takes a small fraction of a second for a PC to perform the same task. This compares very well with the performance of a PC program for least-squares nonlinear model fitting, which needs tens of seconds or a few minutes to estimate the signal parameters for the same measurements, depending on the noise level and on the starting point.
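The look-up table realization of the squashing function mentioned above can be sketched as follows; the table size of 1024 entries is an arbitrary choice of ours.

```python
import numpy as np

# Precomputed "ROM" holding zeta(a) of (37) on a uniform argument grid.
A = np.linspace(-1.0, 1.0, 1024)
LUT = 0.5 * (3.0 * A - A ** 3)

def zeta_lut(a):
    """Evaluate the sigmoid by table look-up: clip the argument to the
    non-saturated range, convert it to an index, and read the table."""
    a = min(max(a, -1.0), 1.0)
    idx = int(round((a + 1.0) * (len(LUT) - 1) / 2.0))
    return LUT[idx]
```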

To evaluate the effect of noise on the estimation error, the parameter vector θ was randomly drawn from the parameter space with a uniform probability distribution. Pseudorandom numbers were then added to the corresponding vector f to simulate the Gaussian measurement noise of standard deviation σ. Next, signal features were found using (4) for each of the random parameter vectors and were applied to the input of the trained neural networks of m = 10. The networks produced the parameter values with some error, due to noise and bias. The parameter vector random drawing was repeated 500 times for each selected value of the noise standard deviation. The estimation error mean value and its standard deviation were found for such a population of random parameter vectors. For the comparison, the parameters were estimated using a least-squares model fitting to the same measurement vectors that were used for feature calculation. In each case, the error mean value was about 10 times less than its corresponding standard deviation, so the estimators under consideration were

Fig. 9. Noise-induced error of the parameter estimates for neural networks (NN) and for the least-squares (LS) model fitting.

practically unbiased. The results are plotted in Fig. 9. For low noise values, σ < 3.16e-4 for θ1, σ < 1e-3 for θ2, and σ < 3.16e-3 for θ3, the least-squares method gives a lower noise-induced error than the neural network does. This is caused by the limited-size neural network's inability to approximate the considered mapping from the feature space to the parameter space with a zero global error. In other words, the low-noise portions of the "NN" curves in Fig. 9 represent the bias of the estimators. The actual values of the error standard deviation compare well with the respective "rms" entries in Table I. For higher noise levels, the bias of the neural network-based estimates can be neglected when compared to their noise-induced deviations, and the curves representing the two estimators, NN and LS, coincide.

The sample parameter covariance matrix was estimated for the nominal parameter vector θ0. The measurement vector was calculated 1000 times with a simulated Gaussian zero-mean random noise added, of standard deviation σ = 0.001. Least-squares parameter estimates were also calculated using each of the measurement vector realizations. The corresponding features (4) were applied to the input of the neural network-based estimator. The nominal parameter values were, respectively, subtracted from the estimated parameter values. The NN estimator bias was identified before that and subtracted from the results as well, being equal to 4.027e-5, 5.089e-2, and 4.297e-3, respectively, for θ1 = 1.0, θ2 = 3.0, and θ3 = 4.5.


All these results were combined together to estimate the parameter covariance matrices

cov(θ^) = σ² [  2.07    45.8    0.25
               45.8   1863.7   14.6
                0.25    14.6    1.97 ]   (41)

cov(θ~) = σ² [  2.09    49.2   -0.02
               49.2   2096.1   -0.64
               -0.02    -0.64   1.57 ]

respectively, for the neural network-based estimator and for the least-squares fitting. On the other hand, the two matrices (26) and (27) were, respectively, equal to

(E^T F Λ F^T E)^(-1) = [ 2.07   48.4    0
                         48.4    …      0      (42)
                          0      0     1.59 ]

There is a discrepancy between the theoretical and experimental covariance matrices for the ANN-based estimator. It can be explained by the fact that the neural networks actually approximate the inverse mapping θ(z) by a function g(z). If the approximation error is different from zero, then there are nonzero differences between the derivatives of the approximated and approximating functions also. Following (24), the derivatives of the function g(z) with respect to the features define the actual parameter covariance matrix. The elements of G can be derived from (36) as follows:

Gki = ∂gk/∂zi = Σ_{j=1..m} uj ζ′ vji   (43)

where ζ′ = dζ/da is evaluated at a = vj0 + Σ_{i=1..p} vji zi. Using (43) calculated for the nominal parameter vector, one can obtain the following values of the elements of the discussed matrix:

G^T R^(-1) G = [  2.05    45.3    0.28
                 45.3   1837.3   15.1     (44)
                  0.28    15.1    2.01 ]

They are in good agreement with the values obtained from the random noise effect simulation; cf. (41).
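The derivative matrix (43) can be evaluated directly from the trained weights; the derivative of the piecewise sigmoid (37) used below is 1.5(1 - a²) on [-1, 1] and zero outside, and the function names are ours.

```python
import numpy as np

def zeta_prime(a):
    """Derivative of the piecewise sigmoid (37)."""
    a = np.asarray(a, dtype=float)
    return np.where(np.abs(a) <= 1.0, 1.5 * (1.0 - a ** 2), 0.0)

def sensitivity_matrix(z, nets):
    """Matrix G of (43): G[k, i] = sum_j u_j * zeta'(a_j) * v_ji, with
    a_j = v_j0 + sum_i v_ji * z_i. Each output k is one trained network
    given as a pair (u, V), biases in u[0] and V[:, 0]."""
    rows = []
    for u, V in nets:
        a = V[:, 0] + V[:, 1:] @ z               # hidden-unit activations
        rows.append((u[1:] * zeta_prime(a)) @ V[:, 1:])
    return np.array(rows)
```

Given G and the feature covariance R, the matrix compared in (44) then follows by direct matrix algebra.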

B. Parameter Identification of a Nonlinear System

As discussed in [42], dynamics of the complex processes that take place in the human retina can be studied noninvasively by recording a potential, the electroretinogram (ERG), from the eye. The ERG signal is measured in response to a light flash using a contact-lens electrode placed on the cornea. This class of signals has been modeled as an impulse response of the nonlinear system shown in Fig. 10 [42]. This system is a cascade of n identical single-pole low-pass filters, each having

Fig. 10. Block diagram of a nonlinear model of human eye rod receptor activity.

the equivalent time constant θ1, connected to a nonlinear memoryless block whose transfer function is

where θ1, θ2, and θ3 are the model parameters. The output of the nonlinear block is the signal f(t), which is contaminated by noise and cannot be measured directly. Thus noisy ERG signal recordings are used to estimate the model parameters. The averaging technique has been applied to increase the signal-to-noise ratio of ERG waveforms [5], [42]. However, averaging was not sufficient to estimate the parameters θ1 and θ2 as independent quantities due to a high correlation between them [42]. Therefore a constant value of θ1 was assumed in [42] and the other two parameters have been found by model fitting. (Based on separate investigation, the number of linear stages of the model was also fixed, to n = 4 [42].)

Signal averaging is a technique which works well provided signal parameters do not change during the time of data recording. This assumption is likely to be violated, for two main reasons. One corresponds to possible changes in the experimental setup, e.g., caused by object movement, that may affect the observations. The other may be related to the fact that the biological system under investigation (the retina in this example) is likely to change characteristics of its response if it is repeatedly excited by the same stimuli. Therefore, there is a need for an estimation technique which would produce robust parameter estimates given noisy signal observations. We will show using the model of Fig. 10 that the neural network-based technique is superior to LSE model fitting in this sense. One should note, however, that the examples elaborated concern simulated signals and further work is necessary to investigate potential clinical and diagnostic applications of the technique under consideration.

One can show, by applying an appropriate Laplace transform pair, that the impulse response of the cascade of n low-pass stages in Fig. 10 can be expressed as

where k_c ≤ 1.0 is the ratio of the light flash energy to the maximum flash energy under consideration. Following [42], the response has been normalized to have its peak value equal to 1.0 for k_c = 1.0. Substituting (46) into (45) allows us to calculate the model response. A family of noiseless responses calculated for the "nominal" parameter values of θ1 = 189 ms, θ2 = 2e-5, and θ3 = 180 μV is shown in Fig. 11.
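For illustration, the impulse response of a cascade of n identical single-pole stages has the gamma-like closed form sketched below. The per-stage time constant and the unit-peak normalization are our assumptions, since (45) and (46) are not reproduced verbatim in this excerpt.

```python
import math
import numpy as np

def cascade_response(t, n=4, tau=2.0, kc=1.0):
    """Impulse response of n identical single-pole low-pass stages with
    per-stage time constant tau, scaled by the flash-energy ratio kc and
    normalized so that the kc = 1 response peaks at 1.0. The peak of
    (t/tau)**(n-1) * exp(-t/tau) occurs at t = (n-1)*tau."""
    t = np.asarray(t, dtype=float)
    peak = (n - 1) ** (n - 1) * math.exp(-(n - 1))
    return kc * (t / tau) ** (n - 1) * np.exp(-t / tau) / peak
```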

For a fixed flash energy, the leading edges of the respective curves shown in Fig. 11 can be measured and interpreted as rod receptor activity [42]. The relevant portions of the recorded


Fig. 11. A family of impulse responses of the nonlinear system (45) and (46), calculated for k_c ranging from 1:1 to 1:64.

signal are used for parameter estimation. Suppose two values of the flash energy are selected for the simulated experiment, namely corresponding to k_c = 1.0 and to k_c = 0.125, for which the response leading edges are defined over the periods, respectively, from t = 1 ms to t = 8 ms and from t = 9 ms to t = 15 ms. Assume also that the model parameters may deviate within the range of ±20% from their nominal values. Further expansion of the parameter space is possible if required, at the expense of the ANN size required. The leading edges calculated for a few combinations of model parameter values are plotted in Fig. 12. They form the observation signal in the present example. The sample data covariance matrix was calculated assuming uniform parameter distribution. The first ten eigenvalues of this matrix are plotted in Fig. 13 in descending order. Eigenvectors corresponding to up to the five first eigenvalues were used to construct the transformation matrix E. Three neural networks were trained using responses contaminated by simulated Gaussian noise by applying the procedure described in the previous example. It was found that the parameter identification accuracy increased with the number of eigenvectors considered and the number of ANN processing elements (PE) in the hidden layer. A maximum absolute testing error of less than 1% was obtained for five-input networks with m = 3 sigmoidal processing units each, for both parameters θ1 and θ3. To obtain the same accuracy for θ2, the number of processing units had to be increased to m = 5.

The ANNs of five inputs and three PEs each were trained again using the observations contaminated by simulated additive Gaussian noise of zero mean and standard deviation equal to 2 μV. The networks were then tested for estimation accuracy using noisy signal observations with varying noise standard deviation, from 0 to 5 μV. The same data were used to estimate the model parameters using the LSE technique. For each testing noise variance, 30 experiments were simulated, each involving a random parameter draw, noisy response calculation, parameter estimation using the two methods, and error calculation. The mean, standard deviation, and maximum absolute errors were estimated. To find their mean and standard deviation, each round of the 30 experiments was repeated 30 times. The results are summarized in Table II and Table III. One can see that the LSE technique produces lower errors in the noiseless case, as expected. However, at the increased testing noise variance, the ANN-based technique is superior to

Fig. 12. Leading edges of calculated nonlinear model responses. nom: θ1 = 189, θ2 = 2e-5, θ3 = 180; t1-: θ1 = 151.2, θ2 = 2e-5, θ3 = 180; t1+: θ1 = 226.8, θ2 = 2e-5, θ3 = 180; t2-: θ1 = 189, θ2 = 1.6e-5, θ3 = 180; t2+: θ1 = 189, θ2 = 2.4e-5, θ3 = 180; t3-: θ1 = 189, θ2 = 2e-5, θ3 = 144; t3+: θ1 = 189, θ2 = 2e-5, θ3 = 216.


Fig. 13. First 10 eigenvalues of the covariance matrix calculated for the nonlinear system response.

LSE model fitting. The differences are especially dramatic in the case of parameters θ1 and θ2. This demonstrates that the proposed parameter estimation technique is not only suitable for real-time applications but can also be advantageous in cases where data are contaminated by random noise. This applies to both linearly and nonlinearly distorted signals. Although a memoryless nonlinearity has been investigated in the present example, the method is applicable to systems that contain energy-storing nonlinear elements as well [15].

C. Retrieval of Temperature-Depth Profile in Biological Objects From Multifrequency Microwave Radiometric Measurements

Possibilities of applying microwave radiometry to noninvasive estimation of subcutaneous tissue temperature have recently been investigated by a number of authors; see, e.g., [43]. There are two main applications of the technique: detection of breast cancer and control of hyperthermia of cancers. The thermal radiation from subcutaneous tissues is measured over a number of microwave frequency bands by using an antenna placed on the skin surface [44]; see Fig. 14. The microwave radiometry responds to the temperature averaged over the field pattern of the antenna with very strong weighting of regions near the surface [43]. Since microwave power attenuation in living tissues depends on frequency, the weighting is different


TABLE II
NONLINEAR SYSTEM PARAMETER ESTIMATION UNDER NOISY CONDITIONS: ERRORS OF θ1

σ (μV) | mean ANN    | mean LSE   | st. dev. ANN | st. dev. LSE | max ANN    | max LSE
0      | -0.004±0.02 | 0.0±0.0    | 0.11±0.009   | 0.0±0.0      | 0.20±0.013 | 0.0±0.0
1      | 0.003±0.02  | 0.175±0.12 | 0.1±0.012    | 0.8±0.17     | 0.20±0.017 | 2.6±1.0
2      | -0.005±0.02 | 0.82±0.49  | 0.1±0.008    | 1.56±0.62    | 0.21±0.017 | 5.9±3.2
3      | …           | …          | 0.11±0.009   | 2.43±1.03    | 0.21±0.022 | 10.1±5.6
4      | 0.005±0.02  | 1.39±0.61  | 0.12±0.013   | 3.96±1.69    | 0.23±0.028 | 16.5±8.6
5      | 0.003±0.02  | 1.63±0.84  | 0.12±0.011   | 4.36±1.87    | 0.24±0.044 | 18.4±10.2

for each of the radiometer bands. Thus a set of brightness temperatures measured over several frequency bands contains the information about the temperature-versus-depth profile in tissue. A method of retrieving the profile has been proposed in [44]. It combines least-squares model fitting and Monte Carlo techniques to estimate the temperature confidence interval as a function of depth.

It is well known that radiometric data fluctuate randomly. The resulting uncertainty of the apparent brightness temperature can be reduced by increasing the time of radiometer response integration, at least in principle [44]. However, during an increased observation period of a biological object whose temperature profile is investigated in vivo, the assumption of signal stationarity is likely to become violated. Therefore, the integration time is kept short, which causes an increase in brightness temperature variance. To cope with this situation, there is a need for a method of radiometric signal processing able to robustly estimate the temperature profile based on noisy data. We will present results of a computer simulation which demonstrate that the ANN-based estimation can feature the required characteristic when applied to this task.

We assume that a mathematical model is known which describes adequately the temperature profile as a function of depth [44]. The objective is to estimate the unknown parameters of the model based on multifrequency radiometric data. For the sake of simplicity we consider here the following three-parameter model:

T(d) = T_w,  -l_b ≤ d < 0
T(d) = T_0 + θ1[exp(-d/θ2) - exp(-d/θ3)],  0 ≤ d < ∞  (47)

Fig. 14. Illustration of the idea of applying microwave radiometry to the measurement of subcutaneous tissue temperature in biological objects, in two bands centred at frequencies f_B1 and f_B2.

where T_w and T_0 are known from independent temperature measurements and represent the water temperature in the bolus and the skin surface temperature, respectively, and d denotes the


[ 0.26  -1.61  -19.2 ]
[ 0.21  -0.16   31.3 ]
[ 0.17   1.22    8.3 ]   (50)
[ 0.12   1.23   -6.3 ]
[ 0.09   1.14  -24.7 ]

Fig. 15. Weighting functions in (48).

depth. For l_b = 10 mm, l_s = 1.5 mm, l_f = 10 mm, T_w = 22 °C, and T_0 = 24 °C, the values θ1 = 25 °C, θ2 = 100 mm, and θ3 = 7 mm provide the best fit of model (47) to the profiles studied in [44] and [45]. There is a loss of accuracy, of about 1 °C, when a three- instead of a four-parameter [44], [45] model is used to describe the temperature-depth profile. However, the three-parameter model was sufficient to investigate ANN performance when applied to the temperature profile retrieval problem. The smaller number of parameters made the ANN training and testing shorter in time. The requirements for the computer memory capacity needed to carry out the simulation have been relaxed as well. This helped to produce more data for the new technique evaluation at this stage. Further work is planned on training the ANNs to estimate more accurate, four- and five-parameter models of the temperature-depth profile.

A weighted integral of the profile described by (47) produces the brightness temperature observed at the radiometer output, as follows:

T_Bi = C_i ∫_0^∞ W_i(d) T(d) dd,  i = 1, 2, ..., M  (48)

where i is the radiometer frequency band index and C_i, i = 1, 2, ..., M, are known constants. Fig. 15 shows a family of weighting functions, calculated using [44, (A1)-(A10)], for the bolus-skin-fat-muscle structure at M = 5 frequencies: 1.2, 1.8, 2.5, 2.9, and 3.6 GHz, each corresponding to the centre frequency of a 400-MHz-wide band of a radiometer constructed for the purpose of temperature profile retrieval [44].

In the computer simulation, a trapezoidal rule was used to perform numerically the integration (48) for each frequency band of interest, resulting in the noiseless observations f_i = T_Bi, i = 1, 2, ..., 5. It was assumed that the model parameters can take any value from the following parameter space:

θ1 ∈ [20, 30] °C, θ2 ∈ [70, 130] mm, and θ3 ∈ [4, 10] mm. (49)
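The trapezoidal evaluation of the weighted integral (48) can be sketched as below. The exponential weighting functions are illustrative stand-ins for those of [44, (A1)-(A10)], and the decay depths are our own choices.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def profile(d, T0, th1, th2, th3):
    """Three-parameter temperature-depth profile (47) for d >= 0."""
    return T0 + th1 * (np.exp(-d / th2) - np.exp(-d / th3))

d = np.linspace(0.0, 150.0, 3001)        # depth grid, mm
T0 = 24.0                                # skin surface temperature, deg C
theta = (25.0, 100.0, 7.0)               # nominal parameter vector

def brightness(decay_mm):
    """Brightness temperature as a weighted average of the profile, with
    an assumed exponential weighting of the given decay depth (a shallower
    decay mimics a higher radiometer frequency)."""
    W = np.exp(-d / decay_mm)
    W = W / trapezoid(W, d)              # normalize the weighting to 1
    return trapezoid(W * profile(d, T0, *theta), d)

TB = [brightness(s) for s in (25.0, 15.0, 10.0, 8.0, 6.0)]   # five bands
```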

For the nominal parameter vector, equal to [25, 100, 7]^T, the observation vector as calculated from (48) was equal to [29.05, 25.56, 24.87, 24.66, 24.35]^T °C. Fig. 16 shows a family of curves representing the temperature-depth profile (47) in the range d ∈ [0, 50] mm, calculated for a number of different parameter vectors from the space (49). The eigenvalues of the

Fig. 16. Temperature profile calculated from (47) for different parameter vectors.

observation covariance matrix, estimated assuming uniform parameter distribution over (49), were as follows: λ1 = 2.47, λ2 = 0.38, λ3 = 0.022, λ4 = 0.0017, and λ5 = 0.000058. For p = 3, the projection matrix was as given in (50).

A three-input neural network was trained to estimate the parameters of the model (47) using noiseless observations. It was found that networks with m = 7 sigmoidal processing elements were able to estimate the model parameters with a maximum absolute error of less than 1% for θ1 and θ3, and less than 3% for θ2, with respect to the nominal values. Gaussian noise components were then added to the observations (48) and the ANNs were trained again, for m = 3 processing elements per parameter. The noise standard deviation of 0.1 °C was assumed for the training, which approximates values measured using a five-band radiometer at the integration time of 5.0 s [44].

To evaluate the ANN performance, a numerical experiment was conducted in which the brightness temperature vector was calculated at the nominal parameter vector. Pseudorandom numbers were then added to this vector. These noisy observations were transformed using matrix (50) and the resulting features were applied to the input of the ANN estimators trained under noisy conditions. The ANNs produced estimates of the parameters, which were then substituted into (47) to calculate (retrieve) the temperature profile at a number of fixed values of depth. The random draw was repeated 500 times in the experiment discussed. For each random draw of the noise vector, the model (48) was also fitted to the noisy observations using a standard LSE program. The LSE-estimated parameters were also substituted into (47) to retrieve the temperature-depth profile. The results are shown in Fig. 17. One can see that the ANN technique gives narrower uncertainty intervals, by a factor of 2-3. This is a promising result, so further work is planned on this application. It should also be noted that the LSE model fitting takes five to six orders of magnitude more computer time to converge, since the objective function


Fig. 17. Temperature-depth profile retrieved by (a) the ANN-based technique and (b) LSE model fitting, using 500 numerically simulated observation vectors contaminated by random noise of zero mean and standard deviation σ = 0.1 °C. ave: average profile; min and max: minimum and maximum temperature, respectively.

calculation for fitting involves numerical integration of (48) and, therefore, is computationally intensive.

The numerical results presented in the above examples support the discussion of the previous section. One more example of signal parameter estimation using neural networks can be found in [39]. It has been demonstrated in [39] that the proposed approach can be used for estimation of the edge parameters of a physical object based on a digital image of the object. Edge location can be found with subpixel accuracy. A high accuracy of finding the edge orientation can also be achieved with only a few sigmoidal processing units for a wide, realistic range of edge parameter values. The method has been verified by experiments using a TV camera [39]. The object edge has been modeled by a blurred noisy step in image intensity. Edge location corresponds to the signal delay along the image scan line. The method can be easily extended to fast and accurate time delay estimation of signals of other shapes. A numerical example that proves the usefulness of this concept was presented in [15], where neural networks were applied to the estimation of the time delay and other parameters of an oscillatory step response of a nonlinear dynamical system.

IV. DISCUSSION AND CONCLUSION

Techniques that support medical diagnoses often involve estimation of unknown parameters of a signal which is not

directly available for measurement. The signal that can be measured is a noisy and blurred version of the waveform of interest which, in some cases, becomes also distorted when passing through nonlinear biological or physical media. Various numerical techniques are normally used to estimate the signal parameters given its available measurements. Since this issue falls generally into the category of ill-posed problems, the related numerical techniques are rather complex and require a substantial amount of iterative calculations. Therefore, they are not suitable for real-time applications if the signal parameters change quickly in time. On the other hand, there is a need for parameter estimation methods that will allow real-time tracking of time varying physical quantities which carry the diagnostic information.

Using artificial neural networks to find a solution to the problem has been postulated and discussed in the paper. In the proposed approach, the artificial neural network approximates a mapping from the measurement space to the signal parameter space. This mapping is usually unknown. It is assumed that a model of the signal, along with models of the degrading system and noise, is known in terms of the model structure. The actual values of model parameters are unknown, but the range of parameter values can be assumed a priori. It is assumed also that although model parameters may change in time, they remain constant within the signal observation interval. Now, measurement vector realizations can be calculated numerically using model equations for any given parameter vector. These two vectors form a pair of examples for, respectively, the neural network input and its target output. A set of examples can be generated to cover the parameter space and be used for the neural network training. The neural network internal weights are adjusted during the training process to minimize a measure of error between the parameter estimates at the neural network output and their target values. This training process requires a substantial amount of computational time. However, it has to be performed once only for a given class of signals and for their parameter range.

After training, the neural network generates the signal parameter estimates at its output in a relatively short time. This time is limited by the signal propagation through the actual physical structure of the network, e.g., through an analogue/digital electronic circuit or through an optical device. There are two main sources of error in the estimate value. One relates to the model adequacy to the actual signal being measured. This is a general problem of model structure selection and model validation; see, e.g., [40] and the references therein. There is a possibility of applying neural networks to this task, which is also an interesting research avenue for future work.

Another source of error is related to the limited ability of a neural network of a given architecture and size to approximate the mapping of interest with a sufficiently small error. As pointed out by one of the anonymous reviewers, there are other methods suitable for constructing nonlinear functions of many variables, and ANNs are not the only solution. For example, polynomials can be used whose coefficients can be estimated via linear least-squares methods. However, polynomials are known to be inefficient in areas close to poles of the target function. Moreover, the corresponding

370 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 43, NO. 4, APRIL 1996

equations tend to become ill-conditioned when attempts are made to improve accuracy by increasing the degree of the polynomial. Rational function (RF) approximation offers better performance in practice [46], at a given number of adjustable coefficients of the approximating function. As in the case of polynomials, RF coefficient values can also be obtained by solving linear equations.
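The linear-equation route to RF coefficients can be sketched as follows: multiplying through by the denominator turns $P(x)/Q(x) \approx f(x)$ into the linear residual $P(x) - f(x)Q(x) \approx 0$, which becomes an ordinary least-squares problem once $Q$'s constant term is fixed at 1. This is a standard linearisation offered here as an illustration, not the specific method of [46].

```python
import numpy as np

# Fit a rational function r(x) = P(x)/Q(x) to samples (x_i, f_i) by
# linearising  P(x_i) - f_i * Q(x_i) = 0  with Q's constant term set to 1.
def fit_rational(x, f, deg_p, deg_q):
    # Unknowns: p_0..p_deg_p and q_1..q_deg_q (q_0 = 1 fixed).
    A = np.column_stack(
        [x**k for k in range(deg_p + 1)] +
        [-f * x**k for k in range(1, deg_q + 1)]
    )
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    p = coef[:deg_p + 1]
    q = np.concatenate([[1.0], coef[deg_p + 1:]])
    return p, q

x = np.linspace(0.0, 1.0, 50)
f = 1.0 / (1.0 + 4.0 * x)          # target with a pole outside [0, 1]
p, q = fit_rational(x, f, 1, 1)
r = np.polyval(p[::-1], x) / np.polyval(q[::-1], x)
print(np.max(np.abs(r - f)))       # near machine precision for this target
```

Because the target here is itself rational, the linearised system recovers it exactly; for general targets the linearised residual only approximates the true fitting error, which is one reason iterative refinements exist.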

A broad definition of artificial neural networks says that they are nonlinear parameterised systems that can approximate any continuous multivariable function. They are modeled by networks of densely interconnected elementary processing elements. Adopting this view, an RF neural network architecture has been proposed [47] that is able to adaptively adjust its weight coefficients for nonlinear signal processing in a number of applications [48]. This architecture has been tested in [49], showing better performance (shorter training time and higher accuracy at a given number of adjustable weights) when compared to the popular multilayer perceptron.

From another perspective, an interesting property of the multilayer perceptron relates to the total risk, defined as the mean integrated squared error between the target function and its estimate. As discussed in [50, p. 183], approximation by a smooth function (e.g., a polynomial of finite degree) suffers from the "curse of dimensionality" [37], in that for a given total risk the required training set size N increases exponentially with the dimension of the input space, whereas for a multilayer perceptron N does not increase so quickly. This property needs further investigation in the context of parametric signal restoration to confirm the postulated advantage of the perceptron over smooth functions.

In general, except for some trivial cases, the complexity of an approximating function (either a neural network or a smooth function) required to maintain a given level of error increases with the number of unknown signal parameters and with their range. This makes the training time longer and increases the demand for computer memory. It has been demonstrated in [51] that ANNs employed as classifiers can help increase parameter identification accuracy for a given size of the parameter space. Namely, a properly trained Kohonen network [26]-[28], [30] classifies each measured signal feature vector as belonging to one of many predefined subspaces, each of limited signal parameter range and thus requiring only a low-complexity approximating function. The corresponding coefficient values of each approximator are prestored to ensure high approximation accuracy locally, within each subspace. The Kohonen network then invokes the local approximator appropriate to the measured feature vector. Considerable savings in training time and an overall increase in accuracy can be achieved by employing this modular (classifier-approximator) ANN approach to signal parameter identification.
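A minimal sketch of this modular idea follows, with a nearest-prototype rule standing in for the Kohonen classifier and prestored local linear fits standing in for the per-subspace approximators. The toy inverse mapping, partition, and ranges are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy inverse mapping from a scalar feature y to a parameter value.
def inverse_mapping(y):
    return np.sqrt(y)

ys_train = rng.uniform(0.0, 4.0, 2000)

# Global approximator: one linear fit over the whole feature range.
g_a, g_b = np.polyfit(ys_train, inverse_mapping(ys_train), 1)

# Modular approach: prototypes partition the range into four subspaces,
# each with its own prestored (low-complexity) linear approximator.
prototypes = np.array([0.5, 1.5, 2.5, 3.5])
local_coef = []
for c in prototypes:
    mask = np.abs(ys_train - c) <= 0.5
    a, b = np.polyfit(ys_train[mask], inverse_mapping(ys_train[mask]), 1)
    local_coef.append((a, b))

def estimate(y):
    k = int(np.argmin(np.abs(prototypes - y)))  # "classifier" picks the subspace
    a, b = local_coef[k]
    return a * y + b                            # local approximator

ys = np.linspace(0.05, 3.95, 200)
err_local = max(abs(estimate(y) - inverse_mapping(y)) for y in ys)
err_global = max(abs(g_a * y + g_b - inverse_mapping(y)) for y in ys)
print(err_local, err_global)   # local routing beats the single global fit
```

The point of the sketch is structural: routing each measurement to a locally fitted approximator keeps each approximator simple while reducing the overall error, which is the accuracy-vs-complexity trade the modular architecture of [51] exploits.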

The above discussion reflects the authors' view that searching for efficient neural network architectures, capable of learning multivariable functions with a small error in a short time, is a worthwhile research issue whose solution is urgently needed. Such architectures should be suitable for parallel electronic implementation, to make high-speed parameter estimation possible for signals whose parameters vary in time. However, regardless of any outcome expected of


such research, there are many examples of signals of practical interest in biomedicine that can adequately be represented by models involving only a few, say three to seven, parameters. It has been demonstrated in this paper, as well as in [14], [15], [38], [39], [51], and [52], that useful parameter estimators can be designed with state-of-the-art neural network architectures simulated on microcomputers or DSP processors. Such estimators can be incorporated into measuring instruments as an aid to faster, more precise, and more objective medical diagnoses. It is expected that a significant advance in instrumentation technology will be made by introducing the proposed technique. Taking into account the observed rate of progress in ANN theory and applications, there are good reasons to expect that more efficient neural network architectures will emerge in the near future, allowing parameter estimation of more complex signals. Thus investigation into potential application areas of the technique is a worthwhile research topic as well.

It has also been shown in this paper that the ANN-based method of parametric signal restoration is superior to the LSE model fitting under noisy conditions, provided the ANNs are trained on noisy observations. Three numerical examples have been presented to demonstrate the validity of the analytical derivations and to show the usefulness of the method in application to biomedical signal processing.

APPENDIX

For the ANN training, $N$ examples of parameters $\theta^{(l)} \in \Theta$, $l = 1, 2, \ldots, N$, are selected and the corresponding observations (1) are calculated using the model equations. Signal features $z^{(l)} = [f(\theta^{(l)}) - \bar{f}]^T E$ are then computed from (4), the ANN responses to $z^{(l)}$ are computed as $g(w, z^{(l)})$, and these are compared to the target values $\theta^{(l)}$. For each $\theta_k$, the aim of the training process is to adjust the ANN $q$-dimensional weight vector $w$ such that the mean square error

$$ S_k(w) = N^{-1} \sum_{l=1}^{N} \bigl( \theta_k^{(l)} - g_k(w, y^{(l)}) \bigr)^2 \qquad \text{(A1)} $$

attains a minimum in $w \in R^q$. Suppose the noise variance $\sigma^2$ is small. The function $g_k(\cdot)$ can then be approximated by a truncated Taylor series

$$ g_k(w, y^{(l)}) \approx g_k(w, z^{(l)}) + \gamma_l^T \varepsilon \qquad \text{(A2)} $$

where $\gamma_l^T = \gamma_l^T(w) = [G_{k1}, \ldots, G_{kp}] E^T$, and the derivatives $G_{ki}$ are defined by (11) with $z^* = z^{(l)}$. By substituting (A2) into (A1) one obtains

$$ S_k(w) = N^{-1} \sum_{l=1}^{N} \bigl( \delta_l - \gamma_l^T \varepsilon \bigr)^2 \qquad \text{(A3)} $$

where $\delta_l = \delta_l(w) = \theta_k^{(l)} - g_k(w, z^{(l)})$. Following the methodology of [21], now consider an ensemble of identical neural networks, each having the same weights $w$. All their inputs have the same deterministic component $z^{(l)}$, $l = 1, 2, \ldots, N$. The additive noise component of the


ensemble input is the vector $\varepsilon$, uncorrelated with $\delta_l$. Averaging (A3) over the ensemble yields

$$ \overline{S_k}(w) = N^{-1} \sum_{l=1}^{N} \bigl( \delta_l^2 + \sigma^2 \gamma_l^T \gamma_l \bigr) \qquad \text{(A4)} $$

There are two nonnegative terms in the brackets of the sum (A4). Both of them are functions of $w$. The term $\delta_l^2$ is the square of the error with which the neural network approximates the inverse mapping from the observation space to the parameter space. The second, $\sigma^2 \gamma_l^T \gamma_l$, is the variance of the noise-induced parameter identification error. The latter term is proportional to the norm of the derivatives of the mapping with respect to the observations. Its minimization in $w$ tends to make the function $g_k(y)$ constant, or smoother. By definition, however, this function should be equal to $\theta_k$ and be independent of the other parameters to minimize the term $\delta_l^2$. Simultaneous minimization of the two terms is not possible; the mapping can either be smooth or accurate. Therefore a compromise is attained, depending on the value of the noise variance $\sigma^2$ during training.

Assume that the selected ANN architecture is capable of approximating the mapping under consideration with a negligible error. For noiseless training, i.e., for $\sigma^2 = 0$, the proposed technique is equivalent to the LSE model fitting in terms of the noise-induced error variance. However, if the ANN is trained using noisy observations, i.e., for $\sigma^2 > 0$, the mapping becomes smoother at the expense of an increased approximation error. With the two effects combined, the ANN can still produce parameter estimates of lower variance than those obtained from the LSE model fit to the same noisy data. Thus, by training the network on noisy observations, one can make the proposed technique superior to the LSE method. The computer simulation results presented above provide further illustration of this property.
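The decomposition in (A4) can be sanity-checked numerically: for a fixed weight vector, the ensemble average of $(\delta_l - \gamma_l^T \varepsilon)^2$ over zero-mean white noise uncorrelated with $\delta_l$ should equal $\delta_l^2 + \sigma^2 \gamma_l^T \gamma_l$. A minimal Monte Carlo sketch (all numerical values arbitrary):

```python
import numpy as np

# Monte Carlo check of the ensemble average in (A4):
#   E[(delta - gamma^T eps)^2] = delta^2 + sigma^2 * gamma^T gamma
# for eps ~ N(0, sigma^2 I), uncorrelated with delta.
rng = np.random.default_rng(2)
delta = 0.3                           # approximation-error term (arbitrary)
gamma = np.array([0.5, -1.0, 0.25])   # sensitivities of g_k to the observations
sigma = 0.1

eps = sigma * rng.standard_normal((200000, gamma.size))
empirical = np.mean((delta - eps @ gamma) ** 2)
predicted = delta**2 + sigma**2 * (gamma @ gamma)
print(empirical, predicted)           # the two values agree closely
```

The cross term $-2\delta_l \gamma_l^T \varepsilon$ averages to zero because the noise is zero-mean, which is exactly why only the two nonnegative terms survive in (A4).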

ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for their valuable suggestions regarding the paper contents.

REFERENCES

[l] G. Avanzolini, P. Barbini, and A. Capello, “Comparison of algorithms for tracking short-term changes in arterial circulation parameters,” IEEE Trans. Biomed. Eng., vol. 39, pp. 861-867, Aug. 1992.

[2] M. Yoshikawa, H. Takada, M. Miura, T. Yambe, Y. Katahira, and S.-I. Nitta, "Real-time cardiac output estimation of the circulatory system under left ventricular assistance," IEEE Trans. Biomed. Eng., vol. 40, pp. 266-275, Mar. 1993.

[3] M. Nakamura, “Waveform estimation from noisy signals with variable signal delay using bispectrum averaging,” IEEE Trans. Biomed. Eng., vol. 40, pp. 118-127, Feb. 1993.

[4] P. Jansson, Deconvolution with Application in Spectroscopy. New York: Academic, 1984.

[5] B. D. Nener, S. T. Lai, L. Faraone, and A. G. Namibian, “Parameter evaluation in automated digital deep-level transient spectroscopy,” IEEE Trans. Instrum. Meas., vol. 42, pp. 913-919, May 1993.

[6] M. C. Roman and P. R. Brown, “Free-flow electrophoresis,” Analytical Chem., vol. 66, pp. 86A-94A, Jan. 15, 1994.

[7] G. A. F. Seber and C. J. Wild, Nonlinear Regression. New York: Wiley, 1989.

[8] E. Malafiej, A. Materka, P. Koman, and A. Denys, "The usefulness of computer programs in antibiotic activity investigations and the estimation of drug level in the body," in Proc. 2nd Nat. Conf. Comput. in Med., Lodz, Poland, 1991, pp. 256-262.

[9] H. Unbehauen and G. P. Rao, Identification of Continuous Systems. Amsterdam, The Netherlands: North-Holland, 1987.

[10] K. Baszczynski and A. Materka, "Electronic measuring system for dynamic tests of fall-arresting equipment," Prace Centralnego Instytutu Ochrony Pracy, vol. 142. Warszawa, Poland: WNT, 1989, pp. 143-159.

[11] ——, "A method of computer identification for a selected class of second-order systems," Kwartalnik Elektroniki i Telekomunikacji, vol. 38, no. 1, pp. 17-33, 1992.

[12] M. Bertero and E. R. Pike, "Signal processing for linear instrumental systems with noise: A general theory with illustrations from optical imaging and light scattering problems," in Handbook of Statistics, N. K. Bose and C. R. Rao, Eds. Amsterdam, The Netherlands: Elsevier, 1993, pp. 1-46.

[13] P. B. Crilly, "A quantitative evaluation of various iterative deconvolution algorithms," IEEE Trans. Instrum. Meas., vol. 40, pp. 558-562, Mar. 1991.

[14] A. Materka, "Application of neural networks to dynamic system parameter estimation," in Proc. 14th Int. Conf. IEEE Eng. Med. Biol. Soc., Paris, France, 1992, vol. 3, pp. 1042-1044.

[15] A. Materka and M. Telfer, "Model identification for time-invariant nonlinear systems with delay using feedforward neural networks," in Proc. 2nd Int. Conf. Modeling Simulation, Melbourne, Australia, 1993, vol. 1, pp. 125-131.

[16] J. V. Beck and K. J. Arnold, Parameter Estimation in Engineering and Science. New York: Wiley, 1977.

[17] R. Stein, "Selecting data for neural networks," AI Expert, pp. 42-47, Feb. 1993.

[18] A. Cohen, Biomedical Signal Processing, vol. 2. Boca Raton, FL: CRC, 1986.

[19] W. J. Krzanowski, Principles of Multivariate Analysis. Oxford: Oxford Univ. Press, 1988.

[20] D. Naumann, D. Helm, H. Labischinski, and P. Giesbrecht, "The characterization of microorganisms by Fourier-transform infrared spectroscopy," in Modern Techniques for Rapid Microbiological Analysis, W. H. Nelson, Ed. New York: VCH, 1991.

[21] B. Widrow and M. A. Lehr, “30 years of adaptive neural networks: Perceptron, Madaline and Backpropagation,” Proc. IEEE, vol. 78, pp. 1416-1442, Sept. 1990.

[22] E. Oja, “Principal components, minor components and linear neural networks,” Neural Networks, vol. 5, pp. 927-935, 1992.

[23] Y. Nagasaka and A. Iwata, "Performance evaluation of BP and PCA neural networks for ECG data compression," in Proc. Int. Joint Conf. Neural Networks, Nagoya, Japan, 1993, pp. 1003-1006.

[24] J. Karhunen and J. Joutsensalo, "Learning of robust principal component subspace," in Proc. Int. Joint Conf. Neural Networks, Nagoya, Japan, 1993, pp. 2409-2412.

[25] ——, "Nonlinear generalizations of principal component learning algorithms," in Proc. Int. Joint Conf. Neural Networks, Nagoya, Japan, 1993, pp. 2599-2602.

[26] R. Hecht-Nielsen, Neurocomputing. Reading, MA: Addison-Wesley, 1990.

[27] R. P. Lippmann, “An introduction to computing with neural nets,” IEEE Acoust., Speech, Signal Processing Mag., pp. 4-21, Apr. 1987.

[28] D. R. Hush and B. G. Horne, "Progress in supervised neural networks," IEEE Signal Processing Mag., pp. 8-39, Jan. 1993.

[29] R. Battiti, "First- and second-order methods for learning: Between steepest descent and Newton's method," Neural Comput., vol. 4, pp. 141-166, 1992.

[30] P. D. Wasserman, Advanced Methods in Neural Computing. New York: Van Nostrand Reinhold, 1993.

[31] K. Hornik, "Some new results on neural network approximation," Neural Networks, vol. 6, pp. 1069-1072, 1993.

[32] A. Materka, "Fast and accurate identification of electronic circuit parameters using regularised feedforward neural networks," in Proc. Int. Joint Conf. Neural Networks, Nagoya, Japan, 1993, pp. 509-512.

[33] A. C. Atkinson and A. N. Donev, Optimum Experiment Design. Oxford: Oxford Science, 1992.

[34] F. Pukelsheim, Optimal Design of Experiments. New York: Wiley, 1993.

[35] M. B. Slima, R. Z. Morawski, and A. Barwicz, "Spline-based variational method with constraints for spectrometric data correction," IEEE Trans. Instrum. Meas., vol. 41, pp. 786-790, June 1992.

[36] R. B. Whitted and P. B. Crilly, “A digital signal processing architecture for iterative deconvolution restoration algorithms,” IEEE Trans. Instrum. Meas., vol. 41, pp. 147-151, Jan. 1992.

[37] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[38] A. Materka, "Smoothness constraints for training neural network-based parameter estimators," in Proc. 5th Australian Conf. Neural Networks, Brisbane, Australia, 1994, pp. 210-213.

[39] ——, "Fast and robust edge evaluation using feedforward neural networks," in Proc. IEEE Workshop Visual Signal Processing Commun., Melbourne, Australia, 1993, pp. 141-144.

[40] L. Ljung, System Identification Toolbox for Use With MATLAB. Natick, MA: The MathWorks, Inc., 1991.

[41] S. Van Huffel and J. Vandewalle, The Total Least Squares Problem: Computational Aspects and Analysis. Philadelphia: SIAM, 1991.

[42] D. C. Hood and D. G. Birch, "Computational models of rod-driven retinal activity," IEEE Eng. Med. Biol., pp. 59-66, Jan./Feb. 1995.

[43] E. A. Cheever and K. R. Foster, "Microwave radiometry in living tissue: What does it measure?" IEEE Trans. Biomed. Eng., vol. 39, pp. 563-568, 1992.

[44] S. Mizushina, T. Shimizu, K. Suzuki, M. Kinomura, H. Ohba, and T. Sugiura, "Retrieval of temperature-depth profiles in biological objects from multifrequency microwave radiometric data," J. Electromagn. Waves and Applicat., vol. 7, no. 11, pp. 1515-1548, 1993.

[45] S. Mizushina, T. Shimizu, H. Ohba, M. Kinomura, and T. Sugiura, "Noninvasive temperature profiling using multifrequency microwave radiometry," presented at URSI Int. Conf., Kyoto, Japan, 1993.

[46] J. W. Ponton and J. Klemens, “Alternatives to neural networks for inferential measurements,” Computers Chem. Eng., vol. 17, no. 10, pp. 42-47, 1993.

[47] H. Leung and S. Haykin, “Rational function neural network,” Neural Comput., vol. 5, pp. 928-938, 1993.

[48] H. Leung, T. Lo, and J. Litva, “Nonlinear adaptive signal processing based on rational function,” Signal Processing, vol. 38, pp. 153-168, 1994.

[49] A. Materka, "Application of artificial neural networks to parameter estimation of dynamical systems," in Proc. IEEE Int. Conf. Instrum. Meas. Technol., Hamamatsu, Japan, 1994, pp. 123-126.

[50] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.

[51] A. Materka, "Modular artificial neural network architecture for accurate estimation of dynamical system parameters," in Proc. 18th Conf. Circuit Theory Electron. Circuits, Polana-Zgorzelisko, Poland, 1995, pp. 635-640.

[52] ——, "Neural network technique for parametric testing of mixed-signal circuits," Electron. Lett., vol. 31, no. 3, pp. 183-184, 1995.

Andrzej Materka (SM'91) was born in Leczyca, Poland, in 1949. He received the M.Sc. degree in radio engineering from Warsaw University of Technology, the Ph.D. degree in technical sciences from the Technical University of Lodz, and the D.Sc. degree (Habilitation) in electronics from the Technical University of Wroclaw, in 1972, 1979, and 1985, respectively.

He was an engineer at the Radio and TV Broadcasting Stations, Lodz, from 1972 to 1974. Since 1974 he has been with the Institute of Electronics,

Technical University of Lodz, Poland, working on analog circuit testing, semiconductor device modeling, medical electronics, and digital signal/image processing. In 1980-1982 he was a Post-Doctoral Research Scholar at the Microwave Laboratory, Research Institute of Electronics, Shizuoka University, Hamamatsu, Japan. In 1992-1994 he was a Senior Lecturer at the Department of Electrical and Computer Systems Engineering, Monash University, Melbourne, Caulfield, Australia. Currently he is a Professor and Director of the Institute of Electronics, Technical University of Lodz, Poland. He has published 70 technical articles and three monographs. Dr. Materka holds membership in the Polish Electricians Association (SEP), the Polish Society of Theoretical and Applied Electrical Technology (PTETiS), the Polish Society of Medical Informatics (TIM), and the New York Academy of Sciences.

Shizuo Mizushina was born in Hamamatsu, Shizuoka Prefecture, Japan, in 1933. He received the B.Eng. degree from Shizuoka University, Hamamatsu, in 1957, the M.Sc. and Ph.D. degrees from the Ohio State University, Columbus, in 1962 and 1964, respectively.

He was a Research Assistant at the ElectroScience Lab., Ohio State Univ., from 1960 to 1964. He was a Member of Technical Staff at the Bell Labs, Murray Hill, NJ, from 1964 to 1965. Since 1965 he has been with Shizuoka University, where he is

a Professor at the Research Institute of Electronics. From 1986 to 1994, he was the Dean of the Graduate School of Electronic Science and Technology. Since 1994, he has been the Director General of the Research Institute of Electronics. His research interests are in the areas of medical applications of electronics, in particular, microwave radiometry for noninvasive temperature measurement.

Dr. Mizushina is a Vice President of the Japanese Society of Hyperthermic Oncology for 1995-1996.

