
A Canonical Correlation Approach to Blind Source Separation

Magnus Borga, Hans Knutsson

Dept. of Biomedical Engineering
Linköping University
University Hospital
SE-581 85 Linköping

Sweden

June 5, 2001

Abstract

A method based on canonical correlation analysis (CCA) for solving the blind source separation (BSS) problem is presented. In contrast to independent components analysis (ICA), the proposed method utilises the autocorrelation in the source signals. This makes the BSS problem easier to solve than if only the statistical distribution of the sample values is considered. Experiments show that the method is much more computationally efficient than ICA. The proposed method can be seen as a generalization of the maximum autocorrelation factors (MAF) method.

1 Introduction

The blind source separation (BSS) problem is formulated as follows: Given an unknown linear mixture of a set of unknown, statistically independent signals, the goal is to estimate the mixing matrix and recover the original signals. The mixture can be written as:

x(t) = As(t), (1)

where A is the unknown mixing matrix and the components in s are the statistically independent source signals. The goal is then to estimate a matrix W such that

y(t) = Wx(t) (2)

is equal to the unknown source signals in s (except for scalings and permutations).

In recent years, methods based on information-theoretical measures such as mutual information have gained a lot of interest. Such methods try to minimize the statistical dependence between the reconstructed signals. Measuring statistical dependence is, however, in general not possible, and other related objectives have to be used as approximations.

One such method that has gained much attention is independent components analysis (ICA). It was introduced by Comon [3] and many articles have been published on the subject. Instead of minimizing the statistical dependence between the reconstructed signals, ICA tries to make the signals as non-Gaussian as possible. In many practical situations this also reduces the statistical dependence, and ICA often gives a very good result.

Here we present an alternative method that generates uncorrelated components. To look for uncorrelated components means that we use a weaker condition than statistical independence. There are several ways of choosing W so that the components in y are uncorrelated, but most of them will not give statistically independent components. One of the most well-known methods of generating uncorrelated components is principal component analysis (PCA). PCA is, however, in general unlikely to solve the BSS problem.

A weakness of PCA, which also applies to ICA, is that temporal correlations (or spatial correlations in images) are not taken into account. In both PCA and ICA the samples in time (or spatial position) may be rearranged arbitrarily and the method will still give the same solution. This may seem like a strength, but the fact is that almost all the information in the signal is thrown away. This makes the problem more difficult than it needs to be. As an example, suppose that two images are added together. It is perhaps difficult, but not at all impossible, to see what the original images look like. Now imagine that all pixels in the images are permuted randomly. It would now be impossible to see what the original images look like. In practice most, if not all, signals have a certain autocorrelation. This means that s_i(t) and s_i(t+1) are correlated, while s_i(t) and s_j(t) are uncorrelated (since they are statistically independent). This autocorrelation can be used to solve the BSS problem.

Hence, if we want to use correlation instead of statistical dependence as a criterion for solving the BSS problem, we must find uncorrelated components that, in addition, have maximum spatial or temporal correlation within each component. This can be achieved by canonical correlation analysis (CCA). A similar approach is called maximum autocorrelation factors (MAF) [8] and has successfully been used to analyse multi-spectral satellite images [7]. While MAF maximizes the autocorrelation for a pre-specified displacement vector, our approach automatically selects the optimum weighted combination of displacement vectors. Here we also have a more direct connection to CCA, which is an established statistical technique. There is a similarity between ICA and CCA in that both can be explained in terms of mutual information.

In the following section we give a brief description of CCA. In Section 3, we use the CCA formalism to describe the MAF approach and show under what condition it will solve the BSS problem. In Section 4 we introduce a generalization of the MAF method and show that the condition for solving the BSS problem in this case is weaker than for the MAF approach. Experimental results are presented in Section 5, where the proposed method is compared to ICA computationally as well as regarding the results.

2 Canonical correlation analysis

CCA finds two sets of basis vectors, one in each signal space, such that the correlation matrix between the signals described in the new basis is a diagonal matrix. A subset of the vectors containing the N first pairs defines a linear rank-N relation between the sets that is optimal in a correlation sense. It has been shown that finding the canonical correlations is equivalent to maximizing the mutual information between the sets if the underlying distributions are elliptically symmetric [6].

Consider two multi-dimensional random variables, a and b, and the linear combinations a = w_a^T(a - \bar{a}) and b = w_b^T(b - \bar{b}) of the two variables, respectively. The correlation between a and b is given by

\rho = \frac{w_a^T C_{ab} w_b}{\sqrt{w_a^T C_{aa} w_a \, w_b^T C_{bb} w_b}}    (3)

where C_aa and C_bb are the nonsingular within-set covariance matrices and C_ab is the between-sets covariance matrix. The maximum of ρ with respect to w_a and w_b is the largest canonical correlation. A complete description of the canonical correlations is given by:

\begin{pmatrix} C_{aa} & 0 \\ 0 & C_{bb} \end{pmatrix}^{-1} \begin{pmatrix} 0 & C_{ab} \\ C_{ba} & 0 \end{pmatrix} \begin{pmatrix} w_a \\ w_b \end{pmatrix} = \rho \begin{pmatrix} \lambda_a w_a \\ \lambda_b w_b \end{pmatrix}    (4)

where ρ, λ_a, λ_b > 0 and λ_a λ_b = 1. Equation (4) can be rewritten as:

C_{aa}^{-1} C_{ab} w_b = \rho \lambda_a w_a
C_{bb}^{-1} C_{ba} w_a = \rho \lambda_b w_b    (5)

Solving eq. (5) gives N solutions {ρ_n, w_an, w_bn}, n = 1, ..., N, where N is the minimum of the dimensionalities of a and b. The ρ_n are the canonical correlations [4]. More details can be found in [1, 2].
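To make the computation concrete: eliminating w_b from eq. (5) (using λ_a λ_b = 1) reduces it to an ordinary eigenvalue problem in w_a. The following minimal NumPy sketch is ours, not the authors' implementation; the function name `cca` and the sample-covariance estimators are illustrative assumptions.

```python
import numpy as np

def cca(a, b):
    """Canonical correlations of row-variable data matrices a (da x T) and
    b (db x T); an illustrative sketch of eq. (5), not the paper's code."""
    a = a - a.mean(axis=1, keepdims=True)   # remove the means, as in eq. (3)
    b = b - b.mean(axis=1, keepdims=True)
    T = a.shape[1]
    Caa, Cbb, Cab = a @ a.T / T, b @ b.T / T, a @ b.T / T
    # Eliminating w_b from eq. (5) gives:  C_aa^-1 C_ab C_bb^-1 C_ba w_a = rho^2 w_a
    M = np.linalg.solve(Caa, Cab) @ np.linalg.solve(Cbb, Cab.T)
    rho2, Wa = np.linalg.eig(M)
    order = np.argsort(-rho2.real)          # sort by decreasing correlation
    rho = np.sqrt(np.clip(rho2.real[order], 0.0, None))
    Wa = Wa.real[:, order]
    Wb = np.linalg.solve(Cbb, Cab.T) @ Wa   # w_b from the second line of eq. (5), up to scale
    return rho, Wa, Wb
```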

3 BSS by maximum autocorrelation

Consider the case where the source signals in s are a set of one-dimensional signals, e.g. time signals, s(t). The simplest way of using CCA for separating the mixed signals x(t) is to find the linear combination of x(t) that correlates most with a linear combination of x(t + 1), i.e. the next sample. In other words, we let

a(t) = x(t) and b(t) = x(t + 1). (6)

The first canonical correlation component will then give a linear combination y_1(t) of the mixed signals with maximum autocorrelation r_{y1}(1). The second canonical correlation component will give a new linear combination y_2(t) with maximum autocorrelation r_{y2}(1) under the constraint that it is uncorrelated with the first component, etc.
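A hedged end-to-end sketch of this one-step separation, reusing the `cca` function from the previous section; the toy sources, mixing matrix, and seed are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(500)
s = np.vstack([np.sin(2 * np.pi * t / 25),     # smooth source: high r(1)
               rng.standard_normal(t.size)])   # white source: r(1) near 0
A = rng.standard_normal((2, 2))                # unknown mixing matrix
x = A @ s                                      # x(t) = A s(t), eq. (1)

# Correlate a(t) = x(t) with b(t) = x(t+1), as in eq. (6).
rho, Wa, Wb = cca(x[:, :-1], x[:, 1:])
y = Wa.T @ x                                   # y(t) = W x(t), eq. (2)
# Each row of y should match one source up to scaling and sign.
```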

In the case of two-dimensional source signals, e.g. images s(m, n), it is not that obvious how to choose a and b. In the MAF approach,

a = x(m, n) and b = x(m + ∆m, n + ∆n)    (7)

where (∆m, ∆n) must be chosen a priori by the user [7]. A CCA on this data would maximize the autocorrelation r(∆m, ∆n). A slightly more general approach would be to choose

a = x(m, n) and b = q(x(m, n))    (8)

where q(x(m, n)) is the result of a convolution between x(m, n) and a filter that is close to circularly symmetric and has a zero in the center of the kernel. This would remove the directional bias in eq. (7). However, the selection of the filter kernel is still left to the user. Another potential disadvantage of the latter method is that the correlation is maximized between a single pixel and a low-pass filtered version of its surroundings. This means that we try to maximize the correlation between two different frequency bands, which is probably not an optimal approach in general.

The method described above chooses the components so that the autocorrelation is maximized. Now, why is that a relevant criterion? Intuitively, it can be explained as follows: If we have two uncorrelated signals, u(t) with high autocorrelation and v(t) with low autocorrelation, the sum x(t) of these signals will have less autocorrelation than the original signal u(t). We can see it as if the addition of v(t) has corrupted u(t), making it harder to predict.

More formally, we can say that the sum of uncorrelated signals will have an autocorrelation function that is less than or equal to the maximum of the autocorrelation functions of the individual signals, i.e.

r_x(d) \le \max_i \, r_{s_i}(d)    (9)

with equality only when all the autocorrelation functions are equal, i.e. r_{s_i}(d) = r_{s_j}(d) for all i, j. Here r_x(d) is the autocorrelation function of the sum of the individual signals s_i. To prove eq. (9), we look at the special case of two source signals. The generalization to more than two signals is straightforward.

Consider two uncorrelated signals u(t) and v(t) with zero mean. The autocovariance function of the sum x(t) of the two signals is

c_x(d) = F^{-1}[F[x] F[x']] = F^{-1}[F[u + v] F[u' + v']]
       = F^{-1}[(F[u] + F[v])(F[u'] + F[v'])]
       = F^{-1}[F[u] F[u'] + F[v] F[v'] + \underbrace{F[u] F[v']}_{=0} + \underbrace{F[v] F[u']}_{=0}]
       = c_u(d) + c_v(d)    (10)

where F is the Fourier transform and x'(t) = x(−t). The autocorrelation function (i.e. the normalized autocovariance function) is then

r_x(d) = \frac{c_x(d)}{c_x(0)} = \frac{c_u(d) + c_v(d)}{c_u(0) + c_v(0)} = \frac{c_u(0) r_u(d) + c_v(0) r_v(d)}{c_u(0) + c_v(0)}    (11)

which cannot be larger than the maximum of r_u(d) and r_v(d). In fact, eq. (11) shows that the autocorrelation function of the sum is the weighted average of the autocorrelation functions of the sources. The weighting coefficients c_u(0) and c_v(0) are estimates of the variances of u(t) and v(t) respectively.
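Eq. (11) is easy to check numerically. The sketch below, with an assumed helper `autocorr` and arbitrary test signals of our own choosing, compares the two sides:

```python
import numpy as np

def autocorr(sig, d):
    sig = sig - sig.mean()
    return (sig[:-d] @ sig[d:]) / (sig @ sig)   # r(d) = c(d) / c(0)

rng = np.random.default_rng(1)
n = 100_000
u = np.convolve(rng.standard_normal(n), np.ones(5) / 5, mode="same")  # autocorrelated
v = rng.standard_normal(n)                                            # white
d = 2
lhs = autocorr(u + v, d)
rhs = (u.var() * autocorr(u, d) + v.var() * autocorr(v, d)) / (u.var() + v.var())
print(abs(lhs - rhs))   # small, up to sampling error
```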

The conclusion is that if the sources are uncorrelated and have different autocorrelations r(d) for a certain d, then the CCA between the mixture x(t) and x(t + d) will indeed find the original source signals, except for scalings and permutations.

4 BSS by maximum canonical correlation

A problem with the maximum autocorrelation approach is that it maximizes the autocorrelation for a specific distance d, which must be specified a priori. If the autocorrelation functions of the source signals are equal for that particular choice of d, the method will fail.

A more general and robust method is to let the CCA choose not only the optimal combination of channels, but also optimal filters within each channel. This means that instead of maximizing the correlation between a sample and a neighbouring sample, the CCA now maximizes the correlation between a sample and a linear combination of a neighbouring (but not overlapping) region of the signal. Again, consider the case where the source signals are a set s(t) of one-dimensional signals s_i(t). We then let

a(t) = x(t) and b(t) = (x(t + 1), ..., x(t + N))^T.    (12)

This is an auto-regressive model, i.e. the CCA will find mutually uncorrelated components that are optimal with respect to auto-regression error; building b(t) amounts to stacking lagged copies of the signal, as sketched below. This is more general than the method in eq. (6).
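A minimal sketch of the lag construction in eq. (12), assuming row-variable data; the name `lagged` is our illustrative choice:

```python
import numpy as np

def lagged(x, N):
    """x: (channels, T). Returns a(t) = x(t) and the stacked lags
    b(t) = (x(t+1), ..., x(t+N))^T from eq. (12), aligned in time."""
    T = x.shape[1]
    a = x[:, :T - N]                                             # x(t)
    b = np.vstack([x[:, i:T - N + i] for i in range(1, N + 1)])  # x(t+i), i = 1..N
    return a, b
```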

A simple example shows the difference. Consider the case where s_1(t) = sin(tπ/2). The autocorrelation r(1) for such a signal is

r(1) = E[\sin(t\pi/2) \, \sin(t\pi/2 + \pi/2)] = E[\sin(t\pi/2) \, \cos(t\pi/2)] = 0,

i.e. the method in eq. (6) would not work for d = 1. The auto-regressive model in eq. (12) with N ≥ 2 would, however, find a high correlation (−1) between s_1(t) and s_1(t + 2).
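This is easy to verify numerically; a small check under our own sampling choices:

```python
import numpy as np

t = np.arange(1000)
s1 = np.sin(t * np.pi / 2)           # samples cycle through 0, 1, 0, -1, ...
r1 = (s1[:-1] @ s1[1:]) / (s1 @ s1)  # r(1): every lag-1 product vanishes
r2 = (s1[:-2] @ s1[2:]) / (s1 @ s1)  # r(2): approximately -1
print(r1, r2)
```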

In order to see how the canonical correlation approach can find one of the source signals, we look again at the special case of two uncorrelated signals u(t) and v(t) such that their autocorrelation functions r_u(d) and r_v(d) differ for at least one value of d ≤ N. Consider the two signals

a(t) = w_a^T x(t) and b(t) = \sum_{i=1}^{N} z_i \, a(t + i).

Here we have assumed that the optimal de-mixing vector for each channel is equal for all samples in time. The coefficients z_i weight together the different samples in time. The correlation between these two signals is

\rho(a, b) = \frac{E[a(t) b(t)]}{\sqrt{E[a^2(t)] \, E[b^2(t)]}} = \frac{E[\sum_{i=1}^{N} z_i \, a(t) a(t+i)]}{\sigma_a^2}
           = \frac{\sum_{i=1}^{N} z_i \, E[a(t) a(t+i)]}{\sigma_a^2} = \frac{\sum_{i=1}^{N} z_i \, c_a(i)}{\sigma_a^2} = \frac{z^T c_a}{\sigma_a^2}    (13)

where we have used that the correlation is unaffected by a scaling of z, so that b(t) can be normalized to have variance σ_a^2.

The coefficient vector z that maximizes this correlation is the solution to the auto-regression problem of a(t + i) on a(t). This solution is given by

z = C_a^{-1} c_a    (14)

where C_a is the autocovariance matrix, i.e.

C_a(i, j) = \frac{1}{N} \sum_{k=1}^{N} a(k + i) \, a(k + j).    (15)
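In code, eq. (14) is a single linear solve. The sketch below is an assumption-laden illustration: it uses biased sample autocovariances and a Toeplitz C_a (i.e. it assumes stationarity, so that C_a(i, j) depends only on |i − j|).

```python
import numpy as np

def ar_weights(a, N):
    """a: one zero-mean channel as a 1-D array; returns z = C_a^-1 c_a (eq. 14)."""
    T = a.size
    c = np.array([(a[:-i] @ a[i:]) / T for i in range(1, N + 1)])  # c_a(1..N)
    c0 = (a @ a) / T                                               # c_a(0)
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    Ca = np.concatenate([[c0], c])[lags]   # Toeplitz: C_a(i, j) = c_a(|i - j|)
    return np.linalg.solve(Ca, c)
```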

This means that the correlation in eq. (13) can be written as

\rho(a, b) = \frac{(C_a^{-1} c_a)^T c_a}{\sigma_a^2} = \frac{c_a^T C_a^{-1} c_a}{\sigma_a^2} = r_a^T R_a^{-1} r_a    (16)

where R_a = C_a / \sigma_a^2 and r_a = [r_a(1), ..., r_a(N)]^T. From eq. (11) we see that we can write the autocorrelation function r_a as

r_a = c \, r_u + (1 - c) \, r_v

where r_u and r_v are the autocorrelation functions of the original signals u(t) and v(t) respectively. This means that we can write the correlation as

\rho(a, b) = (c \, r_u + (1 - c) \, r_v)^T R_a^{-1} (c \, r_u + (1 - c) \, r_v)
           = c^2 \, r_u^T R_a^{-1} r_u + (1 - c)^2 \, r_v^T R_a^{-1} r_v + 2 c (1 - c) \, r_u^T R_a^{-1} r_v    (17)

since R_a^{-1} is symmetric. To find the maximum of the correlation we look at the second derivative of the expression in eq. (17) with respect to the mixing factor c:

\frac{\partial^2 \rho}{\partial c^2} = 2 \left( r_u^T R_a^{-1} r_u + r_v^T R_a^{-1} r_v - 2 \, r_u^T R_a^{-1} r_v \right)    (18)


From eq. (15) we see that C_a is positive definite and, hence, so is R_a^{-1}. Therefore we can write

r^T R_a^{-1} r = r'^T r' where r' = R_a^{-1/2} r.

This means that

\frac{\partial^2 \rho}{\partial c^2} = 2 \left( r_u'^T r_u' + r_v'^T r_v' - 2 \, r_u'^T r_v' \right) = 2 \, \| r_u' - r_v' \|^2 > 0    (19)

if r_u ≠ r_v. Hence ρ is a convex function of c, so its maximum over c ∈ [0, 1] is attained at c = 0 or c = 1, i.e. when a(t) = u(t) or a(t) = v(t). Since the CCA maximizes the correlation, we can conclude that it will not choose a mixture of the source signals, since that would give less correlation than if one of the source signals is chosen.

5 Experimental results

In this section we present some experimental results illustrating the performance of the proposed method. The experiments are done using Matlab on a Sun Ultra 10. For the comparison with ICA, the FastICA algorithm [5] has been used with default parameter settings. The FastICA algorithm is claimed to be "10-100 times faster than conventional gradient descent methods for ICA".

The first experiment illustrates that the CCA approach and ICA give qualitatively the same results. Figure 1 shows the source signals, which are a sine wave and Gaussian noise (top left), and a mixture of the sources (top right). The lower panels show the results from the CCA approach and ICA respectively. In this experiment, only one time step has been used for the CCA, i.e. the maximum autocorrelation approach was used.

Figure 1: A simple example showing the result of the proposed method compared to the "FastICA" algorithm. (Panels: source signals s, mixed signals x, CCA output, FastICA output.)

The second experiment illustrates the difference in computational efficiency between the CCA approach and FastICA. The mixed signals in this case are five EEG channels (Figure 2). We can see that the results from CCA (Figure 3) and ICA (Figure 4) are qualitatively similar. Also in this experiment the maximum autocorrelation approach was used. The computational cost, however, differed significantly: the FastICA algorithm needed 27 Mflops and 3.8 seconds, while the CCA approach needed 6.8 Mflops and 0.39 seconds on the Ultra 10 workstation.

The final experiment illustrates the difference between the MAF approach, i.e. CCA with only one time step in the b-variable, and the more general CCA approach with more than one time step. Figure 5 shows the result when the two source signals were sin(tπ/2) and white noise respectively. The one-time-step CCA (MAF) fails to recover the source signals, while the two-time-step CCA manages to recover the sources. A sketch reproducing this setup is given below.
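A hedged re-creation of this last experiment, reusing the `cca` and `lagged` sketches from earlier sections (seed, signal length, and mixing matrix are our own choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(400)
s = np.vstack([np.sin(t * np.pi / 2),           # source with r(1) = 0
               rng.standard_normal(t.size)])    # white noise source
x = rng.standard_normal((2, 2)) @ s             # mixed signals

for N in (1, 2):
    a, b = lagged(x, N)
    rho, Wa, Wb = cca(a, b)
    y = Wa.T @ x
    print(N, rho)   # N = 1: correlations near 0, y stays mixed;
                    # N = 2: one correlation near 1, rows of y match the sources
```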

6 Summary and discussion

In many practical situations, the ICA algorithm makes the BSS problem unnecessarily difficult. By only considering the statistical distribution of the sample values, ignoring the temporal or spatial relations within the source signals, relevant information is discarded. Most, if not all, natural signals have temporal or spatial relations, causing an autocorrelation in the signal. Canonical correlation analysis can solve the BSS problem with much less computational effort than the ICA algorithm by utilising the autocorrelation functions of the source signals. A necessary condition for the CCA approach to work is that the source signals have different autocorrelation functions.

The method of maximum autocorrelation factors (MAF) is a special case of the CCA approach, maximizing the autocorrelation in the reconstructed source signals for a particular distance d. In this paper a more general approach has been presented that adaptively finds the optimum filter to be applied to maximize the correlation within the signal. Hence, the method does not depend upon the a priori choice of d.


Figure 2: Original EEG signals.

References

[1] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, second edition, 1984.

[2] M. Borga. Learning Multidimensional Signal Processing. PhD thesis, Linköping University, SE-581 83 Linköping, Sweden, 1998. Dissertation No. 531, ISBN 91-7219-202-X, http://people.imt.liu.se/~magnus/.

[3] P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, April 1994.

[4] H. Hotelling. Relations between two sets of variates. Biometrika, 28:321–377, 1936.

[5] A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.

[6] J. Kay. Feature discovery under contextual supervision using mutual information. In International Joint Conference on Neural Networks, volume 4, pages 79–84. IEEE, 1992.


Figure 3: EEG signals decomposed using CCA.

[7] A. A. Nielsen, K. Conradsen, and J. Simpson. Multivariate alteration detection (MAD) and MAF postprocessing in multispectral, bitemporal image data: New approaches to change detection studies. Remote Sensing of Environment, 64:1–19, 1998.

[8] P. Switzer and A. A. Green. Min/max autocorrelation factors for multivariate spatial imagery. Technical Report 6, Department of Statistics, Stanford University, 1984.


Figure 4: EEG signals decomposed using ICA.


Figure 5: The results using one time step (MAF) or two time steps in the CCA method. (Panels: source signals s, mixed signals x, CCA with one time step, CCA with two time steps.)
