A geometric approach to blind separation of nonnegative and dependent source signals

Wady Naanaa (a), Jean-Marc Nuzillard (b,*)

(a) Faculty of Sciences, University of Monastir, 5000 Monastir, Tunisia
(b) ICMR, CNRS UMR 7312, University of Reims, 51687 Reims Cedex 2, France

(*) Corresponding author.

Article info

Article history:
Received 15 June 2011
Received in revised form 27 April 2012
Accepted 15 May 2012
Available online 23 May 2012

Keywords:

Blind source separation

Dependent signals

Convex geometry

Facet identification

Dual cone

Abstract

Blind source separation (BSS) consists in processing a set of observed mixed signals to separate them into a set of original components. Most current blind separation methods assume that the source signals are "as statistically independent as possible" given the observed data. In many real-world situations, however, this hypothesis does not hold. In order to cope with such signals, a first geometric method was proposed that separates statistically dependent signals, provided that they are nonnegative and locally orthogonal.

This paper presents a new geometric method for the separation of nonnegative source signals which relies on a working assumption that is weaker than local orthogonality. The separation problem is expressed as the identification of relevant facets of the data cone. After a rigorous proof of the proposed method, the details of the separation algorithm are given. Experiments on signals of various origins clearly show the efficiency of the new procedure.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Blind source separation (BSS) is a data processing operation whose goal is commonly compared to the resolution of the so-called cocktail-party problem. In this situation, many persons speak simultaneously during a cocktail party, and several sensors (microphones) are placed at different places in the room. Each sensor records the speech of the few persons that are close to it. The problem consists in retrieving the speech given by each person.

There has been considerable interest in the blind separation of mixed source signals because reliable solutions are needed in numerous fields of science, such as analytical chemistry [1], communication [2], data mining [3], medical sciences [4], etc.

More formally, solving a blind source separation problem consists in retrieving n unknown source signals from n mixtures of them, despite the lack of information about the mixing process. This can be expressed, in the instantaneous and noiseless case, by the following linear model:

X = AS    (1)

where X is an n×m matrix holding the detected signals in its rows, A is an n×n mixing matrix whose entries are the unknown mixing coefficients, and S is an n×m matrix holding the unknown source signals in its rows. Clearly, this is an underdetermined problem. Indeed, a full identification of A and S is not possible, because the sources might be permuted and scaled provided that the columns of A are transformed accordingly. More precisely, if P is a permutation matrix and Λ a nonsingular diagonal matrix, then AS = (APΛ)(Λ⁻¹P⁻¹S). The (A,S) and (APΛ, Λ⁻¹P⁻¹S)


pairs are regarded as equivalent solutions in the sense of BSS. Furthermore, the model described by (1) is generally not adapted to practical problems. Indeed, a more realistic model involves additive noise:

X = AS + N    (2)

where N is an n×m matrix modeling the sensor noise. Of course, the presence of noise increases the difficulty of the resolution, because the components of N are also unknown.
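To make the model concrete, here is a minimal numpy sketch of (1) and (2) and of the permutation/scaling ambiguity; the dimensions, random seed and noise level are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 500                       # three sources observed over 500 slots

S = rng.random((n, m))              # nonnegative source signals (cf. Hypothesis 1)
A = rng.normal(size=(n, n))         # mixing matrix; entries may be negative
N = 0.01 * rng.normal(size=(n, m))  # additive sensor noise

X = A @ S + N                       # observed mixtures, model (2)

# The pair (A P L, L^-1 P^-1 S) generates the same noiseless data as (A, S):
# permutation and scaling cannot be resolved blindly.
P = np.eye(n)[rng.permutation(n)]           # permutation matrix
L = np.diag(rng.uniform(0.5, 2.0, size=n))  # nonsingular diagonal scaling
assert np.allclose((A @ P @ L) @ (np.linalg.inv(L) @ P.T @ S), A @ S)
```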

The blind separation of signals has been studied in the field of statistical signal processing, where the concept of independent component analysis (ICA) was used to recover the source signals [5]. The mixtures and sources are therefore defined as sampled functions of an acquisition variable that may be time, frequency, position, wavenumber, etc., depending on the nature of the physical process under investigation. Hence, each column of S and X corresponds to a particular value of the acquisition variable, also referred to as a "slot". ICA-based methods focus on source signals that are statistically independent. In practice, these methods may be applied even if statistical independence does not hold. In such cases, the goal is to find a set of source signals that are "as statistically independent as possible", even though the quality of the separation degrades as the dependence between the source signals increases. From the theoretical point of view, the separation process is achieved through the use of mathematical tools issued from second-order statistics [6], fourth-order statistics [5] or information theory [7].

Estimated sources presenting as little dependence as possible may not be satisfactory, especially when the true source signals are known to be correlated. This situation occurs in many scientific fields, such as chemistry or medicine [1,4].

In a previous paper, we proposed a geometric method adapted to nonnegative and locally orthogonal source signals [8]. A set of source signals is considered locally orthogonal when every source signal is distinguishable from all the others by exclusively displaying a response in at least one (unknown) slot. In such a context, blind source separation is expressed as the identification of the extreme directions of the cone that contains the data. The geometric approach has been explored in many works since then [9–11]. This paper presents an improvement of the method proposed in [8]. It addresses the weakening of the local orthogonality assumption. The latter is replaced by an assumption that might be informally described as a "good" distribution of the source vectors (the columns of the source matrix) in the nonnegative orthant. We show that, under the new hypothesis, the separation process can be expressed as the identification of relevant facets of the data cone, yielding a more general and robust BSS method.

The paper is organized as follows. Section 2 is a short review of published works on the separation of statistically dependent signals. In Section 3, we recall some notions and properties of convex geometry. Section 4 provides the theoretical basis of our method. The separation algorithm is described in Section 5. Section 6 reports experiments on real-world source signals that provide a comparison between various existing separation algorithms and the present one. Finally, Section 7 provides a brief conclusion.

2. Related works

This section outlines a few published works on the separation of statistically dependent signals.

In [12], the author proposed an algorithm which operates on the signal spectra and proceeds by minimizing the output mutual overlap. This supposes, however, that the source spectra overlap as little as possible. This assumption is more appropriate than statistical independence for NMR spectra, but it might also be violated when the analyzed compounds share many features. Moreover, the method, as described in [12], is limited to two source signals.

Blind source separation can also be viewed as a nonnegative matrix factorization (NMF) problem [13]. This approach allows the separation of nonnegative source signals which are mixed by nonnegative matrices. Given a nonnegative matrix X of observed signals, the separation process consists in finding a pair of nonnegative matrices (A,S) that minimizes a given distance between X and AS.

In [14], Donoho and Stodden proposed a geometric approach to NMF. They specifically addressed the uniqueness and correctness of the factorization. Since the data and factors are limited to the nonnegative orthant, the goal is to find a simplicial cone contained in the nonnegative orthant and containing the data points. The authors showed that, under the condition that some of the data are spread across the faces of the nonnegative orthant, the solution is unique. Nonetheless, they did not propose any geometrical factorization algorithm, but applied an existing one to the problem of synthetic image decomposition into parts.

Convex analysis is another framework in which many BSS methods were developed. CAMNS-LP (convex analysis of mixtures of nonnegative sources) [9] is one of these methods. CAMNS-LP is dedicated to the separation of nonnegative signals which are mixed by real coefficients. It, however, uses the local orthogonality assumption (referred to there as local dominance), as in our earlier paper [8]. The blind source separation problem is expressed as the problem of finding the extreme points of a polyhedral set and is solved, in the general case, by considering a set of linear programs. The authors also proposed a semi-analytical method for instances in which fewer than three sources intervene.

This convex analysis-based approach was further developed, giving rise to the MVES method (minimum-volume enclosing simplex) [11]. The main advantage of MVES is that it goes beyond the local orthogonality assumption. The blind source separation problem is instead expressed as the search for a minimum-volume simplex enclosing the data points. From the practical point of view, this approach necessitates optimization programs which are not convex. The authors resorted to an approximation of the objective function to get a linear formulation of their problem. The simplex-volume criterion was originally proposed in [15] to process hyperspectral images and was also used by the MVSA method (minimum volume simplex analysis) [10]. MVES and MVSA essentially differ in the way the minimization of the simplex volume is achieved.


From the theoretical point of view, the geometrical approach presented in [14] is undoubtedly the closest to ours. Indeed, in addition to relying on concepts from convex geometry, the second working hypothesis used in this article is similar to the one used by Donoho and Stodden in order to ensure the uniqueness of the factorization. However, in our case, this hypothesis is solely enforced on the source signals and not on the mixed ones.

3. Elements from convex geometry

Let us first recall some concepts and properties from convex geometry, the theory on which the proposed method is based. For more details about the definitions and proofs omitted in this section, see [16].

To begin with, we introduce some notation. By A_i (resp. Aⁱ), we designate the ith column (resp. ith row) of matrix A. By A_{\i} (resp. A^{\i}), we denote the submatrix obtained from A by removing the ith column (resp. row). Finally, the symbol ⪰ will be used for componentwise vector comparison.

3.1. Convex cones

A subset 𝒦 of the Euclidean space ℝⁿ is a convex cone if α₁v₁ + α₂v₂ ∈ 𝒦 for all v₁, v₂ ∈ 𝒦 and α₁, α₂ ≥ 0.

Let K be a matrix of ℝ^{n×m}. The subset of ℝⁿ defined by

𝒦 = cone(K) = {Ka | a ⪰ 0}    (3)

is a convex cone of ℝⁿ which is termed polyhedral. K is said to be a generating matrix of 𝒦, since every element of 𝒦 is a nonnegative linear combination of the columns of K. The dimension of 𝒦 is, therefore, given by rank(K). If 𝒦 does not simultaneously contain a nonzero vector and its opposite, then it is said to be pointed. If, in addition to being convex and pointed, 𝒦 has dimension n, then it is termed proper. Henceforth, we will mainly focus on proper polyhedral cones.
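Membership in a polyhedral cone such as (3) can be tested numerically with nonnegative least squares; the following sketch (with an arbitrary tolerance) is an illustration, not part of the original method.

```python
import numpy as np
from scipy.optimize import nnls

def in_cone(K, x, tol=1e-9):
    """Test whether x lies in cone(K) = {K a | a >= 0}."""
    a, residual = nnls(K, x)       # min ||K a - x|| subject to a >= 0
    return residual <= tol

# 2 x 3 generating matrix: columns (1,0), (0,1), (1,1)
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]).T
print(in_cone(K, np.array([2.0, 3.0])))   # True: a nonnegative combination exists
print(in_cone(K, np.array([-1.0, 1.0])))  # False: outside the cone
```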

3.2. Cone faces

A face of a proper polyhedral cone 𝒦 of ℝⁿ is a polyhedral cone ℱ ⊆ 𝒦 such that for all v ∈ ℱ, if v = v₁ + v₂ with v₁, v₂ ∈ 𝒦, then v₁, v₂ ∈ ℱ.

A face of 𝒦 having dimension n−1 is called a facet, whereas a one-dimensional face is called an extreme direction. Clearly, if a vector v_E is an extreme direction of 𝒦, then all α v_E, α > 0, are also extreme directions of 𝒦, and the subset {α v_E | α > 0} is called an extreme ray of 𝒦. In what follows, all vectors belonging to the same extreme ray will be considered as identical directions. If a matrix K holds exactly the set of all distinct extreme directions of 𝒦 arranged columnwise, then K is said to be a minimal generating matrix of 𝒦. Accordingly, if a cone 𝒦 ⊆ ℝⁿ has exactly n distinct extreme directions, then it is called simplicial.

Property 1. Let K, F and M be three matrices having the same number of rows and such that M is square and nonsingular. If cone(F) is a face of cone(K), then cone(MF) is a face of cone(MK) (see [16, Chapter 2] for a proof).

3.3. Dual cone

Let 𝒦 = cone(K) be a proper polyhedral cone of ℝⁿ. The dual of 𝒦, denoted by 𝒦*, is the unique convex cone given by

𝒦* = {y ∈ ℝⁿ | Kᵀy ⪰ 0}    (4)

If the number of extreme directions of 𝒦 does not exceed n, then its dual can also be defined by

𝒦* = {K^{T†}a | a ⪰ 0}    (5)

where K^{T†} = (KKᵀ)⁻¹K is the left pseudo-inverse of the transpose of K. Eqs. (4) and (5) define the same cone. These are, respectively, the face description and the vertex description of 𝒦*. All proper polyhedral cones admit both descriptions. However, for proper polyhedral cones whose number of extreme directions exceeds the ambient space dimension, conversion between a face description and an equivalent vertex description is not trivial. In fact, this is a well-studied problem in convex geometry which can be solved by various algorithms [16–18].
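For a square nonsingular K, the left pseudo-inverse in (5) reduces to K⁻ᵀ, so the agreement of (4) and (5) can be checked directly; a minimal numpy sketch (dimensions and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
K = rng.normal(size=(n, n))        # square and, almost surely, nonsingular

# Vertex description (5): for square K, (K K^T)^{-1} K = K^{-T}, so the
# dual cone is generated by the columns of K^{-T}.
K_dual = np.linalg.inv(K).T

# Sample points y = K_dual a with a >= 0 and check the face
# description (4): K^T y >= 0 componentwise.
a = rng.random((n, 100))
Y = K_dual @ a
assert np.all(K.T @ Y >= -1e-10)
```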

Property 2. For any convex cone 𝒦, we have the following (see [16, Chapter 2]):

(i) 𝒦 is polyhedral if and only if 𝒦* is polyhedral.
(ii) 𝒦 is proper if and only if 𝒦* is proper.
(iii) For any convex cone 𝒦′, if 𝒦 ⊆ 𝒦′ then 𝒦* ⊇ 𝒦′*.
(iv) If 𝒦 is proper, then the extreme directions of 𝒦 are respectively orthogonal to the facets of 𝒦*. Likewise, the extreme directions of 𝒦* are respectively orthogonal to the facets of 𝒦.

4. Method

For the purpose of establishing the theoretical background of the proposed method, we state the blind source separation problem in the noiseless case:

Problem 1. Given a full-rank matrix X ∈ ℝ^{n×m} with n ≤ m, find a nonsingular matrix A ∈ ℝ^{n×n} and a matrix S ∈ ℝ^{n×m} such that X = AS.

This work concerns the separation of mixtures obtained from nonnegative source signals. Hence, as a first working hypothesis, we assume that the source matrix S is nonnegative.

Hypothesis 1. S ⪰ 0.

A consequence of Hypothesis 1 is that the columns of X are constrained to be nonnegative linear combinations of the columns of A. However, matrix X is not necessarily nonnegative, because A may contain negative coefficients. If Hypothesis 1 applies, the following lemma provides a necessary condition for the resolution of Problem 1.

Lemma 3. Let A and X be two matrices with the same number of rows. Then there exists S ⪰ 0 such that X = AS if and only if cone(X) ⊆ cone(A).


Proof. (⇒): Let x be in cone(X). Then x = Xα, α ⪰ 0. It follows that x = ASα, α ⪰ 0. But S ⪰ 0, so α′ = Sα ⪰ 0. We therefore obtain x = Aα′, α′ ⪰ 0, which is equivalent to x ∈ cone(A). (⇐): For any column x_i of X, we trivially have x_i ∈ cone(X). Since cone(X) ⊆ cone(A), x_i must be in cone(A). Then there must exist s_i ⪰ 0 such that x_i = As_i. By applying this to every column of X, we obtain X = AS, S ⪰ 0, where S is the matrix whose columns are the s_i's. □

Since A is a square and nonsingular matrix of ℝ^{n×n}, cone(A) is an n-dimensional cone of ℝⁿ. For the same reason, cone(A) is pointed (indeed, if x, −x ∈ cone(A), then x = Aα and −x = Aα′ for some α, α′ ⪰ 0; it follows that A⁻¹x ⪰ 0 and −A⁻¹x ⪰ 0, hence x = 0), and therefore it is a proper polyhedral cone. Furthermore, since the columns of A are the n linearly independent generators of cone(A), they correspond to the extreme directions of this cone (see [16, p. 124]). It follows that cone(A) is simplicial.

On the other hand, in the definition of Problem 1 it is assumed that X is an n×m full-rank matrix with n ≤ m. Thus, cone(X) is an n-dimensional cone of ℝⁿ. Besides, since X = AS and S ⪰ 0, by Lemma 3 we obtain cone(X) ⊆ cone(A). But cone(A) is pointed, so cone(X) is also pointed. Hence, cone(X) is a proper polyhedral cone of ℝⁿ.

To sum up, according to Lemma 3, solving Problem 1 amounts to finding a simplicial cone, cone(A), containing a proper polyhedral cone, cone(X).

There is, in general, an infinity of simplicial cones containing a given proper polyhedral cone. Suppose, for instance, that the data matrix X is nonnegative; then all simplicial cones containing the nonnegative orthant (the simplicial cone generated by the standard basis of ℝⁿ) also contain the data cone, cone(X), and there is, in general, an infinity of such simplicial cones. So we must resort to supplementary hypotheses in order to limit the number of candidate solutions.

As a second hypothesis, we assume that the source vectors, i.e., the columns of S, are spread in ℝⁿ in such a way that the following holds:

Hypothesis 2. Every vector of the standard basis of ℝⁿ is orthogonal to n−1 or more linearly independent columns of S.

Fig. 1 shows a set of six vectors of ℝ³. Arranged columnwise, these vectors constitute the source matrix S. Each vector of the standard basis (e₁, e₂, e₃) of ℝ³ is orthogonal to a pair of independent columns of S. Thus, S verifies Hypothesis 2.

Let us designate by S^{⊥i} the submatrix composed of the columns of S that are orthogonal to e_i, the ith vector of the standard basis of ℝⁿ. Note that all the components of the ith row of S^{⊥i} must be zero, since S ⪰ 0.
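A configuration in the spirit of Fig. 1 can be built numerically; the six columns below are illustrative values chosen to satisfy Hypothesis 2, not the vectors of the figure.

```python
import numpy as np

# Columns of S in R^3; each standard basis vector e_i is orthogonal to
# at least two linearly independent columns (Hypothesis 2).
S = np.array([
    [0.0, 0.0, 1.0, 2.0, 1.0, 3.0],
    [1.0, 2.0, 0.0, 0.0, 3.0, 1.0],
    [2.0, 1.0, 3.0, 1.0, 0.0, 0.0],
])

def orth_submatrix(S, i):
    """S_perp_i: the columns of S orthogonal to e_i (ith entry equal to 0)."""
    return S[:, S[i] == 0]

for i in range(3):
    Si = orth_submatrix(S, i)
    assert np.linalg.matrix_rank(Si) >= S.shape[0] - 1   # n-1 independent columns
```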

Lemma 4. Under Hypotheses 1 and 2, the cones cone(S^{⊥i}), i = 1, …, n, are facets of cone(S).

Proof. Since S^{⊥i} is a submatrix of S whose rank is n−1 (Hypothesis 2) whereas S has rank n, we have cone(S^{⊥i}) ⊂ cone(S). It remains to show that the cones cone(S^{⊥i}), i = 1, …, n, are faces of cone(S). Let us prove that for every s ∈ cone(S^{⊥i}) such that s = s′ + s″ with s′, s″ ∈ cone(S), we have s′, s″ ∈ cone(S^{⊥i}).

Since s ∈ cone(S^{⊥i}), we must have s_i = 0. Furthermore, s′, s″ ∈ cone(S) implies that s′_i, s″_i ≥ 0. Therefore, in order to have s = s′ + s″, s′_i must be zero. On the other hand, s′ ∈ cone(S); then, by distinguishing the columns of S^{⊥i} from the other columns of S (which form a submatrix denoted by S^{⊤i}), we can write s′ = S^{⊥i}α + S^{⊤i}ᾱ with α, ᾱ ⪰ 0. It follows that s′_i = (S^{⊥i})ⁱα + (S^{⊤i})ⁱᾱ = 0, and since (S^{⊥i})ⁱ is a zero vector and (S^{⊤i})ⁱ is a nonnegative and nonzero vector, ᾱ must be zero. It follows that s′ = S^{⊥i}α, α ⪰ 0, which is equivalent to s′ ∈ cone(S^{⊥i}). We can proceed in the same manner to show that s″ is also in cone(S^{⊥i}). Thus, cone(S^{⊥i}) is a facet of cone(S). □

For short, let us designate cone(A) by 𝒜 and cone(X) by 𝒳.

Lemma 5. Under Hypotheses 1 and 2, every facet of 𝒜 contains a facet of 𝒳.

Proof. According to Lemma 4, the cones cone(S^{⊥i}), i = 1, …, n, are facets of cone(S). It follows, by Property 1, that cone(AS^{⊥i}) is a facet of 𝒳 = cone(AS). But the ith row of S^{⊥i} is a zero vector. Then, by deleting this row and the ith column of A, we obtain AS^{⊥i} = A_{\i}(S^{⊥i})^{\i}. On the other hand, by Lemma 3, cone(A_{\i}(S^{⊥i})^{\i}) ⊆ cone(A_{\i}). Thus, cone(AS^{⊥i}) ⊆ cone(A_{\i}). But cone(A_{\i}) ⊂ cone(A) = 𝒜, and the columns of A_{\i} are n−1 linearly independent vectors. Then cone(A_{\i}) is trivially a face of 𝒜 and, since it has dimension n−1, it is a facet of 𝒜. The result follows. □

Lemma 5 tells us that the simplicial cone 𝒜 may be determined through its facets, which must contain facets of 𝒳. These facets can in turn be identified by considering the dual cones, as shown below.

Lemma 6. Under Hypotheses 1 and 2, every extreme direction of 𝒜* is also an extreme direction of 𝒳*.

Proof. First, notice that the dual cones 𝒜* and 𝒳* are both proper polyhedral cones of ℝⁿ, since 𝒜 and 𝒳 are so. Let a*_E be a nonzero extreme direction of 𝒜*. According to Property 2(iv), a*_E is orthogonal to one of the facets of 𝒜. The latter facet contains, according to Lemma 5, a facet of 𝒳, which will be denoted by ℱ. Then a*_E is also orthogonal to ℱ. Again according to Property 2(iv), ℱ is orthogonal to one of the extreme directions of 𝒳*, which will be denoted by x*_E. Since ℱ has dimension n−1, a*_E and x*_E are necessarily collinear. Furthermore, a*_E and x*_E must belong to the same ray. Indeed, since 𝒳 ⊆ 𝒜 (Lemma 3), by Property 2(iii) we obtain 𝒳* ⊇ 𝒜*. Then both x*_E and a*_E are in 𝒳*, and x*_E and a*_E cannot have opposite directions, otherwise 𝒳* would not be pointed. Thus, a*_E is also an extreme direction of 𝒳*. □

[Fig. 1. The columns (the s_i's) of a source matrix that verify Hypothesis 2.]

Lemma 6 suggests that, for the simplicial cone 𝒜* to be determined, one has to select n extreme directions among those of 𝒳*. There are, in general, many such combinations, since 𝒳* may have more than n extreme directions. It follows that Hypothesis 2 does not guarantee the uniqueness of the solution to Problem 1. Nevertheless, it constrains the number of solutions to be finite: more precisely, there are C(m*, n) solutions which simultaneously verify Hypotheses 1 and 2, where m* denotes the number of extreme directions of 𝒳* and C(m*, n) the number of ways of choosing n of them.

So far, we only have a face description of 𝒳*, given by (4), since X is the only known matrix in Problem 1. We therefore have to obtain a vertex description of 𝒳*, consisting of a matrix X* whose columns are the extreme directions of 𝒳*. The latter matrix can be obtained from X by means of one of the existing algorithms that convert a face description into an equivalent vertex description [17,18].

Now, we can state the main theoretical result of this paper.

Theorem 7. Under Hypotheses 1 and 2, every row of S is collinear to a row of X*ᵀX.

Proof. 𝒜 has n extreme directions, since it is simplicial. Then, by (5), we obtain A* = A^{T†}, where A* denotes a minimal generating matrix of 𝒜*. And since A is square and nonsingular, we obtain A = (A*⁻¹)ᵀ. By (1), we then get S = A⁻¹X = A*ᵀX. On the other hand, by Lemma 6, each column of A* is collinear to a column of X*, and then each row of A*ᵀ is collinear to a row of X*ᵀ. We deduce that each row of S is collinear to a row of X*ᵀX. □

At this stage, having found a set of candidate source signals stored in the rows of X*ᵀX, we need a supplementary criterion in order to select those corresponding to the true source signals. We propose to select n candidate source signals based on the generalized coherence (GC) estimate [19]. This statistic is widely used as a measure of linear statistical dependence, a limited form of statistical dependence. As shown in [19], the GC estimate is invariant to signal amplitude scaling and permutation. As a result, this criterion does not differentiate between solutions that are equivalent in the sense of BSS, which is a desirable property. Furthermore, from the definition of Problem 1, one can deduce that the source matrix S must be full rank, which requires the linear independence of the source signals. The GC estimate favors such signals. Notice also that the set of candidate source signals, i.e., the rows of X*ᵀX, has not been obtained based on any statistical independence hypothesis. Thus, these signals may be statistically dependent, and the GC estimate is only used a posteriori to select the true sources among these signals.

The generalized coherence estimate between the rows of S is defined by

G(S) = 1 − det(SSᵀ) / ∏_{i=1}^{n} ‖s_i‖²    (6)

where the s_i, i = 1, …, n, are the rows of S. Note that the numerator of the fraction in (6) is the Gram determinant associated with matrix S. The properties of the Gram determinant (see [20]) imply that 0 ≤ G(S) ≤ 1. The signals stored in the rows of a candidate S matrix are the most linearly independent when the associated generalized coherence estimate is the closest to zero. G(S) is zero when the rows of S are pairwise orthogonal. Conversely, a value of G(S) close to 1 reveals nearly linearly dependent source signals.
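Eq. (6) transcribes directly into code; a minimal sketch (the two test matrices are illustrative):

```python
import numpy as np

def gc_estimate(S):
    """Generalized coherence estimate (6): 1 - det(S S^T) / prod_i ||s_i||^2."""
    gram = S @ S.T
    norms_sq = np.sum(S**2, axis=1)
    return 1.0 - np.linalg.det(gram) / np.prod(norms_sq)

# Pairwise-orthogonal rows give G = 0; nearly collinear rows give G close to 1.
print(gc_estimate(np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])))    # ~0.0
print(gc_estimate(np.array([[1.0, 1.0, 0.0], [1.0, 1.001, 0.0]])))  # close to 1
```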

Let S be any of the n×m matrices obtained by selecting n rows among those of X*ᵀX. Then, as an estimate of the source matrix, we propose the matrix S̃ that minimizes the G cost function. That is:

Hypothesis 3. S̃ = arg min_S G(S).

Once we have determined an estimate S̃ of the source matrix, we can deduce an estimate of the mixing matrix according to (1), that is, Ã = XS̃†, where S̃† denotes the right pseudo-inverse of S̃.

5. Algorithm

The DEDS algorithm (Dual Extreme Direction-based Separation; see Function 1) is an implementation of the above-described blind source separation method. The method supposes that there are at least as many observations as there are sources. If the number of observations exceeds the number of sources, we can first proceed to a principal component analysis (PCA) and then restrict the operations of DEDS to the first n principal components of the input matrix X.

The DEDS algorithm involves two main steps. The first step consists in calling the DUAL function, which computes X*, a matrix that generates the dual of the data cone. This step can be achieved by the O(mn²m*) algorithm proposed in [18], where m* denotes the number of extreme directions of 𝒳*. It can also be performed by the double description method [17], for which there is an implementation running in O(m^⌊n/2⌋) steps. In our case, the DUAL function is based on this latter method. More precisely, we used the Skeleton implementation of the double description method, which is available at [21].
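The paper relies on the Skeleton implementation for DUAL; as a self-contained (and much slower) alternative for small instances, the extreme directions of 𝒳* can be enumerated by brute force, using the fact that each of them is orthogonal to a facet of 𝒳 and hence to n−1 linearly independent columns of X. The function below is a sketch under that assumption, not the algorithm used in the paper.

```python
import numpy as np
from itertools import combinations

def dual_extreme_directions(X, tol=1e-9):
    """Brute-force vertex description of cone(X)* = {y | X^T y >= 0}.

    Candidate rays span the null spaces of (n-1)-column submatrices of X;
    a candidate is kept when it satisfies all the inequalities of (4).
    Cost grows combinatorially with m: illustration only.
    """
    n, m = X.shape
    rays = []
    for idx in combinations(range(m), n - 1):
        B = X[:, idx]
        if np.linalg.matrix_rank(B) < n - 1:
            continue
        y = np.linalg.svd(B.T)[2][-1]      # unit vector spanning null(B^T)
        for cand in (y, -y):
            if np.all(X.T @ cand >= -tol):
                cand = cand / np.linalg.norm(cand)
                if not any(np.allclose(cand, r, atol=1e-6) for r in rays):
                    rays.append(cand)
    return np.array(rays).T                # columns: extreme directions of the dual
```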

The second step consists in calling the SELECT function, which extracts an estimate of the source matrix by selecting n rows from those of the matrix product X*ᵀX. According to Hypothesis 3, the n selected rows must be associated with the minimum value of the G function. Finding a globally optimal solution is a computationally difficult task (we suspect that it is NP-hard).

A very similar problem has already been described in [22]. The authors proposed a low-complexity iterative algorithm for selecting a "good" basis from a given set of p vectors of ℝ^m, with m ≤ p. Their algorithm is intended to select the basis associated with the maximum absolute determinant over the C(p, m) possible bases. In these settings, the problem is relevant only if the set of input vectors spans ℝ^m; otherwise the maximum of the absolute determinant is trivially zero. The proposed algorithm proceeds in two phases: the first phase is a greedy iterative process based on Gram–Schmidt orthogonalization. The second


phase is a one-by-one replacement process based on Cramer's rule for determinants. For efficiency reasons, we adapted the approximation algorithm presented in [22] to our source selection problem, rather than using an exact but time-consuming algorithm.

In the present context, the set of input vectors are the rows of X*ᵀX, which are vectors of ℝ^m. These vectors span ℝ^m only in the trivial case where n = m, since rank(X*ᵀX) ≤ rank(X) ≤ n. Consequently, our aim is rather to look for a "good" basis of an n-dimensional subspace of ℝ^m. In this situation, the absolute determinant cannot be used as a selection criterion, since we may have n < m; hence the use of the G function instead.

By (6), finding the submatrix S̃ of X*ᵀX that minimizes G(S) amounts to finding the one that maximizes the normalized Gram determinant. Using Gram–Schmidt orthogonalization, we get

det(SSᵀ) = ∏_{i=1}^{n} ‖r_i‖²    (7)

where the r_i's are iteratively defined as the difference between a row s_i of S and its orthogonal projection on the range of r_1, r_2, …, r_{i−1}:

r_i = s_i − Σ_{k=1}^{i−1} (s_i r_kᵀ / ‖r_k‖²) r_k    (8)

The first phase of the selection algorithm proceeds in a greedy way. At iteration i, it selects a row s̃_i of X*ᵀX that satisfies

s̃_i = arg max_{s_k} ‖r_k‖ / ‖s_k‖    (9)

where s_k, k = 1, …, m*, ranges over the rows of X*ᵀX and r_k denotes the residual (8) of s_k with respect to the rows already selected. At first glance, s̃₁ might be any of the rows of X*ᵀX, since, by (8), all these vectors give a ratio equal to one. Nonetheless, the choice of s̃₁ has a great impact on the quality of the overall solution. This is because a bad choice of s̃₁ may lead to a high G(S) value at the end of the optimization process. The clue to a good choice is to determine the direction (a row of X*ᵀX) which is the most linearly independent from the others. Such a direction might be approximated by calculating the mean vector s̄ = (1/m*) Σ_{k=1}^{m*} s_k, projecting all the s_k's on s̄, and choosing as s̃₁ the s_k that gives the highest relative residual norm:

s̃₁ = arg max_{s_k} ‖s_k − (s_k s̄ᵀ / ‖s̄‖²) s̄‖ / ‖s_k‖    (10)

Once the first vector s̃₁ is chosen, the n−1 remaining vectors are selected according to (9) and form a first estimate of the source matrix S̃ = (s̃₁, s̃₂, …, s̃_n).

The selection process requires at most O(n²m²) steps. Indeed, it includes n steps of m* linear projections of vectors of ℝ^m on a basis containing at most n−1 vectors.
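A sketch of this first (greedy) phase of SELECT, transcribing (8)–(10); the function and variable names are ours, and the numerical safeguards of the actual implementation are not reproduced.

```python
import numpy as np

def select_greedy(C, n):
    """Greedily pick n rows of C (here C stands for X*^T X) using the
    relative residual-norm criterion (9), seeded by (10)."""
    C = np.asarray(C, dtype=float)
    norms = np.linalg.norm(C, axis=1)

    # Seed (10): project every row on the mean row and keep the row
    # with the largest relative residual norm.
    s_bar = C.mean(axis=0)
    proj = np.outer(C @ s_bar / (s_bar @ s_bar), s_bar)
    first = int(np.argmax(np.linalg.norm(C - proj, axis=1) / norms))

    selected = [first]
    Q = [C[first] / norms[first]]          # orthonormal Gram-Schmidt basis
    for _ in range(n - 1):
        R = C - sum(np.outer(C @ q, q) for q in Q)   # residuals (8) of all rows
        scores = np.linalg.norm(R, axis=1) / norms   # criterion (9)
        scores[selected] = -np.inf                   # never re-pick a row
        k = int(np.argmax(scores))
        selected.append(k)
        Q.append(R[k] / np.linalg.norm(R[k]))
    return C[selected], selected
```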

Now we turn to the second phase of the selection process, whose aim is to further decrease the cost of the matrix selected in the first phase. The process iteratively replaces some of the selected vectors by others which are currently not selected. Only one vector is replaced at a time. The replacements rely on a generalization of Cramer's rule to Gram determinants. As X and X* are full rank, the first phase yields an S̃ matrix whose rank is also n. Therefore, the rows of S̃ form a basis for the subspace spanned by the rows of X*ᵀX. Cramer's rule for determinants can therefore be written as

|α_{i,k}| = sqrt( det(S̃_{i←k} S̃_{i←k}ᵀ) / det(S̃S̃ᵀ) )

where S̃_{i←k} denotes the matrix obtained from S̃ by replacing its ith row s̃_i by s_k, and α_{i,k} denotes the coefficient associated with s̃_i when expressing s_k in the basis composed of the rows of S̃. Our aim is to increase the normalized Gram determinant intervening in the cost function given by (6). Hence, the condition for a replacement to be valid is that

|α_{i,k}| > ‖s_k‖ / ‖s̃_i‖

The one-by-one replacement process stops when a predefined number of iterations is reached or when no vector pair satisfies the replacement condition.

The two-phase selection algorithm described above has experimentally proved to give satisfactory suboptimal solutions in short times. In the pseudo-code of DEDS, it is implemented within the function SELECT, which is called in the third step of DEDS and whose outputs are the selected source signals.
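The replacement phase can be sketched as follows; this is a first-improvement variant (the paper does not specify the scanning order), with `max_iter` standing for the predefined iteration bound.

```python
import numpy as np

def refine_by_swaps(C, selected, max_iter=50):
    """One-by-one replacements that increase the normalized Gram
    determinant, following the Cramer-rule validity test."""
    selected = list(selected)
    for _ in range(max_iter):
        S = C[selected]
        gram = np.linalg.det(S @ S.T)
        best = None
        for i in range(len(selected)):
            for k in range(C.shape[0]):
                if k in selected:
                    continue
                Sik = S.copy()
                Sik[i] = C[k]
                # |alpha_ik| = sqrt(det(S_ik S_ik^T) / det(S S^T)); the swap
                # is valid when |alpha_ik| > ||s_k|| / ||s~_i||.
                alpha = np.sqrt(np.linalg.det(Sik @ Sik.T) / gram)
                if alpha > np.linalg.norm(C[k]) / np.linalg.norm(S[i]):
                    best = (i, k)
                    break
            if best:
                break
        if best is None:
            break                      # no valid replacement remains
        selected[best[0]] = best[1]
    return C[selected], selected
```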

Finally, an estimate of the mixing matrix is obtainedvia (1).

Function 1. DEDS(X) → (Ã, S̃)

Step 1. n ← rank(X)
Step 2. X* ← DUAL(X)
Step 3. S̃ ← SELECT(X*ᵀX, n)
Step 4. Ã ← XS̃†
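Putting the pieces together, a minimal end-to-end sketch of Function 1, reusing the illustrative helpers above (it assumes X is n×m with full row rank, i.e., PCA has already been applied if needed):

```python
import numpy as np

def deds(X):
    """End-to-end DEDS sketch; relies on dual_extreme_directions,
    select_greedy and refine_by_swaps from the previous sketches."""
    n = X.shape[0]                         # Step 1: n = rank(X) by assumption
    X_dual = dual_extreme_directions(X)    # Step 2: generators of the dual cone
    C = X_dual.T @ X                       # rows: candidate sources X*^T X
    S_hat, sel = select_greedy(C, n)       # Step 3, phase 1 (greedy)
    S_hat, sel = refine_by_swaps(C, sel)   # Step 3, phase 2 (swaps)
    A_hat = X @ np.linalg.pinv(S_hat)      # Step 4: right pseudo-inverse
    return A_hat, S_hat
```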

6. Results

In order to evaluate the performance of the algorithm presented in this paper (DEDS), two series of experiments were carried out. The first series involves four one-dimensional source signals which are not completely independent. The second series deals with two-dimensional signals consisting of six highly dependent human face images.

6.1. One-dimensional signals

The blind source separation results obtained by DEDS were compared with those obtained by JADE [23]. The latter method was chosen because it has proved able to effectively separate a wide variety of signals without resorting to parameter tuning. In particular, although it is an ICA method, JADE is able to separate certain statistically dependent signals, as explained in [24].

The real source signals under consideration have two origins: (1) nuclear magnetic resonance (NMR) spectra resulting from the analysis of organic chemical compounds [1]; each of the four source signals contains m = 7000 samples. (2) Four of the EEG signals available on the ICALAB site [25], sampled over m = 512 frequency slots after conversion to the frequency domain.

The source signals were mixed according to the model described by (2). The mixing matrices were generated at random, with coefficients normally distributed around zero with unit variance. To show the impact of the mixing process on the performance of each algorithm, two groups of mixing matrices were considered. In each group, the absolute determinant Δ(A) of the mixing matrices was taken within a predefined interval: 0.5 ± 0.1 or 0.005 ± 0.001. Note that a nearly singular mixing matrix yields, in general, a hard-to-separate mixture set.

The additive noise was generated to be white and Gaussian, with uncorrelated samples whose variance was assumed to be uniform over the rows of matrix N. The signal-to-noise ratio (SNR) was varied from 2 to 20 dB in steps of 2 dB. Each point appearing in the graphs of Fig. 2 corresponds to the mean value of the performance index obtained on 1000 problem instances.
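One conventional way to realize such a noise model is sketched below; the paper does not detail its generator beyond whiteness and per-row uniform variance, and this sketch uses a single global variance for simplicity.

```python
import numpy as np

def add_noise(X, snr_db, rng=None):
    """Add white Gaussian noise to X at a target SNR given in dB."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(X**2)
    noise_power = signal_power / 10.0**(snr_db / 10.0)
    return X + rng.normal(scale=np.sqrt(noise_power), size=X.shape)
```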

As a measure of algorithm efficacy, we used the Amari performance index [7]. It is a real number within the [0,1] interval, computed as follows from the true mixing matrix A and the estimated one Ã:

d(A, Ã) = (1 / (2(n−1))) Σ_{i=1}^{n} ( Σ_{j=1}^{n} |g_{ij}|² / max_k |g_{ik}|² − 1 + Σ_{j=1}^{n} |g_{ji}|² / max_k |g_{ki}|² − 1 )    (11)

where g_{ij} denotes a component of the matrix G = AÃ⁻¹. An efficient separation process corresponds to an Amari performance index that is close to zero.
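A direct transcription of (11) into code (the function name is ours):

```python
import numpy as np

def amari_index(A, A_hat):
    """Amari performance index (11); values close to 0 mean good separation."""
    G = A @ np.linalg.inv(A_hat)
    P = np.abs(G)**2
    n = P.shape[0]
    rows = P.sum(axis=1) / P.max(axis=1) - 1.0   # row-wise terms of (11)
    cols = P.sum(axis=0) / P.max(axis=0) - 1.0   # column-wise terms of (11)
    return (rows.sum() + cols.sum()) / (2 * (n - 1))
```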

Fig. 2 depicts the variation of the Amari performance index in various experiments. Each panel corresponds to one of the instance groups described above. In each graph, the source signals and the quality of the mixing process are the same, but the signal-to-noise ratio varies.

By examining the maximum and minimum points of the various curves, it appears that the quality of the mixing process is the most influential parameter. Indeed, the performance of each algorithm significantly degrades as the absolute determinant of the mixing matrices decreases. On the other hand, the relevance of the signal-to-noise ratio is obvious, with the exception of an unexplained behavior of JADE observed on the EEG instances with

[Fig. 2. Mean of the Amari performance index obtained with DEDS (squares) and JADE (circles) on randomly mixed real-life signals (NMR signals in the left column, EEG signals in the right column). The random mixing matrices have their absolute determinant varying in the intervals 0.5 ± 0.1 (row 1) and 0.005 ± 0.001 (row 2).]


Δ(A) = 0.5 ± 0.1: the performance index rises weakly with the SNR after having reached a minimum at 12 dB.

When DEDS processes good-quality mixtures, it almost always achieves better results than JADE. However, the trend is partially inverted for the separation of low-quality NMR signal mixtures (Δ(A) = 0.005 ± 0.001). This observation suggests that JADE is less sensitive to a degradation of the mixing process. Nonetheless, a compensation phenomenon seems to take place with the inherently more dependent EEG signals. It is tempting to infer that the performance deterioration of DEDS, caused by a bad mixing, could be balanced by a poor performance of JADE, caused by the statistical dependence of the source signals. On the other hand, it is rather difficult to draw any general conclusion concerning the effect of the signal-to-noise ratio on the relative performance of the two algorithms: its effect seems to be hidden by that of the mixing process quality.

Finally, it is worth noting that, for these one-dimensional signals, we also experimented with the CAMNS-LP [9] and MVES [11] algorithms. Unfortunately, the experiment, which is not reported here, revealed that the performance of these two algorithms degrades at high noise levels. Indeed, on the noisy instances (SNR ≤ 10 dB) considered in the experiment, both algorithms often failed to retrieve the true source signals.

6.2. Two-dimensional signals

Our algorithm is compared with two other algorithms for image separation, namely FP-NMF [26] and CAMNS-LP [9]. The former is a nonnegative matrix factorization (NMF) algorithm and the latter relies on a geometric approach. Both approaches were discussed in Section 2.

Six human face images were considered in this study; they are part of the NMFLAB toolbox [27]. Each image is reshaped as a row vector by scanning it from top left to bottom right. The resulting rows are randomly mixed using matrices whose entries follow a zero-mean, unit-variance normal distribution. The sign of negative coefficients is simply ignored in order to obtain nonnegative mixing matrices. This requirement was added for two

[Fig. 3. Human face separation: (row 1) sources; (row 2) well-mixed noiseless observations; extracted sources obtained by (row 3) FP-NMF, (row 4) CAMNS-LP and (row 5) DEDS.]


reasons: it enables the mixtures to be visualized as images, and NMF-based algorithms require nonnegative mixtures.

Again, a distinction was introduced between two kinds of mixing processes, "good" ones and "poor" ones, which differ by the properties of the mixing matrices. A first mixing process involves a six-by-six matrix denoted A_G which is well conditioned (cond(A_G) = 13.87) and clearly nonsingular (det(A_G) = −2.99). A second mixing process involves a six-by-six matrix denoted A_P which is ill conditioned (cond(A_P) = 2.07 × 10⁴) and nearly singular (det(A_P) = 7.74 × 10⁻⁶). Two groups of synthetic images are then formed: well-mixed noiseless images and poorly-mixed noisy images. In the noisy case, 20 dB white Gaussian noise is added over all the pixels of the mixed images. As a measure of algorithm performance, we used a scaled distance between the initial sources and the computed ones, which reflects the gray-scale error per pixel:

d(S, S̃) = min_{p∈P_n} ( Σ_{i=1}^{n} ‖s_i − s̃_{p_i}‖² ) / (mn)    (12)

where P_n denotes the set of all permutations of the integers from 1 to n. In this criterion, m corresponds to the number of pixels per image.
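A direct transcription of (12); the brute-force minimization over permutations is practical here since n = 6:

```python
import numpy as np
from itertools import permutations

def scaled_distance(S, S_hat):
    """Scaled per-pixel distance (12), minimized over source permutations.
    Assumes the estimated sources are already consistently scaled."""
    n, m = S.shape
    best = min(np.sum((S - S_hat[list(p)])**2) for p in permutations(range(n)))
    return best / (m * n)
```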

The results obtained by FP-NMF, CAMNS-LP and DEDS on the two groups of observed images are given in Figs. 3 and 4. As a first and global observation, it appears that the source images computed by FP-NMF are clearly more degraded than those computed by CAMNS-LP or DEDS. This is confirmed by the scaled distance index, whose values are given in Table 1 for every algorithm and for the two groups of mixed images. The values obtained by

[Fig. 4. Human face separation: (row 1) sources; (row 2) poorly-mixed and noisy observations; extracted sources obtained by (row 3) FP-NMF, (row 4) CAMNS-LP and (row 5) DEDS.]

Table 1. Performance evaluation of FP-NMF, CAMNS-LP and DEDS on human face images. The values reported for the noisy cases are the mean values obtained on ten instance samples.

Algorithm     Well-mixed & noiseless    Poorly-mixed & noisy
FP-NMF        0.0825                    0.1354
CAMNS-LP      0.0189                    0.0902
DEDS          0.0034                    0.0617


FP-NMF are always greater than those obtained by the two other algorithms. From now on, the discussion deals with CAMNS-LP and DEDS only. In the well-mixed and noiseless case (see Fig. 3), both algorithms computed precise approximations for almost all the source images. However, the observation of the fifth source image reveals the superiority of DEDS over CAMNS-LP: the one computed by DEDS is clearly closer to the fifth true source image than the one computed by CAMNS-LP.

Finally, on the group composed of the poorly-mixed and noisy images, which are the hardest to separate, DEDS clearly achieves better results, even though some estimated images are severely degraded. The superiority of DEDS over CAMNS-LP is particularly visible on the third, fourth and sixth sources, which were very badly estimated by CAMNS-LP, whereas DEDS succeeded in obtaining acceptable approximations.

7. Conclusion

This paper presents a new blind source separation algorithm called DEDS that can be applied to nonnegative and statistically dependent source signals. The blind separation problem was expressed as the identification of relevant facets of the data cone. This task is achieved by an effective algorithm which first computes the dual of the data cone and then selects, by means of an improved greedy process, a promising subset of source signals from a set of candidates.

The application of the DEDS algorithm to simulated BSS problem instances involving real-world one-dimensional and two-dimensional source signals showed that its performance is highly competitive with some of the most popular blind separation algorithms. Further progress might be made, notably by weakening the assumption on the distribution of the source vectors over the nonnegative orthant. It is also conceivable to extend the method to cope with source signals including negative real values.

References

[1] D. Nuzillard, J.-M. Nuzillard, Application of blind source separation to 1-D and 2-D NMR spectroscopy, IEEE Signal Processing Letters 5 (1998) 209–211.

[2] W. Fu, X. Yang, N. Liu, X. Zeng, Blind source separation algorithm for communication complex signals in communication reconnaissance, Frontiers of Electrical and Electronic Engineering in China 3 (3) (2008) 338–342.

[3] V.P. Pauca, F. Shahnaz, M.W. Berry, R.J. Plemmons, Text mining using nonnegative matrix factorizations, in: Proceedings of the Fourth SIAM International Conference on Data Mining (SDM'04), Lake Buena Vista, FL, USA, 2004, pp. 452–456.

[4] F.J. Theis, Blind signal separation into groups of dependent signals using joint block diagonalization, in: International Symposium on Circuits and Systems, vol. 6, 2005, pp. 5878–5881.

[5] P. Comon, Independent component analysis: a new concept? Signal Processing 36 (1994) 287–314.

[6] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, E. Moulines, A blind source separation technique using second-order statistics, IEEE Transactions on Signal Processing 45 (1997) 434–444.

[7] S. Amari, A. Cichocki, H.H. Yang, A new learning algorithm for blind signal separation, in: Advances in Neural Information Processing Systems, vol. 8, 1996, pp. 757–763.

[8] W. Naanaa, J.-M. Nuzillard, Blind source separation of positive and partially correlated data, Signal Processing 85 (9) (2005) 1711–1722.

[9] T.-H. Chan, W.-K. Ma, C.-Y. Chi, Y. Wang, A convex analysis framework for blind separation of non-negative sources, IEEE Transactions on Signal Processing 56 (2008) 5120–5134.

[10] J. Li, J.J. Bioucas-Dias, Minimum volume simplex analysis: a fast algorithm to unmix hyperspectral data, in: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Boston, MA, vol. 4, 8–12 August 2008, pp. 2369–2371.

[11] T.-H. Chan, C.-Y. Chi, Y.-M. Huang, W.-K. Ma, A convex analysis based minimum-volume enclosing simplex algorithm for hyperspectral unmixing, IEEE Transactions on Signal Processing 57 (11) (2009) 4418–4432.

[12] A.K. Barros, The independence assumption: dependent component analysis, in: M. Girolami (Ed.), Advances in Independent Component Analysis, Springer-Verlag, 2000, pp. 63–71.

[13] P. Paatero, U. Tapper, Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values, Environmetrics 5 (2) (1994) 111–126.

[14] D. Donoho, V. Stodden, When does nonnegative matrix factorization give a correct decomposition into parts? in: Advances in Neural Information Processing Systems 16, Vancouver, Canada, 2003.

[15] M.D. Craig, Minimum-volume transforms for remotely sensed data, IEEE Transactions on Geoscience and Remote Sensing 32 (3) (1994) 542–552.

[16] J. Dattorro, Convex Optimization & Euclidean Distance Geometry, Meboo Publishing, USA, 2005.

[17] T.S. Motzkin, H. Raiffa, G.L. Thompson, R.M. Thrall, The double description method, Annals of Mathematics Studies, vol. 28, Princeton University Press, 1953, pp. 51–73.

[18] D. Avis, K. Fukuda, A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra, Discrete & Computational Geometry 8 (1992) 295–313.

[19] D. Cochran, H. Gish, D. Sinno, A geometric approach to multiple-channel signal detection, IEEE Transactions on Signal Processing 43 (9) (1995) 2049–2057.

[20] S.S. Dragomir, B. Mond, On a property of Gram's determinant, Extracta Mathematicae 11 (2) (1996) 282–287.

[21] N.Y. Zolotykh, Skeleton: an implementation of the double description method, http://www.uic.unn.ru/~zny/skeleton/.

[22] I. Ilani, R. Zamir, Self basis selection in a finite set, in: 23rd IEEE Convention of Electrical and Electronics Engineers in Israel, 2004, pp. 102–105.

[23] J.-F. Cardoso, A. Souloumiac, Blind beamforming for non-Gaussian signals, IEE Proceedings F 140 (6) (1993) 362–370.

[24] M. Castella, P. Comon, Blind separation of instantaneous mixtures of dependent sources, in: Proceedings of ICA'2007, London, UK, September 2007, Lecture Notes in Computer Science, vol. 4666, pp. 9–16.

[25] A. Cichocki, S. Amari, K. Siwek, T. Tanaka, et al., ICALAB Toolboxes, http://www.bsp.brain.riken.jp/ICALAB.

[26] A. Cichocki, R. Zdunek, S. Amari, New algorithms for non-negative matrix factorization in applications to blind source separation, in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, May 2006.

[27] A. Cichocki, R. Zdunek, NMFLAB: a MATLAB toolbox for non-negative matrix factorization, http://www.bsp.brain.riken.jp/ICALAB/nmflab.html.
