
THEORETICAL ADVANCES

Improved nuisance attribute projection for face recognition

Ariel Yifrach • Eitan Novoselsky • Yosef A. Solewicz • Yitzhak Yitzhaky

Received: 6 September 2012 / Accepted: 22 July 2014

© Springer-Verlag London 2014

Pattern Anal Applic, DOI 10.1007/s10044-014-0388-4

Abstract  Illumination variation is one of the well-known problems in face recognition under uncontrolled environments, and several techniques have been presented in the literature to cope with it. Lately, a technique known as Nuisance Attribute Projection (NAP), originally developed for the speaker recognition field, was introduced to image processing in order to compensate for luminance artifacts. This paper extends and improves the earlier work by exploring efficient methodologies for using NAP for face recognition under varied illumination conditions. In particular, we propose a modified NAP formulation and show that NAP training can be simplified for face recognition. Additionally, we suggest a compact framework merging NAP compensation and eigenface recognition. A series of experiments using the extended YaleB database, and a cross-validation using the PIE CMU and Oulu databases, are performed to validate our proposals.

Keywords  Face recognition · Nuisance Attribute Projection · Principal Component Analysis · Support Vector Machines

1 Introduction

Current face recognition systems allow satisfactory identification of individuals under constrained conditions [1]. However, images produced under uncontrolled conditions severely limit successful identification. Such problems are typically related to disparities in lighting and pose conditions [2]. In particular, variation in the illumination conditions can cause dramatic changes in the face appearance and can thus be considered one of the challenging problems that a practical face recognition system needs to address. Such variations in the face appearance can be much larger than the variation caused by personal identity; in other words, images of different faces can appear more similar than images of the same face captured under extreme illumination variations. The problem is further accentuated when the face recognition system is a fully automated process, since the illumination variation is usually coupled with other uncontrolled conditions, such as pose or expression variability and occlusion by other objects.

The relation between the illumination and the informative image structure, as perceived by the human visual system, was modeled by Land and McCann [3] in their Lightness and Retinex (LR) model, which states that in each visual color channel the intensity signal can be modeled as a product of the so-called luminance and reflectance functions:

I(x, y) = R(x, y) L(x, y),    (1)

where L(x,y) represents the amount of illumination and R(x,y) represents the reflectivity of the object's surface at each point (x,y). Several practical illumination compensation techniques based on the LR model were proposed in order to extract the reflectance R(x,y) from the image I(x,y) [4, 5].

A. Yifrach (✉) • E. Novoselsky • Y. Yitzhaky
Department of Electro-Optics Engineering, Ben-Gurion University, Beer Sheva, Israel
e-mail: [email protected]

A. Yifrach • Y. A. Solewicz
Technology Section, Israel National Police, Jerusalem, Israel

According to the assumptions made to solve Eq. (1), which are widely accepted and used [3, 5–7], L(x,y) varies slowly [the low-frequency component in I(x,y)] while R(x,y) can change abruptly (the high-frequency component); thus the edges in the image correspond to R(x,y) [3]. Based on that, it was proposed to extract R(x,y) by high-pass filtering of the image [6], or, according to (1), through a division of the original image by the low-pass luminance component [3, 4]. The Quotient Image (QI) [8], which is defined as the division between a test image and a linear combination of three non-coplanar illuminated images, was designed for dealing with illumination variation in face recognition. A Self-Quotient Image model was proposed [9, 10] to extend the QI theory by computing the ratio between the test image and its smoothed version [11], computed by an anisotropic filter. However, due to the anisotropic nature of the employed smoothing filter, flat zones in the images are not smoothed properly [12].

Unlike these methods, in order to extract the reflectance, the NAP technique that will be detailed later removes the subspace spanning the low-frequency within-class image components associated with the luminance artifacts, L(x,y). A significant benefit of NAP is that the compensation is fast, since the image is only projected into the pre-computed NAP subspace and back, and image smoothing is not needed.
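As a concrete illustration of the division-based normalization just described, the following is a minimal sketch (not taken from the paper) of log-domain luminance removal. The function name and the sigma and eps parameters are our own illustrative choices, and an isotropic Gaussian stands in for the anisotropic filters discussed above.

```python
# Minimal sketch of LR-model reflectance extraction (Eq. 1), assuming a
# grayscale float image in [0, 1]. In the log domain the product R*L becomes
# a sum, so subtracting a low-pass luminance estimate leaves log-reflectance.
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_log_reflectance(img, sigma=8.0, eps=1e-6):
    log_img = np.log(img + eps)                 # log I = log R + log L
    log_lum = gaussian_filter(log_img, sigma)   # smooth estimate of log L
    return log_img - log_lum                    # approx. log R
```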

Numerous illumination-invariant face recognition approaches have been proposed in the literature, traditionally categorized as being either passive or active [2]. Passive approaches overcome the illumination dissimilarity by studying the final images produced by the imaging system [13–15], in which face appearance has been altered by illumination variations. Active approaches overcome the illumination variation problem by employing active imaging techniques such as optical filters, active illumination sources, etc. [16, 17]. Passive approaches can be further roughly classified into three main categories [18]: (1) illumination-invariant feature extraction methods, which attempt to identify the R(x,y) component [Eq. (1)] that is then used for face recognition, as described above; (2) photometric normalization; and (3) 3-D face modeling. Face image modeling can be based on illumination variation, as in Principal Component Analysis (PCA), Eigenfaces [19], and Linear Discriminant Analysis (LDA), Fisherfaces [13], using a statistical model in which no assumption on the surface property is needed. When there are assumptions about the surface property, such as Lambertian reflectance, the models for the face images based on illumination variation are termed physical models; these fall short of expectations [20], since not only are the performances of most such methods [21–25] still far from ideal, but many of them also require assumptions on the light source and a large volume of training data. In addition to methods developed for face recognition, there are methods designed for removing illumination artifacts from general images. The photometric normalization approach [category (2) above] uses an image processing tool to normalize face images under severe lighting conditions. Common methods for illumination normalization are histogram equalization [26], the logarithm transform [27] and Gamma Intensity Correction [15]. However, these techniques do not perform well when images have space-variant lighting conditions [28].

In this paper, we extend a recently introduced photometric normalization technique for illumination-invariant face recognition based on Nuisance Attribute Projection (NAP) [29]. We propose and evaluate an improved NAP formulation enhancing intra-subject modeling and, in addition, propose ways to improve NAP training. In particular, we propose a modification in the NAP formulation which takes into account actual image templates as references, instead of the usual approximations used so far. Furthermore, we investigate optimal training set sizes and introduce the use of synthetic samples to further reduce training requirements. We also show that the NAP compensation can be straightforwardly embedded in the Eigenface recognition process. Moreover, since NAP can be seen as a specific compensation kernel, we propose to further combine NAP with Support Vector Machines (SVMs), inspired by the original NAP approach for speaker recognition [30]. The proposed method is examined using the extended YaleB database [22], and a cross-validation is carried out using the PIE CMU [31] and the Oulu [32] databases.

The rest of this paper is organized as follows: In Sects. 2 and 3 the NAP and the SVM methods are presented, respectively. The proposed face recognition system using SVM and NAP is described in Sect. 4, and experimental results are presented in Sect. 5. Conclusions are given in Sect. 6.

2 Eigenfaces and NAP

PCA can be used to compactly represent images by finding suitable low-rank class-dependent projections [33]. In addition, efficient image recognition can be achieved using the compressed parameterization [19]. In the field of face recognition, the principal eigenvectors spanning the low-rank subspace are known as "Eigenfaces" and describe the inter- or between-class data variability. In contrast to the Eigenface subspace, which retains most of the image information, the intra- or within-class subspace normally retains undesirable nuisance variability.

In addition to Eigenfaces, several approaches were proposed in order to optimize data discrimination in view of its spatial distribution. Techniques based on LDA [13, 34] aim at searching for the directions that maximize the inter-class scatter while simultaneously minimizing the intra-class scatter. On the other hand, NAP [30] initially defines the subspace based on the eigenvectors of the within-class covariance and then projects the data onto the orthogonal complement of this space.

Nuisance Attribute Projection was initially introduced as a technique able to mitigate the problem of channel variability in the speaker recognition field [30] and was lately extended to face recognition [29]. In state-of-the-art speaker recognition frameworks, conversations are mapped into a single vector in a multi-dimensional sparse space. This is accomplished by extracting several spectral components from each conversation and pooling them into a single "super-vector" [35]. Similarly, in image processing, images are typically parametrized into a single multidimensional vector representing the intensity of each pixel. This common parametrization approach thus led to the recent successful application of NAP in image recognition [29].

The NAP technique essentially learns, on a development set, the spatial differences between same-class subjects and their respective centroid. These intra-class nuisance data are then pooled into a single matrix, and the eigenvectors of the corresponding covariance matrix are found. The low-rank subspace spanned by the main eigenvectors is called the nuisance space, which should be projected out from each conversation or image before recognition.

Formally, let m_{h_j}^{s_i} denote the super-vector of the ith subject s_i (either his recording or image) at the jth session h_j, where each session represents, in our case, a different illumination condition. Note that in order to reliably model intra-subject variability, these sessions should encompass different scenarios. Also, assume that we have S subjects and H sessions. First, a session-averaged super-vector \bar{m}^{s_i} is calculated for each subject. This value is then removed from all the corresponding examples to obtain their mean-shifted versions:

\tilde{m}_{h_j}^{s_i} = m_{h_j}^{s_i} - \bar{m}^{s_i}    (2)

The matrix

M = [\tilde{m}_{h_1}^{s_1} \cdots \tilde{m}_{h_H}^{s_1} \cdots \tilde{m}_{h_1}^{s_S} \cdots \tilde{m}_{h_H}^{s_S}]    (3)

represents all the intersession variations from the average subject parameterization. We assume that the nuisance characteristics (in our case, artificial illumination) can be represented by a low-rank subspace spanned by the eigenvectors with the highest eigenvalues of the covariance matrix M M^T. The resulting eigenvectors form a basis of the reduced noise subspace. The projection operation which "normalizes" any super-vector m_x is then defined as

P = I - \sum_{i=1}^{d'} w_i w_i^T,    (4)

where P is the NAP projection matrix, I denotes the n × n identity matrix, w_i represents the ith NAP direction, and d' stands for the number of NAP directions.
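As a rough sketch of how Eqs. (2)–(4) could be implemented (under our own naming assumptions, with each subject's sessions stacked in an (H × n) array), the eigenvectors of M M^T can be obtained from the left singular vectors of M:

```python
# Sketch of NAP training per Eqs. (2)-(4). `subjects` is a list of (H x n)
# arrays, one per subject, holding the super-vectors of its H sessions.
import numpy as np

def train_nap(subjects, d0):
    # Eq. (2): remove each subject's session-averaged super-vector
    shifted = [m - m.mean(axis=0, keepdims=True) for m in subjects]
    M = np.concatenate(shifted, axis=0).T       # Eq. (3): n x (S*H) matrix
    # Top eigenvectors of M M^T = left singular vectors of M
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :d0]                            # W: d' principal nuisance directions

def nap_project(m_x, W):
    # Eq. (4): P m = m - sum_i w_i (w_i^T m), applied without forming I
    return m_x - W @ (W.T @ m_x)
```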

3 SVM with NAP

SVM is one of the most popular methods for data classification. At the most basic level, SVMs are two-class hyperplane-based classifiers, usually operating in a high-dimensional space related nonlinearly to the original (usually lower-dimensional) input space [30]. Given an observation x ∈ X and a kernel function K, an SVM classifier is given by a sum of kernel functions [36]:

f(x) = \sum_{i=1}^{L} a_i t_i K(x, x_i) + d    (5)

where t_i is the ideal output, and a_i and d are parameters optimized through the training process. The vectors x_i are support vectors obtained from the training set by an optimization process [37]. The ideal output is either 1 or -1, depending upon whether the corresponding support vector is in class 0 or class 1, respectively. For classification, a class decision is based upon whether the value f(x) is above or below a threshold. The kernel K(·,·) is constrained to have certain properties (the Mercer condition), so that K(·,·) can be expressed as

K(x, y) = b(x)^T b(y),    (6)

where b(x) is a mapping from the input space (where x lives) to a possibly infinite-dimensional SVM expansion space. For a separable data set, SVM optimization chooses a hyperplane in the expansion space with maximum margin [38]. The data points from the training set lying on the boundaries are the support vectors in Eq. (5). The focus of the SVM training process is to model the boundary between classes.

Kernel design in SVM aims at finding an appropriate metric enhancing the specific classification task. The SVM NAP method [38] works by removing subspaces that cause undesired variability in the features. NAP constructs a new kernel:

K(m_a, m_b) = [P b(m_a)]^T [P b(m_b)] = b(m_a)^T P b(m_b) = b(m_a)^T (I - v v^T) b(m_b),    (7)

where P is a projection (P^2 = P), v is the direction being removed from the SVM expansion space, b(·) is the SVM expansion, and ||v||^2 = 1.
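For the linear case, where the expansion b(x) is the identity, Eq. (7) reduces to an inner product of NAP-projected super-vectors. A minimal sketch under that assumption (names are ours):

```python
# Sketch of the NAP-compensated kernel of Eq. (7) for a linear SVM expansion.
# P is the symmetric projection from Eq. (4), so P^T P = P and
# (P ma)^T (P mb) = ma^T P mb.
import numpy as np

def nap_kernel(ma, mb, P):
    return (P @ ma) @ (P @ mb)
```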

Pattern Anal Applic

123

4 Proposed NAP–SVM face recognition

The proposed face recognition method using NAP and SVM can be summarized as shown in Fig. 1. In the offline stage, eigenface vectors are obtained in addition to the eigenvectors of the NAP subspace, and the SVM classifier is built. In the online stage, the information produced in the offline stage is used to project out undesired low-rank luminance dimensions; after that, feature extraction is performed based on the Eigenfaces computed in the offline stage. Finally, using the SVM classifier parameters, a multi-class decision is performed to predict the identity of the face image.

Fig. 1 A schematic diagram of the proposed recognition system

In the offline stage, a PCA algorithm is trained using only face images of different people captured in controlled conditions, in order to obtain a clean face eigenspace. On the other hand, NAP is trained using images obtained under both controlled and uncontrolled illumination conditions, in order to capture the illumination nuisance. The fusion of SVM and PCA improves the classification performance of Eigenfaces alone. Moreover, performing feature extraction (such as PCA or Independent Component Analysis) before SVM enables faster computations in the offline stage during the building of the classifiers [39]. It is important to mention that the PCA is performed with the face images in the logarithmic domain. The computation in the logarithmic domain is necessary since we also project out the NAP eigenvectors in the logarithmic domain.

The SVM part of the offline stage is meant to build the subjects' models. Here the training data are the Eigenface coefficients obtained by projecting the face images onto the face space. Finally, the SVM classifier is trained with the face images in the training set, with optimal SVM parameters.

In the online stage, the face image with the illumination artifact is first passed through the NAP process, then the Eigenface coefficients of the given face image are extracted, and finally the SVM classifier decides on the identity of the subject.
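A minimal end-to-end sketch of this flow, assuming log-domain image vectors stacked row-wise, a precomputed NAP projection matrix P, and scikit-learn for the PCA and SVM stages (all names, component counts and kernel choices are our own illustrative assumptions):

```python
# Sketch of the offline/online stages of the proposed NAP-SVM system.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def train_offline(X_controlled, X_train, y_train, P, n_eigenfaces=50):
    # Eigenfaces estimated only on controlled-condition (log-domain) images
    pca = PCA(n_components=n_eigenfaces).fit(X_controlled)
    # SVM trained on eigenface coefficients of NAP-compensated training images
    feats = pca.transform(X_train @ P.T)
    svm = SVC(kernel="linear").fit(feats, y_train)
    return pca, svm

def identify_online(x_log, P, pca, svm):
    x_clean = P @ x_log                         # project out NAP directions
    feats = pca.transform(x_clean[None, :])     # eigenface feature extraction
    return svm.predict(feats)[0]                # multi-class identity decision
```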

5 Experiments

5.1 Database organization

In order to study the challenges of the described NAP normalization scheme, we used the extended YaleB face database [22]. This database contains frontal faces of 38 different subjects captured under 64 illumination conditions. The 64 images of each subject were partitioned into five unequal subsets (S1 through S5), characterized by extremely different illumination properties. While the first subset, S1, contains some images without illumination artifacts, the last subset, S5, contains the images with the most severe illumination artifacts. The challenge in this database is to achieve a good recognition rate on the S5 subset while still obtaining maximum recognition rates on the other subsets. For the experiments, the database was partitioned into two different parts depending on the level of the experiment. In general, the database is partitioned into two disjoint groups, as follows:

A: B01–B10

B: B11–B38

where A is defined as the evaluation set and B is defined as the development set. (This partition is justified since A and B were produced in different stages.) The B## label refers to the index of the subject in the database.

In order to train the NAP subspace we used the classes labeled S1 through S5 of the B11–B38 subjects. Eigenfaces were estimated using only the S1 subset of B11–B38. Models for each of the subjects in A were trained using B as background examples for the SVM classifier. Evaluation was conducted separately for each of the S2–S5 subsets, using the subjects of partition A as impostors. In all, for each illumination subset, we have around a tenfold increase in impostor trials. In fact, evaluation performance is not affected by the unbalanced data, since in these experiments thresholds are optimized a posteriori based upon the soft SVM outputs.

5.2 NAP effectiveness

We would like to start with some intuitive insight into the NAP functionality for photometric normalization. As an example, Fig. 2a shows the projections of five subjects in subset S4 on a 3-D eigenface subspace. For comparison, Fig. 2b presents the corresponding eigenface projections when the NAP subspace is first removed from the images. It can be clearly seen that the NAP operation enhances discriminability in the Eigenface domain, a trend spotted with all the subjects in the database. For instance, the same experiment using S5 produced a very similar graph. On the other hand, with subsets S2 and S3, the impact of illumination compensation is less visible graphically. It can be seen in the example presented in Fig. 2b that simple linear separators could classify the subjects after the NAP.

Another interesting result can be observed by comparing the eigenface bases (Fig. 3a) to the NAP bases (Fig. 3b). While Eigenfaces retain prototypical face traces, NAP bases seem to additionally incorporate prototypical illumination artifacts. Although theoretically counterintuitive, it is clear that the eigenface (inter-space variability) and NAP (intra-space variability) bases are correlated. The correlation is quite strong between the respective principal directions and decays for higher principal components, as can be seen in Fig. 4, where we show the sorted correlation coefficients between the most correlated NAP-Eigenface basis pairs. Actually, this phenomenon was also previously addressed with respect to speech data [40].

5.3 Improved NAP algorithm

The NAP framework requires "clean" reference images in order to estimate the intra-class data variability. In practice, this reference is approximated by the mean image of the class, as in Eq. (2). As a matter of fact, as noted in Sect. 3, the original NAP methodology was developed for channel compensation; in that scenario, obtaining the reference pattern is not feasible and the mean is used as an approximation. Nevertheless, in our case, clean reference images are available and they can replace the averaged image in Eq. (2), hopefully rendering a more precise description of the intra-class scattering.
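The change relative to Eq. (2) is just the subtrahend. A minimal sketch contrasting the two variants (with our own names: `subjects` as before, and `refs` a list of clean reference super-vectors, one per subject):

```python
# Sketch: build the nuisance matrix M with either the class mean (NAP-Mean,
# Eq. 2) or a clean reference template (proposed NAP-Reference) as reference.
import numpy as np

def nuisance_matrix(subjects, refs=None):
    if refs is None:     # NAP-Mean: deviations from the session average
        shifted = [m - m.mean(axis=0, keepdims=True) for m in subjects]
    else:                # NAP-Reference: deviations from a clean template
        shifted = [m - r[None, :] for m, r in zip(subjects, refs)]
    return np.concatenate(shifted, axis=0).T
```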

We initially performed an experiment to assess the best NAP dimension for the subspace projection. Table 1 shows recognition accuracy in terms of the Equal Error Rate (EER) for different NAP dimensions and illumination conditions. It can be observed that, for our settings, removing around 16 dimensions achieves the best results for most of the conditions.

Fig. 2 NAP impact on the eigenface domain, before (a) and after NAP (b)

Fig. 3 Main face (a) and NAP (b) eigenvectors

Following these findings and using the methodology described in Sect. 3, we used a NAP rank of 16 to compare the detection performance of some known illumination compensation approaches on all subset conditions. In particular, we compare our proposed NAP implementation with the original implementation [29] on YaleB. In addition, we include three other popular normalization techniques as baselines: the single-scale retinex algorithm (SR) [41], the wavelet-denoising-based method (WD) [42] (evaluated using the Coif wavelets) and the Local Ternary Patterns (LTP), which are a generalization of the Local Binary Patterns (LBP) [43]. We used the toolbox in [44, 45] to implement the baseline methods, all of them with default settings. Note that similar comparisons were also performed in [29], but in a closed-set fashion, where the same subjects used for testing were also used for obtaining the Eigenfaces, which may bias the experiments. The results, presented in Table 2, where NAP-Mean of [29] and the proposed NAP-Reference refer to the two ways of modeling the intra-class variability, suggest that our NAP implementation improves recognition performance over the original implementation in the hardest illumination conditions (S4 and S5). Moreover, it is competitive with the other state-of-the-art baseline schemes evaluated. In addition, the NAP approach is considerably less complex than common photometric normalization techniques (as noted in Sect. 1) and can be naturally merged with the PCA projection, as shown in Sect. 5.6.

5.4 Reduced NAP training

In principle, NAP is a data-driven approach meant to model the intra-variability space. Therefore, we could expect that the diversity of distinct subjects is less relevant than a robust representation of the illumination artifacts available for training the NAP. In this sense, we performed an experiment to assess the recognition performance as a function of the number of distinct subjects used to train the NAP. Recall that we use 64 illumination conditions for each subject. Figure 6 shows recognition rates for the S5 subset for an increasing number of subjects used in NAP training. The subjects are progressively accumulated from B11 to B38.

Fig. 4 Correlation degree between Eigenfaces and NAP eigenvectors (sorted correlation coefficient vs. projection rank)

Table 1 EER accuracy measure for different NAP dimensions

NAP dimension   S2    S3    S4    S5
0 (no NAP)      2.5   12.9  25.3  32.2
1               1.7   7.2   13.1  10.0
2               1.7   5.6   8.1   8.4
4               0.1   4.3   4.8   7.3
8               0.7   4.3   3.4   5.8
16              0.0   2.8   5.1   5.2
32              0.0   3.0   6.6   6.0
64              0.0   4.3   6.7   7.5
128             0.0   3.5   5.8   7.4

Best results in bold

Table 2 EER accuracy values for different illumination compensation techniques, using the Yale database

Method                  S2    S3    S4    S5
No NAP                  2.5   12.9  25.3  32.2
WD                      5.0   4.4   3.3   3.7
SR                      2.5   9.5   9.1   10.3
LTP                     1.7   3.0   3.3   3.1
NAP-Mean                0.0   2.8   5.1   5.2
Proposed NAP-Reference  0.0   2.8   4.4   4.3


Surprisingly, this experiment suggests that, as long as the data adequately model the illumination noise, even a single subject is enough for NAP training. It is worth noting that other authors [29] encourage the use of a large diversity of subjects for efficient NAP training. We mention that picking any random subject as the single class for NAP training leads to similar recognition performance.

Motivated by the above results, we can go a step further and investigate the possibility of discarding even the single subject required to train NAP. We actually propose to use synthetic images of a virtual subject in the distinct illumination conditions.

For the following experiment, instead of synthesizing such a face, we picked a single subject and applied different levels of low-pass filtering to each of his images, in order to blur his idiosyncrasies while leaving mostly the illumination artifacts. We then computed NAP using this single subject for training and performed the recognition evaluation. This process was repeated for each subject and illumination condition under different low-pass filtering levels. The averages and standard deviations of the EERs obtained using the several reference subjects at distinct filtering levels are shown in Table 3 for each illumination condition. Image filtering was performed using Gaussian low-pass filters with different σ values. An example of a "synthesized" subject can be seen in Fig. 7a. The results show that the proposed simplified synthetic NAP approach is quite competitive with the full NAP training. Moreover, the relatively small standard deviations observed suggest that the choice of the training subject for NAP is largely irrelevant.

This experiment supports the robustness and simplicity of the NAP approach. The low-pass filtering operation used to derive the "synthetic" training set significantly distorts the subject, but distorts the illumination details to a much lesser extent, because they are already concentrated in the low-frequency components. This experiment suggests that, as long as we can roughly reproduce the expected illumination artifacts to be dealt with, it is possible to derive the NAP projection with virtually no real data available for training, as can be seen from the irrelevance of the filtering level (σ) used to blur the subject.
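A minimal sketch of this "synthetic subject" construction, under our own naming assumptions (Gaussian blurring via scipy; the σ default is illustrative):

```python
# Sketch: derive a single "synthetic" NAP training class by Gaussian low-pass
# filtering one subject's images, suppressing identity details while keeping
# the low-frequency illumination artifacts.
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_nap_class(images, sigma=4.0):
    # images: list of 2-D arrays of one subject under varied illumination
    blurred = [gaussian_filter(im, sigma) for im in images]
    return np.stack([b.ravel() for b in blurred])   # (H x n) training class

# NAP can then be trained on this single class, e.g. (using the earlier sketch)
# W = train_nap([synthetic_nap_class(imgs, sigma=4.0)], d0=16)
```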

5.5 Cross validation

A reliable data-driven compensation technique should be able to properly generalize to unseen data. Although we have so far benchmarked NAP using disjoint subjects for training and testing from the Yale database, the estimated performance may be overoptimistically biased due to the common illumination setup used across subjects. In this section, we assess the generalization capability of NAP using two different face databases. Initially, we used the PIE CMU [31] and the Oulu [32] databases to train the NAP matrix, and then applied the same testing protocol on the Yale database as before.

Fig. 6 Recognition rate (%) for S5 from Yale B as a function of the number of subjects used to train NAP

Table 3 EERs (mean and standard deviation) for the proposed "synthetic" NAP for distinct filtering levels and in different subsets

σ     S2           S3           S4           S5
      Mean   STD   Mean   STD   Mean   STD   Mean   STD
0.5   0.7    0.3   4.5    0.7   7.3    1.5   8.4    1.8
1     0.7    0.4   4.6    0.9   7.6    1.6   8.2    2.1
2     0.7    0.3   4.6    0.7   7.3    1.6   7.8    2.0
4     0.6    0.3   4.1    0.6   6.8    1.7   7.4    1.8
8     0.5    0.3   4.6    0.7   7.6    1.8   8.1    1.9
16    0.4    0.3   6.0    1.2   10.3   2.8   8.8    1.6

Fig. 7 a A face image captured under strong non-uniform illumination; b the low-pass version of the image, forming a synthetic subject


We used a manually cropped and resized version of the 68 subjects of the PIE database, imaged under 21 different illumination conditions, to model the illumination nuisance in the NAP matrix. This cropping was performed in order to match the image scaling with respect to Yale, and did not affect the facial features. Similarly, we investigated an additional extreme setup, using only 16 varied-illumination images of a single subject from the Oulu database for estimating the NAP parameters. For a fair comparison focused on the illumination compensation factor, we kept the same Eigenface set and SVM setup used in the previous experiments.

Table 4 compares the recognition performances obtained for the distinct illumination partitions S1–S5 using the aforementioned databases for NAP training. Note that for Oulu, the NAP dimensionality is severely limited by the low number of images used for training the nuisance space; therefore, results are presented for two different NAP dimensions.

This experiment suggests that NAP compensation is generally robust to mismatched training data for different illumination conditions. Moreover, even a minimal diversity of training data (a few images of a single subject from the Oulu database) achieves satisfactory performance, which corroborates our findings in Sect. 5.4. These results are particularly encouraging since the resolution, cropping and illumination settings are quite different among the databases evaluated. Finally, note that, possibly due to overfitting, lower NAP dimensions improve performance for matched training and testing data (Yale) as opposed to unmatched conditions (PIE). Recall that the Eigenfaces used in these experiments were trained on a subset of Yale. This can be a factor explaining the different recognition rates attained for Yale and PIE as a function of the NAP dimension. Other factors, such as the illumination variability conditions in the different databases, can also lead to distinct optimal NAP dimensions.

In a second experiment, we compared the different compensation techniques reported in Sect. 5.3 (Table 2), evaluating on the PIE CMU database. The Eigenface and NAP sets were trained using the Yale database, as described in Sects. 5.1 and 5.3. We used a partition of 18 subjects in PIE as a background for model computations and the other 50 subjects for the evaluations. Similarly, we partitioned the images of the different illumination conditions into training and testing subsets. Results are presented in Table 5. This experiment further supports the conclusion that NAP is competitive with other state-of-the-art compensation methods and robust to mismatched conditions, even though it is a data-driven technique and was trained on a different database.

5.6 Merging NAP and Eigenfaces

In this section we suggest an additional simplification of the NAP-Eigenfaces framework for face recognition. In particular, we propose to merge the NAP and Eigenface projections into a single step.

As mentioned before, the NAP projection aims to remove the intra-space variability, in our case, illumination irregularities. On the other hand, the Eigenfaces projection aims to increase the inter-space variability. While LDA attempts to optimize both criteria simultaneously, the NAP-Eigenface approach can be seen as a two-step sequential optimization, whose steps could subsequently be merged. One option would be to estimate both spaces separately and then concatenate (after decorrelation) both eigenvector bases, similar to the approach taken in [40]. Alternatively, in this paper we propose a straightforward way of merging the NAP and Eigenface projections, which is simply the estimation of the Eigenface bases on data that has undergone NAP compensation.

In [29] the eigenface and NAP spaces are initially estimated in independent stages. The eigenface bases are extracted from well-illuminated images (S1 in the Yale database), while the NAP projection involves all available illumination conditions. In operational mode, uncontrolled images are first projected to the complementary NAP space, and the resulting illumination-invariant image is then projected to the eigenface space. In fact, this procedure assumes that the NAP stage normalizes the illumination irregularities, generating images compatible in quality with the clean eigenvector space (S1).

Table 4 EER accuracy values (%) for the different Yale subsets as a function of the training database for NAP and its dimension (in parentheses)

NAP compensation   S2    S3    S4    S5
None               2.5   12.8  25.0  31.5
Yale (50)          0     4.2   6.8   8.4
PIE (50)           0.7   5.5   10.8  6.9
Oulu (15)          0.7   6.6   15.0  15.8
Yale (15)          0.1   2.7   5.0   4.9
PIE (15)           0     5.7   11.7  11.1

Table 5 EER accuracy values for the PIE database using NAP and Eigenfaces trained on the Yale database

Method                  EER (PIE database)
No NAP                  25.6
WD                      5.3
SR                      8.7
LTP                     4.3
NAP-Mean                3.8
Proposed NAP-Reference  3.8


Nevertheless, in practice the eigenspace of NAP-processed images does not necessarily closely match the uncorrupted S1 image eigenspace, while in principle we should pursue eigenbases tightly coupled to the NAP-processed images. Therefore, we propose to estimate the eigenface space based on the NAP-projected images. Furthermore, this procedure does not require the initial NAP projection in operational mode, since the NAP-processed eigenface space already excludes the NAP subspace. In mathematical representation,

C_U = (P X)(P X)^T = P X X^T P^T    (8)

where P is the NAP projection matrix and C_U denotes the covariance matrix of the illumination-compensated subspace trained on the data X. PCA can diagonalize the covariance matrix C_U using the orthogonal eigenvectors V_U obtained in Eq. (9):

C_U V_U = \lambda V_U    (9)

By substituting C_U from Eq. (8) we get

P X X^T P^T V_U = \lambda V_U    (10)

If we denote the NAP-processed images P X by X', then Eq. (10) can be rewritten as

X' X'^T V'_U = \lambda V'_U    (11)

which is the standard Eigenfaces formulation. This simply means that the novel Eigenface bases V'_U essentially contain no NAP directions and lead to results that are mathematically identical to those obtained through the regular two-step approach in [29].
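A minimal sketch of the merged estimation (our own names; X holds image vectors as columns), using the SVD of X' = P X to obtain the eigenvectors of X' X'^T from Eq. (11):

```python
# Sketch: estimate the eigenface basis directly on NAP-projected data, so the
# resulting basis already excludes the nuisance subspace (Eqs. 8-11) and no
# separate NAP step is needed at recognition time.
import numpy as np

def merged_eigenfaces(X, P, k):
    Xp = P @ X                                  # X' = P X, columns are images
    U, _, _ = np.linalg.svd(Xp, full_matrices=False)
    return U[:, :k]                             # eigenface basis V'_U
```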

6 Conclusions

Nuisance Attribute Projection was recently borrowed from the speaker recognition field and introduced as a compensation technique for image processing. While NAP is used for minimizing channel artifacts in speech signals, it has also proved to be a useful luminance compensation technique for face recognition. In light of this parallelism, this paper discussed and developed several aspects relevant to the NAP framework when dealing with face recognition that are apparently not applicable in the speech field. In particular, we proposed a modification in the NAP formulation which takes into account actual image templates as references, instead of the usual approximations used so far. Additionally, it was shown that NAP training can be significantly reduced for face recognition tasks. Finally, we suggested a compact framework merging NAP compensation and eigenface recognition. In the future, we intend to explore ways of incorporating NAP in the frequency domain and to explore other nuisance effects, such as position alignment.

Acknowledgments The authors would like to thank Vitomir Struc for his helpful comments and Ralph Gross for his assistance with the PIE database.

References

1. Yang M, Kriegman DJ, Ahuja N (2002) Detecting faces in images: a survey. IEEE TPAMI 24:34–58
2. Zou X, Kittler J, Messer K (2007) Illumination invariant face recognition: a survey. In: First IEEE international conference on biometrics: theory, applications, and systems (BTAS 2007), 27–29 Sep 2007. doi:10.1109/BTAS.2007.4401921
3. Land EH, McCann JJ (1971) Lightness and retinex theory. J Opt Soc Am 61(1):1–11
4. Short J, Kittler J, Messer K (2004) A comparison of photometric normalisation algorithms for face verification. In: Proceedings of the sixth IEEE international conference on automatic face and gesture recognition, 17–19 May 2004, pp 254–259. doi:10.1109/AFGR.2004.1301540
5. Zhang T, Fang B, Yuan Y, Tang YY, Shang Z, Li D, Lang F (2009) Multiscale facial structure representation for face recognition under varying illumination. Pattern Recognit 42:251–258
6. Stockham TG Jr (1972) Image processing in the context of a visual model. IEEE 60:828–842
7. Gross R, Brajovic V (2003) An image preprocessing algorithm for illumination invariant face recognition. In: Proceedings of the international conference on audio- and video-based biometric person authentication, pp 10–18
8. Shashua A, Riklin-Raviv T (2001) The quotient image: class-based re-rendering and recognition with varying illuminations. IEEE TPAMI 23:129–139
9. Wang H, Li SZ, Wang Y (2004) Face recognition under varying lighting condition using self quotient image. In: Proceedings of IEEE international conference on AFGR, pp 819–824
10. Wang H, Wang H, Li SZ, Wang Y (2004) Generalized quotient image. In: Proceedings of IEEE conference on CVPR, pp 498–505
11. Choi S, Jeong G (2011) Shadow compensation using Fourier analysis with application to face recognition. IEEE 18:23–26
12. Makwana RM (2010) Illumination invariant face recognition: a survey of passive methods. Proc Comput Sci 2:101–110
13. Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE TPAMI 19:711–720
14. Wang H, Li SZ, Wang Y (2004) Face recognition under varying lighting condition using self quotient image. In: Proceedings of IEEE conference on AFGR, pp 819–824
15. Shan S, Gao W, Cao B, Zhao D (2003) Illumination normalization for robust face recognition against varying lighting conditions. In: Proceedings of IEEE conference on AMFG, pp 157–164
16. Kong SG, Heo J, Abidi B, Paik J, Abidi M (2005) Recent advances in visual and infrared face recognition—a review. CVIU 97:103–135
17. Pan Z, Healey G, Prasad M, Tromberg B (2003) Face recognition in hyperspectral images. IEEE Trans PAMI 25:1552–1560
18. Zhang T, Tang YY, Fang B, Shang Z, Liu X (2009) Face recognition under varying illumination using gradient faces. IEEE IP 18:2599–2606
19. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3:71–86
20. Chen T, Yin W, Zhou XS, Comaniciu D, Huang TS (2006) Total variation models for variable lighting face recognition. TPAMI 28:1519–1524
21. Belhumeur PN, Kriegman DJ (1996) What is the set of images of an object under all possible lighting conditions? In: IEEE conference on CVPR, pp 270–277
22. Georghiades AS, Belhumeur PN, Kriegman DJ (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE TPAMI 23:643–660
23. Basri R, Jacobs DW (2003) Lambertian reflectance and linear subspaces. IEEE TPAMI 25:218–233
24. Savvides M, Vijaya Kumar BVK, Khosla PK (2004) "Corefaces"—robust shift invariant PCA based correlation filter for illumination tolerant face recognition. IEEE CVPR 2:834–841
25. Lee KC, Ho J, Kriegman D (2001) Nine points of light: acquiring subspaces for face recognition under variable lighting. In: Proceedings of IEEE conference on CVPR, pp 519–526
26. Pizer SM, Amburn EP, Austin J, Cromartie R, Geselowitz A, Greer T, Haar B, Zimmerman JB, Zuiderveld K (1987) Adaptive histogram equalization and its variations. Comput Vis Graph Image Process 39:355–368
27. Savvides M, Kumar BVK (2003) Illumination normalization using logarithm transforms for face authentication. In: AVBPA, pp 549–556
28. Chen W, Er MJ, Wu S (2006) Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain. IEEE 36:458–466
29. Struc V, Vesnicer B, Mihelic F, Pavesic N (2010) Removing illumination artifacts from face images using nuisance attribute projection. In: Proceedings of the IEEE international conference on ICASSP, pp 846–849
30. Solomonoff A, Campbell WM, Boardman I (2005) Advances in channel compensation for SVM speaker recognition. In: Proceedings of IEEE international conference on ICASSP, pp 629–632
31. Sim T, Baker S, Bsat M (2003) The CMU pose, illumination, and expression database. IEEE Trans Pattern Anal Mach Intell 25:1615–1618
32. Marszalec E, Martinkauppi B, Soriano M, Pietikainen M (2000) A physics-based face database for color research. J Electron Imaging 9:32–38
33. Kirby M, Sirovich L (1990) Application of the Karhunen–Loeve procedure for the characterization of human faces. IEEE TPAMI 12:103–108
34. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press, New York
35. Vesnicer B, Mihelic F (2008) The likelihood ratio decision criterion for nuisance attribute projection in GMM speaker verification. EURASIP 2008:1–11
36. Cristianini N, Shawe-Taylor J (2000) Support vector machines. Cambridge University Press, Cambridge
37. Collobert R, Bengio S (2001) SVMTorch: support vector machines for large-scale regression problems. J Mach Learn Res 1:143–160
38. Campbell WM, Sturim DE, Reynolds DA, Solomonoff A (2006) SVM based speaker verification using a GMM supervector kernel and NAP variability compensation. In: Proceedings of IEEE international conference on ICASSP, pp 97–100
39. Sezer OG, Ercil A, Keskinov M (2005) Subspace based object recognition using support vector machines. In: Proceedings of the European signal processing conference (EUSIPCO)
40. Solewicz Y, Aronowitz H (2009) Two-wire nuisance attribute projection. In: INTERSPEECH, pp 928–931
41. Jobson DJ, Rahman Z, Woodell GA (1997) Properties and performance of a center/surround retinex. IEEE Trans Image Process 6(3):451–462
42. Zhang T, Fang B, Yuan Y, Tang YY, Shang Z, Li D, Lang F (2009) Multiscale facial structure representation for face recognition under varying illumination. Pattern Recognit 42(2):252–258
43. Tan X, Triggs B (2010) Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans Image Process 19(6):1635–1650
44. Struc V, Pavesic N (2011) Performance evaluation of photometric normalization techniques for illumination invariant face recognition. In: Zhang YJ (ed) Advances in face image analysis: techniques and technologies. IGI Global
45. Struc V, Pavesic N (2009) Gabor-based kernel partial-least-squares discrimination features for face recognition. Informatica (Vilnius) 20(1):115–138
