THEORETICAL ADVANCES
Improved nuisance attribute projection for face recognition
Ariel Yifrach • Eitan Novoselsky • Yosef A. Solewicz •
Yitzhak Yitzhaky
Received: 6 September 2012 / Accepted: 22 July 2014
© Springer-Verlag London 2014
Abstract Illumination variation is one of the well-known
problems in face recognition under uncontrolled
environments. Several techniques have been presented in
the literature to cope with this problem. Recently, a
technique known as Nuisance Attribute Projection (NAP),
originally developed for the speaker recognition field, was
introduced to image processing in order to compensate for
luminance artifacts. This paper extends and improves the
earlier work by exploring efficient methodologies for using
NAP for face recognition under varied illumination
conditions. In particular, we propose a modified NAP
formulation and show that NAP training can be simplified for
face recognition. Additionally, we suggest a compact
framework merging NAP compensation and eigenface
recognition. A series of experiments using the
extended YaleB database, and a cross-validation using the
PIE CMU and the Oulu databases, are performed to validate
our proposals.
Keywords Face recognition · Nuisance Attribute
Projection · Principal Component Analysis · Support
Vector Machine
1 Introduction
Current face recognition systems allow satisfactory iden-
tification of individuals under constrained conditions [1].
However, images produced under uncontrolled conditions
severely limit successful identification. Such problems are
typically related to disparities in lighting and pose
conditions [2]. In particular, variation in the illumination
conditions can cause dramatic changes in the face
appearance and is thus one of the most
challenging problems that a practical face recognition
system needs to address. Such variations in the face
appearance can be much larger than the variation caused by
personal identity. In other words, images of different faces
can appear more similar than images of the same face
captured under extreme illumination variations. The problem
is further accentuated when the face recognition system
is a fully automated process, since the illumination
variation is usually coupled with other uncontrolled conditions,
such as pose or expression variability and occlusion by
other objects.
The relation between the illumination and the informa-
tive image structure, as perceived by the human visual
system was modeled by Land and McCann [3] in their
Lightness and Retinex (LR) model, which states that in
each visual color channel the intensity signal can be
modeled as a product of the so-called luminance and
reflectance functions:
$$I(x,y) = R(x,y)\,L(x,y), \tag{1}$$
where L(x,y) represents the amount of illumination and
R(x,y) represents the reflectivity of the object’s surface, at
each point (x,y). Several practical illumination compensa-
tion techniques based on the LR model were proposed in
order to extract the reflectance R(x,y) from the image
A. Yifrach (✉) · E. Novoselsky · Y. Yitzhaky
Department of Electro-Optics Engineering, Ben-Gurion
University, Beer Sheva, Israel
e-mail: [email protected]
A. Yifrach · Y. A. Solewicz
Technology Section, Israel National Police, Jerusalem, Israel
Pattern Anal Applic
DOI 10.1007/s10044-014-0388-4
I(x,y) [4, 5]. According to the assumptions made to solve
Eq. (1), which are widely accepted and used [3, 5–7],
L(x,y) varies slowly (the low-frequency component of
I(x,y)) while R(x,y) can change abruptly (the high-frequency
component); thus, the edges in the image correspond
to R(x,y) [3]. Based on that, it was proposed to extract
R(x,y) by high-pass filtering of the image [6] or, according
to Eq. (1), through a division of the original image by the
low-pass luminance component [3, 4]. The Quotient Image (QI)
[8], which is defined as the ratio between a test image
and a linear combination of three non-coplanar illuminated
images, was designed for dealing with illumination variation
in face recognition. A Self-Quotient Image model was
proposed [9, 10] to extend the QI theory by computing the
ratio between the test image and its smoothed version [11],
computed by an anisotropic filter. However, due to the
anisotropic nature of the employed smoothing filter, flat
zones in the images are not smoothed properly [12].
Unlike these methods, in order to extract the reflectance,
the NAP technique, which is detailed later, removes the
subspace spanning the low-frequency within-class image
components associated with the luminance artifacts, L(x,y). A
significant benefit of NAP is that the compensation is
fast, since the image is projected into the pre-computed
NAP subspace and back, and image smoothing is not
needed.
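As a brief illustration of why both high-pass filtering and quotient images target R(x,y), the relations below restate Eq. (1) in the logarithmic domain. This is only a sketch: the low-pass operator G and the estimate \hat{R} are notational assumptions introduced here and are not defined in the text.

```latex
% Taking logarithms turns the product of Eq. (1) into a sum:
\log I(x,y) = \log R(x,y) + \log L(x,y)
% \log L varies slowly, so a high-pass filter applied to \log I mostly keeps \log R.
% Equivalently, if a low-pass version of the image approximates the luminance,
% L(x,y) \approx (G * I)(x,y), the reflectance can be estimated as a quotient:
\hat{R}(x,y) \approx \frac{I(x,y)}{(G * I)(x,y)}
```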
Numerous illumination-invariant face recognition
approaches have been proposed in the literature, traditionally
categorized as being either passive or active [2].
Passive approaches overcome the illumination dissimilarity
by studying the final images produced by the imaging
system [13–15], in which face appearance has been altered
by illumination variations. Active approaches overcome
the illumination variation problem by employing active
imaging techniques such as optical filters, active
illumination sources, etc. [16, 17]. Passive approaches can be
further roughly classified into three main categories [18]: (1)
Illumination invariant feature extraction methods, which
attempt to identify the R(x,y) component [Eq. (1)], which is
then used for face recognition, as described above; (2)
Photometric normalization; and (3) 3-D face modeling.
Face image modeling under illumination variation can be
based on a statistical model, such as Principal Component
Analysis (PCA, Eigenfaces) [19] and Linear Discriminant
Analysis (LDA, Fisherface) [13], in which case no
assumption on the surface property is needed. When there
are assumptions about the surface property, such as Lambertian
reflectance, the models are termed physical models. These
fall short of expectations [20]: not only are the performances
of most such methods [21–25] still far from ideal, but
many of them also require assumptions on the light
source and a large volume of training data. In addition to
methods developed for face recognition, there are methods
designed for removing illumination artifacts from general
images. The photometric normalization approach [category
(2) above] uses image processing tools to normalize face
images under severe lighting conditions. Common methods
for illumination normalization are histogram equalization
[26], logarithm transform [27] and Gamma Intensity
Correction [15]. However, these techniques do not perform
well when images have space-variant lighting conditions [28].
In this paper, we extend a recently introduced photometric
normalization technique for illumination-invariant
face recognition based on Nuisance Attribute Projection
(NAP) [29]. We propose and evaluate an improved NAP
formulation enhancing intra-subject modeling and, in
addition, propose ways to improve NAP training. In particular,
we propose a modification in the NAP formulation,
which takes into account actual image templates as references
instead of the regular approximations used so far.
Furthermore, we investigate optimal training set sizes and
introduce the use of synthetic samples to further reduce
training requirements. We also show that the NAP compensation
can be straightforwardly embedded in the Eigenface
recognition process. Moreover, since NAP can be
seen as a specific compensation kernel, we propose to
further combine NAP with Support Vector Machines
(SVMs), inspired by the original NAP approach for
speaker recognition [30]. The proposed method is examined
using the extended YaleB database [22], and a cross-validation
is carried out using the PIE CMU [31] and the
Oulu [32] databases.
The rest of this paper is organized as follows: In Sects. 2
and 3 the NAP and the SVM methods are presented,
respectively. The proposed face recognition system using
SVM and NAP is described in Sect. 4, and experimental
results are presented in Sect. 5. Conclusions are in Sect. 6.
2 Eigenfaces and NAP
PCA can be used to compactly represent images by finding
suitable low-rank class-dependent projections [33]. In
addition, efficient image recognition can be further
achieved using the compressed parameterization [19]. In
the field of face recognition, the principal eigenvectors
spanning the low-rank subspace are known as "Eigenfaces"
and describe the inter- or between-class data variability.
In contrast to the Eigenface subspace, which retains
most of the image information, the intra- or within-class
subspace normally retains undesirable nuisance variability.
In addition to Eigenfaces, several approaches were proposed
in order to optimize data discrimination in view of its
spatial distribution. Techniques based on LDA [13, 34] aim
at searching for the directions that maximize the inter-class
scatter while simultaneously minimizing the intra-class
scatter. On the other hand, NAP [30] initially defines the
subspace based on the eigenvectors of the within-class
covariance and further projects the data onto the orthogonal
complement of this space.
Nuisance Attribute Projection was initially introduced as
a technique able to mitigate the problem of channel variability
in the speaker recognition field [30] and was recently
extended to face recognition [29]. In the state-of-the-art
speaker recognition framework, conversations are mapped
into a single vector in a multi-dimensional sparse space.
This is accomplished by extracting several spectral
components from each conversation and then
pooling them into a single "super-vector" [35]. Similarly,
in image processing, images are typically parametrized into
a single multidimensional vector representing the intensity
of each pixel. This common parametrization approach thus
led to the recent successful application of NAP in image
recognition [29].
The NAP technique essentially learns, on a development
set, the spatial differences between same-class samples and
their respective centroid. These intra-class nuisance data are
then pooled into a single matrix, and the eigenvectors of the
corresponding covariance matrix are found. The low-rank
subspace spanned by the main eigenvectors is called the
nuisance space, which should be projected out from each
conversation or image before recognition.
Formally, $m^{s_i}_{h_j}$ denotes the super-vector of the ith subject
$s_i$ (either his recording or image) at the jth session ($h_j$),
where each session represents, in our case, a different
illumination condition. Note that in order to reliably model
intra-subject variability, these sessions should encompass
different scenarios. Also, assume that we have S subjects
and H sessions. First, a session-averaged super-vector ($\bar{m}^{s_i}$)
is calculated for each subject. This value is then removed
from all the corresponding examples to obtain their
mean-shifted versions:
$$\tilde{m}^{s_i}_{h_j} = m^{s_i}_{h_j} - \bar{m}^{s_i} \tag{2}$$
The matrix
$$M = \left[ \tilde{m}^{s_1}_{h_1} \cdots \tilde{m}^{s_1}_{h_H} \cdots \tilde{m}^{s_S}_{h_1} \cdots \tilde{m}^{s_S}_{h_H} \right] \tag{3}$$
represents all the intersession variations from the average
subject parameterization. We assume that the nuisance
characteristics (in our case, artificial illumination) can be
represented by a low-rank subspace spanned by the
eigenvectors with the highest eigenvalues of the covariance
matrix $MM^T$. The resulting eigenvectors form a basis for the
reduced noise subspace. The projection operation which
"normalizes" any super-vector $m_x$ is then defined as
$$P = I - \sum_{i=1}^{d'} w_i w_i^T, \tag{4}$$
where P is the projection matrix of the NAP, I denotes the
$n \times n$ identity matrix, $w_i$ represents the i-th NAP direction
and $d'$ stands for the number of NAP directions.
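The training and projection steps of Eqs. (2)–(4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `train_nap` and `apply_nap`, the row-per-image data layout, and the use of an SVD in place of an explicit eigendecomposition of the covariance are all assumptions made here.

```python
import numpy as np

def train_nap(X, subject_ids, rank):
    """Return an (n, rank) matrix whose columns are NAP directions w_i.

    X holds one vectorized image per row; subject_ids labels each row.
    (Hypothetical helper: names and layout are illustrative assumptions.)
    """
    X = np.asarray(X, dtype=float)
    subject_ids = np.asarray(subject_ids)
    M = np.empty_like(X)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        # Eq. (2): remove the subject's session-averaged vector.
        M[rows] = X[rows] - X[rows].mean(axis=0)
    # The right singular vectors of the row-stacked M are the eigenvectors
    # of the n x n covariance built from Eq. (3), in decreasing eigenvalue order.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:rank].T

def apply_nap(X, W):
    """Eq. (4) applied to row vectors: x -> (I - W W^T) x."""
    X = np.asarray(X, dtype=float)
    # Expanded form avoids building the n x n projection matrix.
    return X - (X @ W) @ W.T
```

Applying `apply_nap` twice gives the same result as applying it once, which reflects the fact that P is a projection.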
3 SVM with NAP
SVM is one of the most popular methods for data classification.
At the most basic level, SVMs are two-class
hyperplane-based classifiers usually operating in a
high-dimensional space related nonlinearly to the original
(usually lower dimensional) input space [30]. Given an
observation $x \in X$ and a kernel function K, an SVM
classifier is given by a sum of kernel functions [36]:
$$f(x) = \sum_{i=1}^{L} \alpha_i t_i K(x, x_i) + d \tag{5}$$
where $t_i$ is the ideal output, and $\alpha_i$ and d are parameters
optimized through the training process. The vectors $x_i$ are
support vectors obtained from the training set by an
optimization process [37]. The ideal output is either 1 or -1,
depending upon whether the corresponding support vector
is in class 0 or class 1, respectively. For classification, a
class decision is based upon whether the value f(x) is
above or below a threshold. The kernel $K(\cdot,\cdot)$ is
constrained to have certain properties (the Mercer condition),
so that $K(\cdot,\cdot)$ can be expressed as:
$$K(x, y) = b(x)^t b(y), \tag{6}$$
where b(x) is a mapping from the input space (where
x lives) to a possibly infinite-dimensional SVM expansion
space. For a separable data set, SVM optimization chooses
a hyperplane in the expansion space with maximum margin
[38]. The data points from the training set lying on the
boundaries are the support vectors in Eq. (5). The focus of
the SVM training process is to model the boundary
between classes.
Kernel design in SVM aims at finding an appropriate
metric enhancing the specific classification task. The SVM
NAP method [38] works by removing subspaces that cause
undesired variability in the features. NAP constructs a new
kernel:
$$K(m_a, m_b) = [P\,b(m_a)]^t [P\,b(m_b)] = b(m_a)^t P\, b(m_b) = b(m_a)^t (I - v v^t)\, b(m_b), \tag{7}$$
where P is a projection ($P^2 = P$), v is the direction being
removed from the SVM expansion space, $b(\cdot)$ is the SVM
expansion, and $\|v\|_2 = 1$.
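The equality chain in Eq. (7) can be checked numerically. The sketch below assumes the identity expansion b(m) = m (a linear kernel), which the text does not prescribe; the function name `nap_kernel` is likewise hypothetical.

```python
import numpy as np

def nap_kernel(ma, mb, v):
    """Eq. (7) with the identity expansion b(m) = m (an assumption)."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)              # enforce ||v||_2 = 1
    # b(ma)^t (I - v v^t) b(mb), expanded to avoid building the matrix.
    return float(ma @ mb - (ma @ v) * (v @ mb))
```

Because P is symmetric and idempotent, this value equals the plain inner product of the two projected vectors, so a standard linear SVM trained on NAP-compensated features uses the same kernel.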
4 Proposed NAP–SVM face recognition
The proposed face recognition method using NAP and
SVM can be summarized as shown in Fig. 1. In the offline
stage, eigenface vectors are obtained in addition to eigen-
vectors of the NAP subspace and the SVM classifier is
built. In the online stage, information produced in the
offline stage is used to project-out undesired low rank
luminance dimensions and after that, feature extraction is
performed based on the Eigenfaces computed in the offline
stage. Finally, using the SVM classifier parameters, a multi-class
decision is performed to predict the identity of the face
image.
In the offline stage, a PCA algorithm is trained using
only face images of different people captured in controlled
conditions in order to obtain a clean face eigenspace. On
the other hand, NAP is trained using images obtained under
both controlled and uncontrolled illumination conditions in
order to capture illumination nuisance. Combining
SVM with PCA improves on the classification performance of
Eigenfaces alone. Moreover, performing feature extraction (such
as PCA or Independent Component Analysis) before SVM
enables faster computations in the offline stage during the
building of the classifiers [39]. It is important to mention
that the PCA is performed with the face images in the
logarithmic domain. The computation in the logarithmic
domain is necessary since the NAP eigenvectors are also
projected out in the logarithmic domain.
The SVM part in the offline stage is meant to build the
subject’s model. Here the training data are the coefficients
of Eigenfaces obtained by projection of the face images to
the face space. Finally, the SVM classifier is trained with the face
images in the training set using optimal SVM parameters.
In the online stage, the face image with the illumination
artifact is first passed through the NAP process, then the Eigenface
coefficients of the given face image are extracted and finally the SVM
classifier decides on the identity of the subject.
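The online stage described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `online_features` and `identify` are hypothetical names, `log1p` is an assumed way of entering the logarithmic domain without taking log(0), subtracting a mean log-face before the Eigenface projection is an assumption, and the classifier is reduced to one linear score per subject.

```python
import numpy as np

def online_features(image, nap_dirs, eigenfaces, mean_log_face):
    """NAP compensation followed by Eigenface projection, in the log domain."""
    # log1p is an assumed way to avoid log(0) on dark pixels.
    x = np.log1p(np.asarray(image, dtype=float).ravel())
    x = x - nap_dirs @ (nap_dirs.T @ x)        # project out the NAP subspace
    return eigenfaces.T @ (x - mean_log_face)  # Eigenface coefficients

def identify(features, weights, biases):
    """Pick the subject whose (assumed linear) SVM score is highest."""
    scores = weights @ features + biases
    return int(np.argmax(scores)), scores
```

In this sketch `nap_dirs` and `eigenfaces` hold orthonormal columns computed in the offline stage, and `weights` and `biases` stand in for the trained per-subject SVM parameters.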
5 Experiments
5.1 Database organization
In order to study the challenges of the described NAP
normalization scheme we used the extended YaleB Face
Database [22]. This database contains frontal faces of 38
different subjects captured under 64 illumination conditions.
The 64 images of each subject were partitioned into 5
unequal subsets (S1 through S5), characterized by
extremely different illumination properties. While the first
subset, S1, contains some images without illumination
artifacts, the last subset, S5, contains images with the most
severe illumination artifacts. The challenge in this database
is to achieve a good recognition rate in the S5 subset while still
obtaining maximum recognition rates in the other subsets.
For the experiments, the database was partitioned into two
disjoint groups of subjects, depending on the stage of the
experiment, as follows:
A: B01–B10
B: B11–B38
where A is used in the evaluation stage and B in the
development stage. (This partition is justified
since A and B were produced in different stages.) The B##
notation refers to the index of the subject in the database.
In order to train the NAP subspace we used the classes
labeled S1 through S5 of the B11–B38 subjects. Eigenfaces
were estimated using only the S1 subset of B11–B38.
Models for each of the subjects in A were
trained using B as background examples for the
SVM classifier. Evaluation was conducted separately for
each of the S2–S5 subsets, using the subjects of partition
A as impostors. Overall, for each illumination subset, we
have around ten times as many impostor trials as target trials. In fact,
evaluation performance is not affected by the unbalanced
data, since in these experiments thresholds are optimized a
posteriori based upon the soft SVM outputs.
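Since the results that follow are reported as Equal Error Rates with a threshold chosen a posteriori, a small sketch of one common way to compute an EER from soft scores may help. The function name `equal_error_rate` and the threshold sweep over observed scores are assumptions; the paper does not specify its exact procedure.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False rejection: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: impostor trials scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))       # point where the two rates meet
    return 0.5 * (far[i] + frr[i])
```

Both error rates are normalized by their own trial counts, which is why an imbalance between impostor and target trials does not by itself shift the EER.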
5.2 NAP effectiveness
We would like to start with some intuitive insight into the
NAP functionality for photometric normalization. As an
example, Fig. 2a shows the projections of five subjects in
subset S4 on a 3-D eigenface subspace. For comparison,
Fig. 2b presents the corresponding eigenface projections
when the NAP subspace is initially removed from the
images. It can be clearly seen that the NAP operation
enhances discriminability in the Eigenface domain, a trend
spotted with all the subjects in the database. For instance,
the same experiment using S5 produced a very similar
graph. On the other hand, with subsets S2 and S3, the
impact of illumination compensation is less visible graphically.
It can be seen in the example presented in Fig. 2b
that simple linear separators could classify the subjects
after the NAP.
Fig. 1 A schematic diagram of the proposed recognition system
Another interesting result can be observed by com-
paring the eigenface bases (Fig. 3a) to the NAP bases
(Fig. 3b). While Eigenfaces retain prototypical face tra-
ces, NAP bases seem to additionally incorporate proto-
typical illumination artifacts. Although theoretically
counterintuitive, it is clear that the eigenface (inter-space
variability) and NAP (intra-space variability) are corre-
lated. The correlation degree is quite strong between the
respective principal directions and decays for higher
principal components as can be seen in Fig. 4. In this
figure, we show the sorted correlation coefficients
between most correlated NAP-Eigenface bases. Actually,
this phenomenon was also previously addressed with
respect to speech data [40].
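The kind of measurement plotted in Fig. 4 can be sketched as below. The pairing rule (for each NAP direction, take its most correlated Eigenface) and the function name are assumptions made for illustration; the paper only states that sorted correlation coefficients between the most correlated bases are shown.

```python
import numpy as np

def sorted_basis_correlations(nap_dirs, eigenfaces):
    """Largest absolute correlation of each NAP direction with any Eigenface.

    Both inputs hold one basis vector per column; the result is sorted in
    decreasing order.
    """
    k = nap_dirs.shape[1]
    # corrcoef stacks both sets of vectors; keep only the NAP-vs-Eigenface block.
    corr = np.corrcoef(nap_dirs.T, eigenfaces.T)[:k, k:]
    return np.sort(np.abs(corr).max(axis=1))[::-1]
```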
5.3 Improved NAP algorithm
The NAP framework requires ‘‘clean’’ reference images in
order to estimate the intra-class data variability. In practice,
this reference is approximated by the mean image of the
class, as in Eq. (2). As noted in Sect. 2,
the original NAP methodology was developed for channel
compensation. In that scenario, obtaining the reference
pattern is not feasible and the mean is used as an
approximation. Nevertheless, in our case, clean reference
images are available and they could replace the averaged
image in Eq. (2), hopefully rendering a more precise
description of the intra-class scattering.
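The proposed change amounts to replacing the session average in Eq. (2) with a clean reference image of the same subject. A minimal sketch follows, assuming one clean reference vector per subject is supplied in a dictionary; the function name, the dictionary interface and the SVD-based eigenvector computation are illustrative assumptions.

```python
import numpy as np

def nap_reference_directions(X, subject_ids, references, rank):
    """NAP directions estimated against clean references instead of means.

    X holds one vectorized image per row; references maps a subject id to
    that subject's clean reference vector (hypothetical interface).
    """
    X = np.asarray(X, dtype=float)
    ids = np.asarray(subject_ids).tolist()
    # Variant of Eq. (2): subtract the subject's clean reference image
    # rather than the session-averaged vector.
    M = np.stack([x - references[s] for x, s in zip(X, ids)])
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:rank].T
```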
We initially performed an experiment to assess the best
NAP dimension in subspace projection. Table 1 shows
recognition accuracy in terms of the Equal Error Rate
(EER) for different NAP dimensions and illumination
conditions. It can be observed that for our settings,
removing around 16 dimensions achieves the best results
for most of the conditions.
Fig. 2 NAP impact on the eigenface domain, before (a) and after NAP (b)
Fig. 3 Main face (a) and NAP (b) eigenvectors
Following these findings and using the methodology
described in Sect. 3, we used a NAP rank of 16 to compare
the detection performance of some known illumination
compensation approaches on all subset conditions. In par-
ticular, we compare our proposed NAP implementation
with the original implementation [29] on YaleB. In addi-
tion, we include three other popular normalization tech-
niques as baselines: the single scale retinex algorithm (SR)
[41], the wavelet-denoising-based method (WD) [42]
(evaluated using the Coif wavelets) and the Local Ternary
Patterns (LTP) which is a generalization of the Local
Binary Patterns (LBP) [43]. We used the toolbox in [44,
45] to implement the baseline methods, all of them with
default settings. Note that similar comparisons were also
performed in [29] but in a closed-set fashion, where the
same subjects used for testing were also used for obtaining
the Eigenfaces, which may bias the experiments. The
results are presented in Table 2, where NAP-Mean [29]
and the proposed NAP-Reference refer to the two ways of
modeling the intra-class variability. They suggest that our NAP
implementation improves recognition performance over
the original implementation in the hardest illumination
conditions (S4 and S5). Moreover, it is competitive with
the other state-of-the-art baseline schemes evaluated. In
addition, the NAP approach is considerably less complex than
common photometric normalization techniques (as noted in
Sect. 1) and can be naturally merged with the PCA
projection, as shown in Sect. 5.6.
5.4 Reduced NAP training
In principle NAP is a data-driven approach meant to model
the intra-variability space. Therefore, we could expect that
the diversity of distinct subjects is less relevant than a
robust representation of the illumination artifacts available
for training the NAP. In this sense, we performed an
experiment to assess the recognition performance as a
function of the amount of distinct subjects used to train the
NAP. Recall that we use 64 illumination conditions for
each subject. Figure 6 shows recognition rates for the S5
subset for an increasing number of subjects used to train
NAP. The subjects are progressively accumulated from
B11 to B38.
Fig. 4 Correlation degree between Eigenfaces and NAP eigenvectors (x-axis: projection rank; y-axis: correlation coefficient)
Table 1 EER accuracy measure for different NAP dimensions
NAP's dimension   S2    S3    S4    S5
0 (no NAP)        2.5   12.9  25.3  32.2
1                 1.7   7.2   13.1  10.0
2                 1.7   5.6   8.1   8.4
4                 0.1   4.3   4.8   7.3
8                 0.7   4.3   3.4   5.8
16                0.0   2.8   5.1   5.2
32                0.0   3.0   6.6   6.0
64                0.0   4.3   6.7   7.5
128               0.0   3.5   5.8   7.4
Best results in bold
Table 2 EER accuracy values for different illumination compensation techniques, using the Yale database
Method                    EERs (Yale database subsets)
                          S2    S3    S4    S5
No NAP                    2.5   12.9  25.3  32.2
WD                        5     4.4   3.3   3.7
SR                        2.5   9.5   9.1   10.3
LTP                       1.7   3.0   3.3   3.1
NAP–Mean                  0.0   2.8   5.1   5.2
Proposed NAP–Reference    0.0   2.8   4.4   4.3
Surprisingly, this experiment suggests that as long as the
data adequately models the illumination noise, even a
single subject is enough for NAP training. It is worth
noting that other authors [29] encourage the use of a large
diversity of subjects for efficient NAP training. We note
that picking any random subject as the single class for
NAP training leads to similar recognition performance.
Motivated by the above mentioned results, we can go a
step further and investigate the possibility of discarding
even the single subject required to train NAP. We actually
propose to use synthetic images of a virtual subject in the
distinct illumination conditions.
For the following experiment, instead of synthesizing
such a face, we picked a single subject and applied dif-
ferent levels of low-pass filtering to each of his images in
order to blur his idiosyncrasies while leaving mostly the
illumination artifacts. We then computed NAP using this
single subject for training and performed the recognition
evaluation. This process was repeated for each subject and
illumination condition under different low-pass filtering
levels. The average and standard deviation of the EERs
obtained using the several reference subjects in distinct
filtering levels are shown in Table 3 for each illumination
condition. Image filtering was performed using Gaussian
low-pass filters with different σ values. An example of a
‘‘synthesized’’ subject can be seen in Fig. 7a. The results
show that this proposed simplified synthetic NAP approach
is quite competitive with the full NAP training. Moreover,
the relatively small standard deviations observed suggest
that the choice of the training subject for NAP is quite
irrelevant.
This experiment supports the robustness and simplicity
of the NAP approach. The low-pass filtering operation used
to derive the "synthetic" training set significantly distorts
the subject, but distorts the illumination details to a much
lesser extent because they already lie in the low-frequency
components. This experiment suggests that as
long as we can roughly reproduce the expected illumination
artifacts to be dealt with, it is possible to derive the NAP
projection with virtually no real data available for training,
as can be seen by the irrelevance of the filtering level (σ)
used to blur the subject.
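The blurring step used to build a "synthetic" subject can be sketched with a separable Gaussian low-pass filter. The kernel truncation at three standard deviations, the edge-replicating padding and the function names are assumptions made for this illustration; the paper only states that Gaussian low-pass filters with different σ values were used.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at about 3 sigma, normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_lowpass(image, sigma):
    """Blur a 2-D image with a separable Gaussian; the output keeps the input shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    # Replicate border pixels so that "valid" convolution restores the size.
    padded = np.pad(np.asarray(image, dtype=float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)
```

Applying `gaussian_lowpass` to every illumination image of one subject, and then training NAP on the blurred set, mirrors the experiment described above.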
5.5 Cross validation
A reliable data-driven compensation technique should be
able to properly generalize to unseen data. Although we so
far benchmarked NAP using disjoint subjects for training
and testing from the Yale database, the estimated perfor-
mance may be overoptimistically biased due to the com-
mon illumination setup used across subjects. In this
section, we assess the generalization capability of NAP,
using two different face databases. Initially, we used the
PIE CMU [31] and the Oulu [32] databases in order to train
the NAP matrix and further apply it to the same testing
protocol on the Yale database as before. We used a manually
cropped and resized version of the 68 subjects imaged
in 21 different illumination conditions from the PIE database
to model the illumination nuisance into the NAP
matrix. This cropping was performed in order to match the
image scaling with respect to Yale, and did not affect the
facial features. Similarly, we investigated an additional
extreme setup using only 16 varied illumination images of
a single subject of the Oulu database for estimating the
NAP parameters. For a fair comparison focused on the
illumination compensation factor, we kept the same Eigenface
set and SVM setup used in the previous
experiments.
Fig. 6 Recognition rate for S5 from Yale B as a function of number of subjects used to train NAP (x-axis: number of subjects; y-axis: recognition rate (%))
Table 3 EERs (mean and standard deviation) for the proposed "Synthetic" NAP for distinct filtering levels and in different subsets
σ     S2           S3           S4           S5
      Mean  STD    Mean  STD    Mean  STD    Mean  STD
0.5   0.7   0.3    4.5   0.7    7.3   1.5    8.4   1.8
1     0.7   0.4    4.6   0.9    7.6   1.6    8.2   2.1
2     0.7   0.3    4.6   0.7    7.3   1.6    7.8   2.0
4     0.6   0.3    4.1   0.6    6.8   1.7    7.4   1.8
8     0.5   0.3    4.6   0.7    7.6   1.8    8.1   1.9
16    0.4   0.3    6.0   1.2    10.3  2.8    8.8   1.6
Fig. 7 a A face image captured under strong non-uniform illumination; b the low-pass version of the image, forming a synthetic subject
Table 4 compares recognition performances obtained
for the distinct illumination partitions S2–S5 using the
aforementioned databases for NAP training. Note that for
Oulu, NAP dimensionality is severely limited by the low
number of images used for training the nuisance space and,
therefore, results are presented for two different NAP
dimensions.
This experiment suggests that NAP compensation is
generally robust to mismatched training data for different
illumination conditions. Moreover, even a minimum
diversity of training data (a few images of a single subject
from the Oulu database) achieves satisfactory performance,
which corroborates our findings in Sect. 5.4. These results
are particularly encouraging since resolution, cropping and
illumination settings are quite different among the databases
evaluated. Finally, note that, possibly due to overfitting,
lower NAP dimensions improve performance for
matched training and testing data (Yale) as opposed to
unmatched conditions (PIE). Recall that the Eigenfaces
used in these experiments were trained on a subset of Yale.
This can be a factor explaining the different recognition
rates attained for Yale and PIE as a function of NAP
dimensions. Other factors such as the illumination
variability conditions in different databases can also lead to
distinct optimal NAP dimensions.
In a second experiment, we compared the different
compensation techniques reported in Sect. 5.3 (Table 2),
evaluating on the PIE CMU database. Eigenface and NAP
sets were trained using the Yale database as described in
Sects. 5.1 and 5.3. We used a partition of 18 subjects in PIE
as a background for model computations and the other 50
subjects for the evaluations. Similarly, we partitioned the
different illumination condition images into training and
testing subsets. Results are presented in Table 5. This
experiment further supports the conclusion that NAP is
competitive with other state-of-the-art compensation
methods and robust to mismatched conditions, even though it is a
data-driven technique trained on a different database.
5.6 Merging NAP and Eigenfaces
In this section we suggest an additional simplification in
the NAP-Eigenfaces framework for face recognition. In
particular, we propose to merge the NAP and Eigenface
projections into a single step.
As mentioned before, the NAP projection aims to
remove the intra-space variability, in our case, illumination
irregularities. On the other hand, the Eigenfaces projection
aims to increase inter-space variability. While LDA
attempts to simultaneously optimize both criteria, the NAP-
Eigenface approach can be seen as two-step sequential
optimization, but that could be subsequently merged. One
option would be to estimate both spaces in separate and
then concatenate (after decorrelation) both eigenvector
bases, similar to the approach taken in [40]. Alternatively,
in this paper we propose a straightforward way of merging
the NAP and Eigenface projections, which is simply the
estimation of the Eigenface bases upon data which under-
went NAP of the channel.
In [29] the eigenface and NAP spaces are initially
estimated in independent stages. The eigenface bases are
extracted on well illuminated images (S1 in Yale database)
while the NAP projection involves all available illumina-
tion conditions. In operational mode, uncontrolled images
are first projected to the complementary NAP space and the
resulting illumination invariant image is then projected to
the eigenface space. In fact, this procedure assumes that the
NAP stage normalizes the illumination irregularities, gen-
erating images compatible in quality to the clean
Table 4 EER accuracy values for the different Yale subsets as
function of the training database for NAP and its dimension (in
parenthesis)
NAP compensation EERs (%)
S2 S3 S4 S5
None 2.5 12.8 25.0 31.5
Yale (50) 0 4.2 6.8 8.4
PIE (50) 0.7 5.5 10.8 6.9
Oulu (15) 0.7 6.6 15.0 15.8
Yale (15) 0.1 2.7 5.0 4.9
PIE (15) 0 5.7 11.7 11.1
Table 5 EER accuracy values comparison for the PIE database using
NAP and Eigenfaces trained on the Yale database

Method                    EERs (PIE database)
No NAP                    25.6
WD                        5.3
SR                        8.7
LTP                       4.3
NAP–Mean                  3.8
Proposed NAP–Reference    3.8

Pattern Anal Applic
123
eigenvector space (S1). Nevertheless, in practice the
eigenspace of NAP-processed images does not necessarily
closely match the uncorrupted S1 image eigenspace, while
in principle we should pursue eigenbases tightly coupled to
the NAP-processed images. Therefore, we propose to
estimate the eigenface space based on the NAP-projected
images. Furthermore, this procedure does not require the
initial NAP projection in operational mode, since the
NAP-processed eigenface space already excludes the NAP
subspace. In mathematical terms,
$$C_U = \sum (PX)(PX)^T = P X X^T P^T \qquad (8)$$
where $P$ is the NAP projection matrix and $C_U$ denotes the
covariance matrix of the illumination-compensated subspace
trained on data $X$. PCA can diagonalize the covariance
matrix $C_U$ using the orthogonal eigenvectors $V_U$
obtained in Eq. (9):
$$C_U V_U = \lambda V_U \qquad (9)$$
By substituting $C_U$ from Eq. (8) we get
$$P X X^T P^T V_U = \lambda V_U \qquad (10)$$
If we denote the NAP-processed images $PX$ by $X'$, then
Eq. (10) can be rewritten as
$$X' X'^{T} V'_U = \lambda V'_U \qquad (11)$$
which is the standard Eigenfaces formulation. This simply
means that the novel Eigenfaces bases $V'_U$ inherently do
not contain any NAP directions and lead to results that are
mathematically identical to those obtained through the
regular two-step approach in [29].
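
The merged projection of Eqs. (8)–(11) can be checked numerically with a minimal sketch such as the one below. It is an illustration under stated assumptions, not the implementation used in the experiments: the data X and the nuisance basis U are random placeholders, the NAP projection is assumed to have the usual form P = I − UUᵀ with orthonormal U, and the names nap_projection and merged_eigenfaces are hypothetical. The sketch verifies that the leading eigenvectors of P X Xᵀ Pᵀ are orthogonal to the NAP subspace, so projecting a test vector onto them gives the same coefficients with or without applying P first.

```python
import numpy as np


def nap_projection(U):
    """Return P = I - U U^T for an orthonormal nuisance basis U (d x k).

    Assumption: this is the usual NAP form; U is a placeholder here.
    """
    d = U.shape[0]
    return np.eye(d) - U @ U.T


def merged_eigenfaces(X, P, m):
    """Top-m eigenpairs of C_U = P X X^T P^T, as in Eqs. (8)-(11)."""
    Xp = P @ X                      # NAP-processed data X' = P X
    C = Xp @ Xp.T                   # C_U = X' X'^T
    eigvals, eigvecs = np.linalg.eigh(C)   # symmetric solver, ascending order
    order = np.argsort(eigvals)[::-1][:m]  # keep the m largest eigenvalues
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
d, n, k, m = 20, 50, 3, 5           # feature dim, samples, NAP dim, eigenfaces

# Synthetic stand-ins for vectorized images (columns) and nuisance directions.
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)  # center the training data
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal d x k basis

P = nap_projection(U)
eigvals, V = merged_eigenfaces(X, P, m)

# A test vector projected with and without the explicit NAP step.
x = rng.standard_normal(d)
coeffs_direct = V.T @ x             # merged, single-step projection
coeffs_two_step = V.T @ (P @ x)     # NAP first, then the same bases

leak = np.abs(V.T @ U).max()        # overlap between bases and NAP subspace
gap = np.abs(coeffs_direct - coeffs_two_step).max()
print(f"max overlap with NAP subspace: {leak:.2e}")
print(f"max coefficient difference:    {gap:.2e}")
```

Only eigenvectors with non-zero eigenvalues are guaranteed to lie outside the NAP subspace, which is why the sketch keeps m well below the rank of P X.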
6 Conclusions
Nuisance Attribute Projection was recently borrowed from
the speaker recognition field and introduced as a compensation
technique for image processing. While NAP is used
for minimizing channel artifacts in speech signals, it has
also proved to be a useful luminance compensation technique
for face recognition. In light of this parallelism, this
paper discussed and developed several aspects of the NAP
framework that are relevant to face recognition but
apparently not applicable in the speech field. In particular,
we proposed a modification of the NAP formulation that
takes actual image templates as references instead of the
regular approximations used so far. Additionally, it was
shown that NAP training can be significantly reduced for
face recognition tasks. Finally, we suggested a compact
framework that merges NAP compensation and eigenface
recognition. In the future, we intend to explore ways of
incorporating NAP in the frequency domain and to explore
other nuisance effects such as position alignment.
Acknowledgments The authors would like to thank Vitomir Struc
for his helpful comments and Ralph Gross for his assistance with
the PIE database.
References
1. Yang M, Kriegman DJ, Ahuja N (2002) Detecting faces in
images: a survey. IEEE TPAMI 24:34–58
2. Zou X, Kittler J, Messer K (2007) Illumination invariant face
recognition: a survey. In: First IEEE international conference on
biometrics: theory, applications, and systems, 2007. BTAS 2007,
27–29 Sep 2007. doi:10.1109/BTAS.2007.4401921
3. Land EH, McCann JJ (1971) Lightness and retinex theory. J Opt
Soc Am 61(1):1–11
4. Short J, Kittler J, Messer K (2004) A comparison of photometric
normalisation algorithms for face verification. In: Proceedings of
the sixth IEEE international conference on automatic face and
gesture recognition, 2004, 17–19 May 2004, pp 254–259.
doi:10.1109/AFGR.2004.1301540
5. Zhang T, Fang B, Yuan Y, Tang YY, Shang Z, Li D, Lang F
(2009) Multiscale facial structure representation for face recog-
nition under varying illumination. Pattern Recognit 42:251–258
6. Stockham TG Jr (1972) Image processing in the context of a
visual model. IEEE 60:828–842
7. Gross R, Brajovic V (2003) An image preprocessing algorithm
for illumination invariant face recognition. In: Proceedings of the
international conference on audio and video based biometric
person authentication, pp 10–18
8. Shashua A, Riklin-Raviv T (2001) The quotient image: class-
based re-rendering and recognition with varying illuminations.
IEEE TPAMI 23:129–139
9. Wang H, Li SZ, Wang Y (2004) Face recognition under varying
lighting condition using self quotient image. In: Proceedings of
IEEE international conference on AFGR, pp 819–824
10. Wang H, Wang H, Li SZ, Wang Y (2004) Generalized quotient
image. In: Proceedings of IEEE conference on CVPR,
pp 498–505
11. Choi S, Jeong G (2011) Shadow compensation using Fourier
analysis with application to face recognition. IEEE 18:23–26
12. Makwana RM (2010) Illumination invariant face recognition: a
survey of passive methods. Proc Comput Sci 2:101–110
13. Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs.
fisherfaces: recognition using class specific linear projection.
IEEE TPAMI 19:711–720
14. Wang H, Li SZ, Wang Y (2004) Face recognition under varying
lighting condition using self quotient image. In: Proceedings of
IEEE conference on AFGR, pp 819–824
15. Shan S, Gao W, Cao B, Zhao D (2003) Illumination normaliza-
tion for robust face recognition against varying lighting condi-
tions. In: Proceedings of IEEE conference on AMFG, pp 157–164
16. Kong SG, Heo J, Abidi B, Paik J, Abidi M (2005) Recent
advances in visual and infrared face recognition—a review.
CVIU 97:103–135
17. Pan Z, Healey G, Prasad M, Tromberg B (2003) Face recognition
in hyperspectral images. IEEE Trans PAMI 25:1552–1560
18. Zhang T, Tang YY, Fang B, Shang Z, Liu X (2009) Face rec-
ognition under varying illumination using gradient faces. IEEE IP
18:2599–2606
19. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn
Neurosci 3:71–86
20. Chen T, Yin W, Zhou XS, Comaniciu D, Huang TS (2006) Total
variation models for variable lighting face recognition. TPAMI
28:1519–1524
21. Belhumeur PN, Kriegman DJ (1996) What is the set of images of
an object under all possible lighting conditions? In: IEEE con-
ference on CVPR, pp 270–277
22. Georghiades AS, Belhumeur PN, Kriegman DJ (2001) From few
to many: illumination cone models for face recognition under
variable lighting and pose. IEEE TPAMI 23:643–660
23. Basri R, Jacobs DW (2003) Lambertian reflectance and linear
subspaces. IEEE TPAMI 25:218–233
24. Savvides M, Vijaya Kumar BVK, Khosla PK (2004) ‘‘Corefac-
es’’—robust shift invariant PCA based correlation filter for illu-
mination tolerant face recognition. IEEE CVPR 2:834–841
25. Lee KC, Ho J, Kriegman D (2001) Nine points of light: acquiring
subspaces for face recognition under variable lighting. In: Pro-
ceedings of IEEE conference on CVPR, pp 519–526
26. Pizer SM, Amburn EP, Austin J, Cromartie R, Geselowitz A,
Greer T, Haar B, Zimmerman JB, Zuiderveld K (1987) Adaptive
histogram equalization and its variations. Comput Vis Graph
Image Process 39:355–368
27. Savvides M, Kumar BVK (2003) Illumination normalization
using logarithm transforms for face authentication. In: AVBPA,
pp 549–556
28. Chen W, Er MJ, Wu S (2006) Illumination compensation and
normalization for robust face recognition using discrete cosine
transform in logarithm domain. IEEE 36:458–466
29. Struc V, Vesnicer B, Mihelic F, Pavesic N (2010) Removing
illumination artifacts from face images using nuisance attribute
projection. In: Proceedings of the IEEE international conference
on ICASSP, pp 846–849
30. Solomonoff A, Campbell WM, Boardman I (2005) Advances in
channel compensation for SVM speaker recognition. In: Proceedings
of IEEE international conference on ICASSP,
pp 629–632
31. Sim T, Baker S, Bsat M (2003) The CMU pose, illumination, and
expression database. IEEE Trans Pattern Anal Mach Intell
25:1615–1618
32. Marszalec E, Martinkauppi B, Soriano M, Pietikainen M (2000)
A physics-based face database for color research. J Electron
Imaging 9:32–38
33. Kirby M, Sirovich L (1990) Application of the Karhunen-Loeve
procedure for the characterization of human face. IEEE TPAMI
12:103–108
34. Fukunaga K (1990) Introduction to statistical pattern recognition,
2nd edn. Academic Press, New York
35. Vesnicer B, Mihelic F (2008) The likelihood ratio decision cri-
terion for nuisance attribute projection in GMM speaker verifi-
cation. EURASIP 2008:1–11
36. Cristianini N, Shawe-Taylor J (2000) Support vector machines.
Cambridge University Press, Cambridge
37. Collobert R, Bengio S (2001) SVMTorch: support vector
machines for large-scale regression problems. J Mach Learn Res
1:143–160
38. Campbell WM, Sturim DE, Reynolds DA, Solomonoff A (2006)
SVM based speaker verification using A GMM supervector
kernel and NAP variability compensation. In: Proceedings of
IEEE international conference on ICASSP, pp 97–100
39. Sezer OG, Ercil A, Keskinov M (2005) Subspace based object
recognition using Support Vector Machines. In: Proceedings of
European signal processing conference (EUSIPCO)
40. Solewicz Y, Aronowitz H (2009) Two-wire nuisance attribute
projection. In: INTERSPEECH, pp 928–931
41. Jobson DJ, Rahman Z, Woodell GA (1997) Properties and per-
formance of a center/surround retinex. IEEE Trans Image Process
6(3):451–462
42. Zhang T, Fang B, Yuan Y, Tang YY, Shang Z, Li D, Lang F
(2009) Multiscale facial structure representation for face recog-
nition under varying illumination. Pattern Recognit
42(2):252–258
43. Tan X, Triggs B (2010) Enhanced local texture feature sets for
face recognition under difficult lighting conditions. IEEE Trans
Image Process 19(6):1635–1650
44. Struc V, Pavesic N (2011) Performance evaluation of photo-
metric normalization techniques for illumination invariant face
recognition. In: Zhang YJ (ed) Advances in face image analysis:
techniques and technologies. IGI Global
45. Struc V, Pavesic N (2009) Gabor-based kernel partial-least-
squares discrimination features for face recognition. Informatica
(Vilnius) 20(1):115–138