Compressed Data Format for Handwritten Signature Biometrics

Oscar Miguel-Hurtado, Dpt. Electronic Technology, University Carlos III de Madrid, Leganes, Madrid, 28911, SPAIN, [email protected]
Dr. Luis Mengibar-Pozo, Dpt. Electronic Technology, University Carlos III de Madrid, Leganes, Madrid, 28911, SPAIN, [email protected]
Dr. Michael G. Lorenz, Dpt. Electronic Technology, University Carlos III de Madrid, Leganes, Madrid, 28911, SPAIN, [email protected]
Dr. Richard Guest, School of Engineering and Digital Arts, University of Kent, Canterbury, Kent, CT2 7NT, UNITED KINGDOM, [email protected]

Abstract - In this work, the authors explore different lossless strategies of organizing the sample point values from an on-line signature data format, in order to obtain higher compression performance. A new near-lossless strategy of organizing sample point values is also proposed, with the introduction of a mechanism to control the error difference level. Seven different data compression algorithms and three public databases have been tested in order to find the compression ratios for these new strategies, to identify whether a better data compression algorithm exists, and to assess the influence of the origin of the database, i.e. different populations signing in very different ways. The authors have also carried out an algorithm evaluation to investigate the impact of the information lost in several data formats on Signature Verification performance.

Index Terms — on-line signature, ISO/IEC Biometrics Standards, Compression, biometrics, authentication, security.

I. INTRODUCTION

The last few years have seen the release of a great range of signature input devices based on dynamic signals (i.e. time series channels), partly due to industrial and application-led demands. These devices are being introduced into our daily lives. For example, many shopping centers have started to use them for credit card transactions in order to simplify receipt management and to save paper, energy and money with a paperless process. This increased use and availability of signature systems will present the biometric community with the opportunity to utilize their capabilities to improve transactional security through the deployment of automatic signature verification systems. The distributed nature of the market (input device vendors, algorithm providers, integrators and end-users) shows the importance of standardization, enabling all stakeholders to develop systems which easily interact with each other. ISO/IEC JTC1 SC37 WG3 is already working on the development of a second generation of biometric data interchange standards (Project 19794), and within this work there is a standard for signature time series data (19794-7), in the development of which the authors are directly involved and for which this work forms a contribution for a new signature compressed data format.

In this work, the authors explore different lossless strategies of organizing the sample point values from on-line signature data in order to obtain higher compression performance and, consequently, a lower biometric data record size with the minimum loss of information. A new near-lossless strategy of organizing sample point values is proposed, with the introduction of a mechanism to control the error difference level between consecutive samples.

Different data compression algorithms are tested to ascertain suitability towards on-line signature data characteristics.

Three different databases (MCyT Signature corpus, SVC2004 and MyIDea Signature corpus), with users from 5 different countries (Spain, England, France, China and Australia), are used for all experiments to identify whether there are any demographic differences between them for the proposed compression data formats.

This work is organized as follows. Section II introduces the new second generation signature/sign time series data formats [1] (ISO/IEC WD 19794-7.2), showing the Full and Compact data format structures. The new Compression Data Formats under investigation are introduced in Section III, while the different compression algorithms and datasets tested are explained in Sections IV and V respectively. Section VI shows the results for the compression ratios achieved. Sections VII and VIII present an algorithm evaluation and its results.


II. 19794 PART 7 V2: SIGNATURE/SIGN TIME SERIES DATA

ISO/IEC JTC1 SC37 WG3 experts are currently developing a revision of 19794-7, which will be the second generation of the signature/sign time series data format [1]. This new version is at the second Working Draft stage, and the main changes from Version 1 [2] are located within the Full Format, where completely new versions of the General Header and Representation Header have been introduced. This new revision has also incorporated conformance test assertions within the new Amendment 1, which were previously part of the 29109-7 project [3].

But the most important change, with respect to this paper, is the call for National Body contributions to provide a definition for a new compression data format. A decision was taken to incorporate a compression data format after results showed encouraging data compression performance using lossless compression algorithms with signature data formats [4]. The following provides a brief description of the signature data formats included in Version 2 of 19794-7, highlighting the key differences with Version 1.

A. Full Format

As has been mentioned above, the main change within this format is a completely new version of the header, divided into a General Header and one or more Representation Headers.

The new biometric data block (BDB) General Header, as shown in Figure 1, identifies the modality of the BDB ("Format Identifier") as well as its version ("Version Number"). It also indicates the length of the BDB and the number of representations within the BDB Body. In this way, this new version of the 19794-7 Full Format enables a single record to contain multiple signature samples (representations), which was not possible within Version 1 of 19794-7.

The presence or absence of Certification Blocks at the representation level is indicated with the "Certification Flag" field.

This BDB General Header is shared between all the different modalities defined in the 19794 project.

Figure 1 BDB General Header for ISO/IEC 19794-7 Full Format

Following the BDB General Header shall be the BDB Body, which should contain at least one sample (representation). Each representation shall consist of a "Representation Header" and a "Representation Body", as shown in Figure 2.

Figure 2 BDB Body of ISO/IEC 19794-7 Full Format

The Representation Header (Figure 3) incorporates several new fields, including "Capture Date and Time", which indicates when the capture of this representation started; "Capture Device Technology ID", which indicates the class of capture device technology used to acquire the captured biometric sample; "Capture Device Vendor ID" and "Capture Device Type ID", which indicate the vendor and product type; and the quality blocks, which contain the predicted comparison performance of this representation.

The "Preamble" field in the Representation Header indicates the presence or absence of optional extended data within the BDB Body.

After these fields come the same fields as in Version 1: "Channel Description" and "Number of Sample Points".

Figure 3 BDB Representation Header of ISO/IEC 19794-7 Full Format

The Channel Descriptions data (Figure 4) begins with the "Channel Inclusion Field", indicating the presence or absence of particular channels. The X and Y channels are mandatory, and either the T channel, the DT channel, or uniform sampling must be indicated. Following the indication of present channels shall be one channel description field for each channel indicated as present in the "Channel Inclusion Field". These channel descriptions contain information such as the scaling value, the minimum and maximum possible channel values, the mean value and standard deviation of the channel values, whether the channel value is constant or not, and whether the linear component of the regression line for this channel has been removed.

Figure 4 Channel descriptions of ISO/IEC 19794-7 Full Format

The number of sample points included in the BDB Representation Body shall be indicated in the corresponding field, "Number of Sample Points".

Within the Version 2 BDB Representation Body can be found a sequence of sample points and, if indicated in the preamble, the extended data (Figure 5). The structure of the optional extended data is not defined within this format.

Figure 5 BDB Representation Body of ISO/IEC 19794-7 Full Format

The sequence of sample points (Figure 6) remains the same as defined in Version 1: each sample point contains the values of the channels indicated in the "Channel Inclusion Field", stored in 2 bytes, except for the S channel, which is stored in 1 byte.

Figure 6 Sequence of Sample Points of ISO/IEC 19794-7 Full Format
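As an illustrative sketch of this layout (the channel order shown here is an assumption, not taken from the standard; big-endian storage follows the store_big_endian convention used later in this paper), a single sample point with X, Y, T and P channels plus the 1-byte S channel could be packed as follows:

    % Sketch: pack one Full Format sample point. Channel order is illustrative;
    % each value is big-endian, 2 bytes per channel, except the 1-byte S channel.
    function bytes = pack_sample_point(x, y, t, p, s)
        be16  = @(v) [floor(v / 256), mod(v, 256)];          % 16-bit big-endian split
        bytes = uint8([be16(x), be16(y), be16(t), be16(p), s]);
    end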

B. Compact Format

The Compact Format remains the same as defined in Version 1. The Compact Format is defined for use with smart cards and other tokens requiring a smaller representation size. It does not contain a header; the Compact Format Representation Body consists of a sequence of sample points, with each sample point value stored in just 1 byte (Figure 7).

Figure 7 BDB of ISO/IEC 19794-7 Compact Format

As in Version 1, the Compact Format doesn't allow multiple signature representations. Information about the structure and contents of the data block (BDB) shall be contained in separate matching algorithm parameters data objects.

C. Compression Data Format

After the results shown in the paper "Analysis on compact data formats for the performance of Handwritten Signature Biometrics" [4], presented at the last International Carnahan Conference on Security Technologies, the ISO/IEC JTC1 SC37 WG3 experts expressed an interest in adding a new Signature Compression Data Format within part 7 of the 19794 project.

In the following sections the different approaches studied for this new Compression Data Format will be presented alongside the experiments that have been conducted to identify the best approach.

III. PROPOSED COMPRESSION FORMAT

In order to define this compression data format, SC37 WG3 Biometrics Data Format experts made a call for National Standards Body contributions. This paper will show the proposal made by the authors through the Spanish National Body. This proposal is based on organizing sample point values in a way which improves the use of data compression algorithms, as was shown in [4]. In this paper, a new way of organizing the sample point values, which may entail a controlled information loss, is presented. All these compression versions (defined below) will be tested using three different public datasets.

Version 1 will be used as a reference to provide baseline results for other methods in order to assess the performance of the data compression algorithms. Version 2 and Version 3 are the same as presented in [4], whereas Versions 4 to 6 are new algorithm proposals, testing different information loss levels.

The following provides a brief description of each version of compression.

A. Version 1

This version compresses the sequence of sample points as defined in the 19794-7 Full Format. This version provides a comparison of the algorithm compression performance against all the other versions. Compression algorithms will be applied to the sequence of sample points as it is defined in the standard (19794-7 WD1 Full Format): as a sequence of points, each point containing the values from all the channels included.

B. Version 2

Instead of storing the sequence of samples as defined in the part 7 Full Format, it could be a better option to store each channel separately, concatenate all the channels together, and finally compress the resulting data structure. This improves the compression algorithms' performance, as was shown in [4].
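As a minimal sketch of this reorganization (the function name is illustrative, not part of the proposal), assuming the sample points are held in a matrix with one row per sample point and one column per channel:

    % Version 2 sketch: serialize the data channel by channel instead of
    % point by point. 'points' is an n-by-k matrix: n sample points, k channels.
    function data = organize_by_channel(points)
        data = [];
        for c = 1 : size(points, 2)
            data = [data; points(:, c)];      % append the whole channel c
        end
    end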

C. Version 3

In this version it is proposed not to store the values of each channel separately, but rather the differences between consecutive samples of the same channel. In this way, the first value of a channel will be stored as its actual value, and the following values will be the differences with the previous one. These values shall be stored as two-byte signed integers instead of unsigned integers. After calculating the differences, the results of each channel will be concatenated together and then compressed. As was shown in [4], this strategy improves the performance of compression algorithms over Version 2.
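A minimal sketch of this per-channel delta encoding (the function name is illustrative; 'channel' is assumed to be a numeric column vector):

    % Version 3 sketch: store the first channel value as-is, then the
    % differences between consecutive samples, as 16-bit signed integers.
    function data = delta_encode_channel(channel)
        data = zeros(length(channel), 1, 'int16');
        data(1) = int16(channel(1));              % first value stored as-is
        data(2:end) = int16(diff(channel));       % consecutive differences
    end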

D. Version 4

Version 4 is the new proposed strategy of storing sample point values. As with Version 3, it doesn't store the sample point values but rather the differences between consecutive sample point channel values. Version 4 allows certain limited error difference levels, enabling the storage of difference values within just one byte. Every channel will start with its initial value C1, followed by a sequence of differences between consecutive samples, dC2...dCn. When the difference between consecutive samples is greater than "Error Difference Level + 1" multiplied by 127, a control character FF (hexadecimal) shall be stored, followed by the original channel value for that sample.

When the difference between consecutive samples is lower than "Error Difference Level + 1" multiplied by 127, this value shall be encoded in one byte as an unsigned integer, after adding 128 to each value.

If the "Error Difference Level" is greater than 0, a cumulative error control is introduced to limit the error between the original channel values and the channel values that will be compressed, within the range defined by the following formula:

Max_Error <= 2^(Error_Difference_Level - 1)    (1)

The following is an implementation in Matlab code to store a channel with an error control:

    function store_channel(channel, num_samples, signed, error_difference_level)
        % Assumed constants (not specified in the paper): 'offset' shifts signed
        % values into the unsigned range; FF_HEX is the 0xFF control character.
        offset = 32768;
        FF_HEX = 255;
        % Set initial cumulative error
        error = 0;
        % Store first channel value
        Value = channel(1);
        if signed
            store_big_endian(Value + offset);
        else
            store_big_endian(Value);
        end
        % Store consecutive differences
        for i = 2 : num_samples
            Value = channel(i);
            % Difference between two consecutive samples
            difference = channel(i) - channel(i-1);
            % Difference divided by the error compression level
            difference = difference / (error_difference_level + 1);
            % Difference rounded
            difference_round = round(difference);
            % Error introduced by rounding
            diff_channel_error = difference - difference_round;
            % Check if the difference absolute value is greater than
            % the maximum difference (126)
            if abs(difference_round) > 126
                % Store control character: FF
                store_big_endian(FF_HEX);
                % Store channel sample value
                if signed
                    store_big_endian(Value + offset);
                else
                    store_big_endian(Value);
                end
                % Reset error
                error = 0;
            else
                % Cumulative error
                error = error + diff_channel_error;
                % Cumulative error control
                if error > 0.5
                    % If the error exceeds the upper error limit,
                    % add 1 to the sample difference
                    difference_round = difference_round + 1;
                    % and subtract 1 from the cumulative error
                    error = error - 1;
                elseif error < -0.5
                    % If the error exceeds the lower error limit,
                    % subtract 1 from the sample difference
                    difference_round = difference_round - 1;
                    % and add 1 to the cumulative error
                    error = error + 1;
                end
                % Store the difference, shifted by +128 into one unsigned byte
                store_big_endian(difference_round + 128);
            end
        end
    end
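For clarity, the following decode sketch is not part of the original contribution; it assumes hypothetical helpers read_big_endian() (two bytes) and read_byte() mirroring store_big_endian() above, along with the same signed/offset handling:

    % Decode sketch for a Version 4 channel stream (illustrative only).
    function channel = read_channel(num_samples, signed, error_difference_level)
        offset = 32768;                   % assumed, mirrors store_channel above
        channel = zeros(num_samples, 1);
        % The first channel value is always stored in full
        value = read_big_endian();
        if signed
            value = value - offset;
        end
        channel(1) = value;
        for i = 2 : num_samples
            b = read_byte();
            if b == 255
                % Control character FF: the full channel value follows
                value = read_big_endian();
                if signed
                    value = value - offset;
                end
            else
                % Undo the +128 shift and the division by (level + 1)
                difference_round = double(b) - 128;
                value = value + difference_round * (error_difference_level + 1);
            end
            channel(i) = value;
        end
    end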

In Version 4, an Error_Difference_Level value of 0 will be used for all channels (X, Y, T, S and P), which will not entail any difference between the original data and the compressed data.

E. Version 5

Implemented as described in Version 4, but with an Error_Difference_Level value of 1 for channels X, Y, T, S and P, which will entail a maximum difference between the original data and the compressed data of 1.

F. Version 6

Again, implemented as described in Version 4, but with an Error_Difference_Level value of 2 for channels X, Y, T, S and P, which will entail a maximum difference between the original data and the compressed data of 2.

IV. COMPRESSION ALGORITHMS

Seven different lossless compression algorithms have been tested, most of them included in the 7-Zip command-line application [5]. The GZip [6], LZW [7] and BZip2 [8] compression algorithms have also been used.

Table 1 summarizes the compression algorithms tested and their source:

Table 1
Compression Algorithms Tested

Identifier   Compression Algorithm
1            Zip (7-Zip) [5]
2            LZMA (7-Zip) [5]
3            PPMd (7-Zip) [5]
4            Deflate (7-Zip) [5]
5            GZip [6]
6            LZW [7]
7            BZip2 [8]

A. 7-Zip

7-Zip [5] is free, open-source software which provides a high compression ratio. The supported formats for packing and unpacking are 7z, ZIP, GZIP, BZIP2 and TAR. The version used was 4.65 for Windows (03-02-2009), in its command-line version. Within the 7z format, several methods have been tested: Zip, LZMA (an improved and optimized version of the LZ77 algorithm), PPMd (Dmitry Shkarin's PPMdH with small changes) and Deflate (a standard LZ77-based algorithm). Further details can be found at [5].
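As an illustrative sketch of how such a test run could be driven from Matlab (the exact invocation used by the authors is not given in the paper; the 7z command-line switches and the file names below are assumptions based on the 7-Zip documentation):

    % Sketch: compress one BDB body with several of the 7z methods in Table 1.
    % The '-m0=<method>' switch selects the compression method of a .7z archive.
    methods = {'LZMA', 'PPMd', 'Deflate'};
    for m = 1 : length(methods)
        cmd = sprintf('7z a -t7z -m0=%s body_%s.7z bdb_body.bin', ...
                      methods{m}, methods{m});
        system(cmd);
    end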

B. GZip

GZip [6] is a free cross-platform software application for file compression. Its claimed main advantages over other compression software are much better compression and freedom from patented algorithms. It has been adopted by the GNU project and is now relatively popular on the Internet. The GZip file format was standardized as RFC 1952 [9]. Version 1.2.4 for the Windows platform has been used. Further details can be found at [6].

C. BZip2

BZip2 [8] is a free and open-source lossless compression algorithm developed by Julian Seward. It also claims to be patent free. Its main advantages are a good compression ratio and fast compression and decompression times. The version used was 1.0.5, released 17 March 2008. Further details can be found at [8].

D. LZW06

The LZW06 lossless compression algorithm is an implementation of the Lempel-Ziv-Welch (LZW) encoding/decoding algorithm by Michael Dipperstein [7]. Version 0.6, released 21 December 2009, has been used. Further details can be found at [7].

V. SIGNATURE DATABASES

Evaluation of the size of a BDB for these signature formats has been carried out using the MCyT-Signature-Database Corpus [10], the SVC2004 Database [11] and the MyIDea Signatures Corpus [12]. All of them are publicly available.

A. MCyT Signature Database

The MCyT database [10] comprises 100 different users. Each user produced 25 genuine signatures; 25 skilled forgeries of each user's signature were also captured. These skilled forgeries were produced by the 5 subsequent users, who practiced each signature until they felt confident. To capture the signatures for the database, a Wacom Intuos A6 USB graphic tablet was used. Users captured in this database are mainly Spanish writers.

B. SVC2004 Signature Database

The SVC2004 database [11] comprises 40 different users. Each user produced 20 genuine signatures; 20 skilled forgeries were also captured. Users were asked not to use their real signatures, but rather to devise a new signature and practice its production until confident. To capture the signatures for the database, a Wacom Intuos A6 graphic tablet was used.

Users captured in this database are mostly Chinese, signing with Oriental or Occidental characters. Of these 40 users, 24 use Occidental characters, whereas 16 use Oriental characters. Unlike Spanish users, English users are used to writing their name or initials without pictorial strokes. Chinese signatures are different in composition to Occidental signatures and are based on short strokes.


C. MyIDea

The MyIDea database [12] comprises 70 different users. Each user produced 18 genuine signatures; 36 skilled forgeries were also captured, 18 of them using only static information and the other 18 using static and dynamic information. To capture the signatures for the database, an A4 Intuos2 graphic tablet from Wacom was used.

Users captured in this database are French and English. Of these 70 users, 46 are French, whereas 24 are English.

VI. COMPRESSION RESULTS

A. Methodology

Genuine signatures included in the datasets described above have been stored following the ISO/IEC 19794-7.2 specification, in both Full Format and Compact Format.

All Full Format instances shared the same BDB General Header (7 bytes) and BDB Representation Header, which includes all the mandatory fields (19 bytes) plus the channel descriptions (50 bytes) detailing which channels are included (time, x and y position, switch and pressure), their scaling values, and their maximum and minimum values.

Compact Format instances don't have any header, simply containing the number of samples (3 bytes) and the sample values for the channels x and y position, switch and pressure.
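From the figures quoted above, the record sizes can be sketched as follows (a worked illustration, assuming t, x, y and p are stored in 2 bytes and s in 1 byte per Full Format sample point, and 1 byte per channel value in the Compact Format):

    % Worked size sketch for a record with num_samples sample points.
    function [full_size, compact_size] = bdb_sizes(num_samples)
        full_header  = 7 + 19 + 50;          % general + representation headers
                                             % + channel descriptions
        full_sample  = 4 * 2 + 1;            % t, x, y, p (2 bytes) + s (1 byte)
        full_size    = full_header + num_samples * full_sample;
        compact_size = 3 + num_samples * 4;  % sample count + x, y, s, p (1 byte each)
    end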

For each proposed Compression Format Version, the same BDB General Header and BDB Representation Header have been used, but the BDB Body has been formed by organizing the data as explained in Section III and then compressing it with the different compression algorithms detailed in Section IV.

B. Results

The first graph (Figure 8) shows the average data size in kilobytes for the different data formats and datasets, averaged over all the compression algorithms used.

The Full Format average sizes show the difference between different types of users and datasets. The small sizes for SVC2004 are due to the non-recording of pen-up movement, whereas MCyT and MyIDea contain these data. It can also be noted that the different nationalities have different average data sizes, with the largest data sizes coming from contributors to the MCyT dataset (Spanish users).

Figure 9 depicts the average compression ratio achieved for the different data formats and datasets. The Compact Format obtains a compression ratio of around 43% due to storing the values for the X position, Y position and pressure channels in 1 byte instead of 2 bytes as in the Full Format. The Compact Format does not contain a header, and the time channel has not been included, due to the fact that its values do not have enough resolution within the range 0-255.

Improved compression ratios are achieved with Compression Data Format Versions 4 to 6, reaching close to or above 60% for all datasets with Version 6. The larger the "error_level" value, the greater the compression ratio achieved.

Figure 8 Average BDB Size for different Data Formats and Datasets

The MyIDea dataset shows greater compression ratios than the other datasets. This is because its channels have the smallest data ranges, hence the differences between consecutive values are more repetitive and obtain better performance from the compression algorithms.

Figure 9 Average Compression Ratio for different Data Formats and Datasets

Figure 10 shows the performance of the algorithms according to the proposed Compression Format Versions. In forming these results, the average over all datasets has been taken. The average compression ratio shown is the compression ratio achieved just for the BDB Body (i.e. the sample point values). The best performance is achieved with Version 3, which stores the differences between channel values in 2 bytes; therefore there are a large number of bytes with 0 values, representing differences of less than 127. For Versions 4-6, the compression algorithm ratios are lower than that achieved with Version 3, but the final data size achieved (the BDB size, which contains the General Header + Representation Header + compressed BDB Body) is smaller, due to the fact that the data to compress (the differences between consecutive sample point values, stored in only 1 byte instead of 2 bytes as in Version 3) also has a much smaller size.

Regarding the performance of the different compression algorithms, it is worth pointing out the better results achieved by GZip for Versions 4 to 5, but all the algorithms show good performance for Versions 3 to 5. LZW has shown poor performance for Versions 1 and 2.

Figure 10 Average Compression Ratio for different Compression Format Versions and Compression Algorithms

As a final analysis on the compression ratios, Figure 11 depicts the average compression ratios achieved for different datasets and Compression Algorithms, again just for the compression ratio achieved within the BDB Body. The average compression ratios have been calculated across all the Compression Format Versions tested.

Figure 11 Average Compression Ratio for different Datasets and Compression Algorithms

There are important differences between datasets. The MyIDea dataset obtains the best compression ratios due to a lower channel resolution when capturing the signatures, which results in the differences between consecutive values being more repetitive. Within the SVC2004 dataset, there is not a large difference between Occidental and Oriental users, indicating that the compression algorithms obtain the same performance regardless of the type of signature. Another interesting result is that no single compression algorithm achieves much better results than the others; however, LZW obtains the poorest compression ratios.

VII. IMPACT ON ALGORITHM PERFORMANCE

A number of the proposed Compressed Format Versions introduce loss of information within their definition.

As described in Section III, equation (1), Versions 5 and 6 entail a controlled difference error between the original channel values and the compressed channel values. This absolute difference error is not larger than 1 for Version 5 (error_difference_level equal to 1), and not larger than 2 for Version 6 (error_difference_level equal to 2).

The Compact Format also implies a certain loss of information due to the fact that the values have to be converted from 2-byte values, in a range of 65536 values, to 1-byte values, in a range of 256 values.
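A minimal sketch of this conversion, assuming a linear scaling (the exact mapping used by the Compact Format conversion is not specified here):

    % Requantization sketch: map a 2-byte value (0..65535) to 1 byte (0..255).
    function v8 = to_compact(v16)
        v8 = uint8(round(double(v16) * 255 / 65535));
    end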

A Gaussian Mixture Models algorithm has been implemented based on [13], and all datasets have been tested for these three data formats.

VIII. ALGORITHM PERFORMANCE RESULTS

In this section, Figures 12-16 show the results of the algorithm evaluation carried out for the three data formats which imply loss of information, in order to evaluate the impact of this data loss on algorithm performances.

Figure 12 Algorithm Evaluation Results (ROC curves) for MCyT Dataset

Figure 13 Algorithm Evaluation Results (ROC curves) for SVC2004 Occidental Dataset


Figure 14 Algorithm Evaluation Results (ROC curves) for SVC2004 Oriental Dataset

Figure 15 Algorithm Evaluation Results (ROC curves) for MyIDea English Dataset

Figure 16 Algorithm Evaluation Results (ROC curves) for MyIDea French Dataset

Figure 12 shows how the information lost using the Compact Format on the MCyT dataset has an impact on the algorithm performance, whereas the proposed Versions 5-6 maintain the same performance level.

For the other datasets (SVC2004 and MyIDea), the information lost using the Compact Format and Versions 5-6 doesn't lead to big differences in the algorithm performance.

It is also worth highlighting the different algorithm performances achieved on the different datasets. The best algorithm performances have been achieved for the MCyT (Figure 12) and MyIDea French (Figure 16) datasets, with the MyIDea English (Figure 15) dataset producing slightly worse results. On the other hand, for the users from the SVC2004 dataset (Figures 13 and 14), the algorithm performance is significantly worse for those users who sign using Chinese characters.

IX. CONCLUSION

The proposed compressed data formats outperform the Compact Format compression ratios, but without the information loss that the Compact Format incurs by storing the values in 1 byte instead of 2 bytes.

The different approaches for the new compression format show very different results. V1 and V2, which store the original sample point values in two different ways (ordered by sample points or ordered by channels, respectively), do not result in good compression ratios. On the other hand, V3 to V6, which store the differences between consecutive sample point values, achieve good compression ratios.

Both V3 and V4 achieve similar results, with compression ratios of around 50% (comparable with the results achieved with the Compact Format), but without causing any loss of information from the original values. As has been shown, V5 and V6 improve the compression ratio compared to V3 and V4, introducing a limited error between the original values and the recalculated (after compression) signals. The possibility of controlling the Error Difference Level value allows the error level introduced by this data format to be adjusted to the resolution of the input device. The compression performance of V4 and the possibility of adjusting the Error Difference Level, which can be useful as tablet resolutions increase, lead the authors to propose Version 4 as the contribution for defining the new Compression Data Format within 19794-7.2 [1]. As it has been identified that there are no significant differences between data compression algorithms, the authors will propose allowing the use of different algorithms to compress the data, specifying which one has been used in a field in the Representation Header.

X. ACKNOWLEDGMENTS

This work has been partially funded by Segur@ Project, which is funded by the CDTI from the Spanish Ministry of Industry, Tourism and Commerce. The authors would like to thank J. Ortega-Garcia and J. Fierrez-Aguilar for the provision of the MCyT Signature Database.

XI. REFERENCES

[1] ISO/IEC, "2nd Working Draft 19794-7, Biometric data interchange formats – Part 7: Signature/sign time series data," ISO/IEC: Geneva, Switzerland, 2010.

[2] ISO/IEC, "IS 19794-7:2007, Biometric data interchange formats – Part 7: Signature/sign time series data," ISO/IEC: Geneva, Switzerland, 2007.

[3] ISO/IEC, "CD 29109-7, Information technology – Conformance testing methodology for biometric data interchange formats defined in ISO/IEC 19794 – Part 7: Signature/sign time series data," ISO/IEC: Geneva, Switzerland, 2010.

[4] O. Miguel-Hurtado, et al., "Analysis on compact data formats for the performance of handwritten signature biometrics," in Security Technology, 2009. 43rd Annual 2009 International Carnahan Conference on, 2009, pp. 339-346.

[5] 7-Zip. (2010). Available: http://www.7-zip.org/

[6] GZip. (2010). Available: http://www.gzip.org/

[7] M. Dipperstein. (2010). Lempel-Ziv-Welch (LZW) Encoding Discussion and Implementation. Available: http://michael.dipperstein.com/lzw/

[8] BZip, "BZip2 Implementation," 1.0.5 ed, 2008.

[9] RFC 1952, GZIP file format specification version 4.3. Available: http://tools.ietf.org/html/rfc1952

[10] J. Ortega-Garcia, et al., "MCYT baseline corpus: a bimodal biometric database," Vision, Image and Signal Processing, IEE Proceedings, vol. 150, pp. 395-401, 2003.

[11] D. Y. Yeung, et al., "SVC2004: First international signature verification competition," Biometric Authentication, Proceedings, vol. 3072, pp. 16-22, 2004.

[12] B. Dumas, et al., "MyIdea - Multimodal Biometrics Database, Description of Acquisition Protocols," in Third COST 275 Workshop (COST 275), Hatfield (UK), 2005, pp. 59-62.

[13] O. Miguel-Hurtado, et al., "On-Line Signature Verification by Dynamic Time Warping and Gaussian Mixture Models," in Security Technology, 2007. 41st Annual IEEE International Carnahan Conference on, 2007, pp. 23-29.

