
Pattern Synthesis for Nonparametric Pattern Recognition

P. Viswanath, Indian Institute of Technology-Guwahati, India

M. Narasimha Murty, Indian Institute of Science, India

Shalabh Bhatnagar, Indian Institute of Science, India

Copyright © 2009, IGI Global. Distribution in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

Parametric methods first choose the form of the model or hypothesis and then estimate the necessary parameters from the given dataset. The chosen form, based on experience or domain knowledge, often does not match what actually underlies the data (Duda, Hart & Stork, 2000). Besides being error-prone in this respect, such methods adapt very poorly to dynamically changing datasets. Non-parametric pattern recognition methods, on the other hand, are attractive because they do not derive any model but work with the given dataset directly, and so adapt easily to changing data. Two widely used non-parametric pattern recognition methods are (a) nearest neighbor based classification and (b) Parzen-Window based density estimation (Duda, Hart & Stork, 2000). Two major problems in applying non-parametric methods, especially to large and high-dimensional datasets, are (a) the high computational requirements and (b) the curse of dimensionality (Duda, Hart & Stork, 2000). Algorithmic improvements and approximate methods can solve the first problem, whereas feature selection (Guyon & Elisseeff, 2003), feature extraction (Terabe, Washio, Motoda, Katai & Sawaragi, 2002) and bootstrapping techniques (Efron, 1979; Hamamoto, Uchimura & Tomita, 1997) can tackle the second.

We propose a novel and unified solution for these problems by deriving a compact and generalized abstraction of the data. By this term, we mean a compact representation of the given patterns from which one can retrieve not only the original patterns but also some artificial patterns. The compactness of the abstraction reduces the computational requirements, and its generalization reduces the effect of the curse of dimensionality. Pattern synthesis techniques, together with compact representations, attempt to derive such compact and generalized abstractions of the data. These techniques are applied with (a) the nearest neighbor classifier (NNC), a popular non-parametric classifier used in many fields, including data mining, since its conception in the early fifties (Dasarathy, 2002), and (b) Parzen-Window based density estimation, a well-known non-parametric density estimation method (Duda, Hart & Stork, 2000).

BACKGROUND

Pattern synthesis techniques, compact representations and their application with NNC and Parzen-Window based density estimation build on several established fields:

• Pattern recognition: Statistical techniques, parametric and non-parametric methods, classifier design, nearest neighbor classification, probability density estimation, curse of dimensionality, similarity measures, feature selection, feature extraction, prototype selection, and clustering techniques.

• Data structures and algorithms: Computational requirements, compact storage structures, efficient nearest neighbor search techniques, approximate search methods, algorithmic paradigms, and divide-and-conquer approaches.

• Database management: Relational operators, projection, Cartesian product, data structures, data management, queries, and indexing techniques.


MAIN FOCUS

Pattern synthesis and compact representations, followed by their application with NNC and Parzen-Window based density estimation, are described below.

Pattern Synthesis

Generation of artificial new patterns from the given set of patterns is called pattern synthesis. Instance based pattern synthesis uses the given training patterns and some properties of the data, and can generate a finite number of new patterns. Computationally, this can be less expensive than deriving a model from which new patterns can be extracted. It is especially useful for non-parametric methods like NNC and Parzen-Window based density estimation (Duda, Hart & Stork, 2000), which directly use the training instances. It is argued that using a larger synthetic set can reduce the bias of the density estimation or classification (Viswanath, Murty & Bhatnagar, 2006). Further, the use of the respective compact representations can reduce the computational requirements.

This chapter presents two instance based pattern synthesis techniques, called overlap based pattern synthesis and partition based pattern synthesis, together with their corresponding compact representations.

Overlap Based Pattern Synthesis

Let F be the set of features (or attributes). There may exist a three-block partition of F, say {A, B, C}, with the following properties: for a given class, there is a (probabilistic) dependency among the features in A U B, and similarly among the features in B U C, but the features in A (or C) can affect those in C (or A) only through the features in B. Stated more formally, A and C are statistically independent given B. Suppose this is the case and we are given two patterns X = (a1, b, c1) and Y = (a2, b, c2), where a1 and a2 are feature-vectors assigned to the features in A, b is assigned to the features in B, and c1 and c2 to the features in C. Then, our argument is that the two patterns (a1, b, c2) and (a2, b, c1) are also valid patterns in the same class or category as X and Y. If these two new patterns are not already in the class of patterns, it is only because of the finite nature of the set. We call this generation of additional patterns overlap based pattern synthesis, because this kind of synthesis is possible only if the two given patterns have the same feature-values for the features in B. In the example, the feature-vector b is common between X and Y and is therefore called the overlap. This method is suitable only for discrete valued features (which may also be of symbolic or categorical types). If more than one such partition exists, the synthesis technique is applied sequentially with respect to the partitions in some order.

One simple example illustrates this concept. Consider a supermarket sales database with two records, (bread, milk, sugar) and (coffee, milk, biscuits). Assume it is known that there is a dependency between (i) bread and milk, (ii) milk and sugar, (iii) coffee and milk, and (iv) milk and biscuits. Then the two new records that can be synthesized are (bread, milk, biscuits) and (coffee, milk, sugar); here milk is the overlap. A compact representation for this case is shown in Figure 1, where a path from the left end to the right end denotes a data item or pattern. So we get four patterns in total from the graph shown in Figure 1 (two original and two synthetic patterns). Association rules derived from association rule mining (Han & Kamber, 2001) can be used to find these kinds of dependencies. A generalization of this concept and its compact representation for large datasets are described below, after a short sketch of the basic swap operation.
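As a concrete illustration, the following is a minimal sketch of the basic overlap swap for a single three-block partition, assuming patterns are stored as tuples; the function name and representation are ours, not taken from the cited implementations.

```python
from itertools import combinations

def overlap_synthesis(patterns, b_idx, c_idx):
    """One round of overlap based pattern synthesis for a single
    three-block partition {A, B, C}: for every pair of patterns that
    agree on the overlap block B, exchange their C-blocks, yielding
    (a1, b, c2) and (a2, b, c1). The A-blocks stay in place."""
    result = set(patterns)
    for x, y in combinations(patterns, 2):
        if all(x[i] == y[i] for i in b_idx):   # same overlap b
            s1, s2 = list(x), list(y)
            for i in c_idx:                    # exchange the C-blocks
                s1[i], s2[i] = y[i], x[i]
            result.add(tuple(s1))
            result.add(tuple(s2))
    return result

# The supermarket example: milk (feature index 1) is the overlap.
records = [("bread", "milk", "sugar"), ("coffee", "milk", "biscuits")]
print(sorted(overlap_synthesis(records, b_idx=[1], c_idx=[2])))
# -> four patterns: the two originals plus (bread, milk, biscuits)
#    and (coffee, milk, sugar)
```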

If the set of features F can be arranged in an order such that F = {f1, f2, ..., fd} is an ordered set with fk being the kth feature, and every such three-block partition can be represented as Pi = {Ai, Bi, Ci} with Ai = {f1, ..., fa}, Bi = {fa+1, ..., fb} and Ci = {fb+1, ..., fd}, then the compact representation, called the overlap pattern graph, can be described with the help of an example.

Figure 1.


Overlap Pattern Graph (OLP-graph)

Let F = (f1, f2, f3, f4, f5), and let two partitions satisfying the conditional independence requirement be P1 = {{f1}, {f2, f3}, {f4, f5}} and P2 = {{f1, f2}, {f3, f4}, {f5}}. Let the three given patterns be (a,b,c,d,e), (p,b,c,q,r) and (u,v,c,q,w). Since (b,c) is common between the 1st and 2nd patterns under P1, two synthetic patterns that can be generated are (a,b,c,q,r) and (p,b,c,d,e). Likewise, three other synthetic patterns that can be generated are (u,v,c,q,r), (p,b,c,q,w) and (a,b,c,q,w) (note that the last synthetic pattern is derived from two earlier synthetic patterns). A compact representation called the overlap pattern graph (OLP-graph) for the entire set (including both given and synthetic patterns) is shown in Figure 2, where a path from the left end to the right end represents a pattern. The graph is constructed by inserting the given patterns, and the patterns that can be extracted from the graph form the entire synthetic set consisting of both original and synthetic patterns. Thus, from the graph in Figure 2, a total of eight patterns can be extracted, of which five are new synthetic patterns.
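The following sketch (our own, using the same tuple representation as the earlier snippet) applies the two partitions repeatedly until no new pattern appears; the resulting closure reproduces the eight patterns that the OLP-graph of Figure 2 encodes.

```python
from itertools import combinations

def synthesize_step(patterns, b_idx, c_idx):
    """One pass of overlap synthesis: swap the C-blocks (indices c_idx)
    of every pair of patterns that agree on the overlap block b_idx."""
    new = set(patterns)
    for x, y in combinations(patterns, 2):
        if all(x[i] == y[i] for i in b_idx):
            s1, s2 = list(x), list(y)
            for i in c_idx:
                s1[i], s2[i] = y[i], x[i]
            new.update({tuple(s1), tuple(s2)})
    return new

def closure(patterns, partitions):
    """Apply every partition repeatedly until a fixed point is reached;
    this is the full set a corresponding OLP-graph would encode."""
    current = set(patterns)
    while True:
        updated = set(current)
        for b_idx, c_idx in partitions:
            updated = synthesize_step(updated, b_idx, c_idx)
        if updated == current:
            return current
        current = updated

given = [tuple("abcde"), tuple("pbcqr"), tuple("uvcqw")]
# P1 = {{f1}, {f2,f3}, {f4,f5}} and P2 = {{f1,f2}, {f3,f4}, {f5}},
# each written as (overlap indices, trailing-block indices).
partitions = [([1, 2], [3, 4]), ([2, 3], [4])]
print(len(closure(given, partitions)))   # 8: three given + five synthetic
```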

An OLP-graph can be constructed by scanning the given dataset only once, and the result is independent of the order in which the given patterns are considered. An approximate method for finding the partitions, a method for constructing the OLP-graph, and its application to NNC are described in (Viswanath, Murty & Bhatnagar, 2005). For large datasets, this representation drastically reduces the space requirement.

Partition Based Pattern Synthesis

Let P = {A1, A2, ..., Ak} be a k-block partition of F such that A1, A2, ..., Ak are statistically independent subsets. Let Y be the given dataset and ProjAi(Y) be the projection of Y onto the features in the subset Ai. Then the Cartesian product

ProjA1(Y) X ProjA2(Y) X ... X ProjAk(Y)

is the synthetic set generated by this method. A partition P satisfying the above requirement can be obtained from domain knowledge or from association rule mining. Approximate partitioning methods using mutual information or pair-wise correlation between the features can also be used to obtain suitable partitions; these have been experimentally shown to work well with NNC (Viswanath, Murty & Bhatnagar, 2004).
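A minimal sketch of this construction, assuming patterns are tuples and the partition is given as contiguous, ordered lists of feature indices (the helper name is ours):

```python
from itertools import product

def partition_synthesis(dataset, blocks):
    """Generate the partition based synthetic set: project the dataset
    onto each block of features, then take the Cartesian product of
    the projections. Blocks are assumed contiguous and in feature
    order, so concatenation restores the original feature layout."""
    projections = [
        {tuple(pattern[i] for i in block) for pattern in dataset}
        for block in blocks
    ]
    # Concatenate one projected sub-pattern per block.
    return {sum(parts, ()) for parts in product(*projections)}

Y = [("a", "b", "c", "d"), ("a", "x", "c", "y")]
# Two statistically independent blocks: {f1, f2} and {f3, f4}.
print(sorted(partition_synthesis(Y, blocks=[[0, 1], [2, 3]])))
# -> 4 patterns: {(a,b), (a,x)} x {(c,d), (c,y)}
```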

Partitioned Pattern Count Tree (PPC-Tree)

This compact representation, suitable for storing the partition based synthetic patterns, is based on the Pattern Count Tree (PC-tree) (Ananthanarayana, Murty & Subramanian, 2001), which is a prefix-tree-like structure. The PPC-tree is a generalization of the PC-tree in which a PC-tree is built for each of the projected pattern sets ProjAi(Y), for i = 1 to k. The PPC-tree is a more compact structure than the PC-tree; it can be built by scanning the dataset only once, and it is independent of the order in which the given patterns are considered. The construction method and properties of the PPC-tree can be found in (Viswanath, Murty & Bhatnagar, 2004). As an example, consider four given patterns, viz., (a,b,c,d,e,f), (a,u,c,d,v,w), (a,b,p,d,e,f), (a,b,c,d,v,w), and a two-block partition where the first block consists of the first three features and the second block of the remaining three. The corresponding PPC-tree is shown in Figure 3, where each node contains a feature-value and an integer count giving the number of patterns sharing that node. There are two PC-trees, one for each block. A path (from root to leaf) in the PC-tree for block 1 concatenated with a path in the PC-tree for block 2 gives a pattern that can be extracted; in total, six patterns can be extracted from the structure shown in Figure 3.
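The following is a minimal sketch of the structure as described above (the node layout and names are our assumptions, not the published construction): one count-annotated prefix tree per block of the partition.

```python
class Node:
    """A PC-tree node: a feature-value, a count of the patterns
    sharing this node, and child nodes keyed by feature-value."""
    def __init__(self, value=None):
        self.value = value
        self.count = 0
        self.children = {}

def build_pc_tree(sub_patterns):
    """Insert each projected sub-pattern into a prefix tree,
    incrementing counts along the shared path."""
    root = Node()
    for sub in sub_patterns:
        node = root
        for v in sub:
            node = node.children.setdefault(v, Node(v))
            node.count += 1
    return root

def build_ppc_tree(dataset, blocks):
    """A PPC-tree is one PC-tree per block of the feature partition."""
    return [
        build_pc_tree([tuple(p[i] for i in block) for p in dataset])
        for block in blocks
    ]

patterns = [tuple("abcdef"), tuple("aucdvw"), tuple("abpdef"), tuple("abcdvw")]
ppc = build_ppc_tree(patterns, blocks=[[0, 1, 2], [3, 4, 5]])
# Root-to-leaf paths: 3 in the block-1 tree x 2 in the block-2 tree
# = 6 extractable patterns, as in Figure 3.
```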

Figure 2.

Pattern Synthesis for Nonparametric Pattern Recognition

An Application of Synthetic Patterns with the Nearest Neighbor Classifier and Parzen-Window Based Density Estimation

Classification and prediction are among the various elements of data mining and knowledge discovery, alongside association rule mining, clustering, link analysis, rule induction, etc. (Wang, 2003). The nearest neighbor classifier (NNC) is a very popular non-parametric classifier (Dasarathy, 2002), widely used in various fields because of its simplicity and good performance. Theoretically, with an infinite number of samples, its error rate is upper-bounded by twice the error of the Bayes classifier (Duda, Hart & Stork, 2000). NNC is a lazy classifier in the sense that no general model (like a decision tree) is built until a new sample needs to be classified, so it can easily adapt to situations where the dataset changes frequently. But computational requirements and the curse of dimensionality are the two major issues that must be addressed to make the classifier suitable for data mining applications (Dasarathy, 2002). Prototype selection (Susheela Devi & Murty, 2002), feature selection (Guyon & Elisseeff, 2003), feature extraction (Terabe, Washio, Motoda, Katai & Sawaragi, 2002), compact representations of the datasets (Khan, Ding & Perrizo, 2002) and bootstrapping (Efron, 1979; Hamamoto, Uchimura & Tomita, 1997) are some of the remedies for these problems. But all these techniques need to be applied one after the other; they cannot be combined together. The pattern synthesis techniques and compact representations described in this chapter provide a unified solution. Similar to NNC, Parzen-Window based density estimation is a popular non-parametric density estimation method (Viswanath, Murty & Kambala, 2005).
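For reference, here is a minimal sketch of the standard Parzen-Window estimate with a Gaussian kernel (the estimator is textbook material; the function name is ours). A synthetic set generated as above can simply replace the original samples:

```python
import math

def parzen_density(x, samples, h):
    """Standard Parzen-Window estimate p(x) = (1/n) * sum_i K_h(x - x_i)
    with a d-dimensional Gaussian kernel of window width h."""
    n, d = len(samples), len(x)
    norm = (math.sqrt(2 * math.pi) * h) ** d   # (2*pi*h^2)^(d/2)
    total = 0.0
    for s in samples:
        sq = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq / (2 * h * h)) / norm
    return total / n

# Density at the origin from three 2-D samples, window width 1.0.
print(parzen_density((0.0, 0.0), [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)], h=1.0))
```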

Efficient implementations of NNC that work directly with the OLP-graph are given in (Viswanath, Murty & Bhatnagar, 2005). These use dynamic programming to reuse partial distance computations and thus reduce the classification time requirement. Partition based pattern synthesis, along with efficient NNC methods, is presented in (Viswanath, Murty & Bhatnagar, 2006). An efficient NNC implementation with constant classification time is presented in (Viswanath, Murty & Bhatnagar, 2004). Pattern synthesis and compact representation schemes can reduce the bias of the density estimation and are shown to perform well in a density based intrusion detection system (Viswanath, Murty & Kambala, 2005). These methods are, in general, based on the divide-and-conquer approach and work efficiently and directly on the compact representation, so the computational requirements are reduced. Since the total number of patterns considered (i.e., the size of the synthetic set) is much larger than the given training set, the effect of the curse of dimensionality is also reduced.
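To see why divide-and-conquer pays off here, consider the following sketch (ours, not the published algorithms): when the squared distance decomposes over the blocks of the partition, the nearest synthetic neighbor in the full Cartesian product can be found by minimizing block by block over the projections, without ever materializing the product.

```python
def nearest_synthetic_distance(query, projections, blocks):
    """Squared Euclidean distance from `query` to its nearest neighbor
    in ProjA1(Y) x ... x ProjAk(Y). Because the squared distance is a
    sum over blocks, the minimum over the product set equals the sum
    of the per-block minima, so the product set is never enumerated."""
    total = 0.0
    for block, proj in zip(blocks, projections):
        q = [query[i] for i in block]
        total += min(
            sum((qi - si) ** 2 for qi, si in zip(q, sub))
            for sub in proj
        )
    return total

# Per-class projections would be built once from the training set;
# classification then picks the class with the smallest such distance.
blocks = [[0, 1], [2, 3]]
proj = [{(0.0, 0.0), (1.0, 1.0)}, {(2.0, 2.0), (0.0, 1.0)}]
print(nearest_synthetic_distance((0.5, 0.5, 1.0, 1.5), proj, blocks))
```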

FUTURE TRENDS

Integration of various data mining tasks such as association rule mining, feature selection, feature extraction, classification, etc., using compact and generalized abstractions is a very promising direction. Apart from NNC and Parzen-Window based density estimation, other applications of synthetic patterns and their compact representations, such as decision tree induction, rule induction, and clustering, are also interesting areas that can yield valuable results.

Figure 3.


CONCLUSION

Pattern synthesis is a novel technique that can enhance the generalization ability of lazy learning methods like NNC and Parzen-Window based density estimation by reducing the effect of the curse of dimensionality, while the corresponding compact representations reduce the computational requirements. In essence, it derives a compact and generalized abstraction of the given dataset. Overlap based pattern synthesis and partition based pattern synthesis are two instance based pattern synthesis methods, with the OLP-graph and the PPC-tree as their respective compact storage structures.

REFERENCES

Ananthanarayana, V.S., Murty, M.N., & Subramanian, D.K. (2001). An incremental data mining algorithm for compact realization of prototypes. Pattern Recognition, 34, 2249-2251.

Dasarathy, B.V. (2002). Data mining tasks and methods: Classification: Nearest-neighbor approaches. In Handbook of Data Mining and Knowledge Discovery, 288-298. New York: Oxford University Press.

Duda, R.O., Hart, P.E., & Stork, D.G. (2000). Pattern Classification (2nd ed.). Wiley-Interscience.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.

Hamamoto, Y., Uchimura, S., & Tomita, S. (1997). A bootstrap technique for nearest neighbor classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1), 73-79.

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. Morgan Kaufmann Publishers.

Khan, M., Ding, Q., & Perrizo, W. (2002). K-nearest neighbor classification on spatial data streams using P-trees. PAKDD 2002, LNAI 2336, 517-528. Springer-Verlag.

Susheela Devi, V., & Murty, M.N. (2002). An incremental prototype set building technique. Pattern Recognition, 35, 505-513.

Terabe, M., Washio, T., Motoda, H., Katai, O., & Sawaragi, T. (2002). Attribute generation based on association rules. Knowledge and Information Systems, 4(3), 329-349.

Viswanath, P., Murty, M.N., & Bhatnagar, S. (2004). Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification. Information Fusion, 5, 239-250.

Viswanath, P., Murty, M.N., & Bhatnagar, S. (2005). Overlap based pattern synthesis with an efficient nearest neighbor classifier. Pattern Recognition, 38, 1187-1195.

Viswanath, P., Murty, M.N., & Kambala, S. (2005). An efficient Parzen-window based network intrusion detector using a pattern synthesis technique. Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence, Indian Statistical Institute, Kolkata, 799-804.

Viswanath, P., Murty, M.N., & Bhatnagar, S. (2006). Partition based pattern synthesis with efficient algorithms for nearest neighbor classification. Pattern Recognition Letters, 27, 1714-1724.

Wang, J. (2003). Data mining: Opportunities and chal-lenges. Hershey, PA: Idea Group Publishing.

KEY TERMS

Bootstrapping: Generating artificial patterns from the given original patterns. (This does not mean that the artificial set is larger than the original set; also, the artificial patterns need not be distinct from the original patterns.)

Curse of Dimensionality Effect: The number of samples needed to estimate a function (or model) grows exponentially with the dimensionality of the data.

Compact and Generalized Abstraction of the Training Set: A compact representation built from the training set, from which not only the original patterns but also some new synthetic patterns can be derived.


Prototype Selection: The process of selecting a few representative samples from the given training set suitable for the given task.

Training Set: The set of patterns whose class labels are known and which are used by the classifier in classifying a given pattern.

