
Automatic Mobile Photographer and Picture Diary

Plamen Angelov, Javier Andreu, Tu Vuong

Intelligent Systems Research Area, School of Computing and Communications,

Lancaster University, United Kingdom

Abstract-In this paper a novel smartphone application is introduced which makes use of a recently proposed autonomous approach for identifying visual landmarks from video input. The problem of recognizing new or pre-defined features in a video stream surfaced a long time ago. Many workable solutions have been proposed and implemented, but few can run on an embedded mobile device with sustainable quality: they are either too slow or too computationally costly. Embedded devices such as smart mobile phones have a limited, non-scalable amount of resources; processing power, memory and, to a great extent, energy are important aspects to consider when coding a mobile application. This paper describes a novel recursive (and thus fast) method of automatic landmark recognition applied to such smart mobile devices. The application uses the inbuilt camera to retrieve information about the external context. This information is used as input to an evolving algorithm that is embedded in and runs on the mobile device.

I. INTRODUCTION

Rapid advancements in mobile phone technologies and computing devices in general have greatly lowered the barrier of entry for most people. Nowadays, it is not unusual to own a handheld device as powerful as the most expensive consumer PCs from less than a decade ago. One of the most standardized technologies embedded in mobile phones is the camera, yet it has seen little use other than taking occasional pictures. Many attempts have been made to better utilize the camera's mobility and improvements in processing power to extract relevant information from portable devices in real time [1-3]. Such capabilities can be of great help to people in their daily tasks [4], and especially to people with Alzheimer's disease and dementia. One of the goals is to recognize unique landmarks from a given video stream [5].

This work intends to build a picture diary from 'landmarks' collected in real time. A picture diary is composed of a camera that retrieves pictorial information that users may consider highlights, and a picture album in which to download and view those pictures. This type of application is valued in psychological and behavioral studies [6]. In this research, the idea of using landmarks as highlights of our life is a kind of visual attention model. Some research groups have even developed ad-hoc mobile devices for this use. For example, Microsoft Labs developed a portable digital camera device named SenseCam [7], a wearable device able to autonomously detect events or changes in the scenery and trigger pictures automatically. This ad-hoc device is designed for this specific matter and its hardware and software components are optimized to be used solely for this purpose. The whole application also provides an (external) software photo album where all pictures are displayed and ordered on a time basis. Users can inspect those albums and remember where they have been and what they did. This type of application can be a supportive system for memory-impaired persons such as Alzheimer's or dementia patients.

In this paper we implement an evolving method for the recognition of landmarks on a smart phone. The evolving algorithm has self-learning and self-developing capacities and therefore does not require any training. Unlike other solutions, it is not just a sensory system composed of filters and thresholds. The algorithm is fully adaptive and learns from the inputs retrieved from the camera. While the application runs, it becomes increasingly attuned to the user as it adaptively learns the relation of each frame to all previously cached landmarks. The similarity of the current image frame to all previous frames can be calculated recursively. The current candidate landmark can also be compared with the previously identified landmarks.

The algorithm does not require storing all frames (since it is fully recursive), only the previous landmarks, producing an automatic video diary of interesting places visited to which one can easily attach a time tag and possibly a location identifier, building a simple map/route of the journey. The algorithm keeps in memory just a small number of values, which represent the density of the bins into which an image is subdivided, together with the set of previously identified landmarks. A more detailed explanation of the application of the same algorithm to mobile robots can be found in [8] [9-11].

This paper is organized as follows: in section 2 the fundamentals of this work are described, in section 3 the evolving algorithm applied is explained and, finally, in section 4 it is evaluated on a mobile device.

II. BACKGROUND

A. Landmark Recognition.

In mobile robot navigation, a landmark can be described as anything that stands out and is unique compared to other objects recorded in a video stream. The ability to record new landmarks as well as to recognize previous ones can be valuable for a robot to localize itself in its surroundings when navigation technologies such as the Global Positioning System (GPS) are unavailable. For ordinary consumers, automatic landmark recognition is a novel way to utilize the ever more ubiquitous cameras present in modern mobile phones: to build a picture diary of our everyday life without the tedium of manually taking every picture. The inherent limitations in processing power, energy and storage capacity of mobile devices make devising a strategy for landmark recognition without human involvement a challenging task, however. A practical implementation should not rely on a pre-built database of landmark templates, nor should it require off-line processing. The following literature review attempts to demonstrate the existing approaches to this problem and how they differ from our approach.

B. Visual-Based Attention Model.

One possible approach to identifying visual landmarks is reported in [12]. It focuses on tracking interesting/salient spots in the video stream based on the visual attention model [13] and then selecting the most robust landmarks. The saliency-based model of visual attention is composed of four main steps:

a) Extract features from each frame to form feature maps. The features used are based on psychophysical studies of primate visual systems. In this research, the authors used the two opponent colors red/green and blue/yellow and corner detection as feature maps.

b) Each feature map is converted to a conspicuity map using multi-scale difference of Gaussians filters [14] to highlight the parts that differ.

c) The conspicuity maps are integrated together competitively into a saliency map.

d) Find the most active locations, so-called spots of attention, of the scene from the saliency map.

After the spots of attention are found, they are tracked on a frame-by-frame basis using feature matching. To improve tracking stability, each spot is characterized by an n-dimensional feature vector f, where n is the number of features considered in the attention model. As the robot moves from place to place, new spots are added to and removed from the tracking database as they move in and out of view, forming a trajectory for each tracked spot. A new landmark l is defined when its corresponding trajectory satisfies the robustness criterion, which is described by the length of said trajectory. The authors tested the algorithm with four video sequences acquired from a robot navigating an indoor environment over a distance of about 10 meters. Results of the experiments showed that the tracking and landmark recognition performance is robust against changes in lighting and small variations in the environment such as an opening door. It is unclear, however, that this method will work well for automated landmark recognition in mobile devices handled by humans:

e) The tests are limited to indoor environments with mostly stable lighting and geometries. More testing in open surroundings with many moving objects is needed before coming to a conclusion.

f) Robot movement is gentle, with no abrupt changes, which is an advantage for the tracking algorithm. On the contrary, human movement is often unpredictable; recorded sequences are jerky and rarely fixed at one location. That may confuse the tracking algorithm and cause many landmarks to be missed, because most trajectories will be terminated before the robustness criterion is satisfied.

Reference [15] gives insight into how additional information captured in the scene can be utilized for automated landmark tracking and recognition, based on an extension of the visual attention model.

III. IMAGE ANALYSIS AND AUTOMATIC PHOTOGRAPHER

A. Feature Extraction.

There are different ways to collect specific features from an image to form feature vectors, such as image-based recognition techniques (edge, corner, blob and ridge detection) or making use of metadata from additional sensory inputs (light intensity, temperature, accelerometer, GPS sensors). We focus mainly on an image processing solution using the inbuilt camera. Image analytics algorithms are demanding in terms of memory and processing capacity. In this work, a simple but very efficient image processing technique is proposed to detect the highlights of a scene sequence.

The overall idea of the approach is illustrated in Figure 1 below.

Figure 1. The overall schematic representation of the proposed approach (extract features → calculate variables → store new landmarks).

The image format is 24-bit RGB, made up of three 8-bit components: Red, Green and Blue. Before further processing, each frame is first converted to 24-bit HSV (Hue, Saturation, Value) format, since this makes the algorithm less susceptible to changes in lighting conditions [16]. To avoid data over-fitting [17], a frame of (M × N) pixels is then decomposed into a matrix of smaller grids of (m × n) pixels called "bins". The idea is to have bins of a size comparable with the size of the objects that will form the landmark. For example, if the number of selected bins is 12 and the image has 640 × 480 pixels, the frame is divided into 12 bins of 160 × 160 pixels each. The mean value of the colour intensity of all pixels belonging to a bin is calculated for each colour channel by [17]:

$$\phi_{pq}^{c} = \frac{1}{m \times n} \sum_{w=1}^{m} \sum_{v=1}^{n} I_{wv}^{c} \qquad (1)$$

where $I_{wv}^{c}$ denotes the intensity of colour channel $c$ (e.g. the red colour) at the $w$th row and $v$th column of the bin.

Then $\phi_{pq}^{c}$ is the mean value of colour channel $c$ for the bin formed at the $p$th vertical and $q$th horizontal position of the image.

All these values for a single frame make up a feature vector whose components are indexed by $j = 1, 2, \dots, p \times q$ for that frame, denoted $x_k$ for time instant $k$. Landmarks are indexed by $i = 1, \dots, N$, so a landmark frame is denoted $x^{i}$. The original frame data is not used for the rest of the algorithm and is discarded from memory. This is illustrated in Fig. 2 below.
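As a concrete illustration of this step, the following is a minimal sketch of the per-bin mean computation in equation (1). It assumes OpenCV is available for the RGB-to-HSV conversion; the function name and parameter defaults are this article's assumptions, not identifiers from the paper.

```python
import numpy as np
import cv2  # assumed available for the RGB-to-HSV conversion

def bin_mean_features(frame_rgb, p=3, q=4):
    """Sketch of equation (1): per-bin, per-channel mean intensities.

    frame_rgb -- (M, N, 3) uint8 RGB frame from the camera
    p, q      -- number of bins vertically and horizontally
    Returns a flat feature vector with p * q * 3 components.
    """
    hsv = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2HSV).astype(np.float64)
    M, N = hsv.shape[:2]
    m, n = M // p, N // q  # bin size in pixels
    features = []
    for row in range(p):
        for col in range(q):
            bin_px = hsv[row * m:(row + 1) * m, col * n:(col + 1) * n]
            # mean value of each colour channel over the (m x n) bin
            features.extend(bin_px.reshape(-1, 3).mean(axis=0))
    return np.asarray(features)
```

With the paper's 640 × 480 example and 12 bins, p = 3 and q = 4 give the stated 160 × 160-pixel bins.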

B. Image Analytics through Density Modelling.

A previous work on visual landmarks [17] proposes the concept of spatial data density to compare the similarity between frames. The approach is prototype-based and considers just a few of the previous prototypes to work out the current density value of a new video frame. The formula to compute this density as a kernel estimate is given by [reference of KDE]:

$$\hat{f}(x) = \frac{1}{Z h} \sum_{z=1}^{Z} K\!\left(\frac{x - x_z}{h}\right) \qquad (2)$$

where $K$ denotes the kernel function, $h$ is a smoothing parameter (the bandwidth) and $x_z$ is the $z$th sample in a distribution of $Z$ Gaussian-distributed random samples $(x_1, x_2, \dots, x_Z)$.
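As a point of reference, here is a small sketch of the kernel density estimate in (2) for the one-dimensional case; the kernel options mirror the choices discussed next, and the function is illustrative rather than the paper's code.

```python
import numpy as np

def kde(x, samples, h=1.0, kernel="cauchy"):
    """One-dimensional kernel density estimate, as in equation (2).

    samples -- array of Z previously observed scalar samples
    h       -- bandwidth (smoothing parameter)
    """
    u = (x - np.asarray(samples, dtype=np.float64)) / h
    if kernel == "cauchy":
        k = 1.0 / (np.pi * (1.0 + u ** 2))                # Cauchy kernel
    else:
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    return k.mean() / h  # (1 / (Z h)) times the sum of kernel values
```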

The types of kernels considered in [9] include Gaussian and Cauchy kernels. In the same work, the Cauchy-type kernel was selected for its recursive properties and lack of fixed parameters. The density of a sample calculated on a time basis is given by [8]:

$$D(x_k) = \frac{1}{1 + \frac{1}{k-1} \sum_{l=1}^{k-1} d_E\!\left(x_k, x_l\right)} \qquad (3)$$

where $d_E$ denotes the distance/dissimilarity between two feature vectors and $k$, the total number of samples observed to date, is the time index of the current sample.

In this formula the Euclidean distance is used, but the cosine distance could easily be fitted to the same formula. The representation of the density in Euclidean space gives enough information about the "distance" or "relationship" between visual frames. Formula (3) returns a single value that describes the dissimilarity of the current frame compared to all previously recorded prototypes or highlights of the scene. The formula returns values in the range [0,1], with 0 representing entirely different and 1 completely identical (100%) to the previous prototypes or highlights. In practical situations, once one highlight has been retrieved, no following frame will reach 1 anymore, since a new frame cannot be identical to all previous highlights.

Figure 2. For each image frame the visual data is processed to get its HSV representation and divided into bins. Means per bin and per channel are computed and the density is then updated recursively.

Put simply, the algorithm processes the frames gathered by the mobile phone camera one by one. At each iteration, the density of the new frame is computed after obtaining the feature vector of per-bin mean values given by equation (1). If the density drops suddenly, the current image frame is considered to represent a new 'landmark': its feature vector is stored in the list of recorded highlights and its pixel representation is stored in the picture album.

Taking into account the multidimensionality of inputs and adding the similarity measure, the formula becomes [8]:


$$D(x_k) = \frac{1}{1 + \frac{1}{k-1} \sum_{l=1}^{k-1} \sum_{j=1}^{n} \left(x_k^j - x_l^j\right)^2} \qquad (4)$$
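For reference, a direct evaluation of (4) over all stored frames looks as follows; this is a sketch for checking the recursive version introduced next, with illustrative names only.

```python
import numpy as np

def batch_density(x, past):
    """Direct evaluation of equation (4): O(k) work per frame.

    x    -- current feature vector (n components)
    past -- list of the k-1 previous feature vectors
    """
    x = np.asarray(x, dtype=np.float64)
    if not past:
        return 1.0
    # mean squared Euclidean distance to all previous frames
    sq_dists = [float(np.sum((x - np.asarray(xl)) ** 2)) for xl in past]
    return 1.0 / (1.0 + sum(sq_dists) / len(past))
```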

As the purpose of the approach is to work in real time on a frame-by-frame basis, (3) is transformed into a recursive formula:

$$D(x_k) = \frac{k-1}{(k-1)\left(x_k^T x_k + 1\right) - 2\,x_k^T a_k + b_k} \qquad (5)$$

where the cumulative values $(a_k, b_k)$ are defined to enable the recursion [8]:

$$a_k = a_{k-1} + x_{k-1}; \qquad b_k = b_{k-1} + x_{k-1}^T x_{k-1} \qquad (5a)$$

These values are initialized so that $D(x_1) = 1$. The scalar products $x_k^T x_k$ and $x_k^T a_k$ are evaluated from the current frame's feature vector, while $a_k$ and $b_k$ are aggregated at every frame:

$$a_1 = 0; \qquad b_1 = 0 \qquad (6a)$$
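A compact sketch of the recursion in (5)-(6a), under the reconstruction given above; the class and attribute names are this article's assumptions, not identifiers from the paper. It reproduces the batch densities (3)-(4) while storing only two accumulators.

```python
import numpy as np

class RecursiveDensity:
    """Recursive density estimation per equations (5), (5a) and (6a)."""

    def __init__(self):
        self.k = 0        # number of frames seen so far
        self.a = None     # a_k: running sum of past feature vectors
        self.b = 0.0      # b_k: running sum of past squared norms
        self.prev = None  # previous frame's feature vector

    def update(self, x):
        """Return D(x_k) for the new feature vector x."""
        x = np.asarray(x, dtype=np.float64)
        self.k += 1
        if self.k == 1:
            self.a = np.zeros_like(x)  # (6a): a_1 = 0, b_1 = 0
            self.prev = x
            return 1.0                 # initialization: D(x_1) = 1
        # (5a): fold the previous frame into the accumulators
        self.a += self.prev
        self.b += float(self.prev @ self.prev)
        self.prev = x
        k = self.k
        # (5): density from the accumulators and the current frame only
        denom = (k - 1) * (float(x @ x) + 1.0) - 2.0 * float(x @ self.a) + self.b
        return (k - 1) / denom
```

As a sanity check, for k = 2 the denominator reduces to 1 + ||x_2 - x_1||², so the result agrees with the batch_density sketch above.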

C. Density Monitoring Parameters.

To trigger the picture capture autonomously from the real-time video scene, further parameters that monitor changes in the density of the frames need to be calculated. On the basis of the previously calculated density, first-order statistics such as the mean and the standard deviation are computed to monitor the dynamics of the density model:

$$\bar{D}_k = \frac{1}{k} \sum_{i=1}^{k} D_i \qquad (7)$$

$$\sigma_k^2 = \frac{1}{k} \sum_{i=1}^{k} \left(D_i - \bar{D}_k\right)^2 \qquad (8)$$

where $D_k = D(x_k)$ and $\bar{D}_k$ denotes the average of the density.

The recursive version of the standard deviation only depends on the current frame and is formulated as follows:

$$\sigma_k^2 = \frac{k-1}{k}\,\sigma_{k-1}^2 + \frac{1}{k-1}\left(D_k - \bar{D}_k\right)^2 \qquad (9)$$

$$\bar{D}_k = \frac{k-1}{k}\,\bar{D}_{k-1} + \frac{1}{k}\,D_k \qquad (10)$$

D. Triggering the Mobile Camera Automatically.

To detect a highlight in the scene (a landmark) and trigger the camera, the standard deviation resulting from the mean calculation over all new frame samples is used to assess the novelty of a new sample. In other words, a new sample is considered a novelty if there is a significant deviation from the mean of all previously processed frames; see Figure 3.

Figure 3. The density plot and landmarks identified

The condition for triggering the camera can be expressed in the form of a rule:

$$\text{IF} \quad D_k < \min_{i=1,\dots,k} D_i \;-\; \varphi\,\sigma_k \quad \text{THEN} \quad x_k \text{ is a novelty} \qquad (12)$$

where $\varphi$ denotes a parameter which defines the amplitude of the threshold, i.e. the sensitivity of the trigger.
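Combining the monitoring statistics (9)-(10) with rule (12) gives a per-frame trigger loop. The following is a hedged sketch under the reconstructions above: the value of φ and the handling of the first frame are assumptions, and it reuses the illustrative RecursiveDensity and bin_mean_features helpers from the earlier sketches.

```python
import numpy as np

class NoveltyTrigger:
    """Monitor the density signal and apply the triggering rule (12)."""

    def __init__(self, phi=3.0):  # phi: threshold amplitude (assumed value)
        self.phi = phi
        self.k = 0
        self.mean = 0.0       # recursive mean of the density, eq. (10)
        self.var = 0.0        # recursive variance of the density, eq. (9)
        self.d_min = np.inf   # running minimum of the density

    def step(self, d):
        """Return True if the frame with density d is a novelty."""
        self.k += 1
        k = self.k
        self.mean = (k - 1) / k * self.mean + d / k          # eq. (10)
        if k > 1:
            self.var = (k - 1) / k * self.var \
                       + (d - self.mean) ** 2 / (k - 1)      # eq. (9)
        # rule (12): density falls well below the minimum seen so far
        novel = k > 1 and d < self.d_min - self.phi * np.sqrt(self.var)
        self.d_min = min(self.d_min, d)
        return novel

# Illustrative per-frame loop tying the sketches together:
# density = RecursiveDensity(); trigger = NoveltyTrigger()
# for frame in camera_frames:
#     x = bin_mean_features(frame)
#     if trigger.step(density.update(x)):
#         store_landmark(frame, x)  # hypothetical storage routine
```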

IV. EVALUATION

A. Performance.

Figure 4. Life Content Window

The evaluation platform consisted of smart phones (an HTC handset running Android and a Nokia N900). The CPU usage statistics show no significant variation in CPU usage during runtime, given the small difference between the average and maximum values. In terms of system memory consumption, the application allocated approximately 18 MB after loading, which corresponds to the first crest. The maximum amount of system memory allocated is approximately 32 MB. The actual amount of system memory used by the application is therefore 32 - 18 = 14 MB. Likewise, the actual amount of video memory used by the application is 444 - 430 = 14 MB. Assuming the memory allocator does not over-allocate, each discovered landmark consumes an average of 256 KB of system and video memory.

The view of the front screen seen by the user is shown in Figures 4 and 5.

Figure 5. Settings dialogue: camera dimensions (e.g. 320 × 240), a simple mode (landmark size and distinctiveness) and an advanced mode (bin dimensions and distance threshold, e.g. 0.10).

B. Results.

Figure 6. Diary Window View

Experimental results demonstrated that the application's performance in terms of landmark recognition and resource utilization is suitable for use in mobile devices. Recognized landmarks are distinguishable and agreeable to users. Further testing with both the HTC Android-based and the Nokia N900 smart phones showed that the application is able to process up to 30 frames per second with no skipped frames. The diary viewer is able to load hundreds of images for simultaneous viewing without crashing the application. No out-of-memory error occurred even during the longest recording of about 25 minutes, although the frame rate slowed down gradually. This suggests that the application needs two different modes of operation: one for debugging/examination purposes and another for all-day recording, which does not show or store captured landmarks in real time. In the current implementation, all tunable parameters except the standard deviation threshold are exposed in the user interface. More testing with actual users will provide valuable information for tuning the parameters for best results in different circumstances. The video diary, which grows/evolves all the time and carries time and location tags, is shown in Figure 6.

V. CONCLUSION

This paper describes the implementation of a landmark recognition and video diary application that runs on both HTC Android-based and Nokia N900 MeeGo-based smart phones. The algorithm used is recursive and one-pass, does not require any prior training, and is low in resource utilization. The diary viewer built into the application, while simple, shows that it is possible to build a self-contained video diary into smartphones. This is a first step towards making better use of the various hardware features that modern smartphones integrate. The implementation has been built to be easily portable to other platforms, so it is not limited to smartphones but extends to special-purpose devices and robots as well. The self-learning and self-developing landmark recognizer provides very good performance while consuming very few resources (see section IV.A). It opens up the opportunity of creating personalized applications with a very long lifetime. In this paper, a personal picture diary prototype has been developed to show the potential of these algorithms for visual life logging.

VI. REFERENCES

[1] N. R. Gans, G. Hu, K. Nagarajan, and W. E. Dixon, "Keeping Multiple Moving Targets in the Field of View of a Mobile Camera," IEEE Transactions on Robotics, vol. 27, pp. 822-828, 2011.

[2] C. Hong and A. C. Kot, "Mobile camera identification using demosaicing features," in Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 1683-1686.

[3] K. K. Nundy and S. Sanyal, "A low cost vein detection system using integrable mobile camera devices," in Annual IEEE India Conference (INDICON), 2010, pp. 1-3.

[4] R. Agethen, F. Lutz, A. Schwarzmeier, G. Fischer, R. Weigel, and D. Kissinger, "An online telemetering system for mobile health parameter monitoring and medical assistance," in Fifth International Conference on Sensing Technology (ICST), 2011, pp. 470-473.

[5] T. Chen, K. Wu, K.-H. Yap, Z. Li, and F. S. Tsai, "A Survey on Mobile Landmark Recognition for Information Retrieval," in Tenth International Conference on Mobile Data Management: Systems, Services and Middleware (MDM '09), 2009, pp. 625-630.

[6] E. Berry, N. Kapur, L. Williams, S. Hodges, P. Watson, G. Smyth, J. Srinivasan, R. Smith, B. Wilson, and K. Wood, "The use of a wearable camera, SenseCam, as a pictorial diary to improve autobiographical memory in a patient with limbic encephalitis: A preliminary report," Neuropsychological Rehabilitation, vol. 17, pp. 582-601, 2007.

[7] S. Hodges, L. Williams, E. Berry, S. Izadi, J. Srinivasan, A. Butler, G. Smyth, N. Kapur, and K. Wood, "SenseCam: A Retrospective Memory Aid," in UbiComp 2006: Ubiquitous Computing, vol. 4206, Springer Berlin / Heidelberg, 2006, pp. 177-193.

[8] P. Angelov, "An approach for fuzzy rule-base adaptation using on-line clustering," International Journal of Approximate Reasoning, vol. 35, p. 275, 2004.


[9] P. P. Angelov and X. Zhou, "Evolving Fuzzy-Rule-Based Classifiers From Data Streams," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 1462-1475, 2008.

[10] J. Andreu and P. Angelov, "An evolving machine learning method for human activity recognition systems," Journal of Ambient Intelligence and Humanized Computing, vol. 2, pp. 1-12, 2011.

[11] J. Andreu and P. Angelov, "Real-time human activity recognition from wireless sensors using evolving fuzzy systems," in IEEE International Conference on Fuzzy Systems (FUZZ), 2010, pp. 2786-2793.

[12] N. Ouerhani, H. Hügli, G. Gruener, and A. Codourey, "A visual attention-based approach for automatic landmark selection and recognition," in Second International Workshop on Attention and Performance in Computational Vision, 2005, pp. 183-195.

[13] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 1254-1259, 1998.

[14] D. Marr and E. Hildreth, "Theory of edge detection," Proceedings of the Royal Society of London. Series B, Biological Sciences, vol. 207, pp. 187-217, 1980.

[15] N. Ouerhani, A. Bur, and H. Hügli, "Linear vs. Nonlinear Feature Combination for Saliency Computation: A Comparison with Human Vision," in Pattern Recognition: Proceedings of the 28th DAGM Symposium, 2006, pp. 314-323.

[17] X. Zhou and P. Angelov, "Autonomous Visual Self-localization in Completely Unknown Environment using Evolving Fuzzy Rule-based Classifier," in IEEE Symposium on Computational Intelligence in Security and Defense Applications (CISDA), 2007, pp. 131-138.
