MDPF: Minimum Distance PacketForwarding for Search Applications
in Mobile Ad Hoc NetworksHassan Artail, Senior Member, IEEE, and Khaleel Mershad
Abstract—This paper introduces a message forwarding algorithm for search applications within mobile ad hoc networks that is based
on the concept of selecting the nearest node from a set of designated nodes. The algorithm, which is called Minimum Distance Packet
Forwarding (MDPF), uses routing information to select the node with the minimum distance. The goal of the proposed algorithm is to
minimize the average number of hops taken to reach the node that holds the desired data. Numerical analysis and experimental
evaluations using the network simulation software ns2 were performed to derive the lower and upper bounds of the confidence interval
for the mean hop count between the source node of the data request, on one hand, and the node that holds the desired data and the
last node in the set of search nodes, on the other hand. In the experimental evaluation, the performance of MDPF was compared to that
of Random Packet Forwarding (RPF) and Minimal Spanning Tree Forwarding (MSTF). The results agreed with the numerical analysis
results and demonstrated that MDPF offers significant hop count savings and smaller delays when compared to RPF and MSTF.
Index Terms—Data search, message forwarding, routing, MANET, shortest path, simulations.
Ç
1 INTRODUCTION
IN a Mobile Ad Hoc Network (MANET), mobile devices(nodes) may be spread over a large area where access to
external data is achieved through one or more access points(APs). However, not all nodes have a direct link with theseAPs. Instead, they rely on other nodes that act as routers toreach them. In certain situations, the APs may be located atthe extremities of the MANET, where reaching them couldbe costly in terms of delay, power consumption, andbandwidth utilization. Additionally, the access point mayconnect to a costly resource (e.g., a satellite link), or anexternal network that is susceptible to intrusion. For suchreasons and others that concern data availability andresponse time, MANET applications should check for theexistence of the desired data inside the network beforeattempting to connect to the external data source. Anexample would be a node that is searching for data that havebeen requested before by other nodes and are now cachedand available to the rest of the nodes. Another example iswhere there is a group of nodes that have data which may beof interest to other nodes and are willing to share them.These scenarios and others suggest that efficient data searchtechniques be developed for allowing mobile nodes to findthe desired data if it exists in the MANET quickly and withminimum power consumption. Given how ad hoc wirelessnetworks work, searching performance relies on theefficiency of employed routing strategies. Actually, one of
the biggest challenges in MANETS lies in the creation ofefficient routing techniques [5].
Routing protocols are responsible for finding an efficientpath between any two nodes in the network that wish tocommunicate, and for routing data messages along this path.The path must be chosen so that network throughput ismaximized and message delay and other undesirable eventsare minimized. Two main types of routing protocols exist:source routing and destination routing. Destination routingitself is classified into two types: distance-vector routing,used in the RIP Internet protocol [11], and link-state routing,used in the OSPF Internet protocol [12]. Relevant to our workare the Destination-Sequenced Distance Vector (DSDV) andthe Ad hoc On-demand Distance Vector (AODV) protocols,which are distance-vector routing protocols designed forMANET environments. With such protocols, a node main-tains a routing table and a distance vector. The table containsthe neighbor along the shortest path to each destination inthe network, while the vector has the distance (number ofhops) of this path. In high mobility scenarios, the paths fromsources to destinations will become nonoptimal (i.e., not theshortest paths) until the routing tables are updated. WithDSDV, each node periodically updates its shortest paths bysending its distance vector to its neighbors to inform themabout possible distance changes to destinations in thenetwork, while with AODV, a node computes/updates theshortest path to a destination only when it needs tocommunicate with it (i.e., on demand).
Our proposed Minimum Distance Packet Forwarding(MDPF) algorithm is based on the same basic conceptemployed by distance-vector routing protocols in that itforwards the search message to the nearest node thatpotentially stores the desired data item. Actually, MDPFmay be regarded as a high-level routing protocol operating ontop of a distance-vector routing protocol, and thus, together
1412 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
. The authors are with the Electrical and Computer Engineering Depart-ment, American University of Beirut, PO Box 11-0236 Riad El-Solh,Beirut 1107-2020, Lebanon. E-mail: {hartail, kwm03}@aub.edu.lb.
Manuscript received 7 Jan. 2008; revised 4 Dec. 2008; accepted 3 Mar. 2009;published online 6 Mar. 2009.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TMC-2008-01-0003.Digital Object Identifier no. 10.1109/TMC.2009.56.
1536-1233/09/$25.00 � 2009 IEEE Published by the IEEE CS, CASS, ComSoc, IES, & SPS
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
they form a two-layer protocol that works to minimize the
response time of a search application by following the
consecutive shortest paths. The given analysis focuses on
providing confidence intervals for the mean distance to reach
the node with the desired data and the distance to traverse all
the search nodes. Moreover, it will be demonstrated that
MDPF distributes the average load caused by search traffic
among the visited nodes nearly uniformly in spite of their
possibly nonuniform caching capacities.The rest of this paper is organized as follows: Section 2
describes the proposed approach and illustrates it using an
example application. Section 3 derives expressions for the
system parameters plus key performance measures and
presents the analysis results. Section 4 presents the simula-
tions done using the ns2 software. Section 5 provides a short
survey of related work, while finally, Section 6 ends the
paper with concluding remarks and ideas for future work.
2 FRAMEWORK OVERVIEW
The idea behind MDPF is to use routing table information
for visiting nodes in the order of shortest distance (hop
counts). As implied, this requires valid routing information,
which could be handled through a proactive routing
protocol such as the DSDV protocol or an on-demand
reactive routing protocol, like AODV. We make the
assumption that the set of nodes that hold the search
information is known to all nodes in the wireless network,
and we refer to these nodes as the search nodes, or SNs. We
emphasize though that a requesting node which is inter-
ested in a particular data item does not usually know which
specific SN holds the location of the data item, and
therefore, it must search for it in the SNs.
2.1 Basic Operation
According to MDPF, the client uses the information in therouting tables to send its request to the nearest SN. If an SNdoes not have the requested data, it also uses MDPF andforwards the request to the nearest unvisited SN. Fig. 1shows two example scenarios. Nodes request database data,which may be cached in any of the caching nodes (CNs).The search nodes (SNs) cache previously submittedrequests (queries), and for each such query, an SNmaintains a reference to the result that resides on a CN.In the first scenario, the client submits its request to thenearest SN (SN3), which does not have a matching query.The request is then forwarded in accordance with MDPFthrough SN1 and SN4 before it arrives to SN2, where amatch is found. Using the reference that is stored alongwith the cached query, the request of the client is forwardedto the CN that stores the result. This CN sends the result tothe client whose address is found in the forwarded packet.In the second scenario, no match is found in the SNs, andso, the last visited SN (SN5) forwards the request to the dataserver via the access point. The server retrieves the resultand sends it directly to the client, which, in turn, asks SN3(being its nearest SN) to cache the query. It is noted that thenode at which the client requested the data item that wasretrieved from the outside data server becomes a CN forthis particular item.
When a proactive routing protocol is employed, MDPFcan readily use the routing information to choose the SN thatrequires the minimum number of hops from the set ofunchecked SNs. However, when an on-demand routingprotocol, such as the AODV or Dynamic Source Routing(DSR) protocols, is in place, the routing information to thenearest unvisited SN must be discovered on demand ifnecessary (i.e., if its is not cached, or if it is cached but notfresh) and kept in the routing table for a certain period of
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1413
Fig. 1. Two scenarios for request forwarding: Scenario 1 corresponds to a hit and includes steps 1-4, 5a, and 6a. Scenario 2 describes a miss andincludes steps 1-4, 4’, 4’’, 5b, 6b, and 7b.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
time before it expires. More specifically, when a reactiverouting protocol is employed, MDPF works as follows. Eachnode examines its routing table to find if the routinginformation to the unchecked SNs is present and valid. Ifyes, the node acts as in the proactive case and chooses the SNwith the minimum number of hops to reach. If the node findsthat its routing table does not contain the routing informa-tion for one or more unchecked SNs, it broadcasts an SNDiscovery Packet (SNDP) containing the list of uncheckedSNs and a sequence number to all its neighbors. When aneighbor receives an SNDP the first time, it checks its routingtable for the presence of one or more unchecked SNs. If itknows of such SNs, the neighbor sends the routinginformation of these SNs to the requesting node. Else, theneighbor broadcasts (forwards) the SNDP to its ownneighbors. In order to prevent the possibility of floodingthe network with packets, the SNDP contains a hop limit kthat denotes the maximum number of hops away from thesource that the SNDP can be sent to. The value of k dependson the network size, the total number of nodes, thetransmission range, and the number of current uncheckedSNs. For example, for a 1;000� 1;000 m2 network containing100 nodes, and when the number of unchecked SNs is 7, thenetwork diameter is approximately 14 hops and k could beset to 14=7 ¼ 2 hops (assuming that the SNs are uniformlyspread throughout the network). As the number of un-checked SNs decreases, k increases, and vice versa. Whenthis number is 1, k will be equal to the network diameter.Finally, the SNDP source node waits for time � (e.g., 0.1 sec),examines the routing information to the SNs it received, thenchooses the SN with minimum number of hops to reach, andforwards the search packet to it. It also adds the routinginformation to its routing table for future use.
2.2 Evaluation Methodology
The objective of this paper is to propose a messageforwarding algorithm for search applications and analyzeits performance. We focus on the analysis of the hop countto reach the SN having the desired data, and to traverse allthe SNs. We also consider an important metric that concernsfairness, namely, the average traffic load experienced by thedifferent SNs. In addition to the experimental evaluationusing the ns2 simulation software, we analyze the system’sperformance using analytical derivations in the case oftraffic load and numerical analysis in the case of hop count.The reason for this is that simulation by itself does notalways yield completely reliable results, and this fact hasbeen shown in published papers, such as [21].
2.2.1 Results Reliability
Simulation approaches usually suffer from a lack ofreliability because it is difficult to prove that the samplestaken out of a certain probability distribution are indeedtypical, or that the sampling distribution of their meanclosely follows a Gaussian law. For example, probabilitydistributions with a high kurtosis may have sample meanswhich are not close to the actual mean of the distribution.All these problems and others affect the reliability ofsimulation in general, while the analytical solution isusually reliable. Since the MDPF algorithm (or a very closevariant thereof) has, in fact, already been studied in
Computer Science under the name of “Nearest Neighbor”heuristic for the traveling salesperson problem, severalpapers can be found in the literature on the subject.Unfortunately, the problems in this area turned out to beso difficult that researchers who attempted to tackle similarproblems analytically did so under unrealistic simplifyingassumptions such as considering all distances between pairof points statistically independent from each other (i.e.,even ignoring the triangular inequality) [13], while someother researchers obtained analytical solutions on muchsimpler problems, for example, by restricting themselves tothe one-dimensional case, leaving the two-dimensionalproblem unsolved due to its difficulty [17]. In contrast,the probabilistic results for similar problems, which areconsidered the most reliable, have been obtained throughsimulation [10]. Still, however, we do not give up on theanalytical solution of the nearest SN search problem. In thisregard, it might be useful to point out the main challengethat makes the problem a difficult one, even under theinfinite node density assumption. First, it is not hard toobtain an expression for the probability distribution of theclosest SN to a given random SN: as derived in [2]. Weassume that the SNs are randomly, uniformly, andindependently distributed on the considered area, andtherefore, the probability distribution function of thedistance to the closest SN is simply the same as the distancesample minimum. The sample minimum has a closed-formformula [18] which could be applied. But the difficultiesstart to appear when we wish to determine the probabilitydistribution of the distance between the second and thethird SN. The main problem here is that the distribution ofavailable SNs around the second SN is not independentfrom the position of the first SN. Indeed, since the secondSN was the closest one to the first, it means that the secondSN is on the boundary of a disk centered at the first SNwhich is empty of any SN. As one reaches the third, fourth,and nth SN, the empty area becomes an ever morecomplicated union of disks, making it difficult to obtain aprovably accurate analysis.
2.2.2 Implemented Methodology
To avoid the lack of reliability often associated with resultsobtained through simulations, we will derive confidenceintervals for the obtained results. For the numericalanalysis, we will be able to obtain results at the 0.0001confidence level (meaning a probability inferior to 1/10,000of being wrong), while maintaining an acceptable precision(between 10 and 30 percent). For the experimental evalua-tion, we will follow the lead of Andrel and Yasinsac [1] andderive a sample size necessary to obtain a 90 percentconfidence in the computed averages. The next sectiondescribes two methods for implementing the numericalanalysis for the hop count measure, followed by a sectionthat treats the analytical derivation of the average loadexperienced by the SNs. Finally, a third section is dedicatedto presenting the results of the experimental evaluation.
3 HOP COUNT ANALYSIS
The average number of hops between two successivelytraversed SNs is different than average number of hops
1414 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
between two random nodes because the latter representsthe expected number of hops when only one destinationchoice is available, while with MDPF, a client or SN picksthe nearest unchecked SN, and hence, the expected numberof hops is anticipated to be lower. That is, when there aremore choices, it is more likely for a client or SN to find anunchecked SN that is closer to it than when having fewerchoices. Equivalently, as the number of choices decreases,the average number of hops to get to the next SN increases.
Like [2], we assume a rectangular topology with areaa� b and uniform distribution of nodes. Two nodes canform a direct link if the distance x between them is less orequal to the maximum node transmission range r0. In thisanalysis, we are interested in computing the averagenumber of hops to get to the SN that holds the desireddata. Moreover, and for reference, we also derive theaverage number of hops to reach the last SN from arequesting node. We do this by computing the upper andlower bounds of the number of hops using numericalanalysis. However, before describing our approach indetails, we mention two important theorems, which werefer to later. First, [15, Theorem 1] states that for all graphswhere the triangular inequality holds, the length of thenearest neighbor path obeys the following equality:
NNðiÞ ¼ 1
2ðlogðnÞ þ 1ÞOptðiÞ; ð1Þ
where NN(i) is the length of the Traveling Salesperson Tourobtained using the Nearest Neighbor heuristic on a probleminstance i, Opt(i) is the length of the optimal tour of theproblem instance, and n is the number of nodes. Here, theterm “optimal tour” was borrowed from [9] and refers tothe simple cycle of the shortest length containing all thenodes. Next, [16, Theorem 2] specifies that if n points are ina unit square, the optimal path length is at most:
OptðiÞ �ffiffiffi2p
2
ffiffiffinpþ
ffiffiffi2p
2þ 2: ð2Þ
So, we can deduce a worst case bound on the length of thenearest neighbor path in a unit square:
NNðiÞ � ðlogðnÞ þ 1Þffiffiffi2p
4
ffiffiffinpþ
ffiffiffi2p
4þ 1
� �: ð3Þ
However, it is often the average case which is the mostrelevant in practice. So, we are now faced with a standardstatistical problem: estimating the mean of an unknownprobability distribution using sampling. In our case, theprobability distribution of the total path length is not knownto belong to a well-known family (e.g., Binomial, Poisson,Geometric, Gaussian, etc.). It might be argued that since wewill be looking at the distribution of the sample mean, wecould infer that it would tend to be a Gaussian law using thecentral limit theorem. But if we simply rely on this, we woulddisregard the conditions of validity of the theorem, whichwould be a rather risky thing to do: there are probabilitydistributions which do not even have a mean, such asfðxÞ ¼ 1=x2, defined on ½1;þ1�. And, even for the distribu-tions that do have a mean, it is difficult to determine thesample size that would guarantee a sample mean distribu-
tion acceptably close to the normal. The only tool we have inthis regard is the Berry-Esseen theorem [19], which requiresknowing at least some bounds on the third moment (E½X3])and on the standard deviation to be applied. We do not knoweither of these two values in our case. It would be possible toderive some bounds on them, but if these bounds are tooimprecise, the required sample size would become enor-mous. That is why we finally opted for more direct methodsthat allow us to derive confidence intervals without makingany further assumption. We start by describing a naıveapproach: we run, say, 1,000,000 experiments and recordthe smallest and highest values of obtained path length. Wechoose two bounds which are, respectively, much belowthe minimum and much above the maximum. We run1,000,000 experiments a few times again. It is quite probablethat all the values we obtain will be within our bounds. So,we could obtain a result such as 99.9 percent of all path lengthvalues falling within bounds x and y at the 0.0001 confidencelevel or even better (a chance in 10,000 of being wrong). Wethen can obtain a bound on the mean by using the fact thatthe remaining paths have a bounded length (by Theorems 1and 2 that were mentioned above) and that there are only fewof them. The main problem with this approach is that in spiteof the very high confidence level it guarantees, the boundsobtained are far too imprecise, since, in fact, all what we willhave established is a weaker form of the following statement:“We’re pretty sure that the mean must be between theshortest path and the longest path we ever obtained.” This iswhy it was necessary to come up with methods that trade offthe confidence level for precision. In our work, we obtainedour confidence intervals using two methods, which can becombined or used independently.
3.1 Method 1 for Computing the ConfidenceIntervals
If we call B the worst case path length, then to obtain aconfidence interval for the mean, we proceed as follows:
1. Divide the interval [0, B] into n intervals b1; b2; . . . bnwhich need not all have the same size.
2. Run m experiments and record how often the pathlength falls within each interval.
3. Note that the function which maps the event that thepath length falls within the ith interval is a binomialrandom variable, since it has exactly two outcomes:either the path length falls within the interval or itdoes not.
4. Estimate the parameter p, the proportion of thebinomial distributions associated with each of thesen intervals, and obtain a confidence interval ci foreach of these parameters using the observedproportions during our experiments.
We denote by li the lower extremity of the ith confidenceinterval and ui its upper extremity. Clearly, our level ofconfidence, as guaranteed by the Clopper Pearson method[6], that a given one of these intervals contains the actualvalue of these parameters is not the same as our level ofconfidence that all these intervals contain their correspond-ing parameters at the same time. The Clopper-Pearsonmethod does not apply to the latter situation, but we willsee later how to deduce the level of confidence for the entire
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1415
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
set of intervals {b1; b2; . . . bn} from the level of confidence of
the individual intervals. For the time being we just assume
that we are highly confident that all the parameters pi fall
within their computed confidence intervals. We let mi be
the left extremity of bi and Mi its right extremity, and let
fðxÞ be the probability density function of the path length x.
We then have
mi � x �Mi
) mifðxÞ � xfðxÞ �MifðxÞ
) mi
Z Mi
mi
fðxÞdx �Z Mi
mi
xfðxÞdx �Mi
Z Mi
mi
fðxÞdx
)Xni¼1
mi
Z Mi
mi
fðxÞdx� �
�Xni¼1
Z Mi
mi
xfðxÞdx
�Xni¼1
Mi
Z Mi
mi
fðxÞdx� �
)Xni¼1
mili � � �Xni¼1
Miui:
ð4Þ
Since we can make Mi �mi as small as we want, and since
we can make li � ui arbitrarily small provided that we can
make the sample size arbitrarily large, we now have a
method, which, in principle, can yield results as precise as we
want. But, in fact, the sample size would become prohibi-
tively large if the precision we require is too great. A problem
remains: how are we going to determine the confidence level
of our estimation? We recall that a given procedure yields a
level of confidence � if the actual parameter falls within the
confidence interval with a probability of 1� �. We now letA1
and A2 be two events with respective probabilities of 1� �and 1� �. We can write:
P ðA1 \A2Þ ¼ P ðA1Þ þ P ðA2Þ � P ðA1 [A2Þ) P ðA1 \A2Þ � ð1� �Þ þ ð1� �Þ � 1
) P ðA1 \A2Þ � 1� ð�þ �Þ:ð5Þ
Therefore, the confidence level for the entire set of
intervals is the sum of the confidence levels for each
interval. As expected, our confidence decreases as the
number of interval decreases. Thus, it appears that this is
one possible method to trade confidence for precision.
3.2 Method 2 for Computing the ConfidenceIntervals
Our second method is based on the theorem that the sample
mean is an unbiased estimator of the mean, meaning that
the probability distribution of the sample mean and the
sampled probability distribution have the same expectation.
This theorem holds for all probability distributions which
have a finite mean. The motivation behind the use of the
sample mean is that as a consequence of the Central Limit
Theorem, the sample means usually tend to be much more
tightly grouped than the sample values and this phenom-
enon increases with sample size. While this is the reason
why the confidence intervals we have obtained through this
method are relatively narrow, we do not rely on this fact to
derive them. So, we proceed as follows:
1. First, decide the total number of experiments that weare going to perform, say N .
2. Choose two numbers n1 and n2 such thatn1 � n2 ¼ N . The value of n1 will be the size of asingle sample and n2 will be the number of samples.
3. We run the N experiments, considering them to ben2 samples of n1 size each. We compute the mean ofeach of the n2 samples. We get the smallest meanand the largest mean. We then choose two boundswhich are, respectively, say 5 percent smaller thanthe smallest mean and 5 percent larger than thelargest mean.
4. We run the N experiments again. Hopefully, allthe means will be within the bounds chosen instep 3. If not, we keep widening the interval untilall means are within it for several more runs of theN experiments.
5. We are now able to obtain a good lower bound on theproportion of path lengths that lie within the interval,with a high-level confidence, using the same basicprinciples as the Clopper Pearson method.
3.3 Computing Confidence Intervals under theInfinite Node Density Assumption
In order to compute confidence intervals for E½HSNData� and
E½HSNLast� (expected number of hops to reach SN with data,
and to reach last SN, respectively), we proceed as follows:
first, we generate a random number of search nodes NSN by
getting each of the coordinates as a random number
uniformly distributed between 0 and a (the side length of
the wireless network’s square topography, i.e., b ¼ a). Since
we assume infinite node density, the expected number of
hops is equal to the euclidean distance between the two nodes
divided by r0. Given that all nodes are generated randomly,
there is no harm in choosing the first one to be the initial SN,
e.g., SN1. Then, we start building the nearest neighbor path,
by finding the closest SN, SN2, and then, the closest SN to SN2
not equal to SN1 or SN2, which would be SN3, and so on. When
we reach the last remaining SN, we compute the total path
length for this sample, (so, we are sampling, in fact, the
distribution of the maximum path length) and also compute
the average path length (to obtain a sample of the correspond-
ing distribution). We repeat this experiment a number of
times, say Nexp, in order to have a sufficient sample size.Before going further, we need to distinguish between
two possible methods of sampling: once we have obtainedthe position of the SNs and computed the length of one fullpath length, we can either discard the node sample andstart with a new set, or reuse the same sample but startfrom a different origin. Both methods can be used to
compute the mean, but it must be noted that only the firstone can be used to gain accurate information (besides justthe mean) about the path length probability distribution,because the second method samples the sample distribu-tion, not the original distribution. While the distribution ofthe sample mean is guaranteed to have the same expected
value as the underlying distribution, this doesn’t extend toother parameters, and therefore, the node set should beregenerated each time a new path length is computed.
1416 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
We now describe how we obtain the confidenceintervals using an example. Assume that we performed10,000 experiments, and in 40 percent of the cases, thepath length fell between, say, 36 and 40. So, how do wecompute the confidence interval at the 0.0001 confidencelevel? Let p be the proportion of the path lengths that fallbetween 36 and 40 in the actual probability distribution.Then, we obtain an interval [pmin; pmax] for the possiblevalues of p, such that these values of p would produceour sample, or a sample closer to the mean, with aprobability of 99.99 percent or more. The above interval isreferred to as a confidence interval at the 0.0001 level forp. We can use binary search to determine the values ofpmin and pmax, but this would require the computation ofthe binomial distributions for large samples (Nexp ¼10;000). If we recall the formula for the binomialdistribution, we observe that we will have to computethe terms of the sum:
F ðmÞ ¼Xmi¼0
n!
i!ðn� iÞ! pið1� pÞn�i: ð6Þ
The naıve algorithm clearly would not work: The factorialterms are too large to fit in the standard C++ data types, andmoreover, most of the terms are too small to be expressedusing a C++ data type. Additionally, the naıve algorithmwould be extremely slow: for a sample of size 10,000, wemight have to do more than 100 million multiplications anddivisions, only to compute the distribution for a singlevalue of p. And when taking into account that binary searchrequires about 50 iterations for maximum precision,obtaining the confidence interval for a single bi maynecessitate more than 5� 109 operations which takesconsiderable time. Also, this makes it undesirable to usearbitrary precision arithmetic to solve the precision pro-blem, as it would increase the running time several timesmore. In order to solve these difficulties, we adopted thefollowing approach:
1. To compute a single term of the sum, we maintainthe mantissa and the exponent of the product in twoseparate variables. Since 52 bits are available to storethe mantissa (using the double C++ type), therelative precision of the product remains excellenteven after 10,000 multiplications. The exponent isexplicitly maintained in another variable whichallows us to compute numbers beyond the reach ofthe native data type.
2. We note that while some of the terms of the summight be beyond the range of the native data typesand while the computation of each term mightnecessitate the computation of intermediate valuesbeyond the range of the data types, the significantterms will have a value which can be represented by adouble, since 10,000 of these terms will sum up to 1.
Since the largest of these terms would be the one thatcorresponds to the mean, we start by computing that term.Then, we deduce the previous terms and the followingterms using the following equality:
n!pið1� pÞn�i
i!ðn� iÞ! ¼ n!piþ1ð1� pÞn�i�1
ðiþ 1Þ!ðn� i� 1Þ!�ð1� pÞðiþ 1Þpðn� iÞ : ð7Þ
Therefore, we can obtain the confidence interval withexcellent numerical precision without using large numberarithmetic, and in time proportional to the sample size (asopposed to quadratic in the sample size in the case of thenaıve algorithm).
3.4 Computing Confidence Intervals under theFinite Node Density Assumption
The treatment of the finite node density case is very similarto that of the finite node density case in principle, but itgives rise to a number of performance problems. In thefinite node density case, the number of nodes underconsideration is much larger, since we also have to generatethe non-SN nodes, which in most practical cases wouldprobably be at least an order of magnitude more numerousthan the SNs. Additionally, computing the distancebetween two SNs necessitates running a Breadth-FirstSearch (BFS) rooted at one of the SNs, instead of justcomputing the euclidean distance between the SNs. This, inturns, requires computing the adjacency list of each node. Inorder to speed up this process, we used mainly two tricks:
1. We divided the square a� a area into a grid whereeach cell is an r0 by r0 square and kept the list ofthe nodes present in each cell. Clearly, a node in agiven cell can only reach the nodes in the same cellor in directly adjacent cells. So, instead of having tocheck the entire list of nodes to know which onesare reachable from the current, we only have to listthe nodes that are in the immediate vicinity ofcurrent node.
2. When running BFS, we stop as soon as an SN isdiscovered, without waiting to process the entiregraph. In the specific case of a graph based ongeographic locations, such as ours, this gives a largespeedup, because in most cases, only a small portionof the graph would have been processed when theclosest SN is discovered (unlike the case of “smallworld” graphs).
3.5 Confidence Intervals Computation Results
Using the two above methods, we obtained the confidenceintervals for the path lengths, as shown in Fig. 2. For each ofthese intervals, we are 99.9 percent sure that the mean fallswithin them. In the infinite node density case, the samplesize was 10,000, while for the finite density case, the samplesize was 3,000. All the corresponding bounds on theoriginal distribution mean are the 0.001 confidence level.To test the correctness and reproducibility of our results, wecompared them to Bettstetter and Eberspacher’s [2] byreproducing their experiments using our own setup. Weobtained a very close agreement (actually, the values weobtained for the number of hops between two randomnodes are the same as theirs with two significant digits,which is the precision with which they decided to presenttheir results).
3.6 Varying the Data Access Pattern
Here, we drop the assumption of the uniform access of
desired data among the SNs and consider a more general-ized form, represented by the Zipf distribution. We suppose
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1417
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
that the popularity of individual data items stored in the
SNs is governed by a Zipf pattern [20], which has been used
frequently to model nonuniform distributions [4]. In Zipf
law, a data item ranked ið1 � i � NdÞ is accessed with
probability 1=ði�PNd
k¼1 1=k�Þ, where � ranges between 0
(uniform distribution) and 1 (strict Zipf distribution) and Nd
is the total number of data items. In this analysis, we let i
correspond to the order of SN traversal. That is, we let the
nearest SN to the client (i.e., the first contacted SN) be the
most probable SN to have the desired data, followed by the
next-visited SN, and so on. Put differently, we assume that
the nearest SN to a requesting node holds the most popular
data item(s) for that node, and that the second nearest SN
holds the next most popular data items(s), and so on. This
setup attempts to simulate locality of space by locating data
mostly near the nodes that are interested in them. The Zipf
probability density function for Nd ¼ 20 is illustrated in the
left part of Fig. 3 for different values of �, where it is seen
that as � increases, the probability of finding the data in the
nearest SNs becomes increasingly higher. The effect of
applying the Zipf distribution to the localization of data on
the expected number of hops to get to the SN that holds the
data is shown in the right part of Fig. 3. The exhibited
behavior (e.g., higher � values leading to lower hop counts)
can be attributed to the shapes of the pdfs in the left figure,
meaning that with higher values of �, it is more likely to find
the data by traversing an increasingly fewer numbers of
SNs. It is noted that Zipf does not affect the hop count to the
last SN, since the request will be routed through all the SNs.
4 AVERAGE SEARCH NODE LOAD
Since SNs are ordinary nodes themselves, an objectivewould be to minimize the number of requests handled byeach node without degrading the system’s performance.Given that MDPF calls for forwarding the request to thenearest SN and the requesting node may be any one in thenetwork, the initial SN may then be any of the SNs. Similarly,the second SN may be any of the remaining SNs, and so on.Hence, the order in which the SNs are accessed will beuniformly random. We define the load ratio on SNi; �i, as theratio of number of accesses to SNi to the total number ofrequests issued, and assume that the SNs have varying cachesizes. Having a cache size Ci for SNi with no replication, theprobability of finding a random data item in SNi is
Pi ¼CiPNQD
j¼1 Cj¼ CiCtotal
: ð8Þ
However, when calculating �i all possible positions ofSNi should be taken into account, since the list of SNs maybe accessed in any order. For this purpose, we define thefunction PAðaijnÞ, which is the probability that SNi will beaccessed (or have a request forwarded to) given that it is inposition n (where 1 � n � NSN ). As explained earlier, thisprobability depends on the cache size of all nodes that
1418 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Fig. 2. Confidence interval for the mean number of hops in four cases.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
follow SNi. However, since the next nodes are considered tobe random, an expected total cache size must be deter-mined. Now, since there is no a priori knowledge of thepositions of each of the other nodes in the sequence, theirsize is estimated using the expected cache size of othernodes. This is determined as follows (N stands for NSN ):
E½C j Ci� ¼Ctotal � CiN � 1
: ð9Þ
We then multiply this value by the number of nodes thatfollow SNi and add Ci to get the total expected cache size ofnode SNi as well as all the nodes that follow it. Dividing theresultant value by the total cache size of the system gives usPAðai j nÞ as follows:
PAðai j nÞ ¼ Ci þðN � nÞðCtotal � CiÞ
N � 1
� �1
Ctotal
¼ ðn� 1ÞCi þ ðN � nÞCtotalðN � 1ÞCtotal
:
ð10Þ
Finally, since the position of SNi is assumed to beuniformly random, the probability of it being accessed (�i)is given by taking the average of PAðai j nÞ for all values of n:
PAðaiÞ ¼ �i ¼1
N
XNSN
n¼1
PAðai j nÞ
¼ 1
N
XNSN
n¼1
ðn� 1ÞCi þ ðN � nÞCtotalðN � 1ÞCtotal
¼ð0þ1þ� � �þðN�1ÞÞCiþððN�1ÞþðN�2Þþ� � �þ1þ0ÞCtotalNðN�1ÞCtotal
¼ ð0þ 1þ � � � þ ðN � 1ÞÞðCi þ CtotalÞNðN � 1ÞCtotal
¼ NðN � 1ÞðCi þ CtotalÞ2NðN � 1ÞCtotal
¼ 1
2þ Ci
2Ctotal¼ 1þ Pi
2:
ð11Þ
Note that in the case of a uniform cache size, pi ¼ 1=NSN ,and hence, (11) becomes
�i ¼1þNSN
2NSN: ð12Þ
The expression in (11) is plotted in Fig. 4, where one SNhas twice the cache size with respect to the others, which, inturn, have the same size. The curves illustrate the loadtrends for the SN with double the capacity and any of theother SNs, as the number of SNs increases. As shown, theload starts high, especially for the double-capacity SN, andthen, decreases toward a lower bound. The curves illustratethat beyond a certain number of SNs, the benefit in terms oflessening the load becomes insignificant, and also show thatthe lower limit of the load per SN is 0.5 when having a largenumber of SNs. Viewing it from another angle, the largestSN will experience the largest load, but as NSN increases,this load gets increasingly closer to that of the other nodes.We find the above results motivating in two ways. First, anSN donating twice the amount of cache size is not penalizedwith much more load. Second, all other nodes will generallybenefit in terms of load. Hence, with MDPF, the loadbalancing property of the system is not greatly disturbedwhen the caching capacities of the SNs fluctuate.
5 EXPERIMENTAL EVALUATION
To experimentally evaluate MDPF, we implemented it andtwo other techniques to which we compare it using the ns2network simulation software. The other techniques are the
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1419
Fig. 4. Fraction of load per SN for two different storage capacity cases.
Fig. 3. Property of the Zipf pdf and its effect on the mean number of hops to desired data.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
Random Packet Forwarding (RPF) and Minimal SpanningTree Forwarding (MSTF) which we describe in the sectionafter next. This section presents the results and illustratestheir significance.
5.1 NS2 Simulations Setup
We implemented MDPF, RPF, and MSTF on top of theproactive DSDV and the reactive AODV routing protocols.In the simulated mobile ad hoc network, the wirelessbandwidth and the transmission range were set to 2 Mb/sand 100 m, respectively, while the topography size was setto 750� 750 m2. The network had 100 nodes randomlydistributed in the topography and their movement followedthe Random Way Point movement model supported byns2. The default values of the minimum and maximumnode velocity (Vmin and Vmax) were both set to 2 m/s andthe node pause time to 100 seconds. Each node sends arequest packet for a random data item every 10 seconds.There are 10,000 data items that were disseminateduniformly across all nodes at the beginning of thesimulation. For each data item, the nearest SN to the nodeholding the item is chosen to store the query for that itemalong with the address of that node.
To determine the hop count sample size that would give aparticular confidence level in the sample mean, we followedthe procedure described in [1]. Finding the sample size is notstraightforward, especially in MANET environments wherethere is considerable variation between “short” paths and“long” ones, particularly in large to very large networks.Also, the number of short paths compared to that of longones depends on the distribution of nodes in the networkand on the routing protocol used and how it chooses therouting path between two nodes. Specifically, we used theprocedure explained in [1] to compute the number ofsimulation runs required for achieving at least a 90 percentconfidence level in the presented results. Each run in oursimulations lasted for approximately 3,500 seconds, andeach sample mean computed over one run is the average ofabout 3,000 values (nodes started making requests starting att ¼ 500 seconds). In accordance with the procedure, for eachexperiment type, we initially ran a scenario with defaultparameter values 10 times. In each simulation run, thepseudorandom generator seed (based on the clock of the ns2simulator’s scheduler) and the node movement file werechanged. The average number of hops needed to reach theSN that has the desired data was computed, and the meanplus standard deviation for each set were calculated. Theþ=� precision values were chosen as 3 hops (approxi-mately 10 percent of the measured maximum value). Next,the number of runs required to achieve the 90 percentconfidence was computed using the central limit theorem, asdiscussed in [1]. The required number of runs was found tobe equal to 9, and consequently, we computed the averagesfrom the results of nine simulation runs for each scenario.
5.2 Description of Alternative Search Techniques
In RPF, the next SN to forward the search packet is chosenrandomly from the list of unchecked SNs, while in MSTF,the packet traverses the SN nodes which are connected viaa constructed minimal spanning tree (MST). For this, allSNs must send their routing tables to a randomly chosen
SN at predetermined intervals (2 seconds in the simula-tions). The selected SN will then create an MST and send itto all SNs by unicast or multicast (the simulations usedmulticast). Using this approach, a client can send itsrequest to any of the SNs (the nearest SN if the routinginformation is available). Then, the request is forwardedbetween the SNs in accordance with the MST. Each SN thatreceives a request searches for its answer (response) in itscache. If it finds the answer, it replies to the requester, elseit sends the request to the next unvisited SN along an MSTedge (a list of visited SNs is included in the request packetand updated at each visited SN). If an SN finds no SNalong an MST edge to forward the request to (for example,SN6 in Fig. 5), it sends the request to an unvisited SN alongordinary routing paths. In the example of Fig. 5, a requestis sent to SN1, then to SN2, SN4, SN5, and SN6 along theedges of the MST. At SN6, the request needs to beforwarded to a next unvisited SN along the MST.However, such SN doesn’t exist. Hence, SN6 will forwardthe request to one of the remaining unvisited SNs (SN3 andSN7) along routing paths available from the routingprotocol. If such paths don’t exist, SN6 will send therequest along the reverse path it came from (i.e, to SN5,SN4, then SN2) until the request reaches an SN that has apath to one of the remaining unvisited SNs. Note that thereverse path can be determined by the order of visited SNsin the request packet. In MSTF, even though an MST buildsa tree that links all its nodes with the least number of totalhops hMST , the total number of hops traversed by thepacket, however, could be greater than hMST due to theaforementioned condition, and as illustrated in theexample of Fig. 5. However, we will illustrate later in thissection that MSTF might produce better average searchtimes when the time to search for the data item at an SN issignificantly high.
5.3 Results
To be consistent with the results presented in Section 3.5,we computed the lower and upper bounds of the meanhop count for each scenario, in addition to the averagetaken over all sample values. As was described above,each experiment comprised at least 27,000 points. Tocompute the bounds, we used a procedure that is similarin principle to Method 2 (see Section 3.1.2) by dividing thesample space into 54 groups, each consisting of about500 samples. For each group, the average value was
1420 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Fig. 5. A sample MST connecting the SNs and a request traversingthe SNs.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
computed, and then, the lower and upper bounds were
taken as the lowest and highest means, respectively, across
all groups. In addition to the bounds, the overall average
was taken over all 27,000 points.
5.3.1 Proactive Routing
In this and the next sections, we examine the effect of the
routing scheme on the performance of all three systems, and
in this section, we consider the DSDV protocol. The top-left
and middle-left graphs of Fig. 6 show the confidence
interval of the mean hop count to reach the SN with the data
reference and to traverse all SNs, respectively. First, we
remark that the results shown in the two graphs are in line
with the results presented in Section 3. In this regard, we
should indicate that the experimental results correspond to
the finite number of nodes case discussed in Section 3. This
is because 100 nodes is not a very large number when
considering the area of the topography and the node
transmission range, but, on the other hand, it allows the
network to almost always stay connected, since with the
Random Waypoint Mobility Model, nodes tend to move
toward the center of the network after reaching the edge [3].
We also note that the numerical analysis intervals are wider
because they do not account for the characteristics of the
routing protocol, the actual number of nodes in the
network, the speed of nodes, etc. What is important though
is that the experimental results exhibit similar trends as
those of the numerical analysis ones. This fact should serve
to increase the confidence in the reliability and representa-
tion of the presented results.With regard to comparing MDPF to MSTF and RPF, the
top two rows of the graphs in Fig. 6 confirm that MDPF
achieves minimum distance packet forwarding (in terms of
number of hops) to traverse all the SNs and reach the node
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1421
Fig. 6. Mean number of hops and total search times when DSDV routing is used.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
that holds the reference to the requested data item. On theother hand, when considering the local search time (Tls) ateach SN, MSTF will take less total time when Tls is greaterthan 10 milliseconds, as illustrated in the bottom graph ofFig. 6. This is because the savings in the cumulativeforwarding time become outweighed by the much largertotal search time. Note that the forwarding time in the threesystems was set to 5 milliseconds, which is the averagecommunication time between two nodes.
Finally, Fig. 7 shows the total number of messagesgenerated during the simulation time by the three systemsfor different numbers of SNs. It is noticed that MDPFgenerates the least number of messages (requests, replies,and control packets), because the nearest SN is alwayschosen. Furthermore, the number of messages successfullyreaching their destinations is higher in MDPF than that inRPF, which is why the number of received messages forMDPF is higher than that of RPF. Finally, MSTF generates alarge number of control messages (MST and routing tablemessages) that are sent periodically, which is why thenumber of originated, received, and forwarded messages ismuch greater in MSTF.
5.3.2 Reactive Routing
The results shown in Fig. 8 were obtained through usingthe AODV protocol for routing. These results, whencompared to those depicted in Fig. 6, clearly indicate thaton-demand routing helps in reducing the total distance toboth, the SN with the reference to the required data (call itSNdata), and the last SN in the sequence of traversed SNs(call it SNlast). The reduction in the hop count wasmanifested in all three schemes, and the average numberof hops was reduced by as much as 4.4 to get to SNdata and11.6 to reach SNlast. This can be attributed to the property ofon-demand routing, in general, which offers fresher, andthus, shorter routes to destinations.
Fig. 9 shows that the number of messages under AODVis also decreased when compared to that of DSDV. This isbecause of the reactive nature of AODV in which messagesare sent only when needed (i.e., when search packets aresent), while in DSDV, messages are sent both periodicallyand whenever changes occur in the topology.
5.3.3 Effect of Mobility
The effect of mobility on the performance of the threesystems was also investigated. In this set of experiments,AODV was used for routing, the number of SNs was set to
7, and the node speed (Vmin ¼ Vmax) was varied between 1and 20 m/s while fixing the pause time to 100 seconds.
The graphs of Fig. 10 illustrate that as mobility increasesin the network, the number of hops to find the desired dataand traverse all search nodes increases nearly linearly for allthree systems. Moreover, it appears that all three systemsare affected equally by the increase in the mobility asevidences by the approximately constant offsets separatingthe curves in the graphs.
5.3.4 Effect of Varying the Number of Nodes
In this fourth set of experiments, the number of nodes N inthe network was varied to study it effect on the hop count.The minimum number of nodes used was 80 in order toavoid prolonged network partitions, as it was experimen-tally found that with less than 80 nodes, there were timeinstances at which some nodes could not be reached. Theresults are plotted in Fig. 11 and show the trend of the hopcount as the number of nodes was varied, while fixing thenumber of SNs to 7 and the node’s speed to 2 m/s.
The graphs in the figure show that the hop countgenerally decreases when N is increased. This can beattributed to the fact that with more nodes, the routingalgorithm can find shorter routes to the SNs since there aremore nodes (i.e., choices) available. It can also be noticedthat as N tends toward 200, the decrease in the hop countdiminishes because additional nodes do not necessarilyresult anymore in shorter routes.
5.3.5 Effect of Varying the Access Pattern
Finally, this section shows the effect of varying the accesspattern of the search requests. In this set of simulations,each node sends a request from a pool following a biasedZipf pattern that is also made to be location-dependent inthe sense that nodes around the same location tend to accesssimilar data (i.e., have similar interests). For this purpose,the square area was divided into 25 zones 150� 150 m2
each. Clients in the same zone follow the same Zipf pattern,while nodes in different zones have other offset values. Forinstance, if a node in zone i generated a request for dataitem id following the original Zipf-like access pattern, thenthe new id would be set to (idþ nq mod(i)) mod(nq), wherenq is the database size. This access pattern can make surethat nodes in neighboring grids have similar, although notnecessarily the same, access pattern.
The effect of varying the value of the � parameter (seeSection 3.6) on the number of hops to get to the SN thatknows where the data reside is illustrated in Fig. 12. Two
1422 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Fig. 7. Total number of originated, received, and forwarded messages when DSDV is used.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
sets of experiments were run, one with locality of space
enabled and another one without it. Clearly, both graphs
illustrate a saving in hop count that increases with the
increase in the number of SNs. Intuitively, locality of space
helps in shortening the path to the SN that holds a reference
to the requested data item, a fact that is confirmed when
comparing the left graph to the right one. Moreover, the
graphs demonstrate that increasing the request popularity
(using the � parameter) also reduces the path length, which
is incidentally consistent with the trend exhibited in Fig. 3.
6 RELATED WORK
Several works have tackled the problem of traversing acertain set of nodes in a network according to some given
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1423
Fig. 9. Total number of originated, received, and forwarded messages when AODV is used.
Fig. 8. Mean number of hops and total search times when AODV routing is used.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
criteria. We begin with Espes and Mammeri who propose in[7] an adaptive expansion search method, in which nodes inthe network determine their locations using a GlobalPositioning System (GPS). The route request packet is sentonly to nodes within a certain triangle whose vertex is S(source node), height SD+�, and angle � (where SD is thedistance between source and destination nodes, while � and� are the dynamically changing parameters). Each source
node sends a route request with starting values of � and � tonodes lying within the search triangle, and then, sendsanother route request with new (greater) values after acertain timeout (hence expanding the search triangle), and soon. The reported results show that the proposed systemalways returns a valid route after a given number of attempts.
An approach to using Dynamic Hash Tables (DHTs) in
distributed applications within MANETs was proposed in
1424 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Fig. 11. Effect of varying the number of nodes on the hop count and the delay.
Fig. 10. Effect of varying node mobility on the number of hops.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
[14]. The approach includes two methods: The first usesDHTs on top of an MANET routing protocol, and hence, itrequires provisions for communication and control mes-sages exchange between the DHT algorithm and the routingprotocol. The second method integrates the DHT into therouting protocol itself such that the next destination foreach message is obtained from the DHT (for maximumefficiency), while the routing path of the message isobtained from the routing protocol. The authors do notdescribe in details how a DHT can generally be integratedinto a particular routing protocol (for example, howdifferent DHTs can be incorporated into different types ofrouting protocols, like proactive, reactive, and geographicones). Instead, they concentrate on Ekta, which is a protocolthey implemented that integrates the Pastry DHT into theDSR protocol. The authors also present an applicationwhich uses Ekta for discovering resources in an MANET,like a specific application on a node or a given type ofnodes. The main difference between such an approach andMDPF is that MDPF is a general algorithm that is weaklycoupled to the underlying routing protocol, in that it onlyuses its services to find the path to the next unvisited searchnode. On the other hand, there is a strong coupling betweenthe DHT approach and the routing algorithm. For instance,the approach requires defining the way how a specific DHTis integrated into a specific routing protocol.
The work in [8] seeks to maintain consistency of serviceprovider information that is cached at different pointswithin the wireless network. Toward this goal, theapproach calls for integrating service discovery function-ality with on-demand routing. This is basically achieved byincluding the information about the required service in aheader that is attached to the routing packet (for example,the RREQ message in AODV). If a route to the destinationSP is not known, the packet is broadcasted to the networkuntil an intermediate node knows the required route orknows the route to an alternate service provider thatprovides the same service type, or until it reaches the SPitself. The reply packet follows the reverse path taken by therequest packet (using a technique similar to the “gratui-tous” flag in AODV). Given that the sought service bindingsmay be cached at multiple intermediate nodes, theproposed approach evaluates experimentally the trade-offbetween forwarding a packet to the nearest (measured inhops) Service Provider (SP) and sending it to the SP withthe most up to date (i.e., the freshest) information. In
comparing our work with this particular one, we canidentify a common feature having to do with using therouting information maintained by the routing protocol toidentify the nearest special node (in our case, it is an SN,while for the discussed scheme, it is an SP). However, inMDPF, the SN that holds a reference to the required data isnot known, and thus, it must be searched for by going to thenearest SN (say, SN1), then to the nearest other SN to SN1,and so on. In contrast, the system in [8] discovers thelocation of the desired service through the route discoverypacket, which carries the service discovery information.Simply put, our approach does not involve any alterationsto the routing protocol in place, which can be proactive,reactive, or hybrid. The approach in [8], on the other hand,requires modifications to the routing protocol, which, inturn, must be proactive.
7 CONCLUSIONS
This paper described a data search algorithm for use inmobile ad hoc networks. The technique, which we calledMDPF, minimizes the total distance (hop count) taken by thesearch packet to traverse the set of mobile search nodes whileusing local routing information found on the nodes. This wasproven through reliably obtained performance results thatwere compared to those of two other search techniques,namely, RPF and MSTF. The proposed algorithm which thepaper analyzes and evaluates its performance may beregarded as being specific to MANETs since it accounts fortheir different dynamic aspects. This does not remove the factthat the carried analysis is valid for other types of networks.Although the search method itself is not 100 percent original,but the approach is justified by the need to have provablyreliable estimates. The value of this approach is that the onlyassumption that was used to derive the confidence intervalsis the fact that the employed pseudorandom generator is agood one, while other statistical approaches assume that thesample size used by the simulation is sufficient to make thedifference between the sample mean distribution andthe normal distribution negligible, with absolutely noevidence to back up this assumption.
ACKNOWLEDGMENTS
The authors would like to thank Mr. Naji Nahas for hisoriginal work on the numerical analysis part of the paper.
ARTAIL AND MERSHAD: MDPF: MINIMUM DISTANCE PACKET FORWARDING FOR SEARCH APPLICATIONS IN MOBILE AD HOC... 1425
Fig. 12. Effect of varying the request popularity and locality of space, on average, hop count.
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.
They also wish to thank the anonymous reviewers for theirvaluable input that helped making this paper a better one.
REFERENCES
[1] T. Andrel and A. Yasinsac, “On Credibility of Manet Simulations,”Computer, vol. 39, no. 7, pp. 48-54, July 2006.
[2] C. Bettstetter and J. Eberspacher, “Hop Distances in Homoge-neous Ad Hoc Networks,” Proc. IEEE Vehicular Technology Conf.,vol. 4, pp. 2286-2290, 2003.
[3] C. Bettstetter, H. Hartenstein, and X. Perez-Costa, “StochasticProperties of the Random Waypoint Mobility Model: EpochLength, Direction Distribution, and Cell Change Rate,” Proc. Int’lWorkshop Modeling, Analysis Simulation Wireless Mobile Systems,pp. 7-14, Sept. 2002.
[4] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “WebCaching and Zipf-Like Distributions: Evidence and Implications,”Proc. IEEE INFOCOM, pp. 126-134, 1999.
[5] J. Broch, D. Maltz, D. Johnson, Y. Hu, and J. Jetcheva, “APerformance Comparison of Multi-Hop Wireless Ad Hoc Net-work Routing Protocols Source,” Proc. Fourth Ann. ACM/IEEE Int’lConf. Mobile Computing Networking, pp. 85-97, 1998.
[6] C. Clopper and E. Pearson, “The Use of Confidence or FiducialLimits Illustrated in the Case of the Binomial,” Biometrika, vol. 26,pp. 404-413, 1934.
[7] D. Espes and Z. Mammeri, “Adaptive Expanding Search Methodsto Improve AODV Protocol,” Proc. 16th IST Mobile Wireless Comm.Summit, pp. 1-5, July 2007.
[8] C. Frank and H. Karl, “Consistency Challenges of ServiceDiscovery in Mobile Ad Hoc Networks,” Proc. Int’l Symp. ModelingAnalysis and Simulation of Wireless and Mobile Systems (MSWim ’04),Oct. 2004.
[9] M. Garey and D. Johnson, Computers and Intractability: A Guide tothe Theory of NP-Completeness. W.H. Freeman Publisher, 1979.
[10] D. Johnson, L. McGeoch, and E. Rothberg, “Asymptotic Experi-mental Analysis for the Held Karp Traveling Salesman Bound,”Proc. Seventh ACM-SIAM Symp. Discrete Algorithms, pp. 341-350,1996.
[11] G. Malkin, RIP Version 2, IETF RFC 1723, www.ietf.cnri.reston.va.us, Nov. 1994.
[12] J. Moy, OSPF Version 2, IETF RFC 1583, www.ietf.cnri.reston.va.us, Mar. 1994.
[13] A. Percus and O. Martin, “Finite Size and Dimensional Depen-dence in the Euclidean Traveling Salesman Problem,” PhysicalRev. Letter, vol. 76, no. 8, pp. 1188-1191, 1996.
[14] H. Pucha, S.M. Das, and Y.C. Hu, “Ekta: An Efficient DHTSubstrate for Distributed Applications in Mobile Ad Hoc Net-works,” Proc. Sixth IEEE Workshop Mobile Computing SystemsApplications (WMCSA ’04), Dec. 2004.
[15] D. Rosencrantz, R. Stearns, and P. Lewis, “An Analysis of SeveralHeuristics for the Traveling Salesman Problem,” SIAM J. Comput-ing, vol. 6, pp. 563-581, 1977.
[16] J. van Leeuwen, “Algorithmic Modeling and Complexity,” lecturenotes, Dept. of Information and Computing Sciences, UtrechtUniv., 2003.
[17] S. Vural and E. Ekici, “Analysis of Hop-Distance Relationship inSpatially Random Sensor Networks,” Proc. Sixth ACM Int’l Symp.Mobile Ad Hoc Networking Computing, pp. 320-331, 2005.
[18] Wikipedia, http://en.wikipedia.org/wiki/Order_statistic, 2009.[19] Wikipedia, http://en.wikipedia.org/wiki/Berry-Esseen_theorem,
2009.[20] G. Zipf, Human Behavior and the Principle of Least Effort. Addison-
Wesley, 1949.[21] S. Kurkowski, T. Camp, and M. Colagrosso, “MANET Simulation
Studies: The Incredible,” ACM SIGMOBILE Mobile ComputingComm., vol. 9, no. 4, pp. 50-61, Oct. 2005.
Hassan Artail received the BS and MS degreeswith high distinction in electrical engineeringfrom the University of Detroit, in 1985 and 1986,respectively, and the PhD degree in electricaland computer engineering from Wayne StateUniversity in 1999. He is currently an associateprofessor at the American University of Beirut(AUB) and is doing research in the areas ofInternet and mobile computing, mobile ad hocnetworks, and vehicle ad hoc networks. During
the past six years, he has published more than 80 papers in topconferences and reputable journals. Before joining AUB in 2001, hewas a system development supervisor at the Scientific Labs ofDaimlerChrysler, where he worked for 11 years in the field of softwareand system development for vehicle testing applications. He is a seniormember of the IEEE and the IEEE Computer Society.
Khaleel Mershad received the BE degree withhigh distinction in computer engineering andinformatics from Beirut Arab University, Leba-non, in 2004, and the ME degree in computerand communications engineering from the Amer-ican University of Beirut (AUB), in 2007. He iscurrently working toward the PhD degree atAUB, where he is doing research in the area ofdata availability in mobile ad hoc networks.
. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.
1426 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. 10, OCTOBER 2009
Authorized licensed use limited to: AMERICAN UNIVERSITY OF BEIRUT. Downloaded on August 26, 2009 at 01:10 from IEEE Xplore. Restrictions apply.