HARDWARE COMPLEXITY AND PARALLEL COMPUTATION†

(Preliminary Version)

Patrick W. Dymond†† and Stephen A. Cook

Department of Computer Science, University of Toronto

1. INTRODUCTION

Parallel time is clearly a fundamental complexity resource for synchronous parallel computation. But study of parallel time complexity classes alone does not give complete insight into the cost of a parallel computation: clearly the number of processors and amount of circuitry involved is also important. In this paper we examine the parallel resource hardware, and relate hardware complexity classes to previously studied resource classes. As motivation for studying hardware complexity, observe that while there has developed a substantial body of evidence showing that sequential Turing machine space can be simulated efficiently by parallel time (see [CS] or [Gl], where this relationship is called the "parallel computation thesis"), these simulations do not constrain the hardware used by the parallel machine, and in fact use an exponential (in the Tm space) amount of hardware. For example, SAT can be accepted in linear space, thus in linear time on a SIMDAG [Gl], but if P ≠ NP, more than a polynomial amount of hardware will be required to do so.

To lay the foundations for a study of the parallel resource hardware we found it appropriate to introduce two new models of parallel computers. The trouble with such existing models as parallel RAM's [FW,SS] is that the unit of hardware (the processor) is already unbounded in size. The new models are the aggregate (which can be thought of as a combinational circuit with cycles), and the hardware modification machine (which consists of a collection of variably connected finite state transducers). For these models a very close relationship between hardware and sequential space will be demonstrated, and in fact this relationship does not depend on using an exponential amount of parallel time. It can also be shown that the complexity classes defined by parallel time on these models are closely related to those defined using previously studied parallel models (such as circuits, alternating Turing machines and varieties of parallel random access machines), as well as to those defined using the sequential (Turing machine) resource reversal.

† This paper is largely a condensation of the first author's Ph.D. thesis [D1].

†† Current address: Department of Electrical Engineering & Computer Science, University of California, San Diego. This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.

CH1498-5/80/0000-0360$00.75 © 1980 IEEE

Reversal bounded complexity classes have been studied before [H1], and Pippenger [P1] has used the concept in his study of simultaneous relationships between Turing machine and circuit complexities, in which time and reversal are related to circuit size and depth. Although Pippenger did not explicitly discuss parallel hardware, he introduced the idea of the width of a circuit, a closely related concept. In the fourth section of this paper we discuss simultaneous relationships between sequential (space, reversal), parallel (hardware, time) and circuit (width, depth) resources which provide evidence for an extended version of the parallel computation thesis mentioned above.

The reversal [H1] used in a Turing machine (Tm) computation is 1 plus the number of time steps at which any of the heads changes the direction of its motion. (A pause is not a reversal.) A set A is accepted in reversal R(n) if there exists a Tm which accepts A using at most R(n) reversals on inputs of length n.

Because we will be comparing Tm complexity classes with those of circuits, which recognize only sets over the alphabet {0,1}, we will restrict our definitions of Tm time, space and reversal complexity classes to sets over {0,1}*; this involves no important loss of generality in our results. Thus DTIME(T) = {A ⊆ {0,1}* | A is accepted in time O(T(n)) by a multitape Tm}, and the classes DSPACE(T) and DREVERSAL(T) are defined analogously. (The prefix "D" stands for "deterministic".)

We shall assume that our resource bounding functions map natural numbers to natural numbers, and that they always grow at least as fast as log n. We will also require them to be strongly constructible, defined as follows:

A function T: N → N is strongly constructible provided there is a deterministic Tm M which, on input 1^n, computes T(n) using space max{log T(n), log n}. These "strongly constructible" functions correspond to the "linear space honest" functions of Seiferas [S4]; they comprise a rich class of functions including most bounding functions of practical interest (e.g. polynomials, ⌈log n⌉^k, etc.).

Another traditional model of computation which we will use is the (combinational) circuit family, which is described in [B1,C1,H4,P1,R2]. We review the relevant definitions here.

A circuit α_n on n inputs is a directed acyclic graph with nodes labelled from {x_1, x_2, ..., x_n} ∪ B_1 ∪ B_2, where B_i = {f | f: {0,1}^i → {0,1}} is the set of boolean functions of rank i. Nodes of α_n labelled with x_i are called inputs and have indegree 0; a node v with label g ∈ B_i (called a gate) has indegree i, and one edge into v is associated with each argument of g; there is a sequence O of nodes designated as outputs, and every node of α_n lies on a path from an input to an output. An assignment x of values from {0,1} to the n input nodes of α_n induces values on every node of the circuit in the obvious way, and we say that the sequence of values induced on the nodes of O is the output of the circuit. α_n computes a function f^n on strings over {0,1}^n if, for every input assignment x ∈ {0,1}^n, α_n outputs f^n(x).

Let size(α_n) be the number of gates in α_n. The depth d(v) of a node v is 0 if v is an input node; otherwise d(v) is the length of the longest path from an input to v. depth(α_n) = max over v ∈ α_n of d(v).

width(α_n) = max over 1 ≤ i ≤ depth(α_n) of #{v | 0 < d(v) ≤ i and there is an arc from v to a node w with d(w) > i}.

This notion of circuit width, introduced by Pippenger [P1], intuitively corresponds to the optimal number of gate values one would need to remember so as to avoid any recalculation while determining the output of the circuit by first evaluating all gates of depth 1, then using these values to evaluate all gates at level 2, etc., ignoring costs associated with re-examining inputs.
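For concreteness, the following sketch computes d(v) and this width measure for a circuit presented as an explicit DAG; the encoding (a node list plus arc pairs) is our own, not the paper's.

```python
from collections import defaultdict

def depth_and_width(nodes, arcs):
    """Depth and Pippenger width of a circuit DAG.

    nodes: list of node ids; a node with no incoming arc is an input.
    arcs:  list of pairs (u, v) meaning the output of u feeds gate v.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        preds[v].append(u)
        succs[u].append(v)

    depth = {}
    def d(v):                # d(v) = 0 for inputs, else longest path length
        if v not in depth:
            depth[v] = 0 if not preds[v] else 1 + max(d(u) for u in preds[v])
        return depth[v]
    D = max(d(v) for v in nodes)

    # width: max over levels i of the gate values computed at or below
    # level i that are still needed strictly above level i
    W = 0
    for i in range(1, D + 1):
        live = sum(1 for v in nodes
                   if 0 < depth[v] <= i and any(depth[w] > i for w in succs[v]))
        W = max(W, live)
    return D, W
```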

For A ⊆ {0,1}*, let A^n = A ∩ {0,1}^n. A circuit α_n recognizes A^n if it has a single output node and computes the characteristic function of A^n. A family {α_n} of circuits recognizes A if for every n, α_n recognizes A^n. Finally, define SIZE(L) (resp. DEPTH(L), WIDTH(L)) = {A | there exists a family {α_n} which recognizes A and size(α_n) (resp. depth(α_n), width(α_n)) = O(L(n))}. We will discuss the issue of uniformity of circuit families in connection with aggregates in the next section.

We conclude this section by mentioning some properties which our Tm's and circuits can be presumed to have.

Property 1.1 If M is a Tm which works in time T(n), space S(n) and reversal R(n), then

T(n) = Ω(n), S(n) = O(T(n)), R(n) = O(T(n)),
T(n) = O(R(n)·(n+S(n))), S(n) = Ω(log T(n)), R(n) = Ω(log T(n)).

Property 1.2 If {α_n} is a family of circuits with size(α_n) = L(n), width(α_n) = W(n) and depth(α_n) = D(n), then

L(n) = Ω(n), D(n) = O(L(n)), W(n) = O(L(n)),
L(n) = O(W(n)·D(n)), D(n) = Ω(log L(n)), W(n) = Ω(log L(n)).

All the assertions of Property 1.1 can be proved for any halting Turing machine which examines all bits of its input using straightforward techniques, given our convention that all complexity bounds are Ω(log n). For example, one shows that T(n) = 2^O(R(n)) by observing that the number of moves between reversals depends on the amount of non-blank workspace available, which can be increased by at most n plus a constant factor after each reversal.

The assertions of Property 1.2 can similarly be established, with the exception of the last. One need only assume that there is a path from every input node and every gate to the output node. Regarding the last assertion, it has been observed that by writing the function computed by a circuit as a formula in disjunctive normal form one can construct an equivalent circuit of constant width. Consequently, all sets (even nonrecursive ones) are in WIDTH(O(1)), and we cannot hope to relate the complexity of the resource width to resources on Turing machines or other models. However, there is a sense in which width does correspond to Tm resources if we impose the restriction W(n) = Ω(log L(n)), as will be seen in Section 2.

Thus we shall henceforth assume that our circuits and Tm's satisfy the above properties.

We will also be discussing synchronous circuits, in which all gates at any depth i receive their inputs only from gates of depth i−1 or from input nodes. It is easily shown that for any family of circuits recognizing a set A, there is another family of synchronous circuits which recognizes A, has the same depth and width as the original family, and has size no greater than the square of the size of the original (since the size is bounded above by the product of width and depth, and bounded below by each of the width and the depth).
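The padding construction behind this observation can be sketched as follows, under a hypothetical encoding in which each gate is a pair of a boolean function and its ordered predecessor list; identity gates are spliced onto every arc that skips a level.

```python
def make_synchronous(gates, depth):
    """Splice identity gates so every arc goes from level i-1 to level i.

    gates: dict v -> (fn, preds); depth: dict v -> d(v), inputs at level 0.
    Returns the padded gate dict and the extended depth map.
    """
    new_gates, new_depth, fresh = dict(gates), dict(depth), 0
    for v, (fn, preds) in gates.items():
        wired = []
        for u in preds:
            cur, lev = u, depth[u]
            while lev < depth[v] - 1:               # bridge each skipped level
                fresh += 1
                pad = ('pad', fresh)
                new_gates[pad] = (lambda b: b, [cur])   # identity gate
                new_depth[pad] = lev + 1
                cur, lev = pad, lev + 1
            wired.append(cur)
        new_gates[v] = (fn, wired)
    return new_gates, new_depth
```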

2. AGGREGATES

Circuit depth can be justified as providing a useful characterization of parallel time (as in [R2,C1]), but the size of a circuit doesn't always appear to properly estimate parallel hardware requirements. The problem can be traced to the fact that a circuit has no way to re-use its gates as its computation proceeds, while a reasonable parallel machine would presumably be able to re-use its hardware. In order to study the notion of hardware in a "circuit-like" setting, we will describe a parallel model called an aggregate.

The aggregate model combines features from the sequential circuits or logical nets of switching theory (e.g. [BW] and [O2]) and from Goldschlager's conglomerate model [Gl], while differing from both. Like circuits, each aggregate works for exactly one input length and each gate computes only boolean functions of degree ≤ 2. Like the finite state controls in conglomerates, the gates of an aggregate work synchronously so that there is no possibility of ambiguity in a computation. And unlike either circuits or conglomerates, the input convention for aggregates is such that either the hardware used or the time taken in a computation can be sublinear in the input size.

Formally, an aggregate β_n on inputs x_1, x_2, ..., x_n is a directed graph (not necessarily acyclic) whose nodes have labels from B_2 ∪ B_1 ∪ {x} (recall B_i = {f | f: {0,1}^i → {0,1}}). A node v with label f ∈ B_i must have indegree i, and one in-edge of v is associated with each argument of f. A node with label x is an input node and must have indegree zero. Associated with each input node v is a register R_v consisting of a sequence of ⌈log n⌉ distinct nodes, which specifies which input x_i will be assigned to v. (Distinct input nodes have disjoint registers.) There is also a distinguished pair of nodes (v_0, v_1) called output nodes. A configuration C of β_n is an assignment of 0 or 1 to each node v of β_n (the assigned value is called the output of v). A computation of β_n is a sequence C_0, C_1, ..., C_T of configurations which satisfies:

a) All nodes in C_0 have output 0 except for nodes computing the constant function 1.

b) If a node v has label f ∈ B_i, then in C_{t+1} it has output equal to f applied to the input(s) of v in C_t.

c) If v is an input node, then v has output 0 in C_t for t ≤ ⌈log n⌉ and, in general, in C_{t+⌈log n⌉} v has output x_{i+1}, where i is the value in binary notation of the register R_v in C_t.

The output of β_n is defined to be the output of the node v_0 in the first configuration C_T in which v_1 has output 1. The running time, time(β_n), is the maximum over all inputs of the index T. The hardware size, hardware(β_n), is the number of nodes in β_n. (For aggregates viewed as transducers, i.e. computing a function, we extend this definition slightly; see [C1].) We shall only consider aggregates which are well defined in the sense that they have an output as defined above for every input of length n.
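A single synchronous step of this definition can be sketched as follows; the encoding is ours, and for brevity the sketch lets an input node read its register in the current configuration, whereas rule (c) imposes a ⌈log n⌉ delay (the depth of the implicit selection circuit).

```python
def aggregate_step(labels, in_edges, registers, config, x):
    """One step C_t -> C_{t+1} of an aggregate (a sketch; encoding is ours).

    labels[v]    : a boolean function for a gate, or the tag 'input'
    in_edges[v]  : ordered predecessors of v, one per argument of labels[v]
    registers[v] : for an input node v, the nodes of R_v, high-order bit first
    config       : dict node -> 0/1 giving C_t; x: the input string over {0,1}
    """
    nxt = {}
    for v, lab in labels.items():
        if lab == 'input':
            # rule (c): v outputs x_{i+1}, where i is the value coded by R_v
            i = int(''.join(str(config[r]) for r in registers[v]), 2)
            nxt[v] = int(x[i])
        else:
            # rule (b): apply the gate's boolean function to its inputs in C_t
            nxt[v] = lab(*(config[u] for u in in_edges[v]))
    return nxt
```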

The unusual input conventions we have chosen need justification. The reason that inputs x_i aren't fed directly into aggregates (as they are for circuits) is that this would entail hardware(β_n) ≥ n, whereas we are interested in sublinear hardware bounds. In fact, the value of an input node v could be computed from the index in R_v using a (combinational) selection circuit of size O(n) and depth O(log n). (This is the reason for the delay of ⌈log n⌉ time units for R_v to affect v.) Our convention of not counting the size of the selection circuit is similar to the convention of not counting the input tape in measuring space used by an off-line Turing machine. In any case, this convention does not significantly affect cases in which the hardware bound is Ω(n).

An aggregate β_n recognizes a set A^n ⊆ {0,1}^n if β_n outputs 1 on an input x = x_1, x_2, ..., x_n ⟺ x ∈ A^n. A family of aggregates {β_n} recognizes a set A ⊆ {0,1}* if β_n recognizes A^n for all n. Define

AG-TIME(T) = {L | L is recognized by a family of aggregates {β_n} and time(β_n) = O(T(n))}

AG-HARDWARE(H) = {L | L is recognized by a family of aggregates {β_n} and hardware(β_n) = O(H(n))}.

According to the definitions above, it is possible for an aggregate family to recognize a non-recursive set (just as a circuit family can) because of the fact that a different machine is allowed for each input size. Since we wish to relate aggregate and circuit resources to those of Turing machines (which only recognize recursive sets) we will have to either increase the power of our Turing machines, or reduce that of our circuits and aggregates. The approach used by Borodin in [B1] is to require that the family be uniform in some sense; that is, that the members of the family be enough alike so that there would be an efficient algorithm for describing the circuit for a given input size.

Unfortunately, there is no obvious choice for a precise definition of "efficient algorithm" in the above sentence, and a variety of possibilities have been suggested [B1,C2,P1,R2]. Here we will use the following definition for uniform circuits, due to Borodin and Cook [C2].

Definition: A circuit family {α_n} is uniform if the transformation 1^n → ᾱ_n can be computed by a deterministic Tm in space O(log(size(α_n))). (Here ᾱ_n is a binary string coding the circuit α_n in the standard way.)

While for unrestricted circuit families the class of sets recognized within a given width or depth bound is unaffected by the assumption that the circuit is synchronous, it is not clear that uniform circuit families possess this property.

However, most families of uniform circuits encountered in practice can be described as uniform synchronous families; in particular, this applies to all the families used to perform simulations in this paper. Nevertheless, because of the possible differences between complexity classes defined by uniform families and those defined by uniform synchronous families, we will distinguish these classes by prefixing the class name with an S when it is

necessary to assume that a synchronous family is being used. Thus USWIDTH(W) denotes the class of sets recognized by uniform synchronous families within width W(n).

For aggregates, no such problem arises, because the gates are synchronized by definition. Uniformity of aggregate families is defined in a manner similar to that of circuit families.

An aggregate family {β_n} is uniform if the transformation 1^n → β̄_n can be computed by a deterministic Tm in space O(log(hardware(β_n)·time(β_n))). (Note: This definition of uniformity is less stringent (i.e. allows more families) than that in [C1].)

The uniform aggregate and circuit complexity classes UAG-TIME, UAG-HARDWARE, USIZE, UWIDTH, UDEPTH are defined in the same way as the already defined non-uniform classes, except that now the relevant circuit or aggregate family must be uniform.

Analogously to Properties 1.1 and 1.2, the following can be assumed to hold for aggregates:

Property 2.1 If {β_n} is a family of aggregates working in time T(n) and hardware H(n), then

T(n)·H(n) = Ω(n), T(n) = Ω(log H(n)), H(n) = Ω(log T(n)).

The first property holds by assuming that the aggregate examines all its inputs; the second may be assumed because if the hardware is larger than an exponential in the time, then some gates can never affect the output (by bounded fan-in) and so may be discarded; the third holds because an instantaneous configuration of an aggregate of hardware H(n) can be described in O(H(n)) bits and no halting computation repeats an instantaneous configuration.

Theorem 2.1

(a) AG-TIME(T) = DEPTH(T)
(b) UAG-TIME(T) = UDEPTH(T).

(No assumption of constructibility is needed in part (a) of this theorem, since for both circuits and aggregates a different member of the family is used for each input size. For part (b), T must be constructible to ensure uniformity.)

Proof Sketch: An aggregate can be converted to a circuit by implementing each input node by the selection circuit mentioned earlier. Then each gate v is replaced by a set {⟨v,t⟩ | 0 ≤ t ≤ T(n)} of gates. The gate ⟨v,t+1⟩ has inputs from ⟨w_1,t⟩ and ⟨w_2,t⟩, where w_1 and w_2 are the inputs to v in the aggregate. The circuit output is ⟨v_0,T(n)⟩ (we can assume v_0 retains its value in the aggregate once v_1 = 1).

To convert a circuit to an aggregate, construct an input node v_i for each circuit input x_i. The register R_{v_i} has constant value i. Let v_0 be the output node for the circuit, and let v_1 be the end of a length-T(n) chain of identity gates having the constant function 1 at the beginning. □

More details can be found in [D1].

Corollary 2.2 DSPACE(S) ⊆ UAG-TIME(S²) ⊆ DSPACE(S²).

Proof By a result of Borodin [B1], DSPACE(S) ⊆ UDEPTH(S²) ⊆ DSPACE(S²). □

While aggregate time is related by the above to circuit depth and sequential space, there is an even closer relationship among uniform aggregate hardware, uniform circuit width, and sequential space.

Theorem 2.3

DSPACE(S) = UAG-HARDWARE(S) = USWIDTH(S).

Proof To show UAG-HARDWARE(S) ⊆ USWIDTH(S), the construction of Theorem 2.1 is modified by implementing each input node by a selector circuit of width O(log n). This selector subcircuit computes ∨_{1≤i≤n} (x_i ∧ "value in R_v is i−1"), where the quoted function is computed by ANDing the (possibly inverted) bits of the gates representing R_v. Once the output of this selector subcircuit has been obtained, it can be carried down to the layer where it will be needed (⌈log n⌉ layers below). Since there are at most S/⌈log n⌉ input nodes, at most this many lines need be carried down from this level, so that the total width needed is O(S).

To show USWIDTH(S) ⊆ DSPACE(S), a Turing machine M simulates a circuit α_n level by level, starting at level 1, by keeping a list of W = cS(n) bits, one bit for each gate on that level. The order of the W bits corresponds to the order in which the W gates appear in the circuit description. To determine a gate's type and the positions of its inputs on the previous level, M uses extra space O(log(size(α_n))) = O(S) (see Property 1.2) to generate a description of the circuit. (In fact, this description may require too much space to write down, but M ignores most of it and retains only the information needed for the gate of current interest.) (Note that if the circuit family is not synchronous, space O(S + log²(size(α_n))) will suffice for the simulation, as in space log²(size(α_n)) the level of every node can be determined.)

To show DSPACE(S) ⊆ UAG-HARDWARE(S), an aggregate simulates a space-S Tm M with state set Q and worktape alphabet {0,1} by having a "module" of gates to represent each worktape square, in the style of [P2]. Each module remembers a "worktape square configuration", which consists of a pair (q,σ), q ∈ Q ∪ {φ}, σ ∈ {0,1}. If M's worktape head is currently visiting a particular square with symbol σ, and M is in state q, the corresponding module is remembering (q,σ); each module corresponding to any other square remembers (φ,σ'), where σ' is the appropriate symbol.

Every module is connected to its right and left neighbors, to the input node v and an associated binary counter which serves as R_v, and to the "clock" which ensures that the modules pause ⌈log n⌉ steps between simulation steps (to allow time for update of the input node).

The module currently remembering a square configuration with a non-null state (i.e. one of the form (q,σ)) has all the information it needs to simulate one step of the machine (current input symbol, current worktape symbol σ, current state q) and so can calculate its own new symbol, send the appropriate signal to update the input counter, and notify either its left or right neighbor of the new state (depending which way the head moves). Each module can be constructed from a constant number of gates (i.e. depending on M but not on the length of the input), the input mechanism requires O(log n) gates, and so the total hardware used is O(S). The uniformity of the aggregate family follows from the fact that S is strongly constructible. □

As an immediate consequence of the theorem and Corollary 2.2, we observe an interesting relationship between uniform aggregate time and uniform aggregate hardware.

Corollary 2.4

i) UAG-HARDWARE(H) ⊆ UAG-TIME(H²)
ii) UAG-TIME(H) ⊆ UAG-HARDWARE(H). □

The relationship between uniform aggregate hardware and Turing machine space can also be investigated in the non-uniform setting, and again a correspondence becomes evident.

To begin, we need a definition of non-uniform space. The concept has been studied before [B1,C1,KL,P1,S5], and synthesizing some of the ideas of these sources we propose the following definitions:

A non-uniform Turing machine with space bound S(n) and advice bound A(n) is a traditional S(n) space bounded machine extended by a read-only "advice" tape of length A(n). Such a machine M accepts a set Q if there is a function f: N → {0,1}*, |f(n)| = A(n), such that on any input w, |w| = n, the following holds: M accepts w when started with f(n) on its advice tape ⟺ w ∈ Q. We also define the complexity classes NU-SPACE(S,A) = {Q ⊆ {0,1}* | Q is accepted by a non-uniform Tm in space bound O(S(n)) and advice bound O(A(n))}.

In [C1] a definition of non-uniform space S classes corresponding to the size of sets of finite automata is presented and shown to be the same as the class NU-SPACE(S, 2^S). We now show that non-uniform aggregate hardware corresponds to an apparently more restricted class.

Theorem 2.5 AG-HARDWARE(S) = NU-SPACE(S, S·log S).

Proof A non-uniform Tm M with space c·S(n) and advice bound d·S·log S accepts the same set as an aggregate family {β_n} of hardware S by using a description of β_n as its advice. This description is of length O(S·log S), since it consists of a list of individual gate descriptions, each containing the gate's serial number, type, and the serial numbers of its inputs. Then M simulates β_n using ideas in the proof of Theorem 2.3.

Conversely, a hardware-O(S) aggregate simulating a non-uniform Tm with space S and advice S·log S uses O(S) gates to represent the Tm's worktape squares and proceeds as in Theorem 2.3. To keep track of the Tm's advice tape, the aggregate uses a subcircuit consisting of a set of O(S) gates whose inputs are connected so as to encode the S·log S bits of information on the tape. The output of this subcircuit will be the current bit of the advice tape, and there will also be a counter (of length O(log(S·log S))) representing the current position of the advice tape head. These will be connected to the rest of the aggregate in the same way as the circuit used for the input in Theorem 2.3. The subcircuit encoding the advice tape is described in detail in [D1]. □

The theorem shows that

AG-HARDWARE(S) = NU-SPACE(S, S·log S) ⊆ NU-SPACE(S, 2^S),

but the reverse inclusion seems unlikely, and so the close relationship which exists between uniform aggregate hardware and space doesn't seem to hold as closely between non-uniform hardware and non-uniform space as defined in [C1].

We should note that every set is in NU-SPACE(n, 2^n), and that there are non-recursive sets in NU-SPACE(1,1), and hence in NU-SPACE(S, S·log S). Thus the corresponding aggregate families, like non-uniform circuits, also accept non-recursive sets, and cannot be included in uniform complexity classes. Nevertheless, it is sometimes possible to show a relationship between certain non-uniform and uniform complexity classes. Results of this kind were first reported by Karp and Lipton [KL]. Here we mention three results similar to those in [KL], except they involve non-uniform aggregate hardware in their hypotheses.

Theorem 2.6 Let X be any of the three classes P, NP, or NL (NL = NSPACE(log n)). Then

X ⊆ AG-HARDWARE(log n) ⟹ X ⊆ DSPACE(log n · log log n).

When X is P or NP the proof is in [C1], and when X is NL the proof is in [D1]. In all three cases the essential property of non-uniform aggregates needed is Theorem 2.5. In fact, the results can be generalized to apply to non-uniform Turing machines with more general advice bounds. For example, we have

NL ⊆ NU-SPACE(S,A) ⟹ NL ⊆ DSPACE(S+A).

Thus the hypothesis would imply an improvement to Savitch's theorem [S2] whenever both S and A are o(log² n).

While these theorems illustrate the application of aggregates to the study of non-uniformity, our primary interest is in uniform complexity classes. The close relationships between uniform aggregates, Turing machines and uniform circuits demonstrated in this section indicate that uniform aggregate resources do not define new complexity classes taken separately. Nevertheless, depending on the extent to which uniform aggregates are a valid model of parallel computation, these relationships do provide insights into the concepts of parallel time and hardware. And in Section 4 we will consider new complexity classes obtained by simultaneously bounding both time and hardware on uniform aggregates.

3. HARDWARE MODIFICATION MACHINES

In [C1] it was observed that existing models of parallel machines can be divided into two classes, the first consisting of models in which the connections between processors are fixed throughout the computation, and the second containing models in which processors may vary their intercommunication links as the computation proceeds. Circuits, aggregates and conglomerates [Gl] fall into the first class, while SIMDAG's [Gl] and P-RAM's [FW,W1] fall into the second. (Individual processors in these RAM models have the ability to read from global memory locations whose addresses grow with the computation.) For a given (uniform) model X from the first group there are often results of the form

(3.1) X-TIME(T) ⊆ DSPACE(T) ⊆ X-TIME(T²)

whereas for many models in the second group the corresponding results would be

(3.2) DSPACE(T) ⊆ X-TIME(T) ⊆ DSPACE(T²).

The models satisfying (3.2) use powerful instructions to obtain their linear time simulation of deterministic space, instructions which would seem to require a very complex physical realization. For example, in SIMDAG's, the "store" instruction requires that any of an unbounded number of processors be able to store into any of an unbounded number of global memory locations in a single step. Because of this huge implicit fan-in, Goldschlager proposed a variant of the SIMDAG model, in which the time charged for each instruction is proportional to the time which would be needed to simulate its execution on a (fixed structure) conglomerate. The resulting model satisfies (3.1) rather than (3.2).

Observing that in the models satisfying (3.1) each processor is permanently connected to a fixed number of other processors, while in those satisfying (3.2) each processor has one-step access to any of an unbounded number of other processors, Cook proposed an intermediate model, which can access only a bounded number of processors at any one time, but which can vary the processors to which it is connected as its computation proceeds. We provide a specific realization of this idea in "hardware modification machines" or HMM's (so called in analogy with the sequential storage modification machines described by Schonhage [S3]), which are intended to explore the complexity classes defined by collections of synchronous machines each of which can rearrange its communication links by a bounded amount in one step. It will turn out that HMM's satisfy (3.2), and may be in some sense the simplest class of machines which do.

The essential features of HMM's are:

1) Each piece of equipment (called a unit) is a finite state transducer, accepting k input symbols and computing an output symbol based on its inputs and current state. All active units have the same transition function.

2) The units operate synchronously, and have the ability to adjust the lines on which they receive their input symbols in a restricted way. These lines can be thought of as "taps" on the outputs of other units. (We are allowing unbounded fanout.) In particular, a unit U_i may in one step redirect one of its input lines to tap a unit U_j at distance no more than two from it. (That is, U_j must be either already tapped by one of U_i's lines, or there must be a U_ℓ tapped by U_i such that one of U_ℓ's lines taps U_j.)

3) Initially a single unit U_0 is the only one active. (In addition, a fixed tree structure described below provides U_0 with access to the HMM's input.) U_0 starts in state q_0 and its input lines are tapping itself (except for one tapping the root of the input tree). As the computation proceeds, active units can activate other units by bringing them into the machine from an unbounded pool of quiescent units and initializing their taps and state. (The taps must be set to units at distance ≤ 1 from the creator.) The HMM accepts its input if U_0 ever enters its "accept" state, and rejects it if U_0 enters its "reject" state.

4) Input to the HMM is provided by means of a fixed complete binary tree of depth ⌈log n⌉ consisting of special units each with three taps {0,1,2}. The input word w = w_1 w_2 ··· w_n, w_i ∈ {0,1}, is stored in the n successive units comprising the leftmost leaves of the tree by having each of these units output either zero or one. (The remaining leaf units each output the blank symbol, B.) The taps of each leaf unit are connected to its left and right neighbors (taps 0 and 1) and to its father in the tree (tap 2). A non-leaf unit in the tree has tap 2 connected to its father and taps 0 and 1 connected to its left and right sons, respectively. These non-leaf units output "L" or "R", depending on whether they are a left son or a right son. The root of the tree outputs "T" and has its tap 2 connected to U_0, which in turn has its tap 1 connected to the root as shown in Figure 3.1. These input units never make any transitions.

Figure 3.1: HMM input tree for input of length 6. Each box represents an input tree unit, and its taps are shown as lines originating inside the box (terminating in a dot on the tapped unit).
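The fixed tree is easy to generate; the sketch below builds it for a given input word. The unit ids (d, k), meaning the k-th unit at depth d, and the self-taps at the boundary leaves are our own choices, not fixed by the paper.

```python
import math

def build_input_tree(w):
    """The HMM input tree of Fig. 3.1: unit -> {'out': symbol, 'taps': {...}}."""
    depth = max(1, math.ceil(math.log2(max(len(w), 2))))
    units = {}
    for d in range(depth + 1):
        for k in range(2 ** d):
            if d == 0:                          # root: sons below, U_0 above
                units[(d, k)] = {'out': 'T',
                                 'taps': {0: (1, 0), 1: (1, 1), 2: 'U0'}}
            elif d < depth:                     # inner unit: a left or right son
                units[(d, k)] = {'out': 'LR'[k % 2],
                                 'taps': {0: (d + 1, 2 * k),
                                          1: (d + 1, 2 * k + 1),
                                          2: (d - 1, k // 2)}}
            else:                               # leaf: input bit or blank B
                units[(d, k)] = {'out': w[k] if k < len(w) else 'B',
                                 'taps': {0: (d, max(k - 1, 0)),
                                          1: (d, min(k + 1, 2 ** d - 1)),
                                          2: (d - 1, k // 2)}}
    return units
```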

This input convention has been chosen to allow "random access" to the bits of the input, so that sublinear time bounds can be studied. Similarly, we do not count the input tree units when we consider hardware on HMM's. It is of course clear that linear hardware is always required to represent inputs, but it is interesting to consider how much additional hardware is required for particular computations, in the same way that we consider sublinear space bounds for deterministic Turing machines. (The convention that the input units make no transitions corresponds to the Tm convention of read-only input.) As with aggregates, our input convention has no significant effect in cases where at least linear time and linear hardware are used.

An HMM runs in time t on a particular input x if U_0 enters either an accepting or rejecting state before it has made more than t transitions. Similarly, an HMM uses hardware h on a particular input if U_0 enters either an accepting or rejecting state without there ever having been more than h units active (not counting the input tree units). An HMM P recognizes a set A ⊆ {0,1}* in time T(n) (or in hardware H(n)) if, for every w ∈ {0,1}*, |w| = n, P works in time T(n) on w (resp., uses hardware H(n) on w) and P accepts w ⟺ w ∈ A. The complexity classes defined by hardware and time bounds for HMM's are HMM-TIME(T) and HMM-HARDWARE(H). Note that there is no need to impose any uniformity condition on HMM's; because a single machine works for all input lengths, HMM's are already uniform.

Analogously to Turing machines and aggregates, there is no significant loss of generality in assuming that the following holds for HMM's:

Property 3.1 If P is an HMM working in time T(n) and using hardware H(n), then

1) T(n)·H(n) = Ω(n),
2) T(n) = Ω(log H(n)),
3) H(n)·log(n+H(n)) = Ω(log T(n)).

In the remainder of this section we will prove results about HMM complexity classes which are similar to, but not the same as, the results in the previous section about uniform aggregate families.

Theorem 3.1

HMM-HARDWARE(H) = DSPACE(H·(log n + log H)).

Proof A deterministic Tm M simulating an H-unit HMM P keeps a list of H records, one for each (non-input-tree) unit of P. Each record consists of the unit's current state and output symbol and, for each of the unit's taps, either a negative integer between −1 and −2^{⌈log n⌉+1} (indicating the tap is connected to one of the input tree units), or a number i between 1 and H (indicating the tap is connected to the unit represented by the i-th record in the list). M can obtain the output symbols of input tree units by consulting its input tape, and has the states and output symbols of all other units. Thus it can make a new list, representing the state of P after another parallel step has taken place. So the space required for the simulation is O(H·(log H + log n)). We note that no assumption of constructibility is required for this simulation, since instead of knowing H, M can just add new records as P adds new units.
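One parallel step of this record-list simulation might look as follows; this is a sketch, the record fields and the delta signature are hypothetical, and the tap-redirection limits and activation of new units are elided.

```python
def hmm_step(records, delta, tree_output):
    """Advance the Tm's picture of the HMM by one parallel step (Theorem 3.1).

    records[i]  : {'state': q, 'out': sym, 'taps': [t1, ..., tk]}; a negative
                  tap codes an input-tree unit, a positive tap j codes
                  records[j - 1].
    delta       : shared transition function (state, read symbols, old taps)
                  -> (new state, new output symbol, new taps).
    tree_output : function giving the output symbol of a coded tree unit,
                  which the Tm reads off its input tape.
    """
    def read(tap):
        return tree_output(tap) if tap < 0 else records[tap - 1]['out']

    new = []
    for r in records:
        q, out, taps = delta(r['state'],
                             tuple(read(t) for t in r['taps']), r['taps'])
        new.append({'state': q, 'out': out, 'taps': taps})
    return new
```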

Conversely, an HMM P simulating an H·(log H + log n) space Tm M does so using only O(H) units by encoding extra information using the units' taps.

For simplicity of exposition, we will first suppose that M has, besides its input tape, a single (binary) worktape of length H·log H, where H is a power of 2, and we will consider this worktape as divided into H blocks of length log H. The simulating HMM P contains a tree of O(H) units with H leaves, organized so that (1) by starting at any of the leaves and following the taps of the units to the root a unique sequence of log H bits can be recovered; and (2) given a sequence of log H bits, the corresponding leaf unit can be found by following a path of taps from the root. The tree is organized in the same way as the HMM input tree described at the beginning of this section (see Fig. 3.1). Thus each leaf ℓ codes the sequence over {L,R} of length log H describing the path from ℓ to the root, and a unit tapping onto ℓ can recover the sequence by following the taps 2 of the tree units from ℓ to the root. Now P can represent the H·log H bits in the worktape of M by forming a doubly linked list of H units, where the i-th unit represents a block of log H bits by having its tap 2 point to the appropriate leaf. Note that the input tree can also be used to store block information, so that actually a worktape of length H·(log H + log n) can be represented.

Simulation details can be found in [D1]. No assumption of constructibility of H is needed, since successively larger values for H can be tried until one works. □

Theorem 3.1 for HMM's stands in close analogy to Theorem 2.3 for uniform aggregates, by exhibiting a precise (up to a linear factor) correspondence between hardware and space complexity classes. In the case of HMM's, however, less hardware is needed to simulate a given space bound, and we can therefore conclude by the space hierarchy theorem that hardware on these machines with variable intercommunication structure is strictly more powerful than uniform aggregate hardware, in which the intercommunication paths are fixed.

Corollary 3.2 If H(n) = Ω(n), then

HMM-HARDWARE(H) = DSPACE(H·log H).

Turning our attention from hardware to HMM time, we show that the inclusions (3.2) at the beginning of this section apply to HMM's.

Theorem 3.3

DSPACE(T) ⊆ HMM-TIME(T) ⊆ DSPACE(T²).

Proof The first inclusion will follow from Lemma 3.4. The second inclusion will be shown by means of Lemma 3.6 below, which establishes that an HMM working in time T can be simulated by an alternating Turing machine [CKS], [R2], [R3] in time T². The inclusion then follows by a result of [CKS]. □

Lemma 3.4 Suppose a directed graph G with out-degree one (self-loops are allowed) and N nodes is represented in an HMM H by a collection of N units such that the unit representing node i has its input line connected to unit j iff (i,j) is an edge of G. Also suppose that node N's arc points to itself and that the corresponding unit is outputting a special symbol s. Then for any t, H can determine whether there is a path from 1 to N of length ≤ j, for some j with t ≤ j ≤ 2t, in ⌈log t⌉ steps.

Proof The argument is suggested by Wyllie's proof [W1] that DSPACE(T) ⊆ P-RAM-TIME(T). Each node is originally receiving an input from its immediate (1-step) successor. In a single transition of H each of the units can adjust its input line to tap onto its (unique) 2-step successor. After k transitions, each of the units will be tapping its 2^k-step successor. A path from node 1 to node N of length ≤ 2^⌈log t⌉ exists iff the unit representing node 1 is receiving s after ⌈log t⌉ transitions. □
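The doubling argument is the classic pointer-jumping trick; here is a sketch on an explicit successor map (our encoding), with the lemma's self-loop at the target.

```python
import math

def reaches(succ, s, t, bound):
    """Decide whether t is within about `bound` steps of s in a graph of
    out-degree one, by pointer doubling as in Lemma 3.4 (a sketch).
    succ[t] == t is required: t plays the self-looping node N."""
    assert succ[t] == t, "target must self-loop, like node N in the lemma"
    ptr = dict(succ)                             # ptr[v]: v's 1-step successor
    for _ in range(math.ceil(math.log2(max(bound, 2)))):
        ptr = {v: ptr[ptr[v]] for v in ptr}      # 2^k-step -> 2^(k+1)-step taps
    return s == t or ptr[s] == t
```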

The application of the lemma to the proof of the first inclusion of the theorem is as follows: the N nodes of the graph will represent configurations (including input head positions) of M, the machine to be simulated. An arc (i,j) means that the immediate successor configuration of i is j, as determined by M's transition function. Node 1 will represent the starting configuration, and node N the unique accepting configuration. A path of length ≤ N from 1 to N exists ⟺ M accepts its input. Since N = O(T·n·c^T), ⌈log N⌉ = O(T). Thus in O(T) transitions H can tell whether M accepts.

It remains to show how H sets up the structure representing the configuration graph in O(T) steps. If T is known, this is done by growing a tree of units of depth O(T) with N leaves (cf. Theorem 3.1). Each leaf determines a path from the root, and this path determines a configuration corresponding to the leaf. By working backwards along this path the leaf unit discovers the bits of the configuration it represents and can create a list of units representing the bits of the successor configuration. This list can then be used by the leaf unit as a guide through the original tree to find its successor leaf. To remove the assumption about knowing T, try T = 1, 2, 4, ..., at most quadrupling the total time taken. □

Corollary 3.5

UAG-TIME(T) = UDEPTH(T) ⊆ HMM-TIME(T).

Thus, time on HMM's is at least as powerful as time on our other uniform parallel models.

Lemma 3.6 An HMM which works in time O(T(n)) can be simulated by an indexing alternating Turing machine (see [R3] for the definition) in time O(T(n)²) and space O(T(n)).

Proof See [D1]. □

Corollary 3.7

HMM-TIME(T) ⊆ UDEPTH(T²) = UAG-TIME(T²) ⊆ DSPACE(T²).

Proof The first inclusion follows from the lemma and a theorem of Ruzzo's [R2] relating circuit depth and alternating machine time. The next equality is our Theorem 2.1, and the last inclusion follows from Borodin's theorem. □

Since Theorem 3.1 relates sequential space to HMM hardware and Theorem 3.3 relates sequential space to HMM time, we obtain the following interrelationship.

Corollary 3.8

i) HMM-HARDWARE(H) ⊆ HMM-TIME(H·(log n + log H))
ii) HMM-TIME(T) ⊆ HMM-HARDWARE(T²/log(T+n)).

4. SIMULTANEOUS RESOURCE BOUNDS

Two fundamental resources for parallel machines are clearly hardware and time. Although in Sections 2 and 3 we have exhibited close relationships between either of these resources and sequential space separately, we have not yet considered relating them simultaneously to sequential resources.

To discuss simultaneous complexity classes we will need the following notation: for a particular model of computation X (such as the Tm or HMM), particular resources A, B (such as time, space) and bounding functions f and g, let X-A,B(f,g) denote the class of languages recognized by machines of model X using no more than O(f(n)) of resource A, and simultaneously using no more than O(g(n)) of resource B. (We will usually omit X when the resources make the model clear.)

For example, TIME,SPACE(n³, log²n) denotes the class of languages which are each accepted by a (deterministic) Tm which works in time O(n³) and space O(log²n), and UAG-TIME,HARDWARE(T,H) denotes the class of languages each accepted by a uniform aggregate family which works in time O(T(n)) and uses hardware O(H(n)). Note that these simultaneous classes are different in general from the intersection of the corresponding single-resource classes; for example, the set A = {0^n 1^n | n ≥ 0} can be recognized in either space O(log n) or reversal O(1), but it can be shown that the product of the space and reversal used by any Tm accepting A must be Ω(n), and hence that A is not in the class REVERSAL,SPACE(1, log n).

Two complexity resources A, B are said to be polynomially related if for any bounding function T, the class A(T) is a subset of B(T^O(1)) and conversely, B(T) ⊆ A(T^O(1)). Similarly, a pair A_1, A_2 of resources on a model X and another pair B_1, B_2 on a model Y are simultaneously polynomially related if, for any bounding functions S, T,

X-A_1,A_2(S,T) ⊆ Y-B_1,B_2(S^O(1), T^O(1)) and
Y-B_1,B_2(S,T) ⊆ X-A_1,A_2(S^O(1), T^O(1)).

We will continue to require that our complexity bounding functions be strongly constructible and grow at least as fast as log n. In addition, when considering the simultaneous complexity class X-A,B(f,g) we shall assume without significant loss of generality that:

(1) f(n) = Ω(log g(n))
(2) g(n) = Ω(log f(n))
(3) f(n)·g(n) = Ω(n).

(1) and (2) follow from Properties 1.1, 1.2 and 2.1. For uniform circuits and aggregates (3) is also a consequence of Properties 1.2 and 2.1; for Turing machines with resources (reversal, space), the proof that the restriction (3) imposes no essential loss of generality is due to Hong [H2].

We will use this terminology in the remainder of this section to discuss simultaneous relationships between sequential and parallel resources.

Pippenger [P1] has investigated simultaneous relationships between Tm's and uniform circuit families and has presented four theorems, two relating time, reversal and size, depth, and two relating time, space and size, width. These theorems can be stated as follows:

(4.1) TIME,REVERSAL(T,R) ⊆ USIZE,DEPTH(T^O(1), R·log⁴T)
(4.2) TIME,SPACE(T,S) ⊆ USIZE,WIDTH(T², S)
(4.3) USIZE,DEPTH(L,D) ⊆ TIME,REVERSAL(L^O(1), D·log L)
(4.4) USIZE,WIDTH(L,W) ⊆ TIME,SPACE(L^O(1), W·log L).

These results can be combined and extended to show:

Theorem 4.1

REVERSAL,SPACE(R^O(1), S^O(1)) = UDEPTH,WIDTH(R^O(1), S^O(1)).

Theorem 4.1 can be proved from (4.1)-(4.4) and Properties 1.1 and 1.2 by first observing the following:

(4.5) TIME,REVERSAL(T,R) ⊆ UWIDTH,DEPTH(T^O(1), R^O(1))
(4.6) TIME,SPACE(T,S) ⊆ UDEPTH,WIDTH(T², S)
(4.7) USIZE,DEPTH(L,D) ⊆ SPACE,REVERSAL(L^O(1), D^O(1))
(4.8) USIZE,WIDTH(L,W) ⊆ REVERSAL,SPACE(L^O(1), W^O(1)).

These follow from (4.1)-(4.4) because 1) the original inclusions hold when the right-hand sides are increased by changing size to width or depth, and time to space or reversal; and 2) the original inclusions still hold when log T is replaced by S or R, and log L by W or D. (We are using the properties that log T = O(R), log L = O(W), log L = O(D).) By the restriction (3) above, S·R = Ω(n), so the larger of the two Tm resources space, reversal (for any Tm in the class REVERSAL,SPACE(R,S)) must be as large as √n, and the time is bounded by a polynomial in it (since T = O(n·R·S)). Depending on which of the two resources is the larger, we apply (4.5) or (4.6) to obtain

(4.9) SPACE,REVERSAL(S,R) ⊆ TIME,REVERSAL(S^O(1), R) ⊆ UWIDTH,DEPTH(S^O(1), R^O(1))

(4.10) REVERSAL,SPACE(R,S) ⊆ TIME,SPACE(R^O(1), S) ⊆ UDEPTH,WIDTH(R^O(1), S).

Similarly, for circuits one of the two resources W, D must be as large as the square root of the size. Depending on which of W and D is as large as √L, we apply (4.7) or (4.8) to obtain

(4.11) UWIDTH,DEPTH(W,D) ⊆ USIZE,DEPTH(W^O(1), D) ⊆ SPACE,REVERSAL(W^O(1), D^O(1))

(4.12) UDEPTH,WIDTH(D,W) ⊆ USIZE,WIDTH(D^O(1), W) ⊆ REVERSAL,SPACE(D^O(1), W^O(1)).

The two cases dealt with above do not exhaust all possibilities; it may be, for example, that neither the width nor the depth of a circuit family is asymptotically as large as √L. Nevertheless, for any specific value of n, one of the two resources must be as large as √L. It follows that any language L in UWIDTH,DEPTH(W,D) can be expressed as the union of two disjoint sublanguages L_0 and L_1, satisfying L_0 ∈ USIZE,DEPTH(W^O(1), D), L_1 ∈ USIZE,WIDTH(D^O(1), W), and for any n, each sublanguage contains either all or none of the members of L of that length. By the arguments given above, these two languages are each members of the class SPACE,REVERSAL(W^O(1), D^O(1)) and hence their union is also in this class (because of the strong constructibility of W and D). A similar argument for space, reversal classes completes the proof. □

Corollary 4.2

TIME,REVERSAL,SPACE(T^O(1), R^O(1), S^O(1)) = USIZE,DEPTH,WIDTH(T^O(1), R^O(1), S^O(1)).

Proof Both the time and the size are polynomially related to the larger of S and R, hence to each other. □

This simultaneous polynomial correspondence can be extended to include uniform aggregate families.

Theorem 4.3 The following classes are equal:

i) REVERSAL,SPACE(T^O(1), H^O(1))
ii) UDEPTH,WIDTH(T^O(1), H^O(1))
iii) UAG-TIME,HARDWARE(T^O(1), H^O(1)).

Proof The proof that the third class is equal to the other two divides into cases, as above. When T exceeds H, the construction in the second half of the proof of Theorem 2.3 is used. When H exceeds T, the conversion of a circuit to an aggregate described in the proof of Theorem 2.1 suffices. □

The simultaneous polynomial correspondences of the theorem can be extended to simultaneous HMM time and hardware classes as well, as is shown in [H3].

These theorems establish polynomial relationships between parallel time and reversal, as well as between hardware and space. We have of course observed the relationship between hardware and space earlier, but the correspondence between aggregate time (or circuit depth) and reversal is new (although it could be inferred from Section 3 of [P1]). We obtain a more precise bound for the simulation of aggregate parallel time by sequential reversal in the next theorem.

Theorem 4.4 UAG-TIME(T) ⊆ REVERSAL(T²).

Proof We will first need the following "data processing" lemma, which provides a method of sorting using small reversal.

Lemma S A file of n records of length m, each with a key of length p, can be sorted by a Tm in simultaneous space O(n·m) and reversal O(p), using a "radix sort".

Proof See [D1]. □
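The shape of such a radix sort is a chain of stable one-bit passes; a sketch follows (our encoding of records as (key, payload) pairs), where each pass corresponds to O(1) Tm reversals, O(p) in all.

```python
def radix_sort(records, p):
    """LSD radix sort of (key, payload) records on p-bit keys (Lemma S)."""
    for b in range(p):                           # pass b splits on bit b
        records = ([r for r in records if not (r[0] >> b) & 1] +
                   [r for r in records if (r[0] >> b) & 1])  # stable split
    return records
```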

Let {β_n} be a uniform family of aggregates such that time(β_n) = T(n) and hardware(β_n) = H(n). On input of size n, the Tm must first compute a description of β_n. By uniformity this description can be computed by some machine in space O(log(H·T)) = O(T). Using Lemma S, Pippenger's observation that functions computable in space O(T) can be computed in reversal T^O(1) can be strengthened so that the description of the aggregate can be computed in reversal O(T²), using techniques of [P1, Proposition 3.1]. The description is a set of H records, each of length O(log H + log n), with keys (identification numbers for the gates) of length log H.

An additional n records, each containing a bit w_i of the input and having a binary key i, 1 ≤ i ≤ n, can be obtained in O(log n) reversals by repeated doubling.

The proof now proceeds as in [P1, Proposition 3.3]. Given values for all the gates at any time t, the values of all gates at time t+1 can be computed in reversal O(log(H+n)) by reduction to sorting using Lemma S. The basic idea is that a master tape holds the number of each gate with its current value. The descriptions of all gates can first be sorted by the number of each gate's left input. Then the values of each gate's left input can be obtained using constant reversal. Similarly, values of right inputs can be obtained with a second sort. Following this, a single pass through the records can be used to compute the new output values, and the master tape can be updated by means of one more sort. Values of input bits may be obtained and kept in a queue for each register. Thus one parallel step of β_n can be simulated in reversal O(T), and T steps in reversal O(T²). □

A specific polynomial upper bound for the simulation of reversal by aggregate time is not known, because of the apparent need for the Tm to be oblivious [PF], which causes the exponent of the time bound to depend on the number of tapes being simulated. The main idea of the proof of polynomial correspondence is based on Pippenger's observation that a computation of a Tm between reversals is similar to that of a finite state transducer, and thus can be simulated by a circuit of small depth using the parallel prefix algorithm of [LF] and [O1].
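For concreteness, the parallel-prefix idea on finite-state maps can be sketched with a doubling scan; the encoding is ours (a map is a tuple giving the next state for each state), while [LF] and [O1] give the circuit versions.

```python
def compose(a, b):
    """Apply map a, then map b (maps are tuples indexed by state)."""
    return tuple(b[a[s]] for s in range(len(a)))

def prefix_scan(maps):
    """res[i] = maps[0] then ... then maps[i], in ceil(log2 T) doubling rounds.
    In each round every position absorbs the result d places to its left."""
    res = list(maps)
    d = 1
    while d < len(res):
        res = [res[i] if i < d else compose(res[i - d], res[i])
               for i in range(len(res))]
        d *= 2
    return res
```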

These results provide evidence for

Extended Parallel Computation Thesis:

i) Parallel time and hardware requirements are simultaneously polynomially related to sequential (Tm) reversal and space requirements.

ii) Parallel time and hardware are polynomiallyrelated.

Like Church's thesis, and the original parallel computation thesis of Goldschlager [Gl] and Chandra and Stockmeyer [CS], this thesis cannot be proved, because it relates an intuitive concept (parallel computation) to a mathematically precise one (Turing machines). We can, however, obtain evidence for the thesis by showing that it holds for any "reasonable" model of a parallel machine.

Theorems 4.1 and 4.3 (based on the seminal ideas of [P1]) provide such evidence for part (i), and Corollaries 2.4 and 3.8 support part (ii). Other support for the thesis is contained in the work of Hong [H3], who both extends the class of machine models providing evidence, and generalizes the thesis to consider nondeterministic and other types of machines.

The original thesis of [CS,Gl] follows from the thesis presented here by observing that space is related to parallel hardware (by part (i)) and hardware to parallel time (by part (ii)). Like the original thesis, this thesis provides no guarantee that the time of an arbitrary sequential computation may be reduced by parallelizing it; it merely states that the resulting hardware and parallel time will be polynomially related to the space and reversal of the sequential computation. However, if the reversal of the original computation is small, then we can obtain a fast parallel computation using only polynomially more hardware than the original computation uses space. In contrast to this, the original thesis guarantees a fast parallelization if the original space is small, but does not bound the hardware required to achieve this in any way.

In fact, we conjecture that no such bound on hardware is possible in general when transforming space to parallel time, and that the thesis presented here is essentially optimal when both hardware and parallel time are considered. To state this another way, we are conjecturing that it is not in general the case that

SPACE,REVERSAL(S,R) ⊆ parallel-time,hardware(S^O(1), R^O(1)).

Note that by our thesis, this is equivalent to conjecturing that

parallel-time,hardware(T^O(1), H^O(1)) ⊈ parallel-time,hardware(H^O(1), T^O(1)).

Some of the evidence for this conjecture isdiscussed in the next section.

5. NC AND SC

It is widely believed that the "feasible" sets for sequential computers form a subclass of P. The belief that parallel algorithms which use a non-polynomial amount of hardware are as impractical as algorithms which require non-polynomial time suggests that the feasible sets for parallel machines may also be a subclass of P, because polynomial hardware running for polynomial time can be simulated by a polynomial time sequential (Tm) computation.

While we know of no class which exactly characterizes the feasible sets for parallel machines, the simultaneous class NC = USIZE,DEPTH(n^{O(1)}, (log n)^{O(1)}) proposed by Pippenger [P1] appears to be a very useful and stable characterization of those sets which can be accepted extremely quickly on parallel machines which use a feasible amount of hardware.

Pippenger showed that NC = TIME,REVERSAL(n^{O(1)}, (log n)^{O(1)}), and Ruzzo [R2] has obtained results which show both that the class remains unchanged under a variety of definitions of circuit uniformity, and that it can be defined in terms of other machine models including auxiliary pushdown machines (AUXPDM's) [C3] and alternating Turing machines (ATM's) [CKS]. More precisely, by closely relating alternating Tm's to uniform circuits, Ruzzo shows that:

NC = ATM-TIME,SPACE((log n)^{O(1)}, log n)
   = ATM-ALTERNATION,SPACE((log n)^{O(1)}, log n)
   = AUXPDM-TIME,SPACE(2^{(log n)^{O(1)}}, log n).

For sequential computations a similar class of interest is SC = TIME,SPACE(n^{O(1)}, (log n)^{O(1)}). The question of whether the two simultaneous classes NC and SC are equal was first considered (in a different form) by Borodin [B1], and the question is still open. We provide new characterizations for NC and SC in terms of the parallel models HMM's and aggregates as follows:

Theorem 5.1

(a) NC = UAG-TIME,HARDWARE((log n)^{O(1)}, n^{O(1)})
    SC = UAG-TIME,HARDWARE(n^{O(1)}, (log n)^{O(1)})

(b) NC = HMM-TIME,HARDWARE((log n)^{O(1)}, n^{O(1)})
    SC = HMM-TIME,HARDWARE(n^{O(1)}, (log n)^{O(1)}).

Proof Clearly TIME,REVERSAL(n^{O(1)}, (log n)^{O(1)}) is a subset of SPACE,REVERSAL(n^{O(1)}, (log n)^{O(1)}). Equality follows since even allowing polynomial space does not permit a substantial increase in time when the reversal is bounded. (Recall that time can be no more than (n+space)×reversal.) Similar reasoning shows that SC = SPACE,REVERSAL((log n)^{O(1)}, n^{O(1)}). The theorem then follows from Theorem 4.3. □

This shows that the question SC = NC mentioned above is a particular case of Section 4's question of whether the parallel (HMM) resources TIME,HARDWARE are polynomially related to the resources HARDWARE,TIME. One can make arguments based on the pebbling lower bounds of [P5] and [LT] which suggest that neither NC nor SC is a subset of the other.

SC was studied in [C2], where it was shown to contain all languages log space reducible to deterministic context-free languages. It follows from [B1] that NLOG = NSPACE(log n) is contained in NC. Furthermore, since NC is closed under complementation, the class co-NLOG = {A | Ā ∈ NLOG} is also in NC (in fact, in NC^2 = USIZE,DEPTH(n^{O(1)}, log^2 n)).
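The inclusion NLOG ⊆ NC is usually seen via repeated squaring of the boolean adjacency matrix of the configuration graph, following [B1]: about log n squarings suffice, and each squaring is a depth-O(log n) circuit, for O(log^2 n) depth overall. A small sketch follows; the adjacency matrix is a hypothetical three-vertex example of ours.

def square(a):
    """a[i][j] or (exists k: a[i][k] and a[k][j]); reachable path lengths double."""
    n = len(a)
    return [[a[i][j] or any(a[i][k] and a[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def reachable(adj, s, t):
    r, steps = adj, 1
    while steps < len(adj):      # ceil(log2 n) squarings cover all path lengths
        r, steps = square(r), steps * 2
    return s == t or bool(r[s][t])

adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]                # edges 0 -> 1 -> 2
print(reachable(adj, 0, 2))      # True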

From the AUXPDM characterizations given above, Ruzzo was able to show that context-free language recognition problems are also in NC^2. We can use the same characterization to show that a problem first considered by Goldschlager is in NC^2. Define MPCVP = {x | x encodes a monotone planar circuit in standard form and a set of inputs such that the output gate evaluates to true}. (In a standard form monotone planar circuit, gates at any level receive inputs only from the previous level and lines to these inputs do not cross.)
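To make the format concrete, the toy evaluator below (our encoding, not Goldschlager's algorithm) walks a standard-form monotone planar circuit level by level; the force of Theorem 5.2 is that planarity and monotonicity allow this level-by-level evaluation to be short-cut in polylog parallel time.

def eval_mpcvp(inputs, levels):
    vals = list(inputs)                     # level 0 holds the input bits
    for level in levels:                    # monotone: AND/OR gates only
        vals = [vals[l] and vals[r] if op == "and" else vals[l] or vals[r]
                for op, l, r in level]      # each gate reads the previous level
    return vals[0]                          # the single output gate

# (x0 AND x1) and (x1 OR x2) on level 1, their AND on level 2; no crossings.
print(eval_mpcvp([True, False, True],
                 [[("and", 0, 1), ("or", 1, 2)], [("and", 0, 1)]]))  # False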

Let POLYLOG-PDM denote the class of sets accepted by log space bounded (nondeterministic) auxiliary pushdown machines operating in polynomial time. Ruzzo [R2] has shown that POLYLOG-PDM ⊆ NC^2.

Theorem 5.2 MPCVP ∈ POLYLOG-PDM, so MPCVP ∈ NC^2.

Proof See [D1]. □

The corresponding result in [G2] shows that MPCVP ∈ DSPACE(log^2 n), although our result is based on the ideas in an earlier version of [G2] in which a log^3 n space bound was obtained. The theorem is interesting partially because Goldschlager had earlier shown in [G1] that both the planar circuit value problem and the monotone circuit value problem were log-space complete for P (see [HU]) and thus very unlikely to be in log^2 n depth or log^2 n space (see results of [CS1]). Our theorem points out that when circuits are restricted to be both planar and monotone they can be evaluated very quickly in parallel, using a polynomial amount of hardware.

ACKNOWLEDGEMENTS

We thank Allan Borodin, Les Goldschlager, Jia-wei Hong, Nicholas Pippenger and Charles Rackoff for many valuable discussions about parallel computation.

References

[B1] A. Borodin. On relating time and space to size and depth. SIAM J. Comp., 6, 1977, pp.733-744.

[BW] A.W. Burks and J.B. Wright. Theory of logical nets. In Sequential Machines: Selected Papers, E.F. Moore, ed., Addison-Wesley, 1964, pp.193-212.

[C1] S.A. Cook. Towards a complexity theory of synchronous parallel computation. Presented at Internationales Symposium über Logik und Algorithmik, Zürich, Feb. 1980 (proceedings to appear). Also Tech. Report 141/80, Dept. of Computer Science, Univ. of Toronto.

[C2] S.A. Cook. Deterministic CFL's are accepted simultaneously in polynomial time and log squared space. Proc. 11th Annual ACM Symposium on Theory of Computing, May 1979, pp.338-345.

[C3] S.A. Cook. Characterizations of pushdown machines in terms of time-bounded computers. J. ACM, 18, 1971, pp.4-18.

[CKS] A.K. Chandra, D.C. Kozen, and L.J. Stockmeyer. Alternation. IBM Research Report RC 7489, 1978.

[CS] A.K. Chandra and L.J. Stockmeyer. Alternation. Conference Record IEEE 17th Annual Symposium on Foundations of Computer Science, 1976, pp.98-108.

[CS1] S.A. Cook and R. Sethi. Storage requirements for deterministic polynomial time recognizable languages. J. Comp. Sys. Sci., 7, 1973, pp.354-375.

[D1] P.W. Dymond. Simultaneous Resource Bounds and Parallel Computation. Ph.D. thesis, Dept. of Computer Science, Univ. of Toronto, 1980.

[FW] S. Fortune and J. Wyllie. Parallelism in random access machines. Proc. 10th ACM Symposium on Theory of Computing, May 1978, pp.114-118.

[G1] L. Goldschlager. Synchronous parallel computation. Ph.D. thesis and Tech. Report 114, Dept. of Computer Science, Univ. of Toronto, December 1977. See also: A unified approach to models of synchronous parallel machines. Proc. 10th Annual ACM Symposium on Theory of Computing, May 1978, pp.89-94.

[G2] L. Goldschlager. A fast parallel algorithm for the monotone planar circuit value problem, 1978. To appear.

[H1] J. Hartmanis. Tape reversal bounded Turing machine computations. J. Comp. Sys. Sci., 2, 1968.

[HU] J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.


[H2] J.W. Hong. A tradeoff theorem for space and reversal, 1979. To appear, SIGACT News.

[H3] J.W. Hong. The similarity and duality of computation. Conference Record IEEE 21st Annual Symposium on Foundations of Computer Science, 1980 (to appear).

[H4] H.J. Hoover. Some topics in circuit complexity. M.Sc. thesis and Tech. Report 139/80, Dept. of Computer Science, Univ. of Toronto, Dec. 1979.

[KL] R.M. Karp and R.J. Lipton. Some connections between non-uniform and uniform complexity classes. Proc. 12th Annual ACM Symposium on Theory of Computing, April 1980, pp.302-309.

[LF] R.E. Ladner and M.J. Fischer. Parallel prefix computation. Proc. Int. Conf. on Parallel Processing, 1977, pp.218-223.

[LT] T. Lengauer and R.E. Tarjan. Upper and lower bounds on time-space tradeoffs. Proc. 11th Annual ACM Symposium on Theory of Computing, 30 April-2 May 1979.

[O1] Yu. P. Ofman. On the algorithmic complexity of discrete functions. Soviet Physics Doklady, 7, 1963, pp.589-591.

[O2] Yu. P. Ofman. A universal automaton. Trans. Moscow Math. Soc., 14, 1965, pp.200-215 (AMS translation).

[P1] N.J. Pippenger. On simultaneous resource bounds (preliminary version). Proc. 20th Annual Symposium on Foundations of Computer Science, Oct. 1979, pp.307-311.

[P2] N.J. Pippenger. A relationship between machine time-complexity and network gate complexity. Manuscript, 1973.

[P3] N.J. Pippenger. Course notes for CSC2429: Topics in the theory of computation. Dept. of Computer Science, Univ. of Toronto, 1978.

[P4] N.J. Pippenger. Fast simulation of combinational logic networks by machines without random-access storage. Proc. 15th Allerton Conf. on Comm., Contr. and Comp., 1977, pp.25-33.

[P5] N. Pippenger. Comparative schematology and pebbling with auxiliary pushdowns (preliminary version). Proc. 12th Annual ACM Symposium on Theory of Computing, April 28-30, 1980, pp.351-356.

[R1] R.M. Russell. The CRAY-1 computer system. Comm. ACM, 21, 1978, pp.63-72.

[R2] W.L. Ruzzo. On uniform circuit complexity (extended abstract). Proc. 20th Annual Symposium on Foundations of Computer Science, Oct. 1979, pp.312-318.

[R3] W.L. Ruzzo. Tree-size bounded alternation. Proc. 11th Annual ACM Symposium on Theory of Computing, 1979, pp.352-359.

[SS] W. Savitch and M. Stimson. Time bounded random access machines with parallel processing. J. ACM, 26, Jan. 1979, pp.103-118.

[S1] W.J. Savitch. Parallel and nondeterministic complexity classes. Proc. 5th Int. Colloquium on Automata, Languages and Programming, Springer-Verlag Lecture Notes in Computer Science, 1978, pp.411-424.

[S2] W.J. Savitch. Relationships between nondeterministic and deterministic tape complexities. J. Comp. Sys. Sci., 4, 1970, pp.177-192.

[S3] A. Schönhage. Storage modification machines. Tech. Report, Mathematisches Institut, Universität Tübingen, Germany, 1979.

[S4] J.I. Seiferas. A note on notions of tape constructability. Tech. Report 187, Pennsylvania State Univ., 1976.

[S5] C.P. Schnorr. The network complexity and the Turing machine complexity of finite functions. Acta Informatica, 7, 1976, pp.660-674.

[W1] J.C. Wyllie. The complexity of parallel computations. Ph.D. thesis and TR-79-387, Dept. of Computer Science, Cornell Univ., 1979.
