Dag - Directed Acyclic graph --- A Shortened Discussion This is an abbreviated version of a set of details we have come up with about a class suitable for containing Particles in an Event and encapsulating the "parent/child" relationships. The contents page is intentionally left the same as that for the detailed version (which is availble upon request) but many topics are excised here. WEB NOTE - it is intended to make this a more useful WEB page, with decent formatting and links to the more detailed information. However, we make this note availble now to solicit early comments. 1. Motivation 1.1 What is a Dag? 1.2 Guiding Principles 1.3 Main Concepts 1.4 How would an StdHep Event class use Dag? 2. Basics of the Dag and related classes 2.1 Classes 2.2 Constructors 2.3 Adding Particles One at a Time 2.4 Referencing a Particle 2.5 Adding Parent/Child relationships 2.6 Removing Paricles and Relationships 2.7 Relationship Properties of a Node 3. Iteration Concepts 3.1 The Dag::iterator class 3.2 Iterating in Loops 3.3 Order of iteration 3.4 When the Graph Structure Changes 3.5 Filling a Dag from a HepEvt Block 3X. Concepts to support effcient storage (new) 3X.1 UniqueID 3X.2 GraphConnectivity and support for split-up bytestream i/o 4. Dag::NodeList -- Lists within a Dag 4.1 Constructing and Filling a NodeList 4.2 The NodeList::iter subclass 4.3 Loops Over NodeLists 4.4 NodeLists in Changing Dag's 4.5 Relationships within a NodeList 4.6 Using a NodeList to add to a Dag or form a new Dag 4.7 Other NodeList Operations 5. Dags with Identical Structure 5.1 Inducing Dag Structure 5.2 Corresponding Iterators 5.3 Output of a Dag 6. Further Operations 6.1 Container Properties and Operations 6.2 Input/Output 6.3 For_each and Special Loops 6.4 Does a Dag of pointers make sense? 7. Side Issues 7.1 Excceptional Situations 7.2 std:: semantics NOT present for Dag 7.3 std:: semantics present: 7.4 Warnings, Discussions, Decisions, and Open Questions 1. Motivation StdHep deals with Events, which may have a rich structure of particles decaying into other particles. We wish to have a container-like class that embodies the sort of parent-child relationships present in an event, which can be used by StdHep, by users of the StdHep Event class, and also by other packages with the same needs. We choose the templated container class approach rather than having a base class implementing the Dag structure, to provide consistency witht the std:: container semantics. However, this choice is not "set in concrete." 1.1 What is a Dag? The Dag (which stands for Directed acyclic graph) is a templated class; the template parameter is the type of object contained in the collection. For example, an Event might be a Dag. Beyond just holding Particles, that class also knows about parent/child relationships -- the Particles can be envisioned as associated with points (nodes) of a graph and the these nodes can be connected to one another by arcs representing the relationships. 1.1.1 The Dag can have: * An arbitrary number of nodes. * An arbitrary number of children stemming from a given node. * An arbitrary number of parents leading to a given node. * An arbitrary number of "roots" (nodes with no parents). * Nodes or connected groups of nodes need not be connected. Example: P1--->P2, P3--->P4 P3--->P5 is a valid Dag, even though P1 and P3 are in no way connected. 1.1.2 The following restrictions apply: * There can be at most one arc between two given nodes. * The Dag must be **acyclic**, that is, there cannot be a closed circle of A--->B--->C---> ... --->A. The routines which define parentage wil enforce this. 1.2 Guiding Principles 1.2.1 We wish to cover all the easily anticipated ways a physics program may want to maneuver through these graphs. 1.2.2 To as great an extent as possible ("but no greater") we utilize the same semantics the std:: container classes support. 1.2.3 The Dag is "subordinate to" the Event; we will not address concepts involving how multiple Events are to be organized. 1.3 Main Concepts Dag (Directed Acyclic Graph) A Dag is a collection of objects of type T along with parent- child relationships that give it an underlying structure of a graph. In the StdEvt case, an Event involves a a Dag. Dag::iterator An iterator, in the same sense that you have an iterator into a std::list of objects. NodeList A class representing some subset of the nodes of a graph. A NodeList::iterator will have the same behavior as an iterator in the Dag, but covering only those nodes which are part of that list. 1.4 How would an StdHep Event class use Dag? We suggest having StdHep::Event inherit from Dag. That way, operations that apply to Dag such as insertion and iteration equally apply to Event. ## Use Case 1.4: class Particle { ... }; class Event : public Dag { // Plus perhaps other data and operations not implied // by being a Dag of Particles. Any such data would // be on an overall basis, and not per Particle. } Q1. Hypothetically Asked Questions: Q1.1 Could the event include a Dag instead of being a Dag? A1.1 It could, but syntactically, making the Event inherit from Dag makes code using the Event much more readable. Of course, Event can have other properties beyond being a Dag. Q1.2 Is Dag::iterator a true iterator in the std:: sense? A1.2 Yes, in fact it inherits from std::iterator (which just provides several typedefs) so that methods appearing in the std:: library for generic containers will work. 2. Basics of the Dag and related classes 2.1 Class structure 2.1.1 The following classes are visisble to the user of Dag: Dag (Directed Acyclic Graph) Dag::iterator Dag::const_iterator An iterator, in the same sense that you have an iterator into a std::list of objects. NodeList A class representing some subset of the nodes of a graph, having those relationships induced by those relationships in the graph involving only nodes in that subset. A NodeList is NOT a container of Particles in the sense that a Dag is: It does not "own" the Particles it represents. The NodeList is basically an ordered list of specific iterators in the Dag. NodeList::iterator NodeList::const_iterator An iterator associated with a NodeList is set up to behave just like a Dag::iterator, except that when you iterate on it, it covers only those nodes which are in the defined sequence. 2.2 Constructors: Dag(); // create an empty Dag Dag(const Dag &); // copy another Dag ## Use Case 2.2: class Particle {...}; Dag thisCollection; 2.3 Adding Particles One at a Time Dag::iterator insert(const T & p); Dag::iterator insertChild(const T & p, Dag::iterator iparent); ## Use Case 2.3: // (Assume Event is a public Dag as in use case 1.4) Particle p1, p2, p3, p4, p5; Event thisEvent; Event::iterator ip1 = thisEvent.insert(p1); Event::iterator ip2 = thisEvent.insertChild(p2,ip1); thisEvent.insertChild(p3,ip1); thisEvent.insertChild(p4,ip2); Event::iterator ip5 = thisEvent.insert(p5); // Structure of thisEvent is now p1--->p2--->p4 // p1--->p3 // p5 If you don't need to refer to the newly added Particle explicitly, you are of course free to discard the returned iterator, as was done for p3 and p4. 2.4 Referencing a Particle The object of type T at the node represented by an iterator ip -- in StdHep type T is normally a Particle -- may be accessed by dereferencing ip. The syntax and semantics for this is just like for std:: containers. The following are methods of Dag::iterators: T& operator*(); T* operator->(); These allow not only the naive dereferencing operations *ip, but also pointer operations to get at a data member or member function of the Particle that *ip refers to. ## Use Case 2.4: struct Particle { double energy; double momentum; double getMass(); }; class Event : public Dag {}; Event thisEvent; Particle p1 = new Particle (whatever); Event::iterator ip1 = thisEvent.insert(p1); delete p1; // The Dag still has a copy of it! // Now let's use it via ip1. // We illustrate five basic ways to use the iterator: // Reading and writing the whole object, // reading and writing a data member, and invoking // a member function. Particle p2 = *ip1; *ip1 = p2; double x = ip1->energy; ip1->momentum = .9*x; x = ip1->getMass(); 2.5 Adding Parent/Child relationships A parent-child relationship between two nodes may be specified when a node is added to the Dag. You can also declare a parent-child relationship involving nodes which are already in the Dag. bool parentChild (Dag::iterator parent, Dag::iterator child); A check is performed: If relating the two nodes in this way leads to a closed cycle, a NotAcyclic exception is thrown or the return boolean comes back false. ## Use Case 2.5: // (Assume thisEvent is a public Dag which had 5 Particles // added as in use case 2.3) thisEvent.parentChild(ip5,ip1); // Structure of thisEvent is now p5--->p1--->p2--->p4 // p1--->p3 2.7 Relationship Properties of a Node bool isRoot(ip); // ip has no parents bool isLeaf(ip); // ip has no children bool isAncestor(ip1,ip2); // a chain of children goes ip1 to ip2 bool areConnected(ip1,ip2); // a chain of child and/or parent // relations connects ip1 and ip2 int nParents(ip); int nChildren(ip); iterator parent(ip); // find one (arbitray) parent of ip iterator root(ip); // find one root which is an ancestor 3. Iteration Concepts 3.1 The Dag::iterator class In the same way as an std:: container class has a sub-class called iterator, the Dag has Dag::iterator. Dag::iterator Dag::const_iterator 3.2 Iterating in Loops The Dag has various methods returning iterators, including: Dagiterator begin(); Dagiterator end(); Dag::iterators can be incremented (++i or i++) or decremented; the meaning is to point to the "next" node in the Dag, where "next" is defined by an order described in section 3.3. ## Use Case 3.2: // (Assume thisEvent is a public Dag which had 5 Particles // added as in use case 2.3). Say you want to apply the method // getChiqs() to each Particle in thisEvent and add the results. Event::iterator ip; double chisq = 0; for (ip = thisEvent.begin(); ip != thisEvent.end(); ip++) { chisq += ip->getChisq(); } This loop, of course, will cover each node in the Dag precisely once. *** If the structure of the Dag changes (by adding or erasing nodes or *** relationships) during such iterations, then the meaning of the *** iteration becomes undefined. 3.3 Order of iteration Certain guarantees are made about the order in which nodes will appear when you iterate by incrementing a Dag::iterator in the sequence generated by (ip = dag.begin(); ip != dag.end(); ip++). * This sequence covers each node in the Dag once. * If node A is the child of node B, then ip will refer to B before it will refer to A. 3.4 When the Graph Structure Changes A Dagiterator remains valid ("pointing to" the same Particle) even if other Particles have been inserted, or relationships have been added or severed, subsequent to the determination of that iterator. Of course, if the iterator's referent is erased, it is no longer valid, and dereferencing it will lead to unpredictable results. Although an iterator survives those changes in the structure of the graph without invalidating it for dereferencing, these changes can and do affect the results of sequences of iterators. In particular, changing the structure by adding Particles or relationships in the middle of a loop using iterators can be a bad idea. 3.5 Filling a Dag from a HepEvt Block A key use case is filling a Dag from a HepEvt block. Here we assume the block consists of Np Particles each i-th one having a list of D(i) daughters, identified by an index in a simple array. ## Use case 3.5: HepEvent h; Particle getP (HepEvent& hh, int i) { // ... extract i-th particle from hh } std::vector getDs (HepEvent& hh, int i) { // ... extract vector of daughter indices of i-th particle from hh } class Event : public Dag { ... } Event thisEvent; bool inserted[] = new bool[Np]; int i; for ( i=0; i < Np; i++ ) { inserted[i]=false; } for ( i=0; i < Np; i++ ) { if (inserted[i]) continue; Event::iterator iparent = thisEvent.insert ( getP (h, i) ); inserted[i] = true; std::vectordaughters = getDs (h, i); std::vector::iterator id; for ( id = 0; id != daughters.size(); ++id ) { if ( !inserted[id] ) { thisEvent.insertChild ( getP(h, id), iparent ); inserted[id] = true; } } } 3X. Concepts to support effcient storage 3X.1 UniqueID Although within an application, the natural way to access a Particle in a Dag is by an iterator (as discoussed in section 3), this would be unsuitable in cases where Events are stored for later use by some other application. To support that important usage, the Dag associates an integer UuniqueID with each Particle in it. The rules for assignment of these uniqueID's are: 1) They are assigned in the same order as the Particles were inserted into the Dag. 2) The start with one and are assigned sequentially, so that they can compress into two-byte unsigned shorts if it is known that there will be fewer than 64K particles in the Dag. 3) The uniqueID sticks with the Particle. If a Particle is ever removed from the Dag, that uniqueID becomes a "hole" and will not be reassigned to another Particle. There will be (trival) wethods to form the Dag::iterator from an int uniqueID, and vice-versa. There wil also be a method to form a vector of int uniqueID's given a NodeList (see next section) and so forth. The uniqueID identifies particles within an Event in a non-transient way. The issue of identifying a specific Event is not addressed by the Dag mechanism; Dag is subordinate to Event, and does not deal with concepts of orginization of multiple Events. In section 5 we discuss graphs with identical connectectivity structures induced from one another. The obvious natural statement about uniqueID's will hold: If Dag da is induced from Dag db, then the uniqueID of any item a in da is the same as the uniqueID of the corresponding item b in db. 3X.2 GraphConnectivity and support for split-up bytestream i/o For efficient I/O, it is sometimes desirable to split Particles into there various attributes in some way, grouping all information of a specific type to support rapid read access for applications that do not need the full Particle information. For example, ROOT can take advantage of this strategy to greatly improve performance by limiting I/O requirements. Three operations are needed to output an event in this manner: 1) The ability to get each of the Particles, one at a time, so as to be able to split off and recombine information in an arbitrary way. A simple iterator loop suffices to do this. 2) Some way to acquire just the graph connectivity of the Dag, which would supplement the plain Particle outputs. Dag will have a getGraphConnectivity() method, which will yield the desired data. The GraphConnectivity strucutre is necessarily of variable length; it may be prudent to structure it as N blocks of some fixed shape so that storage mechanisms such as those in ROOT can deal with it effectively. 3) A scheme to read back the GraphConnectivity and populate the Dag with one Particle at a time, in the same order they appeared in step (1), to recover the same Dag. This will mean having a constructor of Dag taking a GraphConnectivity. Once you have this, one fill the Dag with the particles or any part of them by doing something like: ## Use case 3X.2 GraphConnectivity gc; // somehow read back in vector v; // data from **some** fields of the particles Dag d (gc); iv = v.begin(); for (Dag::iterator i=d.begin(); i!=d.end(); ++i) { *i = *iv++; } 4. Dag::NodeList -- Lists within a Dag The Dag::NodeList class is provided to handle a subset of nodes in the graph, keeping intact the relationship structure as applied to that subset of nodes. For example, the set of all children of one Particle, or the descendant tree stemming from one Particle, would be a NodeList. The idea is that the NodeList has an iterator which can be used with the same syntax as a Dag::iterator. 4.1 Constructing and Filling a NodeList An empty Dag::NodeList can be constructed from the Dag it will refer to: NodeList (Dag & dag); The structure of the graph itself is never affected through the NodeList: * You can insert into a NodeList Dag::iterators. You cannot insert T's. * There are no methods of NodeList to establish relationships: no insertChild() or ParentChild(). All relationships stem from the underlying Dag. The insert method lets you add any node in a Dag to a NodeList associated with that Dag. ## Use Case 4.1.1: // (Assume thisEvent is a public Dag which had 5 Particles // as in use case 2.5). Say you want a NodeList containing p1 // and p5. Event::NodeList nlist (thisEvent); nlist.insert (ip1); nlist.insert (ip5); // Structure of nList is now p5--->p1 because in 2.5, // ip5 was the parent of ip1 There are methods of the Dag to form special lists of nodes, relative to a given node (which is specified by an iterator ip). NodeList children (iterator ip); NodeList parents (iterator ip); NodeList descendants (iterator ip); NodeList ancestors (iterator ip); NodeList leaves (iterator ip); NodeList roots (iterator ip); NodeList connectedNodes (iterator ip); This is the list of all nodes which are connected to ip via some series of parent/child relationships. There are also methods of the Dag to form special lists of nodes, without reference to some specific node: NodeList leaves(); NodeList roots(); 4.2 The NodeList::iterator A Dag::NodeList::iterator behaves like a "pointer to" a Particle in a DagNodeList. The primary purpose is to increment the iterator to iterate in a loop over the contents of the NodeList, as discussed in section 4.3. The syntax for dereferencing a NodeList iterator is identical to that for a Dag::iterator. The following are methods of Dag::NodeList::iterators: T& operator*(); T* operator->(); 4.3 Loops Over NodeLists Using NodeList iterators to estblish loops over their contents is precisely the same in syntax as using Dag iterators. The NodeList has various methods returning iterators, including: Dag::NodeList::iterator begin(); Dag::NodeList::iterator end(); NodeList iterators can be incremented (++i or i++) or decremented; the meaning is to point to the "next" node in the NodeList, where "next" is defined by the order described in section 3.3 for nodes in a Dag, omitting any nodes which are not in the NodeList. ## Use Case 4.3: // (Assume thisEvent is a public Dag which had 5 Particles // added as in use case 2.3). Say you want to apply the method // getChiqs() to all the nodes in the tree originating from p1, // and add the results. Event::NodeList tree = thisEvent.tree(ip1); double chisq = 0; Event::NodeList::iterator itr; for (itr = tree.begin(); itr != tree.end(); itr++) { chisq += itr->getChisq(); } This loop, of course, will cover each node in the daughter tree of p1 precisely once. 4.6 Using a NodeList to add to a Dag or Form a New Dag There are three ways to use a Dag::NodeList to put nodes into a Dag. 1) Dag ( const Dag::NodeList & nl ); This forms a new Dag from a NodeList which must be based on a Dag with the same type of Particle T. Note that this implies copy semantics: If you change a particle in the second Dag, that will not affect the corresponding Particle in the first. 2) Dag::iterator insert ( const Dag::NodeList nl, Dag::NodeList::iterator ip ); This inserts into the Dag a NodeList based on some OTHER Dag (of the same type T) and returns an iterator the newly inserted node in the modified Dag corresponding to the specified iterator in the NodeList. Copy semantics hold. 3) Dag::iterator insertChild ( const Dag::NodeList nl, Dag::NodeList::iterator ip, Dag::iterator iparent ); This inserts into the Dag a NodeList based on some OTHER Dag (of the same type T), connecting the node corresponding to the specified iterator in the NodeList as the child of the specified parent node (which must be in the Dag being added to.) 4.7 Other NodeList Operations NodeList union ( Dag::NodeList & n1, Dag::NodeList & n2 ); NodeList intersection( Dag::NodeList & n1, Dag::NodeList & n2 ); NodeList complement ( Dag::NodeList & n1 ); iterator DagIterator ( Dag::NodeList::iterator it ); bool isInNodeList (Dag::iterator ip, Dag::NodeList & nl ) 5. Dags with Identical Structure Sometimes it is convenient to copy just the relationship and order of iteration structure of a graph, and assign a completely different type of value to the data at each node. This sort of thing can be used when you wish to extend the data in a Dag by adding some computed quantitiy to the information kept for each node, or when you need to keep of coverage of nodes in a graph for the purpose of non-repeating iterations down a tree. 5.1 Inducing Dag Structure To cause an empty Dag dagu to have the same graph structure as some Dag dagt, use the following method: induceFrom (dagt, dagu); If you traverse iterators in the identical way, they will remain "in sync" between the two graphs. ## Use case 5.1 Say there is a method of Particle that returns the nominal mass minus the kinematic sqrt(E**2-p**2) mass, and you want to compute those values and store them in a Dag parallel to the event's Dag: class Event : public Dag { // ... as usual } Event evt; Dag massDiscrepancy; induceFrom (evt, massDiscrepancy); Event::iterator ip; Dag::iterator imd; for (ip=event.begin(), imd=massDiscrepancy.begin(); ip!=event.end(); ip++, imd++) { *imd = ip->massDiscrepancy(); } 5.2 Corresponding Iterators In two Dag's dagt and dagu where one has structure induced from the other (and assuming the structure has not changed since that point), there is a method to generate the iterator in dagu corresponding to some iterator in dagt. template Dag::iterator correspondingNode (Dag, Dag, Dag::iterator) The use of this method is illustrated in use case 5.3, where a Dag of integers is established to parallel and shadow a Dag of Particles, which is traversed in a non-trivial manner. The uniqueID of any item in dagt will be the same as the uniqueID of the corresponding item in dagu. If either the original Dag or the induced Dag undergo structural changes, then the concept of correspondence is severely weakened. 5.3 Output of a Dag A key case illustrating how induced Dag's can be useful is the task of streaming a Dag to an ostream. The object is to output each particle in an indented tree manner: 1 p1 __ 2 p1.1 ____ 3 p1.1.1 __ 4 p1.2 __ 5 p1.3 ____ 6 p1.3.1 ____ 7 p1.3.2 __ {see 3} __ 8 p1.3.3 9 p2 ... In this illustration, p1.1 is a child of p1 and so forth; and I am abbreviating the streamed output of each Particle by p1.2.3 or whatever. Note the {see 3} near the bottom -- Particle 3 (p1.1.1) happens to have tow parents, p1.1 and p1. When it is listed as a child of p1, the full info has already been output so instead we want the output routine to just say {see 3}. The "extra" Dag we will keep is a shadow of the structure of the Dag event, with an integer at each node to use as the number to print out in the output shown above. This value does double duty, becoming negative if the particle has already been output. We assume that there is a method Particle::output(int nIndent) which outputs one Particle's information, indented by that many spaces on lines past the first. ## Use Case 5.3 void outputParticleTree ( Event ev, NodeList::iterator & ir, int nIndent, Dag & nodes, ostream & os ) { Dag::iterator i = correspondingNode ( nodes, ev, ir); iN = *i; nSpaces ( os, nIndent ); // nSpaces outputs nIndent underscores and a space if (iN < 0) { nSpaces (os, nIndent); cout << "{See Particle " << -iN << "}\n"; return; } ir->output (nIndent); // output the info for this particle NodeList theChildren = ir->children(); NodeList::iterator ic; for ( ic = theChildren.begin(); ic != theChildren.end(); ic++ ) { outputParticleTree ( ic, nIndent+2, nodes, os ); } *i = -iN; // mark this node has having been hit now. return; } ostream & operator << ( ostream & os, Event & event ) Dagnodes; induceFrom(event, nodes); Dag::iterator i; int n; for (i=nodes.begin(), n=1; i != nodes.end(); i++) { *i = n++; // label each node 1...n; } // positive means we have not hit this node yet. Dag::iterator ip; NodeList theRoots = event.roots(); NodeList::iterator ir; for (ir=theRoots.begin(); ir != theRoots.end(); ir++) { int nIndent = 0; outputParticleTree ( event, ir, nIndent, nodes, os ); } return os; } This use case could also be coded taking advantage of the uniqueID to label nodes and to track which nodes have already been output. 6. Further Operations 6.1 Container Properties and Operations bool empty(); size_t size(); Dag::iterator i = dag.find(dag.begin(),dag.end(), pred); int n = dag.count(dag.begin(),dag.end(), pred); bool b = dag1.equal(dag1.begin(),dag1.end(),dag2.begin()); 6.2 Streams Input/output Assuming that type T has stream operators << and >>, Dag can be output and input using operator << and >> as well. The format of output will be that described in section 5.3; to accomplish this, the output of any given Particle will be inspected for newline characters so that the appropriate indentation can be prepended for each line. For input, the identical output string of bytes will of course be accepted. 6.4 Does a Dag of pointers make sense? A Dag, like the std:: containers, owns its contents, in the same sense that a std::list or vector owns its contents. A user can create a Dag of pointers; if so, the same capabilities, warnings, and traps that apply to a list of pointers apply to the Dag of pointers. 7. Side Issues 7.1 Exceptional Situations The Dag template is intended to work on a broad spectrum of compilers and be available in a variety of situations. In particular, there will be a #define to suppress all use of the exceptions mechanism. All methods which sensibly could have an exceptional condition occur -- for example, adding a relationship to a Dag -- return a boolean indicating whether the operation was valid. In the example case, if adding the relationship would force a cyclic set of relationships, then it is forbidden and the method will return false. 7.4 Warnings, Discussions, Decisions, and Open Questions 2 - One can imagine having a method which loops controlled by a NodeList, but does things in a recursive manner based on the relationships. This would "hide" the recursive nature of such an endeavor, and might make trivial operations scuh as summing some function over a descendants tree. The general case of such an operation requires specifying the NodeList and traversal scheme (bottom-up or top-down would be typical) and: Assuming one made certain assumptions, such as that the combining methods must be operator+=, the signature for this method would look something like template Result doTreeBottomUp ( Dag::iterator it, Primitive pget (Result&, T&) ); I suppose one could provide this. Open Questions: --------------- 5 - What sorts of special loops should be supported?