
<HTML>
<!--
Copyright (c) Jeremy Siek 2000
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<Head>
<Title>Sparse Matrix Ordering Example</Title>
</Head>
<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
ALINK="#ff0000">
<IMG SRC="../../../boost.png"
ALT="C++ Boost" width="277" height="86">
<BR Clear>
<H1><A NAME="sec:sparse-matrix-ordering"></A>
Sparse Matrix Ordering
</H1>
<P>
Graph theory was identified as a powerful tool for sparse matrix
computation when Seymour Parter used undirected graphs to model
symmetric Gaussian elimination more than 30 years
ago&nbsp;[<A HREF="bibliography.html#parter61:_gauss">28</A>].
Graphs can be used to model symmetric matrices, factorizations, and
algorithms on non-symmetric matrices, such as fill paths in Gaussian
elimination, strongly connected components in matrix irreducibility,
bipartite matching, and alternating paths in linear dependence and
structural singularity. Not only do graphs make it easier to
understand and analyze sparse matrix algorithms, but they also broaden
the possibilities for manipulating sparse matrices with existing graph
algorithms and techniques&nbsp;[<A
HREF="bibliography.html#george93:graphtheory">13</A>]. In this section we
illustrate how to use the BGL in sparse matrix computations such as
ordering algorithms. Before we go further into the sparse matrix
algorithms, let us take a step back and review a few things.
<P>
<H2>Graphs and Sparse Matrices</H2>
<P>
A graph is fundamentally a way to represent a binary relation between
objects. The nonzero pattern of a sparse matrix also describes a
binary relation between unknowns. Therefore the nonzero pattern of a
sparse matrix of a linear system can be modeled with a graph
<I>G(V,E)</I>, whose <I>n</I> vertices in <I>V</I> represent the
<I>n</I> unknowns, and where there is an edge from vertex <I>i</I> to
vertex <I>j</I> when <i>A<sub>ij</sub></i> is nonzero. Thus, when a
matrix has a symmetric nonzero pattern, the corresponding graph is
undirected.
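<P>
As a concrete illustration (this snippet and its small matrix are
added here for exposition; they are not part of the original example
code), the following sketch builds the undirected BGL graph for the
symmetric nonzero pattern of a 5x5 matrix:
<pre>
#include &lt;boost/graph/adjacency_list.hpp&gt;
#include &lt;cstddef&gt;

int main()
{
  using namespace boost;
  typedef adjacency_list&lt;vecS, vecS, undirectedS&gt; Graph;

  // Strict upper-triangle nonzero pattern of a symmetric 5x5 matrix A:
  // one undirected edge (i,j) per nonzero A_ij with i &lt; j.
  const std::size_t n = 5;
  Graph G(n);
  const std::size_t nz[][2] = { {0,1}, {0,4}, {1,2}, {2,3}, {3,4} };
  for (std::size_t k = 0; k &lt; sizeof(nz) / sizeof(nz[0]); ++k)
    add_edge(nz[k][0], nz[k][1], G);
  // Diagonal entries A_ii correspond to the vertices themselves,
  // so no self-loop edges are added.
  return 0;
}
</pre>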
<P>
<H2>Sparse Matrix Ordering Algorithms</H2>
<P>
The process for solving a sparse symmetric positive definite linear
system, <i>Ax=b</i>, can be divided into four stages as follows:
<DL>
<DT><STRONG>Ordering:</STRONG></DT>
<DD>Find a permutation <i>P</i> of matrix <i>A</i>,
</DD>
<DT><STRONG>Symbolic factorization:</STRONG></DT>
<DD>Set up a data structure for the Cholesky
factor <i>L</i> of <i>PAP<sup>T</sup></i>,
</DD>
<DT><STRONG>Numerical factorization:</STRONG></DT>
<DD>Decompose <i>PAP<sup>T</sup></i> into <i>LL<sup>T</sup></i>,
</DD>
<DT><STRONG>Triangular system solution:</STRONG></DT>
<DD>Solve <i>LL<sup>T</sup>Px=Pb</i> for <i>x</i>.
</DD>
</DL>
Because the choice of permutation <i>P</i> will directly determine the
number of fill-in elements (elements present in the non-zero structure
of <i>L</i> that are not present in the non-zero structure of
<i>A</i>), the ordering has a significant impact on the memory and
computational requirements for the latter stages. However, finding
the optimal ordering for <i>A</i> (in the sense of minimizing fill-in)
has been proven to be NP-complete&nbsp;[<a
href="bibliography.html#garey79:computers-and-intractability">30</a>],
requiring that heuristics be used for all but simple (or specially
structured) cases.
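<P>
A standard textbook illustration (added here) shows why the
permutation matters so much. Consider an "arrow" matrix whose first
row and column are dense. Eliminating the dense variable first couples
all remaining unknowns, so <i>L</i> fills in completely; the reverse
permutation, which eliminates it last, produces no fill-in at all:
<pre>
  dense row/column first              dense row/column last
  [ x x x x ]                         [ x . . x ]
  [ x x . . ]   eliminating row 1     [ . x . x ]   each step touches
  [ x . x . ]   connects all other    [ . . x x ]   only the last row:
  [ x . . x ]   unknowns: full fill   [ x x x x ]   no fill-in
</pre>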
<P>
A widely used but rather simple ordering algorithm is a variant of the
Cuthill-McKee ordering, the reverse Cuthill-McKee ordering algorithm.
This algorithm can be used as a preordering method to improve ordering
in more sophisticated methods such as minimum degree
algorithms&nbsp;[<a href="bibliography.html#LIU:MMD">21</a>].
<P>
<H3><A NAME="SECTION001032100000000000000">
Reverse Cuthill-McKee Ordering Algorithm</A>
</H3>
The original Cuthill-McKee ordering algorithm is primarily designed to
reduce the profile of a matrix&nbsp;[<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>].
In 1971, George discovered that the reverse ordering often turned out
to be superior to the original ordering. Here we describe the
reverse Cuthill-McKee algorithm in terms of a graph:
<OL>
<LI><I>Finding a starting vertex</I>: Determine a starting vertex
<I>r</I> and assign <i>r</i> to <I>x<sub>1</sub></I>.
</LI>
<LI><I>Main part</I>: For <I>i=1,...,N</I>, find all
the unnumbered neighbors of the vertex <I>x<sub>i</sub></I> and number them
in increasing order of degree.
</LI>
<LI><I>Reversing ordering</I>: The reverse Cuthill-McKee ordering
is given by <I>y<sub>1</sub>,...,y<sub>N</sub></I> where
<I>y<sub>i</sub></I> is <I>x<sub>N-i+1</sub></I> for <I>i = 1,...,N</I>.
</LI>
</OL>
In the first step a good starting vertex needs to be determined. The
study by George and Liu&nbsp;[<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>] showed that
a pair of vertices which are at maximum or near-maximum "distance"
apart are good candidates. They also proposed an algorithm to find
such a starting vertex in&nbsp;[<A
HREF="bibliography.html#george81:__sparse_pos_def">14</A>], which we
show in <A HREF="#fig:ggcl_find_starting_vertex">Figure 1</A>,
implemented using the BGL interface.
<P>
<P></P>
<DIV ALIGN="CENTER"><A NAME="fig:ggcl_find_starting_vertex"></A>
<table border><tr><td>
<pre>
namespace boost {
  template &lt;class Graph, class Vertex, class Color, class Degree&gt;
  Vertex
  pseudo_peripheral_pair(Graph& G, const Vertex& u, int& ecc,
                         Color color, Degree degree)
  {
    typename property_traits&lt;Color&gt;::value_type c = get(color, u);

    rcm_queue&lt;Vertex, Degree&gt; Q(degree);

    // Color every vertex white, then run BFS from u; the custom
    // rcm_queue records the eccentricity of u and a far-away
    // "spouse" vertex.
    typename boost::graph_traits&lt;Graph&gt;::vertex_iterator ui, ui_end;
    for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
      put(color, *ui, white(c));
    breadth_first_search(G, u, Q, bfs_visitor&lt;&gt;(), color);

    ecc = Q.eccentricity();
    return Q.spouse();
  }

  template &lt;class Graph, class Vertex, class Color, class Degree&gt;
  Vertex find_starting_node(Graph& G, Vertex r, Color c, Degree d)
  {
    int eccen_r, eccen_x;
    Vertex x = pseudo_peripheral_pair(G, r, eccen_r, c, d);
    Vertex y = pseudo_peripheral_pair(G, x, eccen_x, c, d);

    // Keep jumping to the far end of the previous BFS until the
    // eccentricity stops growing.
    while (eccen_x &gt; eccen_r) {
      r = x;
      eccen_r = eccen_x;
      x = y;
      y = pseudo_peripheral_pair(G, x, eccen_x, c, d);
    }
    return x;
  }
} // namespace boost
</pre>
</td></tr></table>
<CAPTION ALIGN="BOTTOM">
<table><tr><td>
<STRONG>Figure 1:</STRONG>
The BGL implementation of <TT>find_starting_node</TT>. The key
part, <TT>pseudo_peripheral_pair</TT>, is essentially a BFS with a
custom queue type.
</td></tr></table></CAPTION>
</DIV>
<P></P>
The key part of step one is a BFS with the custom queue type, as shown
in <A HREF="#fig:ggcl_find_starting_vertex">Figure 1</A>. The main
part of the Cuthill-McKee algorithm, shown in
<A HREF="#fig:cuthill-mckee">Figure 2</A>, is a generalized BFS with a
fenced priority queue. If <TT>inverse_permutation</TT> is a normal
iterator, the result stored is the inverse permutation of the original
Cuthill-McKee ordering. On the other hand, if
<TT>inverse_permutation</TT> is a reverse iterator, the result stored
is the inverse permutation of the reverse Cuthill-McKee ordering. Our
implementation is quite concise because many components from the BGL
can be reused.
<P></P>
<DIV ALIGN="CENTER"><A NAME="fig:cuthill-mckee"></A><A NAME="6261"></A>
<table border><tr><td>
<pre>
template &lt;class Graph, class Vertex, class OutputIterator,
          class Color, class Degree&gt;
inline void
cuthill_mckee_ordering(Graph& G,
                       Vertex s,
                       OutputIterator inverse_permutation,
                       Color color, Degree degree)
{
  typedef typename property_traits&lt;Degree&gt;::value_type DS;
  typename property_traits&lt;Color&gt;::value_type c = get(color, s);

  // Vertices of equal BFS level are dequeued in increasing degree order.
  typedef indirect_cmp&lt;Degree, std::greater&lt;DS&gt; &gt; Compare;
  Compare comp(degree);
  fenced_priority_queue&lt;Vertex, Compare&gt; Q(comp);

  // The visitor writes each newly numbered vertex to the output
  // iterator, producing the (inverse) permutation.
  typedef cuthill_mckee_visitor&lt;OutputIterator&gt; CMVisitor;
  CMVisitor cm_visitor(inverse_permutation);

  typename boost::graph_traits&lt;Graph&gt;::vertex_iterator ui, ui_end;
  for (boost::tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
    put(color, *ui, white(c));
  breadth_first_search(G, s, Q, cm_visitor, color);
}
</pre>
</td></tr></table>
<CAPTION ALIGN="BOTTOM">
<TABLE>
<TR><TD> <STRONG>Figure 2:</STRONG> The BGL implementation of the
Cuthill-McKee algorithm. </TD></TR> </TABLE></CAPTION> </DIV><P></P>
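<P>
As a usage sketch (this driver code is added here for illustration;
the graph type, the external degree vector, and the call sequence are
our assumptions, not a fixed library interface), the routines of
Figures&nbsp;1 and&nbsp;2 might be combined as follows. Writing the
output through a reverse iterator is what turns the Cuthill-McKee
order into the <em>reverse</em> Cuthill-McKee order:
<pre>
#include &lt;vector&gt;
#include &lt;boost/graph/adjacency_list.hpp&gt;
#include &lt;boost/property_map/property_map.hpp&gt;

typedef boost::adjacency_list&lt;boost::vecS, boost::vecS, boost::undirectedS,
  boost::property&lt;boost::vertex_color_t, boost::default_color_type&gt; &gt; Graph;
typedef boost::graph_traits&lt;Graph&gt;::vertex_descriptor Vertex;

void order_graph(Graph& G)
{
  // External degree map: deg[v] = out_degree(v, G).
  std::vector&lt;int&gt; deg(num_vertices(G));
  boost::graph_traits&lt;Graph&gt;::vertex_iterator vi, vi_end;
  for (boost::tie(vi, vi_end) = vertices(G); vi != vi_end; ++vi)
    deg[*vi] = out_degree(*vi, G);

  std::vector&lt;Vertex&gt; inv_perm(num_vertices(G));
  Vertex s = boost::find_starting_node(G, *vertices(G).first,
    get(boost::vertex_color, G),
    boost::make_iterator_property_map(deg.begin(),
                                      get(boost::vertex_index, G)));

  // inv_perm.rbegin() is a reverse iterator, so this stores the
  // inverse permutation of the *reverse* Cuthill-McKee ordering.
  boost::cuthill_mckee_ordering(G, s, inv_perm.rbegin(),
    get(boost::vertex_color, G),
    boost::make_iterator_property_map(deg.begin(),
                                      get(boost::vertex_index, G)));
}
</pre>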
<P>
<H3>Minimum Degree Ordering Algorithm</H3>
<P>
The pattern of another category of high-quality ordering algorithms in
wide use is based on a greedy approach in which the ordering is
chosen to minimize some quantity at each step of a simulated
<i>n</i>-step symmetric Gaussian elimination process. The algorithms
using such an approach are typically distinguished by their greedy
minimization criteria&nbsp;[<a href="bibliography.html#ng-raghavan">34</a>].
<P>
In graph terms, the basic ordering process used by most greedy
algorithms is as follows:
<OL>
<LI><EM>Start:</EM> Construct the undirected graph <i>G<sup>0</sup></i>
corresponding to the matrix <i>A</i>.
</LI>
<LI><EM>Iterate:</EM> For <i>k = 1, 2, ...</i> until
<i>G<sup>k</sup> = { }</i> do:
<UL>
<LI>Choose a vertex <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i>
according to some criterion.
</LI>
<LI>Eliminate <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> to form
<i>G<sup>k+1</sup></i>.
</LI>
</UL>
</LI>
</OL>
The resulting ordering is the sequence of vertices <i>{v<sup>0</sup>,
v<sup>1</sup>,...}</i> selected by the algorithm.
<P>
One of the most important examples of such an algorithm is the
<I>Minimum Degree</I> algorithm. At each step the minimum degree
algorithm chooses the vertex with minimum degree in the corresponding
graph as <i>v<sup>k</sup></i>. A number of enhancements to the basic
minimum degree algorithm have been developed, such as the use of a
quotient graph representation, mass elimination, incomplete degree
update, multiple elimination, and external degree. See&nbsp;[<A
href="bibliography.html#George:evolution">35</a>] for a historical
survey of the minimum degree algorithm.
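<P>
To make the elimination step concrete, here is a minimal,
self-contained sketch (an illustration written for this section, not
the BGL implementation) of the basic minimum degree loop: repeatedly
pick a remaining vertex of smallest degree, turn its neighborhood into
a clique (this models the fill-in of one elimination step), and remove
it:
<pre>
#include &lt;set&gt;
#include &lt;vector&gt;
#include &lt;cstddef&gt;

// ordering[k] = vertex eliminated at step k.
std::vector&lt;std::size_t&gt;
basic_minimum_degree(std::vector&lt; std::set&lt;std::size_t&gt; &gt; adj)
{
  std::vector&lt;std::size_t&gt; ordering;
  std::vector&lt;bool&gt; alive(adj.size(), true);
  for (std::size_t step = 0; step &lt; adj.size(); ++step) {
    // Choose the remaining vertex of minimum degree.
    std::size_t v = 0, best = adj.size() + 1;
    for (std::size_t u = 0; u &lt; adj.size(); ++u)
      if (alive[u] && adj[u].size() &lt; best) { v = u; best = adj[u].size(); }
    // Eliminate v: its neighbors become a clique (the fill-in edges).
    std::set&lt;std::size_t&gt;::iterator i, j;
    for (i = adj[v].begin(); i != adj[v].end(); ++i)
      for (j = adj[v].begin(); j != adj[v].end(); ++j)
        if (*i != *j) adj[*i].insert(*j);
    for (i = adj[v].begin(); i != adj[v].end(); ++i)
      adj[*i].erase(v);
    adj[v].clear();
    alive[v] = false;
    ordering.push_back(v);
  }
  return ordering;
}
</pre>
The real implementations avoid this quadratic bookkeeping with the
quotient-graph and degree-update enhancements mentioned above.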
<P>
The BGL implementation of the Minimum Degree algorithm closely follows
the algorithmic description in&nbsp;[<A
HREF="bibliography.html#LIU:MMD">21</A>]. The implementation presently
includes the enhancements for mass elimination, incomplete degree
update, multiple elimination, and external degree.
<P>
In particular, we create a graph representation to improve the
performance of the algorithm. It is based on a templated "vector of
vectors." The vector container used is an adaptor class built on top
of the STL <TT>std::vector</TT> class. Particular characteristics of
this adaptor class include the following:
<UL>
<LI>Erasing elements does not shrink the associated memory.
Adding new elements after erasing will not need to allocate
additional memory.
</LI>
<LI>Additional memory is allocated efficiently on demand when new
elements are added (doubling the capacity every time it is
increased). This property comes from the STL vector.
</LI>
</UL>
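<P>
A minimal sketch of such an adaptor (the class name and interface here
are hypothetical, shown only to illustrate the two properties above)
could look like this:
<pre>
#include &lt;vector&gt;
#include &lt;cstddef&gt;

// A vector that never returns memory on erase: removal is O(1) by
// swapping with the last element, and capacity only ever grows
// (std::vector already doubles its capacity on demand).
template &lt;class T&gt;
class lazy_erase_vector {
  std::vector&lt;T&gt; data_;
public:
  void push_back(const T& x) { data_.push_back(x); }
  void erase_at(std::size_t i) {        // element order is not preserved
    data_[i] = data_.back();
    data_.pop_back();                   // capacity is unchanged
  }
  std::size_t size() const { return data_.size(); }
  T& operator[](std::size_t i) { return data_[i]; }
};
</pre>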
<P>
Note that this representation is similar to that used in Liu's
implementation, with some important differences due to dynamic memory
allocation. With dynamic memory allocation we do not need to
overwrite portions of the graph that have been eliminated,
allowing for a more efficient graph traversal. More importantly,
information about the elimination graph is preserved, allowing for
trivial symbolic factorization. Since symbolic factorization can be
an expensive part of the entire solution process, improving its
performance can result in significant computational savings.
<P>
The overhead of dynamic memory allocation could conceivably compromise
performance in some cases. However, in practice, memory allocation
overhead does not contribute significantly to the run-time of our
implementation, as shown in&nbsp;[], because allocation happens
infrequently and its cost is amortized.
<P>
<H2>Proper Self-Avoiding Walk</H2>
<P>
The finite element method (FEM) is a flexible and attractive numerical
approach for solving partial differential equations, since it handles
geometrically complicated domains in a straightforward way. However,
the unstructured meshes generated by FEM do not provide an obvious
labeling (numbering) of the unknowns, although such a labeling is
vital for the matrix-vector formulation of the underlying algebraic
equations. Special numbering techniques have been developed to
optimize memory usage and locality of such algorithms. One novel
technique is the self-avoiding walk&nbsp;[].
<P>
If we think of a mesh as a graph, a self-avoiding walk (SAW) over an
arbitrary unstructured two-dimensional mesh is an enumeration of all
the triangles of the mesh such that two successive triangles share an
edge or a vertex. A proper SAW (PSAW) is a SAW in which jumping twice
over the same vertex is forbidden for three consecutive triangles in
the walk. A PSAW can be used to improve the parallel efficiency of
several irregular algorithms, in particular for issues related to
reducing runtime memory access (improving locality) and interprocess
communication. The reference&nbsp;[] proves the existence
of a PSAW for any arbitrary triangular mesh by extending an existing
PSAW over a small piece of the mesh to a larger part. The proof
effectively provides a set of rules for constructing a PSAW.
<P>
The family of algorithms for constructing a PSAW on a mesh starts from
an arbitrary triangle of the mesh, chooses new triangles sharing an
edge with the current sub-mesh, and extends the existing partial PSAW
over the new triangles.
<P>
Let us define the dual graph of a mesh: each triangle in the mesh
becomes a vertex, and two triangles sharing an edge in the mesh give
rise to an edge between the corresponding vertices in the dual graph.
Using the dual graph of a mesh, one way to implement this family of
algorithms is to reuse BGL algorithms such as BFS and DFS with a
customized visitor that provides the operations performed during
traversal. The customized visitor has the function
<TT>tree_edge()</TT>, which extends the partial PSAW case by case, and
the function <TT>start_vertex()</TT>, which sets up the PSAW for the
starting vertex, as sketched below.
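<P>
The skeleton below (an illustration only; <TT>begin_psaw</TT> and
<TT>extend_psaw</TT> are hypothetical hooks standing in for the
case-by-case PSAW construction rules) shows how such a visitor plugs
into the BGL <TT>depth_first_search</TT> over the dual graph:
<pre>
#include &lt;boost/graph/adjacency_list.hpp&gt;
#include &lt;boost/graph/depth_first_search.hpp&gt;

typedef boost::adjacency_list&lt;boost::vecS, boost::vecS,
                              boost::undirectedS&gt; DualGraph;

struct psaw_visitor : public boost::default_dfs_visitor {
  // Called on the root of each DFS tree: set up the walk there.
  template &lt;class Vertex, class Graph&gt;
  void start_vertex(Vertex u, const Graph&) {
    // begin_psaw(u);  // hypothetical: initialize the PSAW at triangle u
  }
  // Called for each tree edge: extend the partial PSAW over the
  // newly reached triangle according to the construction rules.
  template &lt;class Edge, class Graph&gt;
  void tree_edge(Edge e, const Graph& g) {
    // extend_psaw(source(e, g), target(e, g));  // hypothetical hook
  }
};

void build_psaw(const DualGraph& g)
{
  psaw_visitor vis;
  boost::depth_first_search(g, boost::visitor(vis));
}
</pre>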
<br>
<HR>
<TABLE>
<TR valign=top>
<TD nowrap>Copyright &copy; 2000-2001</TD><TD>
<A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
</TD></TR></TABLE>
</BODY>
</HTML>