connected_components.rst 6.7 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147
  1. .. Copyright (C) 2004-2008 The Trustees of Indiana University.
  2. Use, modification and distribution is subject to the Boost Software
  3. License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  4. http://www.boost.org/LICENSE_1_0.txt)
  5. ===========================
  6. |Logo| Connected Components
  7. ===========================
  8. ::
  9. namespace graph {
  10. // Default constructed ParentMap
  11. template<typename Graph, typename ComponentMap, typename ParentMap>
  12. typename property_traits<ComponentMap>::value_type
  13. connected_components( const Graph& g, ComponentMap c);
  14. // User supplied ParentMap
  15. template<typename Graph, typename ComponentMap, typename ParentMap>
  16. typename property_traits<ComponentMap>::value_type
  17. connected_components( const Graph& g, ComponentMap c, ParentMap p);
  18. }
  19. The ``connected_components()`` function computes the connected
  20. components of an undirected graph. The distributed connected
  21. components algorithm uses the sequential version of the connected
  22. components algorithm to compute the connected components of the local
  23. subgraph, then executes the parallel phase of the algorithm. The
  24. parallel portion of the connected components algorithm is loosely
  25. based on the work of Goddard, Kumar, and Prins. The interface is a
  26. superset of the interface to the BGL `sequential connected
  27. components`_ algorithm.
  28. Prior to executing the sequential phase of the algorithm, each process
  29. identifies the roots of its local components. An adjacency list of
  30. all vertices adjacent to members of the component is then constructed
  31. at the root vertex of each component.
  32. The parallel phase of the distributed connected components algorithm
  33. consists of a series of supersteps. In each superstep, each root
  34. attempts to hook to a member of it's adjacency list by assigning it's
  35. parent pointer to that vertex. Hooking is restricted to vertices
  36. which are logically less than the current vertex to prevent looping.
  37. Vertices which hook successfully are removed from the list of roots
  38. and placed on another list of completed vertices. All completed
  39. vertices now execute a pointer jumping step until every completed
  40. vertex has as its parent the root of a component. This pointer
  41. jumping step may be further optimized by the addition of Cycle
  42. Reduction (CR) rules developed by Johnson and Metaxas, however current
  43. performance evaluations indicate that this would have a negligible
  44. impact on the overall performance of the algorithm. These CR rules
  45. reduce the number of pointer jumping steps from *O(n)* to *O(log n)*.
  46. Following this pointer jumping step, roots which have hooked in this
  47. phase transmit their adjacency list to their new parent. The
  48. remaining roots receive these edges and execute a pruning step on
  49. their adjacency lists to remove vertices that are now members of their
  50. component. The parallel phase of the algorithm is complete when no
  51. root successfully hooks. Once the parallel phase is complete a final
  52. pointer jumping step is performed on all vertices to assign the parent
  53. pointers of the leaves of the initial local subgraph components to
  54. their final parent which has now been determined.
  55. The single largest performance bottleneck in the distributed connected
  56. components algorithm is the effect of poor vertex distribution on the
  57. algorithm. For sparse graphs with a single large component, many
  58. roots may hook to the same component, resulting in severe load
  59. imbalance at the process owning this component. Several methods of
  60. modifying the hooking strategy to avoid this behavior have been
  61. implemented but none has been successful as of yet.
  62. .. contents::
  63. Where Defined
  64. -------------
  65. <``boost/graph/connected_components.hpp``>
  66. Parameters
  67. ----------
  68. IN: ``Graph& g``
  69. The graph typed must be a model of `Distributed Graph`_.
  70. OUT: ``ComponentMap c``
  71. The algorithm computes how many connected components are in the
  72. graph, and assigns each component an integer label. The algorithm
  73. then records to which component each vertex in the graph belongs by
  74. recording the component number in the component property map. The
  75. ``ComponentMap`` type must be a `Distributed Property Map`_. The
  76. value type must be the ``vertices_size_type`` of the graph. The key
  77. type must be the graph's vertex descriptor type. If you do not wish
  78. to compute component numbers, pass ``dummy_property_map`` as the
  79. component map and parent information will be provided in the parent
  80. map.
  81. UTIL: ``ParentMap p``
  82. A parent map may be supplied to the algorithm, if not supplied the
  83. parent map will be constructed automatically. The ``ParentMap`` type
  84. must be a `Distributed Property Map`_. The value type and key type
  85. must be the graph's vertex descriptor type.
  86. OUT: ``property_traits<ComponentMap>::value_type``
  87. The number of components found will be returned as the value type of
  88. the component map.
  89. Complexity
  90. ----------
  91. The local phase of the algorithm is *O(V + E)*. The parallel phase of
  92. the algorithm requires at most *O(d)* supersteps where *d* is the
  93. number of initial roots. *d* is at most *O(V)* but becomes
  94. significantly smaller as *E* increases. The pointer jumping phase
  95. within each superstep requires at most *O(c)* steps on each of the
  96. completed roots where *c* is the length of the longest cycle.
  97. Application of CR rules can reduce this to *O(log c)*.
  98. Performance
  99. -----------
  100. The following charts illustrate the performance of the Parallel BGL
  101. connected components algorithm. It performs well on very sparse and
  102. very dense graphs. However, for cases where the graph has a medium
  103. density with a giant connected component, the algorithm will perform
  104. poorly. This is a known problem with the algorithm and as far as we
  105. know all implemented algorithms have this degenerate behavior.
  106. .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeSparse&columns=9
  107. :align: left
  108. .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeSparse&columns=9&speedup=1
  109. .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeDense&columns=9
  110. :align: left
  111. .. image:: http://www.osl.iu.edu/research/pbgl/performance/chart.php?generator=ER,SF,SW&dataset=TimeDense&columns=9&speedup=1
  112. -----------------------------------------------------------------------------
  113. Copyright (C) 2004 The Trustees of Indiana University.
  114. Authors: Nick Edmonds, Douglas Gregor, and Andrew Lumsdaine
  115. .. |Logo| image:: pbgl-logo.png
  116. :align: middle
  117. :alt: Parallel BGL
  118. :target: http://www.osl.iu.edu/research/pbgl
  119. .. _Sequential connected components: http://www.boost.org/libs/graph/doc/connected_components.html
  120. .. _Distributed Graph: DistributedGraph.html
  121. .. _Distributed Property Map: distributed_property_map.html