Overview
The Minimum Spanning Tree (MST) algorithm computes the spanning tree with the minimum sum of edge weights for each connected component in a graph.
The MST has various applications, such as network design, clustering, and optimization problems where minimizing the total cost or weight is important.
Concepts
Spanning Tree
A spanning tree is a connected subgraph that all nodes of a connected graph G= = (V, E) (or a connected component) and forms a tree (i.e., no circles). There may exist multiple spanning trees for a graph, and each spanning tree must have (|V| - 1) edges.
The 11 nodes in the graph below, along with the 10 edges highlighted in red, form a spanning tree of this graph:
Minimum Spanning Tree (MST)
The MST is the spanning tree that has the minimum sum of edge weights. The construction of the MST starts from a given start node. The choice of the start node does not impact the correctness of the MST algorithm, but it can affect the structure of the MST and the order in which edges are added. Different start nodes may result in different MSTs, but all of them will be valid MSTs for the given graph.
After assigning edge weights to the above example, the three possible MSTs with different start nodes are highlighted in red below:
Regarding the selection of start nodes:
- Each connected component requires only one start node. If multiple start nodes are specified, the first one will be considered valid.
- No MST will be computed for connected components that do not have a specified start node.
- Isolated nodes are not considered valid start nodes for computing the MST.
Considerations
- The MST algorithm ignores the direction of edges but calculates them as undirected edges.
Example Graph
To create this graph:
// Runs each row separately in order in an empty graphset
create().node_schema("electricCenter").node_schema("village").edge_schema("connects")
create().edge_property(@connects, "distance", float)
insert().into(@electricCenter).nodes([{_id:"A"}])
insert().into(@village).nodes([{_id:"B"}, {_id:"C"}, {_id:"D"}, {_id:"E"}, {_id:"F"}, {_id:"G"}, {_id:"H"}])
insert().into(@connects).edges([{_from:"A", _to:"B", distance: 1}, {_from:"B", _to:"C", distance: 1.3},{_from:"C", _to:"D", distance: 1}, {_from:"A", _to:"D", distance: 1.2}, {_from:"A", _to:"C", distance: 2.4}, {_from:"D", _to:"H", distance: 1.65}, {_from:"A", _to:"H", distance: 0.4}, {_from:"A", _to:"E", distance: 0.7}, {_from:"A", _to:"F", distance: 2.2}, {_from:"A", _to:"G", distance: 1.6}, {_from:"E", _to:"G", distance: 0.9}, {_from:"E", _to:"F", distance: 1.27}, {_from:"F", _to:"G", distance: 0.45}])
Creating HDC Graph
To load the entire graph to the HDC server hdc-server-1
as hdc_mst
:
CALL hdc.graph.create("hdc-server-1", "hdc_mst", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
})
hdc.graph.create("hdc_mst", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
}).to("hdc-server-1")
Parameters
Algorithm name: algo(mst)
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
ids |
[]_id |
/ | / | Yes | Specifies the start node by its _id for each connected component; the system will chooses the start nodes if it is unset. |
uuids |
[]_uuid |
/ | / | Yes | Specifies the start node by its _uuid for each connected component; the system will chooses the start nodes if it is unset. |
edge_schema_property |
[]"<@schema.?><property> " |
/ | / | No | Numeric edge properties used as weights; for each edge, the specified property with the smallest value is considered as its weight; edges without this property are ignored. |
return_id_uuid |
String | uuid , id , both |
uuid |
Yes | Includes _uuid , _id , or both to represent nodes in the results. Edges can only be represented by _uuid ; this option is only valid in File Writeback. |
limit |
Integer | ≥-1 | -1 |
Yes | Limits the number of results returned; -1 includes all results. |
File Writeback
CALL algo.mst.write("hdc_mst", {
params: {
return_id_uuid: "id",
ids: ["A"],
edge_schema_property: "distance"
},
return_params: {
file: {
filename: "paths"
}
}
})
algo(mst).params({
project: "hdc_mst",
return_id_uuid: "id",
ids: ["A"],
edge_schema_property: "distance"
}).write({
file: {
filename: "paths"
}
})
Result:
A--[107]--H
A--[108]--E
E--[111]--G
F--[113]--G
A--[101]--B
A--[104]--D
C--[103]--D
Full Return
CALL algo.mst("hdc_mst", {
params: {
ids: ["A"],
edge_schema_property: "@connects.distance"
},
return_params: {}
}) YIELD mst
RETURN mst
exec{
algo(mst).params({
ids: ["A"],
edge_schema_property: "@connects.distance"
}) as mst
return mst
} on hdc_mst
Result:
Stream Return
CALL algo.mst("hdc_mst", {
params: {
ids: ["A"],
edge_schema_property: "distance"
},
return_params: {}
}) YIELD mst
FOR e1 IN pedges(mst)
MATCH ()-[e2 WHERE e2._uuid = e1._uuid]->()
RETURN sum(e2.distance)
exec{
algo(mst).params({
ids: ["A"],
edge_schema_property: "distance"
}).stream() as mst
uncollect pedges(mst) as e1
find().edges({_uuid == e1._uuid}) as e2
return sum(e2.distance)
} on hdc_mst
Result: 5.65