Overview
Closeness centrality of a node is measured by the average shortest distance from the node to all other reachable nodes. The closer a node is to all other nodes, the more central the node is. This algorithm is widely used in applications such as discovering key social nodes and finding best locations for functional places.
Closeness Centrality algorithm is best to be applied in connected graph. For disconnected graph, its variant, the Harmonic Centrality, is recommended.
Closeness centrality takes on values between 0 to 1, nodes with higher scores have shorter distances to all other nodes.
Closeness centrality was originally defined by Alex Bavelas in 1950:
- A. Bavelas, Communication patterns in task-oriented groups (1950)
Concepts
Shortest Distance
The shortest distance of two nodes is the number of edges contained in the shortest path between them. Shortest path is searched by the BFS principle, if node A is regarded as the start node and node B is one of the K-hop neighbors of node A, then K is the shortest distance between A and B. Please read K-Hop All for the details about BFS and K-hop neighbor.
Examine the shortest distance between the red and green nodes in the above graph. Since the graph is undirected, no matter which node (red or green) to start, the other node is the 2-hop neighbor. Thus, the shortest distance between them is 2.
Examine the shortest distance between the red and green nodes after converting the undirected graph to directed graph, the edge direction should be considered now. Outgoing shortest distance from the red node to the green node is 4, incoming shortest distance from the green node to the red node is 3.
When edge weights are considered, the shortest distance between two nodes is the least sum of weights of the edges in the path between them.
Examine the shortest distance between the red and green nodes after assigning weights to edges. To minimize the total weight, a path with more edges is chosen, resulting in a total weight of 5.
Closeness Centrality
Closeness centrality score of a node defined by this algorithm is the inverse of the arithmetic mean of the shortest distances from the node to all other reachable nodes. The formula is:
where x
is the target node, y
is any node that connects with x
along edges (x
itself is excluded), k-1
is the number of y
, d(x,y)
is the shortest distance between x
and y
.
Calculate closeness centrality score of the red node in the incoming direction in the graph above. Only the blue, yellow and purple three nodes can reach the red node in this direction, so the score is 3 / (2 + 1 + 2) = 0.6
. Since the green and grey nodes cannot reach the red node in the incoming direction, they are not included in the calculation.
Closeness Centrality algorithm consumes considerable computing resources. For a graph with V nodes, it is recommended to perform (uniform) sampling when V > 10,000, and the suggested number of samples is the base-10 logarithm of the number of nodes (
log(V)
).
For each execution of the algorithm, sampling is performed only once, centrality score of each node is computed based on the shortest distance between the node and all sample nodes.
Considerations
- The closeness centrality score of isolated nodes is 0.
- When computing closeness centrality for a node, the unreachable nodes are excluded. For example, isolated nodes, nodes in other connected components, or nodes in the same connected component although cannot access in the specified direction, etc.
Example Graph
To create this graph:
// Runs each row separately in order in an empty graphset
create().node_schema("user").edge_schema("vote")
create().edge_property(@vote, "strength", uint32)
insert().into(@user).nodes([{_id:"A"},{_id:"B"},{_id:"C"},{_id:"D"},{_id:"E"},{_id:"F"},{_id:"G"},{_id:"H"}])
insert().into(@vote).edges([{_from:"A", _to:"B", strength:1}, {_from:"A", _to:"E", strength:3}, {_from:"B", _to:"B", strength:1}, {_from:"B", _to:"C", strength:2}, {_from:"C", _to:"A", strength:3}, {_from:"D", _to:"A", strength:2}, {_from:"F", _to:"G", strength:2}, {_from:"G", _to:"B", strength:3}, {_from:"H", _to:"G", strength:1}])
Running on HDC Graphs
Creating HDC Graph
To load the entire graph to the HDC server hdc-server-1
as hdc_cc
:
CALL hdc.graph.create("hdc-server-1", "hdc_cc", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
})
hdc.graph.create("hdc_cc", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
}).to("hdc-server-1")
Parameters
Algorithm name: closeness_centrality
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
ids |
[]_id |
/ | / | Yes | Specifies nodes for computation by their _id ; computes for all nodes if it is unset. |
uuids |
[]_uuid |
/ | / | Yes | Specifies nodes for computation by their _uuid ; computes for all nodes if it is unset. |
direction |
String | in , out |
/ | Yes | Specifies that the shortest paths shoud only contain incoming edges (in ) or outgoing edges (out ). |
edge_schema_property |
[]"<@schema.?><property> " |
/ | / | Yes | Numeric edge properties used as weights, summing values across the specified properties; edges without this property are ignored. |
impl_type |
String | dijkstra , delta_stepping , spfa , beta |
beta |
Yes | Specifies the weighted shortest paths to be computed by the Dijkstra, Delta-Stepping, SPFA or the default (beta ) Ultipa shortest path algorithm. This is only valid when edge_schema_property is used. |
sample_size |
Integer | -1 , -2 , [1, |V|] |
-2 |
Yes | Specifies the sampling strategy for computation; Sets to -1 to sample log(|V|) nodes, or a number between [1, |V|] to sample a specific number of nodes (|V| is the total number of nodes in the graph). Sets to -2 to perform no sampling. This option is only valid when all nodes are involved in the computation. |
return_id_uuid |
String | uuid , id , both |
uuid |
Yes | Includes _uuid , _id , or both to represent nodes in the results. |
limit |
Integer | ≥-1 | -1 |
Yes | Limits the number of results returned; -1 includes all results. |
order |
String | asc , desc |
/ | Yes | Sorts the results by closeness_centrality . |
File Writeback
CALL algo.closeness_centrality.write("hdc_cc", {
params: {
return_id_uuid: "id"
},
return_params: {
file: {
filename: "closeness"
}
}
})
algo(closeness_centrality).params({
project: "hdc_cc",
return_id_uuid: "id"
}).write({
file: {
filename: "closeness"
}
})
Result:
_id,closeness_centrality
G,0.538462
D,0.388889
F,0.368421
H,0.368421
B,0.636364
A,0.583333
E,0.388889
C,0.5
DB Writeback
Writes the closeness_centrality
values from the results to the specified node property. The property type is float
.
CALL algo.closeness_centrality.write("hdc_cc", {
params: {},
return_params: {
db: {
property: 'cc'
}
}
})
algo(closeness_centrality).params({
project: "hdc_cc"
}).write({
db:{
property: 'cc'
}
})
Full Return
CALL algo.closeness_centrality("hdc_cc", {
params: {
return_id_uuid: "id",
ids: ["A", "B"],
edge_schema_property: "strength"
},
return_params: {}
}) YIELD cc
RETURN cc
exec{
algo(closeness_centrality).params({
return_id_uuid: "id",
ids: ["A", "B"],
edge_schema_property: "strength"
}) as cc
return cc
} on hdc_cc
Result:
_id | closeness_centrality |
---|---|
A | 0.291667 |
B | 0.318182 |
Stream Return
CALL algo.closeness_centrality("hdc_cc", {
params: {
return_id_uuid: "id",
direction : "out",
order: "desc",
sample_size: -2
},
return_params: {
stream: {}
}
}) YIELD cc
FILTER cc.closeness_centrality > 0.5
RETURN cc
exec{
algo(closeness_centrality).params({
return_id_uuid: "id",
direction : "out",
order: "desc",
sample_size: -2
}).stream() as cc
where cc.closeness_centrality > 0.5
return cc
} on hdc_cc
Result:
_id | closeness_centrality |
---|---|
A | 0.75 |
C | 0.6 |
Running on Distributed Projections
Creating Distributed Projection
To project the entire graph to its shard servers as dist_cc
:
create().projection("dist_cc", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true
})
Parameters
Algorithm name: closeness_centrality
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
direction |
String | in , out |
/ | Yes | Specifies that the shortest paths shoud only contain incoming edges (in ) or outgoing edges (out ). |
sample_rate |
Float | (0, 1] | / | Yes | Specifies the proportion of nodes to sample for computation. |
limit |
Integer | ≥-1 | -1 |
Yes | Limits the number of results returned; -1 includes all results. |
order |
String | asc , desc |
/ | Yes | Sorts the results by closeness_centrality . |
File Writeback
CALL algo.closeness_centrality.write("dist_cc", {
params: {},
return_params: {
file: {
filename: "closeness"
}
}
})
algo(closeness_centrality).params({
project: "dist_cc"
}).write({
file: {
filename: "closeness"
}
})
Result:
_id,closeness_centrality
H,0.368421052631579
F,0.368421052631579
E,0.388888888888889
D,0.388888888888889
B,0.636363636363636
A,0.583333333333333
G,0.538461538461538
C,0.5
DB Writeback
Writes the closeness_centrality
values from the results to the specified node property. The property type is double
.
CALL algo.closeness_centrality.write("dist_cc", {
params: {},
return_params: {
db: {
property: 'cc'
}
}
})
algo(closeness_centrality).params({
project: "dist_cc"
}).write({
db:{
property: 'cc'
}
})