Overview
The k-Nearest Neighbors (kNN) algorithm is a classification technique that classifies the target node based on the classifications of its k nearest (most similar) nodes. kNN was proposed by T.M. Cover and P.E. Hart in 1967 and has since become one of the simplest and widely used classification algorithms:
- T.M. Cover, P.E. Hart, Nearest Neighbor Pattern Classification (1967)
Although containing the word neighbor in its name, kNN does not explicitly consider the edges between nodes when calculating similarity. Instead, it focuses solely on node properties.
Concepts
Similarity Metric
Ultipa's kNN algorithm computes pair-wise cosine similarity between the target node and all other nodes in the graph, then selects the k nodes with the highest similarity to the target node.
Vote on Classification
One node property is selected as the class label. After finding the nearest k nodes for the target node, assign the majority label among the k nodes to the target node.
If multiple labels occur with the highest frequency, the label of the node with the highest similarity will be selected.
Example Graph
To create this graph:
// Runs each row separately in order in an empty graphset
create().node_schema("image")
create().node_property(@image,"d1",int32).node_property(@image,"d2",int32).node_property(@image,"d3",int32).node_property(@image,"d4",int32).node_property(@image,"type",string)
insert().into(@image).nodes([{_id:"image1", d1:50, d2:160, d3:20, d4:35}, {_id:"image2", d1:42, d2:90, d3:30, d4:90, type:"Gold"}, {_id:"image3", d1:24, d2:50, d3:55, d4:70, type:"Silver"}, {_id:"image4", d1:38, d2:20, d3:32, d4:70, type:"Gold"}, {_id:"image5", d1:98, d2:10, d3:15, d4:36, type:"Copper"}, {_id:"image6", d1:51, d2:56, d3:44, d4:30, type:"Copper"}])
Creating HDC Graph
To load the entire graph to the HDC server hdc-server-1
as hdc_knn
:
CALL hdc.graph.create("hdc-server-1", "hdc_knn", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
})
hdc.graph.create("hdc_knn", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
}).to("hdc-server-1")
Parameters
Algorithm name: knn
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
node_id |
_uuid |
/ | / | No | Specifies the target node by its _uuid . |
node_schema_property |
[]"<@schema.?><property> " |
/ | / | No | Numeric node properties used to compute the cosine similarity; at least two properties are required. The schema must be the same as the target node. |
top_k |
Integer | >0 | / | No | The number of the nearest nodes to select. |
target_schema_property |
"<@schema.?><property> " |
/ | / | No | Numeric or string node property used as the class label. The schema must be the same as the target node. |
return_id_uuid |
String | uuid , id , both |
uuid |
Yes | Includes _uuid , _id , or both to represent nodes in the results. |
File Writeback
CALL algo.knn.write("hdc_knn", {
params: {
return_id_uuid: "id",
// Assigns image1 as the target node
node_id: 15420327323139833857,
node_schema_property: ["d1", "d2", "d3", "d4"],
top_k: 4,
target_schema_property: "@image.type"
},
return_params: {
file: {
filename: "knn.txt"
}
}
})
algo(knn).params({
project: "hdc_knn",
return_id_uuid: "id",
// Assigns image1 as the target node
node_id: 15420327323139833857,
node_schema_property: ["d1", "d2", "d3", "d4"],
top_k: 4,
target_schema_property: "@image.type"
}).write({
file: {
filename: "knn.txt"
}
})
Result:
Gold:2
top k(_id):
image2,0.85516
image6,0.841922
image3,0.705072
image4,0.538975
The first line in the file represents the majority label and its count among the k nearest nodes. Starting from the third line, it shows the top k nodes along with their similarity scores to the target node.
Full Return
CALL algo.knn("hdc_knn", {
params: {
return_id_uuid: "id",
// Assigns image1 as the target node
node_id: 15420327323139833857,
node_schema_property: ["d1", "d2", "d3", "d4"],
top_k: 4,
target_schema_property: "@image.type"
},
return_params: {}
}) YIELD r
RETURN r
exec{
algo(knn).params({
return_id_uuid: "id",
// Assigns image1 as the target node
node_id: 15420327323139833857,
node_schema_property: ["d1", "d2", "d3", "d4"],
top_k: 4,
target_schema_property: "@image.type"
}) as r
return r
} on hdc_knn
Result:
_ids | similarity |
---|---|
image2 | 0.85516 |
image6 | 0.841922 |
image3 | 0.705072 |
image4 | 0.538975 |
Stream Return
CALL algo.knn("hdc_knn", {
params: {
return_id_uuid: "id",
// Assigns image1 as the target node
node_id: 15420327323139833857,
node_schema_property: ["d1", "d2", "d3", "d4"],
top_k: 4,
target_schema_property: "type"
},
return_params: {
stream: {}
}
}) YIELD r
RETURN r
exec{
algo(knn).params({
return_id_uuid: "id",
// Assigns image1 as the target node
node_id: 15420327323139833857,
node_schema_property: ["d1", "d2", "d3", "d4"],
top_k: 4,
target_schema_property: "type"
}).stream() as r
return r
} on hdc_knn
Result:
_ids | similarity |
---|---|
image2 | 0.85516 |
image6 | 0.841922 |
image3 | 0.705072 |
image4 | 0.538975 |