TextRank - Graph Analytics & Algorithms

Change Password

Submit

Change Email

Submit

Change Nickname

Current Nickname:

Submit

Profile

Account ID:

Full Name:
Phone:
Company:
Company Email:

Change Password

Apply

You have no license application record.

Apply

Certificate	Issued at	Valid until	Serial No.	File

Serial No.	Valid until	File

Not having one? Apply now! >>>

Product	Created On	ID	Amount (USD)	Invoice

Product	Created On	ID	Amount (USD)	Invoice

No Invoice

Create Ultipa Account

I agree to the Privacy Policy and the

Data Processing Agreement .

Please agree to continue.

Already have an Ultipa account? Sign in now!

Forgot Password

Reset Password

Back to sign in

TextRank

HDC

Overview

TextRank, derived from PageRank, is a graph-based ranking model for text processing. It can be used for various natural language processing tasks, including keyword extraction, keyphrase extraction, and text summarization.

R. Mihalcea, P. Tarau, TextRank: Bringing Order Into Texts (2004)

Concepts

Text as a Graph

To apply the TextRank algorithm, the text must first be represented as a graph .The structure of the graph depends on the specific application:

Nodes: Text units that best fit the task, such as words, collocations, or sentences, are added as nodes in the graph.
Edges: Relationships between text units, such as semantic similarity, co-occurrence, or contextual overlap, are used to connect nodes with edges.

Sample graph build for keyphrase extraction: nodes are selected lexical units from the text, and edges are established based on co-occurrence within a defined window of words (Source: Original paper)

TextRank Model

TextRank computes the ranks of all text units recursively using a "recommendation" mechanism, similar to the PageRank algorithm. It incorporates edge weights through a modified formula that integrates them effectively:

where,

Out(v) is the set of nodes that node v points to;
w_vu is the edge weight between nodes v and u;
d is the damping factor.

Considerations

The rank of isolated text unit will stay the same as the value of (1 - d).
A self-loop acts as both a successor and a predecessor, meaning a node can pass some rank to itself. If a network has many self-loops, it will take more iterations to converge.

Example Graph

To create this graph:

// Runs each row separately in order in an empty graphset
create().edge_property(@default, "weight", int32)
insert().into(@default).nodes([{_id:"A"}, {_id:"B"}, {_id:"C"}, {_id:"D"}, {_id:"E"}, {_id:"F"}, {_id:"G"}])
insert().into(@default).edges([{_from:"A", _to:"E", weight:3}, {_from:"B", _to:"A", weight:3}, {_from:"B", _to:"E", weight:2}, {_from:"C", _to:"A", weight:1}, {_from:"C", _to:"D", weight:4}, {_from:"D", _to:"E", weight:5}, {_from:"E", _to:"G", weight:2}, {_from:"F", _to:"B", weight:1}, {_from:"F", _to:"G", weight:3}])

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as hdc_textrank:

CALL hdc.graph.create("hdc-server-1", "hdc_textrank", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static",
  query: "query",
  default: false
})

hdc.graph.create("hdc_textrank", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static",
  query: "query",
  default: false
}).to("hdc-server-1")

Parameters

Algorithm name: text_rank

Name	Type	Spec	Default	Optional	Description
`init_value`	Float	>0	`0.2`	Yes	The initial rank assigned to all nodes.
`loop_num`	Integer	≥1	`5`	Yes	The maximum number of iteration rounds. The algorithm terminates after all iterations are completed.
`damping`	Float	(0,1)	`0.8`	Yes	The damping factor.
`max_change`	Float	≥0	`0`	Yes	The algorithm terminates when the changes in all ranks between iterations are less than the specified `max_change`, indicating that the result is stable. Sets to `0` to disable this criterion.
`edge_schema_property`	[]"`<@schema.?><property>`"	/	/	No	Numeric edge properties as weights, summing values across the specified properties; edges without the specified properties are ignored.
`return_id_uuid`	String	`uuid`, `id`, `both`	`uuid`	Yes	Includes `_uuid`, `_id`, or both values to represent nodes in the results.
`limit`	Integer	≥-1	`-1`	Yes	Limits the number of results returned; `-1` includes all results.
`order`	String	`asc`, `desc`	/	Yes	Sorts the results by `rank`.

File Writeback

CALL algo.text_rank.write("hdc_textrank", {
  params: {
    return_id_uuid: "id",
    init_value: 1,
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: 'desc'
  },
  return_params: {
    file: {
      filename: "textrank"
    }
  }
})

algo(text_rank).params({
  projection: "hdc_textrank",
  return_id_uuid: "id",
  init_value: 1,
  loop_num: 50,
  damping: 0.8,
  edge_schema_property: "weight",
  order: 'desc'
}).write({
  file: {
    filename: "textrank"
  }
})

Result:

_id,text_rank
G,0.973568
E,0.81696
A,0.3472
D,0.328
B,0.24
F,0.2
C,0.2

DB Writeback

Writes the text_rank values from the results to the specified node property. The property type is double.

CALL algo.text_rank.write("hdc_textrank", {
  params: {
    loop_num: 50,
    edge_schema_property: "@default.weight"
  },
  return_params: {
    db: {
      property: "rank"
    }
  }
})

algo(text_rank).params({
  projection: "hdc_textrank",
  loop_num: 50,
  edge_schema_property: "@default.weight"
}).write({
  db:{ 
    property: 'rank'
  }
})

Full Return

CALL algo.text_rank("hdc_textrank", {
  params: {
    return_id_uuid: "id",    
    init_value: 1,
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: 'desc',
    limit: 5
  },
  return_params: {}
}) YIELD TR
RETURN TR

exec{
  algo(text_rank).params({
    return_id_uuid: "id",    
    init_value: 1,
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: 'desc',
    limit: 5
  }) as TR
  return TR
} on hdc_textrank

Result:

_id	text_rank
G	0.973568
E	0.81696
A	0.3472
D	0.328
B	0.24

Stream Return

CALL algo.text_rank("hdc_textrank", {
  params: {
    return_id_uuid: "id",
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: 'desc',
    limit: 5
  },
  return_params: {
  	stream: {}
  }
}) YIELD TR
RETURN TR

exec{
  algo(text_rank).params({
    return_id_uuid: "id",
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: 'desc',
    limit: 5
  }).stream() as TR
  return TR
} on hdc_textrank

Result:

_id	text_rank
G	0.973568
E	0.81696
A	0.3472
D	0.328
B	0.24

ID
Product
Status
Cores
Maximum Shard Services
Maximum Total Cores for Shard Service
Maximum HDC Services
Maximum Total Cores for HDC Service
Applied Validity Period(days)
Effective Date
Expired Date
Mac Address
Reason for Application
Review Comment