Change Password

Please enter the password.
Please enter the password. Between 8-64 characters. Not identical to your email address. Contain at least 3 of: uppercase, lowercase, numbers, and special characters.
Please enter the password.
Submit

Change Nickname

Current Nickname:
Submit

Apply New License

License Detail

Please complete this required field.

  • Ultipa Graph V4

Standalone

Please complete this required field.

Please complete this required field.

The MAC address of the server you want to deploy.

Please complete this required field.

Please complete this required field.

Cancel
Apply
ID
Product
Status
Cores
Applied Validity Period(days)
Effective Date
Excpired Date
Mac Address
Apply Comment
Review Comment
Close
Profile
  • Full Name:
  • Phone:
  • Company:
  • Company Email:
  • Country:
  • Language:
Change Password
Apply

You have no license application record.

Apply
Certificate Issued at Valid until Serial No. File
Serial No. Valid until File

Not having one? Apply now! >>>

Product Created On ID Amount (USD) Invoice
Product Created On ID Amount (USD) Invoice

No Invoice

v5.0
Search
    English
    v5.0

      PageRank

      HDC Distributed

      Overview

      PageRank was originally proposed in the context of World Wide Web (WWW), it takes advantage of the link structure of WWW to produce a global objective 'importance' ranking of webpages that can be used by search engines. This algorithm was proposed in 1997-1998 by Google co-founders Larry Page and Sergey Brin.

      With the development of technology and the emergence of enormous correlation data, PageRank has been adopted in many other fields too.

      Concepts

      Link Structure and PageRank

      In WWW, hypertexts contained in webpages create links between webpages. Every webpage (node) can have some forward links (via out-edges) and backlinks (via in-edges). In the following graph, A and B are backlinks of C, D is a forward link of C.

      Webpages vary greatly in terms of the number of backlinks they have. Naturally, webpages that are more important, authoritative or of high quality are likely to receive more or more important backlinks.

      PageRank can be described as this: a page has high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and when a page has a few highly ranked backlinks.

      Rank Propagation

      The ranks (scores) of all pages are computed in a recursive way by starting with any set of ranks and iterating the computation until it converges. In each iteration, a page gives out its rank to all its forward links evenly to contribute to the ranks of the pages it points to; meanwhile every page receives ranks from its backlinks, so the rank of page u after one iteration is:

      where Bu is the backlink set of u.

      Below shows a steady state of a set of pages:

      Damping Factor

      Consider the following kinds of webpages:

      • Webpages with no backlinks. The rank they receive is 0, but they still need to be browsed in the Internet.
      • Webpages with no forward links. Their ranks are lost from the system.
      • A group of webpages that only point to pages within the group, but not any page outside the group.

      To overcome these problems, a damping factor, whose value is between 0 and 1, is introduced. It gives each webpage a base rank while weakening the ranks passed from backlinks. The rank of page u after one iteration becomes:

      where d is the damping factor. For example, when d is 0.7, if a webpage receives 8 ranks in total from backlinks, then the rank of this webpage is updated to 0.7*8 + (1-0.7) = 5.9.

      Damping factor can also be understood as the probability that a web surfer randomly jump to a webpage that is not one of the forward links of the current webpage.

      Considerations

      • The rank of isolated webpages will stay the same as the value of (1 - d).
      • Self-loop is regarded as a forward link and a backlink, a webpage would pass some rank to itself through self-loop. If a network has many self-loops, it will take more iterations to converge.

      Example Graph

      To create this graph:

      // Runs each row separately in order in an empty graphset
      create().node_schema("account").edge_schema("follow")
      insert().into(@account).nodes([{_id:"A"}, {_id:"B"}, {_id:"C"}, {_id:"D"}, {_id:"E"}, {_id:"F"}, {_id:"G"}, {_id:"H"}, {_id:"I"}, {_id:"J"}, {_id:"K"}, {_id:"L"}, {_id:"M"}, {_id:"N"}])
      insert().into(@follow).edges([{_from:"A", _to:"E"}, {_from:"B", _to:"E"}, {_from:"C", _to:"A"}, {_from:"C", _to:"H"}, {_from:"D", _to:"J"}, {_from:"E", _to:"G"}, {_from:"E", _to:"G"},{_from:"E", _to:"I"}, {_from:"E", _to:"N"}, {_from:"F", _to:"L"}, {_from:"F", _to:"B"}, {_from:"H", _to:"C"}, {_from:"H", _to:"E"}, {_from:"I", _to:"E"}, {_from:"J", _to:"E"}, {_from:"K", _to:"E"}, {_from:"K", _to:"M"}, {_from:"L", _to:"E"}, {_from:"L", _to:"F"}, {_from:"L", _to:"N"}, {_from:"M", _to:"E"}, {_from:"N", _to:"F"}])
      

      Running on HDC Graphs

      Creating HDC Graph

      To load the entire graph to the HDC server hdc-server-1 as hdc_pagerank:

      CALL hdc.graph.create("hdc-server-1", "hdc_pagerank", {
        nodes: {"*": ["*"]},
        edges: {"*": ["*"]},
        direction: "undirected",
        load_id: true,
        update: "static",
        query: "query",
        default: false
      })
      

      hdc.graph.create("hdc_pagerank", {
        nodes: {"*": ["*"]},
        edges: {"*": ["*"]},
        direction: "undirected",
        load_id: true,
        update: "static",
        query: "query",
        default: false
      }).to("hdc-server-1")
      

      Parameters

      Algorithm name: page_rank

      Name
      Type
      Spec
      Default
      Optional
      Description
      init_value Float >0 0.2 Yes The initial rank assigned to all nodes.
      loop_num Integer ≥1 5 Yes The maximum number of iteration rounds. The algorithm will terminate after completing all rounds.
      damping Float (0,1) 0.8 Yes The damping factor.
      weaken Integer 1, 2 1 Yes Keeps it as 1 for PageRank. Sets to 2 will run ArticleRank.
      return_id_uuid String uuid, id, both uuid Yes Includes _uuid, _id, or both values to represent nodes in the results.
      limit Integer ≥-1 -1 Yes Limits the number of results returned; -1 includes all results.
      order String asc, desc / Yes Sorts the results by rank.

      File Writeback

      CALL algo.page_rank.write("hdc_pagerank", {
        params: {
          return_id_uuid: "id",
          init_value: 1,
          loop_num: 50,
          damping: 0.8,
          weaken: 1,
          order: 'desc'
        },
        return_params: {
          file: {
            filename: "rank"
          }
        }
      })
      

      algo(page_rank).params({
        project: "hdc_pagerank",
        return_id_uuid: "id",
        init_value: 1,
        loop_num: 50,
        damping: 0.8,
        weaken: 1,
        order: 'desc'
      }).write({
        file: {
          filename: "rank"
        }
      })
      

      Result:

      _id,rank
      E,2.3906
      G,1.15624
      F,1.03774
      N,0.842146
      I,0.67812
      B,0.615097
      L,0.615097
      J,0.36
      C,0.333333
      A,0.333333
      H,0.333333
      M,0.28
      K,0.2
      D,0.2
      

      DB Writeback

      Writes the rank values from the results to the specified node property. The property type is float.

      CALL algo.page_rank.write("hdc_pagerank", {
        params: {
          loop_num: 50,
          weaken: 1
        },
        return_params: {
          db: {
            property: "rank"
          }
        }
      })
      

      algo(page_rank).params({
        project: "hdc_pagerank",
        loop_num: 50,
        weaken: 1
      }).write({
        db:{ 
          property: 'rank'
        }
      })
      

      Full Return

      CALL algo.page_rank("hdc_pagerank", {
        params: {
          return_id_uuid: "id",    
          init_value: 1,
          loop_num: 50,
          damping: 0.8,
          weaken: 1,
          order: 'desc',
          limit: 5
        },
        return_params: {}
      }) YIELD PR
      RETURN PR
      

      exec{
        algo(page_rank).params({
          return_id_uuid: "id",    
          init_value: 1,
          loop_num: 50,
          damping: 0.8,
          weaken: 1,
          order: 'desc',
          limit: 5
        }) as PR
        return PR
      } on hdc_pagerank
      

      Result:

      _id rank
      E 2.390599
      G 1.15624
      F 1.037742
      N 0.842146
      I 0.67812

      Stream Return

      CALL algo.page_rank("hdc_pagerank", {
        params: {
          return_id_uuid: "id",
          loop_num: 50,
          damping: 0.8,
          weaken: 1,
          order: 'desc',
          limit: 5
        },
        return_params: {
        	stream: {}
        }
      }) YIELD PR
      RETURN PR
      

      exec{
        algo(page_rank).params({
          return_id_uuid: "id",
          loop_num: 50,
          damping: 0.8,
          weaken: 1,
          order: 'desc',
          limit: 5
        }).stream() as PR
        return PR
      } on hdc_pagerank
      

      Result:

      _id rank
      E 2.390599
      G 1.15624
      F 1.037742
      N 0.842146
      I 0.67812

      Running on Distributed Projections

      Creating Distributed Projection

      To project the entire graph to its shard servers as dist_pagerank:

      create().projection("dist_pagerank", {
        nodes: {"*": ["*"]}, 
        edges: {"*": ["*"]},
        direction: "undirected",
        load_id: true
      })
      

      Parameters

      Algorithm name: page_rank

      Name
      Type
      Spec
      Default
      Optional
      Description
      init_value Float >0 0.2 Yes The initial rank assigned to all nodes.
      loop_num Integer ≥1 10 Yes The maximum number of iteration rounds. The algorithm will terminate after completing all rounds.
      damping Float (0,1) 0.8 Yes The damping factor.
      weaken Integer 1, 2 1 Yes Keeps it as 1 for PageRank. Sets to 2 will run ArticleRank.
      limit Integer ≥-1 -1 Yes Limits the number of results returned; -1 includes all results.
      order String asc, desc / Yes Sorts the results by rank.

      File Writeback

      CALL algo.page_rank.write("dist_pagerank", {
        params: {
          init_value: 1,
          loop_num: 50,
          damping: 0.8,
          weaken: 1,
          order: "desc"
        },
        return_params: {
          file: {
            filename: "rank"
          }
        }
      })
      

      algo(page_rank).params({
        project: "dist_pagerank",
        init_value: 1,
        loop_num: 50,
        damping: 0.8,
        weaken: 1,
        order: "desc"
      }).write({
        file: {
          filename: "rank"
        }
      })
      

      Result:

      _id,rank
      E,2.4451734081316898184
      G,1.1753836278518499103
      F,1.072201237069960067
      N,0.86041240546502095743
      I,0.68769181392592604318
      B,0.62905439446913602453
      L,0.62905439446913602453
      J,0.35999999999999998668
      H,0.33350809599999997612
      A,0.33350809599999997612
      C,0.33350809599999997612
      M,0.28000000000000002665
      D,0.2000000000000000111
      K,0.2000000000000000111
      

      DB Writeback

      Writes the rank values from the results to the specified node property. The property type is double.

      CALL algo.page_rank.write("dist_pagerank", {
        params: {
          loop_num: 50,
          weaken: 1
        },
        return_params: {
          db: {
            property: "rank"
          }
        }
      })
      

      algo(page_rank).params({
        project: "dist_pagerank",
        loop_num: 50,
        weaken: 1
      }).write({
        db:{ 
          property: 'rank'
        }
      })
      
      Please complete the following information to download this book
      *
      公司名称不能为空
      *
      公司邮箱必须填写
      *
      你的名字必须填写
      *
      你的电话必须填写