514 | 514 |   "graph will correspond to similarity between pairs of nodes.\n",
515 | 515 |   "\n",
516 | 516 |   "Neural Structured Learning provides a graph building library to build a graph\n",
517 |     | - "based on sample embeddings. It uses **cosine similarity** as the similarity\n",
518 |     | - "measure to compare embeddings and build edges between them. It also allows us to\n",
519 |     | - "specify a similarity threshold, which can be used to discard dissimilar edges\n",
520 |     | - "from the final graph. In this example, using 0.99 as the similarity threshold,\n",
521 |     | - "we end up with a graph that has 445,327 bi-directional edges."
    | 517 | + "based on sample embeddings. It uses\n",
    | 518 | + "[**cosine similarity**](https://en.wikipedia.org/wiki/Cosine_similarity) as the\n",
    | 519 | + "similarity measure to compare embeddings and build edges between them. It also\n",
    | 520 | + "allows us to specify a similarity threshold, which can be used to discard\n",
    | 521 | + "dissimilar edges from the final graph. In this example, using 0.99 as the\n",
    | 522 | + "similarity threshold and 12345 as the random seed, we end up with a graph that\n",
    | 523 | + "has 429,415 bi-directional edges. Here we're using the graph builder's support\n",
    | 524 | + "for [locality-sensitive hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing)\n",
    | 525 | + "(LSH) to speed up graph building. For details on using the graph builder's LSH\n",
    | 526 | + "support, see the\n",
    | 527 | + "[`build_graph_from_config`](https://www.tensorflow.org/neural_structured_learning/api_docs/python/nsl/tools/build_graph_from_config)\n",
    | 528 | + "API documentation."
522 | 529 |   ]
523 | 530 |   },
524 | 531 |   {
|
531 | 538 |   },
532 | 539 |   "outputs": [],
533 | 540 |   "source": [
534 |     | - "nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],\n",
535 |     | - "                      '/tmp/imdb/graph_99.tsv',\n",
536 |     | - "                      similarity_threshold=0.99)"
    | 541 | + "graph_builder_config = nsl.configs.GraphBuilderConfig(\n",
    | 542 | + "    similarity_threshold=0.99, lsh_splits=32, lsh_rounds=15, random_seed=12345)\n",
    | 543 | + "nsl.tools.build_graph_from_config(['/tmp/imdb/embeddings.tfr'],\n",
    | 544 | + "                                  '/tmp/imdb/graph_99.tsv',\n",
    | 545 | + "                                  graph_builder_config)"
    | 546 | + ]
    | 547 | + },
    | 548 | + {
    | 549 | + "cell_type": "markdown",
    | 550 | + "metadata": {
    | 551 | + "colab_type": "text",
    | 552 | + "id": "4dk9xfQcK553"
    | 553 | + },
    | 554 | + "source": [
    | 555 | + "Each bi-directional edge is represented by two directed edges in the output TSV\n",
    | 556 | + "file, so that file contains 429,415 * 2 = 858,830 total lines:"
    | 557 | + ]
    | 558 | + },
    | 559 | + {
    | 560 | + "cell_type": "code",
    | 561 | + "execution_count": null,
    | 562 | + "metadata": {
    | 563 | + "colab": {},
    | 564 | + "colab_type": "code",
    | 565 | + "id": "dDPwTpZcJ3zF"
    | 566 | + },
    | 567 | + "outputs": [],
    | 568 | + "source": [
    | 569 | + "!wc -l /tmp/imdb/graph_99.tsv"
537 | 570 |   ]
538 | 571 |   },
539 | 572 |   {
|
1532 | 1565 |   "collapsed_sections": [
1533 | 1566 |   "24gYiJcWNlpA"
1534 | 1567 |   ],
     | 1568 | + "last_runtime": {
     | 1569 | + "build_target": "//learning/deepmind/public/tools/ml_python:ml_notebook",
     | 1570 | + "kind": "private"
     | 1571 | + },
1535 | 1572 |   "name": "Graph regularization for sentiment classification using synthesized graphs",
1536 | 1573 |   "private_outputs": true,
1537 | 1574 |   "provenance": [],
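The markdown cells in this diff describe building a graph by thresholding cosine similarity between embeddings and writing each bi-directional edge as two directed lines. The sketch below illustrates that idea on toy data only. It is not the Neural Structured Learning implementation: it compares all pairs by brute force rather than using LSH, the helper names `cosine_similarity` and `build_edges` are hypothetical, and both the `>=` comparison against the threshold and the `source<TAB>target<TAB>weight` line layout are assumptions.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_edges(embeddings, similarity_threshold):
    """Return directed (source, target, weight) edges for similar pairs.

    Hypothetical helper: brute-force all-pairs comparison, no LSH.
    Each retained pair yields two directed edges, one per direction.
    """
    edges = []
    pairs = itertools.combinations(embeddings.items(), 2)
    for (id_a, vec_a), (id_b, vec_b) in pairs:
        weight = cosine_similarity(vec_a, vec_b)
        # Assumption: pairs at or above the threshold are kept.
        if weight >= similarity_threshold:
            edges.append((id_a, id_b, weight))
            edges.append((id_b, id_a, weight))
    return edges


# Toy embeddings: "a" and "b" point in nearly the same direction,
# "c" is orthogonal to "a".
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.999, 0.01],
    "c": [0.0, 1.0],
}

edges = build_edges(toy_embeddings, similarity_threshold=0.99)
for source, target, weight in edges:
    # Assumed TSV layout: source, target, weight.
    print(f"{source}\t{target}\t{weight:.6f}")
```

With these toy vectors only the pair (`a`, `b`) clears the 0.99 threshold, so the output has two lines for one bi-directional edge, mirroring the 429,415 * 2 = 858,830 relationship stated in the notebook.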