Commit 537c11e

Neural-Link Team authored and tensorflow-copybara committed
Updates the NSL IMDB tutorial to use the new LSH support when building the graph.
PiperOrigin-RevId: 326516612
1 parent 7df5817 commit 537c11e

1 file changed: 45 additions, 8 deletions

g3doc/tutorials/graph_keras_lstm_imdb.ipynb

Lines changed: 45 additions & 8 deletions
@@ -514,11 +514,18 @@
 "graph will correspond to similarity between pairs of nodes.\n",
 "\n",
 "Neural Structured Learning provides a graph building library to build a graph\n",
-"based on sample embeddings. It uses **cosine similarity** as the similarity\n",
-"measure to compare embeddings and build edges between them. It also allows us to\n",
-"specify a similarity threshold, which can be used to discard dissimilar edges\n",
-"from the final graph. In this example, using 0.99 as the similarity threshold,\n",
-"we end up with a graph that has 445,327 bi-directional edges."
+"based on sample embeddings. It uses\n",
+"[**cosine similarity**](https://en.wikipedia.org/wiki/Cosine_similarity) as the\n",
+"similarity measure to compare embeddings and build edges between them. It also\n",
+"allows us to specify a similarity threshold, which can be used to discard\n",
+"dissimilar edges from the final graph. In this example, using 0.99 as the\n",
+"similarity threshold and 12345 as the random seed, we end up with a graph that\n",
+"has 429,415 bi-directional edges. Here we're using the graph builder's support\n",
+"for [locality-sensitive hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing)\n",
+"(LSH) to speed up graph building. For details on using the graph builder's LSH\n",
+"support, see the\n",
+"[`build_graph_from_config`](https://www.tensorflow.org/neural_structured_learning/api_docs/python/nsl/tools/build_graph_from_config)\n",
+"API documentation."
 ]
 },
 {
@@ -531,9 +538,35 @@
 },
 "outputs": [],
 "source": [
-"nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],\n",
-"                '/tmp/imdb/graph_99.tsv',\n",
-"                similarity_threshold=0.99)"
+"graph_builder_config = nsl.configs.GraphBuilderConfig(\n",
+"    similarity_threshold=0.99, lsh_splits=32, lsh_rounds=15, random_seed=12345)\n",
+"nsl.tools.build_graph_from_config(['/tmp/imdb/embeddings.tfr'],\n",
+"                                  '/tmp/imdb/graph_99.tsv',\n",
+"                                  graph_builder_config)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"colab_type": "text",
+"id": "4dk9xfQcK553"
+},
+"source": [
+"Each bi-directional edge is represented by two directed edges in the output TSV\n",
+"file, so that file contains 429,415 * 2 = 858,830 total lines:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"colab": {},
+"colab_type": "code",
+"id": "dDPwTpZcJ3zF"
+},
+"outputs": [],
+"source": [
+"!wc -l /tmp/imdb/graph_99.tsv"
 ]
 },
 {
@@ -1532,6 +1565,10 @@
 "collapsed_sections": [
 "24gYiJcWNlpA"
 ],
+"last_runtime": {
+"build_target": "//learning/deepmind/public/tools/ml_python:ml_notebook",
+"kind": "private"
+},
 "name": "Graph regularization for sentiment classification using synthesized graphs",
 "private_outputs": true,
 "provenance": [],
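
The markdown cells touched by this commit describe two ideas: edges are kept only when the cosine similarity of two embeddings reaches the threshold, and each bi-directional edge becomes two directed lines in the output TSV. The following is a minimal brute-force sketch of those two ideas in plain Python. It is not NSL's implementation; the helper names, the toy embeddings, the `>=` comparison, and the `source<TAB>target<TAB>weight` line layout are all assumptions made for illustration.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_edges(embeddings, similarity_threshold):
    """Compares every pair of nodes and keeps pairs at or above the threshold.

    Whether the real graph builder uses >= or > is an assumption here.
    """
    edges = []
    for (id_a, vec_a), (id_b, vec_b) in itertools.combinations(embeddings.items(), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= similarity_threshold:
            edges.append((id_a, id_b, sim))
    return edges


def to_tsv_lines(edges):
    """Emits two directed lines per bi-directional edge (assumed TSV layout)."""
    lines = []
    for source, target, weight in edges:
        lines.append(f"{source}\t{target}\t{weight:.6f}")
        lines.append(f"{target}\t{source}\t{weight:.6f}")
    return lines


# Hypothetical toy embeddings, not taken from the IMDB data.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.999, 0.01],  # nearly parallel to "a"
    "c": [0.0, 1.0],     # orthogonal to "a"
}
toy_edges = build_edges(toy_embeddings, similarity_threshold=0.99)
toy_lines = to_tsv_lines(toy_edges)
```

Only the pair `a`/`b` survives the 0.99 threshold, so the sketch yields one bi-directional edge and two TSV lines, the same doubling the notebook applies to 429,415 edges to arrive at 858,830 lines. Comparing every pair scales quadratically with the number of samples, which is the cost the LSH support is meant to reduce.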

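
The new cell passes `lsh_splits`, `lsh_rounds`, and `random_seed` to the graph builder. As a rough illustration of why locality-sensitive hashing speeds things up, here is a sketch of random-hyperplane LSH in plain Python: each round draws `lsh_splits` random hyperplanes, groups embeddings by which side of each hyperplane they fall on, and compares only pairs that share a bucket. This is one common LSH scheme for cosine similarity, assumed here for illustration; it borrows the config's parameter names but is not claimed to match how NSL interprets them, and the function names and toy data are hypothetical.

```python
import itertools
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def lsh_bucket(vec, hyperplanes):
    """Bucket key with one bit per hyperplane: the sign of the projection."""
    return tuple(sum(a * b for a, b in zip(vec, h)) >= 0.0 for h in hyperplanes)


def build_edges_lsh(embeddings, similarity_threshold, lsh_splits, lsh_rounds, random_seed):
    """Compares only pairs that land in the same bucket in at least one round."""
    rng = random.Random(random_seed)  # fixed seed makes the result repeatable
    dim = len(next(iter(embeddings.values())))
    edges = {}
    for _ in range(lsh_rounds):
        # A fresh set of random hyperplanes per round gives nearby pairs
        # another chance to share a bucket.
        hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(lsh_splits)]
        buckets = {}
        for node_id, vec in embeddings.items():
            buckets.setdefault(lsh_bucket(vec, hyperplanes), []).append(node_id)
        for members in buckets.values():
            for id_a, id_b in itertools.combinations(sorted(members), 2):
                if (id_a, id_b) in edges:
                    continue  # already found in an earlier round
                sim = cosine_similarity(embeddings[id_a], embeddings[id_b])
                if sim >= similarity_threshold:
                    edges[(id_a, id_b)] = sim
    return edges


# Hypothetical toy embeddings, not taken from the IMDB data.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [2.0, 0.0],  # scaled copy of "a", so cosine similarity is 1
    "c": [0.0, 1.0],  # orthogonal to "a"
}
lsh_edges = build_edges_lsh(toy_embeddings, similarity_threshold=0.99,
                            lsh_splits=2, lsh_rounds=3, random_seed=12345)
```

In a scheme like this, a similar pair that never shares a bucket in any round is missed, so the result is approximate and depends on the seed. That is consistent with the tutorial text now quoting the random seed next to the edge count, though the sketch does not establish how NSL itself behaves.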