graphistry · lmeyerov · Sep 25, 2025 · Aug 10, 2025 · Aug 10, 2025 · Aug 10, 2025
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -394,7 +394,7 @@ jobs:
         unzip -l dist/graphistry*.whl | grep -q "graphistry/py.typed" || (echo "ERROR: py.typed marker missing from wheel - users won't get type information" && exit 1)
         echo "✅ py.typed marker confirmed in wheel distribution"
 
-  
+
   test-docs:
     needs: [changes, python-lint-types]
     # Run if docs changed OR Python changed OR infrastructure changed OR manual/scheduled run
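
Note on the `py.typed` check in the context lines above: the CI step lists the wheel with `unzip -l` and greps for the marker. A rough Python equivalent, as a sketch only, could use the standard-library `zipfile` module. The helper name `wheel_has_marker` is hypothetical and not part of this repository; unlike `grep -q`, it requires an exact member-name match rather than a substring match.

```python
import zipfile


def wheel_has_marker(wheel_path: str, marker: str = "graphistry/py.typed") -> bool:
    """Return True if the wheel (a zip archive) contains the marker file.

    Hypothetical helper for illustration; the CI job itself uses unzip + grep.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # namelist() returns archive member paths using forward slashes
        return marker in wheel.namelist()
```

A wheel is an ordinary zip archive, so no packaging-specific library is needed for this check.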

diff --git a/.gitignore b/.gitignore
@@ -97,3 +97,4 @@ AI_PROGRESS/
 /PLAN.md
 plans/
 tmp/
+test_env/
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,17 @@ The changelog format is based on [Keep a Changelog](https://keepachangelog.com/e
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and all PyGraphistry-specific breaking changes are explicitly noted here.
 
 ## Dev
+### Added
+* GFQL: Add comprehensive validation framework with detailed error reporting
+  * Built-in validation: `Chain()` constructor validates syntax automatically
+  * Schema validation: `validate_chain_schema()` validates queries against DataFrame schemas
+  * Pre-execution validation: `g.chain(ops, validate_schema=True)` catches errors before execution
+  * Structured error types: `GFQLValidationError`, `GFQLSyntaxError`, `GFQLTypeError`, `GFQLSchemaError`
+  * Error codes (E1xx syntax, E2xx type, E3xx schema) for programmatic error handling
+  * Collect-all mode: `validate(collect_all=True)` returns all errors instead of fail-fast
+  * JSON validation: `Chain.from_json()` validates during parsing for safe LLM integration
+  * Helpful error suggestions for common mistakes
+  * Example notebook: `demos/gfql/gfql_validation_fundamentals.ipynb`
 
 ## [0.41.2 - 2025-08-28]
 
@@ -15,7 +26,6 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
   * shared types in `embed_types.py` and `umap_types.py`
 * Add `mode_action` to `.privacy`
 * Fixed `contains`, `startswith`, `endswith`, and `match` predicates to prevent error when run with cuDF
-
 
 ## [0.41.1 - 2025-08-15]
 
@@ -45,13 +55,19 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 ## [0.41.0 - 2025-07-26]
 
 ### Added
-* Typing: Add PEP 561 type distribution support (#714)
-  * Add py.typed marker file to enable type checking with mypy, pyright, and PyCharm
-  * Configure MANIFEST.in and setup.cfg to include py.typed in source and wheel distributions
-  * Add CI validation to prevent regressions where py.typed might be accidentally removed
-  * Enables accurate type information and autocompletion for PyGraphistry APIs
+* GFQL: Add comprehensive validation framework with detailed error reporting
+  * Built-in validation: `Chain()` constructor validates syntax automatically
+  * Schema validation: `validate_chain_schema()` validates queries against DataFrame schemas
+  * Pre-execution validation: `g.chain(ops, validate_schema=True)` catches errors before execution
+  * Structured error types: `GFQLValidationError`, `GFQLSyntaxError`, `GFQLTypeError`, `GFQLSchemaError`
+  * Error codes (E1xx syntax, E2xx type, E3xx schema) for programmatic error handling
+  * Collect-all mode: `validate(collect_all=True)` returns all errors instead of fail-fast
+  * JSON validation: `Chain.from_json()` validates during parsing for safe LLM integration
+  * Helpful error suggestions for common mistakes
+  * Example notebook: `demos/gfql/gfql_validation_fundamentals.ipynb`
 
 ### Fixed
+* Docs: Fix case sensitivity in server toctree to link concurrency.rst (#723)
 * Docs: Fix notebook validation error in hop_and_chain_graph_pattern_mining.ipynb by adding missing 'outputs' field to code cell
 
 ### Infra
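
Note on the error-code scheme in the changelog entry above (E1xx syntax, E2xx type, E3xx schema): the sketch below shows one way client code might bucket a code string by family. It assumes codes are the letter `E` followed by exactly three digits, which is inferred from the `E1xx` notation and not confirmed by this diff. `error_family` is a hypothetical helper, not a PyGraphistry function.

```python
def error_family(code: str) -> str:
    """Map a code such as 'E101' to its family name.

    Assumption: codes look like 'E' + three digits, and the first digit
    selects the family as listed in the changelog entry.
    """
    families = {"1": "syntax", "2": "type", "3": "schema"}
    if len(code) == 4 and code[0] == "E" and code[1:].isdigit():
        return families.get(code[1], "unknown")
    return "unknown"


print(error_family("E101"))
print(error_family("E302"))
```

Keying on the leading digit keeps the mapping stable if further codes are added within a family.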

diff --git a/demos/gfql/gfql_remote.ipynb b/demos/gfql/gfql_remote.ipynb
@@ -33,7 +33,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "id": "c9227361-7af6-4f52-b84e-3d7fd2f0f5b3",
    "metadata": {
     "execution": {
@@ -44,24 +44,8 @@
      "shell.execute_reply.started": "2024-12-10T19:12:01.693232Z"
     }
    },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'0+unknown'"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas as pd\n",
-    "import graphistry\n",
-    "from graphistry import n, e_undirected, e_forward\n",
-    "graphistry.__version__"
-   ]
+   "outputs": [],
+   "source": "import pandas as pd\nimport graphistry\nfrom graphistry import n, e_undirected, e_forward\n\n# Import Python API for cleaner syntax with let bindings\nfrom graphistry.compute.ast import ref, let, ASTCall\n\ngraphistry.__version__"
   },
   {
    "cell_type": "code",
@@ -1401,6 +1385,96 @@
    "metadata": {},
    "outputs": [],
    "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fs1pabrqfaj",
+   "source": "## Combining Let Bindings with Call Operations\n\nLet bindings in GFQL allow you to create named intermediate results and compose complex operations. When combined with call operations in remote mode, you can orchestrate sophisticated graph analyses entirely on the server, minimizing data transfer and leveraging server-side GPU acceleration.",
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bs7rghntlp",
+   "source": "### Example 1: PageRank Analysis with Filtering\n\nThis example demonstrates using let bindings to:\n1. Compute PageRank scores\n2. Filter high-value nodes\n3. Extract subgraphs around important nodes\n4. Return results for visualization",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "wurwk0xplp",
+   "source": "# Create a more complex graph for demonstration\ncomplex_edges = pd.DataFrame({\n    's': ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b', 'c', 'd'],\n    'd': ['b', 'c', 'd', 'e', 'f', 'a', 'c', 'd', 'e', 'f'],\n    'weight': [1, 2, 1, 3, 1, 2, 1, 2, 1, 1],\n    'type': ['follow', 'mention', 'follow', 'follow', 'mention', 'follow', 'mention', 'follow', 'follow', 'mention']\n})\n\ng_complex = graphistry.edges(complex_edges, 's', 'd').upload()\nprint(f\"Uploaded graph with {len(complex_edges)} edges\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "code",
+   "id": "mwr3948llv",
+   "source": "%%time\n\n# Define a complex query using Python API for cleaner syntax\npagerank_analysis_query = let({\n    # Step 1: Compute PageRank scores\n    'with_pagerank': ASTCall('compute_pagerank'),\n    \n    # Step 2: Filter nodes with high PageRank scores\n    'important_nodes': ref('with_pagerank', [\n        n({'filter': {'gte': [{'col': 'pagerank'}, 0.15]}})\n    ]),\n    \n    # Step 3: Get 1-hop neighborhoods of important nodes\n    'important_neighborhoods': ref('important_nodes', [\n        e_undirected({'hops': 1}),\n        n()\n    ])\n})\n\n# Note: The 'in' clause is automatically the last binding when using Python let()\n# To specify a different output, pass it as second argument: let(bindings, 'output_name')\n\n# Execute the query remotely - chain_remote accepts Python objects directly!\nresult = g_complex.chain_remote([pagerank_analysis_query])\n\nprint(f\"Result has {len(result._nodes)} nodes and {len(result._edges)} edges\")\nprint(\"\\nNodes with PageRank scores:\")\nprint(result._nodes)",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "jdzoowghv2q",
+   "source": "### Example 2: Multi-Stage Analysis with Different Edge Types\n\nThis example shows how to use let bindings to analyze different edge types separately and combine the results:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eps7v0ovlxi",
+   "source": "### Python API vs JSON Format Comparison\n\nThe examples above use the clean Python API. For reference, here's what the equivalent JSON format looks like:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "sc00048dhan",
+   "source": "# Comparison: Python API vs JSON format\n\n# Clean Python API (what we use above):\npython_query = let({\n    'data': ASTCall('compute_pagerank'),\n    'filtered': ref('data', [\n        n({'filter': {'gte': [{'col': 'pagerank'}, 0.15]}})\n    ])\n})\n\n# Equivalent verbose JSON format:\njson_query = {\n    'let': {\n        'data': {\n            'type': 'Call',\n            'function': 'compute_pagerank',\n            'params': {}\n        },\n        'filtered': {\n            'type': 'Ref',\n            'ref': 'data',\n            'chain': [{\n                'type': 'Node',\n                'filter_dict': {\n                    'filter': {'gte': [{'col': 'pagerank'}, 0.15]}\n                }\n            }]\n        }\n    },\n    'in': {'type': 'Ref', 'ref': 'filtered', 'chain': []}\n}\n\n# Both work with chain_remote:\n# result = g.chain_remote([python_query])  # Clean!\n# result = g.chain_remote([json_query])    # Verbose but equivalent\n\nprint(\"Python object converts to JSON:\")\nprint(python_query.to_json())",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "code",
+   "id": "40m2wl0yn3m",
+   "source": "%%time\n\n# Analyze different edge types using clean Python API\nedge_type_analysis = let({\n    # Analyze follow edges\n    'follow_network': e_undirected({\n        'filter': {'eq': [{'col': 'type'}, 'follow']}\n    }),\n    \n    # Compute centrality on follow network  \n    'follow_centrality': ref('follow_network', [\n        n(),\n        ASTCall('compute_degree_centrality')\n    ]),\n    \n    # Find mention patterns\n    'mention_edges': e_undirected({\n        'filter': {'eq': [{'col': 'type'}, 'mention']}\n    }),\n    \n    # Get nodes that are both highly connected and frequently mentioned\n    'influential_nodes': ref('follow_centrality', [\n        n({'filter': {'gte': [{'col': 'degree_centrality'}, 0.5]}}),\n        ref('mention_edges', []),\n        n()\n    ])\n})\n\n# Execute remotely\ninfluential_result = g_complex.chain_remote([edge_type_analysis])\n\nprint(f\"Found {len(influential_result._nodes)} influential nodes\")\nprint(f\"Connected by {len(influential_result._edges)} edges\")\nprint(\"\\nInfluential nodes with centrality scores:\")\nprint(influential_result._nodes)",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5y02h8p7mkk",
+   "source": "### Example 3: Conditional Analysis with Let Bindings\n\nThis example demonstrates using let bindings to perform conditional analysis based on graph properties:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "gmy30drvbts",
+   "source": "%%time\n\n# Complex analysis with multiple algorithms using Python API\ncomprehensive_analysis = let({\n    # Base graph with PageRank computation\n    'enriched_graph': ASTCall('compute_pagerank'),\n    \n    # Add centrality metrics\n    'with_centrality': ref('enriched_graph', [\n        ASTCall('compute_degree_centrality')\n    ]),\n    \n    # Find bridge nodes (high PageRank, low-medium centrality)\n    'bridge_nodes': ref('with_centrality', [\n        n({\n            'filter': {\n                'and': [\n                    {'gte': [{'col': 'pagerank'}, 0.1]},\n                    {'lte': [{'col': 'degree_centrality'}, 0.7]}\n                ]\n            }\n        })\n    ]),\n    \n    # Find hub nodes (high degree centrality)\n    'hub_nodes': ref('with_centrality', [\n        n({'filter': {'gte': [{'col': 'degree_centrality'}, 0.7]}})\n    ]),\n    \n    # Get connections between bridges and hubs\n    'critical_paths': ref('bridge_nodes', [\n        e_forward({'to_nodes': ref('hub_nodes', [])}),\n        n()\n    ])\n})\n\n# Execute remotely with GPU acceleration\ncritical_paths_result = g_complex.chain_remote([comprehensive_analysis], engine='cudf')\n\nprint(f\"Critical paths network: {len(critical_paths_result._nodes)} nodes, {len(critical_paths_result._edges)} edges\")\n\n# Check if we got results\nif len(critical_paths_result._nodes) > 0:\n    print(\"\\nCritical path nodes:\")\n    print(critical_paths_result._nodes)\nelse:\n    print(\"\\nNo critical paths found with current thresholds\")",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "wbt02937wz",
+   "source": "### Example 4: Visualization-Ready Analysis\n\nThis example shows how to prepare data for visualization by enriching it with multiple metrics and creating a focused subgraph:",
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "id": "4glzmgi1u3s",
+   "source": "%%time\n\n# Prepare visualization-ready data with all enrichments\nviz_prep_query = {\n    'let': {\n        # Compute all metrics - sequential operations\n        'with_pagerank': {\n            'call': {'method': 'compute_pagerank', 'args': [], 'kwargs': {}}\n        },\n        \n        'with_metrics': {\n            'type': 'Ref',\n            'ref': 'with_pagerank',\n            'chain': [\n                {'call': {'method': 'compute_degree_centrality', 'args': [], 'kwargs': {}}},\n                # Add node colors based on PageRank\n                {\n                    'call': {\n                        'method': 'nodes',\n                        'args': [],\n                        'kwargs': {\n                            'assign': {\n                                'node_color': {\n                                    'case': [\n                                        {\n                                            'when': {'gte': [{'col': 'pagerank'}, 0.2]},\n                                            'then': 65280  # Green for high PageRank\n                                        },\n                                        {\n                                            'when': {'gte': [{'col': 'pagerank'}, 0.15]},\n                                            'then': 16776960  # Yellow for medium\n                                        }\n                                    ],\n                                    'else': 16711680  # Red for low\n                                },\n                                'node_size': {\n                                    'mul': [\n                                        {'col': 'degree_centrality'},\n                                        50  # Scale factor\n                                    ]\n                                }\n                            }\n                        }\n                    }\n                }\n            ]\n        },\n        \n        # Add edge styling based on type and weight\n        'styled_graph': {\n            'type': 'Ref',\n            'ref': 'with_metrics',\n            'chain': [\n                {\n                    'call': {\n                        'method': 'edges',\n                        'args': [],\n                        'kwargs': {\n                            'assign': {\n                                'edge_color': {\n                                    'case': [\n                                        {\n                                            'when': {'eq': [{'col': 'type'}, 'follow']},\n                                            'then': 255  # Blue for follows\n                                        }\n                                    ],\n                                    'else': 16711935  # Magenta for mentions\n                                },\n                                'edge_weight': {\n                                    'col': 'weight'\n                                }\n                            }\n                        }\n                    }\n                }\n            ]\n        },\n        \n        # Focus on top nodes and their connections\n        'viz_subgraph': {\n            'type': 'Ref',\n            'ref': 'styled_graph',\n            'chain': [\n                {\n                    'n': {\n                        'filter': {\n                            'or': [\n                                {'gte': [{'col': 'pagerank'}, 0.15]},\n                                {'gte': [{'col': 'degree_centrality'}, 0.6]}\n                            ]\n                        }\n                    }\n                },\n                {'e_undirected': {'hops': 1}},\n                {'n': {}}\n            ]\n        }\n    },\n    \n    'in': {'type': 'Ref', 'ref': 'viz_subgraph', 'chain': []}\n}\n\n# Get visualization-ready data\nviz_result = g_complex.chain_remote([viz_prep_query])\n\nprint(f\"Visualization subgraph: {len(viz_result._nodes)} nodes, {len(viz_result._edges)} edges\")\nprint(\"\\nNodes with visualization attributes:\")\nprint(viz_result._nodes)\nprint(\"\\nEdges with styling:\")\nprint(viz_result._edges)\n\n# Ready to visualize\n# viz_result.plot()  # Uncomment to create visualization",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "p1o68dnaemk",
+   "source": "### Key Benefits of Let Bindings with Remote Calls\n\n1. **Server-Side Orchestration**: All operations happen on the server, minimizing data transfer\n2. **Named Intermediate Results**: Create readable, reusable steps in complex analyses\n3. **GPU Acceleration**: Leverage server GPU for compute-intensive operations like PageRank\n4. **Composability**: Build complex workflows from simple building blocks\n5. **Efficiency**: Avoid redundant computations by reusing named results\n\nWhen working with large graphs, this approach is particularly powerful as it allows you to:\n- Perform multiple analyses without downloading intermediate results\n- Chain together different algorithms and filters\n- Prepare visualization-ready data entirely on the server\n- Return only the final, filtered results you need",
+   "metadata": {}
   }
  ],
  "metadata": {
@@ -1424,4 +1498,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
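
Note on the let-binding queries added in the notebook above: every `Ref` names a binding declared under `let`, so a quick sanity check before sending a JSON-form query is to confirm that each referenced name is actually defined. The sketch below is a toy check written against the JSON shape shown in the notebook's comparison cell (`'let'`, `'in'`, `'type': 'Ref'`, `'ref'`). It is not PyGraphistry's validator, and the function names are hypothetical.

```python
from typing import Any, Dict, Set


def collect_refs(node: Any) -> Set[str]:
    """Recursively gather names used by dicts shaped like {'type': 'Ref', 'ref': name}."""
    found: Set[str] = set()
    if isinstance(node, dict):
        if node.get("type") == "Ref" and isinstance(node.get("ref"), str):
            found.add(node["ref"])
        for value in node.values():
            found |= collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_refs(item)
    return found


def undefined_refs(query: Dict[str, Any]) -> Set[str]:
    """Return referenced names that have no matching key under 'let'."""
    defined = set(query.get("let", {}))
    return collect_refs(query) - defined


# Same shape as the JSON example in the notebook's comparison cell
query = {
    "let": {
        "data": {"type": "Call", "function": "compute_pagerank", "params": {}},
        "filtered": {"type": "Ref", "ref": "data", "chain": []},
    },
    "in": {"type": "Ref", "ref": "filtered", "chain": []},
}

print(undefined_refs(query))
```

This only checks that names exist; it does not verify binding order or detect cycles.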