Merged
29 commits
c417fd0
feat(docs): add RST validation tooling for documentation
lmeyerov Aug 10, 2025
1d3738a
fix(ci): use setup.py docs extras for RST linting dependencies
lmeyerov Aug 10, 2025
82e3d31
refactor(ci): remove redundant lint-docs-rst job
lmeyerov Aug 10, 2025
a69c8c8
refactor(docs): move .rstcheck.cfg to docs/ directory
lmeyerov Aug 10, 2025
5cce51b
refactor(docs): move validation guidance to ai/README.md
lmeyerov Aug 10, 2025
fca87b1
refactor(docs): move validate-docs.sh to docs/ directory
lmeyerov Aug 10, 2025
7b5127b
fix(docs): expand rstcheck config to ignore all Sphinx-specific direc…
lmeyerov Aug 10, 2025
0b20261
fix: make validate-docs.sh work in both Docker and local contexts
lmeyerov Aug 10, 2025
de45ae5
feat(gfql): Extract comprehensive GFQL implementation from PR #708
lmeyerov Jul 28, 2025
c6ba1f6
fix(ast): Resolve _get_child_validators type annotation compatibility
lmeyerov Jul 28, 2025
fa42746
feat(gfql): Add group_in_a_box_layout to Call safelist
lmeyerov Jul 29, 2025
22b8c41
feat(gfql): Add GraphOperation type constraints for let() bindings
lmeyerov Jul 31, 2025
cb25bca
fix(chain): resolve edge binding inconsistency in chain() function
lmeyerov Jul 31, 2025
84112eb
fix(types): Add type ignores for mixed type processing in GraphOperation
lmeyerov Aug 1, 2025
6609bfe
fix(lint): Fix flake8 errors with proper dynamic imports
lmeyerov Aug 1, 2025
d726b36
fix(lint): Fix unused imports in hop.py
lmeyerov Aug 1, 2025
b7f83fe
fix(lint): Fix critical lint issues for CI
lmeyerov Aug 1, 2025
61b509b
fix(lint): Fix actual lint failures in test files
lmeyerov Aug 1, 2025
1efa7ad
fix(types): Add type ignores for runtime attribute checks
lmeyerov Aug 2, 2025
9394c9a
fix(types): Add type ignore for GraphOperation in validate_chain_sche…
lmeyerov Aug 2, 2025
4134682
fix(test): Update call operation tests for GraphOperation constraints
lmeyerov Aug 2, 2025
a895b22
fix(typing): Restore PEP 561 py.typed configuration accidentally removed
lmeyerov Sep 24, 2025
dbc7e2a
refactor(gfql): Replace duplicate gfql_validation with deprecation pr…
lmeyerov Sep 25, 2025
0b1c12d
fix(validation): Remove type ignore and add handling for new AST types
lmeyerov Sep 25, 2025
a602520
fix(tests): Fix floating point precision issues in ring layout tests
lmeyerov Sep 25, 2025
0a5ed79
fix(docs): Fix GFQL remote notebook syntax
lmeyerov Sep 25, 2025
07c943a
docs: Update GFQL remote notebook to use clean Python API
lmeyerov Sep 25, 2025
19d9cad
docs: Add auto-generated API documentation files
lmeyerov Sep 22, 2025
d7bb505
docs(gfql): Add comprehensive builtin_calls reference documentation
lmeyerov Sep 25, 2025
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -394,7 +394,7 @@ jobs:
unzip -l dist/graphistry*.whl | grep -q "graphistry/py.typed" || (echo "ERROR: py.typed marker missing from wheel - users won't get type information" && exit 1)
echo "✅ py.typed marker confirmed in wheel distribution"
test-docs:
needs: [changes, python-lint-types]
# Run if docs changed OR Python changed OR infrastructure changed OR manual/scheduled run
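Reviewer note: the CI step above checks the wheel listing with `unzip -l ... | grep -q "graphistry/py.typed"`. For local verification without shell tools, a minimal Python sketch of the same idea follows. The helper name `wheel_has_marker` is hypothetical and not part of this PR. It does an exact member-name match, which is slightly stricter than the substring match that `grep` performs.

```python
import zipfile


def wheel_has_marker(wheel_path, marker="graphistry/py.typed"):
    """Return True if the wheel (a zip archive) contains the marker file.

    Hypothetical helper for illustration; the PR itself uses unzip + grep.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # namelist() returns every archive member path as a string
        return marker in wheel.namelist()
```

Calling `wheel_has_marker("dist/<wheel file>")` after a build would approximate the CI check, assuming the wheel was written to `dist/` as in the workflow.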
1 change: 1 addition & 0 deletions .gitignore
@@ -97,3 +97,4 @@ AI_PROGRESS/
/PLAN.md
plans/
tmp/
test_env/
28 changes: 22 additions & 6 deletions CHANGELOG.md
@@ -6,6 +6,17 @@ The changelog format is based on [Keep a Changelog](https://keepachangelog.com/e
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and all PyGraphistry-specific breaking changes are explicitly noted here.

## Dev
### Added
* GFQL: Add comprehensive validation framework with detailed error reporting
* Built-in validation: `Chain()` constructor validates syntax automatically
* Schema validation: `validate_chain_schema()` validates queries against DataFrame schemas
* Pre-execution validation: `g.chain(ops, validate_schema=True)` catches errors before execution
* Structured error types: `GFQLValidationError`, `GFQLSyntaxError`, `GFQLTypeError`, `GFQLSchemaError`
* Error codes (E1xx syntax, E2xx type, E3xx schema) for programmatic error handling
* Collect-all mode: `validate(collect_all=True)` returns all errors instead of fail-fast
* JSON validation: `Chain.from_json()` validates during parsing for safe LLM integration
* Helpful error suggestions for common mistakes
* Example notebook: `demos/gfql/gfql_validation_fundamentals.ipynb`
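
Reviewer note: the changelog entry groups error codes into families (E1xx syntax, E2xx type, E3xx schema) for programmatic handling. A minimal sketch of dispatching on those families is shown below. It is self-contained and does not import graphistry. The `E` plus three digits format is an assumption read off the `E1xx` notation, and `error_category` is a hypothetical helper, not an API from this PR. How the real exception classes expose their code is not shown in this diff.

```python
# Family digit -> category, as listed in the changelog entry
CATEGORY_BY_DIGIT = {"1": "syntax", "2": "type", "3": "schema"}


def error_category(code):
    """Map a code such as 'E201' to 'syntax', 'type', or 'schema'.

    Assumes codes look like 'E' followed by three digits (hypothetical format).
    """
    if len(code) != 4 or code[0] != "E" or not code[1:].isdigit():
        raise ValueError(f"unrecognized error code: {code!r}")
    category = CATEGORY_BY_DIGIT.get(code[1])
    if category is None:
        raise ValueError(f"unknown error family: {code!r}")
    return category
```

A caller could use the returned category to decide, for example, whether to reject a query outright (syntax) or to re-check it against a different DataFrame (schema).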

## [0.41.2 - 2025-08-28]

@@ -15,7 +26,6 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
* shared types in `embed_types.py` and `umap_types.py`
* Add `mode_action` to `.privacy`
* Fixed `contains`, `startswith`, `endswith`, and `match` predicates to prevent error when run with cuDF


## [0.41.1 - 2025-08-15]

@@ -45,13 +55,19 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
## [0.41.0 - 2025-07-26]

### Added
* Typing: Add PEP 561 type distribution support (#714)
* Add py.typed marker file to enable type checking with mypy, pyright, and PyCharm
* Configure MANIFEST.in and setup.cfg to include py.typed in source and wheel distributions
* Add CI validation to prevent regressions where py.typed might be accidentally removed
* Enables accurate type information and autocompletion for PyGraphistry APIs
* GFQL: Add comprehensive validation framework with detailed error reporting
* Built-in validation: `Chain()` constructor validates syntax automatically
* Schema validation: `validate_chain_schema()` validates queries against DataFrame schemas
* Pre-execution validation: `g.chain(ops, validate_schema=True)` catches errors before execution
* Structured error types: `GFQLValidationError`, `GFQLSyntaxError`, `GFQLTypeError`, `GFQLSchemaError`
* Error codes (E1xx syntax, E2xx type, E3xx schema) for programmatic error handling
* Collect-all mode: `validate(collect_all=True)` returns all errors instead of fail-fast
* JSON validation: `Chain.from_json()` validates during parsing for safe LLM integration
* Helpful error suggestions for common mistakes
* Example notebook: `demos/gfql/gfql_validation_fundamentals.ipynb`

### Fixed
* Docs: Fix case sensitivity in server toctree to link concurrency.rst (#723)
* Docs: Fix notebook validation error in hop_and_chain_graph_pattern_mining.ipynb by adding missing 'outputs' field to code cell

### Infra
114 changes: 94 additions & 20 deletions demos/gfql/gfql_remote.ipynb
@@ -33,7 +33,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "c9227361-7af6-4f52-b84e-3d7fd2f0f5b3",
"metadata": {
"execution": {
@@ -44,24 +44,8 @@
"shell.execute_reply.started": "2024-12-10T19:12:01.693232Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'0+unknown'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import graphistry\n",
"from graphistry import n, e_undirected, e_forward\n",
"graphistry.__version__"
]
"outputs": [],
"source": "import pandas as pd\nimport graphistry\nfrom graphistry import n, e_undirected, e_forward\n\n# Import Python API for cleaner syntax with let bindings\nfrom graphistry.compute.ast import ref, let, ASTCall\n\ngraphistry.__version__"
},
{
"cell_type": "code",
@@ -1401,6 +1385,96 @@
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "fs1pabrqfaj",
"source": "## Combining Let Bindings with Call Operations\n\nLet bindings in GFQL allow you to create named intermediate results and compose complex operations. When combined with call operations in remote mode, you can orchestrate sophisticated graph analyses entirely on the server, minimizing data transfer and leveraging server-side GPU acceleration.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "bs7rghntlp",
"source": "### Example 1: PageRank Analysis with Filtering\n\nThis example demonstrates using let bindings to:\n1. Compute PageRank scores\n2. Filter high-value nodes\n3. Extract subgraphs around important nodes\n4. Return results for visualization",
"metadata": {}
},
{
"cell_type": "code",
"id": "wurwk0xplp",
"source": "# Create a more complex graph for demonstration\ncomplex_edges = pd.DataFrame({\n 's': ['a', 'b', 'c', 'd', 'e', 'f', 'a', 'b', 'c', 'd'],\n 'd': ['b', 'c', 'd', 'e', 'f', 'a', 'c', 'd', 'e', 'f'],\n 'weight': [1, 2, 1, 3, 1, 2, 1, 2, 1, 1],\n 'type': ['follow', 'mention', 'follow', 'follow', 'mention', 'follow', 'mention', 'follow', 'follow', 'mention']\n})\n\ng_complex = graphistry.edges(complex_edges, 's', 'd').upload()\nprint(f\"Uploaded graph with {len(complex_edges)} edges\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "mwr3948llv",
"source": "%%time\n\n# Define a complex query using Python API for cleaner syntax\npagerank_analysis_query = let({\n # Step 1: Compute PageRank scores\n 'with_pagerank': ASTCall('compute_pagerank'),\n \n # Step 2: Filter nodes with high PageRank scores\n 'important_nodes': ref('with_pagerank', [\n n({'filter': {'gte': [{'col': 'pagerank'}, 0.15]}})\n ]),\n \n # Step 3: Get 1-hop neighborhoods of important nodes\n 'important_neighborhoods': ref('important_nodes', [\n e_undirected({'hops': 1}),\n n()\n ])\n})\n\n# Note: The 'in' clause is automatically the last binding when using Python let()\n# To specify a different output, pass it as second argument: let(bindings, 'output_name')\n\n# Execute the query remotely - chain_remote accepts Python objects directly!\nresult = g_complex.chain_remote([pagerank_analysis_query])\n\nprint(f\"Result has {len(result._nodes)} nodes and {len(result._edges)} edges\")\nprint(\"\\nNodes with PageRank scores:\")\nprint(result._nodes)",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "jdzoowghv2q",
"source": "### Example 2: Multi-Stage Analysis with Different Edge Types\n\nThis example shows how to use let bindings to analyze different edge types separately and combine the results:",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "eps7v0ovlxi",
"source": "### Python API vs JSON Format Comparison\n\nThe examples above use the clean Python API. For reference, here's what the equivalent JSON format looks like:",
"metadata": {}
},
{
"cell_type": "code",
"id": "sc00048dhan",
"source": "# Comparison: Python API vs JSON format\n\n# Clean Python API (what we use above):\npython_query = let({\n 'data': ASTCall('compute_pagerank'),\n 'filtered': ref('data', [\n n({'filter': {'gte': [{'col': 'pagerank'}, 0.15]}})\n ])\n})\n\n# Equivalent verbose JSON format:\njson_query = {\n 'let': {\n 'data': {\n 'type': 'Call',\n 'function': 'compute_pagerank',\n 'params': {}\n },\n 'filtered': {\n 'type': 'Ref',\n 'ref': 'data',\n 'chain': [{\n 'type': 'Node',\n 'filter_dict': {\n 'filter': {'gte': [{'col': 'pagerank'}, 0.15]}\n }\n }]\n }\n },\n 'in': {'type': 'Ref', 'ref': 'filtered', 'chain': []}\n}\n\n# Both work with chain_remote:\n# result = g.chain_remote([python_query]) # Clean!\n# result = g.chain_remote([json_query]) # Verbose but equivalent\n\nprint(\"Python object converts to JSON:\")\nprint(python_query.to_json())",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"id": "40m2wl0yn3m",
"source": "%%time\n\n# Analyze different edge types using clean Python API\nedge_type_analysis = let({\n # Analyze follow edges\n 'follow_network': e_undirected({\n 'filter': {'eq': [{'col': 'type'}, 'follow']}\n }),\n \n # Compute centrality on follow network \n 'follow_centrality': ref('follow_network', [\n n(),\n ASTCall('compute_degree_centrality')\n ]),\n \n # Find mention patterns\n 'mention_edges': e_undirected({\n 'filter': {'eq': [{'col': 'type'}, 'mention']}\n }),\n \n # Get nodes that are both highly connected and frequently mentioned\n 'influential_nodes': ref('follow_centrality', [\n n({'filter': {'gte': [{'col': 'degree_centrality'}, 0.5]}}),\n ref('mention_edges', []),\n n()\n ])\n})\n\n# Execute remotely\ninfluential_result = g_complex.chain_remote([edge_type_analysis])\n\nprint(f\"Found {len(influential_result._nodes)} influential nodes\")\nprint(f\"Connected by {len(influential_result._edges)} edges\")\nprint(\"\\nInfluential nodes with centrality scores:\")\nprint(influential_result._nodes)",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "5y02h8p7mkk",
"source": "### Example 3: Conditional Analysis with Let Bindings\n\nThis example demonstrates using let bindings to perform conditional analysis based on graph properties:",
"metadata": {}
},
{
"cell_type": "code",
"id": "gmy30drvbts",
"source": "%%time\n\n# Complex analysis with multiple algorithms using Python API\ncomprehensive_analysis = let({\n # Base graph with PageRank computation\n 'enriched_graph': ASTCall('compute_pagerank'),\n \n # Add centrality metrics\n 'with_centrality': ref('enriched_graph', [\n ASTCall('compute_degree_centrality')\n ]),\n \n # Find bridge nodes (high PageRank, low-medium centrality)\n 'bridge_nodes': ref('with_centrality', [\n n({\n 'filter': {\n 'and': [\n {'gte': [{'col': 'pagerank'}, 0.1]},\n {'lte': [{'col': 'degree_centrality'}, 0.7]}\n ]\n }\n })\n ]),\n \n # Find hub nodes (high degree centrality)\n 'hub_nodes': ref('with_centrality', [\n n({'filter': {'gte': [{'col': 'degree_centrality'}, 0.7]}})\n ]),\n \n # Get connections between bridges and hubs\n 'critical_paths': ref('bridge_nodes', [\n e_forward({'to_nodes': ref('hub_nodes', [])}),\n n()\n ])\n})\n\n# Execute remotely with GPU acceleration\ncritical_paths_result = g_complex.chain_remote([comprehensive_analysis], engine='cudf')\n\nprint(f\"Critical paths network: {len(critical_paths_result._nodes)} nodes, {len(critical_paths_result._edges)} edges\")\n\n# Check if we got results\nif len(critical_paths_result._nodes) > 0:\n print(\"\\nCritical path nodes:\")\n print(critical_paths_result._nodes)\nelse:\n print(\"\\nNo critical paths found with current thresholds\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "wbt02937wz",
"source": "### Example 4: Visualization-Ready Analysis\n\nThis example shows how to prepare data for visualization by enriching it with multiple metrics and creating a focused subgraph:",
"metadata": {}
},
{
"cell_type": "code",
"id": "4glzmgi1u3s",
"source": "%%time\n\n# Prepare visualization-ready data with all enrichments\nviz_prep_query = {\n 'let': {\n # Compute all metrics - sequential operations\n 'with_pagerank': {\n 'call': {'method': 'compute_pagerank', 'args': [], 'kwargs': {}}\n },\n \n 'with_metrics': {\n 'type': 'Ref',\n 'ref': 'with_pagerank',\n 'chain': [\n {'call': {'method': 'compute_degree_centrality', 'args': [], 'kwargs': {}}},\n # Add node colors based on PageRank\n {\n 'call': {\n 'method': 'nodes',\n 'args': [],\n 'kwargs': {\n 'assign': {\n 'node_color': {\n 'case': [\n {\n 'when': {'gte': [{'col': 'pagerank'}, 0.2]},\n 'then': 65280 # Green for high PageRank\n },\n {\n 'when': {'gte': [{'col': 'pagerank'}, 0.15]},\n 'then': 16776960 # Yellow for medium\n }\n ],\n 'else': 16711680 # Red for low\n },\n 'node_size': {\n 'mul': [\n {'col': 'degree_centrality'},\n 50 # Scale factor\n ]\n }\n }\n }\n }\n }\n ]\n },\n \n # Add edge styling based on type and weight\n 'styled_graph': {\n 'type': 'Ref',\n 'ref': 'with_metrics',\n 'chain': [\n {\n 'call': {\n 'method': 'edges',\n 'args': [],\n 'kwargs': {\n 'assign': {\n 'edge_color': {\n 'case': [\n {\n 'when': {'eq': [{'col': 'type'}, 'follow']},\n 'then': 255 # Blue for follows\n }\n ],\n 'else': 16711935 # Magenta for mentions\n },\n 'edge_weight': {\n 'col': 'weight'\n }\n }\n }\n }\n }\n ]\n },\n \n # Focus on top nodes and their connections\n 'viz_subgraph': {\n 'type': 'Ref',\n 'ref': 'styled_graph',\n 'chain': [\n {\n 'n': {\n 'filter': {\n 'or': [\n {'gte': [{'col': 'pagerank'}, 0.15]},\n {'gte': [{'col': 'degree_centrality'}, 0.6]}\n ]\n }\n }\n },\n {'e_undirected': {'hops': 1}},\n {'n': {}}\n ]\n }\n },\n \n 'in': {'type': 'Ref', 'ref': 'viz_subgraph', 'chain': []}\n}\n\n# Get visualization-ready data\nviz_result = g_complex.chain_remote([viz_prep_query])\n\nprint(f\"Visualization subgraph: {len(viz_result._nodes)} nodes, {len(viz_result._edges)} edges\")\nprint(\"\\nNodes with visualization attributes:\")\nprint(viz_result._nodes)\nprint(\"\\nEdges with styling:\")\nprint(viz_result._edges)\n\n# Ready to visualize\n# viz_result.plot() # Uncomment to create visualization",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "p1o68dnaemk",
"source": "### Key Benefits of Let Bindings with Remote Calls\n\n1. **Server-Side Orchestration**: All operations happen on the server, minimizing data transfer\n2. **Named Intermediate Results**: Create readable, reusable steps in complex analyses\n3. **GPU Acceleration**: Leverage server GPU for compute-intensive operations like PageRank\n4. **Composability**: Build complex workflows from simple building blocks\n5. **Efficiency**: Avoid redundant computations by reusing named results\n\nWhen working with large graphs, this approach is particularly powerful as it allows you to:\n- Perform multiple analyses without downloading intermediate results\n- Chain together different algorithms and filters\n- Prepare visualization-ready data entirely on the server\n- Return only the final, filtered results you need",
"metadata": {}
}
],
"metadata": {
@@ -1424,4 +1498,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
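Reviewer note: the `viz_prep_query` cell in `demos/gfql/gfql_remote.ipynb` uses bare integers for colors (65280, 16776960, 16711680, 255, 16711935) with comments naming them green, yellow, red, blue, and magenta. Those values are consistent with packing 8-bit channels as `0xRRGGBB`. The sketch below shows that arithmetic so the constants are easier to audit. The helper names are hypothetical, and how the server interprets these integers is an assumption based only on the notebook comments.

```python
def rgb_to_int(red, green, blue):
    """Pack three 0-255 channels into one integer laid out as 0xRRGGBB."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return (red << 16) | (green << 8) | blue


def int_to_hex(color):
    """Render a packed color as a '#RRGGBB' string for easier reading."""
    return f"#{color:06X}"


# The constants used in the notebook cell, rebuilt from their channels
GREEN = rgb_to_int(0, 255, 0)
YELLOW = rgb_to_int(255, 255, 0)
RED = rgb_to_int(255, 0, 0)
BLUE = rgb_to_int(0, 0, 255)
MAGENTA = rgb_to_int(255, 0, 255)
```

Named constants like these could replace the literals in the query dictionary, which would make the color comments unnecessary.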