Feature/sentinel kql #743
base: master
- Add SentinelConfig dataclass for connection configuration
- Add SentinelQueryResult class for query results
- Add custom exceptions for connection and query errors
- Support for Azure authentication credentials

- Add SentinelMixin class extending Plottable
- Implement configure_sentinel() with multiple auth methods
- Support Azure CLI, Service Principal, and custom credentials
- Add sentinel_from_client() for existing client reuse
- Implement health check and basic query infrastructure
- Add client initialization with DefaultAzureCredential support

- Implement kql() method with timespan and nested data support
- Add kql_last() convenience method for recent data queries
- Add sentinel_tables() and sentinel_schema() helper methods
- Port nested data unwrapping from Kusto plugin
- Support for multiple table responses
- Handle JSON strings and dynamic columns in Sentinel results

- Add SentinelMixin import to plotter.py
- Include SentinelMixin in Plotter class hierarchy
- Update documentation to list Sentinel integration

- Add azure-monitor-query>=1.2.0 and azure-identity>=1.12.0 to sentinel extras
- Add 'Sentinel' to package keywords for discoverability

- Test configuration methods (basic, service principal, custom credential)
- Test KQL query execution (single/multiple tables, timespan)
- Test helper methods (kql_last, sentinel_tables, sentinel_schema)
- Test health check functionality
- Test nested data unwrapping and JSON parsing
- Test authentication initialization flows
- Add mock-based testing to avoid API dependencies

- Demonstrate Azure CLI and Service Principal authentication
- Show security use cases: failed logins, alerts, network analysis
- Include graph visualizations for user-IP and alert correlations
- Provide examples of multi-table KQL queries
- Cover workspace exploration and schema inspection
- Add troubleshooting guidance and next steps

- Replace hardcoded credentials with environment variables
- Add example.env template file
- Support custom .env file locations
- Include python-dotenv dependency instructions
- Improve security by avoiding credential commits

- Replace username/password with personal_key_id/personal_key_secret
- Update example.env with new credential format
- Use modern Graphistry authentication method
- Maintain security with environment variables

- Add configure_sentinel and sentinel_from_client to __init__.py
- Add corresponding methods in pygraphistry.py
- Fix module-level access to Sentinel functionality

- Add configure_sentinel and sentinel_from_client assignments
- Follow same pattern as other plugins (Kusto, Spanner)
- Enable direct import from graphistry module

- Add use_device_auth parameter to configure_sentinel()
- Support DeviceCodeCredential for interactive authentication
- Show code and URL for authentication like Kusto plugin
- Update type definitions and method signatures
- Provide authentication precedence documentation

- Support both object columns (with .name/.type) and string columns
- Default to 'string' type when type info not available
- Handle missing table.name attribute gracefully
- Fix AttributeError in query response processing

- Replace union withsource=TableName query that caused conflicts
- Use Usage table to get DataType (table names) instead
- Extend timespan to 30 days for better table coverage
- Avoid SEM0001 semantic error from existing TableName columns

- Fix references from TableName to DataType column
- Clean up notebook code for consistent formatting
- Add device authentication example
- Ensure table listing and schema queries work correctly

- Replace bind() with encode_point_color() and encode_edge_color()
- Use encode_edge_size() for edge weight visualization
- Fix method signatures to match Graphistry API

- Add better error handling and user guidance
- Include troubleshooting tips and alternative auth methods
- Improve error messages and empty result handling
- Add comprehensive summary and next steps
- Make notebook more robust for different workspace configurations

- Import SentinelMixin to fix F821 undefined name errors
- Resolves flake8 lint issues in test_sentinel.py

- Fix W504: Move binary operator before line break
- Fix W292: Add newlines at end of all files
- Fix F841: Remove unused local variables in tests
- All flake8 lint issues now resolved
"\n", | ||
"## Getting Started\n", | ||
"\n", | ||
"### Option 1: Azure CLI Authentication (Recommended for Development)\n", |
Add an Azure CLI auth link here (+ Option 2)?
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": "# Alternative: Service Principal authentication from .env file\n# Uncomment the lines below if you prefer Service Principal over device authentication\n# g = graphistry.configure_sentinel(\n# workspace_id=os.getenv('SENTINEL_WORKSPACE_ID'),\n# tenant_id=os.getenv('AZURE_TENANT_ID'),\n# client_id=os.getenv('AZURE_CLIENT_ID'),\n# client_secret=os.getenv('AZURE_CLIENT_SECRET')\n# )\n\n# Alternative: Use DefaultAzureCredential (tries Azure CLI, Managed Identity, etc.)\n# g = graphistry.configure_sentinel(\n# workspace_id=os.getenv('SENTINEL_WORKSPACE_ID')\n# )" |
Something I like to do is:
if False:
    ....
else:
    print('Skipped, switch if statement to enable')
That way they don't have to edit all the #'s.
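A minimal sketch of the toggle pattern suggested above, applied to the Service Principal cell. The flag name `ENABLE_SERVICE_PRINCIPAL` and the `status` variable are hypothetical, added only for illustration; the `configure_sentinel()` arguments are copied from the notebook cell under review.

```python
import os

# Hypothetical toggle: flip to True to run the Service Principal branch
ENABLE_SERVICE_PRINCIPAL = False

if ENABLE_SERVICE_PRINCIPAL:
    # Imported lazily so a skipped cell needs no extra dependencies
    import graphistry

    g = graphistry.configure_sentinel(
        workspace_id=os.getenv('SENTINEL_WORKSPACE_ID'),
        tenant_id=os.getenv('AZURE_TENANT_ID'),
        client_id=os.getenv('AZURE_CLIENT_ID'),
        client_secret=os.getenv('AZURE_CLIENT_SECRET'),
    )
    status = 'configured'
else:
    status = 'skipped'
    print('Skipped, set ENABLE_SERVICE_PRINCIPAL = True to enable')
```

Readers then change one flag instead of uncommenting a block line by line.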
g.sentinel_health_check()  # Verify connection works
"""
try:
    self._sentinel_query("Heartbeat | take 1", timespan=timedelta(hours=1))
ChatGPT thinks this is unsafe, as it assumes Azure Log Analytics, while `union withsource=TableName * | take 1` can work on Sentinel without that assumption.
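One way to hedge between the two probes is to try them in order and report which one worked. The sketch below assumes a hypothetical `run_query` callable standing in for the plugin's internal query method, and the probe order is a guess. Note that the commit list above reports an SEM0001 error from `union withsource=TableName` on workspaces that already have a `TableName` column, so the probe list is left configurable.

```python
from typing import Callable, Sequence

# Assumed probe order: table-specific first, then a table-agnostic fallback
DEFAULT_PROBES = (
    "Heartbeat | take 1",
    "union withsource=TableName * | take 1",
)

def health_check(run_query: Callable[[str], object],
                 probes: Sequence[str] = DEFAULT_PROBES) -> str:
    """Return the first probe query that succeeds; raise if all of them fail."""
    errors = []
    for probe in probes:
        try:
            run_query(probe)
            return probe
        except Exception as exc:  # collect every failure for the final message
            errors.append(f"{probe!r}: {exc}")
    raise RuntimeError("Sentinel health check failed; tried " + "; ".join(errors))

def _fake_run_query(query: str) -> list:
    """Stand-in for a workspace that has no Heartbeat table."""
    if query.startswith("Heartbeat"):
        raise ValueError("table 'Heartbeat' not found")
    return []

working_probe = health_check(_fake_run_query)
```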
"\n", | ||
"1. **Azure Access**: You need access to a Microsoft Sentinel workspace\n", | ||
"2. **Authentication**: Either Azure CLI (`az login`) or service principal credentials\n", | ||
"3. **Dependencies**: Install required packages\n", |
Add 4.a..4.z with assumptions about the Sentinel setup, and for any individual assumption they don't satisfy, recommend workarounds?
"## Prerequisites\n", | ||
"\n", | ||
"1. **Azure Access**: You need access to a Microsoft Sentinel workspace\n", | ||
"2. **Authentication**: Either Azure CLI (`az login`) or service principal credentials\n", |
- Add docs link?
- And if az login mode, do we need to qualify az login running in the same server/container as the notebook, or ?
"source": [ | ||
"# Microsoft Sentinel Security Analysis with Graphistry\n", | ||
"\n", | ||
"This notebook demonstrates how to use Graphistry with Microsoft Sentinel (Log Analytics) to perform security analysis and visualization using KQL queries.\n", |
Qualify which visualizations we'll do, like ... graph visualization using KQL queries for logins, alerts, traffic, and other built-in Sentinel tables.
"source": [ | ||
"## Security Analysis Examples\n", | ||
"\n", | ||
"### 1. Failed Login Analysis" |
Instead of a separate analysis section without Graphistry viz, I would fold Graph Visualization into Security Analysis Examples, so there is just 1 section of use cases, where each is like:
## Security Analysis Examples
### Failed logins: IP <> Principal <> Device graph
#### 1. Analysis
...
#### 2. Visualization
### Next use case ...
All of the use cases sound relevant.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Security Analysis Examples\n", |
As there are a lot of use cases, add a bulleted list enumerating which will be here?
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "## Summary\n\nThis notebook demonstrated:\n\n1. **Connecting to Microsoft Sentinel** using Azure authentication (device code, service principal, or DefaultAzureCredential)\n2. **Exploring available data** with `sentinel_tables()` and `sentinel_schema()`\n3. **Security analysis** using KQL queries for:\n - Failed login analysis\n - Security alerts monitoring\n - Network traffic analysis\n4. **Graph visualization** of:\n - User-IP relationships\n - Alert correlations\n5. **Advanced correlation** across multiple data sources\n\n## Next Steps\n\n- **Customize queries** for your specific security use cases and available data tables\n- **Create automated dashboards** by scheduling notebook execution\n- **Integrate with threat intelligence** feeds using additional KQL joins\n- **Build detection rules** based on graph patterns you discover\n- **Scale analysis** by adjusting time windows and data volumes\n\n## Troubleshooting Tips\n\n- **No data found**: Some workspaces may not have SecurityEvent, SigninLogs, or SecurityAlert tables\n- **Authentication issues**: Try `az login` first, or check your service principal credentials\n- **Permission errors**: Ensure your account has Log Analytics Reader permissions\n- **Empty results**: Adjust time ranges - some workspaces have limited data retention\n\n## Resources\n\n- [Microsoft Sentinel KQL Reference](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/)\n- [Graphistry Documentation](https://pygraphistry.readthedocs.io/)\n- [Azure Monitor Query Documentation](https://docs.microsoft.com/en-us/python/api/azure-monitor-query/)\n- [Sentinel Data Connectors](https://docs.microsoft.com/en-us/azure/sentinel/connect-data-sources)" |
Next steps should include links, especially for the most recommended:
- Try for free with your Graphistry Hub account and Jupyter / Google Colab notebooks
- Check out louie.ai for help generating these queries and visualizations through genAI
- Check out the Kusto tutorial
- Learn more about complementary PyGraphistry features like UMAP for detecting patterns & outliers on events, GFQL for mining graphs once you've made them, and how to control graph colors and layouts
etc.
"try:\n", | ||
" failed_logins = g.kql(failed_logins_query, timespan=timedelta(days=7))\n", | ||
" print(f\"Found {len(failed_logins)} users with multiple failed logins\")\n", | ||
" print(failed_logins.head())\n", |
For all these df prints, you either want df.head() as the last cell line, or something like display(df.head()), so we get native notebook rendering.
" print(f\"Created graph with {len(nodes)} nodes and {len(edges)} edges\")\n", | ||
" \n", | ||
" # Plot the graph\n", | ||
" graph.plot()\n", |
This needs to be the last line, right? Or a display()?
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Graph Visualization\n", |
Make sure we have live plots in the output, so readthedocs will render (unfortunately GitHub won't).
  unwrap_nested: Optional[bool] = None,
- single_table: bool = True
+ single_table: bool = True,
+ include_statistics: bool = False
I assume you tested that kql didn't regress?
dfs: List[pd.DataFrame] = []

for result in results:
Can most of the wrangling in this file go away and be reused from Kusto, so we're not maintaining 2 clones?
I'm not sure how close/different they are, or how.
Ex: if just the initial connection changes, then we can parameterize Kusto by the client connection.
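A rough sketch of what "parameterize by the client connection" could look like: the backend supplies only a function that runs KQL and returns rows, and the post-processing is written once. Everything here (`QueryRunner`, `parse_dynamic`, `run_kql`, and the two fake backends) is hypothetical and uses plain dicts rather than DataFrames to stay self-contained; it is not the existing Kusto or Sentinel code.

```python
import json
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical seam: each backend only knows how to run KQL and return rows
QueryRunner = Callable[[str], List[Row]]

def parse_dynamic(value: Any) -> Any:
    """Decode JSON-looking strings (e.g. KQL dynamic columns) into dicts/lists."""
    if isinstance(value, str) and value[:1] in ("{", "["):
        try:
            return json.loads(value)
        except ValueError:  # not valid JSON after all, keep the raw string
            return value
    return value

def run_kql(run_query: QueryRunner, query: str) -> List[Row]:
    """Shared wrangling; all backend-specific code lives behind run_query."""
    return [{k: parse_dynamic(v) for k, v in row.items()}
            for row in run_query(query)]

def fake_kusto(query: str) -> List[Row]:
    """Stand-in for a Kusto client call."""
    return [{"user": "a", "props": '{"ip": "10.0.0.1"}'}]

def fake_sentinel(query: str) -> List[Row]:
    """Stand-in for a Sentinel (Log Analytics) client call."""
    return [{"user": "b", "props": "[1, 2]"}]

kusto_rows = run_kql(fake_kusto, "T | take 1")
sentinel_rows = run_kql(fake_sentinel, "T | take 1")
```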
return dfs

def kql_last(
I recommend removing kql_last() as conceptual bloat of surface area for users: it seems unnecessary, as it can be folded into kql() as part of the timespan param, e.g. if timespan=... is an int, autoconvert it to hours.
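The auto-conversion could be a small helper called at the top of `kql()`. This is a sketch under the reviewer's stated rule (a bare number means hours); the helper name `normalize_timespan` and the choice to reject non-positive values are assumptions.

```python
from datetime import timedelta
from typing import Optional, Union

TimespanLike = Optional[Union[int, float, timedelta]]

def normalize_timespan(timespan: TimespanLike) -> Optional[timedelta]:
    """Coerce a kql() timespan argument; bare numbers are read as hours."""
    if timespan is None or isinstance(timespan, timedelta):
        return timespan
    # bool is a subclass of int, so reject it explicitly
    if isinstance(timespan, bool) or not isinstance(timespan, (int, float)):
        raise TypeError(f"timespan must be a number of hours or a timedelta, got {timespan!r}")
    if timespan <= 0:
        raise ValueError("timespan in hours must be positive")
    return timedelta(hours=timespan)

one_day = normalize_timespan(24)
```

With this in place, `g.kql(query, timespan=24)` would cover what `kql_last()` was added for.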
# Use Usage table to get all table names - this avoids union conflicts
query = """
Usage
| where TimeGenerated > ago(30d)
TimeGenerated > ago(30d) is surprising to me.
If desired as a convenience, add an optional param with a less surprising default? E.g., if not specified, drop that constraint?
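A sketch of the optional-parameter idea: the time filter is emitted only when the caller asks for one. The function name `build_tables_query` and the `lookback` parameter are hypothetical, and the `distinct DataType` tail is an assumption, since the rest of the real query is not visible in this hunk.

```python
from datetime import timedelta
from typing import Optional

def build_tables_query(lookback: Optional[timedelta] = None) -> str:
    """Build the table-listing KQL; add a time filter only when requested."""
    lines = ["Usage"]
    if lookback is not None:
        # KQL timespan literals accept a seconds suffix, e.g. ago(3600s)
        seconds = int(lookback.total_seconds())
        lines.append(f"| where TimeGenerated > ago({seconds}s)")
    # Assumption: the real query's tail is not shown in the diff
    lines.append("| distinct DataType")
    return "\n".join(lines)

unbounded = build_tables_query()
last_30_days = build_tables_query(timedelta(days=30))
```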
""" | ||
return self.kql(query, timespan=timedelta(days=30))

def sentinel_schema(self, table: str) -> pd.DataFrame:
Helpful to document the expected returned dtypes.
print(schema[['ColumnName', 'DataType']])
"""
query = f"{table} | getschema"
return self.kql(query, timespan=timedelta(minutes=5))
Why timespan here?
# Check for partial failures
if response.status == LogsQueryStatus.PARTIAL:
    logger.warning(f"Query returned partial results: {response.partial_error}")
Should warnings raise too? Or do we make it an optional default-off param to raise on warnings on kql + here?
I'm unsure how frequent & fatal warnings etc. are here.
If it's a logger.warning, Louie wouldn't see it...
if response.status == LogsQueryStatus.PARTIAL:
    logger.warning(f"Query returned partial results: {response.partial_error}")
elif response.status == LogsQueryStatus.FAILURE:
    raise SentinelQueryError(f"Query failed: {response.partial_error}")
I wonder if we can propagate the partial result / response too so Louie can inspect deeper, esp. in the warning case...
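One possible shape for both review points: a default-off `raise_on_partial` flag, plus a result object and exception that carry the partial rows and error so a caller can inspect them programmatically. All names below are hypothetical; `Status` is a local stand-in for `LogsQueryStatus`, and this `SentinelQueryError` is a simplified stand-in for the PR's exception class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional

class Status(Enum):  # local stand-in for azure.monitor.query.LogsQueryStatus
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"

class SentinelQueryError(Exception):
    """Simplified stand-in that also carries whatever rows came back."""
    def __init__(self, message: str, partial_rows: Optional[List[Any]] = None,
                 error: Optional[str] = None):
        super().__init__(message)
        self.partial_rows = partial_rows or []
        self.error = error

@dataclass
class QueryOutcome:
    rows: List[Any]
    partial_error: Optional[str] = None  # set when the service returned a partial result

def handle_response(status: Status, rows: List[Any], error: Optional[str] = None,
                    raise_on_partial: bool = False) -> QueryOutcome:
    if status is Status.FAILURE:
        raise SentinelQueryError(f"Query failed: {error}", error=error)
    if status is Status.PARTIAL:
        if raise_on_partial:
            raise SentinelQueryError(f"Query returned partial results: {error}",
                                     partial_rows=rows, error=error)
        return QueryOutcome(rows=rows, partial_error=error)
    return QueryOutcome(rows=rows)

# Lenient (default): partial rows come back with the error attached
lenient = handle_response(Status.PARTIAL, [{"a": 1}], error="timeout")

# Strict: the same response raises, but the rows stay reachable on the exception
try:
    handle_response(Status.PARTIAL, [{"a": 1}], error="timeout", raise_on_partial=True)
    strict_rows = None
except SentinelQueryError as exc:
    strict_rows = exc.partial_rows
```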
from azure.monitor.query import LogsQueryClient

try:
    assert cfg.workspace_id is not None, "workspace_id is not set"
How does this get set? Give a more actionable error, as it may bubble up to the user.
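A sketch of a more actionable check. It replaces the `assert` (which Python strips when run with `-O`) with an explicit exception whose message names the call that sets the value. The helper name `require_workspace_id` is hypothetical; `configure_sentinel(workspace_id=...)` is the entry point shown in this PR's description.

```python
from typing import Optional

def require_workspace_id(workspace_id: Optional[str]) -> str:
    """Return workspace_id, or raise an error that says how to fix it."""
    if not workspace_id:
        raise ValueError(
            "Sentinel workspace_id is not set. Call "
            "graphistry.configure_sentinel(workspace_id='<your-workspace-id>') "
            "before running queries."
        )
    return workspace_id

# Demo: capture the message a user would see when the ID is missing
try:
    require_workspace_id(None)
    message = ""
except ValueError as exc:
    message = str(exc)
```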
if cfg.credential:
    credential = cfg.credential
    logger.info("Using custom credential object for Sentinel")
debug, not info
    client_id=cfg.client_id,
    client_secret=cfg.client_secret
)
logger.info(f"Using Service Principal authentication for workspace {cfg.workspace_id}")
debug, not info
    logger.info("You will be prompted to visit a URL and enter a code to authenticate")
else:
    credential = DefaultAzureCredential()
    logger.info(f"Using DefaultAzureCredential (Azure CLI, Managed Identity, etc.) for workspace {cfg.workspace_id}")
debug, not info
    tenant_id=cfg.tenant_id  # Optional, uses common tenant if not provided
)
logger.info(f"Using Device Code authentication for workspace {cfg.workspace_id}")
logger.info("You will be prompted to visit a URL and enter a code to authenticate")
This goes to the user for action AFAICT, so info seems right, and/or maybe print, unsure.
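To illustrate the level split discussed in these comments: which credential was chosen is diagnostic detail (DEBUG), while the device-code prompt needs user action (INFO). The sketch below uses a hypothetical logger name and helper, plus a small in-memory handler only so the effect is observable.

```python
import logging

logger = logging.getLogger("sentinel_auth_demo")  # hypothetical logger name

def announce_auth(method: str, workspace_id: str) -> None:
    # Diagnostic detail: only interesting when debugging
    logger.debug("Using %s authentication for workspace %s", method, workspace_id)
    if method == "device_code":
        # Requires user action, so surface it at INFO
        logger.info("You will be prompted to visit a URL and enter a code to authenticate")

class ListHandler(logging.Handler):
    """Collects records in memory so the demo can inspect them."""
    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)

handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)   # a typical notebook setting: DEBUG is hidden
logger.propagate = False

announce_auth("device_code", "your-workspace-id")
announce_auth("service_principal", "your-workspace-id")
seen = [(r.levelname, r.getMessage()) for r in handler.records]
```

At INFO level only the device-code prompt reaches the user; the credential-selection lines stay silent unless DEBUG is enabled.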
return isinstance(val, (dict, list))


def _unwrap_nested(result: SentinelQueryResult) -> pd.DataFrame:
Yeah, all this kind of stuff we want to combine as much as we can w/ the Kusto impl...
# to pass along args/kwargs to the next mixin in the chain
class Plotter(
-    KustoMixin, SpannerMixin,
+    SentinelMixin, KustoMixin, SpannerMixin,
Won't this override Kusto, as both have kql()?
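The concern follows from Python's method resolution order: with multiple inheritance, the leftmost base that defines a method wins. The classes below are bare stand-ins with the same names, not the real mixins, and exist only to show the lookup.

```python
class KustoMixin:  # stand-in, not the real graphistry class
    def kql(self, query: str) -> str:
        return f"kusto:{query}"

class SentinelMixin:  # stand-in, not the real graphistry class
    def kql(self, query: str) -> str:
        return f"sentinel:{query}"

class Plotter(SentinelMixin, KustoMixin):
    pass

# Attribute lookup walks the MRO left to right, so SentinelMixin.kql is found first
resolved = Plotter().kql("T | take 1")
mro_names = [cls.__name__ for cls in Plotter.__mro__]
```

Under this base order, `KustoMixin.kql` is shadowed unless `SentinelMixin.kql` delegates via `super()` or the methods get distinct names.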
'nodexl': ['openpyxl==3.1.0', 'xlrd'],
'jupyter': ['ipython'],
'spanner': ['google-cloud-spanner'],
'sentinel': ['azure-monitor-query>=1.2.0', 'azure-identity>=1.12.0'],
Are we sure about these restrictions, or can they be unpinned?
We prob want to preinstall pinned in Graphistry and Louie, so guidance here helps too.
See inline comments for generic things I noticed (use judgement if critical).
Most prominent I would recommend looking at:
- Does this break Kusto, e.g., overrides the kusto() call, and thus a shared impl would make more sense?
- ... there's a bunch of df wrangling, can that be shared with the kusto() impl?
- Next Steps section in notebook seems untargeted, should be clickable links to more pyg/louie stuff (+ hub if a non-user)
🎯 Overview
This PR introduces a new plugin for Microsoft Sentinel (Azure Log
Analytics) that enables users to query security data using KQL (Kusto
Query Language) and create graph visualizations directly from their
Sentinel workspaces.
✨ What's New
Core Features
workspaces using KQL
common in Sentinel
Integration
Kusto and Spanner plugins
📝 Changes
New Files
structures
.ipynb - Example notebook
template
Modified Files
azure-identity>=1.12.0
🔒 Security Analysis Use Cases
The notebook demonstrates real-world security analysis scenarios:
🧪 Testing
📚 Documentation
🔑 Authentication
Supports multiple authentication patterns for different use cases:
# Device code (interactive)
g = graphistry.configure_sentinel(workspace_id="...", use_device_auth=True)

# Service Principal (production)
g = graphistry.configure_sentinel(
    workspace_id="...",
    tenant_id="...",
    client_id="...",
    client_secret="..."
)

# DefaultAzureCredential (flexible)
g = graphistry.configure_sentinel(workspace_id="...")
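When several of these arguments are supplied at once, some precedence has to apply. The sketch below encodes the order suggested by the diff hunks in the review thread (custom credential, then Service Principal, then device code, then DefaultAzureCredential). That ordering is inferred from partial hunks rather than confirmed, and `pick_auth_method` with its return labels is a hypothetical helper, not part of the plugin.

```python
from typing import Any, Optional

def pick_auth_method(credential: Optional[Any] = None,
                     tenant_id: Optional[str] = None,
                     client_id: Optional[str] = None,
                     client_secret: Optional[str] = None,
                     use_device_auth: bool = False) -> str:
    """Return a label for the auth path that would be taken (assumed order)."""
    if credential is not None:
        return "custom_credential"
    if tenant_id and client_id and client_secret:
        return "service_principal"
    if use_device_auth:
        return "device_code"
    return "default_azure_credential"

# Service Principal fields win over the device-code flag under the assumed order
chosen = pick_auth_method(tenant_id="t", client_id="c", client_secret="s",
                          use_device_auth=True)
```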
📊 Example Usage
import graphistry
from datetime import timedelta

# Configure connection
g = graphistry.configure_sentinel(
    workspace_id="your-workspace-id",
    use_device_auth=True
)

# Query failed logins
failed_logins = g.kql("""
SigninLogs
| where ResultType != "0"
| summarize Count=count() by UserPrincipalName, IPAddress
| top 20 by Count
""", timespan=timedelta(days=7))

# Create graph visualization
# (users and connections are node and edge tables built from failed_logins; not shown here)
graph = (
    g.nodes(users).edges(connections)
    .encode_point_color('node_type')
    .plot()
)
🚀 Benefits
Graphistry's graph visualization
support
familiarity
🔄 Compatibility
📈 Impact
This connector enables security teams to:
✅ Checklist
Breaking Changes: None
Migration Guide: Not applicable (new feature)
Related Issues: Addresses need for Microsoft Sentinel integration
Testing Instructions: