@SinsBre SinsBre commented Sep 23, 2025

🎯 Overview

This PR introduces a new plugin for Microsoft Sentinel (Azure Log
Analytics) that enables users to query security data using KQL (Kusto
Query Language) and create graph visualizations directly from their
Sentinel workspaces.

✨ What's New

Core Features

  • New SentinelMixin plugin - Query Microsoft Sentinel Log Analytics
    workspaces using KQL
  • Multiple authentication methods:
    • Device code authentication (interactive with code/URL)
    • Service Principal (client_id, client_secret, tenant_id)
    • DefaultAzureCredential (Azure CLI, Managed Identity, Environment)
  • KQL query execution with automatic DataFrame conversion
  • Nested data handling - Automatic unwrapping of complex JSON columns
    common in Sentinel
  • Helper methods for workspace exploration:
    • sentinel_tables() - List available tables
    • sentinel_schema() - Get table schemas
    • kql_last() - Convenience method for recent data queries

Integration

  • Seamlessly integrated into the Graphistry ecosystem alongside existing
    Kusto and Spanner plugins
  • Full module-level access via graphistry.configure_sentinel()
  • Follows established plugin patterns for consistency

📝 Changes

New Files

  • graphistry/plugins/sentinel.py - Main plugin implementation (~650 lines)
  • graphistry/plugins_types/sentinel_types.py - Type definitions and data
    structures
  • graphistry/tests/test_sentinel.py - Comprehensive unit tests (379 lines)
  • demos/demos_databases_apis/microsoft/sentinel/sentinel_security_analysis.ipynb
    - Example notebook
  • demos/demos_databases_apis/microsoft/sentinel/example.env - Environment
    template

Modified Files

  • graphistry/plotter.py - Added SentinelMixin to Plotter class
  • graphistry/pygraphistry.py - Added module-level exports
  • graphistry/__init__.py - Added sentinel methods to public API
  • setup.py - Added dependencies: azure-monitor-query>=1.2.0,
    azure-identity>=1.12.0

🔒 Security Analysis Use Cases

The notebook demonstrates real-world security analysis scenarios:

  • Failed login detection and analysis
  • Security alert correlation
  • Network traffic analysis
  • User-IP relationship mapping
  • Alert-entity correlation graphs

🧪 Testing

  • Comprehensive unit tests with mocked Azure responses
  • Test coverage for all authentication methods
  • Nested data unwrapping tests
  • Error handling validation

📚 Documentation

  • Full docstrings with examples for all public methods
  • Security-focused example notebook with common KQL patterns
  • Environment setup guide with .env template
  • Troubleshooting tips and best practices

🔑 Authentication

Supports multiple authentication patterns for different use cases:

Device code (interactive)

g = graphistry.configure_sentinel(workspace_id="...", use_device_auth=True)

Service Principal (production)

g = graphistry.configure_sentinel(
    workspace_id="...",
    tenant_id="...",
    client_id="...",
    client_secret="..."
)

DefaultAzureCredential (flexible)

g = graphistry.configure_sentinel(workspace_id="...")

📊 Example Usage

import graphistry
from datetime import timedelta

# Configure connection
g = graphistry.configure_sentinel(
    workspace_id="your-workspace-id",
    use_device_auth=True
)

# Query failed logins
failed_logins = g.kql("""
SigninLogs
| where ResultType != "0"
| summarize Count=count() by UserPrincipalName, IPAddress
| top 20 by Count
""", timespan=timedelta(days=7))

# Create graph visualization
graph = (
    g.nodes(users).edges(connections)
    .encode_point_color('node_type')
    .plot()
)

🚀 Benefits

  • Unified security analytics - Combine Sentinel's powerful KQL with
    Graphistry's graph visualization
  • No data movement - Query data directly from Sentinel workspaces
  • Enterprise ready - Multiple auth methods including managed identity
    support
  • Developer friendly - Follows existing Kusto plugin patterns for
    familiarity

🔄 Compatibility

  • Python 3.8+
  • Compatible with latest Azure Monitor Query SDK
  • Works with all Log Analytics workspace versions
  • Handles both legacy and modern column formats

📈 Impact

This connector enables security teams to:

  • Visualize complex attack patterns as graphs
  • Identify lateral movement and privilege escalation
  • Correlate alerts across multiple data sources
  • Build visual threat hunting workflows

✅ Checklist

  • Code follows project style guidelines
  • Unit tests pass
  • Documentation updated
  • Example notebook provided
  • Dependencies added to setup.py
  • Type hints included
  • Error handling implemented
  • Module exports configured

Breaking Changes: None

Migration Guide: Not applicable (new feature)

Related Issues: Addresses need for Microsoft Sentinel integration

Testing Instructions:

  1. Install with pip install -e .[sentinel]
  2. Set up Azure authentication (device code or service principal)
  3. Run the example notebook with your workspace ID
  4. Verify KQL queries return DataFrames
  5. Test graph visualizations work correctly

- Add SentinelConfig dataclass for connection configuration
- Add SentinelQueryResult class for query results
- Add custom exceptions for connection and query errors
- Support for Azure authentication credentials
- Add SentinelMixin class extending Plottable
- Implement configure_sentinel() with multiple auth methods
- Support Azure CLI, Service Principal, and custom credentials
- Add sentinel_from_client() for existing client reuse
- Implement health check and basic query infrastructure
- Add client initialization with DefaultAzureCredential support
- Implement kql() method with timespan and nested data support
- Add kql_last() convenience method for recent data queries
- Add sentinel_tables() and sentinel_schema() helper methods
- Port nested data unwrapping from Kusto plugin
- Support for multiple table responses
- Handle JSON strings and dynamic columns in Sentinel results
- Add SentinelMixin import to plotter.py
- Include SentinelMixin in Plotter class hierarchy
- Update documentation to list Sentinel integration
- Add azure-monitor-query>=1.2.0 and azure-identity>=1.12.0 to sentinel extras
- Add 'Sentinel' to package keywords for discoverability
- Test configuration methods (basic, service principal, custom credential)
- Test KQL query execution (single/multiple tables, timespan)
- Test helper methods (kql_last, sentinel_tables, sentinel_schema)
- Test health check functionality
- Test nested data unwrapping and JSON parsing
- Test authentication initialization flows
- Add mock-based testing to avoid API dependencies
- Demonstrate Azure CLI and Service Principal authentication
- Show security use cases: failed logins, alerts, network analysis
- Include graph visualizations for user-IP and alert correlations
- Provide examples of multi-table KQL queries
- Cover workspace exploration and schema inspection
- Add troubleshooting guidance and next steps
- Replace hardcoded credentials with environment variables
- Add example.env template file
- Support custom .env file locations
- Include python-dotenv dependency instructions
- Improve security by avoiding credential commits
- Replace username/password with personal_key_id/personal_key_secret
- Update example.env with new credential format
- Use modern Graphistry authentication method
- Maintain security with environment variables
- Add configure_sentinel and sentinel_from_client to __init__.py
- Add corresponding methods in pygraphistry.py
- Fix module-level access to Sentinel functionality
- Add configure_sentinel and sentinel_from_client assignments
- Follow same pattern as other plugins (Kusto, Spanner)
- Enable direct import from graphistry module
- Add use_device_auth parameter to configure_sentinel()
- Support DeviceCodeCredential for interactive authentication
- Show code and URL for authentication like Kusto plugin
- Update type definitions and method signatures
- Provide authentication precedence documentation
- Support both object columns (with .name/.type) and string columns
- Default to 'string' type when type info not available
- Handle missing table.name attribute gracefully
- Fix AttributeError in query response processing
- Replace union withsource=TableName query that caused conflicts
- Use Usage table to get DataType (table names) instead
- Extend timespan to 30 days for better table coverage
- Avoid SEM0001 semantic error from existing TableName columns
- Fix references from TableName to DataType column
- Clean up notebook code for consistent formatting
- Add device authentication example
- Ensure table listing and schema queries work correctly
- Replace bind() with encode_point_color() and encode_edge_color()
- Use encode_edge_size() for edge weight visualization
- Fix method signatures to match Graphistry API
- Add better error handling and user guidance
- Include troubleshooting tips and alternative auth methods
- Improve error messages and empty result handling
- Add comprehensive summary and next steps
- Make notebook more robust for different workspace configurations
- Import SentinelMixin to fix F821 undefined name errors
- Resolves flake8 lint issues in test_sentinel.py
- Fix W504: Move binary operator before line break
- Fix W292: Add newlines at end of all files
- Fix F841: Remove unused local variables in tests
- All flake8 lint issues now resolved
"\n",
"## Getting Started\n",
"\n",
"### Option 1: Azure CLI Authentication (Recommended for Development)\n",

add an Azure CLI auth link here, and one for option 2?

"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Alternative: Service Principal authentication from .env file\n# Uncomment the lines below if you prefer Service Principal over device authentication\n# g = graphistry.configure_sentinel(\n# workspace_id=os.getenv('SENTINEL_WORKSPACE_ID'),\n# tenant_id=os.getenv('AZURE_TENANT_ID'),\n# client_id=os.getenv('AZURE_CLIENT_ID'),\n# client_secret=os.getenv('AZURE_CLIENT_SECRET')\n# )\n\n# Alternative: Use DefaultAzureCredential (tries Azure CLI, Managed Identity, etc.)\n# g = graphistry.configure_sentinel(\n# workspace_id=os.getenv('SENTINEL_WORKSPACE_ID')\n# )"

something i like to do is

if False:
  ....
else:
  print('Skipped, switch if statement to enable')

That way they don't have to edit all the #'s
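
A minimal sketch of that toggle applied to the Service Principal cell, assuming the environment variable names used in the notebook snippet above. The `ENABLE_SERVICE_PRINCIPAL` flag and the `sentinel_kwargs` dict are illustrative names, not part of the PR, and the real `graphistry.configure_sentinel(...)` call is left as a comment so the cell runs without Azure access.

```python
import os

# Hypothetical flag: flip to True to enable this cell instead of uncommenting lines.
ENABLE_SERVICE_PRINCIPAL = False

sentinel_kwargs = None
if ENABLE_SERVICE_PRINCIPAL:
    # Variable names follow the notebook's .env-based example.
    sentinel_kwargs = {
        "workspace_id": os.getenv("SENTINEL_WORKSPACE_ID"),
        "tenant_id": os.getenv("AZURE_TENANT_ID"),
        "client_id": os.getenv("AZURE_CLIENT_ID"),
        "client_secret": os.getenv("AZURE_CLIENT_SECRET"),
    }
    # g = graphistry.configure_sentinel(**sentinel_kwargs)
else:
    print("Skipped, switch if statement to enable")
```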

g.sentinel_health_check() # Verify connection works
"""
try:
    self._sentinel_query("Heartbeat | take 1", timespan=timedelta(hours=1))

chatgpt thinks this is unsafe as it assumes azure log analytics, while union withsource=TableName * | take 1 can work on sentinel without

"\n",
"1. **Azure Access**: You need access to a Microsoft Sentinel workspace\n",
"2. **Authentication**: Either Azure CLI (`az login`) or service principal credentials\n",
"3. **Dependencies**: Install required packages\n",

Add 4.a..4.z listing assumptions about the Sentinel setup, and for any assumption a user doesn't satisfy, recommend workarounds?

"## Prerequisites\n",
"\n",
"1. **Azure Access**: You need access to a Microsoft Sentinel workspace\n",
"2. **Authentication**: Either Azure CLI (`az login`) or service principal credentials\n",

@lmeyerov lmeyerov Sep 24, 2025

  • add docs link?

  • and in az login mode, do we need to specify that az login must run in the same server/container as the notebook, or ?

"source": [
"# Microsoft Sentinel Security Analysis with Graphistry\n",
"\n",
"This notebook demonstrates how to use Graphistry with Microsoft Sentinel (Log Analytics) to perform security analysis and visualization using KQL queries.\n",

@lmeyerov lmeyerov Sep 24, 2025

qualify which visualizations we'll do, like ... graph visualization using KQL queries for logins, alerts, traffic, and other built-in sentinel tables

"source": [
"## Security Analysis Examples\n",
"\n",
"### 1. Failed Login Analysis"

Instead of a separate analysis section without graphistry viz, I would instead fold Graph Visualization into Security Analysis Examples so just 1 section of use cases, where each is like

## Security Analysis Examples

### Failed logins: IP <> Principal <> Device graph

#### 1. Analysis

...

#### 2. Visualization

### Next use case ...

All of the use cases sound relevant

"cell_type": "markdown",
"metadata": {},
"source": [
"## Security Analysis Examples\n",

As there are a lot of use cases, add a bulleted list enumerating which will be here?

{
"cell_type": "markdown",
"metadata": {},
"source": "## Summary\n\nThis notebook demonstrated:\n\n1. **Connecting to Microsoft Sentinel** using Azure authentication (device code, service principal, or DefaultAzureCredential)\n2. **Exploring available data** with `sentinel_tables()` and `sentinel_schema()`\n3. **Security analysis** using KQL queries for:\n - Failed login analysis\n - Security alerts monitoring\n - Network traffic analysis\n4. **Graph visualization** of:\n - User-IP relationships\n - Alert correlations\n5. **Advanced correlation** across multiple data sources\n\n## Next Steps\n\n- **Customize queries** for your specific security use cases and available data tables\n- **Create automated dashboards** by scheduling notebook execution\n- **Integrate with threat intelligence** feeds using additional KQL joins\n- **Build detection rules** based on graph patterns you discover\n- **Scale analysis** by adjusting time windows and data volumes\n\n## Troubleshooting Tips\n\n- **No data found**: Some workspaces may not have SecurityEvent, SigninLogs, or SecurityAlert tables\n- **Authentication issues**: Try `az login` first, or check your service principal credentials\n- **Permission errors**: Ensure your account has Log Analytics Reader permissions\n- **Empty results**: Adjust time ranges - some workspaces have limited data retention\n\n## Resources\n\n- [Microsoft Sentinel KQL Reference](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/)\n- [Graphistry Documentation](https://pygraphistry.readthedocs.io/)\n- [Azure Monitor Query Documentation](https://docs.microsoft.com/en-us/python/api/azure-monitor-query/)\n- [Sentinel Data Connectors](https://docs.microsoft.com/en-us/azure/sentinel/connect-data-sources)"

Next steps should include links, especially for most recommended

  • Try for free with your graphistry hub account and jupyter / google colab notebooks
  • Check out louie.ai for help generating these queries and visualizations through genai
  • Check out the Kusto tutorial
  • Learn more about complementary pygraphistry features like UMAP for detecting patterns & outliers on events, GFQL for mining graphs once you've made them, and how to control graph colors and layouts

etc

"try:\n",
" failed_logins = g.kql(failed_logins_query, timespan=timedelta(days=7))\n",
" print(f\"Found {len(failed_logins)} users with multiple failed logins\")\n",
" print(failed_logins.head())\n",

for all these df prints, you either want df.head() as the last cell line, or something like display(df.head()), so we get native notebook rendering

" print(f\"Created graph with {len(nodes)} nodes and {len(edges)} edges\")\n",
" \n",
" # Plot the graph\n",
" graph.plot()\n",

this needs to be the last line right? or a display() ?

"cell_type": "markdown",
"metadata": {},
"source": [
"## Graph Visualization\n",

make sure we have live plots in the output, so readthedocs will render

(unfortunately github won't)

  unwrap_nested: Optional[bool] = None,
- single_table: bool = True
+ single_table: bool = True,
+ include_statistics: bool = False

i assume you tested kql didn't regress?


dfs: List[pd.DataFrame] = []

for result in results:

can most of the wrangling in this file go away and be reused from kusto, so we're not maintaining 2 clones?

i'm not sure how close/different they are and how

ex: if just the initial connection changes, then we can parameterize kusto by the client connection
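
One way to picture that parameterization, as a rough sketch only: the wrangling lives in one place and each service injects its own query runner. Every name here (`QueryRunner`, `KqlBackend`, `rows_to_records`, the fake runners) is hypothetical and not taken from the Kusto or Sentinel plugins; plain dicts stand in for DataFrames so the example has no dependencies.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical contract: a backend returns (column_names, rows) for a KQL string.
QueryRunner = Callable[[str], Tuple[Sequence[str], Sequence[Sequence[Any]]]]


def rows_to_records(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Shared wrangling step: pair each row's values with the column names."""
    return [dict(zip(columns, row)) for row in rows]


class KqlBackend:
    """Shared query path; only the injected runner differs per service."""

    def __init__(self, run_query: QueryRunner) -> None:
        self._run_query = run_query

    def kql(self, query: str) -> List[Dict[str, Any]]:
        columns, rows = self._run_query(query)
        return rows_to_records(columns, rows)


# Stand-in runners; real ones would wrap the Kusto and Log Analytics clients.
def fake_kusto_runner(query: str):
    return ["User", "Count"], [["alice", 3]]


def fake_sentinel_runner(query: str):
    return ["UserPrincipalName", "IPAddress"], [["bob@example.com", "10.0.0.1"]]


kusto_rows = KqlBackend(fake_kusto_runner).kql("T | take 1")
sentinel_rows = KqlBackend(fake_sentinel_runner).kql("SigninLogs | take 1")
```

Whether this split is practical depends on how different the two response formats really are, which this thread does not settle.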


return dfs

def kql_last(

@lmeyerov lmeyerov Sep 24, 2025

i recommend removing kql_last(): it adds conceptual bloat to the user-facing surface area and seems unnecessary, since it can be folded into kql() via the timespan param, e.g., if timespan=... is an int, autoconvert it to hours
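
A possible shape for that conversion, as a sketch under assumptions: `normalize_timespan` is a hypothetical helper name, and treating a bare number as hours follows the suggestion above rather than anything in the PR.

```python
from datetime import timedelta
from typing import Optional, Union


def normalize_timespan(
    timespan: Optional[Union[int, float, timedelta]]
) -> Optional[timedelta]:
    """Accept a timedelta as-is, or interpret a bare number as hours."""
    if timespan is None or isinstance(timespan, timedelta):
        return timespan
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(timespan, bool) or not isinstance(timespan, (int, float)):
        raise TypeError(
            f"timespan must be a timedelta or a number of hours, got {type(timespan).__name__}"
        )
    if timespan <= 0:
        raise ValueError("timespan in hours must be positive")
    return timedelta(hours=timespan)
```

With this, a call like `kql(query, timespan=24)` could cover what `kql_last()` does today.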

# Use Usage table to get all table names - this avoids union conflicts
query = """
Usage
| where TimeGenerated > ago(30d)

@lmeyerov lmeyerov Sep 24, 2025

TimeGenerated > ago(30d) is surprising to me

If desired as a convenience, add an optional param with a less surprising default? Eg, if not specified, drop that constraint?
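
A sketch of the optional parameter, assuming a hypothetical `build_tables_query` helper. The rest of the PR's query is not shown in this thread, so the closing `distinct DataType` line is an assumption based on the commit note about using the Usage table's DataType column.

```python
from typing import Optional


def build_tables_query(lookback_days: Optional[int] = None) -> str:
    """Build the table-listing query; omit the time filter unless asked for."""
    lines = ["Usage"]
    if lookback_days is not None:
        if lookback_days <= 0:
            raise ValueError("lookback_days must be positive")
        lines.append(f"| where TimeGenerated > ago({lookback_days}d)")
    # Assumed tail of the query (not shown in the reviewed snippet).
    lines.append("| distinct DataType")
    return "\n".join(lines)
```

The separate `timespan=timedelta(days=30)` argument passed to `kql()` would still bound the query, so that default would need the same treatment.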

"""
return self.kql(query, timespan=timedelta(days=30))

def sentinel_schema(self, table: str) -> pd.DataFrame:

helpful to document expected returned dtypes

print(schema[['ColumnName', 'DataType']])
"""
query = f"{table} | getschema"
return self.kql(query, timespan=timedelta(minutes=5))

why timespan here?


# Check for partial failures
if response.status == LogsQueryStatus.PARTIAL:
    logger.warning(f"Query returned partial results: {response.partial_error}")

@lmeyerov lmeyerov Sep 24, 2025

should warnings raise too? or we make it an optional default-off param to raise on warnings on kql+here ?

i'm unsure how freq & fatal warnings etc are here

if a logger.warning, louie wouldn't see it...

if response.status == LogsQueryStatus.PARTIAL:
    logger.warning(f"Query returned partial results: {response.partial_error}")
elif response.status == LogsQueryStatus.FAILURE:
    raise SentinelQueryError(f"Query failed: {response.partial_error}")

@lmeyerov lmeyerov Sep 24, 2025

I wonder if we can propagate the partial result / response too so louie can inspect deeper.. esp in warning case..
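
A rough sketch of how both ideas could fit together: an opt-in `raise_on_partial` flag, plus a result object and exception that carry the partial data so a caller can inspect it. `QueryStatus` is a local stand-in for `LogsQueryStatus`, `SentinelQueryError` is redefined here with extra fields, and `QueryOutcome` and `handle_response` are hypothetical names; none of this is the PR's actual code.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional


class QueryStatus(Enum):
    """Local stand-in for azure.monitor.query.LogsQueryStatus."""
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


class SentinelQueryError(Exception):
    """Stand-in for the PR's exception, extended to carry partial data."""

    def __init__(self, message: str, partial_tables: Optional[List[Any]] = None,
                 error: Any = None) -> None:
        super().__init__(message)
        self.partial_tables = partial_tables or []
        self.error = error


@dataclass
class QueryOutcome:
    tables: List[Any]
    status: QueryStatus
    partial_error: Any = None  # populated only for partial results


def handle_response(status: QueryStatus, tables: List[Any], error: Any = None,
                    raise_on_partial: bool = False) -> QueryOutcome:
    if status is QueryStatus.FAILURE:
        raise SentinelQueryError(f"Query failed: {error}", error=error)
    if status is QueryStatus.PARTIAL:
        message = f"Query returned partial results: {error}"
        if raise_on_partial:
            # Keep the partial tables reachable from the exception.
            raise SentinelQueryError(message, partial_tables=tables, error=error)
        # warnings.warn is visible to callers that capture warnings, unlike a log line.
        warnings.warn(message)
        return QueryOutcome(tables, status, partial_error=error)
    return QueryOutcome(tables, status)
```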

from azure.monitor.query import LogsQueryClient

try:
    assert cfg.workspace_id is not None, "workspace_id is not set"

How does this get set? Give a more actionable error as may bubble up to user
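
For illustration, a sketch of an explicit check with an actionable message. `require_workspace_id` is a hypothetical helper, and the wording of the hint assumes `configure_sentinel(workspace_id=...)` is the way the value gets set, as the PR description suggests. An explicit raise also survives `python -O`, which strips `assert` statements.

```python
from typing import Optional


def require_workspace_id(workspace_id: Optional[str]) -> str:
    """Return the workspace ID, or raise with guidance on how to set it."""
    if not workspace_id:
        raise ValueError(
            "Sentinel workspace_id is not set. Pass it with "
            "graphistry.configure_sentinel(workspace_id='...') before running queries."
        )
    return workspace_id
```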


if cfg.credential:
    credential = cfg.credential
    logger.info("Using custom credential object for Sentinel")

debug not info

client_id=cfg.client_id,
client_secret=cfg.client_secret
)
logger.info(f"Using Service Principal authentication for workspace {cfg.workspace_id}")

debug not info

    logger.info("You will be prompted to visit a URL and enter a code to authenticate")
else:
    credential = DefaultAzureCredential()
    logger.info(f"Using DefaultAzureCredential (Azure CLI, Managed Identity, etc.) for workspace {cfg.workspace_id}")

debug not info

tenant_id=cfg.tenant_id # Optional, uses common tenant if not provided
)
logger.info(f"Using Device Code authentication for workspace {cfg.workspace_id}")
logger.info("You will be prompted to visit a URL and enter a code to authenticate")

this goes to the user for action afaict, so info seems right, and/or maybe print, unsure

return isinstance(val, (dict, list))


def _unwrap_nested(result: SentinelQueryResult) -> pd.DataFrame:

yeah all this kind of stuff we want to combine as much as we can w/ kusto impl...

# to pass along args/kwargs to the next mixin in the chain
  class Plotter(
-     KustoMixin, SpannerMixin,
+     SentinelMixin, KustoMixin, SpannerMixin,

won't this override kusto as both have kql()?
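
The concern can be reproduced with toy classes. The mixins below are bare stand-ins that only share the real class names; whether the actual plugins delegate via `super()` or dispatch on configuration is not visible in this thread.

```python
class KustoMixin:
    def kql(self, query: str) -> str:
        return "kusto"


class SpannerMixin:
    pass


class SentinelMixin:
    def kql(self, query: str) -> str:
        return "sentinel"


# Same base order as the proposed change to Plotter.
class Plotter(SentinelMixin, KustoMixin, SpannerMixin):
    pass


# Python resolves kql() left to right through the bases, so SentinelMixin wins.
resolved = Plotter().kql("T | take 1")
mro_names = [cls.__name__ for cls in Plotter.__mro__]
```

Unless `SentinelMixin.kql()` forwards to the next class in the chain, the Kusto implementation is unreachable from a `Plotter` instance.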

'nodexl': ['openpyxl==3.1.0', 'xlrd'],
'jupyter': ['ipython'],
'spanner': ['google-cloud-spanner'],
'sentinel': ['azure-monitor-query>=1.2.0', 'azure-identity>=1.12.0'],

are we sure about these restrictions or can they be unpinned?

we prob want to preinstall pinned in graphistry and louie, so guidance here helps too


@lmeyerov lmeyerov left a comment

See inline comments for generic things I noticed (use judgement if critical)

Most prominent I would recommend looking at:

  • does this break kusto, e.g., by overriding the kusto kql() call? If so, a shared impl would make more sense
  • ... there's a bunch of df wrangling, can that be shared with the kusto impl?
  • Next Steps section in notebook seems untargeted, should be clickable links to more pyg/louie stuff (+ hub if a non-user)
