Contributor

@matheusmaldaner matheusmaldaner commented Sep 28, 2025

Added SentinelBench implementation for testing long-running monitoring capabilities of modern AI agents. This PR includes:

  • SentinelBench/: the entire implementation of the SentinelBench benchmark
  • ...benchmarks/sentinelbench/...: specifications for running MagenticUI on SentinelBench
  • ...benchmarks/sentinelbench/tools/...: Python files to analyze and plot the results after running the tasks
  • many other quality-of-life (QOL) changes

matheusmaldaner and others added 30 commits August 3, 2025 01:20
…crosoft#317)

Co-authored-by: root <root@LAPTOP-AL1099TT>
Co-authored-by: Matheus Kunzler Maldaner <[email protected]>
Co-authored-by: Hussein Mozannar <[email protected]>
@matheusmaldaner matheusmaldaner marked this pull request as ready for review September 28, 2025 19:07

4 participants