Aok refactor od #66

sunildkumar · 2025-04-30T15:12:45Z

No description provided.

- …d of open vocabulary (reducing degrees of freedom of the tool)
- Explicitly tell the model that it can call a tool but it does not have to.
- Explicitly tell the model it needs to consider all 4 options in the user prompt. Failures often look like torpedoes, so maybe this helps prevent that? Doing this in the bootstrap prompt didn’t help, but I think the IFT model “listens” to the user more strongly.
- Reward schedule for the tool use reward. The model gets 200 gradient updates with a tool use reward. The reward decays linearly between steps 0 and 200, then stays at 0.
- Point tool at b0
- Save more frequently
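The tool use reward schedule described above (linear decay from step 0 to step 200, then 0) could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `tool_use_reward_weight`, the parameter names, and the starting weight of 1.0 are all assumptions.

```python
def tool_use_reward_weight(
    step: int,
    initial_weight: float = 1.0,  # assumed starting weight; not stated in the description
    decay_steps: int = 200,       # the description says the reward reaches 0 at step 200
) -> float:
    """Hypothetical linear decay schedule for the tool use reward weight."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step >= decay_steps:
        # After the decay window the tool use reward contributes nothing.
        return 0.0
    # Linear interpolation from initial_weight at step 0 down to 0 at decay_steps.
    return initial_weight * (1.0 - step / decay_steps)
```

Under these assumptions the weight is 1.0 at step 0, 0.5 at step 100, and 0.0 from step 200 onward, so the incentive to call a tool is strongest early in training and disappears once the schedule ends.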

sunildkumar and others added 25 commits April 29, 2025 22:33

revised object detection tool to use triton + yolo

32054fb

ready to train

af03c8c

start server was only local

76542f5

working now

2d6b792

fix threading issues and bgr to rgb

41dceb6

up the weight to 1.0.

2bedbac

setup to restart the run

12fec83

the code technically works now, but it isn't pretty

0f8791f

this works but it is stupid slow, trying to move call out of training process via an api

950950c

its working! and its fast

4d93020

remvoe fork thing

f28e31c

shuffle order of tools in system prompt

c471d8b

log metrics callback

a74cd68

reset the schedule

c2c702f

ready to start training again

77df0b6

more robust way of catching

56f804b

generalized the eval script

be43b07

better eval script

bd99398

implement the new combined correctness-and-tool-use reward

9eab93e

always return all 4 rewards but setting the schedules differently based on the flag

1cbdd10

add num_generations check to make sure the new reward sees the entire sampled group

0ed8429

Merge pull request #67 from groundlight/aok_refactor_combined_reward

35cd0a4

Implement the new combined correctness-and-tool-use reward

run name

39f6e9e

try adding a short term incentive to use tools with new fancy reward

d9ddbcd

2 participants
