const systemPrompt = `You are a technical summarizer that creates concise, informative summaries of autonomous agent activities across multiple evaluation episodes.

Your task:
- Analyze the actions taken by an AI agent across 3 separate episodes of the same task
- Identify common patterns, tools used, files modified, and key behaviors
- Produce a clear, structured summary that highlights what the agent did

Focus on:
- **Tool usage patterns**: Which tools were used most frequently
- **File modifications**: Which files were created, edited, or read
- **Common strategies**: What approach did the agent consistently take
- **Consistency**: Did the agent behave similarly across episodes, or vary significantly?
- **Outcomes**: Any errors, successes, or notable behaviors

Output format:
Write 2-4 paragraphs in a professional, technical style. Be concise but informative.

Structure:
1. **Overview**: Brief description of what the agent accomplished
2. **Approach**: Tools and strategies used consistently across episodes
3. **Key actions**: Specific files modified or critical operations performed
4. **Observations**: Any notable patterns, inconsistencies, or issues

Guidelines:
- Keep it under 300 words
- Use technical language but be clear
- Focus on patterns across episodes, not individual actions
- Mention specific tool names and file paths when relevant
- Note any errors or issues encountered
- Be objective and descriptive, not evaluative`;

const summarizerModelId = fallback(
  "SUMMARIZER_MODEL",
  "opencode/claude-sonnet-4-5",
);

export async function generateActionsSummary(
  evaluation: DatasetEval,
  model: string,
  episodesActions: EpisodeActions[],
): Promise<string> {
  if (episodesActions.length === 0) {
    return "No actions recorded";
  }

  // Build a structured prompt with the actions data
  const episodesSummary = episodesActions
    .map((ep) => {
      const sample = ep.actions.slice(0, 50); // First 50 actions per episode
      const truncated =
        ep.actions.length > 50
          ? `\n... (${ep.actions.length - 50} more actions)`
          : "";

      return `### Episode ${ep.episodeIndex}
Actions (${ep.actions.length} total):
${sample.join("\n")}${truncated}`;
    })
    .join("\n\n");

  const prompt = `Repository: ${evaluation.repo}
Model: ${model}
Task: Implement changes from ${evaluation.from.slice(0, 7)} to ${evaluation.to.slice(0, 7)}

${episodesSummary}

Provide a concise summary of what the agent did across these episodes.`;
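For illustration, the per-episode truncation inside `generateActionsSummary` can be pulled out into a standalone function and exercised on its own. This is a sketch only: `formatEpisode`, `MAX_ACTIONS_PER_EPISODE`, and the `EpisodeActionsSketch` shape (an `episodeIndex` number plus an `actions` string array) are hypothetical names and assumptions, not part of the commit; the real `EpisodeActions` type is defined elsewhere and may differ.

```typescript
// Hypothetical minimal shape, assumed for illustration only; the real
// EpisodeActions type lives elsewhere in the codebase.
interface EpisodeActionsSketch {
  episodeIndex: number;
  actions: string[];
}

// Hypothetical name for the literal 50 used in generateActionsSummary.
const MAX_ACTIONS_PER_EPISODE = 50;

// Formats one episode the same way the .map() callback does: a heading,
// the total action count, the first `limit` actions, and a marker that
// reports how many actions were cut.
function formatEpisode(
  ep: EpisodeActionsSketch,
  limit: number = MAX_ACTIONS_PER_EPISODE,
): string {
  const sample = ep.actions.slice(0, limit);
  const truncated =
    ep.actions.length > limit
      ? `\n... (${ep.actions.length - limit} more actions)`
      : "";
  return `### Episode ${ep.episodeIndex}\nActions (${ep.actions.length} total):\n${sample.join("\n")}${truncated}`;
}

// Usage: three actions with a limit of 2 keeps two and reports one omitted.
const example = formatEpisode(
  { episodeIndex: 0, actions: ["read src/a.ts", "edit src/a.ts", "bash npm test"] },
  2,
);
console.log(example);
```

Passing a small `limit` makes the truncation branch easy to test without building 50+ actions. As in the original string, a single omitted action is reported as "1 more actions"; no singular form is handled.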