Set command to spawn with lower priority in GenericScript Run method #438

Merged
beyhan merged 1 commit into main from 337-nice-child-scripts on Jun 18, 2026

@neddp neddp commented Jun 1, 2026

Member

Lifecycle scripts (drain, pre-start, post-start, post-deploy, etc.) are executed by the agent and inherit its scheduling priority. When a script is CPU-intensive (e.g. cloning large amounts of data from a database cluster), it can starve the agent's own event loop, causing the director to time out with an agent-unreachable error - even though the agent is technically alive and the script is making progress. The deployment then fails and the script has to run from scratch.

The stemcell already runs the agent at nice -15 (see bosh-linux-stemcell-builder@00054bd) to give it priority over BOSH-managed jobs (nice 0). However, lifecycle scripts spawned by the agent inherit that -15 priority, defeating the purpose.

This PR sets SpawnWithLowerPriority = true on the Command struct for all lifecycle scripts, so they run at a lower priority than the agent itself:

  • Linux: child process nice = parent nice + 5 (capped at 19). With the agent at -15, scripts get nice -10: still a higher priority than normal jobs (nice 0), but a lower priority than the agent.
  • Windows: child process priority class is set to BelowNormal.
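
The Linux arithmetic above can be sketched in a few lines of Go. This is only an illustration of the rule stated in the description, not the code from bosh-utils; the names `childNice`, `niceIncrement`, and `maxNice` are hypothetical.

```go
package main

import "fmt"

// maxNice is the largest (lowest-priority) nice value on Linux.
const maxNice = 19

// niceIncrement is the offset from the PR description:
// child nice = parent nice + 5.
const niceIncrement = 5

// childNice is a hypothetical helper (not the bosh-utils implementation).
// It returns the nice value a lower-priority child would get for a given
// parent nice value, capped at maxNice.
func childNice(parentNice int) int {
	n := parentNice + niceIncrement
	if n > maxNice {
		return maxNice
	}
	return n
}

func main() {
	// -15 is the agent's nice value on the stemcell, per the description.
	for _, parent := range []int{-15, 0, 17} {
		fmt.Printf("parent nice %d -> child nice %d\n", parent, childNice(parent))
	}
}
```

With the agent at -15 this yields -10, matching the value quoted above; a parent already at 17 is capped at 19.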

The priority logic itself lives entirely in bosh-utils (#142), which adds the SpawnWithLowerPriority field to boshsys.Command with platform-specific priority logic inlined directly (no external dependency, as suggested by @rkoster).
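
As a rough sketch of how a caller might opt in, assuming a command struct with a boolean field of that name: the `Command` type below is a hypothetical stand-in for `boshsys.Command` (only the `SpawnWithLowerPriority` field name comes from this PR), and `newLifecycleCommand`, `describePriority`, and the script path are made up for illustration. The fallback for other platforms is also an assumption.

```go
package main

import "fmt"

// Command is a hypothetical stand-in for boshsys.Command. Only the
// SpawnWithLowerPriority field name is taken from the PR description.
type Command struct {
	Name                   string
	Args                   []string
	SpawnWithLowerPriority bool
}

// newLifecycleCommand (hypothetical) builds the command for a lifecycle
// script and opts it into lower scheduling priority.
func newLifecycleCommand(scriptPath string) Command {
	return Command{
		Name:                   scriptPath,
		SpawnWithLowerPriority: true,
	}
}

// describePriority (hypothetical) summarizes the per-platform behavior
// listed in the PR description. The default branch is an assumption:
// the description only covers Linux and Windows.
func describePriority(cmd Command, goos string) string {
	if !cmd.SpawnWithLowerPriority {
		return "inherit parent priority"
	}
	switch goos {
	case "linux":
		return "nice = parent nice + 5 (capped at 19)"
	case "windows":
		return "priority class BelowNormal"
	default:
		return "inherit parent priority"
	}
}

func main() {
	// The path is an illustrative placeholder, not taken from the PR.
	cmd := newLifecycleCommand("/var/vcap/jobs/example/bin/drain")
	for _, goos := range []string{"linux", "windows"} {
		fmt.Printf("%s on %s: %s\n", cmd.Name, goos, describePriority(cmd, goos))
	}
}
```

Keeping the decision behind a single boolean means the agent only states intent, while the platform-specific mechanics stay in the library.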

References

coderabbitai Bot commented Jun 1, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

neddp commented Jun 5, 2026

Member Author

Waiting for a new bosh-utils tag.

linux-foundation-easycla Bot commented Jun 14, 2026


CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: neddp / name: Ned Petrov (9f2cb04)

@neddp neddp force-pushed the 337-nice-child-scripts branch from 1e496ef to 9f2cb04 Compare June 14, 2026 08:12
@neddp neddp marked this pull request as ready for review June 14, 2026 08:12
@neddp neddp requested a review from rkoster June 14, 2026 08:14
@github-project-automation github-project-automation Bot moved this from Inbox to Pending Merge | Prioritized in Foundational Infrastructure Working Group Jun 14, 2026
@beyhan beyhan merged commit 2aebbdc into main Jun 18, 2026
94 of 96 checks passed
@github-project-automation github-project-automation Bot moved this from Pending Merge | Prioritized to Done in Foundational Infrastructure Working Group Jun 18, 2026

beyhan commented Jun 18, 2026

Member

Merging, as the bosh-utils update in bosh-agent has happened.

@rkoster rkoster left a comment

Contributor

pull Bot pushed a commit to kingdavid6336/docs-bosh that referenced this pull request Jun 25, 2026
Lifecycle scripts (pre-start, post-start, post-deploy, pre-stop, drain,
post-stop) are now spawned with lower CPU scheduling priority than the
BOSH agent itself. This prevents CPU-intensive scripts from starving the
agent's event loop and causing spurious agent-unreachable errors during
deployment.

References:
- cloudfoundry/bosh-agent#438
- cloudfoundry/bosh-utils#142

Development

Successfully merging this pull request may close these issues.

Lifecycle hooks can make the agent unresponsive

3 participants