fix: propagate child process exit code#202
Conversation
02622ca to
61d23c4
Compare
Both LandJail.Run() and NSJailManager.Run() always returned nil, discarding the child process exit code. The landjail child also wrapped exit codes in fmt.Errorf() instead of calling os.Exit(). Changes: - Add exitcode.Error type to carry exit codes through the error chain - Fix landjail child to call os.Exit(exitCode), matching nsjail behavior - Fix both managers to capture child errors via a channel and return exitcode.Error from Run() - Fix main.go to extract exitcode.Error before defaulting to os.Exit(1) - Change NSJailManager.RunChildProcess to return error (was void) Fixes #190
61d23c4 to
cb56ca1
Compare
SasSwart
left a comment
There was a problem hiding this comment.
Nothing blocking given this is easy to patch and a minor fix. But a few comments to consider nonetheless.
|
|
||
| // If the child process exited with a non-zero code, exit | ||
| // with the same code directly. All cleanup (proxy, etc.) | ||
| // has already happened inside Run(). Exiting here ensures | ||
| // the correct code is propagated regardless of how the | ||
| // calling framework handles errors (standalone binary or | ||
| // embedded as a coder subcommand). | ||
| var exitErr *exec.ExitError | ||
| if errors.As(err, &exitErr) { | ||
| os.Exit(exitErr.ExitCode()) | ||
| } | ||
| return err |
There was a problem hiding this comment.
This is a convenient solution, but it makes me nervous to have an os.Exit that is more than a single level of indirection from the entrypoint like this. Looking at this bit of code here, we don't how what cleanup would have happened on the return path between here and the entry point.
I think the proper solution is to do the error checking near the entry point both here and in the coder subcommand.
In practice, the risk is low and its easy to patch later, so I don't think this blocks the PR. Its worth a mention though.
There was a problem hiding this comment.
Agreed it's not ideal. The problem is the embedded mode (coder boundary ...), we don't control coder's entrypoint, and serpent wraps our returned error in RunCommandError, so coder's main() just does os.Exit(1) losing the actual code.
To do it "properly" we'd need changes in coder/coder or serpent. This is a conscious tradeoff: all cleanup (proxy, iptables) already ran inside Run() via defers before the error returns, so the os.Exit is safe.
Let me know your thoughts!
| // This is an unexpected error | ||
| logger.Error("Command execution failed", "error", err) | ||
| return fmt.Errorf("command execution failed: %v", err) | ||
| return err |
There was a problem hiding this comment.
nit: I feel like the error wrapping here was useful. It provides a better single description of the failure path than disjoint debug logs that might be filtered out.
| // error is already buffered. In the signal path the child may still | ||
| // be running; return nil so deferred cleanup (iptables, proxy) can | ||
| // proceed before the process exits. | ||
| select { |
There was a problem hiding this comment.
Asking for clarity:
why do we need a second select here instead of a three case select above?
select {
case sig := <-sigChan:
// ...
case err := <-childErr:
// ...
case <-ctx.Done():
// ...
}There was a problem hiding this comment.
When the child finishes, the goroutine sends on childErr AND calls defer cancel(), which closes ctx.Done(). So both channels are ready at roughly the same time. Go picks randomly between ready cases - if we land in ctx.Done() instead of childErr, we lose the exit code error and return nil.
Two selects avoid that: first one waits for signal or context cancellation, second one (non-blocking) drains the child result. In the ctx.Done path the error is already buffered so we always get it. In the signal path the child may still be running, so default: return nil lets deferred cleanup proceed.
Addresses review feedback: re-add error wrapping for ExitError with fmt.Errorf and %w verb so the error message is descriptive while preserving the *exec.ExitError type for errors.As().
7b2e933 to
8b03bb7
Compare
Fixes #190
Both
LandJail.Run()andNSJailManager.Run()always returnednil, discarding the child process exit code. The boundary process exited 0 regardless of what the target command returned.Changes
Run()now blocks on the channel after the select and returns the error (which includes*exec.ExitErrorwith the correct code).*exec.ExitErrorfromcmd.Run()instead of callingos.Exit()or wrapping it infmt.Errorf(). Serpent'sRunCommandErrorhasUnwrap(), soerrors.Ascan find*exec.ExitErrorthrough the entire chain.os.Exit(exitCode)when the child process exits with a non-zero code. All cleanup (proxy stop, etc.) has already happened insideRun()via defers. This ensures the correct exit code is propagated regardless of how the caller handles errors, both as a standalone binary and when embedded as acoder boundarysubcommand (no changes needed in coder/coder).How to test
Build and run:
landjail requires kernel 6.7+ (Landlock V4). Use
--jail-type nsjailwith appropriate privileges on older kernels.Generated by Coder Agents