Commit 554415b

docs: update readme
1 parent 214ee83 commit 554415b

1 file changed

Lines changed: 77 additions & 10 deletions

README.md

@@ -5,18 +5,62 @@
55

66
# Batch Job Controller
77

8-
The batch job controller allows executing pods on nodes of a cluster, where the number of concurrent running pods can be
9-
configured. Each pod can report it's results back to the controller to have them exposed as metrics.
8+
The **Batch Job Controller** is a Kubernetes-native tool designed to execute jobs across multiple nodes in a cluster.
9+
It provides a flexible way to run diagnostic, maintenance, or data-collection tasks on specific nodes and aggregate the
10+
results.
11+
12+
Each job is executed as a Pod on a target node. These Pods can report their results back to the controller via a
13+
callback API.
14+
The controller then exposes these results as Prometheus metrics and stores any uploaded files or logs for later
15+
analysis.
16+
17+
## Features
18+
19+
- **Node-based Execution**: Automatically schedules a Pod on every node matching a specific selector.
20+
- **Cron Scheduling**: Supports standard cron expressions for recurring job executions.
21+
- **Concurrency Control**: A configurable worker pool limits the number of concurrent job Pods to prevent cluster
22+
overload.
23+
- **Callback API**:
24+
- **Metrics**: Pods can send JSON-formatted results that are dynamically converted into Prometheus metrics.
25+
- **File Upload**: Pods can upload arbitrary files (e.g., reports, logs, traces) to the controller.
26+
- **Kubernetes Events**: Pods can request the controller to create Kubernetes Events on their behalf.
27+
- **Result Persistence**:
28+
- Stores job results and uploaded files in a structured directory format.
29+
- Keeps a configurable history of past executions.
30+
- Optionally collects and stores logs from job Pods.
31+
- **Static File Server**: Built-in HTTP server to browse and download execution reports and uploaded files.
32+
- **Leader Election**: Supports high-availability deployments with multiple controller replicas.
33+
34+
## Installation
35+
36+
### Helm
37+
38+
A sample controller can be installed via Helm.
39+
Because use cases and configurations differ from one setup to another, treat this chart as a reference only;
40+
do not use it as-is in production.
41+
42+
#### OCI Registry
43+
44+
```console
45+
helm install my-batch-job-controller oci://ghcr.io/bakito/helm-charts/batch-job-controller --version 1.4.9
46+
```
47+
48+
#### Helm Repository
49+
50+
```console
51+
helm repo add bakito https://charts.bakito.net
52+
helm install my-batch-job-controller bakito/batch-job-controller --version 1.4.9
53+
```
1054

1155
## Deployment
1256

1357
The controller expects the following environment variables:
1458

15-
| Name | Value |
16-
| --- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
17-
| NAMESPACE | The current namespace |
59+
| Name | Value |
60+
|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
61+
| NAMESPACE | The current namespace |
1862
| CONFIG_MAP_NAME | The name of the configmap to read the config from |
19-
| POD_IP | The IP of the controller Pod. If defined, this IP is used for the callback URL of the job pods.(should be injected via [Downward API](https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api)) |
63+
| POD_IP          | The IP of the controller Pod. If defined, this IP is used for the callback URL of the job pods. It should be injected via the [Downward API](https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api). |
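
As a minimal sketch of how these variables might be wired into the controller container (an assumption for illustration, not taken from the project's chart): the Kubernetes Downward API can supply `NAMESPACE` and `POD_IP` through `fieldRef`, while the ConfigMap name shown here is a hypothetical placeholder.

```yaml
# Excerpt of a container spec; only the env section is shown.
env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace   # namespace the controller Pod runs in
  - name: CONFIG_MAP_NAME
    value: batch-job-controller-config  # hypothetical name; use your own ConfigMap
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP         # Pod IP, used for the job pods' callback URL
```

Because the table describes `POD_IP` as used only "if defined", it could presumably be left out in setups that do not need it.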
2064

2165
## Configuration
2266

@@ -29,9 +73,9 @@ Controller configuration
2973
```yaml
3074
name: "" # name of the controller; will also be used as prefix for the job pods
3175
jobServiceAccount: "" # service account to be used for the job pods. If empty the default will be used
32-
jobImagePullSecrets: # pull secrets to be used for the job pods for pulling the image
33-
- name: secret_name
34-
jobNodeSelector: { } # node selector labels to define in which nodes to run the jobs
76+
jobImagePullSecrets: # pull secrets to be used for the job pods for pulling the image
77+
- name: secret_name
78+
jobNodeSelector: {} # node selector labels to define in which nodes to run the jobs
3579
runOnUnscheduledNodes: true # if true, jobs are also started on nodes that are unschedulable
3680
cronExpression: "42 3 * * *" # the cron expression to trigger the job execution
3781
reportDirectory: "/var/www" # directory to store and serve the reports
@@ -41,7 +85,7 @@ runOnStartup: true # if 'true' the jobs are triggered on startup o
4185
startupDelay: 10s # the delay as duration that is used to start the jobs if runOnStartup is enabled. default is '10s'
4286
callbackServiceName: "" # name of the controller service
4387
callbackServicePort: 8090 # port of the controller callback api service
44-
custom: { } # additional properties that can be used in a custom implementation
88+
custom: {} # additional properties that can be used in a custom implementation
4589
latestMetricsLabel: false # if 'true' each result metric is also created with executionID='latest'
4690
leaderElectionResourceLock: "" # type of leader election resource lock to be used. ('configmapsleases' (default), 'configmaps', 'endpoints', 'leases', 'endpointsleases')
4791
savePodLog: false # if enabled, pod logs are saved along with the other job files
@@ -161,3 +205,26 @@ The event URL is by default: **${CALLBACK_SERVICE_EVENT_URL}**
161205
### Examples
162206

163207
[test-queries.http](./testdata/test-queries.http)
208+
209+
## Development & Testing
210+
211+
### End-to-End Tests
212+
213+
The end-to-end tests are automated through GitHub Actions but can also be run locally.
214+
215+
#### Local E2E workflow:
216+
217+
1. **Build image**:
218+
```bash
219+
docker build -f Dockerfile --build-arg VERSION=e2e-tests -t batch-job-controller:e2e .
220+
```
221+
222+
2. **Setup kind cluster**:
223+
Use `kind-with-registry` or a similar tool to load the image into a local cluster.
224+
225+
3. **Run E2E scripts**:
226+
The scripts in `testdata/e2e/` are used to execute the tests:
227+
- `./testdata/e2e/installChart.sh`: Installs the Helm chart for E2E testing.
228+
- `./testdata/e2e/findExecutedJobPod.sh`: Waits for and identifies the executed job pod.
229+
- `./testdata/e2e/checkAndPrintControllerLogsAndEvents.sh`: Validates results by checking logs and events.
230+
