Kubernetes jobs for batch workload
By Google Cloud Tech
Key Concepts
Job API, Pods, Completions, Parallelism, Indexed Completion Mode, Headless Service, Batch Workload, Kubernetes, YAML Manifest, Restart Policy.
Job API: The Fundamental Building Block
- The Job API in Kubernetes is the core component for running batch workloads.
- A job creates one or more pods (each running one or more containers) and retries their execution until a specified number of them terminate successfully.
Simple Job Example
- YAML Definition: The example uses a YAML file to define a simple job.
  - apiVersion: batch/v1 indicates the API version for batch processing.
  - kind: Job specifies the resource type as a Job.
  - metadata: name defines the name of the job.
  - spec.template.spec.containers defines the container(s) to run as part of the job.
  - The example uses the perl:5.34.0 image and executes a Perl command to calculate pi to the 2000th decimal place.
  - restartPolicy: Never ensures that the pod is not restarted if it fails.
- Deployment: The job is deployed using kubectl apply -f simple-job.yaml.
- Verification:
  - kubectl get jobs shows the status of the job.
  - kubectl get pods lists the pods associated with the job.
  - kubectl logs [pod-name] displays the logs from a specific pod.
- Outcome: The job runs a single pod, calculates pi, and completes successfully.
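A manifest matching the description above might look like the following. This is a sketch, not the exact file from the video: the job and container name pi, the backoffLimit value, and the precise Perl command are assumptions (the command shown is the one used in the Kubernetes documentation's Job example).

```yaml
# simple-job.yaml: a minimal sketch of the job described above.
apiVersion: batch/v1          # API group/version that provides Job
kind: Job                     # resource type
metadata:
  name: pi                    # assumed name; the summary does not state it
spec:
  backoffLimit: 4             # assumed: give up after 4 failed retries
  template:
    spec:
      containers:
      - name: pi              # assumed container name
        image: perl:5.34.0
        # Assumed command: prints pi to 2000 digits using Perl's bignum.
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never    # do not restart a failed pod in place
```

Saved as simple-job.yaml, this can be applied and inspected with the kubectl commands listed above.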
Non-Parallel Jobs and Completions
- A non-parallel job runs only one pod. The job is considered complete when that single pod terminates successfully.
- If the pod of a non-parallel job fails, how it is retried depends on restartPolicy: with OnFailure, Kubernetes restarts the failed container in the same pod; with Never, the pod is not restarted and the Job controller creates a replacement pod instead, until the job's backoffLimit is reached.
- Completions: To run a job multiple times, the completions parameter is set to a value greater than one. This specifies the number of pods that must complete successfully for the job to be considered complete.
  - Example: A job with completions: 4 runs pods until four of them have completed successfully.
- Parallelism (Default): By default, if parallelism is not specified, the job runs one pod at a time until the specified number of completions is reached.
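As an illustration of the completions setting, the sketch below runs four pods one after another. The job name, container name, image, and command are hypothetical placeholders chosen for this example; only completions: 4 comes from the text.

```yaml
# A sketch of a job that must complete successfully four times.
apiVersion: batch/v1
kind: Job
metadata:
  name: completions-demo      # hypothetical name
spec:
  completions: 4              # four pods must terminate successfully
  # parallelism is omitted, so it defaults to 1: pods run one at a time.
  template:
    spec:
      containers:
      - name: worker          # hypothetical container
        image: busybox:1.36   # assumed image, not from the video
        command: ["sh", "-c", "echo working; sleep 5"]
      restartPolicy: Never
```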
Parallel Jobs
- Parallelism: The parallelism parameter controls the number of pods that can run concurrently.
  - Example: Setting parallelism: 2 allows two pods to run simultaneously.
  - Impact: Increasing parallelism can significantly reduce the overall job execution time by utilizing resources more efficiently.
- YAML Configuration: The YAML file is modified to include the parallelism parameter within the spec section.
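A sketch of where parallelism sits in the manifest is shown below. The names, image, command, and the completions value of 4 are assumptions made for illustration; only parallelism: 2 is taken from the text.

```yaml
# A sketch of a parallel job: up to two pods run at the same time.
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-demo         # hypothetical name
spec:
  completions: 4              # assumed: four successful pods in total
  parallelism: 2              # at most two pods run concurrently
  template:
    spec:
      containers:
      - name: worker          # hypothetical container
        image: busybox:1.36   # assumed image, not from the video
        command: ["sh", "-c", "echo working; sleep 5"]
      restartPolicy: Never
```

With these values the four completions would be reached in roughly two waves of two pods rather than four sequential runs, assuming the cluster has room to schedule both pods at once.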
Indexed Completion Mode
- Use Case: Suitable for workloads that require inter-pod communication and coordination.
- Mechanism: Each pod within the job is assigned a unique, static index (starting from 0).
- Configuration:
  - completions and parallelism are set to the same value.
  - completionMode: Indexed is specified in the job's spec.
- Headless Service: A headless service (with clusterIP: None) is used to provide a domain name for inter-pod communication within the cluster. This allows pods to discover and communicate with each other using their index as part of their hostname.
- Example:
  - A headless service named headless-service is created.
  - A job named index-job is defined with completions: 3, parallelism: 3, and completionMode: Indexed.
  - Each pod attempts to ping the other pods within the same job using their index-based hostnames.
- Inter-Pod Communication: Pods use the headless service's domain name and their index to construct the hostnames of other pods. Each pod's hostname follows the pattern <job-name>-<index>, so the pods of index-job are reachable as index-job-0.headless-service, index-job-1.headless-service, and so on.
- Application: This pattern is useful for implementing MPI (Message Passing Interface) jobs or other distributed computing tasks.
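The example above could be expressed roughly as follows. This is a sketch under several assumptions: the container name, the busybox image, and the ping loop with its retry limit are illustrative and not taken from the video. The selector relies on the job-name label that the Job controller adds to the pods it creates, and the subdomain field must match the service name for the per-pod DNS names to exist.

```yaml
# Headless service: gives each matching pod its own DNS record.
apiVersion: v1
kind: Service
metadata:
  name: headless-service
spec:
  clusterIP: None               # headless: no virtual IP, DNS only
  selector:
    job-name: index-job         # label set on the job's pods by the Job controller
---
# Indexed job: three pods with indexes 0, 1 and 2, all running at once.
apiVersion: batch/v1
kind: Job
metadata:
  name: index-job
spec:
  completions: 3
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-service   # must equal the service name
      restartPolicy: Never
      containers:
      - name: worker                # hypothetical container
        image: busybox:1.36         # assumed image, not from the video
        command:
        - sh
        - -c
        - |
          # Ping every pod of the job by its <job-name>-<index> hostname.
          for i in 0 1 2; do
            n=0
            # Retry for up to about 30 seconds: a peer's DNS record
            # appears only once that pod is running and ready.
            until ping -c 1 "index-job-$i.headless-service" || [ "$n" -ge 30 ]; do
              n=$((n + 1))
              sleep 1
            done
          done
```

Each pod can also read its own index from the JOB_COMPLETION_INDEX environment variable, which Kubernetes sets for pods of indexed jobs.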
Synthesis/Conclusion
The video provides a practical overview of using the Job API in Kubernetes to run batch workloads. It covers different job configurations, including simple jobs, jobs with completions, parallel jobs, and indexed jobs. The examples demonstrate how to define jobs using YAML manifests, deploy them using kubectl, and monitor their execution. The video highlights the importance of understanding the completions, parallelism, and completionMode parameters to optimize job execution based on the specific requirements of the workload. The indexed completion mode, in conjunction with a headless service, enables complex inter-pod communication scenarios, making Kubernetes a versatile platform for a wide range of batch processing applications.