I started working on Kubernetes recently, and I'm using GKE in Autopilot mode.
I noticed that when I tried to use several hundred Pods (via Jobs) to run algorithms that take several minutes (>15 min), Kubernetes was selecting Pods seemingly at random and killing them.
To work around the issue, I reduced the data chunk size so that each task finishes faster (under 2 minutes).
Now I'm wondering: are Pods, and Kubernetes in general, a good tool for running long-running processing algorithms?
Hi, and thanks Mohamed.
Pods are mortal by design; they are not supposed to live forever :-)
For your use case, you can use Jobs, as you already did.
Have you defined a suitable activeDeadlineSeconds parameter in the Job spec?
"The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded."
So you can increase it to fit your needs, and also run several Pods in parallel to parallelize your Job.
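For example, a minimal Job manifest might look like this (the name, image, and numeric values below are placeholders to adjust for your workload; note that GKE Autopilot also requires resource requests on containers):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: long-processing-job        # hypothetical name
spec:
  parallelism: 10                  # run up to 10 Pods at a time
  completions: 100                 # total number of successful tasks needed
  backoffLimit: 4                  # retries before the Job is marked Failed
  activeDeadlineSeconds: 7200      # bound the *whole* Job to 2 hours
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: my-registry/my-algorithm:latest  # hypothetical image
          resources:
            requests:              # Autopilot sizes nodes from these requests
              cpu: "1"
              memory: 1Gi
```

With `parallelism` and `completions` set, Kubernetes keeps up to 10 Pods running until 100 of them finish successfully, and `activeDeadlineSeconds` applies to the Job as a whole rather than to each Pod individually.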
Does this mean that I should never deploy a queuing system like RabbitMQ as a Pod?
In my case, the queuing system is very active in the application; all other Pods rely on it. If the queuing system Pod fails, I'll lose time before I can resume processing.