Codeitout

Posted on Nov 12, 2022

Spring Batch Part 2 for Beginners- The Domain Language of Spring Batch

#beginners #springboot #springbatch #computerscience

Let's start with this diagram, which highlights the key concepts that make up the domain language of Spring Batch.
Let's go through each of these concepts in detail.

Job

It is an entity that encapsulates an entire batch process. It is wired together through a configuration file (XML or Java based), which is called the job configuration.
A Job sits at the top of the hierarchy. It is a container for Step instances.

This diagram will be explained in the following sections.
A Job contains multiple steps and arranges them in their order of execution. It also applies configuration globally to all steps, such as restartability.

The Job Configuration contains

Name of the Job
Definition and ordering of Step instances
Restartability of the Job

In this example, "footballJob" is the Job name, and playerLoad(), gameLoad() and playerSummarization() are its steps.
.start() tells the Job to begin with the playerLoad() step. Once it completes, .next() tells the Job to run the gameLoad() step, and once that completes, the playerSummarization() step is executed.
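
The start/next chaining described above can be pictured with a small plain-Java model. This is only a sketch under assumptions: JobSketch and its Builder are hypothetical names made up for illustration, not Spring Batch classes, and each step is reduced to its name.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a Job as a named, ordered container of steps.
// JobSketch is a hypothetical class, not part of Spring Batch.
final class JobSketch {
    private final String name;
    private final List<String> stepNames;

    private JobSketch(String name, List<String> stepNames) {
        this.name = name;
        this.stepNames = stepNames;
    }

    String name() { return name; }

    // Step names in the order they would be executed.
    List<String> stepNames() { return stepNames; }

    static Builder named(String name) { return new Builder(name); }

    static final class Builder {
        private final String name;
        private final List<String> stepNames = new ArrayList<>();

        private Builder(String name) { this.name = name; }

        // Registers the first step of the job.
        Builder start(String stepName) {
            if (!stepNames.isEmpty()) {
                throw new IllegalStateException("start() must be called once, before next()");
            }
            stepNames.add(stepName);
            return this;
        }

        // Registers a step that runs after the previously registered one.
        Builder next(String stepName) {
            if (stepNames.isEmpty()) {
                throw new IllegalStateException("call start() before next()");
            }
            stepNames.add(stepName);
            return this;
        }

        JobSketch build() {
            return new JobSketch(name, new ArrayList<>(stepNames));
        }
    }
}

class JobSketchDemo {
    public static void main(String[] args) {
        JobSketch job = JobSketch.named("footballJob")
                .start("playerLoad")
                .next("gameLoad")
                .next("playerSummarization")
                .build();
        // prints: footballJob -> [playerLoad, gameLoad, playerSummarization]
        System.out.println(job.name() + " -> " + job.stepNames());
    }
}
```

The builder only records the order. In a real job each name would be a Step with its own logic.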

Job Instance

It refers to the concept of a logical Job run. Let's understand it in simple words.
Suppose you have an EOD (end of day) job that copies data from one table to another, based on today's date. There is only one EOD Job, but each day you have to run it with its own parameters. For this job, there is one logical Job instance per day: a Jan 1 run, a Jan 2 run, and so on. If the Jan 1 run fails and we trigger it again, it is still the Jan 1 run.
So a Job instance can have multiple executions (Job executions). But only one Job instance, meaning a particular Job with its set of parameters, can run at a given time. Starting a new Job instance means starting from the beginning, whereas using an existing instance means starting from where we left off.

Job Parameters
How do we distinguish one Job instance from the other?
Job parameters hold the set of parameters used to start a Job, much like the arguments passed to a function. In the earlier case we had one Job instance for Jan 1 and another for Jan 2. It is a single Job, but the instances are differentiated by their Job parameters.
The Jan 1 run would have started with the date parameter set to Jan 1, and the Jan 2 run with the date parameter set to Jan 2.

Job instance = Job + Identifying parameters
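
The equation above can be read as an equality rule: two runs belong to the same Job instance when the job name and the identifying parameters match. Here is a minimal sketch of that rule in plain Java. JobInstanceKey, the job name "eodJob" and the "date" parameter key are hypothetical names chosen for illustration, not Spring Batch classes or conventions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-Java model of "Job instance = Job + identifying parameters".
// JobInstanceKey is a hypothetical class, not part of Spring Batch.
final class JobInstanceKey {
    private final String jobName;
    private final Map<String, String> identifyingParameters;

    JobInstanceKey(String jobName, Map<String, String> identifyingParameters) {
        this.jobName = jobName;
        // Defensive copy so the key cannot change after creation.
        this.identifyingParameters =
                Collections.unmodifiableMap(new HashMap<>(identifyingParameters));
    }

    // Two keys are equal when the job name and all identifying parameters match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof JobInstanceKey)) return false;
        JobInstanceKey that = (JobInstanceKey) other;
        return jobName.equals(that.jobName)
                && identifyingParameters.equals(that.identifyingParameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, identifyingParameters);
    }

    // Convenience factory for the EOD example, which has a single date parameter.
    static JobInstanceKey eodJobFor(String date) {
        return new JobInstanceKey("eodJob", Collections.singletonMap("date", date));
    }

    public static void main(String[] args) {
        System.out.println(eodJobFor("Jan 1").equals(eodJobFor("Jan 1"))); // prints: true
        System.out.println(eodJobFor("Jan 1").equals(eodJobFor("Jan 2"))); // prints: false
    }
}
```

A rerun of the failed Jan 1 job produces an equal key, so it maps to the same instance. The Jan 2 run produces a different key and therefore a new instance.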

Job Execution
It refers to a single attempt to run a Job. A Job instance can be executed multiple times, and each attempt is called an execution. An execution may end in failure or completion, but the Job instance corresponding to it is considered complete only when an execution completes successfully. For the EOD job, the first execution of the Jan 1 Job instance might have failed. If it is run again with the same Job parameters, a new Job execution is created.
A Job execution has properties such as status, startTime, endTime, exitStatus and executionContext. These properties are persisted and can be used to check the status of the execution.

Suppose the developer took the entire day to figure out the issue. The next window for running the Job opened at 9 pm, and this time she ran the Jan 1 Job instance, which completed successfully at 9:30 pm. Since it is now the second day, the Job must run for Jan 2 as well, so that Job instance was started at 9:31 pm and ended at 10:30 pm.
The two Job instances do not have to run one after the other unless they access the same data. If they do, running them at the same time might end up locking the database.
Since two Job instances were run, we should have one more entry in the Job instance table and two more entries in the Job execution table.
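
The bookkeeping in this scenario can be checked with a small in-memory model. This is a sketch under assumptions, not how Spring Batch stores its metadata: MetadataSketch and the job name "eodJob" are hypothetical, and the two tables are reduced to a map from instance to its list of executions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the Job instance / Job execution bookkeeping.
// MetadataSketch is a hypothetical class, not part of Spring Batch.
final class MetadataSketch {
    enum Status { COMPLETED, FAILED }

    // One entry per Job instance (job name + date), holding all of its executions.
    private final Map<String, List<Status>> executionsByInstance = new LinkedHashMap<>();

    // Records one attempt to run the job for the given date.
    void recordExecution(String jobName, String date, Status status) {
        String instanceKey = jobName + "|date=" + date;
        List<Status> executions =
                executionsByInstance.computeIfAbsent(instanceKey, key -> new ArrayList<>());
        if (executions.contains(Status.COMPLETED)) {
            // A completed instance is not run again with the same parameters.
            throw new IllegalStateException("instance already complete: " + instanceKey);
        }
        executions.add(status);
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    int executionCount() {
        int total = 0;
        for (List<Status> executions : executionsByInstance.values()) {
            total += executions.size();
        }
        return total;
    }

    public static void main(String[] args) {
        MetadataSketch metadata = new MetadataSketch();
        metadata.recordExecution("eodJob", "Jan 1", Status.FAILED);    // first attempt fails
        metadata.recordExecution("eodJob", "Jan 1", Status.COMPLETED); // rerun at 9 pm
        metadata.recordExecution("eodJob", "Jan 2", Status.COMPLETED); // next day's run
        // prints: instances=2 executions=3
        System.out.println("instances=" + metadata.instanceCount()
                + " executions=" + metadata.executionCount());
    }
}
```

After the first failed attempt there is one instance and one execution. The rerun and the Jan 2 run add one instance and two executions, which matches the counts described above.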

Step

A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Each Job may have multiple steps to carry out the batch processing. How complex each step is depends entirely on the developer.
A step can be as simple as fetching data from the database, or as complex as performing heavy operations on the database using business logic. Similar to a Job, each Step has a StepExecution.

Step Execution
It represents a single attempt to execute a Step. A new Step execution is created each time a step is run. If a step never runs because the step before it failed, no execution is persisted for it. Each Step execution contains an ExecutionContext, which holds any data the developer needs to have persisted across batch runs, for example the state information needed to restart.
Some Step execution properties are status, startTime, endTime, exitStatus and executionContext.
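
The rule that a step which never starts gets no Step execution can be shown with a tiny model. This is a sketch under assumptions: StepRunSketch is a hypothetical class, steps are modelled as suppliers that report success or failure, and the step names are borrowed from the football example purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Plain-Java model of how step executions get recorded during one job run.
// StepRunSketch is a hypothetical class, not part of Spring Batch.
final class StepRunSketch {

    // Runs the steps in order and returns the recorded executions as "name:STATUS".
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> stepExecutions = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            boolean succeeded = step.getValue().getAsBoolean();
            stepExecutions.add(step.getKey() + ":" + (succeeded ? "COMPLETED" : "FAILED"));
            if (!succeeded) {
                break; // later steps never start, so nothing is recorded for them
            }
        }
        return stepExecutions;
    }

    // Three steps where the second one fails (simulated).
    static List<String> runWithFailingGameLoad() {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("playerLoad", () -> true);
        steps.put("gameLoad", () -> false);
        steps.put("playerSummarization", () -> true);
        return run(steps);
    }

    public static void main(String[] args) {
        // prints: [playerLoad:COMPLETED, gameLoad:FAILED]
        System.out.println(runWithFailingGameLoad());
    }
}
```

Only two executions are recorded. The third step has no entry at all, because it was never attempted.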

Execution Context
It represents a collection of key-value pairs that are persisted and controlled by the framework, giving developers a place to store persistent data that is scoped to a Step execution object or a Job execution object.
For example: executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());

Let's suppose you are reading lines from a file. Here 'LINES_READ_COUNT' is the key, and the value is set by the developer through the code. This key-value pair is stored in the metadata tables. Values placed in the Job-level execution context can be accessed across the different steps of a Job.
In this case the step failed after processing 40322 lines, and the stored count allows the step to start again from the line where it left off.
There is exactly one execution context per Step execution, and at least one execution context per Job execution.

Here, the code checks whether the executionContext contains LINES_READ_COUNT. If it is present, lineCount is set to the stored value. The reader then resumes from the next line in the file. The execution context thus saves time and extra processing, because it records how many lines have already been read.
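
The check-and-resume logic described above can be sketched in plain Java. This is an illustration under assumptions, not the Spring Batch API: the execution context is modelled as an ordinary Map, RestartableReaderSketch and its failAtLine parameter are hypothetical, and the key string is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of restarting a file read from a saved position.
// RestartableReaderSketch is a hypothetical class, not part of Spring Batch.
final class RestartableReaderSketch {
    static final String LINES_READ_COUNT = "lines.read.count";

    // Processes lines starting from the saved position. Stops early when it reaches
    // index failAtLine to simulate a failure (pass -1 for no failure).
    static List<String> readLines(List<String> lines,
                                  Map<String, Long> executionContext,
                                  int failAtLine) {
        // Resume from the stored count if a previous attempt saved one.
        long linesAlreadyRead = executionContext.getOrDefault(LINES_READ_COUNT, 0L);
        List<String> processed = new ArrayList<>();
        for (int i = (int) linesAlreadyRead; i < lines.size(); i++) {
            if (i == failAtLine) {
                return processed; // simulated failure; the context keeps the last saved count
            }
            processed.add(lines.get(i));
            executionContext.put(LINES_READ_COUNT, (long) (i + 1)); // checkpoint
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        Map<String, Long> executionContext = new HashMap<>();
        System.out.println(readLines(lines, executionContext, 3));  // prints: [a, b, c]
        System.out.println(readLines(lines, executionContext, -1)); // prints: [d, e]
    }
}
```

The second call does not reread the first three lines, because the count saved by the failed attempt is still in the context.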

Job Repository

It provides CRUD operations for the JobLauncher, Job and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository. During Job execution and Step execution, other data is also persisted in it, which the developer can use. All the metadata tables are part of this repository. You can also change the default prefix of these table names.
@EnableBatchProcessing provides an automatically configured JobRepository.

Job Launcher

It is a simple interface for launching a Job with a given set of Job parameters.

public interface JobLauncher {
    public JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

Item Reader

It is an abstraction that represents the retrieval of input for a step, one item at a time. When the item reader has exhausted its input, it returns null.

Item Writer

It is an abstraction that represents the output of a step, one batch or chunk at a time. It has no knowledge of the input it received previously or will receive next. It knows only the items that were passed in the current invocation.

Item Processor

It is an abstraction that represents the business processing of data. It is the place where the input received from the item reader is transformed into the output for the item writer.
If the processed data is not valid, it returns null, which means the item writer won't include that item in the output.
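
The way these three pieces cooperate can be sketched as a simple loop. This is a simplified plain-Java model, not the Spring Batch implementation: ReaderSketch, ProcessorSketch, WriterSketch and ChunkLoopSketch are hypothetical stand-ins for the real abstractions, and the chunk size of 2 and the sample names are arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-method stand-ins for the reader, processor and writer.
interface ReaderSketch<T> { T read(); }                   // returns null when input is exhausted
interface ProcessorSketch<I, O> { O process(I item); }    // returns null to filter an item out
interface WriterSketch<T> { void write(List<T> chunk); }  // receives one chunk at a time

final class ChunkLoopSketch {

    static <I, O> void run(ReaderSketch<I> reader, ProcessorSketch<I, O> processor,
                           WriterSketch<O> writer, int chunkSize) {
        boolean exhausted = false;
        while (!exhausted) {
            // Read up to chunkSize items, stopping when the reader returns null.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            // Process each item; a null result means the item is dropped.
            List<O> chunk = new ArrayList<>();
            for (I item : items) {
                O processed = processor.process(item);
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            // Hand the surviving items to the writer as one chunk.
            if (!chunk.isEmpty()) {
                writer.write(chunk);
            }
        }
    }

    // Reads four names, drops the empty one, upper-cases the rest, in chunks of 2.
    static List<List<String>> demo() {
        Iterator<String> input = Arrays.asList("alice", "", "bob", "carol").iterator();
        List<List<String>> written = new ArrayList<>();
        ChunkLoopSketch.<String, String>run(
                () -> input.hasNext() ? input.next() : null,
                name -> name.isEmpty() ? null : name.toUpperCase(),
                chunk -> written.add(new ArrayList<>(chunk)),
                2);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: [[ALICE], [BOB, CAROL]]
    }
}
```

The empty name is read but never written, because the processor returned null for it. The writer only ever sees the chunk handed to it in the current call.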

If you prefer watching a video, you can follow this link🦋.

If you found this useful, you know what to do now. Hit that clap button and follow me to get more articles and tutorials in your feed.❤❤

DEV Community