Roddy

Posted on Sep 26, 2021 • Edited on Oct 5, 2021

Drools & Databases, Part 3: The Solution

#drools

In the third and last part of this extended rant about how not to design database and rules interactions, I will describe how you should actually go about doing this.

If you're new to the party, the first two parts of this discussion can be found here:

If you don't want to spend half your day reading, I'll summarize the salient points here.

One of the most common questions on StackOverflow in the Drools tag is something along the lines of "how do I call my database from my rules?"

In part 1, I explained why you shouldn't try to interact with the database directly from your rules (DRL or otherwise.) Safely interacting with a database using a JDBC driver (or library) is incredibly involved and takes significant effort to do safely and properly, especially in a production environment. It's just about impossible to properly manage transactions, connections (with or without pooling), prepared statements, result sets, and so on within the confines of a DRL file. I'm not even sure it's possible to attempt in any sane fashion in a decision table....

In part 2, I talked about why even interacting with DAOs should be avoided. A well-design data abstraction layer would mean that we don't need to worry about things like connection pooling and transaction management, so in theory we could just interact with the DAOs and we'd be fine, right? Technically, yes, you could -- but the danger here becomes that you need to have extensive knowledge of how the Drools lifecycle works, along with how your data models are coupled (especially if you use hibernate's lazy loading.) This creates a significant issue for future maintenance, since it's not possible to test for the problems that you might introduce due to lack of knowledge in these areas -- such as data inconsistency arising from race conditions.

Now, in part 3, I finally get around to explaining what you should do instead. ("Finally!" you say.)

The cop-out

The answer you've all been waiting for is .... it depends.

No really. It depends on what your rules are actually doing and your problem space. But luckily there's a general rule of thumb which is this:

Interact with the database before you call the rules to get your data and pass it into working memory. Interact with the database after you call the rules to persist your results.

There's still problems here and I'll call them out, but we'll be wading into treacherous waters. Just put a bunch of DBAs into a room, ask them if we should be storing calculated or derived data in the database, and duck for cover.

The details

Let's start with an example, because I like talking about actual code instead of waving my hands around and talking theory.

Scenario: the library

Here's the situation we're going to be coding against. Our client is a library (with books in it; link is to Wikipedia for those unfamiliar with the concepts.) Our application tracks users who borrow and return books. A user is allowed to borrow up to 3 books at a time. Each book is allowed to be borrowed for a set amount of time (1 week, rounded to the nearest day). If the user has any books that are overdue, they are not allowed to borrow any new books until they have returned the overdue book(s).

The library has a simple database that models books and people. The person table has an id (primary key), and some information about the person borrowing the book. The book table has an id (primary key), some information about the book itself, and a borrowed_by field which is a foreign key reference to the person table's id field. So when a book is "borrowed", the borrowed_by field is set to the borrower's id. The borrowed_on field indicates when this event occurred.

Pretty simple. For the graphically oriented, here's the relationship diagram:

(Diagram made in DbDiagram, a pretty sweet and free online tool for this stuff.)

If you prefer looking at a complete, working application, the code from this post is available on GitHub here.

The application

Our application is going to be handling BookEvents. These events come from an external system (the barcode readers used by the librarians to scan books that are being borrowed or returned.) The event arrives at our application looking like this simple model:



enum BookEventType { BORROW, RETURN; }

class BookEvent {
    BookEventType eventType; // required
    Integer bookId; // required
    Integer userId; // optional
}

Notice that userId is optional. When a book is returned, we don't actually care who is returning it -- maybe it was tossed into a chute on the outside of the building after hours and gets scanned when the librarians arrive the following day. Maybe someone found the book on the bus and is returning it out of good will. It doesn't really matter, from anyone's perspective, who actually does the return. But for a 'borrow' event we do need to know it.

When we receive a RETURN BookEvent, we need to process it and update the appropriate tables as needed; this method should generally always return an "OK"/success -- situations like "this is not a library book" or "the returned book is damaged" are out of scope for this exercise.

When we receive a BORROW BookEvent, we need to return an indicator that says either:

No, the user cannot borrow this book.
Yes, the user has now successfully borrowed this book.

The "Yes"/success workflow should actually update the database and underlying system.

Calling the rules

At this point, we have enough information to start coding. First, we need to write a method that will take the received BookEvent and call the rules.



/**
 *  Process the book event and return an indication of success or failure.
 *
 * @param  event the event being processed
 * @return       true if the event is successful, false otherwise
 */
 boolean processEvent(BookEvent event) {
   // TODO
 }

Now, this is the part where many people will go to StackOverflow and start asking about how to interact with the database fromthe rules. Their first instinct is, of course, to just pass the BookEvent to the rules and let it sort itself out.

That's the wrong way to go about it, however. What we should do instead is first get the data we need out of the database, and then call the rules.

Slight detour: the data

What I like to do at this point is take a minute to think about my rules and mentally design my input objects. I want to make sure that the data I pass into my rules is sufficient, and is organized in a way that that is easily accessible. The easiest way to do this, I find, it to describe my rules in plain English, and describe the data I need to have in order for it to work.

In our current scenario we have the following rules:

If the person has 3+ books already checked out, a BORROW is not allowed
If the person has any overdue book checked out, a BORROW is not allowed
If the person is RETURNing a book, it is allowed
If the person has < 3 books checked out, and none are overdue, the BORROW is allowed

And an error case, just to make things interesting:

If the book is already checked out by this user, a BORROW is not allowed (maybe this was an accidental duplicate request)

Also an error case because of how our BookEvent is modelled:

If the BORROW request has no associated person, it is not allowed

Looking at these rules we need the following information:

The event type (BORROW or RETURN)
The id of the book being borrowed
The person, if known
- All books currently checked out by this person
- For eached checked out book, the datetime it was checked out

Now that we know the data we need, we can model the data we're going to pass into the rules:

We'll pass in the BookEventType as-is
We need the Book we're trying to check out -- id at minimum
We'll pass in a Person instance, so we can identify when they're missing from the request for the error case
We'll need to pass in a CheckedOutBook model that indicates when the book was checked out

We can model this as follows:



class CheckedOutBook {
  int id;
  OffsetDateTime checkedOutOn;
}

class Person {
  int id;
  List<CheckedOutBook> checkedOutBooks;
}

Now we're good to go. Our next step is, of course, getting our data out of the database and creating our inputs

Preparing our rule inputs

For this example, we'll assume that we've already got a proper abstraction layer set up (Spring, Hibernate, whatever.) So we'll just get the data out of the database using some offscreen data services that will conveniently return the data data we need.



Person getUserFromEvent(BookEvent event) {
  User userFromDb = dataService.getUserById( event.getUserId() );

  Person person = new Person();
  person.setId( userFromDb.getId() );

  // Convert all Book (db model) into CheckedOutBook (rules model)
  // and save to Person
  person.setCheckedOutBooks(
    userFromDb.getBooks()
              .stream()
              .map( b -> {
                  CheckedOutBook book = new CheckedOutBook();
                  book.setId(b.getId());
                  book.setCheckedOutOn(b.getBorrowedOn());
                  return book;
              }).collect(Collectors.toList());
  return person;
}

And now all that's left is to actually call the rules:



boolean processEvent(BookEvent event) {
    // First, get the necessary data from the database
    Person person = getUserFromEvent(event);

    // Call the rules
    KieServices kieServices = KieServices.Factory.get();
    KieContainer kContainer = kieServices.getKieClasspathContainer();
    KieBase kBase = kContainer.getKieBase(kbaseName);

    KieSession session = kBase.newKieSession();
    session.insert(person); 
    session.insert(event); 
    session.fireAllRules();

    // TODO -- update the database and return the result
}

But wait, what's this? We're inserting the event instance?

We need two pieces of data from the event -- the type and the id of the book being borrowed. It makes more sense to pass them into the rules as part of the event instead of as free floating items. I've never been a fan of passing just Strings or UUIDs or Integers by themselves into working memory because there's no indication of what they actually are -- they're just random pieces of data who have lost any semantics they might've once had associated with them when they had a variable name.

Notice also that we've still got a pretty big TODO left in this method -- we haven't designed ourselves a way to get the actual results out of these rules. We'll address that in the next section, when we write the rules.

The rules

OK now for the fun part.

We'll start with the easy bit, the RETURN type:



rule "Book is returned"
when
  BookEvent( type == BookEventType.RETURN )
then
  // TODO this is ok / successful
end

... Well that was boring. Now for the "error" cases:



rule "Book is already checked out by this person"
when
  BookEvent( type == BookEventType.BORROW,
             $id: bookId )
  Person( $books: checkedOutBooks )

  exists( CheckedOutBook( id == $id ) from $books )
then
  // TODO this is not ok
end

rule "Unknown person"
when
  BookEvent( type == BookEventType.BORROW )
  not( Person() )
then
  // TODO this is not ok
end

And finally, the actual meat-and-potatoes rules:



rule "Person has 3+ existing checked out books"
when
  BookEvent( type == BookEventType.BORROW )
  Person( checkedOutBooks.size >= 3 )
then
  // TODO this is not ok
end

rule "Person has an overdue book"
when
  BookEvent( type == BookEventType.BORROW )
  Person( $books: checkedOutBooks )

  $checkoutLimit: OffsetDateTime() from OffsetDateTime.now().withHour(12).withMinute(0).withSecond(0).minusDays(7)
  exists( CheckedOutBook( checkedOutOn.isBefore($checkoutLimit) ) from $books )
then
  // TODO this is not ok
end

Cool. We're almost done. Now we just need to get the results out to the caller. This is another commonly asked question topic in StackOverflow; I'll post about it some other time at length, but basically we have three options:

Create some sort of "result" object and pass it into working memory; in the 'then' clause, update this result object with the results from the rule.
Create some sort of "result" object and set it as a global variable. This is nearly the same as 1, but more "old school" and you can't key off of the state of the result object in your "when" clause (not that we need to in our current application.)
Invoke a method or other action which has a side effect that is visible to the caller.

For the sake of simplicity, we're going to go with option 2. Our results object will look like this:



public class BookEventResult {
    private boolean success = true;
    private String reason;
    // getters and setters
}

And now all we have to do is update our rules to set success to false and a reason in the right hand side of each rule. We'll also want to declare the global at the top of the DRL. Here's one rule done as an example:



global BookEventResult result;

rule "Person has an overdue book"
when
  BookEvent( type == BookEventType.BORROW )
  Person( $books: checkedOutBooks )

  $checkoutLimit: OffsetDateTime() from OffsetDateTime.now().withHour(12).withMinute(0).withSecond(0).minusDays(7)
  exists( CheckedOutBook( checkedOutOn.isBefore($checkoutLimit) ) from $books )
then
  result.setSuccess(false);
  result.setReason("User has overdue book(s)");
end

To tie it all together, we update the calling function and it's done. Huzzah!



boolean processEvent(BookEvent event) {
    // First, get the necessary data from the database
    Person person = getUserFromEvent(event);

    // Create the result object
    BookEventResult result = new BookEventResult();

    // Call the rules
    KieServices kieServices = KieServices.Factory.get();
    KieContainer kContainer = kieServices.getKieClasspathContainer();
    KieBase kBase = kContainer.getKieBase(kbaseName);

    KieSession session = kBase.newKieSession();
    session.insert(person); 
    session.insert(event); 
    session.setGlobal("result", result);
    session.fireAllRules();

    // handle the result
    if (!result.isSuccess()) {
        logger.info("Failed to checkout book. Reason = "+result.getReason());
        return false;
    }

    if (event.getType() == BookEventType.BORROW){
        borrowBook(event);
    } else {
        returnBook(event);
    }
    return true;
}

The implementation of the borrowBook and returnBook functions left as an exercise for the reader. In these methods, we should actually implement the business logic involved in borrowing or returning a book, which will involve updating the appropriate rows in the database.

Um, but wait ...

Yes, this is a toy example. Yes we've got some pretty sizable gaps in our error handling. Consider these following use cases:

What if the book that we're trying to check out is already checked out by someone else?
What if a concurrent request comes in for the same user which puts them over the checked out limit?
What if the validity of the action changes between the rules determining go/no-go and the actual invocation of the borrowBook/returnBook methods?

These are all valid questions. But here's the thing -- we can handle them. And when we handle them they will be handled in code. Not rules. Because the whole point of this exercise was to implement our validations in the rules without having to touch the database from the DRL itself.

And in that regard we've been successful. Before invoking the rules, we get our data from the database. We are then able to invoke the database with a snapshot of the data at the current point in time. When we finish, we have our result, and can now update the database.

At minimum, what we should be doing is checking that the state that we ran against the rules is the same state that the database is in when we try to apply the user action. If something has changed in the meantime, we can terminate the request with an HTTP 409 Conflict (or equivalent if not in an HTTP API or not using HTTP status codes.) This sort of response will indicate that yes, we tried to do the thing you asked, but there was a conflict that occurred -- likely a conflicting request that changed state before we were able to apply our changes.

The end of the road

And that is that. Hopefully you now understand why interacting with a database from inside of a DRL directly is bad, interacting with a DAO from inside of a DRL is only marginally better, and what you should actually be doing is doing all of your database GETs/READs first and then calling the rules and then doing any necessary SAVEs for your updated state.

(Now if you really want to kick over an anthill, find a bunch of DBAs and ask them their opinion on saving derived or calculated data in a database. If they had their way, we'd never be saving anything that we "calculated" in rules into the DB.)

Thanks for sticking with me on this extended rant about Drools and databases. Hopefully it made sense; if not feel free to leave a message. All sample applications were made up on the fly and do not represent any real production application that I know of or have worked on.

And, as always, happy coding!

Cover image attribution: ulleo @ Pixabay

Top comments (3)

Jairo Dávila • Nov 1 '21

Hi Roddy! Hope you are doing well!

I'm currently working with business central Version 7.56.0.Final, and I'm newbie in the use of Drools. My problem is that the client needs to synchronize data from database to compare this data in the guided decision tables. I was thinking to upload this data in global variables but the problem in this case is the amount of data (million of records) and the time it takes to synchronize in case of reset.

I hope you can help me with an alternative way to solve this problem.
Thank you!

Roddy • Nov 5 '21

Hi Jairo,

I don't think this is possible with guided decision tables -- they're rather static in their implementation. However, a disclaimer: I haven't really looked at them since they first came out and I evaluated them for if they would be useful to my company (they were not.)

I can think of several ways of addressing your problem, but your biggest issue is going to be your database data. How often does it change? Will you need new data every time you fire the rules?

If your rules are using the database data to create your conditions, the problem is going to be that your rules will need to be regenerated and recompiled every so often. If you just need to resync your db data to your rules once a day, then maybe this is ok. But if you need to do it after every single time you run the rules, it's really not: compiling and loading the rules into memory is one of the slowest things that Drools does, but since most applications do this at most once it's OK.

On the DRL side of things, there's a feature called Rule Templates which you should look into using. The idea is that you create a template of a rule, and then you apply it against a data source -- traditionally a CSV. Drools then creates X number of rules by interpolating the data from your data source into your template.

But if you really want to use decision tables, I don't think there's similar functionality available. One of the problems is going to be that decision tables are extremely non-performant: my company actually banned their use because they were so bad. Maybe I'm reading it wrong, but when you say "millions of records" that's translating to me into "millions of decision table rows" and I can't even imagine how slow and resource intensive that's going to be. Especially for large volumes like this, I highly recommend going the DRL route.

I do need to revisit decision tables sometime soon in my research, so I'll keep your question in mind as I do. But in the meantime I suggest looking at the possibility of using Rule Templates ... and also maybe trying to figure out how much data out of those "millions of records" you actually do need.

Roddy

Jairo Dávila • Nov 5 '21

Roddy thanks for your answer and your suggestions, it’s a pleasure to read an opinion from someone with experience!