Rasa 2.x gives us a lot of new features as conversational designers / developers. The coolest feature that isn't quite apparent is the ability to group and organize your training data any way you please.
At my company we build "abilities" for our bot - think of them as similar to Alexa Skills. Essentially you have some domain information, NLU training data, stories and sometimes actions or forms to round out the ability. The more abilities the bot has, the longer and more complex your files get, which means it takes longer to grok what you're looking at if you drop in to tweak something, let alone onboard a new developer.
Rasa 1.x allowed you to split up your nlu and stories files simply by creating a
data/core directory in your project and putting the individual files there. Grouping your data into separate files makes it easier to find what you need when something has to change. For example, if you needed to add new chit-chat training data, you could jump into
data/nlu/chit-chat.md and add new data. Initiating the
rasa train command utilizes the files in
data/core in combination with
domain.yml in the root of your project to train your model.
This was great, but not ideal for me. I wanted to split my domain files in a similar way by creating a
data/domain directory and putting my files there. Rasa, however, didn't recognize that directory, so I wrote a script to merge these files into a single
domain.yml file and drop it in the root. This allowed the
rasa train command to utilize my separate domain related files.
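The merge script itself isn't shown here, so the following is only a minimal sketch of the idea, assuming each domain fragment has already been parsed into a dictionary (for example with a YAML parser) and that a given key has the same type in every fragment. The function name merge_domains and the sample fragments are made up for illustration.

```python
from typing import Any, Dict, Iterable


def merge_domains(parts: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine several parsed domain fragments into one dictionary."""
    merged: Dict[str, Any] = {}
    for part in parts:
        for key, value in part.items():
            if isinstance(value, list):
                target = merged.setdefault(key, [])
                # Keep first-seen order and skip entries already present.
                for item in value:
                    if item not in target:
                        target.append(item)
            elif isinstance(value, dict):
                # Later fragments override earlier ones on a name clash.
                merged.setdefault(key, {}).update(value)
            else:
                # Scalars such as a version string: the last fragment wins.
                merged[key] = value
    return merged


# Hypothetical fragments standing in for two files under data/domain.
chit_chat = {
    "intents": ["greet", "goodbye"],
    "responses": {"utter_greet": [{"text": "Hi!"}]},
}
weather = {
    "intents": ["greet", "ask_weather"],
    "slots": {"city": {"type": "text"}},
}

combined = merge_domains([chit_chat, weather])
print(combined["intents"])
```

Writing the merged dictionary back out as domain.yml is left out on purpose, since that part depends on whichever YAML library you use.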
Rasa 2.x gives us the ability to split up our domain files, and the benefit is clear: smaller files with more focused data. I also don't have to use my custom script now!
Why is this cool? To expand on my explanation above: if your bot can handle chit-chat, weather, restaurant search, and directions, you would have a single long
domain.yml file in the root of your project with ALL of your intents, slots, entities, responses, action calls, and form config. Your topical data is interlaced, and it makes it hard to find things. Being able to split this into different files just makes more sense. (Thank you Rasa!)
Your new data directory structure can now look like this:
data/core
data/domain
data/nlu
And each of these can contain multiple files that make up your bot's data. You can even do this with your action files.
Here's something that is an amazing side effect / undocumented feature of the way Rasa deals with training data in 2.x: you can create directories under
nlu and Rasa will recurse down through looking for files during the training process.
I know you're asking - why is this awesome? In our case, as I said we build abilities, which are mostly isolated functions and conversational scenarios. In v1 we adopted a filename convention to differentiate between abilities. In v2, by exploiting this new directory structure we can have individual developers work on a single ability without stepping on the toes of other developers.
They can create a new ability directory - let's say they're working on a book recommendation ability. Our dev creates
data/book-recommendation and in that directory creates a
rules.yml and works solely from that directory. Fun fact, the filename doesn't matter. Each
.yml file is keyed by its top-level section, such as
rules: so it doesn't matter how many files you have, or what the names are. It all works!
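To see which files contribute to which section once things are spread across ability directories, a rough indexer like the one below can help. It is only a sketch: it does a naive scan for keys that start in column 0 rather than parsing YAML properly, it is not how Rasa itself loads the files, and the function names are my own.

```python
from pathlib import Path
from typing import Dict, List


def top_level_keys(path: Path) -> List[str]:
    """Return keys that start in column 0, e.g. 'rules' or 'nlu'."""
    keys = []
    for line in path.read_text(encoding="utf-8").splitlines():
        # Skip blanks, indented content, comments and list items.
        if not line or line[0] in " \t#-":
            continue
        name, sep, _ = line.partition(":")
        if sep:
            keys.append(name.strip())
    return keys


def index_training_files(data_dir: str) -> Dict[str, List[str]]:
    """Map each top-level key to the files under data_dir that define it."""
    index: Dict[str, List[str]] = {}
    # rglob walks every subdirectory, so ability folders are included.
    for path in sorted(Path(data_dir).rglob("*.yml")):
        relative = path.relative_to(data_dir).as_posix()
        for key in top_level_keys(path):
            index.setdefault(key, []).append(relative)
    return index
```

Calling index_training_files("data") from the project root returns something like a dictionary of section names to file lists, which makes it easy to spot that the filenames really are arbitrary.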
If you decide to do this, you'll need to run
rasa train with the
--domain parameter so it will find your domain files:
rasa train --domain data
If you leave off the
--domain parameter, Rasa will look for
domain.yml in the directory you're running it from so be sure to delete
domain.yml in the root of your project, or you may be quite confused why your latest changes aren't getting pulled in.
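If you wrap training in a script, a small guard can catch the stale-file trap before it bites. This is just a sketch under my own assumptions: the helper name build_train_command is hypothetical, and it only checks for a leftover domain.yml in the project root before returning the same command shown above.

```python
from pathlib import Path
from typing import List


def build_train_command(project_root: str, domain_dir: str = "data") -> List[str]:
    """Return the train command, or raise if a stale root domain.yml exists."""
    stale = Path(project_root) / "domain.yml"
    if stale.is_file():
        raise FileExistsError(
            f"{stale} still exists; delete it so a plain 'rasa train' "
            "cannot silently pick it up."
        )
    # Same flags as the command shown above.
    return ["rasa", "train", "--domain", domain_dir]
```

The returned list can be handed to subprocess.run(command, check=True) from the project root.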
You can also do this with your
actions.py file, albeit in a different location and with an extra file. We create an ability directory under
actions/, drop in an empty
__init__.py file (making Python treat it as a package), then add an
actions.py file (or whatever filename you want).
In our book recommendation example we would have something like this:
actions/__init__.py
actions/book-recommendation
actions/book-recommendation/__init__.py
actions/book-recommendation/actions.py
Doing all of this directory organization centralizes the code and lets your developer spin up a local
rasa init project, and work on that ability from beginning to end, creating a very focused bot complete with tests. One caveat is if the ability being worked on is integrated with another ability in some way. Depending on how heavily it relies on that other ability, you may want to consider whether the new code is really a separate function or an extension of an existing ability - but that's getting away from the main topic here.
We plan to have a special repo of abilities, so when our devs are done they can just move their directories over, issue a PR, and that new ability will be available for everyone else on the team to pull down and add to their bot if needed.
Up until now, if your ability had any Python-related action code, you'd have two directories to manage. What if you could create a truly self-contained ability in one directory? Literally add one directory of files, retrain, and have a new ability in your bot?
To achieve this, we'll move the action files into our
data/book-recommendation directory. There's some setup needed to do this, however.
Remember the __init__.py files we've been dropping all over? Python uses those to detect if a directory is loadable (a package).
To get our all-in-one setup we'll need to drop an
__init__.py file into
data and then move our
__init__.py and the specific
actions.py file from our actions' ability folder into our data's ability directory. This way everything is 100% self-contained in one single directory like this:
data/__init__.py
data/book-recommendation
data/book-recommendation/__init__.py
data/book-recommendation/actions.py
data/book-recommendation/stories.yml
data/book-recommendation/domain.yml
data/book-recommendation/nlu.yml
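Creating that layout by hand for every new ability gets tedious, so here is a small scaffolding sketch. The helper name scaffold_ability is hypothetical, and the files are created empty as placeholders only; they need real content before you train.

```python
from pathlib import Path
from typing import List

# File names taken from the layout above; all are created empty.
ABILITY_FILES = ("__init__.py", "actions.py", "stories.yml", "domain.yml", "nlu.yml")


def scaffold_ability(data_dir: str, ability: str) -> List[Path]:
    """Create a self-contained ability directory and return the new files."""
    root = Path(data_dir)
    ability_dir = root / ability
    ability_dir.mkdir(parents=True, exist_ok=True)

    # data/__init__.py makes the data directory itself a Python package.
    wanted = [root / "__init__.py"]
    wanted.extend(ability_dir / name for name in ABILITY_FILES)

    created = []
    for path in wanted:
        # Never overwrite a file a developer has already filled in.
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

For example, scaffold_ability("data", "book-recommendation") produces the directory shown above and leaves any existing files untouched.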
The trick here is to run your action server with the
--actions parameter like this:
rasa run actions --actions data
This tells Rasa to load the action files from your data directory, and it will recurse down and load any Python files it finds.
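To make the role of those __init__.py files concrete, here is a sketch of how a recursive loader could be written with the standard library. I'm assuming this is roughly the kind of thing the action server does; it is not copied from the Rasa SDK, and import_submodules is my own name for it.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import List


def import_submodules(package_name: str) -> List[ModuleType]:
    """Import a package and every module nested beneath it."""
    package = importlib.import_module(package_name)
    loaded = [package]

    # Only packages have __path__; a plain module has nothing to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return loaded

    # walk_packages only descends into directories that are packages,
    # which is why each ability folder needs its own __init__.py.
    for info in pkgutil.walk_packages(search_path, prefix=package_name + "."):
        loaded.append(importlib.import_module(info.name))
    return loaded
```

A directory without an __init__.py is skipped by this kind of walk, so any actions.py inside it would never be imported.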
As noted above, you'll also need to run
rasa train with the
--domain parameter like this:
rasa train --domain data
That will tell Rasa where your domain files are.
I think this is a pretty cool advancement in the ability (no pun intended) to organize our data, streamline our development process and allow a very interesting approach to developing different independent functions.
I'm not sure I like the Python being intermingled in the same directory as my
.yml files. It feels a little gross, but I suppose I could also create a
data/book-recommendation/actions directory to move out all the Python other than the
__init__.py file of course. Or maybe even go crazy with
OR even rename our
data directory and create something like this:
If you do something crazy like this, be sure to alter your --data and --actions parameters when firing up Rasa!
Those both feel a little over the top, but the point is the possibilities are endless and you have the ability to organize your files however makes sense to you.
I'll continue iterating on this approach. I'm interested in knowing what others are doing to organize their data. The single file system works for smaller / simpler bots, but anything with some robustness will quickly outgrow that model (pun intended).
Let me know what you think in the comments!