Another day, another ETL tool, this time Apache NiFi which is described as:
An easy to use, powerful, and reliable system to process and distribute data.
I’ve used SSIS and Kettle in the past, so I figured I’d be able to get this bad boy running easy enough – I mean – it’s ‘easy to use’ right? That’s not a descriptive sentence often used to describe SSIS or Kettle, so should be safe.
Unfortunately – as with politics, bare-faced lies are apparently acceptable in the software world as well. I’d probably replace ‘easy to use’ with ‘usable’ – as to the other keywords – powerful / reliable – I can’t confirm or deny the truth of these.
Obviously – my experience is very very small – so take it with a pinch of salt – what I will say is that the documentation is OK, but doesn’t really describe just what the ____ is going on.
Complaints over – let’s crack on.
This is using NiFi 1.10.0: there are no graph bundles for lower versions, and in this version they are not packaged by default.
I tried 4 ways to run NiFi. Initially, I went with running it on my local machine – which – yes – runs Windows 10. But I also have Java installed, so that saved me doing any extra leg work.
I downloaded the zip, unzipped it to a folder, and ran bin\run-nifi.bat – it started and showed me a lot of Java output – but that’s cool – I’m used to that. I connected to the URI (http://localhost:8080/nifi) and – good news! It’s there.
Stop it (CTRL+C), restart – and it never ever succeeds in starting again. For why? I have no idea. I extracted it into a fresh directory and ran it – no dice. I rebooted – no dice. Nothing I could do would sort it out – I had no instances of Java running, and netstat said the ports were not in use. Nada.
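If you want a second opinion to netstat, a few lines of Python can simply try to connect to the port. This is only a sketch and not something I ran at the time: the function name port_in_use is made up for illustration, and it checks whether anything accepts a TCP connection, which is not quite the same thing as netstat listing a listener.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use("127.0.0.1", 8080) before starting NiFi should come back False if the port really is free. It won’t tell you why NiFi refuses to start, but it does rule out one suspect.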
On second thoughts – and with much frustration from the local attempts – I decided that maybe, just maybe, running this away from my work machine would make more sense. I have a Server 2019 ISO lying around – so – why not.
Long story shorter – same deal – though without the initial success – OK – so the obvious link here is Windows, let’s go Ubuntu.
Install. Run. Fail.
There’s a pattern. I followed the instructions to the letter – but no joy. At this point there are 2 options, but I wanted to be able to use my laptop later – so had to put the hammer back and go with Docker.
Back on the main work machine O/S – I install docker – wait what? You didn’t have Docker installed already? Well – no – I’d used it a while ago – but found it messed up my machine way too much. But hey – after 3 fails, it’s time for a win – and I need a win.
Crack open PowerShell and run:
docker run --name nifi-run `
-p 8080:8080 `
-d `
apache/nifi:latest
Open up Chrome and go to the NiFi homepage (with the expectation of failure) – and… we have a winner!
Having been caught out by this before – I stop and start it again, and it’s still working! MAGIC.
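One thing worth knowing: the container can take a little while before the UI actually answers, so an early refresh in the browser may look like yet another failure. Below is a minimal sketch of a ‘wait until it responds’ check in Python, assuming the UI is at http://localhost:8080/nifi as above. The function name wait_for_http and the timeout values are my own picks, not anything NiFi provides.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 180.0, interval_s: float = 5.0) -> bool:
    """Poll url until it answers with any HTTP status, or give up after timeout_s."""
    # Ignore any system proxy settings, since the target is a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener.open(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx, so it is at least up.
            return True
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_http("http://localhost:8080/nifi") after docker run returns True once the page responds, or False if nothing answers within the timeout.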
Adding the Graph Bundles
Now we have NiFi actually running, we need to get the Graph Bundles in place, and firstly, we need to download those, so go get them from the releases page of GitHub.
You only need the first 3 for connecting to Neo4j, and why would you want to connect to another GraphDB eh? Oh yeah – Sadomasochism – forgot.
Stick these files somewhere simple to remember – in my case, I went with D:\Docker – as we’ll need to reference them when the Docker container is made.
Also – I’m adding an ‘Import’ volume to the Docker container – to allow me to pass data into NiFi – my initial intention was (and in many ways still is) to be able to read a CSV file from this folder – and insert that into Neo4j.
Create the Docker Container
Pretty much the same command as before, only this time I’m adding 4 -v parameters to the call. 3 of them put the .nar files (downloaded above) into the container; the last is the ‘Import’ folder.
docker run --name nifi `
-p 8080:8080 `
-v D:/Docker/nifi-graph-client-service-api-nar-1.10.0.nar:/opt/nifi/nifi-1.10.0/lib/nifi-graph-client-service-api-nar-1.10.0.nar `
-v D:/Docker/nifi-graph-nar-1.10.0.nar:/opt/nifi/nifi-1.10.0/lib/nifi-graph-nar-1.10.0.nar `
-v D:/Docker/nifi-neo4j-cypher-service-nar-1.10.0.nar:/opt/nifi/nifi-1.10.0/lib/nifi-neo4j-cypher-service-nar-1.10.0.nar `
-v D:/Docker/Import:/opt/nifi/nifi-1.10.0/data-in `
-d `
apache/nifi:latest
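Typing those long -v mappings by hand is an easy place to slip, so here is a rough Python sketch that assembles the same argument list from the file names. The helper build_docker_run is hypothetical (my own name), and the container paths are copied from the command above, so they assume the 1.10.0 layout.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Container-side NiFi directory, copied from the command above (assumes 1.10.0).
NIFI_HOME = PurePosixPath("/opt/nifi/nifi-1.10.0")


def build_docker_run(host_dir, nar_files, import_dir, name="nifi", image="apache/nifi:latest"):
    """Return the docker run command as a list of arguments."""
    host = PureWindowsPath(host_dir)
    args = ["docker", "run", "--name", name, "-p", "8080:8080"]
    for nar in nar_files:
        src = (host / nar).as_posix()   # e.g. D:/Docker/<nar>
        dst = NIFI_HOME / "lib" / nar   # e.g. /opt/nifi/nifi-1.10.0/lib/<nar>
        args += ["-v", f"{src}:{dst}"]
    # The last mount is the 'Import' folder, mapped to data-in.
    import_src = (host / import_dir).as_posix()
    import_dst = NIFI_HOME / "data-in"
    args += ["-v", f"{import_src}:{import_dst}"]
    args += ["-d", image]
    return args


NARS = [
    "nifi-graph-client-service-api-nar-1.10.0.nar",
    "nifi-graph-nar-1.10.0.nar",
    "nifi-neo4j-cypher-service-nar-1.10.0.nar",
]
print(" ".join(build_docker_run("D:/Docker", NARS, "Import")))
```

This only prints the command; passing the list to subprocess.run would execute it, assuming Docker is on the PATH.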
This seems like a good place to pause – we have NiFi running with the Graph bundles there, next time we’ll execute some queries against it.