TL;DR: How to set up Apache NiFi on Kubernetes in 1 minute
Motivation
Apache NiFi is one of the most widely used data processing tools. With its great web-based GUI you can design and deploy complex data workflows easily.
The core package includes a lot of operators (connectors). For example, you can get tweets for a specific hashtag, load files from S3, call an HTTP REST API, or send an email.
It’s a Java application with a web UI. It’s very simple to launch, and you can start building and deploying pipelines right away.
One of its main advantages is the queues between processing nodes. If a processing node is stopped or busy with earlier data, incoming data is queued until the node can handle it.
On the other hand, its scalability is very limited.
Basically, you can’t run processing units on different compute nodes.
The new kid on the block, Apache Airflow, has a lot of operators too, and it can scale out easily. It’s powerful and is gaining ground on Apache NiFi. We’ll explore it some other time.
What we need
We need a Kubernetes cluster with an ingress service. You can follow this post to create one.