NiFi is very good at reliably moving data at very high volumes and low latency, with a large number of mature integrations, in a way that allows for fine-grained tuning, and I've seen first hand that it is very scalable. Its internal architecture is very principled: <a href="https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html" rel="nofollow">https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.ht...</a><p>Out of the box it is incredibly powerful and easy to use; in particular its data provenance, monitoring, queueing, and back pressure capabilities are hard to match; a custom solution would take extensive dev work to even come close to those features.<p>It is not code, and that means it is resistant to code-based tooling. For years its critical weakness was migrating flows between environments; this was especially problematic in places with separate dev and ops teams and lots of process around prod changes, but it has been mostly resolved.<p>However, the GUI flow programming is insanely powerful and is ideal when you need to do rapid prototyping or quickly adapt existing pipelines; that same power and flexibility means you can shoot yourself in the foot. As others have said, this is not a tool for non-technical people; you need to understand systems, resource management, and the principles of scaling high-volume distributed workloads.<p>The flow-based visual approach makes it easier for someone coming along later to understand what is happening. I've seen a solution that required a dozen Redis containers, multiple programming languages, ZooKeeper, a custom GUI, and mediocre operational visibility get migrated to a simple NiFi flow of 10 connected squares in a row. The complexity of the custom solution, even though it was very stable and had nice code quality, meant it became legacy debt soon after it was deployed.
Now that same data flow is much easier to understand, and has great operational monitoring.<p>Some suggestions:
- limit NiFi's scope to data routing and movement, and avoid data transformations or ETL in the flow. This ensures you can scale to your network limits and aren't CPU/memory bound by transforming content.
- constrain the scope of each NiFi instance; don't deploy hundreds of flows onto a single cluster.
- you can do a lot with a single node; only go to a cluster for HA or when you know you need the scale.
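On the back pressure point: thresholds are set per connection in the GUI, and recent NiFi versions also let you set cluster-wide defaults for new connections in nifi.properties. As a rough sketch (property names from my memory of NiFi 1.10+, so verify against the Admin Guide for your version):

```properties
# Default back pressure thresholds applied to newly created connections;
# per-connection settings in the GUI override these.
nifi.queue.backpressure.count=10000
nifi.queue.backpressure.size=1 GB
```

When either threshold is hit, upstream processors stop being scheduled until the queue drains, which is what lets a flow degrade gracefully instead of falling over.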