Ahsan Nabi Dar

Load balancing Vector using HAProxy to collect logs and metrics for High availability in a centralized design

The Vector website describes three topologies for deploying Vector: distributed, centralized, and stream-based.
Distributed Vector

Centralized Vector

Stream based Vector

In the previous post I shared how to collect Prometheus metrics from distributed sources, using TLS to secure Prometheus remote write via Vector. It used a centralized approach in which metrics and logs from multiple hosts were shipped to a remote host over TLS, with Vector as both source and sink: Vector runs as an agent on each host and as an aggregator on the remote host. IMO this is probably the best option when you need to keep the budget under control and still want some durability and stability for the downstream systems.
Prometheus centralised stream.

This all sounds good until your aggregator becomes the bottleneck: overwhelmed by all the agents, it needs more and more resources and becomes prone to failure.

Vector has guides on architecting for high availability that do a great job of explaining how to design a system for scale.

What the docs don't cover is how to load balance a centralized setup that uses TLS for secure delivery. Most of the time you would use the load balancers offered by cloud providers, which makes most things simple but abstracts away the details of how the balancing works and at which layer.

A setup with Vector agents and a single, non-HA aggregator looks like this, with the aggregator taking all the load and slowly growing into the bottleneck over time:
non HA aggregator

An HA Vector aggregator setup is fronted by a load balancer that distributes the load equally among the aggregators:

HA aggregator

The connection between agents and aggregators uses TLS, so the traffic is encrypted. Usually SSL is terminated on the load balancer, but in this case the certificates live with the Vector agents and aggregators, so the concept to know about is SSL pass-through.

SSL pass-through is the process of passing SSL-encrypted traffic on to a backend server for decryption.
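
To make the idea concrete, here is a minimal sketch in Python of what a pass-through proxy does at the TCP level: it accepts a connection, opens another one to a backend, and copies bytes in both directions without ever loading a certificate. This is only an illustration, not how any real load balancer is implemented. The names `pipe`, `make_relay` and `main` are made up for this example, and the host and ports in `main` are placeholders.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, without inspecting them."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the peer went away; fall through and close our side
    finally:
        writer.close()


def make_relay(backend_host: str, backend_port: int):
    """Build a handler that relays each client connection to one backend."""

    async def handle(client_reader, client_writer):
        backend_reader, backend_writer = await asyncio.open_connection(
            backend_host, backend_port
        )
        # No ssl context anywhere: the TLS handshake runs end to end between
        # the agent and the aggregator, and the relay only moves opaque bytes.
        await asyncio.gather(
            pipe(client_reader, backend_writer),
            pipe(backend_reader, client_writer),
        )

    return handle


async def main() -> None:
    # Placeholder backend address and listen port (assumptions for this sketch).
    server = await asyncio.start_server(
        make_relay("vector_00", 19092), "0.0.0.0", 19093
    )
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` would start the relay. Because the relay never decrypts anything, the agent still validates the aggregator's certificate directly. The sketch leaves out health checks, timeouts and error handling for unreachable backends.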

HAProxy is a versatile load balancer that supports SSL pass-through for TCP traffic, which makes it possible to balance your Vector nodes and scale out your aggregators. To carry Vector-to-Vector and Prometheus remote write traffic through Vector, the HAProxy config works out as follows.

frontend vector-vector-in
    mode tcp
    bind *:19093
    option tcplog
    default_backend             kakashi-vector-vector

frontend vector-prometheus-in
    mode tcp
    bind *:19094
    option tcplog
    default_backend             kakashi-vector-prometheus
# round robin balancing between the various backends

backend kakashi-vector-vector
    mode tcp
    balance roundrobin
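    # Health checks only run on server lines that carry the "check" keyword;
    # without it, ssl-hello-chk has no effect.
    option ssl-hello-chk
    server kakashi_vector_00 vector_00:19092
    server kakashi_vector_01 vector_01:19092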

backend kakashi-vector-prometheus
    mode tcp
    balance roundrobin
    option ssl-hello-chk
    server kakashi_vector_00 vector_00:19090 
    server kakashi_vector_01 vector_01:19090


HAProxy receives the inbound traffic and forwards it round robin to your Vector aggregators.
HAProxy vector LB
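
As a rough illustration of what round robin means here, the sketch below hands out incoming connections to the two aggregators in turn. It is a simplified model rather than HAProxy's actual algorithm, which also takes server weights into account. The server names come from the config above; `assign_connections` and the agent names are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Server names taken from the backend section of the HAProxy config.
AGGREGATORS = ["kakashi_vector_00", "kakashi_vector_01"]


def assign_connections(connections, backends):
    """Pair each incoming connection with the next backend in rotation."""
    rotation = cycle(backends)
    return [(conn, next(rotation)) for conn in connections]


# Hypothetical agents, each opening one TCP connection to the load balancer.
agents = [f"agent-{i}" for i in range(5)]
assignments = assign_connections(agents, AGGREGATORS)
for agent, aggregator in assignments:
    print(f"{agent} -> {aggregator}")

# Count how many connections each aggregator ended up with.
print(Counter(aggregator for _, aggregator in assignments))
```

One thing to keep in mind is that in `mode tcp` the choice is made per connection, not per event, so an agent that holds a long-lived connection stays on the same aggregator until it reconnects.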

Each Vector aggregator then continues to work as a distributor, and as an agent on the node where it is deployed.

Vector aggregator

Before going to production, do refer to the Vector guides. Scaling out the load balancer itself so that it doesn't become the next bottleneck is a difficult problem, and that is where managed load balancers are a blessing. What matters more is understanding how to scale your distributed system up and out as it grows, and how to turn it into a high-availability design.
