Comms Network Writeup:
When we implemented MQ, we ran into a problem: the traffic needed to be secured. This was easy for the API with SSL certificates, and still fairly easy for gRPC, which could also use certificates and sit behind a proxy. However, it's much harder to run MQ behind a proxy, and we didn't like the idea of generating and distributing self-signed certificates. We came up with the idea of running the traffic over a WireGuard tunnel, specifically the server's WireGuard tunnel in the network. Since every network now has a mandatory server "node" (netmaker-1), we could use that tunnel for MQ traffic and keep it secure, hooray!
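For illustration, here is a minimal sketch (in Go, using the paho MQTT client) of what "MQ over the tunnel" means in practice: the node connects to the broker at the server's WireGuard address rather than at a public endpoint, so the traffic never leaves the encrypted tunnel. The address and client ID are hypothetical placeholders, not Netmaker's actual values.

```go
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// Hypothetical address: the server node's WireGuard IP inside the network.
	// Because the broker is only reachable over the wg interface, MQ traffic
	// never leaves the encrypted tunnel.
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://10.101.0.1:1883").
		SetClientID("example-node").
		SetConnectTimeout(10 * time.Second)

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		fmt.Println("mq connect over the tunnel failed:", token.Error())
		return
	}
	defer client.Disconnect(250)
	fmt.Println("connected to the broker over the WireGuard tunnel")
}
```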
What we discovered over the past few weeks is that this is fundamentally problematic when it comes to network updates. Whenever a node changes its interface, it must break existing connections, and the other side of every WireGuard tunnel must re-configure its peer entry for that node. What this meant was that we were broadcasting changes to the network over a tunnel which was itself being actively changed.
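To make that concrete, here is a hedged sketch of the kind of peer update every other node has to perform when one node's interface changes, written against the wgctrl library. The interface name, key, endpoint, and allowed IP are placeholder values, not the actual Netmaker code.

```go
package main

import (
	"log"
	"net"

	"golang.zx2c4.com/wireguard/wgctrl"
	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
)

// reconfigurePeer shows what every other node must do when one node's
// interface changes: update that peer's entry on its own WireGuard device.
func reconfigurePeer(device string, peerKey wgtypes.Key, endpoint *net.UDPAddr, allowed []net.IPNet) error {
	client, err := wgctrl.New()
	if err != nil {
		return err
	}
	defer client.Close()

	return client.ConfigureDevice(device, wgtypes.Config{
		Peers: []wgtypes.PeerConfig{{
			PublicKey:         peerKey,
			UpdateOnly:        true,     // only modify the existing peer
			Endpoint:          endpoint, // the node's new endpoint after its change
			ReplaceAllowedIPs: true,
			AllowedIPs:        allowed,
		}},
	})
}

func main() {
	// Placeholder values: interface name, public key, endpoint, and allowed IP.
	peerKey, err := wgtypes.ParseKey("xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg=")
	if err != nil {
		log.Fatal(err)
	}
	endpoint := &net.UDPAddr{IP: net.ParseIP("203.0.113.10"), Port: 51821}
	_, allowed, _ := net.ParseCIDR("10.101.0.5/32")
	if err := reconfigurePeer("nm-netmaker", peerKey, endpoint, []net.IPNet{*allowed}); err != nil {
		log.Fatal(err)
	}
}
```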
We had a few workarounds for this, most notably: when a node's MQ connection breaks over WireGuard, it re-pulls its configs using gRPC. It was doing this a lot more than we liked. We mitigated as much as we could, but ultimately the best we could do was give the nodes a very good "guess" at what their config needed to be. If something went wrong, a node could lose its ability to communicate with the server. Particularly difficult were network-wide updates. If you changed the network range, switched off UDP hole punching, or published a "key update", these changes would tend to fail. Think about it: we're telling every node in the network, including the server, to modify its WireGuard interface, change all of its peers, and re-establish connections, and this message is sent over the same WireGuard interface that is about to be modified! It led to a lot of chicken-and-egg problems.
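Below is a rough sketch of that fallback, assuming a paho-style connection-lost handler. `pullConfigOverGRPC` and `applyConfig` are hypothetical stand-ins for the real client calls; the point is only the flow: when the MQ connection over the tunnel drops, the node falls back to gRPC to fetch a fresh config and re-applies it.

```go
package main

import (
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// nodeConfig, pullConfigOverGRPC, and applyConfig are hypothetical stand-ins
// for the real Netmaker client logic; they only illustrate the fallback flow.
type nodeConfig struct{ Peers []string }

func pullConfigOverGRPC() (nodeConfig, error) {
	// In the real client this would be a gRPC call back to the server.
	return nodeConfig{}, nil
}

func applyConfig(cfg nodeConfig) {
	// Re-apply the WireGuard interface and peers from the pulled config.
}

func main() {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://10.101.0.1:1883").
		SetAutoReconnect(true).
		SetConnectionLostHandler(func(c mqtt.Client, err error) {
			log.Println("mq connection over the tunnel dropped:", err)
			// The workaround: when the broker becomes unreachable (usually
			// because the tunnel changed), fall back to gRPC for a fresh config.
			if cfg, pullErr := pullConfigOverGRPC(); pullErr == nil {
				applyConfig(cfg)
			}
		})

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}
	time.Sleep(time.Hour) // keep the client running for the example
}
```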