[Notes] 6 Lessons we learned when debugging a scaling problem on GitLab.com

https://about.gitlab.com/2019/08/27/tyranny-of-the-clock/

The first step is to find critical filters that dramatically reduce the area to troubleshoot, such as the type of log or which server is affected.
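As a rough illustration, narrowing the search can be as simple as discarding every log line that doesn't match the filters you already trust. The sketch below assumes logs have already been parsed into records with hypothetical `server` and `kind` fields; real log formats will differ, and the hostnames are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class LogEntry:
    server: str   # hypothetical field: host that emitted the line
    kind: str     # hypothetical field: log type, e.g. "sshd" or "haproxy"
    message: str


def narrow(
    entries: Iterable[LogEntry],
    server: Optional[str] = None,
    kind: Optional[str] = None,
) -> Iterator[LogEntry]:
    """Yield only the entries that match every filter that was supplied."""
    for entry in entries:
        if server is not None and entry.server != server:
            continue
        if kind is not None and entry.kind != kind:
            continue
        yield entry


if __name__ == "__main__":
    logs = [
        LogEntry("fe-01", "haproxy", "session queued"),
        LogEntry("fe-02", "haproxy", "session accepted"),
        LogEntry("fe-01", "sshd", "connection closed"),
    ]
    for e in narrow(logs, server="fe-01", kind="haproxy"):
        print(e.message)  # prints "session queued"
```

Each filter you add shrinks the haystack multiplicatively, which is why picking the right ones early pays off.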

Wireshark's statistics tools can be super helpful.
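Wireshark's I/O Graphs view, for example, plots packet counts over time, which makes bursts easy to spot. A minimal sketch of the same idea in plain Python, assuming you have already exported a list of connection-start timestamps (Unix epoch seconds) from a capture:

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def per_second_counts(timestamps: Iterable[float]) -> Dict[int, int]:
    """Bucket epoch timestamps into whole-second bins, like a 1 s I/O graph interval."""
    counts = Counter(math.floor(ts) for ts in timestamps)
    return dict(sorted(counts.items()))


def busiest_seconds(timestamps: Iterable[float], top: int = 3) -> List[Tuple[int, int]]:
    """Return the `top` busiest one-second bins as (second, count) pairs."""
    return Counter(math.floor(ts) for ts in timestamps).most_common(top)
```

A bin that towers over its neighbours is the cue to zoom in on that moment.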

If the usage pattern aligns with a timing cadence, think scheduled jobs.
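Jobs started by cron-style schedulers tend to fire at the top of a minute or hour, so their traffic piles up in the first few seconds after the boundary. A rough heuristic sketch; the 5-second window and 50% threshold are arbitrary assumptions, not values from the article:

```python
from typing import Iterable


def looks_scheduled(
    timestamps: Iterable[float], window_s: int = 5, threshold: float = 0.5
) -> bool:
    """Heuristic: True if at least `threshold` of events land in the first
    `window_s` seconds of a minute.

    Unix epoch seconds ignore leap seconds, so `int(ts) % 60` is the UTC
    second within the minute.
    """
    total = 0
    early = 0
    for ts in timestamps:
        total += 1
        if int(ts) % 60 < window_s:
            early += 1
    if total == 0:
        return False
    return early / total >= threshold
```

Evenly spread traffic would put only about `window_s / 60` (roughly 8% with these defaults) of events in that window, so a much higher share points at something clock-driven.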

If the incoming rate exceeds the limit (measured every millisecond), new connections are simply delayed. The TCP client (SSH in this case) just sees a delay before the TCP connection is established, which is delightfully graceful, in my opinion.
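To make the behaviour concrete, here is a toy simulation of a session rate limit that queues excess connections instead of refusing them. It is an assumption-laden sketch for intuition only, not any particular load balancer's implementation: it checks a trailing 1000 ms window once per simulated millisecond and serves waiting connections in arrival order.

```python
from collections import deque
from typing import Deque, List


def simulate_delayed_accepts(arrivals_ms: List[int], limit_per_sec: int) -> List[int]:
    """Return the accept time (ms) of each connection, in sorted arrival order.

    Once per simulated millisecond, waiting connections are accepted FIFO while
    fewer than `limit_per_sec` sessions were accepted in the trailing 1000 ms.
    Connections over the limit wait rather than being rejected.
    """
    if limit_per_sec <= 0:
        raise ValueError("limit_per_sec must be positive")
    arrivals = sorted(arrivals_ms)
    accepted: Deque[int] = deque()  # accept times still inside the window
    waiting: Deque[int] = deque()   # arrival times of queued connections
    results: List[int] = []
    i = 0
    t = arrivals[0] if arrivals else 0
    while len(results) < len(arrivals):
        # Nothing queued: jump straight to the next arrival.
        if not waiting and arrivals[i] > t:
            t = arrivals[i]
        # Queue every connection that has arrived by now.
        while i < len(arrivals) and arrivals[i] <= t:
            waiting.append(arrivals[i])
            i += 1
        # Forget accepts that have slid out of the 1000 ms window.
        while accepted and accepted[0] <= t - 1000:
            accepted.popleft()
        # Accept as many waiting connections as the limit allows this tick.
        while waiting and len(accepted) < limit_per_sec:
            waiting.popleft()
            accepted.append(t)
            results.append(t)
        t += 1
    return results


if __name__ == "__main__":
    arrivals = [0] * 5
    accepts = simulate_delayed_accepts(arrivals, limit_per_sec=2)
    print([a - b for a, b in zip(accepts, sorted(arrivals))])  # [0, 0, 1000, 1000, 2000]
```

No connection is ever refused; the burst is simply stretched out over time. The flip side is that a burst of expected traffic turns into added latency for the clients at the back of the queue.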

Why not keep the rate limit? Rate limiting is a defensive measure that protects against unexpected requests. In this case, those requests were expected and needed to be processed.

When you choose specific non-default settings, leave a comment or a link to documentation or issues explaining why; future people will thank you.

That applies to all things non-default, such as “magic numbers,” workarounds, tricks, possible valid values, etc.
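A small sketch of the habit in code. The setting name, the value 150, and the placeholder issue link are all hypothetical; the point is that the comment records why the value differs from the default and which values are valid, and a validator keeps that documented range enforced.

```python
from typing import Optional

# Default: no limit at all.
DEFAULT_SESSION_RATE_LIMIT: Optional[int] = None

# Non-default, hypothetical value: cap new sessions per second.
# WHY: <link to the issue or incident write-up that motivated this number>
# VALID VALUES: a positive integer, or None to disable the limit.
SESSION_RATE_LIMIT: Optional[int] = 150


def validate_rate_limit(value: Optional[int]) -> Optional[int]:
    """Reject values outside the documented range so comment and code stay in sync."""
    if value is None:
        return None
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"invalid session rate limit: {value!r}")
    return value


validate_rate_limit(SESSION_RATE_LIMIT)
```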

Weiran's Recycle Bin
