The 4 steps to fix server overload issues
In an age when users expect immediate load times, server choice and management are critical factors in the success of a website. An overloaded server is therefore an insidious problem for many projects: it can compromise site uptime and, in the long run, hinder organic performance. With careful, strategic management, however, it is possible to minimize the risks and keep the site fast and available to users, contributing to the success of the online business. Regardless of the type of hosting, there are some frequently recurring situations that can be resolved upstream with a four-step process to identify and remove the bottlenecks that slow down the system, improve overall server performance, and avoid regressions.
What server overload means and what problems it generates
The beating heart of any website is the server on which it resides: this digital infrastructure not only hosts data, but also handles requests from users browsing the site, and its efficiency is directly proportional to the quality of the user experience. A fast and reliable server is essential to ensure that the site is always accessible and performs well.
It is easy to see, then, that when the server is not performing properly, trouble begins.
This is especially the case with server overload, which occurs when incoming requests exceed the server’s processing capacity. This can lead to a number of problems, including significant slowdowns, longer page load times, and, in the worst case, the dreaded 5xx server errors, such as 503 Service Unavailable. This situation can frustrate users and damage the site’s reputation, as well as negatively affect search engine rankings.
Server overload can be caused by multiple factors. A sudden spike in traffic, such as that generated by a successful marketing campaign or an unexpected event that draws attention to the site, can strain server resources. Other causes include cyber attacks such as DDoS (Distributed Denial of Service), which flood the server with requests in order to make it inaccessible; poor configuration or inadequate maintenance, which can also contribute to overload problems; or the use of poorly optimized web applications that consume more resources than necessary.
How to manage server issues
Managing an overloaded server requires a proactive approach and well-planned strategies. A first step is constant monitoring of server performance, which can help identify problems before they turn into crises. Implementing scaling solutions, both horizontal and vertical, can then provide the flexibility to handle traffic variations. It is also critical to have a robust security infrastructure to protect against external attacks. Finally, optimizing web applications to reduce server load and distributing traffic through the use of content delivery networks (CDNs) are practices that can significantly improve server resilience and performance.
The four steps to avoid server overload
Guiding us through these operations is a post published on web.dev by Katie Hempenius, a software engineer at Google, which lays out the four steps into which the work is divided:
- Evaluate: determine the bottleneck that is impacting the server.
- Stabilize: implement quick fixes to mitigate the impact.
- Improve: increase and optimize server capacity.
- Monitor: use automated tools to help prevent future problems.
The evaluation step: analyzing the problem
In engineering (and, by extension, in computer science) a bottleneck occurs when a single component heavily constrains the performance or capacity of a system.
What are the bottlenecks of a server
When traffic overloads a server, the CPU, network, memory, or disk I/O can become bottlenecks; identifying which one is the bottleneck lets you focus your efforts on the interventions that will mitigate the damage and resolve it. For most sites, CPU and network are the most relevant bottlenecks during a traffic spike, so we focus mainly on them.
- CPU: CPU usage that stays consistently above 80% should be investigated and corrected. Server performance often degrades once CPU usage reaches 80-90%, and the degradation becomes even more pronounced as it approaches 100%.
The CPU used to serve a single request is negligible, but at the scale seen during traffic spikes it can sometimes overwhelm a server. Offloading services to other infrastructure, reducing expensive operations, and limiting the number of requests can all reduce CPU usage.
- Network: during periods of heavy traffic, the network throughput required to serve users’ requests can exceed capacity. Some sites, depending on the hosting provider, may also exceed limits on cumulative data transfer. Removing this bottleneck requires reducing the size and amount of data transferred to and from the server.
- Memory: when a system does not have enough memory, data must be spilled to disk for storage. Disk access is much slower than memory access, and this can slow down an entire application. If memory is completely exhausted, it can cause Out of Memory (OOM) errors. Adjusting memory allocation, fixing memory leaks, and upgrading memory can remove this bottleneck.
- Disk I/O: the speed at which data can be read from or written to disk is limited by the disk itself. If disk I/O is the bottleneck, caching more data in memory can mitigate the problem (at the expense of increased memory usage); if that does not work, you may need to upgrade the disks.
How to detect bottlenecks
Running the Linux top command on the affected server is a good starting point for bottleneck analysis; where available, it can be supplemented with historical data from the hosting provider or with other monitoring tools.
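Alongside top, a small script can take periodic snapshots of the same indicators. The sketch below is a minimal example, assuming the third-party psutil package is installed (any monitoring agent works just as well); it prints CPU, memory, disk I/O, and network counters, which helps pin down which resource is saturating.

```python
# Minimal bottleneck snapshot, assuming the third-party "psutil" package
# (pip install psutil). It reports the indicators discussed above:
# CPU, memory, disk I/O and network throughput.
import time
import psutil

def snapshot(interval: float = 5.0) -> None:
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()
    cpu = psutil.cpu_percent(interval=interval)  # % CPU over the sampling window
    mem = psutil.virtual_memory()                # RAM usage
    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()

    read_mb = (disk_after.read_bytes - disk_before.read_bytes) / 1e6
    write_mb = (disk_after.write_bytes - disk_before.write_bytes) / 1e6
    sent_mb = (net_after.bytes_sent - net_before.bytes_sent) / 1e6
    recv_mb = (net_after.bytes_recv - net_before.bytes_recv) / 1e6

    print(f"CPU: {cpu:.0f}% (investigate if consistently above 80%)")
    print(f"Memory: {mem.percent:.0f}% used of {mem.total / 1e9:.1f} GB")
    print(f"Disk I/O: {read_mb:.1f} MB read, {write_mb:.1f} MB written in {interval:.0f}s")
    print(f"Network: {sent_mb:.1f} MB sent, {recv_mb:.1f} MB received in {interval:.0f}s")

if __name__ == "__main__":
    while True:
        snapshot()
        time.sleep(1)
```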
The stabilization step
An overloaded server can quickly lead to cascading failures in other parts of the system, so it is important to stabilize the server before attempting to make more significant changes.
Rate limiting protects the infrastructure by capping the number of incoming requests, an intervention that becomes increasingly important as server performance degrades: as response times lengthen, users tend to refresh the page aggressively, further increasing the load on the server.
Rejecting a request is relatively inexpensive, but the best way to protect the server is to apply rate limiting upstream, for example through a load balancer, a reverse proxy, or a CDN.
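To make the idea concrete, here is a minimal fixed-window rate limiter sketch in Python; the per-client limit of 100 requests per minute is an arbitrary assumption, and in production this logic normally lives upstream rather than in the application itself.

```python
# Minimal fixed-window rate limiter sketch. The 100-requests-per-minute limit is an
# arbitrary example; in production this usually lives in a load balancer, reverse
# proxy or CDN rather than in application code.
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100

# Counter per (client, time window). Old windows should be pruned periodically
# in a long-running process to keep memory bounded.
_counters: dict[tuple[str, int], int] = defaultdict(int)

def allow_request(client_ip: str) -> bool:
    """Return True if the request may proceed, False if it should be rejected (HTTP 429)."""
    window = int(time.time() // WINDOW_SECONDS)
    key = (client_ip, window)
    _counters[key] += 1
    return _counters[key] <= MAX_REQUESTS_PER_WINDOW

# Usage inside a request handler:
# if not allow_request(request.remote_addr):
#     return "Too Many Requests", 429
```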
HTTP Caching
According to Hempenius, you should look for ways to cache content more aggressively: if a resource can be served from an HTTP cache (whether the browser cache or a CDN), it does not need to be requested from the origin server, which reduces the server load.
HTTP headers such as Cache-Control, Expires, and ETag indicate how a resource should be stored by an HTTP cache: reviewing and correcting these headers will improve caching.
Service workers can also be used for caching, but they use a separate cache and are a complement to, rather than a replacement for, proper HTTP caching; for this reason, when a server is overloaded, efforts should focus on optimizing HTTP caching.
How to diagnose and solve problems
To address this, run Google Lighthouse and focus on the Uses inefficient cache policy on static assets audit to see the list of resources with a short or medium time to live (TTL), considering whether to increase the TTL of each one. As a rough guide, the Googler explains that:
- Static resources must be cached with a long TTL (1 year).
- Dynamic resources must be cached with a short TTL (3 hours).
The correction can be implemented by setting the max-age directive in the Cache-Control header to the appropriate number of seconds, which is just one of many directives and headers that influence the caching behavior of the application.
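As a concrete sketch of that correction, assuming a Python application built with Flask (the same Cache-Control headers can be set in any web server or framework, and the routes below are placeholders), the snippet applies the two TTLs suggested above.

```python
# Sketch of the max-age correction, assuming a Flask application; the routes and
# payload are placeholders. Static, versioned assets get a long TTL, dynamic
# responses a short one.
from flask import Flask, jsonify, send_from_directory

app = Flask(__name__)

ONE_YEAR = 60 * 60 * 24 * 365   # long TTL for static, versioned resources
THREE_HOURS = 60 * 60 * 3       # short TTL for dynamic resources

@app.route("/assets/<path:filename>")
def asset(filename):
    response = send_from_directory("assets", filename)
    response.headers["Cache-Control"] = f"public, max-age={ONE_YEAR}, immutable"
    return response

@app.route("/api/products")
def products():
    response = jsonify({"products": []})  # placeholder payload
    response.headers["Cache-Control"] = f"public, max-age={THREE_HOURS}"
    return response

if __name__ == "__main__":
    app.run()
```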
The Graceful Degradation strategy
Graceful degradation is a strategy based on temporarily reducing functionality to shed excess load from a system. The concept can be applied in many different ways: for example, serving a static text page instead of a full application, disabling search or returning fewer search results, or disabling expensive or non-essential features. The important thing is to focus on features that can be removed easily and safely, with minimal impact on the business.
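A minimal sketch of what such a switch might look like in application code, again assuming a Flask app; the DEGRADED_MODE flag, the template names, and the choice of search as the feature to disable are all hypothetical examples.

```python
# Graceful degradation sketch: a flag (here a hypothetical DEGRADED_MODE environment
# variable, set by operations when the server is struggling) switches off an
# expensive, non-essential feature and serves a lighter page instead.
import os
from flask import Flask, render_template

app = Flask(__name__)

def degraded() -> bool:
    return os.environ.get("DEGRADED_MODE") == "1"

@app.route("/search")
def search():
    if degraded():
        # Search is expensive: temporarily disable it with a friendly message.
        return render_template("search_disabled.html"), 503
    return render_template("search_results.html", results=run_search())

def run_search() -> list:
    # Placeholder for the expensive search backend call.
    return []
```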
The improvement step
There are many suggestions for improving and optimizing server capacity; in particular, Katie Hempenius identifies at least five areas to focus on.
1. Use a content delivery network (CDN)
Serving static resources can be offloaded from the server to a content delivery network (CDN), thus reducing the load. A CDN’s main function is to deliver content to users quickly through an extensive network of servers located close to them, but most CDNs also offer additional performance features such as compression, load balancing, and media optimization.
2. Scale compute resources
The decision to scale compute resources should be made with care: although it is often necessary, doing so prematurely can generate “unnecessary architectural complexity and financial costs”.
A high Time To First Byte (TTFB) may indicate that a server is approaching its maximum capacity, and a monitoring tool allows you to assess CPU usage more accurately: if the current or expected level exceeds 80%, it is advisable to add servers.
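A quick way to spot-check TTFB from the outside is sketched below, using only the Python standard library; the URL is a placeholder, and the measurement includes connection setup, which is what a real user experiences.

```python
# Rough TTFB spot check using only the standard library: the time measured runs from
# opening the connection to receiving the status line and headers of the response.
# The URL is a placeholder.
import time
from urllib.parse import urlparse
from http.client import HTTPSConnection

def time_to_first_byte(url: str) -> float:
    parsed = urlparse(url)
    conn = HTTPSConnection(parsed.netloc, timeout=10)
    start = time.perf_counter()
    conn.request("GET", parsed.path or "/")
    conn.getresponse()          # returns once status line and headers have arrived
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

if __name__ == "__main__":
    print(f"TTFB: {time_to_first_byte('https://www.example.com/') * 1000:.0f} ms")
```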
Adding a load balancer lets you distribute traffic across multiple servers, routing each request to the most appropriate one; cloud providers offer their own load balancing systems, or you can configure your own using HAProxy or NGINX and then add the additional servers.
Most cloud providers also offer autoscaling, which works alongside load balancing, automatically scaling computing resources up and down based on demand at any given time.
However, Hempenius points out that it is not a magic tool: it takes time for new instances to come online, and it requires significant configuration. Given this additional complexity, the simpler load-balancer-based setup should be considered first.
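To make the round-robin idea at the heart of load balancing concrete, here is a trivial sketch in Python; real deployments rely on NGINX, HAProxy, or a cloud load balancer, and the backend addresses below are placeholders.

```python
# Round-robin selection sketch: the core idea behind a load balancer is simply to
# spread incoming requests across a pool of backend servers. The addresses are
# placeholders; production setups use NGINX, HAProxy or a cloud load balancer.
import itertools

BACKENDS = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]
_rotation = itertools.cycle(BACKENDS)

def pick_backend() -> str:
    """Return the next backend in round-robin order."""
    return next(_rotation)

if __name__ == "__main__":
    for _ in range(6):
        print("routing request to", pick_backend())
```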
3. Enable compression
Text-based resources must be compressed using gzip or brotli, which can reduce the transfer size by about 70%.
Compression can be enabled by updating the server configuration.
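Compression is normally switched on in the web server or CDN configuration, but the effect is easy to verify; the sketch below, using only the Python standard library, measures how much gzip shrinks a text-based asset (file names are placeholders).

```python
# Quick check of how much gzip shrinks a text-based resource, using only the
# standard library. Real deployments enable compression in the server configuration
# (e.g. in NGINX or Apache) or at the CDN, rather than in application code.
import gzip
from pathlib import Path

def compression_ratio(path: str) -> None:
    raw = Path(path).read_bytes()
    compressed = gzip.compress(raw, compresslevel=6)
    saved = 100 * (1 - len(compressed) / len(raw))
    print(f"{path}: {len(raw)} -> {len(compressed)} bytes ({saved:.0f}% smaller)")

if __name__ == "__main__":
    # Placeholder file names: point these at your own JS/CSS/HTML assets.
    for asset in ["static/app.js", "static/styles.css", "index.html"]:
        compression_ratio(asset)
```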
4. Optimize images and multimedia content
For many sites, images account for the largest share of file size, and image optimization can quickly and significantly reduce a site’s weight, as we discussed in a previous article.
Lighthouse has a variety of audits that report potential optimizations for these resources; alternatively, you can use DevTools to identify the largest files, such as hero images (which probably need to be resized).
As a rule of thumb, Hempenius suggests a quick checklist (illustrated by the sketch after the list):
- Size: images must not be larger than necessary.
- Compression: in general, a quality level of 80-85 will have a minimal effect on image quality while reducing the file size by 30-40%.
- Format: use JPEG for photos instead of PNG; use MP4 for animated content instead of GIF.
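A minimal sketch of those three points, assuming the third-party Pillow library (pip install Pillow); the file names and the 1200-pixel target width are placeholders.

```python
# Image optimization sketch covering the checklist above: resize to the displayed
# width and re-encode as JPEG at quality 85. Assumes the third-party Pillow library
# (pip install Pillow); file names and the 1200px target width are placeholders.
from PIL import Image

def optimize(src: str, dst: str, max_width: int = 1200, quality: int = 85) -> None:
    with Image.open(src) as img:
        if img.width > max_width:
            new_height = round(img.height * max_width / img.width)
            img = img.resize((max_width, new_height), Image.LANCZOS)
        img.convert("RGB").save(dst, format="JPEG", quality=quality, optimize=True)

if __name__ == "__main__":
    optimize("hero-original.png", "hero.jpg")
```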
More generally, you might consider using an image CDN, designed to serve and optimize images and offload this work from the origin server. Setting up an image CDN is simple, but it requires updating existing image URLs to point to the new address.
5. Minify JS and CSS
Minification removes unnecessary characters from JavaScript and CSS. A quick intervention is to minify only the JavaScript (usually a larger share of a site’s weight than CSS) for the greatest immediate impact.
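A quick sketch of how this can be scripted, assuming the third-party jsmin package (real projects more often rely on a bundler’s minifier such as Terser or esbuild); the file names are placeholders.

```python
# Minification sketch, assuming the third-party "jsmin" package (pip install jsmin).
# In practice this is usually handled by the build pipeline (e.g. Terser or esbuild);
# the file names below are placeholders.
from pathlib import Path
from jsmin import jsmin

def minify_js(src: str, dst: str) -> None:
    source = Path(src).read_text(encoding="utf-8")
    minified = jsmin(source)
    Path(dst).write_text(minified, encoding="utf-8")
    saved = 100 * (1 - len(minified) / len(source))
    print(f"{src}: {len(source)} -> {len(minified)} characters ({saved:.0f}% smaller)")

if __name__ == "__main__":
    minify_js("static/app.js", "static/app.min.js")
```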
The monitoring step
Server monitoring tools provide data collection, dashboards, and server performance alerts, and their use can help prevent and mitigate future performance issues.
Some metrics help detect problems systematically and accurately; server response time (latency) works especially well for this, as it catches a wide variety of problems and correlates directly with the user experience. Alerts based on lower-level metrics such as CPU utilization can be a useful supplement, but they identify a smaller subset of problems; furthermore, alerts should be based on performance observed at the tail (i.e., the 95th or 99th percentile) rather than on averages, because averages can easily obscure problems that do not affect all users.
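A small sketch of tail-latency alerting over a batch of response-time samples, using only the Python standard library; the 800 ms threshold is an arbitrary example.

```python
# Tail-latency alert sketch: compute the 95th and 99th percentile of response times
# and alert on those instead of the average, which can hide problems that only hit
# a fraction of users. The 800 ms threshold is an arbitrary example.
import statistics

P95_THRESHOLD_MS = 800

def check_latency(samples_ms: list[float]) -> None:
    cuts = statistics.quantiles(samples_ms, n=100)   # 99 cut points
    p95, p99 = cuts[94], cuts[98]
    mean = statistics.fmean(samples_ms)
    print(f"mean={mean:.0f} ms  p95={p95:.0f} ms  p99={p99:.0f} ms")
    if p95 > P95_THRESHOLD_MS:
        print("ALERT: 95th percentile latency above threshold")  # hook paging in here

if __name__ == "__main__":
    # 95 fast requests and 5 slow ones: the average looks fine, the tail does not.
    check_latency([120.0] * 95 + [2500.0] * 5)
```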
The tool setup phase
Finally, Hempenius explains, all the major cloud providers offer their own monitoring tools (notably GCP, AWS, and Azure); in addition, she adds, Netdata is an excellent free and open-source alternative.
Regardless of the tool you choose, you need to install its monitoring agent on each server you intend to monitor and, once that is done, make sure to configure alerts.