A comprehensive NGINX troubleshooting guide

NGINX is a powerful, open-source HTTP software that can be used as a web server, load balancer, reverse proxy, mail proxy, and HTTP cache. Its versatility renders it indispensable in several high-performance, distributed IT infrastructures.

While NGINX is known for its stability and fault-tolerance, it can occasionally encounter issues or performance bottlenecks. In the dynamic world of web services, where downtime or performance blips can immediately impact user experience, prompt troubleshooting of these issues is critical.

This comprehensive guide will equip you with the knowledge and steps to swiftly restore desired NGINX functionality and maintain the integrity of the overall system. We will cover NGINX issues related to startup, connection, configuration, request processing, load balancing, and more.

What is NGINX?

Originally conceived as a fast and lightweight web server, NGINX has evolved to cater to a myriad of additional web use cases. Whether you want to implement SSL/TLS termination between your server and users, efficiently distribute traffic among backend servers, forward requests to different servers based on specific rules or cache frequently accessed resources, NGINX delivers.

Due to its event-driven architecture, NGINX excels at handling large volumes of concurrent connections with minimal resource consumption. This is part of the reason why it’s being used to power some of the busiest sites on the internet, including Adobe, WordPress, ahrefs, and Cloudflare.

NGINX offers granular control over its behavior through a user-friendly configuration syntax. This makes it easy for administrators to adapt NGINX to suit their specific architectural needs quickly. Moreover, NGINX’s modular architecture allows for extensive customization through external modules.

Troubleshooting NGINX for health and performance

In the next few sections, we will provide a troubleshooting framework to assist NGINX users in identifying and resolving common issues that may arise during configuration, deployment, and operation.

Startup and connection problems

Let’s start by dissecting problems you may be facing while getting an NGINX instance to start and connect with other entities in your architecture.

Issue # 1 – NGINX fails to start

Problem: You are unable to start the NGINX service.

Detection: You see errors and exceptions on the console after issuing the NGINX startup command.

Troubleshooting:

  • Start by looking at the NGINX log files (default location: /var/log/nginx/*) for additional context about the issue.
  • Use the sudo nginx -t command to test your configuration file for syntax errors. Fix any reported errors and restart NGINX.
  • Verify that the NGINX ports (defaults: 80, 443) that you have specified in the configuration files are not occupied by other services. For example, you can use this command to check whether port 80 is already occupied:
    sudo netstat -tuln | grep ':80 '
  • Confirm that NGINX has the required permissions to access configuration and log files. Incorrect file permissions can impede NGINX startup. Use commands like chmod or chown to modify permissions if needed.
  • If NGINX is using third-party modules, ensure that they are compatible with the NGINX version and configuration. If there are incompatible modules, you will need to disable or update them to start NGINX.
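  • If the problem persists, enable debug logging and then restart NGINX for more clues. Debug output is only written when the error_log directive is set to the debug level and the binary was built with debug support. If you are an NGINX Plus user, you can switch to the bundled debug binary with this command: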
    service nginx stop && service nginx-debug start

    If you are using the open-source version and the output of nginx -V does not include --with-debug, you will have to recompile NGINX using the --with-debug option to enable debugging. Follow these steps:

    • Navigate to the path where you are storing the NGINX source code.
    • Run this command to execute the configure script:
      ./configure --with-debug [add other relevant arguments to the command]
    • Run these commands to compile and install:
      make
      sudo make install
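
The first few checks in this list can be scripted. What follows is a minimal shell sketch rather than a definitive procedure: the helper names port_in_use and has_debug_support are made up for illustration, and both read command output from standard input so they can be tried without touching a live server.

```shell
#!/bin/sh
# Hypothetical helpers for the startup checks described above.

# port_in_use PORT
# Reads `netstat -tuln` (or `ss -tuln`) output on stdin and succeeds
# if some socket is bound to PORT.
port_in_use() {
  grep -q ":$1 "
}

# has_debug_support
# Reads `nginx -V` output on stdin and succeeds if the binary was
# configured with --with-debug.
has_debug_support() {
  grep -q -e '--with-debug'
}

# Typical usage on the NGINX host (commented out so that sourcing this
# file has no side effects; nginx -V prints to stderr, hence 2>&1):
#   sudo nginx -t
#   sudo netstat -tuln | port_in_use 80 && echo "port 80 is already taken"
#   nginx -V 2>&1 | has_debug_support || echo "rebuild with --with-debug"
```

Keeping the helpers as stdin filters means the same checks work on saved output from another machine.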

Issue # 2 – NGINX starts but website doesn’t load

Problem: NGINX is up and running, but you are unable to visit your site on the browser.

Detection: NGINX seems to be running, as reported by the systemctl status nginx (or similar) command, but when you visit your website, you get a blank page or an error message.

Troubleshooting:

  • Follow NGINX logs on a terminal window (using tail -f, for example) and retry opening your website. If you see specific errors, try to resolve them and restart NGINX. If you don’t see new logs for your request, try the next steps.
  • Run sudo nginx -t to rule out any misconfigurations or syntax errors in the configuration file.
  • Ensure that the firewall allows incoming connections on port 80 (or the custom port you're using) for HTTP traffic.
  • If you are accessing the site through a domain name, verify that the name resolves to the correct IP address and that there are no DNS-related issues preventing access.
  • Confirm that the server block configuration in your NGINX configuration file is correct. Verify the root, listen, and server_name parameters. For example, an incorrect value for the root parameter can lead to 404 Not Found errors.
  • Use the ps aux | grep nginx command to see if NGINX worker processes are running. If they are not, there may be a misconfiguration or a resource problem preventing them from spawning. Revisit your configuration file and check resource usage (CPU, memory) to ensure that NGINX has adequate resources to operate.
  • Use browser developer tools or network analysis tools (e.g., Wireshark) to inspect network requests and responses for any anomalies or errors.
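
To review the root, listen, and server_name values in one place, a small filter over the full configuration dump can help. This is only a sketch; the function name server_directives is hypothetical, and it assumes one directive per line, which is the usual layout.

```shell
#!/bin/sh
# server_directives
# Reads an NGINX configuration on stdin (for example the output of
# `nginx -T`) and prints every listen, server_name, and root line
# prefixed with its line number.
server_directives() {
  grep -nE '^[[:space:]]*(listen|server_name|root)[[:space:]]'
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   sudo nginx -T 2>/dev/null | server_directives
#   curl -I http://localhost/            # does NGINX answer locally at all?
#   tail -f /var/log/nginx/error.log     # watch for errors while retrying
```

If curl succeeds on the server itself but the browser still fails, the firewall and DNS checks above are the likelier culprits.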

Issue # 3 – NGINX isn’t connected to backend servers

Problem: You are unable to establish a connection between NGINX and your backend servers.

Detection:

  • NGINX logs are showing errors related to upstream connections.
  • Your requests are reaching NGINX but not getting fulfilled.

Troubleshooting:

  • Review the upstream configuration block in NGINX (usually located in the nginx.conf file). Ensure that the addresses and ports of the backend servers are correctly specified.
  • Confirm that backend servers are reachable and operational. You can test connectivity to backend servers from the NGINX server using tools like ping, telnet, or curl.
  • Ensure that firewall rules allow traffic from NGINX to backend servers on the specified ports. If needed, adjust firewall settings to allow the communication.
  • If NGINX is configured to run as a load balancer, review your load balancing configurations, especially the load balancing algorithms and health checks.
  • If the problem persists, investigate the health of the backend servers. Check for errors in the backend server logs and ensure that they are functioning as expected.
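
The connectivity check can be run against every configured backend in one pass. The sketch below assumes upstream blocks are written one server per line with host:port addresses and that the backends speak plain HTTP; the function name upstream_servers is hypothetical.

```shell
#!/bin/sh
# upstream_servers
# Reads an NGINX configuration on stdin and prints the address of each
# `server` entry found inside an `upstream` block, one per line.
upstream_servers() {
  awk '
    $1 == "upstream"        { in_up = 1; next }   # entering an upstream block
    in_up && index($0, "}") { in_up = 0; next }   # leaving it
    in_up && $1 == "server" { sub(/;$/, "", $2); print $2 }
  '
}

# Typical usage from the NGINX host (commented out so that sourcing
# this file has no side effects):
#   sudo nginx -T 2>/dev/null | upstream_servers | while read -r addr; do
#     if curl -s -o /dev/null -m 3 "http://$addr/"; then
#       echo "$addr reachable"
#     else
#       echo "$addr FAILED"
#     fi
#   done
```

Entries that use UNIX sockets or TLS would need a different probe than the plain curl call shown here.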

NGINX configuration issues

Like any other highly configurable system, NGINX is prone to misconfigurations. In the upcoming sections, we will share tips to detect and resolve a few common NGINX misconfigurations.

Misconfiguration # 1 – Suboptimal number of worker processes

Problem: Setting the worker_processes parameter too low can lead to connection timeouts while setting it extremely high might exhaust system resources.

Detection:

  • Use the ps aux | grep nginx command to see the number of running worker processes.
  • Monitor system resource utilization (CPU, memory) to identify any signs of overloading or underutilization.

Troubleshooting:

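  • Adjust the number of worker processes (the worker_processes directive) in the NGINX configuration file based on server resources and anticipated traffic. Setting it to auto, which starts one worker per available CPU core, is a common starting point.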
  • Use tools like ps or top to monitor NGINX worker processes in real-time and assess their resource utilization.
  • Realize that this may not be a one-time tweak. Perform regular load testing to determine the optimal number of worker processes for your specific workload.
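
A quick way to compare the configured value with what is actually running is sketched below. Both function names are hypothetical, and the nproc command in the usage notes assumes a Linux host with GNU coreutils.

```shell
#!/bin/sh
# configured_workers
# Reads an NGINX configuration on stdin and prints the value of the
# worker_processes directive (a number or "auto").
configured_workers() {
  awk '$1 == "worker_processes" { sub(/;$/, "", $2); print $2 }'
}

# running_workers
# Reads `ps aux` output on stdin and prints how many NGINX worker
# processes it lists. The [n] keeps the grep process itself from
# being counted.
running_workers() {
  grep -c '[n]ginx: worker process'
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   sudo nginx -T 2>/dev/null | configured_workers
#   ps aux | running_workers
#   nproc        # CPU cores available for comparison
```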

Misconfiguration # 2 – Suboptimal buffer sizes

Problem: Unsuitable buffer-related settings can lead to issues such as buffer overflow, excessive memory consumption, or slow data transmission. Buffer-related settings include parameters like client_body_buffer_size, client_header_buffer_size, large_client_header_buffers, and proxy_buffers.

Detection:

  • Check the values of the above parameters in your configuration file.
  • Review NGINX error logs for buffer-related warnings or errors.
  • Monitor network traffic and connections for signs of slow data transmission or buffering issues.

Troubleshooting:

  • Go through the official docs of the NGINX core and the relevant HTTP modules to understand the purpose and working of each of the buffer settings. Then, adjust the settings in the NGINX configuration file to align with expected traffic patterns and resource availability.
  • Monitor NGINX access and error logs for any buffer-related errors or warnings during peak traffic periods.
  • Use network analysis tools (e.g., Wireshark) to inspect network traffic and identify any bottlenecks or congestion points.
  • Realize that these may not be one-time tweaks. Perform regular load testing to determine the optimal values for your environment.
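
When reviewing the error log, two kinds of messages are worth filtering for: warnings that a request body or upstream response was "buffered to a temporary file", and errors saying the "upstream sent too big header". The sketch below pulls both out of a log stream. The function name buffer_messages is hypothetical, and the link from the second message to proxy_buffer_size (a directive not covered above) is offered as a likely cause rather than a certainty.

```shell
#!/bin/sh
# buffer_messages
# Reads an NGINX error log on stdin and prints the lines that point at
# undersized buffers:
#   - "... is buffered to a temporary file": the body or response did
#     not fit in memory (client_body_buffer_size or proxy_buffers)
#   - "upstream sent too big header": response headers did not fit
#     (often proxy_buffer_size)
buffer_messages() {
  grep -E 'buffered to a temporary file|upstream sent too big header'
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   sudo cat /var/log/nginx/error.log | buffer_messages | tail -n 20
```

An occasional temporary-file warning for a large upload is normal; a steady stream of them during ordinary traffic is the signal to revisit the settings.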

Misconfiguration # 3 – Suboptimal worker connections

Problem: Setting the worker_connections parameter too low may result in connection failures/timeouts while setting it too high might slow down performance and/or exhaust resources.

Detection:

  • Check the value of the worker_connections parameter in your NGINX configuration file.
  • Monitor system resources and NGINX logs for signs of connection bottlenecks or resource exhaustion.

Troubleshooting:

  • Modify the value of the worker_connections parameter in the NGINX configuration file based on server resources and anticipated connection requirements.
  • Monitor server resource utilization (CPU, memory) and connection counts during peak traffic periods to identify the need for any subsequent tweaks.
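  • Apply the new value with a graceful reload (sudo nginx -s reload), which starts new worker processes with the updated configuration and lets the old ones finish their in-flight requests, so a full restart is not needed. Keep in mind that the number of connections per worker is also capped by the open file limit (see the worker_rlimit_nofile directive).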
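
As a rough sizing aid, the upper bound on simultaneous connections is worker_processes multiplied by worker_connections, and each worker's share has to fit within its open file limit. The arithmetic is sketched below with hypothetical function names. Note that a proxied request consumes two connections (client side and backend side), so the number of clients served can be about half the bound.

```shell
#!/bin/sh
# max_connections WORKERS CONNECTIONS
# Prints the theoretical ceiling on simultaneous connections.
max_connections() {
  echo $(( $1 * $2 ))
}

# within_fd_limit CONNECTIONS FD_LIMIT
# Succeeds if the per-worker connection count fits within the
# per-process open file limit (both must be plain integers).
within_fd_limit() {
  [ "$1" -le "$2" ]
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   max_connections 4 1024
#   within_fd_limit 1024 "$(ulimit -n)" || echo "raise worker_rlimit_nofile"
```

The second check fails with an error if ulimit -n reports "unlimited", in which case the limit is not a concern.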

NGINX web server and load balancer issues

If you are facing any problems related to request processing, load balancing, or similar NGINX functionalities, then consider the tips in the next few sections.

Issue # 1 – 4xx and 5xx HTTP errors

Problem: HTTP errors like 404 Not Found or 502 Bad Gateway indicate issues with client requests or server responses.

Detection: You are seeing 4xx or 5xx status codes in the NGINX access log, often with matching entries in the error log.

Troubleshooting:

  • Identify the specific error code and its meaning to understand the cause of the issue. For example, a 400 Bad Request error means that the client request syntax is malformed or invalid, and a 401 Unauthorized error indicates that the client must authenticate itself before accessing the resource. A 500 Internal Server Error signifies a generic server error that can happen because of misconfigurations or unexpected server-side issues, while a 502 Bad Gateway error means that NGINX, acting as a proxy, received an invalid response from the backend server.
  • Review NGINX configuration for misconfigurations related to request handling, proxying, or server blocks.
  • Check backend servers for errors or issues that may lead to the failures.
  • Consider enabling NGINX debug logging to capture more detailed information related to the errors.
  • Implement error handling mechanisms like custom error pages or redirect rules to offer a better user experience during error conditions.
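
To see which error codes dominate, the access log can be tallied by status. This sketch assumes the default "combined" log format, where the status code is the ninth whitespace-separated field. With a custom log_format the field number would differ. The function name error_status_counts is hypothetical.

```shell
#!/bin/sh
# error_status_counts
# Reads an access log in the "combined" format on stdin and prints
# each 4xx/5xx status code with the number of times it occurs.
error_status_counts() {
  awk '$9 ~ /^[45][0-9][0-9]$/ { count[$9]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   sudo cat /var/log/nginx/access.log | error_status_counts
```

A spike in 502 or 504 usually points toward the backend servers, whereas a spike in 404 more often points to the root or location settings.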

Issue # 2 – Load balancing problems

Problem: Uneven traffic distribution among backend servers leads to overloaded servers and slow response times.

Detection: Your monitoring dashboard shows significant disparities in the number of requests handled by each backend server.

Troubleshooting:

  • Review NGINX log files on both the load balancer and the backend servers to identify the root cause of the disparities.
  • Ensure that NGINX health checks are configured correctly to identify and remove unhealthy backend servers from the pool. For instance, you may have specified too long an interval for the health_check directive (an NGINX Plus feature), such as in this example:
    health_check interval=10000 fails=3 passes=2;

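    The interval is given in seconds (the default is 5), so this example probes each server only once every 10,000 seconds, or close to three hours. Consider lowering the interval to ensure timely detection of server failures.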

  • Choose the most suitable load balancing algorithm (e.g., round robin or least connections) based on your traffic pattern and server capabilities.
  • Assign weights to backend servers in the configuration (using the weight parameter of the server directive inside the upstream block). This allows you to direct more traffic to more powerful servers so that the load matches each server's capacity.
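
If no dashboard is available, the distribution can be estimated from the load balancer's own access log. The sketch below rests on an assumption that is not part of the default setup: a custom log_format whose last field is the $upstream_addr variable. The function name upstream_share is hypothetical.

```shell
#!/bin/sh
# upstream_share
# Reads access log lines on stdin, treats the LAST field as the
# upstream address (assumes a custom log_format ending in
# $upstream_addr), and prints each address with its request count and
# share of the total.
upstream_share() {
  awk '{ hits[$NF]++; total++ }
       END { for (u in hits) printf "%s %d %.1f%%\n", u, hits[u], 100 * hits[u] / total }' | sort
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   sudo cat /var/log/nginx/access.log | upstream_share
```

Requests that were retried on a second server log several comma-separated addresses in $upstream_addr, so those lines would show up as separate entries in this tally.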

Issue # 3 – Caching related problems

Problem: Caching functionality is not working as expected.

Detection: Your monitoring dashboard is reporting a high number of cache misses and an unexpectedly low number of cache hits.

Troubleshooting:

  • Review the configuration file to ensure that caching is set up properly. Verify cache directives such as proxy_cache_path, proxy_cache, and proxy_cache_valid for accuracy. The proxy_cache_path directive allows you to configure the path, levels, and purger threshold among other settings, the proxy_cache directive selects which of the defined cache zones a location uses, and the proxy_cache_valid setting is used to customize the caching times based on response codes.
  • Ensure that the configured cache directory has sufficient space and appropriate permissions for NGINX to access and write to it.
  • Verify that backend servers are setting the appropriate caching headers for the relevant content. You can use network analysis tools like Wireshark or tcpdump for this purpose.
  • Monitor NGINX logs to identify potential issues with cache expiration, invalidation, or key generation.
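
For the header check, curl is often quicker than a packet capture. By default NGINX will not cache a response that carries Set-Cookie, a Cache-Control value of private, no-cache, or no-store, or Vary: *, so those are the headers to look for. The sketch below filters them out of a header dump. The function name cache_headers is hypothetical, and X-Cache-Status only appears if your configuration adds it (for example, from the $upstream_cache_status variable).

```shell
#!/bin/sh
# cache_headers
# Reads HTTP response headers on stdin (for example from `curl -sI`)
# and prints only the ones that influence or report caching.
cache_headers() {
  tr -d '\r' | grep -iE '^(cache-control|expires|set-cookie|vary|x-accel-expires|x-cache-status):'
}

# Typical usage; backend.example:8080 is a placeholder address
# (commented out so that sourcing this file has no side effects):
#   curl -sI http://backend.example:8080/some/page | cache_headers
```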

Issue # 4 – SSL/TLS related problems

Problem: Problems with SSL/TLS termination are preventing secure connections or causing certificate validation errors.

Detection: You are seeing errors related to SSL/TLS in the logs, such as certificate validation failures or handshake errors.

Troubleshooting:

  • Ensure that the paths to your SSL certificate and keys are properly specified in the configuration.
  • Ensure that your SSL certificate is valid and hasn’t expired. Consider using online SSL/TLS testing tools to verify certificate chain validity and identify any misconfigurations.
  • Review other SSL-related configuration parameters, such as ssl_protocols, ssl_ciphers, ssl_session_cache, and ssl_prefer_server_ciphers for accuracy.
  • Consider enabling additional logging for SSL/TLS if none of the above steps works.
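
Certificate expiry can also be checked locally with the openssl command-line tool, which is assumed to be installed. The following is a minimal sketch; the function name cert_valid_for is hypothetical and the file paths in the usage notes are placeholders.

```shell
#!/bin/sh
# cert_valid_for CERT_FILE DAYS
# Succeeds if the PEM certificate in CERT_FILE is still valid DAYS days
# from now; fails if it will have expired by then.
cert_valid_for() {
  openssl x509 -checkend $(( $2 * 86400 )) -noout -in "$1" >/dev/null
}

# Typical usage (commented out so that sourcing this file has no side effects):
#   openssl x509 -enddate -noout -in /path/to/cert.pem
#   cert_valid_for /path/to/cert.pem 30 || echo "certificate expires within 30 days"
#   openssl s_client -connect example.com:443 -servername example.com </dev/null
```

The s_client call shows the chain that the server actually presents, which helps when the file on disk looks fine but clients still report validation errors.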

NGINX best practices

To round off this troubleshooting guide, let’s explore some best practices that can help prevent some of the above issues from occurring in the first place.

Keep NGINX and third-party modules up to date

Formulate a strategy to keep NGINX and all your third-party modules up to date. This ensures that you never miss out on the newest features, vulnerability fixes, and security patches. Moreover, keep tabs on release notes and security advisories to quickly fix any vulnerabilities.

Use third-party modules judiciously

Exercise caution when using external modules with NGINX, as they may introduce complexity, compatibility issues, or security risks. As a golden rule, limit the use of third-party modules to essential functionality and always prefer built-in alternatives.

Regularly review and optimize configurations

Perform periodic reviews of NGINX configurations to identify inefficiencies, security vulnerabilities, or outdated settings. You can get the most out of an NGINX setup by removing unnecessary directives, consolidating similar configurations, and adhering to best practices.

Use dedicated monitoring tools

Deploy a dedicated monitoring tool, such as the NGINX and NGINX Plus Monitoring System by Site24x7, to analyze performance metrics, detect anomalies, and fast-track troubleshooting. The Site24x7 tool allows you to build customized dashboards to track key performance metrics like requests per second and dropped connections. Additionally, you can configure real-time alerts for critical events to avoid disruptions or downtime.

Leverage NGINX’s security controls

NGINX comes with several built-in security controls and configurations to restrict unauthorized access. These include access control mechanisms, HTTP authentication, JWT authentication, rate limiting, security headers, geographical access restrictions, dynamic denylisting, and upstream traffic security. Leverage these features to enhance the security of your NGINX deployment.

Implement high availability and redundancy

To minimize single points of failure, you must architect NGINX deployments with high availability and redundancy. Set up redundant NGINX instances, load balancers, and backend servers, preferably in geographically diverse locations, to make your architecture withstand hardware failures, network outages, failed nodes, and other such problems.

NGINX Plus users can easily implement these features using the nginx-ha-keepalived package.

Conclusion

NGINX is a highly adaptable and reliable solution that fuels many use cases within web architectures. To ensure the seamless operation of NGINX and the broader system it supports, it’s important to troubleshoot and resolve issues. We trust that the actionable advice provided in this guide will help you do just that.

If you want to stay ahead by anticipating issues and gaining insights into critical NGINX performance data and NGINX logs, check out the NGINX and NGINX Plus Monitoring tool by Site24x7.
