Troubleshooting latency spikes: Combining traceroute, MTR, and ISP analysis

Website latency spikes degrade the digital experience, leading to abandoned carts, frustrated users (who often switch to a competitor), and reputational damage. IT teams and webmasters responsible for that experience have to look beyond core application servers and analyze the entire network path, all the way to the user's screen. Latency may originate in internal servers or anywhere along that path, a large part of which is managed by ISPs. By combining diagnostic tools such as traceroute and My traceroute (MTR) with the targeted ISP analysis available in the digital experience monitoring platform ManageEngine Site24x7, teams can trace latency to its root cause and eliminate it at the source.

Understanding latency: Hop count and packet loss

Latency is the time it takes for data to travel from source to destination. Two critical factors influence it:

Hop count: Data travels across the internet through a series of routers, or hops. Each hop adds to the total latency. While a higher hop count does not always indicate a problem, it can contribute to delays if a router in the path is congested or misconfigured.

Packet loss: When data packets fail to reach their destination, systems must retransmit them, causing significant delays. Packet loss manifests as buffering videos, lag in applications, or failed transactions, severely impacting the user experience.

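High latency or packet loss that appears at a specific hop and persists through the remaining hops to the destination points to a network issue that needs prompt attention. Delay or loss that shows up at a single intermediate hop but not at later hops is often just that router rate-limiting or deprioritizing its replies to diagnostic probes, not a real fault.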

Identifying latency spike symptoms

Latency issues often surface as user complaints before triggering infrastructure alerts. Common symptoms include:

Slow or unresponsive website loading.
Lag and jitter in online gaming or video conferencing.
Transaction timeouts on e-commerce or banking platforms.
Alerts from monitoring tools, indicating breached response time thresholds.

These symptoms point to a bottleneck in the network path connecting users to your services. Pinpointing the exact location is critical for resolution.

Step 1: Use traceroute for network path analysis

Traceroute maps the journey of data packets from source to destination, displaying each hop and measuring the round-trip time (RTT). Tools like ManageEngine Site24x7 enable traceroute tests from over 130 global locations, simulating user perspectives. To analyze a traceroute report:

Identify high RTT: Look for a sudden spike in RTT at a specific hop. If latency remains elevated for subsequent hops, that hop is likely the bottleneck.

Assess consistency: Sporadic high RTT may indicate temporary congestion, while consistent delays suggest a persistent issue, such as a failing router or an ISP problem.

For example, a traceroute showing a jump from 10ms to 150ms at a specific hop indicates a potential issue within that network segment.
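The two checks above can be sketched in a few lines of Python. This is a minimal illustration, not a Site24x7 feature: the sample output, the function names `parse_hops` and `find_sustained_jump`, and the 50ms threshold are all assumptions made for the example, and the parser only handles the common Unix traceroute line format.

```python
import re
from statistics import median

# Hypothetical traceroute output in the common Unix format (addresses and timings are made up).
SAMPLE = """\
 1  192.168.1.1 (192.168.1.1)  0.9 ms  1.1 ms  1.0 ms
 2  10.10.0.1 (10.10.0.1)  9.8 ms  10.2 ms  10.1 ms
 3  * * *
 4  203.0.113.5 (203.0.113.5)  149.7 ms  151.2 ms  150.4 ms
 5  198.51.100.7 (198.51.100.7)  152.0 ms  153.1 ms  151.8 ms
"""

RTT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_hops(text):
    """Return a list of (hop_number, median_rtt_ms) tuples; RTT is None if no probe replied."""
    hops = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip headers and blank lines
        rtts = [float(value) for value in RTT_PATTERN.findall(line)]
        hops.append((int(parts[0]), median(rtts) if rtts else None))
    return hops


def find_sustained_jump(hops, jump_ms=50.0):
    """Return the first hop whose RTT rises by more than jump_ms over the previous
    responding hop and stays elevated at every later responding hop."""
    measured = [(number, rtt) for number, rtt in hops if rtt is not None]
    for i in range(1, len(measured)):
        previous_rtt = measured[i - 1][1]
        hop_number, rtt = measured[i]
        stays_high = all(later >= previous_rtt + jump_ms for _, later in measured[i + 1:])
        if rtt - previous_rtt > jump_ms and stays_high:
            return hop_number
    return None


suspect = find_sustained_jump(parse_hops(SAMPLE))
print(f"Sustained RTT jump begins at hop {suspect}")
```

Requiring the increase to persist at later hops is what separates a likely bottleneck from a router that is merely slow to answer probes. Hops that never reply (shown as `* * *`) are skipped rather than treated as failures.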

Step 2: Leverage MTR for real-time diagnostics

MTR is a network diagnostic tool that goes one step further by combining traceroute and ping to provide continuous, real-time statistics for each hop, including packet loss, average latency, and jitter. Whereas traceroute gives a static snapshot, MTR keeps probing, so it can surface transient congestion that a single trace would miss. Key metrics to analyze include:

Packet loss: Any loss above 1% that carries through to the destination is concerning and often indicates a failing router, congestion, or a network misconfiguration.

Latency trends: Consistent high latency or jitter starting at a specific hop pinpoints the problem source.
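To make these metrics concrete, here is a rough sketch of how per-hop statistics could be derived from raw probe results. The function name `hop_stats` and the sample values are hypothetical, and jitter is computed here as the mean absolute difference between consecutive replies, which is one common definition and may differ from what a given MTR implementation reports.

```python
from statistics import mean


def hop_stats(samples):
    """Summarize one hop's probes. `samples` holds RTTs in ms, with None for a lost probe."""
    received = [s for s in samples if s is not None]
    sent = len(samples)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    if not received:
        return {"sent": sent, "loss_pct": loss_pct,
                "avg": None, "best": None, "worst": None, "jitter": None}
    # Assumed jitter definition: mean absolute difference between consecutive replies.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "avg": mean(received),
        "best": min(received),
        "worst": max(received),
        "jitter": mean(diffs) if diffs else 0.0,
    }


# Hypothetical probe results for a single hop: five probes, one lost.
stats = hop_stats([10.0, 12.0, None, 11.0, 15.0])
print(stats)
```

Running the same summary over a rolling window of probes, rather than once, is what lets a continuous tool show trends that a one-off trace cannot.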

A sample MTR report:

Hop	Host	Loss%	Sent	Last	Avg	Best	Worst
1	your-router.home	0.0%	100	0.4ms	0.5ms	0.3ms	2.1ms
2	isp-A-gateway.net	0.0%	100	10.2ms	10.5ms	9.8ms	15.4ms
3	isp-B-transit.net	5.0%	100	150.1ms	155.8ms	148.2ms	210.5ms
4	isp-B-core.net	5.0%	100	152.3ms	156.2ms	149.1ms	212.0ms
5	your-server.com	5.0%	100	153.0ms	157.1ms	150.0ms	215.3ms

In this example, packet loss and high latency begin at hop 3 (isp-B-transit.net) and persist, indicating a problem within that ISP’s network.
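The same reading can be automated. The sketch below assumes the report is available as tab-separated text shaped like the sample table above, which is not the native output format of the `mtr` command line tool. The helper names and the 1% threshold are illustrative assumptions.

```python
# The sample report above, as tab-separated text (assumed export format).
REPORT = """\
Hop\tHost\tLoss%\tSent\tLast\tAvg\tBest\tWorst
1\tyour-router.home\t0.0%\t100\t0.4ms\t0.5ms\t0.3ms\t2.1ms
2\tisp-A-gateway.net\t0.0%\t100\t10.2ms\t10.5ms\t9.8ms\t15.4ms
3\tisp-B-transit.net\t5.0%\t100\t150.1ms\t155.8ms\t148.2ms\t210.5ms
4\tisp-B-core.net\t5.0%\t100\t152.3ms\t156.2ms\t149.1ms\t212.0ms
5\tyour-server.com\t5.0%\t100\t153.0ms\t157.1ms\t150.0ms\t215.3ms
"""


def parse_report(text):
    """Turn the tab-separated report into a list of per-hop dictionaries."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        hop, host, loss, _sent, _last, avg, _best, _worst = line.split("\t")
        rows.append({
            "hop": int(hop),
            "host": host,
            "loss_pct": float(loss.rstrip("%")),
            "avg_ms": float(avg.replace("ms", "")),
        })
    return rows


def first_persistent_loss(rows, threshold_pct=1.0):
    """Return the first hop from which loss stays above the threshold through the final hop."""
    for i, row in enumerate(rows):
        if all(r["loss_pct"] > threshold_pct for r in rows[i:]):
            return row
    return None


culprit = first_persistent_loss(parse_report(REPORT))
print(f"Loss begins at hop {culprit['hop']} ({culprit['host']})")
```

Checking that loss continues to the last hop, instead of flagging any hop above the threshold, avoids blaming a router that only drops or rate-limits probe replies.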

Step 3: Collaborate with ISPs

When diagnostics indicate an issue outside your network, effective ISP collaboration is essential. To ensure a swift resolution:

Provide evidence: Share detailed traceroute and MTR reports from Site24x7, highlighting the problematic hop, packet loss, and latency spikes.

Be specific: Avoid vague complaints like "The site is slow." Instead, specify the affected hop, time of occurrence, and impact (e.g., "5% packet loss and 150ms latency at isp-B-transit.net between 6pm and 9pm").

Escalate if needed: If the ISP’s initial response is inadequate, request escalation to their network engineering team, using your data to justify urgency.

This data-driven approach accelerates troubleshooting and ensures the ISP addresses the root cause, such as rerouting traffic or fixing hardware issues.
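As a small illustration of the "be specific" advice, measured values can be turned into a one-line summary for the ISP ticket. The `HopEvidence` class, the `isp_summary` function, and the wording of the message are hypothetical. The figures are taken from the sample MTR report earlier in this article.

```python
from dataclasses import dataclass


@dataclass
class HopEvidence:
    """Measurements for one hop, as read from an MTR report."""
    host: str
    loss_pct: float
    avg_ms: float


def isp_summary(problem, previous, window):
    """Build a specific, data-backed one-line summary for an ISP support ticket."""
    return (
        f"{problem.loss_pct:.1f}% packet loss at {problem.host} during {window}; "
        f"average latency {problem.avg_ms:.1f} ms, up from {previous.avg_ms:.1f} ms "
        f"at the previous hop ({previous.host})."
    )


summary = isp_summary(
    HopEvidence("isp-B-transit.net", 5.0, 155.8),
    HopEvidence("isp-A-gateway.net", 0.0, 10.5),
    "6-9pm",
)
print(summary)
```

Including the last healthy hop alongside the problematic one gives the ISP a clear boundary for where the degradation starts.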

Case study: How Zylker Games resolved its latency crisis

In August 2025, Zylker Games, an online gaming platform, faced a crisis as hundreds of players reported severe lag and disconnections during peak hours (6–9pm). Within three days, player retention dropped by 15%, and negative reviews spiked. Using ManageEngine Site24x7, the IT team resolved the issue in four steps:

First, the team ran traceroute tests from multiple global locations to get a static snapshot of the network path. This revealed an RTT spike at a hop managed by a transit ISP but could not confirm whether it reflected chronic packet loss or a short-term issue.
Second, the team used Site24x7’s MTR feature to confirm about 10% packet loss and 150ms latency at the same hop, occurring only during peak hours. Because the MTR analysis showed the degradation to be time-dependent (6–8pm), the team suspected congestion on a peering link or an overloaded router buffer at the transit point.
Third, they validated the impact by correlating real user monitoring (RUM) data, confirming that affected users were indeed routed through the problematic ISP. Client-side performance metrics from RUM showed that all affected players were in a specific region served through an autonomous system (AS) used by that ISP, where upstream congestion was disconnecting users mid-game.
Finally, the team compiled the findings and presented them to the ISP, whose NOC team verified the data and found saturation at a peering point. The NOC team temporarily rerouted Zylker’s traffic over a higher-capacity network path while planning a capacity upgrade.

Outcome: Latency dropped by 40ms, packet loss was eliminated, and player count recovered within a week, demonstrating the power of precise diagnostics and ISP collaboration.
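The time-of-day pattern the team found in the second step can be checked by bucketing probe results by hour. The sketch below is only an illustration with made-up probe data. The function name `loss_by_hour` and the input shape are assumptions, not a description of how Site24x7 stores MTR results.

```python
from collections import defaultdict
from datetime import datetime


def loss_by_hour(probes):
    """Group (timestamp, received) probe results by hour of day and return {hour: loss_pct}."""
    sent = defaultdict(int)
    lost = defaultdict(int)
    for timestamp, received in probes:
        sent[timestamp.hour] += 1
        if not received:
            lost[timestamp.hour] += 1
    return {hour: 100.0 * lost[hour] / sent[hour] for hour in sorted(sent)}


# Hypothetical data: ten clean probes in the afternoon, one of ten lost in the evening.
probes = [(datetime(2025, 8, 4, 14, minute), True) for minute in range(10)]
probes += [(datetime(2025, 8, 4, 19, minute), minute != 3) for minute in range(10)]

hourly = loss_by_hour(probes)
for hour, loss in hourly.items():
    print(f"{hour:02d}:00  loss {loss:.1f}%")
```

Loss that is concentrated in the same hours every day, as in this made-up data, is more consistent with congestion than with a hardware fault, which would typically show up around the clock.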

Choose Site24x7 for a holistic approach

Resolving latency spikes is far easier when latency monitoring is backed by a comprehensive IT monitoring strategy that covers every component of your IT stack. The following Site24x7 features help ensure your network and apps work as intended.

Traceroute for path analysis helps diagnose issues in the network path by mapping the whole route to measure transit delays across a network from the source to the destination.
MTR helps provide a real-time, continuous path analysis to diagnose network health issues involving hops, latency, and packet loss.
RUM for user experience insights captures and analyzes performance data from actual end-user browsers to provide a grassroots understanding of what every user experiences.
Site24x7's CDN Report, available for Webpage Speed (Browser) monitors, can assess content delivery performance and generate a summary of underperforming elements and lags.
Site24x7's NetFlow monitoring provides traffic analysis to identify the apps, protocols, or user actions that hog bandwidth and cause congestion, so you can address them.
Site24x7 also provides SNMP monitoring to track the health, performance, and resource consumption of critical network devices like routers, firewalls, and switches, helping ensure optimal utilization without processing delays or latency.
Site24x7's WAN monitoring spots paths that suffer high latency so you can fix the affected paths or work with ISPs to clear transit bottlenecks.
Site24x7's application performance monitoring monitors, troubleshoots, and optimizes app performance across any environment, with deep code-level visibility to identify and eliminate bottlenecks and flow issues across microservices, ensuring your customers enjoy reliable services.

To ensure an optimal digital experience by eliminating external latency, IT teams can use ManageEngine Site24x7 for end-to-end website monitoring alongside specialized ISP latency tracking and network performance monitoring capabilities to complete the picture. Site24x7 is a unique and complete monitoring platform that unifies RUM, CDN analysis, traceroute, and MTR into a single view, enabling proactive identification and rapid resolution of public internet issues that degrade the user experience. To explore how Site24x7 can fix your website experience issues, including latency, visit /website-monitoring.html.