
What is Low Latency in Cloud Computing?

Michael Carroll on Sep 15, 2022 · 5 min read

What is latency in cloud computing?

Latency is defined, in the networking space, as the time it takes a packet to travel from source to destination (although it is often measured as the time from a request for information to that information being received). It is, simply enough, the delay between data being requested and data being received. This is entirely separate from the question of bandwidth, which defines the rate at which data can flow once the connection is made.
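To make the distinction concrete, here is a minimal Python sketch that times only the TCP handshake to a server. Because the handshake is a single round trip that carries no payload, the measurement reflects latency rather than bandwidth. The host names are placeholders used purely for illustration.

```python
import socket
import time

def tcp_connect_latency(host: str, port: int = 443) -> float:
    """Return the time (in seconds) to complete a TCP handshake with host:port.

    The handshake is a round trip carrying no payload, so this approximates
    network latency independently of how fast data flows afterwards (bandwidth).
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connection established; close immediately
    return time.perf_counter() - start

# Compare two placeholder endpoints; substitute hosts you care about.
for host in ("example.com", "example.org"):
    print(f"{host}: {tcp_connect_latency(host) * 1000:.1f} ms")
```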

To illustrate this, we could imagine two connections, both of which offer the same bandwidth, but where Connection A’s latency is half that of Connection B. All else being equal, users on each will see web pages load at the same speed - but users on Connection B will see a blank screen for twice as long as users on Connection A before the loading happens. In practical terms, Connection B will appear to be ‘slower’ than Connection A, despite offering the same bandwidth capacity.

Latency is to no small degree dictated by geographical location. Although data flows through the Internet extremely quickly, it still takes time for a signal to be transported from one location to another. Even if a packet were moving at the speed of light (which it does not, to be clear), it would take a measurably longer time to move it from New York to Sydney than from New York to Boston. For reference, assume light moves at 186,000 miles per second: Boston is about 190 miles away, so light would take roughly 0.001 seconds to arrive; Sydney is a shade under 10,000 miles away, so it would require a full 0.05 seconds - roughly 50 times longer. And again, data does not travel at the speed of light; nothing other than light itself does.
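The arithmetic above is easy to reproduce. The snippet below computes the one-way, speed-of-light lower bound for the two distances quoted in the text; real packets travel through fiber at roughly two-thirds this speed and rarely along a straight line, so actual latencies are noticeably higher.

```python
SPEED_OF_LIGHT_MPS = 186_000  # miles per second (approximate, in a vacuum)

# Approximate straight-line distances from New York, in miles (figures from the text).
distances = {
    "Boston": 190,
    "Sydney": 9_950,
}

for city, miles in distances.items():
    seconds = miles / SPEED_OF_LIGHT_MPS
    print(f"New York -> {city}: {miles:,} miles, >= {seconds * 1000:.1f} ms one way")
```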

How does geography affect low latency networks?

Geography affects latency at both ends of a connection - the distance between requestor and server is what matters, rather than the speed of the connection each one uses. The speed at which a page loads is absolutely impacted by the Internet connection of user and server alike, but the latency is always impacted by geographical separation. This is one of the reasons that low latency cloud computing has gained in popularity. By putting websites and app servers into the cloud, and creating synchronized instances around the world, providers are able to reduce the distance between users and the nearest node of the system, thus reducing geography-based latency.
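As a rough sketch of how such a system might pick a node, the snippet below computes great-circle distances from a user’s coordinates to a handful of hypothetical region locations and selects the closest one. The region names and coordinates are illustrative assumptions, not any provider’s actual topology.

```python
import math

# Hypothetical region locations (latitude, longitude) - illustrative only.
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (-33.9, 151.2),
}

def great_circle_miles(a, b):
    """Approximate great-circle distance in miles using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(h))  # Earth radius ~3,959 miles

def nearest_region(user_location):
    """Return the region with the smallest straight-line distance to the user."""
    return min(REGIONS, key=lambda r: great_circle_miles(user_location, REGIONS[r]))

print(nearest_region((40.7, -74.0)))  # a user in New York -> "us-east"
```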

Equally importantly, distances, as measured across a vast global network like the Internet, are not the ‘as the crow flies’ direct paths used in the example above. Physical cables are used to connect devices together - as accustomed as we all are to WiFi, it is only the very tiniest bit of the connectivity architecture ‘at the edge’ that doesn’t run across physical equipment. Data being transported must run through every inch of the connection, so the route that a cable takes is absolutely part of the calculation of latency. Indeed, the bestselling Michael Lewis book Flash Boys details the efforts of a company called Spread Networks to build an 827-mile connection between Chicago and New York that is as straight as possible so that latency is minimized (they literally bulldozed fields and tunneled through hills to deliver the straightest path possible); by providing subscribers with a lower-latency connection, the network was intended to allow them to execute trades before non-subscribers could respond.

If latency is impacted by true distances traveled, it is equally impacted by routing, as Internet traffic can take circuitous routes from place to place. Imagine being a driver headed from Town A to Town B, 50 miles away. There are all sorts of different routes you could take - say, using a freeway, or opting to stick to surface roads. The Internet presents a similar set of choices, as data is passed from node to node across the network - the user, almost by definition, never connects directly to the server.

Border Gateway Protocol (BGP) explained

For some time, the key technology for reducing latency by optimizing cross-network routing has been Border Gateway Protocol (BGP), which connects the various sub-networks of the Internet and allows each to have a general map of available paths from node to node. With this map in hand, networks can route data along pathways that have the smallest number of ‘hops’ between endpoints.

BGP optimization, however, is a fairly blunt instrument for reducing latency, as it operates on, broadly speaking, a boolean paradigm: it can tell whether a node is currently available or not, but cannot tell how busy a particular node might be. In other words, a route may be selected because it has eight hops rather than ten, but if one of those eight nodes is extremely busy, using it may actually take longer than passing through three or four of the nodes in the ten-hop path. As the Internet has become busier and more crowded, BGP has moved from being the tip of the spear to being a foundation.
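To see why hop count alone is a poor proxy for latency, consider the toy sketch below: it runs the same shortest-path search over a made-up topology twice, once counting hops and once using invented per-link delays. The node names and numbers are assumptions for illustration only, and real BGP route selection is policy-driven rather than a simple shortest-path computation - the point is just that the fewest-hop path is not necessarily the fastest one.

```python
import heapq

def shortest_path(graph, src, dst, weight):
    """Dijkstra's algorithm; `weight(u, v)` supplies the cost of each link."""
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight(node, nxt), nxt, path + [nxt]))
    return float("inf"), []

# Toy topology with made-up one-way delays in ms (purely illustrative).
delay_ms = {
    ("A", "B"): 5, ("B", "C"): 80, ("C", "D"): 5,                # few hops, but B-C is congested
    ("A", "E"): 5, ("E", "F"): 5, ("F", "G"): 5, ("G", "D"): 5,  # more hops, all quiet
}
graph = {}
for (u, v) in delay_ms:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, [])

print(shortest_path(graph, "A", "D", weight=lambda u, v: 1))                 # fewest hops: A-B-C-D
print(shortest_path(graph, "A", "D", weight=lambda u, v: delay_ms[(u, v)]))  # lowest delay: A-E-F-G-D
```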

Increasingly, traffic management has moved to a more algorithmic approach, in which data gleaned from browser-based beacons (so-called real-user measurement, or RUM) informs core servers of the best routes, allowing providers to select, for instance, the fastest content delivery network at the moment it is needed. By selecting the routes that are the least busy, providers can programmatically reduce latency, ensuring the highest level of user satisfaction.
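A simple sketch of that idea, under the assumption that browsers report their observed latencies per provider: aggregate the samples and route new requests to whichever endpoint currently looks fastest. The CDN names and figures below are invented for illustration.

```python
from statistics import median

# Hypothetical RUM samples: latency (ms) reported by user browsers, keyed by CDN.
rum_samples_ms = {
    "cdn-alpha": [42, 47, 51, 300, 44],   # one outlier from a congested path
    "cdn-beta":  [65, 61, 70, 66, 63],
    "cdn-gamma": [38, 41, 40, 39, 45],
}

def pick_cdn(samples):
    """Choose the CDN with the lowest median reported latency.

    The median is used rather than the mean so that a few slow outliers
    do not dominate the decision.
    """
    return min(samples, key=lambda cdn: median(samples[cdn]))

print(pick_cdn(rum_samples_ms))  # -> "cdn-gamma"
```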

For the average user, latency is not a measure that can be tuned: you’re connected to the ISP you’re connected to, and the information you’re seeking is located wherever it’s located. For sure, opting for an ISP that provides higher bandwidth will have a measurable impact on your overall experience, but even paying for the fastest connection will not change the distance data must travel between you and its source. Latency, in other words, is a problem that can be solved only by the provider that holds the data desired by the user.

Reducing network latency

The smartest solution to reducing latency is to invest in public cloud networking, allowing clouds and data stream networks to do the work of connecting the user to the closest synchronized source of information. Many infrastructure providers, recognizing the value this brings to their customers, make the most of it by increasing their charges linearly with each additional region: the size of a cloud contract is, and should be, bounded as much by the location of the supported audience as by the amount of data that will flow across the wires.

PubNub not only accelerates the delivery of data with the core architecture of our pub/sub APIs, but also solves for latency by maintaining nodes around the world, ensuring latency remains low for all participants in a data stream. We also charge a flat fee, regardless of which regions your users operate in, ensuring that your success does not bring unwanted upcharges along with it.
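For a sense of what the pub/sub model looks like in practice, below is a minimal publish sketch based on the PubNub Python SDK. The keys, channel name, and client identifier are placeholders, and configuration field names can vary slightly between SDK versions.

```python
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

# Placeholder credentials - substitute your own keys from the PubNub dashboard.
config = PNConfiguration()
config.publish_key = "your-publish-key"
config.subscribe_key = "your-subscribe-key"
config.uuid = "latency-demo-client"

pubnub = PubNub(config)

# Publish a small message; it is accepted by the nearest PubNub node, and
# subscribers elsewhere in the world receive it from their own nearest node.
envelope = pubnub.publish().channel("my_channel").message({"text": "hello"}).sync()
print("publish failed:", envelope.status.is_error())
```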

Latency defines the experience each user has - don’t let the cost and complexity of managing it detract from the success of your application.
