Akami networkevolved out of an MIT research effort aimed at solving the flash crowd problem
15,000+ servers in 1,100+ networks and spans 65+ countries, consist of regions of machines (typically 2-18 commodity machines, not more reliable and expensive servers), communicate over the public Internet
It handles flash crowds by dynamically allocating server, e.g. allocate more servers to sites experiencing high load, directs client requests to the nearest available server likely to have the requested content.
- Nearest is a function of network topology and dynamic link characteristics:
- Available is a function of load and network bandwidth:
- Likely is a function of which servers carry the content for each customer in a data center:
Mapping
The direction of requests to content servers is referred to as mapping.
A special type of region - Mapping Center(MC) runs software components that analyze system load, traffic patterns, and performance information for the Akamai network and the Internet, produce toplevel map and
distribute to the TLNSes - Low Level Name Servers. TLNS consults map to select appropriate IPs.
Akamai agents communicate BGP with certain border routers as peers; mapping system uses the resulting BGP information to determine network topology. The number of hops between autonomous systems is a coarse but useful measure of network distance. The mapping system combines this information with live network statistics — such as traceroute data3 — to provide a detailed, dynamic view of network structure and quality measures for different mappings.
MonitorTo monitor the entire system’s health end-to-end, Akamai uses agents that simulate end-user behavior by downloading Web objects and measuring their failure rates and download times. Akamai uses this information to monitor overall system performance and to automatically detect and suspend problematic data centers or servers.
Network Service
Dynamic Content Service:
Edge Side Includes technology (www.esi.org) lets a content provider break a dynamic page into fragments with independent cacheability properties. These fragments are maintained as separate objects in the edge server’s cache and are dynamically assembled into Web pages in response to user requests.
ROC
Akami think their network is an implementation of Recovery Oriented Computing (ROC) [5, 8].
The fundamental element of ROC’s approach is component recovery, in the form of fail-stop and restart, “ROC emphasizes recovery from failures rather than failure-avoidance” [3].
The ROC deal with the manually suspended machine that with long-term and/or habitual problems. Akami network endures a daily churn rate of approximately 4%, mean-time-to-recovery is approximately 25 days(distribution is heavytailed).
Please note approximately 40% of the suspended machines are in regions where the whole region is suspended. This is likely from a region or datacenter error. Akami think losing multiple datacenters or numerous servers in a day is not unusual.
Principle #1: Ensure significant redundancy in all systems to facilitate failovere.g. DNS redundancy - dynamic, fault-tolerant DNS system.
1. DNS responses size limits:
Problem: the number of servers Generic that Top Level Domain (gTLD) server can return to 13 [9].
Method: IP anycast.
2. DNS fixing TTL resolution.
Problem: If the server fails, a user could fail to receive service until the TTL expired.
Method: two-level DNS: top level directs the user’s DNS resolver to multiple regions that each region with a TTL of 30 to 60 minutes for a particular domain (e.g., g.akamai.net). Ordinarily return eight LLNS IPs in three different regions. Region selection based on a) performance does not suffer but b) the chance of all nameservers failing simultaneously is low.
Failover in the low-level resolution, which has a TTL of only 20 seconds. Oridnarily return two server IPs for one DNS query. Then a live machine within the region will automatically ARP over the IP address of a down machine to address failures during the 20 seconds, .
Principle #2: Use software logic to provide message reliabilityAkami builts two underlying systems—a realtime (UDP) network and a reliable (HTTP/TCP) transport network.
UDP network which was first built in 1999 uses multi-path routing, and at times limited retransmission[7].
For dynamic or completely uncacheable content, built HTTP/TCP equivalent network in 2000, each file is transmitted as a separate HTTP request—though often in the same connection. which is the basis of our SureRoute product. System explores a variety of potential paths and provides an HTTP-based tunnel for requests through intermediate Akamai regions.
Principle #3: Use distributed control for coordinationTwo methods:
- Failover, e.g. a machine will ARP over the IP address of a down machine.
- Leader election, election depend on factors including machine status, connectivity to other machines, or even a preference to run the leader in a region with additional monitoring capabilities.
System Principle #4: Fail Cleanly and RestartRolling:
- fail and restart from a last known good state (a process we call “rolling”)
- enter a “long-sleep” mode after a certain number of rolls.
- enter a different mode of operation when observe a significant fraction of the network rolling. e.g. be more aggressive in recovering from errors, even shut down problematic software modules.
System Principle #5: Zoningsoftware phased rollout
minimum duration of each phase per release type
Release Type Phase One Phase Two
Customer Configuration 15 mins 20 mins
System Configuration 30 min 2 hours
Standard Software Release 24 hours 24 hours
Enabled a much more reliable and aggressive release process, both of which have been a huge benefit to our business.
System Principle #6: The network should notice and quarantine faults
Problem: a request for certain customer content triggers a latent bug.
Method: constrain the assignment of content to servers to limit the spread of particular content.
two level constraint: localized and globally
Reference:1. Experience with some Principles for Building an Internet-Scale Reliable System
Mike Afergan, Joel Wein, Amy LaMeyer, USENIX WORLS05.
2. J. Dilley et al. Globally distributed content delivery. IEEE Internet Computing, 6(5):50–58, 2002.
3. Recovery-Oriented Computing: Overview. http://roc.cs.berkeley.edu/roc_overview.html.
Furthermore Reading:5. A. Fox. Toward Recovery-Oriented Computing. In Proceedings of the 28th International Conference on Very Large Databases (VLDB), 2002.
7. L Kontothanassis et al. A Transport Layer for Live Streaming in a Content Delivery Network. Proceedings of the IEEE, 92(9):1408–1419, 2004.
8. D. A. Patterson. Recovery Oriented Computing: A New Research Agenda for a New Century. In HPCA, page 247, 2002.
9. I. E. Paul Vixie and W. Akira Kato. Dns response size issues. Dnsop working group ietf internet draft, July 2004. http://www.ietf.org/internet-drafts/draft-ietf-dnsop-respsize-01.txt.