Sunday, 9 November 2025

What caused the AWS outage? And why did it make the internet fall apart?

Recently, Google News online picked up the following article:

What caused the AWS outage - and why did it make the internet fall apart?

BBC - Zoe Kleinman

21 October 2025

Amazon Web Services (AWS) had a bad day.

That's how the boss of another big US tech firm Cloudflare put it – probably feeling very relieved that Monday's outage, hitting over 1,000 companies and affecting millions of internet users, had nothing to do with him.

The places hit by the outage vary significantly. It took out major social media platforms like Snapchat and Reddit, banks like Lloyds and Halifax, and games like Roblox and Fortnite.

AWS is a US giant with a large global footprint, having positioned itself as the backbone of the internet.

It provides tools and computers which enable around a third of the internet to work, it offers storage space and database management, it saves firms from having to maintain their own costly set-ups, and it also connects traffic to those platforms.

That's how it sells its services: let us look after your business's computing needs for you.

But on Monday, something very mundane went very wrong: a common kind of outage known as a Domain Name System (DNS) error.

People who work in the tech industry will be rolling their eyes right now.

This common error can cause a lot of havoc.

"It's always DNS!" is something I hear a lot.

When someone taps an app or clicks a link, their device is essentially sending a request to be connected to that service.

DNS is supposed to act like a map, and on Monday, AWS lost its bearings – platforms like Snapchat, Canva and HMRC were all still there but it couldn't see where they were to direct traffic to them.

Why did it have such an impact?

These errors happen for a number of reasons.

Usually it's a maintenance issue or a server failure. Sometimes that's human error, someone misconfiguring something somewhere, or in extreme cases a cyber attack - although there's no evidence of this so far.

AWS said it occurred at its vast data centre plant in northern Virginia, its oldest and biggest site.

A chorus of experts said it was a textbook illustration of the risks of putting all of your eggs in one basket in terms of a service provider - AWS is a giant and millions of businesses rely on it.

And they are right, but the issue is there aren't many alternatives at the sheer scale provided by AWS.

There are only two main contenders in fact, and they're both other US giants: Microsoft's Azure and Google's Cloud Platform.

Smaller rivals include IBM and the Chinese firm Alibaba. The parent company of the supermarket Lidl launched a European rival called Stackit last year, in direct competition with Amazon.

But AWS remains the dominant player by some margin.

Some argue the UK and Europe urgently need to build up their own infrastructure and be less reliant on the US for cloud services – while others say it's too late.

Someone working in government once told me an MP informally proposed creating a UK version of AWS.

"But what's the point?" came the reply. "We already have AWS, over there."

Perhaps incidents like Monday's massive outage highlight why it's not quite that simple.

Translation

是什麼導致了 AWS 服務中斷?它又為何會導致網路崩潰?

亞馬遜網路服務 (AWS) 度過了糟糕的一天。

這是另一家美國大型科技公司 Cloudflare 的老闆所說- 他可能覺得週一的中斷與他無關而可讓他鬆一口氣,這場中斷影響了 1000 多家公司和數百萬互聯網用戶的服務。

此次服務中斷波及的對象差異很大。Snapchat 和 Reddit 等主要社交媒體平台、勞埃德銀行和哈利法克斯銀行等銀行,以及 Roblox 和 Fortnite 等遊戲都受到了影響。

AWS 是一家業務遍布全球的美國巨頭,將自己定位為網路的骨幹。

它提供的工具和電腦使大約三分之一的互聯網能夠運行,它提供儲存空間和資料庫管理,它使個別公司無需保有自己的昂貴系統設備,並將流量連接到這些平台。

這就是AWS的宣傳語:讓我們來滿足您企業的運算需求。

然而就在周一,一件非常普通的事情出了大錯:一種常見的中斷,即網域名稱系統 (DNS) 錯誤。

科技業的從業人員現在肯定會翻白眼。

這種常見的錯誤可能會造成很大的破壞。

「總是DNS!」是我常聽到的。

當有人點擊某個應用程式或連結時,他們的裝置實際上是在發送連接到該服務的請求。

DNS應該像地圖一樣運行,但在周一,AWS失去了方向 - Snapchat、Canva 和 HMRC 等平台仍然在線,但它卻無法確定這些平台的位置,以便將流量引導到它們。

為什麼會造成這麼大的影響?

這些錯誤發生的原因有很多。

通常是維修問題或伺服器故障。有時,這是人為錯誤,可能是某人在某個地方配置錯誤,或者在極端情況下是網路攻擊 - 儘管目前還沒有證據證明這一點。

AWS 表示,這事件發生在其位於維吉尼亞州北部的大型數據中心廠房,這是其歷史最悠久、規模最大的廠房。

許多專家表示,這是教科書的範例,充分說明了在選擇服務提供者時把所有的雞蛋放在同一個籃子裡的風險 - AWS 是一家巨頭,有數百萬企業依賴它。

他們說得對,但問題在於,像 AWS 這樣規模龐大的雲端服務供應商,幾乎沒有其他替代方案。

事實上,只有兩家主要的競爭對手,而且都是美國其他巨頭:微軟的 Azure 和 Google 的雲端平台。

規模較小的競爭對手包括 IBM 和中國公司阿里巴巴。超市 Lidl 的母公司去年推出了一個名為 Stackit 的歐洲競爭對手,與亞馬遜直接競爭。

AWS 仍以一定優勢佔據主導地位。

有些人認為,英國和歐洲迫切需要建立自己的基礎設施,減少對美國雲端服務的依賴 - 而有些人則認為這為時已晚。

一位政府工作人員曾告訴我,一位議員非正式地提議創建一個英國版的AWS。

「但這有什麼意義呢?」對方回答。「我們已經有AWS了,就在那邊。」

或許,像週一這樣的大規模服務中斷事件凸顯了事情並非如此簡單。

So AWS's Monday outage hit over 1,000 companies and affected millions of internet users. AWS is a US cloud services giant, and outside the US, some people have argued that the UK and Europe urgently need to build up their own infrastructure and be less reliant on the US for cloud services. Apparently, demand for internet services is growing, and their security and reliability are a major concern to many people.
