Website and Res System Down

http://www.datacenterknowledge.com/archives/2011/06/13/us-airways-cites-data-center-in-systems-outage/
“Early reports indicate that the systems outage was the result of a power outage near one of our data centers in Phoenix,” US Airways said in a statement. IP records show US Airways’ web site is hosted by AT&T, which operates a data center in the Phoenix market.



Data centers are designed to supply backup power to keep computer systems operating in the event of a utility outage. It’s not immediately clear why the utility outage caused downtime in the U.S. Airways data center.
 
There are multiple industrial backup generators at the Tempe data center, and redundancies across the server and infrastructure platforms, including the AT&T data services. The backup equipment is there for loss-of-power and fail-safe protection. Whatever the root cause of the outage may be, it wasn't because redundancies weren't funded or planned into the infrastructure. This was either a highly irregular event that normal disaster recovery plans failed to identify, or else someone failed to do their job properly.
 
Thanks, that's good to know. Just curious do you know if any data was lost or security was breached?
 
Sorry - I wouldn't know about that, but it seems unlikely that either of those would have occurred because of a lightning strike or facility fire. Just guessing, though.
 
it wasn't because redundancies weren't funded or planned into the infrastructure.
Undoubtedly, the world has at its disposal the tools to overcome this situation, given the odds of it occurring and the backlash when it does.
To what extent? Could/should more have been done?
You think?
What are we talking about here? Your escape hatch sounds all too familiar; maybe some oversight and investigation should be involved?

Who was or was not paying attention? Who was not monitoring? Who is going to pay? Who and what is involved in the backup setup?
I'm asking since you spoke to the subject as if you are in the know.
 
What I have knowledge of, as a visual observer, is that the Tempe data center has backup generators, disaster recovery processes and procedures, and redundancies in critical infrastructure components. I posted these personally observed facts to refute the false claims that this problem was proof that US Airways doesn't follow best practices for data center power backup and disaster recovery scenarios. I don't know what the lightning/fire did to temporarily knock down the systems. I further don't know whether a person or a system failed to perform as it should have in response to the catastrophic event, or whether that person or system operates under the direction of USAIT, AT&T, HP/EDS, or some other entity. Once the root cause has been identified, it is a near-certainty that remediation steps will be taken to prevent a recurrence.

Complex computer and network infrastructure technologies are described as "infinite-state" systems, meaning there is a virtually limitless number of scenarios and permutations that can affect their normal operation. Fighting to maintain normal, stable operation of complex systems is a battle that is lost when an unforeseen or highly improbable event takes place. That being said, if someone failed to do the job they were hired to do, they should be held accountable and suffer the consequences for the outage if it could have and should have been prevented - IMO.
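To put a rough number on that idea, here is a toy Python calculation (every figure below is made up for illustration, nothing to do with the actual Tempe site) showing how quickly the possible component-state combinations blow up, which is why no disaster recovery runbook can enumerate every scenario:

```python
# Toy illustration of the "infinite-state" point above.
# All numbers are hypothetical, picked only to show the scale of the problem.

components = 200          # servers, switches, PDUs, chillers, generators, etc.
states_per_component = 4  # e.g. ok / degraded / failed / misconfigured

total_states = states_per_component ** components
print(f"Possible system-wide state combinations: {total_states:.3e}")
# Roughly 2.6e120, far more than any DR plan could ever enumerate or test.
```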
 
I've worked with many Fortune 100 companies and their very large, global implementations of a particularly common ERP platform that essentially runs the entire business, from Finance to Warehouse.

Every single one of them had not only disaster recovery procedures in place for internal data center failures (like a fire or lightning strike), they also had OFFSITE backup data centers that synchronously or asynchronously (depending on distance) replicated data. This was to address the question: what if the data center is flooded or otherwise destroyed in a natural disaster? Answer: flip over to site B and you'll be current up to the last transaction before the outage at site A.
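For anyone curious what that looks like, here is a minimal Python sketch of the idea. It is not any vendor's actual replication API, just a toy model of a primary site and a replica under synchronous vs. asynchronous shipping, with a failover to site B:

```python
# Toy model of site-A / site-B replication and failover.

class Site:
    def __init__(self, name):
        self.name = name
        self.log = []                # committed transactions, in order

    def commit(self, txn):
        self.log.append(txn)


class ReplicatedPair:
    def __init__(self, synchronous=True):
        self.primary = Site("site-A")
        self.replica = Site("site-B")
        self.synchronous = synchronous
        self.pending = []            # txns not yet shipped (async mode only)

    def write(self, txn):
        self.primary.commit(txn)
        if self.synchronous:
            # Synchronous: the write isn't acknowledged until the replica
            # has it, so site-B is never behind site-A.
            self.replica.commit(txn)
        else:
            # Asynchronous: ship later; a disaster at site-A can lose
            # whatever is still sitting in 'pending'.
            self.pending.append(txn)

    def ship_pending(self):
        for txn in self.pending:
            self.replica.commit(txn)
        self.pending.clear()

    def failover(self):
        # "Flip over to site B": the replica becomes the active site,
        # current up to the last transaction it actually received.
        return self.replica


pair = ReplicatedPair(synchronous=True)
for i in range(3):
    pair.write(f"booking-{i}")

active = pair.failover()
print(active.name, active.log)   # site-B ['booking-0', 'booking-1', 'booking-2']
```

The distance caveat in the post above is the real trade-off: synchronous shipping means zero data loss but adds latency to every write, which is why far-apart sites usually run asynchronously and accept a small window of loss.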

After this past week's issue, the question is how much money was lost. Was it more, or less, than the cost to build, implement, and maintain an off-site data center?
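A back-of-envelope way to frame that question; every number below is a hypothetical placeholder, not an actual US Airways, AT&T, or industry figure:

```python
# Hypothetical break-even comparison: one big outage vs. an off-site DC.

outage_hours = 6
lost_revenue_per_hour = 250_000        # hypothetical
recovery_and_goodwill_cost = 500_000   # hypothetical rebooking, vouchers, overtime

one_outage_cost = outage_hours * lost_revenue_per_hour + recovery_and_goodwill_cost

offsite_dc_build_cost = 10_000_000     # hypothetical one-time build-out
offsite_dc_annual_cost = 1_500_000     # hypothetical staff, power, circuits, licenses
years = 10

offsite_dc_cost = offsite_dc_build_cost + years * offsite_dc_annual_cost

print(f"Cost of one outage like this: ${one_outage_cost:,}")
print(f"Cost of an off-site DC over {years} years: ${offsite_dc_cost:,}")
# Whether site B "pays for itself" depends on how many outages of this size
# it would actually prevent over its lifetime.
```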
 
Some years ago, there was a documentary showing Sabre's mainframe and backup systems. (Surprising that they let it be shown.) The protections are extensive indeed. (Not a bad idea in Tornado Alley.)
 
This is what remains a mystery to me. I'm not hugely knowledgeable about mainframes and their applications, but I'm not a nitwit either. These systems operate in real time, same as a bank's; the backup, redundancies, and security are similar. So I'm having a hard time understanding how this happened, even with a lightning strike.
 
