Website and Res System Down

http://www.datacenterknowledge.com/archives/2011/06/13/us-airways-cites-data-center-in-systems-outage/
“Early reports indicate that the systems outage was the result of a power outage near one of our data centers in Phoenix,” US Airways said in a statement. IP records show US Airways’ web site is hosted by AT&T, which operates a data center in the Phoenix market.



Data centers are designed to supply backup power to keep computer systems operating in the event of a utility outage. It’s not immediately clear why the utility outage caused downtime in the U.S. Airways data center.
 
There are multiple industrial backup generators at the Tempe data center, and redundancies across the server and infrastructure platforms, including the AT&T data services. The backup equipment is there for loss-of-power and fail-safe protection. Whatever the root cause of the outage may be, it wasn't because redundancies weren't funded or planned into the infrastructure. This was either a highly irregular event that normal disaster recovery plans failed to identify, or else someone failed to do their job properly.
 
Thanks, that's good to know. Just curious do you know if any data was lost or security was breached?
 
Sorry - I wouldn't know about that, but it seems unlikely that either of those would have occurred because of a lightning strike or facility fire. Just guessing, though.
 
it wasn't because redundancies weren't funded or planned into the infrastructure.
Undoubtedly, the world has at its disposal the tools to overcome this situation, given the odds of it occurring and the backlash when it does.
To what extent? Could/should more have been done?
You think?
What are we talking about here? Your escape hatch sounds all too familiar; maybe some oversight and investigation should be involved?

Who was or was not paying attention? Who was not monitoring? Who is going to pay? Who and what is involved in the backup setup?
I'm asking since you spoke to the subject as if you are in the know.
 
What I have knowledge of, as a visual observer, is that the Tempe data center has backup generators, disaster recovery processes and procedures, and redundancies in critical infrastructure components. I posted these personally observed facts to refute the false claims that this problem was proof that US Airways doesn't follow best practices for data center power backup and disaster recovery scenarios. I don't know what the lightning/fire did to temporarily knock down the systems. I further don't know whether a person or a system failed to perform as it should have in response to the catastrophic event, or whether that person or system operates under the direction of USAIT, AT&T, HP/EDS, or some other entity. Once the root cause has been identified, it is a near-certainty that remediation steps will be taken to prevent a recurrence.

Complex computer and network infrastructure technologies are described as "infinite-state" systems, meaning there is a virtually limitless number of scenarios and permutations that can affect their normal operation. Fighting to maintain normal, stable operation of complex systems is a battle that is lost when an unforeseen or highly improbable event takes place. That being said, if someone failed to do the job they were hired to do, they should be held accountable and suffer the consequences for the outage if it could have and should have been prevented - IMO.
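To put a rough number on that idea, here is a toy Python calculation (every figure below is made up for illustration, nothing to do with the actual Tempe site) showing how quickly the possible component-state combinations blow up, which is why no disaster recovery runbook can enumerate every scenario:

```python
# Toy illustration of the "infinite-state" point above.
# All numbers are hypothetical, picked only to show the scale of the problem.

components = 200          # servers, switches, PDUs, chillers, generators, etc.
states_per_component = 4  # e.g. ok / degraded / failed / misconfigured

total_states = states_per_component ** components
print(f"Possible system-wide state combinations: {total_states:.3e}")
# Roughly 2.6e120, far more than any DR plan could ever enumerate or test.
```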
 
I've worked with many Fortune 100 companies and their very large, global implementations of a particularly common ERP platform that essentially runs the entire business, from Finance to Warehouse.

Every single one of them had not only disaster recovery procedures in place for internal data center failures (like a fire or lightning strike), they also had OFFSITE backup data centers that synchronously or asynchronously (depending on distance) replicated data. This was to address the question: what if the data center is flooded or otherwise destroyed in a natural disaster? Answer: flip over to site B and you'll be current up to the last transaction before the outage at site A.
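For anyone curious what that looks like, here is a minimal Python sketch of the idea. It is not any vendor's actual replication API, just a toy model of a primary site and a replica under synchronous vs. asynchronous shipping, with a failover to site B:

```python
# Toy model of site-A / site-B replication and failover.

class Site:
    def __init__(self, name):
        self.name = name
        self.log = []                # committed transactions, in order

    def commit(self, txn):
        self.log.append(txn)


class ReplicatedPair:
    def __init__(self, synchronous=True):
        self.primary = Site("site-A")
        self.replica = Site("site-B")
        self.synchronous = synchronous
        self.pending = []            # txns not yet shipped (async mode only)

    def write(self, txn):
        self.primary.commit(txn)
        if self.synchronous:
            # Synchronous: the write isn't acknowledged until the replica
            # has it, so site-B is never behind site-A.
            self.replica.commit(txn)
        else:
            # Asynchronous: ship later; a disaster at site-A can lose
            # whatever is still sitting in 'pending'.
            self.pending.append(txn)

    def ship_pending(self):
        for txn in self.pending:
            self.replica.commit(txn)
        self.pending.clear()

    def failover(self):
        # "Flip over to site B": the replica becomes the active site,
        # current up to the last transaction it actually received.
        return self.replica


pair = ReplicatedPair(synchronous=True)
for i in range(3):
    pair.write(f"booking-{i}")

active = pair.failover()
print(active.name, active.log)   # site-B ['booking-0', 'booking-1', 'booking-2']
```

The distance caveat in the post above is the real trade-off: synchronous shipping means zero data loss but adds latency to every write, which is why far-apart sites usually run asynchronously and accept a small window of loss.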

After this past week's issue, the question is how much money was lost. Was it more, or less, than the cost to build, implement, and maintain an off-site data center?
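A back-of-envelope way to frame that question; every number below is a hypothetical placeholder, not an actual US Airways, AT&T, or industry figure:

```python
# Hypothetical break-even comparison: one big outage vs. an off-site DC.

outage_hours = 6
lost_revenue_per_hour = 250_000        # hypothetical
recovery_and_goodwill_cost = 500_000   # hypothetical rebooking, vouchers, overtime

one_outage_cost = outage_hours * lost_revenue_per_hour + recovery_and_goodwill_cost

offsite_dc_build_cost = 10_000_000     # hypothetical one-time build-out
offsite_dc_annual_cost = 1_500_000     # hypothetical staff, power, circuits, licenses
years = 10

offsite_dc_cost = offsite_dc_build_cost + years * offsite_dc_annual_cost

print(f"Cost of one outage like this: ${one_outage_cost:,}")
print(f"Cost of an off-site DC over {years} years: ${offsite_dc_cost:,}")
# Whether site B "pays for itself" depends on how many outages of this size
# it would actually prevent over its lifetime.
```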
 
Some years ago, there was a documentary showing Sabre's mainframe and backup systems. (Surprising that they let it be shown.) The protections are extensive indeed. (Not a bad idea in Tornado Alley.)
 
This is what remains a mystery to me. I'm not hugely knowledgeable about mainframes and their applications, but I'm not a nitwit either. These systems operate in real time, same as a bank's; the backup, redundancies, and security are similar. So I'm having a hard time understanding how this happened, even with a lightning strike.
 
