NY Connectivity issues Monday 10th July 2023 17:54:00


We are investigating connectivity issues for our New York location now. We will update this posting as we have more information.

The switch stack for series srv54xx servers has been repaired, which makes New York fully operational at this point.

If you're noticing an issue with your server or colo, please reach out with details and we can investigate further.

We have been granted access to our pod. As a reminder, please check your server and submit a ticket if you still need assistance after checking it. Please see the previous posting for specific instructions on checking your server and ticketing: https://status.dedicated.com/incidents/44#update-122

We are ready to work per-device colo and rented server issues. There are thousands of rented servers and colo devices in the New York deployment, so it is not possible for us to check each of them individually.

Please refrain from opening "my server is down" tickets without further information. Please check your server first. If your server is unreachable, please open a ticket that includes your server's KVM console, what you're currently seeing on that console, and the investigation steps you've already taken.

Tickets opened as "my server is down" full stop, with no further information, will be deprioritized against tickets showing that you've checked your server and it's still unreachable.
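For reference, here is a minimal sketch of the kind of pre-ticket reachability check described above. The hostname, ports, and ping flags below are placeholders and assumptions, not specific to your server; adapt them to your own setup and OS. This only confirms network reachability; the console itself still needs to be inspected via KVM before you open a ticket.

```python
#!/usr/bin/env python3
"""Minimal pre-ticket connectivity check (illustrative sketch only)."""
import socket
import subprocess

SERVER = "srv5401.example.net"  # placeholder -- replace with your server's hostname or IP
PORTS = [22, 80, 443]           # adjust to the services you actually run

def ping(host: str) -> bool:
    """Return True if the host answers a single ICMP echo request (Linux-style ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(f"ICMP ping to {SERVER}: {'ok' if ping(SERVER) else 'FAILED'}")
    for port in PORTS:
        status = "open" if port_open(SERVER, port) else "unreachable"
        print(f"TCP {SERVER}:{port}: {status}")
    # Paste this output, plus what the KVM console shows, into your ticket.
```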

Reminder: servers in series srv54xx are still unreachable due to the switch issue. Remotely, we've determined that the switch has corrupted both its primary and backup OS images, and it will need to be restored once our team is able to gain access to the building, which is still in progress. Please don't open tickets for servers in series srv54xx at this time; we are aware and will post an update for them soon.

Please refrain from opening "my server is down" tickets. If you're an enterprise customer, we've reached out to you to confirm that your stack is up. We'll update this posting when we're ready to work on rented server and single-server colo outages.

We are still awaiting building access. Pod cams show no fire damage.

No Damage

Most things have come online by themselves after our pod was energized. We have reached out to our enterprise customers to have them confirm any power or network issues. We are working to confirm rented servers are online.

Currently known issues:

- Servers in series srv54xx are still offline as the switch did not come online. We are investigating.
- The server control system remains inaccessible to end users while we confirm the scope of what may still have issues.

Our pod has been energized. Things are coming online. We're still working on site access. We'll post more info as this progresses.

DC Update:

Our onsite team is currently bringing our UPS Systems online. We now have UPS-4 and UPS-R online. While bringing up UPS-1, we ran into an issue and are unable to bring it online. We have engaged our vendor and are working on a workaround to deliver power downstream to customer cabinets.

Due to the issue with UPS-1, we will not be automatically powering up all customer cabinets. We will be reaching out to let you know if you are part of the group of customers on unprotected/non-UPS-backed utility power so that you can decide whether to energize your cabinets at that time.

We are currently at 50% completion toward bringing the site back online, and the revised ETA for bringing up the critical infrastructure systems is approximately 3 hours. The current time frame for when clients will be able to come back onsite is approximately 10:30PM EDT.

We are currently sourcing materials to bring our fire system fully online and do not have an ETA for completion. Because of this and fire marshal compliance, we will only be allowed to have supervised, escorted customer access once we finish bringing up the critical infrastructure systems. We will have additional personnel onsite to assist us with this escort policy.

We are not part of UPS-1.

DC Update:

Our onsite team is currently bringing our UPS Systems online. We have our UPS vendor onsite assisting us with this. We have brought UPS-4 online. We resolved our issues with UPS-R, have brought it online, and are currently charging its associated battery system. We will now begin work on bringing UPS-1 online. After bringing the UPS Systems online, we will then fully transfer load to our downstream power distribution, which will enable us to start powering up individual customer cabinets.

We are currently at 40% completion toward bringing the site back online, and the revised ETA for bringing up the critical infrastructure systems is approximately 4 hours. We are still planning for an evening time frame when clients will be able to come back on site.

We are currently sourcing materials to bring our fire system fully online and do not have an ETA for completion. Because of this and fire marshal compliance, we will only be allowed to have supervised, escorted customer access once we finish bringing up the critical infrastructure systems.

DC Update:

Our onsite team is currently bringing our UPS Systems online. We have our UPS vendor onsite assisting us with this. We have brought UPS-4 online and are currently charging its associated battery system. While bringing UPS-R online, we have run into a minor issue that we are currently investigating.

We are currently at 35% completion toward bringing the site back online, and the revised ETA for bringing up the critical infrastructure systems is approximately 5 hours. We are still planning for an evening time frame when clients will be able to come back on site.

In parallel with re-energizing our power systems, we have been working on our fire system as well. We are currently sourcing materials to bring our fire system fully online and do not have an ETA for completion. Because of this and fire marshal compliance, we will only be allowed to have supervised, escorted customer access once we finish bringing up the critical infrastructure systems. We are currently sourcing additional personnel to assist us with this escort policy.

DC update:

Our onsite team is currently bringing our UPS Systems online. We have our UPS vendor onsite assisting with this as we bring up UPS-4, followed by UPS-R.

As these systems are brought online, we will concurrently work on bringing carriers up.

We are currently at 30% completion toward bringing the site back online, and the revised ETA for bringing up the critical infrastructure systems is approximately 6 hours. We are still planning for an evening time frame when clients will be able to come back on site.

A reminder that we do not use the building's carrier blend, so our carrier turn-up will be different.

DC update:

Our onsite team has energized the primary electrical equipment that powers the site, enabling us to bring our mechanical plant online. We are currently cooling the facility.

As we monitor for stability, we are focused on bringing up our electrical systems. In starting this process, we have identified an issue with powering up our fire panel, as well as with power systems that were fed by UPS-3. While this will cause a delay, we are working with our vendors on remediation.

We are currently at 25% completion toward bringing the site back online, and the revised ETA for bringing up the critical infrastructure systems is approximately 7 hours. We are still planning for an evening time frame when clients will be able to come back on site. We will send out additional information regarding access to the facility and remote hands assistance, and we will notify you once client access to the facility is permitted.

When our team is permitted on-site, our plan is as follows: our COO will be on site waiting to enter the building as soon as possible, and will start powering everything up, beginning with the network edge/core, followed by racks in order of rack number.

Each rack will be verified as up and running before we move on to the next.

If you have a full rack with access, please be advised that access will not be permitted until everything is online. We appreciate that you may want your services brought online before others, but we are unable to permit such prioritization, and interference will only delay our ability to perform the work.

We fully understand the gravity of the situation, and everyone will be brought back online with maximum urgency.

Our team will remain on site for a period of time after everything is up to ensure that both the facility and our own services remain stable.

We will continue to post updates as we have them.

DC's hourly update:

We have completed the full site inspection with the fire marshal and the electrical inspector and utility power has been restored to the site.

We are now working to restore critical systems and our onsite team has energized the primary electrical equipment that powers the site. Concurrently, we are beginning work to bring the mechanical plant online. Additional engineers from other facilities are on site this morning to expedite site turn up.

The ETA for bringing up the critical infrastructure systems is approximately 5 hours.

We are planning for a late afternoon/early evening time frame when clients will be able to come back on site.

Datacenter update:

Our site inspection this morning went well, and we have been granted authorization to restore utility power to the site; we are currently working on re-energizing the facility. Our onsite team is working with the fire marshal and electrical inspectors to ensure electrical system safety as we prepare to bring utility power back.

Once that is completed, we will work towards bringing up our critical infrastructure systems. This will take approximately 5 hours.

While we are working on that, we will also be working on our fire/life safety systems as we need to replace some smoke detectors and have a full inspection of the fire system prior to allowing customers to enter the facility.

We will be sending out hourly updates as we make progress on bringing the facility back online.

Preliminary update from our CSM:

I heard the preliminary inspection is good and we are taking steps to energize the property now.

I’m waiting for the official update from DC Ops. More to come.

We'll post more info as soon as we have it, in addition to our power-up plan.

Below is the latest update:

The EWR Secaucus data center remains powered down at this time per the fire marshal. We are continuing with our cleanup efforts into the evening and working overnight as we make progress towards our 9AM EDT meeting time with the fire marshal and electrical inspectors in order to reinstate power at the site.

Once we receive approval and utility is restored, we will turn up critical systems. This will take approximately 5 hours. After the critical systems are restored, we will be turning up the carriers and then will start to turn the servers back on.

The fire marshal has requested replacement of the smoke detectors in the affected area as well as a full site inspection of the fire life safety system prior to allowing customers to enter the facility. Assuming that all goes as planned, the earliest that clients will be allowed back into the site to work on their equipment would be late in the day Wednesday.

We will notify you when client access to the facility has been approved. Please open a separate ticket per standard process to request additional badge access if needed prior to arrival.

This will be the last update of the day. We will provide further updates tomorrow after our site inspection.

Note that our process to turn up carriers and servers will be different, as we do not use the datacenter's carrier blend. We will share our plan once we have a firmer timeline for when we can expect to be allowed into the building to turn up our equipment.

We have received the following disappointing response from the datacenter:

The EWR Secaucus data center remains powered down at this time per the fire marshal.

We have just finished the meeting with the fire marshal, electrical inspectors, and our onsite management. We have made great progress cleaning; after reviewing it with the fire marshal, they have asked us to clean additional spaces and to replace some components of the fire system. They have set a time to come back and review these requests at 9am EDT Wednesday. We are working with these vendors to comply completely with the new requests and are bringing additional cleaning personnel onsite to meet the fire marshal's deadline.

In preparation for being able to allow clients onsite, the fire marshal has stated that we need to perform a full test of the fire/life safety systems which will be done after utility power has been restored and fire system components replaced. We have these vendors standing by for this work tomorrow.

Assuming that all goes as planned, the earliest that clients will be allowed back into the site to power up their servers would be late in the day Wednesday.

We are working to see what alternatives we have, if any. Again, please continue to engage your business continuity plans.

As we have not heard back about the results of the fire marshal/electric utility/DC ops meeting, we have pinged for an update.

Datacenter update:

The EWR Secaucus data center remains powered down at this time per the fire marshal.

Site management, the fire marshal, and electrical contractors are currently meeting to review the process of the cleaning effort to get approval from the fire marshal to re-energize the site.

Access update for our team from the datacenter:

The VP of DC Ops will be sending out instructions for re-entry to the site. If all goes as planned, it will be around 6:00/7:00PM. We need to re-energize the critical infra at the site and get it cooled down prior to giving customer access. This will take 4 to 5 hours, assuming the fire marshal gives the all clear.

Mid-day update from the datacenter:

The EWR Secaucus data center remains powered down at this time per the fire marshal. We continue to clean and ready the site for final approval by the fire marshal in order to re-energize the facility's critical equipment. Site management, the fire marshal, and electrical contractors will be meeting at 2PM EDT in an attempt to receive approval from the fire marshal to re-energize the site. We do not foresee any issues that would result in not receiving such approval.

Re-energizing critical equipment will take 4-5 hours. After this process, we will be energizing customer circuits and powering on all customer equipment. We will provide updates as to when customers will be allowed in the facility once approved by the fire marshal.

Addressing "is our data safe?" concerns: we haven't been able to actively make that determination yet.

Our understanding of the scope of the issue is that the fire was limited to a UPS in an electrical room, was extinguished almost immediately by on-site fire suppressant, and did not extend to the data halls. This means that data should indeed be safe; however, our team has not been permitted on site yet, and power remains off to the building.

Our expectation is that when power is restored, we should be able to power on all routers, switches, and servers normally, and that any data loss would be limited to what resulted from a hard power-off situation; but again, we cannot confirm this yet.

In the event that you need to restore data from off-site backups, we can reset your monthly bandwidth cap to ensure that you can pull down any data you need from your off-site backups.

We will continue to post updates as we have them.

Update from the datacenter below. Their message did not specify a next update time, so I've asked when we can expect to be updated.

Power remains off at our data center in Secaucus/EWR1 per the local fire marshal.

Current status update from DC Ops:

Our remediation vendor and our team have worked through the night to clean the UPS units at the request of the fire marshal. They have made significant progress, and we hope to have the cleaning completed by mid-day, at which time we will engage the fire marshal to review the site. Following their review, we hope to get a sign-off from them so that we can start the re-energizing process. The re-energizing process can take 4-5 hours, as we need to turn up the critical infrastructure prior to any servers.

Current status from the datacenter below. We've asked for that 8 AM EDT update, as the time frame has come and gone.

Power remains off at our data center in Secaucus/EWR1 per the local fire marshal.

After reviewing the site, the fire marshal is requiring that we extensively clean the UPS devices and rooms before they will allow us to re-energize the site. We have a vendor at the site currently who will be performing that cleanup. We will provide an update at 8:00AM EDT unless something significant changes overnight.

We will continue to provide updates as we receive them.

Statement from the datacenter itself:

Power remains off at our data center in Secaucus/EWR1 per the local fire marshal.

We have had an electrical failure with one of our redundant UPS units, which started to smoke and then resulted in a small fire in the UPS room. The fire department was dispatched and the fire was extinguished quickly. The fire department subsequently cut power to the entire data center and disabled our generators while they and the utility verify the electrical system. We have been working with both the fire department and the utility to expedite this process.

We are currently waiting on the fire marshal and local utility to reenergize the site. We are completely dependent upon their inspection and approval. We are hoping to get an update that we can share in the next hour.

At the current time, the fire department is controlling access to the building and we will not be able to let customers in.

We've received the update: an isolated fire in a UPS in an electrical room was detected and put out by fire suppression. The local fire department arrived on the scene and, per NEC guidelines and likely local laws and general best practices for firefighters, cut the power to the building. This caused the down -> up -> down cycle noted earlier today.

Current state is that datacenter electricians are on site, ready to perform repair work on the UPS, but are awaiting permission from the fire department to enter the building.

Once the electrical work is complete, power will be applied to HVAC to subcool the facility, which will take an estimated 3-4 hours. At that point, power will be restored to the data halls, which will bring our network and servers back online.

The datacenter manager gave a best-case ETA of tomorrow morning, July 11th, for power to be restored to data halls. Again, please engage your business continuity plans as this does remain an open-ended outage outside of our control.

We will post more updates as we have them.

We expect to be updated within the hour. We will pass on information we receive.

This situation remains in progress. We will post updates as we have them.

We've been informed that an electrical room experienced a fire, which was put out by retardant, and that the datacenter is in emergency power-off status at the requirement of on-site firefighters. Our COO is on site, but we are presently unable to access the building. We are working to learn more. We do not have further information at this time.

As this outage does not have a clearly defined resolution and is outside of our control at this time, please execute your business continuity plans accordingly.

We will continue to post updates here as we have them.

The datacenter is experiencing a fire and is in emergency power off. We don't have further information at this time. We'll update this posting as we have more information.

We are seeing this potentially recur. We are continuing to investigate and are engaging the datacenter to understand the situation.

All servers should be back online with the exception of NY servers in series srv54xx. We are working on this server series now. We will post more information as we have it.

We seem to have experienced some kind of DC-wide power event. We are still investigating. We will post updates here as we have them.