Luna Node Status

Information about upcoming maintenance, current downtime, and RFO for past downtime will be posted here. Updates more than six months old may be purged.

Please contact support@lunanode.com for technical support inquiries.

Planned maintenance / current issues

There is no planned maintenance and there are no ongoing issues at this time.

Resolved issues

Montreal planned network maintenance (08 May 2020 05:30 EDT)

Update 3 (08 May 2020 05:40 EDT): services are back online after five minutes.

Update 2 (08 May 2020 05:35 EDT): the network is offline for the maintenance.

Update 1 (01 May 2020 22:00 EDT): our datacenter provider in Montreal (OVH BHS) plans network maintenance affecting one Montreal SSD hypervisor (de6e4fa83fdc) on 08 May 2020 05:30 EDT. The maintenance affects network equipment only, meaning VMs will remain online but will be inaccessible over the network during the maintenance, which is expected to last up to 15 minutes. Please check http://travaux.ovh.net/?do=details&id=44377 for more details/updates.

Montreal planned maintenance (29 April 2020 23:00 EDT)

Update 3 (29 April 2020 23:21 EDT): after the reboot this hypervisor looks stable; all is good.

Update 2 (29 April 2020 23:10 EDT): services are back online after eight minutes. We will verify shortly that the errors are gone.

Update 1 (29 April 2020 18:00 EDT): customers have been notified of emergency planned maintenance tonight involving a reboot of one Montreal SSD hypervisor (55d26960353e). The reboot is necessary due to errors indicating a kernel issue.

Toronto hypervisor downtime (26 April 2020 17:30 EDT)

Update 1 (26 April 2020 17:45 EDT): services are back online.

One Toronto SSD hypervisor is offline. We are investigating.

Montreal network downtime (24 April 2020 15:45 EDT)

Update 1 (24 April 2020 15:55 EDT): the datacenter has fixed the issue and services are back online.

One Montreal SSD hypervisor is experiencing network downtime due to a datacenter vRack issue (OVH).

Roubaix hypervisor downtime (13 April 2020 16:45 EDT)

Update 3 (20 April 2020 16:00 EDT): we have not seen any further issues recently. We believe the reboot and updated kernel have resolved the problem.

Update 2 (13 April 2020 17:15 EDT): services are back online.

Update 1 (13 April 2020 16:58 EDT): the vRack is not up after the reboot. We are attempting a hard reboot.

One Roubaix SSD hypervisor is being taken offline for approximately ten minutes, from 16:45 to 16:55 EDT, for emergency maintenance to correct a kernel issue causing high latency and dropped packets.

Toronto hypervisor issues (07 April 2020 03:00 EDT)

Update 4 (10 April 2020 15:00 EDT): we do not see any further issues.

Update 3 (09 April 2020 13:25 EDT): services are back online at this time. We continue to monitor the hypervisor, but we believe the issues should be resolved.

Update 2 (09 April 2020 13:00 EDT): we continue to see intermittent issues. We are rebooting the hypervisor now.

Update 1 (09 April 2020 08:40 EDT): we have not seen the issue (brief loss of network connectivity and lockup of VMs) recur since our last intervention, which was to restart the hypervisor software. It is not clear why the hypervisor software would have caused this problem, so we continue to monitor the hypervisor.

We see intermittent issues on hypervisor fe77de27d9fb in Toronto (you can check your VM's hypervisor ID by selecting the VM and looking at the ID next to "Hypervisor"). We are investigating the issues. Please open a ticket if you would like your VM migrated to a different hypervisor.

Toronto network issue (09 April 2020 00:00 EDT)

Update 2 (09 April 2020 08:40 EDT): our team could not identify the issue last night; switching to the backup router and replacing the fiber module did not help. The packet loss is now gone, so it must have been a datacenter issue.

Update 1 (09 April 2020 01:23 EDT): we still see 1% packet loss in Toronto. It does not appear to be a datacenter-wide issue. Our team is on-site and still investigating. We may switch to the backup router or perform other similar actions that may cause brief network disruptions (less than one minute).

We observe packet loss in Toronto. We are investigating.

Roubaix network downtime (30 March 2020 11:10 EDT)

Update 1 (30 March 2020 11:43 EDT): services are back online at this time. The cause was a datacenter (OVH) issue.

We see network downtime at this time due to a datacenter issue.

Toronto planned maintenance (12 March 2020 00:00 EDT)

There may be 30-60 minutes of network downtime between 00:00 EDT and 03:00 EDT in Toronto due to Cogent planned network maintenance.

Roubaix hypervisor downtime (17 February 2020 23:40 EST)

Update 1 (18 February 2020 00:10 EST): services are back online at this time.

We are conducting emergency maintenance related to CPU cooling on one Roubaix hypervisor.

Montreal hypervisor network downtime (5 February 2020 15:45 EST)

Update 2 (5 February 2020 16:25 EST): services are back online at this time. The cause was a datacenter (OVH) issue.

Update 1 (5 February 2020 16:10 EST): we expect resolution will require a reboot of this server and may take 30 minutes.

Internal and external networking for VMs on one Montreal hypervisor is offline due to a datacenter issue. We are in communication with OVH to resolve the problem.

Montreal hypervisor downtime (14 January 2020 00:10 EST)

Update 1 (14 January 2020 00:20 EST): services are back online at this time.

We are investigating downtime on one HDD hypervisor in Montreal.

Toronto hypervisor downtime (15 November 2019 00:25 EST)

Update 1 (15 November 2019 00:33 EST): services are back online at this time.

One SSD hypervisor in Toronto went offline. We are investigating and resolving the issue now.

Toronto network downtime (9 November 2019 00:20 EST)

Update 1 (9 November 2019 00:30 EST): services are back online at this time. There was a kernel panic on one hypervisor in Toronto while we were installing new equipment; it is unclear whether these events are related.

We are working to resolve network downtime in Toronto affecting some customers.

Toronto hypervisor downtime (10 October 2019 06:50 EDT)

One SSD hypervisor was offline from 06:50 to 07:10 EDT due to a kernel lockup. Services are back online at this time.

Toronto hypervisor downtime (29 September 2019 10:40 EDT)

Update 2 (29 September 2019 11:55 EDT): services are back online. Further downtime is not required: we upgraded the firmware and it is now working.

Update 1 (29 September 2019 11:10 EDT): the RAID controller has failed. We need to swap the disks to another server. We expect four to twelve hours of downtime.

We are investigating downtime on one SSD hypervisor in Toronto. There appears to be a hardware failure, so downtime may be extended. We will post updates.

Toronto hypervisor downtime (12 August 2019 16:20 EDT)

Update 5 (12 August 2019 19:05 EDT): all services are back online now.

Update 4 (12 August 2019 18:50 EDT): the backup is done; we are now booting the node after running fdisk.

Update 3 (12 August 2019 18:15 EDT): about 15 more minutes.

Update 2 (12 August 2019 17:26 EDT): we are backing up all VM disk images before running fdisk to ensure data integrity in case something goes wrong. This will take 30 minutes.

Update 1 (12 August 2019 16:33 EDT): there is a filesystem issue on the hypervisor; we are looking into it.

We are investigating downtime on one SSD hypervisor in Toronto. dynamic.lunanode.com is affected as well.

Roubaix network outage (27 July 2019 02:30 EDT)

Update 2 (27 July 2019 07:55 EDT): the issue is resolved at this time. The datacenter resolved the underlying issue earlier, but vRacks came back online sequentially, so network connectivity was not restored until 07:55 EDT. Services were down from 02:30 EDT to 07:55 EDT.

Update 1 (27 July 2019 02:56 EDT): the datacenter will post updates on their status page.

The Roubaix datacenter is currently having a network outage on their vRack infrastructure, resulting in a service outage for our clients in this region.

Montreal hypervisor downtime (05 July 2019 04:01 EDT)

Update 7 (06 July 2019 10:46 EDT): the remaining failed disk has now been replaced and the RAID 10 array has been rebuilt. VMs were offline on 05 July 2019 from 04:01 EDT to 13:20 EDT. Customers have been compensated 55% of the monthly cost for affected virtual machines; see the "Account Credit for Montreal Hypervisor Downtime" e-mail for details. Volume-backed VMs (where the volume is the root partition) were not affected, as they were evacuated to other hypervisors shortly after the hardware failure.

Update 6 (05 July 2019 13:30 EDT): a power supply failure led to the failure of multiple components (motherboard, CPU, RAM). The faulty components have been replaced and the server is currently back online. An HDD for the OS partition (not relevant to customer disk images) remains faulty and requires replacement; we will schedule this later.

Update 5 (05 July 2019 12:50 EDT): the issue is still being investigated. The next expected update from datacenter staff is at 13:30 EDT.

Update 4 (05 July 2019 10:30 EDT): the datacenter is still investigating the issue and we continue to wait for updates. The technician advises that they are missing some replacement hardware components and are looking for them.

Update 3 (05 July 2019 07:00 EDT): we are still waiting for resolution of the hardware issue. The latest update from the datacenter technician is that they have fixed one component and are verifying that other components were not damaged.

Update 2 (05 July 2019 04:40 EDT): the ETA is 40 minutes until they are able to investigate. This appears to be a hardware issue.

Update 1 (05 July 2019 04:27 EDT): we contacted the upstream provider (OVH) and they are investigating now.

We are investigating a fault in one Montreal SSD hypervisor. VMs on this hypervisor are offline.

Toronto network downtime (23 April 2019 03:00 EDT)

Update 11 (23 April 2019 06:15 EDT): Circuits are now back online. A combination of hardware failure and software misconfiguration at our upstream provider resulted in extended downtime for the Toronto region. We will continue to monitor the situation and follow up as needed.

Update 10 (23 April 2019 06:00 EDT): Upstream technician still working on the issue.

Update 9 (23 April 2019 05:45 EDT): Upstream technician still working on the issue.

Update 8 (23 April 2019 05:30 EDT): Upstream technician still working on the issue.

Update 7 (23 April 2019 05:15 EDT): Upstream technician still working on the issue.

Update 6 (23 April 2019 05:00 EDT): Upstream technician still working on the issue.

Update 5 (23 April 2019 04:45 EDT): Upstream technician still working on the issue.

Update 4 (23 April 2019 04:30 EDT): Upstream technician still onsite investigating.

Update 3 (23 April 2019 04:15 EDT): Upstream situation unchanged; we are heading to the datacenter to monitor the situation and to be physically present if anything is needed from our end.

Update 2 (23 April 2019 04:00 EDT): Upstream provider still investigating the issue.

Update 1 (23 April 2019 03:45 EDT): We contacted the upstream provider; it appears they ran into a complication and a technician has been dispatched to investigate the issue.

Our upstream provider is performing a scheduled upgrade to the core router and the network is affected. We will post updates once we have more information.

Toronto network downtime (20 April 2019 20:30 EDT)

Update 1 (20 April 2019 23:00 EDT): the affected VMs should be online at this time. VMs on virtual networks that were utilizing the failed router node were down for an extended period. In addition to the router node failure, the OpenStack controller node went offline, which prevented quick fallback to another router node. While the failed router node came back online quickly, we were not able to bring the controller server up for some time, which led to this extended downtime.

We are working to resolve a network outage after a router node failed in Toronto.

Roubaix network downtime (18 March 2019 17:20 EDT)

We are investigating packet loss affecting SSD virtual machines in Roubaix.

Toronto network downtime (16 February 2019 13:30 EST)

Several VMs had loss of network connectivity due to a loose cable while we were installing new equipment. Connectivity was restored at 13:55 EST.

Toronto network downtime (23 January 2019 15:18 EST)

Some VMs had loss of network connectivity due to a hardware failure on one of our three network nodes. After migrating the affected virtual networks to the two other nodes, services were back online at 15:23 EST.

Roubaix network downtime (16 January 2019 22:14 EST)

Update 1 (23:55 EST): the Roubaix network was down from 22:10 EST to 22:37 EST due to a datacenter-wide outage incident (see details here).

We are currently investigating network downtime in Roubaix.