General Information
This page lists announcements and status messages for IT services managed by ISG.EE.
For notifications and announcements of central IT services managed by ID, please visit https://www.ethz.ch/services/de/it-services/service-desk.html
For a detailed status overview of central IT services managed by ID, please visit https://ueberwachung.ethz.ch
Status-Key
- Green: Resolved
- Orange: Still working but with some errors
- Red: Pending
Current status reports
Downtime of various D-ITET services for server maintenance
Status: Pending
- 2021-04-27 08:00
- Database upgrade done and online.
- 2021-04-27 07:30
- Slurm services are back online.
- 2021-04-27 07:00
- Base system has been upgraded; the upgrade of the main database services is in progress.
- 2021-04-27 06:00
- On 2021-04-27 between 06:00 and 08:30 ISG is going to update a server providing access to various D-ITET services. During the migration the following services will be affected and offline:
- Matrix/Element Chat services (the instances will be unavailable)
- IFA/Control Website: Access to the IFA database is blocked
- Slurm (D-ITET Arton Cluster): It won't be possible to submit new jobs or view Slurm statistics. Already running jobs will not be affected.
- Condor: Condor clients will be shut down the evening before to avoid running jobs during the migration.
Network disruption affecting several ISG.EE services
Status: Resolved
- 2021-03-31 09:30
- The configuration error was found. The configuration change will be deployed on 2021-04-01 around 06:15, and a short network interruption of about 1 min is expected.
- 2021-03-31 08:00
- The ID networking team has rolled back a deployed configuration, pending further investigation and analysis.
- 2021-03-31 07:30
- There is currently a disruption affecting a VPZ with servers managed by ISG.EE. The ID networking team is investigating the issue. Several ISG.EE services are affected or malfunctioning because of this, in particular the FindYourData service.
login.ee.ethz.ch: downtime for server upgrade
Status: Resolved
- 2021-03-11 06:30
- Upgrade completed and service is up and running again.
- 2021-03-11 06:00
- The server behind login.ee.ethz.ch will be upgraded to a new OS version (Debian buster). Logins might not be possible during the upgrade.
Planned project/archive storage downtime and client reboot
Status: Resolved
- 2020-07-11 12:00
- Migration has been completed; all services are operational again.
- 2020-07-11 08:00
- Migration started; services are shut down.
- 2020-07-11 08:00-12:00
- Start of planned maintenance work. Project/archive storage services (known under the names "ocean", "bluebay", "lagoon" and "benderstor") will not be available. ISG-managed Linux clients will be rebooted.
svn.ee.ethz.ch downtime for server upgrade
Status: Resolved
- 2020-06-04 07:05
- Web services for managing SVN repositories are enabled.
- 2020-06-04 06:15
- System upgrade is done, and access to the SVN repositories via the svn and https transport protocols is back online.
- 2020-06-04 06:00
- The server hosting the SVN repositories will be upgraded to a new operating system version. During this timeframe, interruptions in access to the SVN repositories are expected.
European HPC cluster abuse
Status: Resolved
Recently European HPC clusters have been attacked and abused for mining purposes. The D-ITET Slurm and SGE clusters have not been compromised. We are monitoring the situation closely.
- 2020-05-17 08:30
- No successful logins from known attacker IP addresses were found, and none of the files that indicate a compromise were found on our file systems.
- 2020-05-16 14:30
- No unusual cluster job activity was observed.
D-ITET Netscratch downtime for server upgrade
Status: Resolved
- 2020-05-04 06:00
- Server upgrade has been completed.
- 2020-05-04 06:00
- The server hosting the D-ITET Netscratch service will be upgraded to a new operating system version. During this timeframe, outages of the NFS service are expected.
Network outage ETx router
Status: Resolved
- 2020-04-07 05:30
- There was an issue on the router rou-etx. The ID networking team tackled and solved the issue. There was an interruption of about 10 min for the ETx networking zone, affecting almost all systems maintained by ISG.EE.
login.ee.ethz.ch: Reboot for maintenance
Status: Resolved
- 2020-04-06 05:35
- The system behind login.ee.ethz.ch has been rebooted for maintenance and to increase available resources.
See the information on accessing D-ITET resources remotely. To distribute the load better, users are encouraged to use the VPN service whenever possible.
itet-stor (FindYourData) Server maintenance: Reconfiguration of VM parameters
Status: Resolved
- 2020-02-18 19:03
- System is up and running again.
- 2020-02-18 19:00
- Scheduled downtime for the itet-stor/FindYourData service due to maintenance work on the underlying server.
itet-stor (FindYourData) Server migration: New operating system version
Status: Resolved
- 2020-01-20 07:15
- OS upgrade done; there were short interruptions to the itet-stor/FindYourData service.
- 2020-01-20 06:00
- We will upgrade the server hosting the FindYourData service from Debian jessie 8 to Debian stretch 9. There will be short downtimes when accessing this service during the upgrade.
Archived status reports
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019