Differences between revisions 6 and 647 (spanning 641 versions)
Revision 6 as of 2010-11-15 15:13:32
Size: 603
Editor: bonaccos
Revision 647 as of 2022-07-07 11:47:13
Size: 9096
Editor: bonaccos
Line 1: Line 1:
= Status =
#rev 2018-08-27 mreimers
#rev 2020-08-31 alders
Line 3: Line 4:
<<Anchor(2010-11-15-servers-down)>>
= General Information =
 * This page lists announcements and status messages for IT services managed by [[http://www.isg.ee.ethz.ch/|ISG.EE]].
 * For notifications and announcements of central IT services managed by ID, please visit https://www.ethz.ch/services/de/it-services/service-desk.html
 * For a detailed status overview of central IT services managed by ID, please visit https://ueberwachung.ethz.ch
Line 5: Line 9:
== 2010-11-15: cooling water system outage for some clusters ==
||||<style="border-width: 1px 0px; border-color: rgb(85, 136, 238); padding: 0.6em;">'''Status-Key''' ||
||<style="border: medium none;"> {{attachment:Status/green.gif}} ||<style="border: medium none;">Resolved ||
||<style="border: medium none;"> {{attachment:Status/orange.gif}} ||<style="border: medium none;">Still working but with some errors ||
||<style="border-width: medium medium 1px; border-top: medium none rgb(85, 136, 238); border-left: medium none rgb(85, 136, 238); border-right: medium none rgb(85, 136, 238); border-color: rgb(85, 136, 238);"> {{attachment:Status/red.gif}} ||<style="border-width: medium medium 1px; border-top: medium none rgb(85, 136, 238); border-left: medium none rgb(85, 136, 238); border-right: medium none rgb(85, 136, 238); border-color: rgb(85, 136, 238);">Pending ||
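For scripts that read this page's markup, the status key above can be sketched as a small lookup. This is only an illustration under stated assumptions, not part of the wiki: the names `STATUS_KEY` and `status_of` are hypothetical, and the pattern assumes status lines keep the `{{attachment:Status/<colour>.gif}}` form used on this page.

```python
import re

# Meaning of each status icon, taken from the Status-Key table.
STATUS_KEY = {
    "green": "Resolved",
    "orange": "Still working but with some errors",
    "red": "Pending",
}

# Matches the icon in a line such as:
#   '''Status:''' {{attachment:Status/green.gif}}
_ICON = re.compile(r"\{\{attachment:Status/(\w+)\.gif\}\}")


def status_of(line):
    """Return the status meaning for a markup line, or None if no known icon is found."""
    match = _ICON.search(line)
    if match is None:
        return None
    return STATUS_KEY.get(match.group(1))
```

Unknown icon names map to `None` rather than raising, so a new colour added to the key later would not break a reader of older pages.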
Line 7: Line 14:
Last Friday evening, one of the cooling water pumps installed in ETZ/D/96.2 stopped working correctly. This forced some of the racks in this server room to shut down in order to protect the servers from thermal damage. All clusters from IFH, IBT, BIWI, TIK, IKT and VAW were affected.
= Current status reports =

<<Anchor(2022-07-09-netscratch-maintenance)>>
== NetScratch Server Filesystem Maintenance ==
'''Status:''' {{attachment:Status/orange.gif}}

  2022-07-09 13:00:: The [[Services/NetScratch|NetScratch]] filesystem will be put into read-only mode for maintenance.
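During such a maintenance window, a job script may want to check whether its scratch location is currently read-only before trying to write. The following is a minimal sketch, assuming a POSIX client; the function name `is_read_only` is hypothetical, and the mount path of NetScratch is not stated on this page, so it has to be supplied by the caller.

```python
import os


def is_read_only(path):
    """Return True if the filesystem containing `path` is mounted read-only (POSIX only)."""
    # f_flag holds the mount flags; ST_RDONLY is set for read-only mounts.
    flags = os.statvfs(path).f_flag
    return bool(flags & os.ST_RDONLY)
```

A caller would pass its own scratch directory, for example `is_read_only("/path/to/netscratch")` with the real mount point substituted. Note that this reports the mount flag seen by the client; a server that enforces read-only mode without changing the client's mount options would not be detected this way.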

<<Anchor(2022-06-16-exchange-support)>>
== Unreachable ee.ethz.ch Email recipients over ID Exchange Mailserver ==
'''Status:''' {{attachment:Status/green.gif}}

  2022-06-16 16:45:: Configuration issue has been resolved.
  2022-06-16 15:00:: Emails with ee.ethz.ch recipients sent over the ID Exchange Server do not reach their destination. ID Exchange Admins are working on fixing the problem.
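The timeline entries in these reports follow a fixed shape: indentation, a `YYYY-MM-DD HH:MM` timestamp, `::`, then the message. A minimal sketch of reading one entry, assuming that shape; the names `_ENTRY` and `parse_entry` are hypothetical and not part of the wiki:

```python
import re
from datetime import datetime

# A timeline entry is a definition-list item:
# leading spaces, a timestamp, "::", then the message.
_ENTRY = re.compile(r"^\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):: (.*)$")


def parse_entry(line):
    """Return (datetime, message) for a timeline entry, or None if the line does not match."""
    match = _ENTRY.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
    return when, match.group(2)
```

The pattern is deliberately strict, so headings, anchors, time ranges and malformed timestamps return `None` instead of being parsed into a wrong date.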

<<Anchor(2021-11-01-home-maintenance)>>
== HOME Server maintenance to repair filesystem inconsistency ==
'''Status:''' {{attachment:Status/green.gif}}

  2021-11-01 06:30:: System is back online and HOME directories are accessible again for all D-ITET users.
  2021-11-01 05:50:: HOME Server will be taken offline to start the repair of a filesystem inconsistency.

<<Anchor(2021-07-05-linux-printing)>>
== Linux printing affected by PrintNightmare vulnerability patch ==
'''Status:''' {{attachment:Status/green.gif}}
  2021-07-05 13:00:: ID resolved the issue
  2021-07-05 09:41:: Workaround: Use [[Printing#Platform-independent_printing| platform-independent printing]]
  2021-07-05 09:41:: Authentication for printing fails. A ticket has been opened with the ID Service Desk.

<<Anchor(2021-04-27-itetmaster01-update)>>
== Downtime of various D-ITET services for server maintenance ==
'''Status:''' {{attachment:Status/green.gif}}

  2021-04-27 08:30:: Condor is back online, all services restored.
  2021-04-27 08:15:: Matrix/Element Chat services back online.
  2021-04-27 08:00:: Database upgrade done and online.
  2021-04-27 07:30:: Slurm services are back online.
  2021-04-27 07:00:: Base system has been upgraded; upgrade of the main database services is in progress.
  2021-04-27 06:00:: On 2021-04-27 between 06:00 and 08:30 ISG is going to update a server providing access to various D-ITET services. During the migration the following services will be affected and offline:
   * Matrix/Element Chat services (the instances will be unavailable)
   * IFA/Control Website: Access to the IFA database is blocked
   * Slurm (D-ITET Arton Cluster): It won't be possible to submit new jobs or view Slurm statistics. Already running jobs will not be affected.
   * Condor: Condor clients will be shut down the evening before to avoid running jobs during the migration.

<<Anchor(2021-03-31-network-disruption)>>
== Network disruption affecting several ISG.EE services ==
'''Status:''' {{attachment:Status/green.gif}}

  2021-03-31 09:30:: The configuration error was found. The configuration change will be deployed on '''2021-04-01 around 06:15''' and a short network interruption of about 1 min is expected.
  2021-03-31 08:00:: The ID networking team has rolled back a deployed configuration, pending further investigation/analysis.
  2021-03-31 07:30:: There is currently a disruption affecting a VPZ with servers managed by ISG.EE. The ID networking team is investigating the issue. Several ISG.EE services are affected or malfunctioning as a result, in particular the FindYourData service.

<<Anchor(2021-03-11-mira-upgrade)>>
== login.ee.ethz.ch: downtime for server upgrade ==
'''Status:''' {{attachment:Status/green.gif}}

  2021-03-11 06:30:: Upgrade completed and service is up and running again.
  2021-03-11 06:00:: The server servicing login.ee.ethz.ch will be upgraded to a new OS version (Debian buster). During the time of the update logins might not be possible.

<<Anchor(2020-07-11-storage-downtime)>>
== Planned project/archive storage downtime and client reboot ==
'''Status:''' {{attachment:Status/green.gif}}

  2020-07-11 12:00:: Migration has been completed, all services are back to operational state.

  2020-07-11 08:00:: Migration started, services are shut down.

  2020-07-11 08:00-12:00:: Start of planned maintenance work. Project/archive storage services (known under the names "ocean", "bluebay", "lagoon" and "benderstor") will not be available. ISG-managed Linux clients will be rebooted.
Line 10: Line 84:
Facility management is working on solving the problem. A technician is scheduled to be on-site; we expect the problem to be remedied at 5 PM. The technician is currently on-site.
<<Anchor(2020-06-04-svnsrv-upgrade)>>
== svn.ee.ethz.ch downtime for server upgrade ==
'''Status:''' {{attachment:Status/green.gif}}

  2020-06-04 07:05:: Webservices for managing SVN repositories are enabled.
  2020-06-04 06:15:: System upgrade is done and access to the SVN repositories via the `svn` and `https` transport protocols is back online.
  2020-06-04 06:00:: The server servicing the SVN repositories will be upgraded to a new operating system version. During this timeframe outages for access to the SVN repositories are expected.

<<Anchor(2020-05-17-cluster-abuse)>>
== European HPC cluster abuse ==
'''Status:''' {{attachment:Status/green.gif}}<<BR>>
Recently European HPC clusters have been attacked and abused for mining purposes. The D-ITET Slurm and SGE clusters have not been compromised. We are monitoring the situation closely.
  2020-05-17 08:30:: No successful login from known attacker IP addresses could be determined, and none of the files that would indicate a compromise have been found on our file systems.
  2020-05-16 14:30:: No unusual cluster job activity was observed.

<<Anchor(2020-05-04-itetnas04-upgrade)>>
== D-ITET Netscratch downtime for server upgrade ==
'''Status:''' {{attachment:Status/green.gif}}

  2020-05-04 06:00:: Server upgrade has been completed.
  2020-05-04 06:00:: The server servicing the D-ITET Netscratch service will be upgraded to a new operating system version. During this timeframe outages of the NFS service are expected.

<<Anchor(2020-04-07-network-interuption)>>
== Network outage ETx router ==
'''Status:''' {{attachment:Status/green.gif}}
  2020-04-07 05:30:: There was an issue on the router `rou-etx`. The ID networking team tackled and solved the issue. There was an interruption of about 10 min for the ETx networking zone, affecting almost all systems maintained by ISG.EE.

<<Anchor(2020-04-06-mira-maintenance)>>
== login.ee.ethz.ch: Reboot for maintenance ==
'''Status:''' {{attachment:Status/green.gif}}
  2020-04-06 05:35:: The system behind `login.ee.ethz.ch` has been rebooted for maintenance and to increase available resources.

See the [[RemoteAccess|information on accessing D-ITET resources remotely]]. To better distribute the load, users are encouraged to use the VPN service whenever possible.

<<Anchor(2020-02-18-nostro-maintenance)>>
== itet-stor (FindYourData) Server maintenance: Reconfiguration of VM parameters ==
'''Status:''' {{attachment:Status/green.gif}}

  2020-02-18 19:03:: System again up and running.
  2020-02-18 19:00:: Scheduled downtime for the [[Workstations/FindYourData|itet-stor/FindYourData service]] due to maintenance work on the underlying server.

<<Anchor(2020-01-20-nostro-os-upgrade)>>
== itet-stor (FindYourData) Server migration: New operating system version ==
'''Status:''' {{attachment:Status/green.gif}}

  2020-01-20 07:15:: OS upgrade done. There were short interruptions to the [[Workstations/FindYourData|itet-stor/FindYourData service]].
  2020-01-20 06:00:: We will update the server servicing the [[Workstations/FindYourData|FindYourData service]] from Debian jessie 8 to Debian stretch 9. There will be short downtimes accessing this service during the update.


= Archived status reports =

[[Status/Archive/2010|2010]]
[[Status/Archive/2011|2011]]
[[Status/Archive/2012|2012]]
[[Status/Archive/2013|2013]]
[[Status/Archive/2014|2014]]
[[Status/Archive/2015|2015]]
[[Status/Archive/2016|2016]]
[[Status/Archive/2017|2017]]
[[Status/Archive/2018|2018]]
[[Status/Archive/2019|2019]]
Line 13: Line 148:
[[CategoryEDUC]]


Status (last edited 2023-10-16 11:24:17 by alders)