Differences between revisions 437 and 630 (spanning 193 versions)
Revision 437 as of 2017-10-18 08:59:28
Size: 4448
Editor: gfreudig
Comment:
Revision 630 as of 2021-04-22 08:55:31
Size: 7323
Editor: bonaccos
Comment:
Line 1: Line 1:
#rev 2018-08-27 mreimers
#rev 2020-08-31 alders
Line 13: Line 16:
<<Anchor(2017-10-18-outage-etz-d-96-2)>> <<Anchor(2021-04-27-itetmaster01-update)>>
== Downtime of various D-ITET services for server maintenance ==
'''Status:''' {{attachment:Status/orange.gif}}
Line 15: Line 20:
== Outage of Servers in Serverroom ETZ/D/96.2 ==
'''Status:''' {{attachment:Status/red.gif}}
  2021-04-27 06:00:: On 2021-04-27 between 06:00 and 08:30 ISG is going to update a server providing access to various D-ITET services. During the migration the following services will be affected and offline:
   * Matrix/Element Chat services (the instances will be unavailable)
   * IFA/Control Website: Access to the IFA database is blocked
   * Slurm (D-ITET Arton Cluster): It won't be possible to submit new jobs or view Slurm statistics. Already running jobs will not be affected.
   * Condor: Condor clients will be shut down the evening before to avoid running jobs during the migration.
Line 18: Line 26:
  2017-10-18 10:30:: Outage of most racks in ETZ/D/96.2 (cooling problem). Most compute servers are offline.

<<Anchor(2017-05-13-outage-etz-d-96-2)>>

== Outage Servers in Serverroom ETZ/D/96.2 ==
<<Anchor(2021-03-31-network-disruption)>>
== Network disruption affecting several ISG.EE services ==
Line 25: Line 30:
  2017-05-13 20:00:: Outage of some racks in ETZ/D/96.2. Several compute servers offline.
  2017-05-13 23:59:: Most of the servers are back online.
  2017-05-15 08:45:: Status of remaining servers verified. All back online.
  2021-03-31 09:30:: The configuration error was found. The configuration change will be deployed on '''2021-04-01 around 06:15''' and a short network interruption of about 1 min is expected.
  2021-03-31 08:00:: The ID networking team has rolled back a deployed configuration, pending further investigation/analysis.
  2021-03-31 07:30:: There is currently a disruption affecting a VPZ with servers managed by ISG.EE. The ID networking team is investigating the issue. Several ISG.EE services are affected or malfunctioning because of this, in particular the FindYourData service.
Line 29: Line 34:
----

<<Anchor(2017-03-24-cronbox-login-ssh-keys)>>

== Cronbox/Login Server migration: new SSH host key ==
<<Anchor(2021-03-11-mira-upgrade)>>
== login.ee.ethz.ch: downtime for server upgrade ==
Line 36: Line 38:
  2017-03-24 17:00:: The cronbox and login server has moved to a new host. A new SSH host key has been generated:
  {{{
4096 MD5:fc:a8:00:5b:64:90:86:a1:fb:49:75:ef:55:58:90:b3 (RSA)
4096 SHA256:v48HAAAjr+avnPAESdQzazSriKYZeTGGtIPKfoE8Dg0 (RSA)
}}}
The SSH host key is also listed at: https://people.ee.ethz.ch/
  2021-03-11 06:30:: Upgrade completed and service is up and running again.
  2021-03-11 06:00:: The server servicing login.ee.ethz.ch will be upgraded to a new OS version (Debian buster). During the time of the update logins might not be possible.
Line 43: Line 41:
  Remember:: '''Always''' verify the fingerprint of an SSH host key before accepting it. <<Anchor(2020-07-11-storage-downtime)>>
== Planned project/archive storage downtime and client reboot ==
'''Status:''' {{attachment:Status/green.gif}}
Line 45: Line 45:
----   2020-07-11 12:00:: Migration has been completed; all services are operational again.
Line 47: Line 47:
<<Anchor(2017-01-07-Mailsystem migration)>>   2020-07-11 08:00:: Migration started, services are shut down.
Line 49: Line 49:
== EE Mailsystem migration ==
'''STATUS:''' {{attachment:Status/green.gif}} '''Mailsystem up'''
  2020-07-11 08:00-12:00:: Start of planned maintenance work. Project/archive storage services (known under the names "ocean", "bluebay", "lagoon" and "benderstor") will not be available. ISG-managed Linux clients will be rebooted.
Line 52: Line 51:
  2017-01-08 15:00:: The new mail system has now been started. In case of unexpected problems we will stop it again to prevent data loss and to analyze the problem.
Line 54: Line 52:
  2017-01-07 24:00:: Not all test cases could be performed. We now plan to enable the new system around noon.
Line 56: Line 53:
  2017-01-07 20:45:: Old Mailserver Configuration migrated, starting the mailserver testing <<Anchor(2020-06-04-svnsrv-upgrade)>>
== svn.ee.ethz.ch downtime for server upgrade ==
'''Status:''' {{attachment:Status/green.gif}}
Line 58: Line 57:
  2017-01-07 14:00:: User mailbox data migrated, starting mailserver configuration migration   2020-06-04 07:05:: Webservices for managing SVN repositories are enabled.
  2020-06-04 06:15:: System upgrade is done and access to the SVN repositories via the `svn` and `https` transport protocols is back online.
  2020-06-04 06:00:: The server servicing the SVN repositories will be upgraded to a new operating system version. During this timeframe outages for access to the SVN repositories are expected.
Line 60: Line 61:
  2017-01-07 07:00:: All mail services are stopped. Mailbox data copy started. <<Anchor(2020-05-17-cluster-abuse)>>
== European HPC cluster abuse ==
'''Status:''' {{attachment:Status/green.gif}}<<BR>>
Recently European HPC clusters have been attacked and abused for mining purposes. The D-ITET Slurm and SGE clusters have not been compromised. We are monitoring the situation closely.
  2020-05-17 08:30:: No successful login from known attacker IP addresses could be determined, and none of the files that would indicate a compromise have been found on our file systems.
  2020-05-16 14:30:: No unusual cluster job activity was observed.
Line 62: Line 68:
---- <<Anchor(2020-05-04-itetnas04-upgrade)>>
== D-ITET Netscratch downtime for server upgrade ==
'''Status:''' {{attachment:Status/green.gif}}
Line 64: Line 72:
<<Anchor(2016-09-12-network-outage)>>   2020-05-04 06:00:: Server upgrade has been completed.
  2020-05-04 06:00:: The server servicing the D-ITET Netscratch service will be upgraded to a new operating system version. During this timeframe outages of the NFS service are expected.
Line 66: Line 75:
== Network outage ETH ==
'''STATUS:''' {{attachment:Status/green.gif}}
<<Anchor(2020-04-07-network-interuption)>>
== Network outage ETx router ==
'''Status:''' {{attachment:Status/green.gif}}
  2020-04-07 05:30:: There was an issue on the router `rou-etx`. The ID networking team tackled and solved the issue. There was an interruption of about 10 min for the ETx networking zone, affecting almost all systems maintained by ISG.EE.
Line 69: Line 80:
  2016-02-09 08:20:: ETH-wide network outage due to hardware problems in the firewall infrastructure. In any case, please reboot your computer before continuing. <<Anchor(2020-04-06-mira-maintenance)>>
== login.ee.ethz.ch: Reboot for maintenance ==
'''Status:''' {{attachment:Status/green.gif}}
  2020-04-06 05:35:: The system behind `login.ee.ethz.ch` has been rebooted for maintenance and to increase available resources.
Line 71: Line 85:
  2016-02-09 12:35:: Network is back online and services are being recovered. Due to the hardware failure, 53 network zones were affected. The problem was located and resolved. See the [[RemoteAccess|information on accessing D-ITET resources remotely]]. To better distribute the load, users are encouraged to use the VPN service whenever possible.
Line 73: Line 87:
  2016-02-09 14:25:: Our systems should be all back to normal. In case you experience any problem please contact support via mailto:support@ee.ethz.ch. <<Anchor(2020-02-18-nostro-maintenance)>>
== itet-stor (FindYourData) Server maintenance: Reconfiguration of VM parameters ==
'''Status:''' {{attachment:Status/green.gif}}
Line 75: Line 91:
----   2020-02-18 19:03:: System is up and running again.
  2020-02-18 19:00:: Scheduled downtime for the [[Workstations/FindYourData|itet-stor/FindYourData service]] due to maintenance work on the underlying server.
Line 77: Line 94:
<<Anchor(2016-02-10-maintenance-polaris)>> <<Anchor(2020-01-20-nostro-os-upgrade)>>
== itet-stor (FindYourData) Server migration: New operating system version ==
'''Status:''' {{attachment:Status/green.gif}}
Line 79: Line 98:
== Maintenance login.ee.ethz.ch and cronbox.ee.ethz.ch service ==
'''STATUS:''' {{attachment:Status/green.gif}}
  2020-01-20 07:15:: OS upgrade done; there were short interruptions to the [[Workstations/FindYourData|itet-stor/FindYourData service]].
  2020-01-20 06:00:: We will update the server servicing the [[Workstations/FindYourData|FindYourData service]] from Debian jessie 8 to Debian stretch 9. There will be short downtimes accessing this service during the update.
Line 82: Line 101:
  2016-02-10 06:05:: The server for the [[Services/Cronjob|cronbox]] and login service is currently being updated from Debian Wheezy to Debian Jessie. The services will be temporarily unavailable.
Line 84: Line 102:
  2016-02-10 12:00:: Server update is done.
----
Line 88: Line 104:
[[Status/Archive/2010|2010]]
[[Status/Archive/2011|2011]]
[[Status/Archive/2012|2012]]
[[Status/Archive/2013|2013]]
[[Status/Archive/2014|2014]]
Line 89: Line 110:
[[Status/Archive/2014|2014]]
[[Status/Archive/2013|2013]]
[[Status/Archive/2012|2012]]
[[Status/Archive/2011|2011]]
[[Status/Archive/2010|2010]]
[[Status/Archive/2016|2016]]
[[Status/Archive/2017|2017]]
[[Status/Archive/2018|2018]]
[[Status/Archive/2019|2019]]

General Information

Status-Key

Status/green.gif: Resolved
Status/orange.gif: Still working but with some errors
Status/red.gif: Pending

Current status reports

Downtime of various D-ITET services for server maintenance

Status: Status/orange.gif

2021-04-27 06:00
On 2021-04-27 between 06:00 and 08:30 ISG is going to update a server providing access to various D-ITET services. During the migration the following services will be affected and offline:
  • Matrix/Element Chat services (the instances will be unavailable)
  • IFA/Control Website: Access to the IFA database is blocked
  • Slurm (D-ITET Arton Cluster): It won't be possible to submit new jobs or view Slurm statistics. Already running jobs will not be affected.
  • Condor: Condor clients will be shut down the evening before to avoid running jobs during the migration.

Network disruption affecting several ISG.EE services

Status: Status/green.gif

2021-03-31 09:30
The configuration error was found. The configuration change will be deployed on 2021-04-01 around 06:15 and a short network interruption of about 1 min is expected.
2021-03-31 08:00
The ID networking team has rolled back a deployed configuration, pending further investigation/analysis.
2021-03-31 07:30
There is currently a disruption affecting a VPZ with servers managed by ISG.EE. The ID networking team is investigating the issue. Several ISG.EE services are affected or malfunctioning because of this, in particular the FindYourData service.

login.ee.ethz.ch: downtime for server upgrade

Status: Status/green.gif

2021-03-11 06:30
Upgrade completed and service is up and running again.
2021-03-11 06:00
The server servicing login.ee.ethz.ch will be upgraded to a new OS version (Debian buster). During the time of the update logins might not be possible.

Planned project/archive storage downtime and client reboot

Status: Status/green.gif

2020-07-11 12:00
Migration has been completed; all services are operational again.
2020-07-11 08:00
Migration started, services are shut down.
2020-07-11 08:00-12:00
Start of planned maintenance work. Project/archive storage services (known under the names "ocean", "bluebay", "lagoon" and "benderstor") will not be available. ISG-managed Linux clients will be rebooted.

svn.ee.ethz.ch downtime for server upgrade

Status: Status/green.gif

2020-06-04 07:05
Webservices for managing SVN repositories are enabled.
2020-06-04 06:15
System upgrade is done and access to the SVN repositories via the svn and https transport protocols is back online.
2020-06-04 06:00
The server servicing the SVN repositories will be upgraded to a new operating system version. During this timeframe outages for access to the SVN repositories are expected.

European HPC cluster abuse

Status: Status/green.gif
Recently European HPC clusters have been attacked and abused for mining purposes. The D-ITET Slurm and SGE clusters have not been compromised. We are monitoring the situation closely.

2020-05-17 08:30
No successful login from known attacker IP addresses could be determined, and none of the files that would indicate a compromise have been found on our file systems.
2020-05-16 14:30
No unusual cluster job activity was observed.

D-ITET Netscratch downtime for server upgrade

Status: Status/green.gif

2020-05-04 06:00
Server upgrade has been completed.
2020-05-04 06:00
The server servicing the D-ITET Netscratch service will be upgraded to a new operating system version. During this timeframe outages of the NFS service are expected.

Network outage ETx router

Status: Status/green.gif

2020-04-07 05:30
There was an issue on the router rou-etx. The ID networking team tackled and solved the issue. There was an interruption of about 10 min for the ETx networking zone, affecting almost all systems maintained by ISG.EE.

login.ee.ethz.ch: Reboot for maintenance

Status: Status/green.gif

2020-04-06 05:35
The system behind login.ee.ethz.ch has been rebooted for maintenance and to increase available resources.

See the information on accessing D-ITET resources remotely. To better distribute the load, users are encouraged to use the VPN service whenever possible.

itet-stor (FindYourData) Server maintenance: Reconfiguration of VM parameters

Status: Status/green.gif

2020-02-18 19:03
System is up and running again.
2020-02-18 19:00

Scheduled downtime for the itet-stor/FindYourData service due to maintenance work on the underlying server.

itet-stor (FindYourData) Server migration: New operating system version

Status: Status/green.gif

2020-01-20 07:15

OS upgrade done; there were short interruptions to the itet-stor/FindYourData service.

2020-01-20 06:00

We will update the server servicing the FindYourData service from Debian jessie 8 to Debian stretch 9. There will be short downtimes accessing this service during the update.

Archived status reports

2010 2011 2012 2013 2014 2015 2016 2017 2018 2019


CategoryEDUC

Status (last edited 2023-10-16 11:24:17 by alders)