Differences between revisions 537 and 538
Revision 537 as of 2019-07-19 11:20:18
Size: 17096
Editor: alders
Comment:
Revision 538 as of 2019-07-19 11:21:01
Size: 17093
Editor: alders
Comment:
General Information

Status-Key

Status/green.gif

Resolved

Status/orange.gif

Still working but with some errors

Status/red.gif

Pending

Current status reports

Disabling ptrace on managed Debian GNU/Linux computers

Status: Status/orange.gif

2019-07-19 13:00
  • A current Linux kernel bug allows any unprivileged user to gain root access. Proof-of-concept code that exploits this vulnerability is publicly available.

    To protect our systems, we have temporarily disabled ptrace on all managed Linux systems. All software that depends on ptrace will fail completely or at least partially. A prominent example is the GNU Debugger gdb.

    A patched Linux kernel will be available soon. Once the new kernel is running, we will re-enable ptrace.

Outage of Project Storage

Status: Status/green.gif

2019-07-18 18:45
Service is back to normal
2019-07-18 17:00
Project homes on bluebay are unavailable

D-ITET services outages

Status: Status/green.gif

2019-04-24 14:18
Virtualisation cluster is back with full redundancy. User-facing virtual machines were already back online at 07:30.
2019-04-24 13:00
An issue between the redundant switches caused the network problem, which left the cluster in an inconsistent state and rebooted all virtual machines. The networking team is investigating the issue between the switches further.
2019-04-24 09:00
Further analysis is ongoing, but the health status of the virtualisation cluster was affected, leading to resets of all hosted virtual machines.
2019-04-24 07:30
We are bringing services back to normal.
2019-04-24 07:00
A planned outage in the ETZ building caused stability issues on several D-ITET services, in particular the HOME and mail services.

Downtime HOME server: Repair filesystem inconsistency

Status: Status/green.gif

2019-02-25 06:18
System is back online.
2019-02-25 06:00
We identified a filesystem usage accounting discrepancy on one filesystem of the HOME server, which required taking the server down and running a repair of the underlying filesystem. The HOME storage is the default storage location for personal accounts on computers managed by ISG.EE.

One of two RDS servers is not reachable

Status: Status/green.gif

2019-01-25 23:50
Maintenance issues have been resolved. All RDS servers are up and running now.
2019-01-25 14:00
The RDS maintenance window has ended, but one server still has pending updates. Logins are not allowed until this issue has been fixed.

Upgrade license, database and distributed computing management server itetmaster01

Status: Status/green.gif

2018-11-12 12:00
PostgreSQL services are online.
2018-11-12 09:31
Arton-Cluster and Condor Grid updates are finished.
2018-11-12 07:30
Arton-Cluster updated (final checks pending)
2018-11-12 07:00
System and license services upgraded. Still pending: Arton, Condor and PostgreSQL Upgrades.
2018-11-12 06:00 - 12:00
Planned upgrade of itetmaster01

Maintenance Project Storage

Status: Status/green.gif

2018-11-08 07:00
Services back online (some recovering slowly)
2018-11-08 06:15
Starting downtime for project storage due to important maintenance on the master node.

D-ITET RDS frontend is difficult to reach due to AD name resolution issues

Status: Status/green.gif

2018-11-02 08:00
RDS is working normally. Protective measures were put in place to ensure the AD name is updated correctly.
2018-11-01 09:45
worli.d.ethz.ch cannot be resolved by the AD name service. DNS works fine, but AD-DNS synchronization seems to be in an unstable state. We are in contact with the responsible team of the central IT services.
  • WORKAROUND:
    • D-ITET users: Connect directly to vela.ee.ethz.ch
    • IFE users: Connect directly to neon.ee.ethz.ch

Jabba Tape Library HW Problems

Status: Status/green.gif

2018-08-17 07:00
Update: All tape library issues have been solved. All backup and archive services are back online.
2018-08-15 12:00
Update: There are still issues with the tape library. It will be down for at least another day.
2018-08-13 11:30
Update: The failed tape drive has been replaced. The management PC is still not working. Due to the very old hardware, spare parts take longer than usual to arrive. The tape library remains offline at least until tomorrow afternoon.
2018-08-10 10:00
Due to a failed tape drive and a defective management PC, Jabba's tape library is not working. The Jabba server is not affected. We are in contact with IBM and hope the problem will be fixed by next Monday.

What does this mean for you:

  • BACKUP
    • Store new data: New data are written to a SAM-FS cache area first but cannot be written to tape afterwards, i.e. the backup process cannot be completed. It will resume automatically as soon as the tape library works again.
    • Access existing data: Only data still available in the SAM-FS cache area can be accessed. There is NO ACCESS to data located on tape only.
  • ARCHIVE
    • Store new data: New data are written to a SAM-FS cache area first. In a second step, the data are copied to an archive disk, but the second copy to tape will fail, i.e. only half of the archive process can be completed. The copy to tape will start automatically as soon as the tape library works again.
    • Access existing data: Access to archived data is not restricted.

D-ITET mail server downtime: New operating system version

Status: Status/green.gif

2018-06-16 08:24
The system is up and running, and all tests are done. Everything should work as intended. If you find errors, please contact us at <support@ee.ethz.ch>.
2018-06-16 08:08
The system is up and running. We are performing some final tests before releasing access with a short delay, as previously announced.
2018-06-16 07:11
Everything is currently going as planned.
2018-06-16 06:00
Due to a planned operating system update, the D-ITET mail server will be unavailable today, June 16, 2018, between 06:00 and 08:00.

Major outage virtualization cluster/networking switch

Status: Status/green.gif

2018-04-24 08:56
Sending of emails is restored. Incoming mail from any properly configured sending server should not have been lost: the issue caused a temporary error to be returned to the sending server, which should in turn retry delivery later with some delay.
2018-04-24 07:45
Bringing the most important services back online, including the home service; the issue is being investigated.
2018-04-24 06:29
Major outage of the networking/virtualization cluster, taking down important D-ITET services (home server, parts of the mail system, Linux clients).

Jabba Maintenance

Status: Status/green.gif

2018-04-06 08:10
Jabba is back online
2018-04-06 07:00
Jabba is offline due to maintenance work

D-ITET Storage Migration

Status: Status/green.gif

2018-03-10 15:00
Migration of user homes completed.
2018-03-10 14:15
User homes migrated, access is unblocked again, some post-migration tasks still pending.
2018-03-10 10:00
D-ITET user homes will be migrated from ID Storage to D-ITET Storage. During the whole migration, access to the user homes of the affected users is blocked. Affected users are informed directly by email.

svn.ee.ethz.ch Server migration: New operating system version

Status: Status/green.gif

2018-02-12 08:55
Server upgrade has been completed and all services up and running again.
2018-02-12 06:15
Starting the server update from Debian Wheezy 7 to Debian Stretch 9. Expect downtimes for https://svn.ee.ethz.ch, svn://svn.ee.ethz.ch and https://svnmgr.ee.ethz.ch.

Cronbox/Login Server migration: New operating system version

Status: Status/green.gif

2018-02-05 07:00
The host mira has been upgraded to Debian 9 Stretch. The SSH host key fingerprints for RSA and ED25519 are:

4096 MD5:fc:a8:00:5b:64:90:86:a1:fb:49:75:ef:55:58:90:b3 (RSA)
4096 SHA256:v48HAAAjr+avnPAESdQzazSriKYZeTGGtIPKfoE8Dg0 (RSA)
256 SHA256:SgvaiZyIgzujLJdbtRij5VGUOXm/IuAs3MkMYtGZNhc (ED25519)
256 MD5:3b:b0:1a:8a:ea:0a:e5:ea:bb:9e:bb:5c:ef:24:c3:92 (ED25519)

The SSH host key is also listed at: https://people.ee.ethz.ch/

2018-01-31 11:00
The host mira, which holds the cronbox and login service, will be upgraded to Debian 9 Stretch on 2018-02-05 at 06:10.

Upgrade of Server itetnas02

Status: Status/green.gif

2018-01-25 07:30
Upgrade completed.
2018-01-24 16:45
On 2018-01-25 around 06:10 we will upgrade the server itetnas02. Several short outages of file services (Samba, NFS) are expected. Services for project accounts and dedicated shares for biwi, ibt, ini and tik are affected.

Outage of Server itetnas03

Status: Status/green.gif

2017-11-15 07:00
Battery unit replaced
2017-11-10 07:20
The server is back online but without its battery unit. We will need to shut down itetnas03 again once the problem is isolated and can be fixed.

2017-11-10 06:15
The server itetnas03 is down due to hardware problems (A battery replacement caused controller problems). ISG and the hardware vendor are currently working to get this problem solved.

User Home accessibility

Status: Status/green.gif

2017-11-08 06:25
Informatikdienste have reverted a change that caused problems accessing all users' HOME directories via the CIFS (Samba) protocol.
2017-11-07 08:00
All users' HOME directories are currently not accessible via the CIFS (Samba) protocol. NFS access is still available.

Outage of Server ibtnas02

Status: Status/green.gif

2017-10-31 08:00
Upgrade successfully completed
2017-10-30 16:50
The server will be upgraded to a new OS release on 2017-10-31 starting around 06:15. Short outages of the Samba and NFS services are expected.
2017-10-25 10:00
ibtnas02 now serves all partitions, but the problem has not yet been identified.
2017-10-24 15:00
The server ibtnas02 is up again (partition data-08 is not available)
2017-10-24 12:50
The server ibtnas02 is down again
2017-10-24 09:30
The server ibtnas02 is back online
2017-10-24 08:00
The server ibtnas02 is down due to hardware problems

Outage of Server itetnas03

Status: Status/green.gif

2017-10-23 18:15
Data are also accessible via NFS.
2017-10-23 09:30
The server is up. Data are accessible via Samba. NFS file service is still down.
2017-10-21 15:00
The server itetnas03 is down due to hardware problems

Outage of Servers in Serverroom ETZ/D/96.2

Status: Status/green.gif

2017-10-20 13:45
All racks in ETZ/D/96.2 are working again (cooling problem solved).
2017-10-20 10:00
The technician will arrive at 13:00. Some servers are running, but without water cooling, so any rack might shut down at any time if the air cooling is insufficient. This will most probably happen again while the technician is working in the room (i.e. this afternoon).
2017-10-19 18:30
The cooling engineer could not fix the problem, so some servers are still offline. Another technician will try to fix the cooling system tomorrow morning.
2017-10-18 14:00
The cooling system is still not working correctly, so we have powered on only a few selected compute machines.
2017-10-18 12:50
The problem has been localized and repaired. We need to wait for the circuit to cool down.
2017-10-18 10:30
Outage of most racks in ETZ/D/96.2 (cooling problem). Most compute servers are offline.

Outage of Servers in Serverroom ETZ/D/96.2

Status: Status/green.gif

2017-05-15 08:45
Status of remaining servers verified. All back online.
2017-05-13 23:59
Most of the servers are back online.
2017-05-13 20:00
Outage of some racks in ETZ/D/96.2. Several compute servers offline.

Cronbox/Login Server migration: new SSH host key

Status: Status/green.gif

2017-03-24 17:00
The cronbox and login server has moved to a new host. A new SSH host key has been generated:
4096 MD5:fc:a8:00:5b:64:90:86:a1:fb:49:75:ef:55:58:90:b3 (RSA)
4096 SHA256:v48HAAAjr+avnPAESdQzazSriKYZeTGGtIPKfoE8Dg0 (RSA)

The SSH host key is also listed at: https://people.ee.ethz.ch/

Remember

Always verify the fingerprint of an SSH host key before accepting it.
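To cross-check a fingerprint outside the `ssh` prompt, the following Python sketch recomputes both formats listed above from a public key line. It assumes the input is either a `.pub`-style line (`<type> <base64> [comment]`) or a line as printed by `ssh-keyscan` (`<host> <type> <base64>`); the function names are hypothetical and only illustrative. In everyday use, `ssh-keygen -lf <file>` (with `-E md5` for the MD5 form) gives the same result.

```python
import base64
import hashlib
import sys

# Assumption: these prefixes cover the key types in use (RSA, ED25519, ECDSA).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-sha2-")


def key_blob(line: str) -> bytes:
    """Return the decoded key blob that follows the key type field."""
    fields = line.split()
    for i, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            return base64.b64decode(fields[i + 1])
    raise ValueError("no recognisable SSH public key in line")


def fingerprints(line: str) -> tuple[str, str]:
    """Return (SHA256, MD5) fingerprints in the style printed by ssh-keygen."""
    blob = key_blob(line)
    # SHA256 form: base64 of the digest with the trailing '=' padding removed
    sha256 = base64.b64encode(hashlib.sha256(blob).digest()).decode().rstrip("=")
    # MD5 form: colon-separated lowercase hex bytes of the digest
    md5 = ":".join(f"{b:02x}" for b in hashlib.md5(blob).digest())
    return f"SHA256:{sha256}", f"MD5:{md5}"


if __name__ == "__main__":
    # Read key lines from stdin, skipping blank lines and '#' comments
    for entry in sys.stdin:
        if entry.strip() and not entry.startswith("#"):
            for fp in fingerprints(entry):
                print(fp)
```

For example, `ssh-keyscan -t ed25519 login.ee.ethz.ch | python3 fingerprints.py` prints the fingerprints of the key the server presents. Because `ssh-keyscan` itself fetches the key over an unauthenticated connection, the output is only meaningful when compared against a trusted source such as the published list above.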

EE Mailsystem migration

Status: Status/green.gif Mail system up

2017-01-08 15:00
The new mail system has now been started. In case of unexpected problems, we will stop it again to prevent data loss and to analyze the problem.
2017-01-07 24:00
Not all test cases could be performed. We now plan to enable the new system around noon.
2017-01-07 20:45
Old mail server configuration migrated; starting mail server testing.
2017-01-07 14:00
User mailbox data migrated; starting mail server configuration migration.
2017-01-07 07:00
All mail services are stopped. Mailbox data copy started.

Network outage ETH

Status: Status/green.gif

2016-02-09 14:25
Our systems should all be back to normal. If you experience any problems, please contact support via mailto:support@ee.ethz.ch.
2016-02-09 12:35
The network is back online and services are being recovered. Due to the hardware failure, 53 network zones were affected. The problem was localized and resolved.
2016-02-09 08:20
ETH-wide network outage due to hardware problems in the firewall infrastructure. In any case, please reboot your computer before continuing.

Maintenance login.ee.ethz.ch and cronbox.ee.ethz.ch service

Status: Status/green.gif

2016-02-10 12:00
Server update is done.
2016-02-10 06:05
The server for the cronbox and login service is currently being updated from Debian Wheezy to Debian Jessie. The services will be temporarily unavailable.

Archived status reports

2015 2014 2013 2012 2011 2010



Status (last edited 2023-10-16 11:24:17 by alders)