Differences between revisions 8 and 350 (spanning 342 versions)
Revision 8 as of 2010-11-15 15:17:43
Size: 620
Editor: bonaccos
Comment:
Revision 350 as of 2015-02-25 11:10:18
Size: 50948
Editor: alders
Comment:
= Status =

<<Anchor(2010-11-15-servers-down)>>

== 2010-11-15: cooling water system outage for some clusters ==

Last Friday evening, one of the cooling water pumps installed in ETZ/D/96.2 stopped working correctly. This forced some of the racks in this server room to shut down in order to protect the servers from thermal damage. '''All clusters from IFH, IBT, BIWI, TIK, IKT and VAW were affected.'''


Facility management is working on solving the problem. A technician is scheduled to be on-site; we expect the problem to be remedied by 5 PM.

The technician is currently on-site (16:15).

----
= General Information =
 * This page lists announcements and status messages for IT services managed by [[http://www.isg.ee.ethz.ch/|ISG.EE]].
 * For notifications and announcements of central IT services managed by ID, please visit https://www1.ethz.ch/id/servicedesk/sysstat/index_EN
 * For a detailed status overview of central IT services managed by ID, please visit http://eranger3.ethz.ch/Ueberwachung/index.html

||||<style="border-width: 1px 0px; border-color: rgb(85, 136, 238); padding: 0.6em;">'''Status-Key''' ||
||<style="border: medium none;"> {{attachment:green.gif}} ||<style="border: medium none;">Resolved ||
||<style="border: medium none;"> {{attachment:orange.gif}} ||<style="border: medium none;">Still working but with some errors ||
||<style="border-width: medium medium 1px; border-top: medium none rgb(85, 136, 238); border-left: medium none rgb(85, 136, 238); border-right: medium none rgb(85, 136, 238); border-color: rgb(85, 136, 238);"> {{attachment:red.gif}} ||<style="border-width: medium medium 1px; border-top: medium none rgb(85, 136, 238); border-left: medium none rgb(85, 136, 238); border-right: medium none rgb(85, 136, 238); border-color: rgb(85, 136, 238);">Pending ||

<<Anchor(2015-02-09-drwho-not-available)>>

= Problems on core server drwho =
'''STATUS:''' {{attachment:orange.gif}}
 2015-02-27: 17:00:: Planned maintenance downtime.

  ITET's core server DRWHO is experiencing major problems with its disk controller and the associated disks. The affected hardware needs to be replaced by Oracle support. Due to this maintenance work we are going to shut down DRWHO on

  '''Friday, February 27th, between 17:00 and 22:00.'''

 2015-02-13: 09:15:: All services should now be available again. The maintenance downtime was much longer than planned. We apologize for any inconvenience this may have caused.
 2015-02-13: 08:30:: The problems do not seem to be resolved. The server is online but some services are not running yet.
 2015-02-13: 08:00:: The firmware update took longer than planned. The server should be back online within the next few minutes.
 2015-02-12: 11:00:: The firmware needs to be updated. This will be done on Friday, 13th of February, between 06:30 and 07:30.
 2015-02-09: 14:30:: The server is back online; we are still investigating the issue.
 2015-02-09: 13:15:: Affected are all diskless clients and services which depend on drwho (e.g. the email.ee.ethz.ch website).
 2015-02-09: 12:55:: The D-ITET server drwho.ee.ethz.ch is currently not reachable.

----

<<Anchor(2015-01-26-ee-website-down)>>

= www.ee.ethz.ch and www.werkstatt.ee.ethz.ch - Websites down =
'''STATUS:''' {{attachment:green.gif}}

 2015-01-26: 18:53:: Informatikdienste resolved the problems; http://www.ee.ethz.ch and http://www.werkstatt.ee.ethz.ch are online again.

 2015-01-26: 14:12:: ID announces that the downtime could last until 18:30.

 2015-01-26: 09:30:: Informed ID Servicedesk about the issue.

----

<<Anchor(2015-01-20-Sympa-compose-flaw)>>

= lists.ee.ethz.ch - Composing has been temporarily disabled =
'''STATUS:''' {{attachment:green.gif}}

 2015-01-20: 22:05:: We applied the security updates and re-enabled message composing.

 2015-01-20: 11:15:: Composing of messages has been temporarily disabled due to a security flaw. People using the composing tool are able to view and send any file on the filesystem. Composing will be re-enabled after the package maintainers have updated sympa.

----

<<Anchor(2015-01-15-Sophos-problems)>>

= SMTP Email Sending: Sophos Problems =
'''STATUS:''' {{attachment:green.gif}}

 2015-01-15: 17:45:: After the last virus-signature update of the Sophos antivirus software, we had a virus data checksum error. This was the reason for the delays. After we fixed this issue, everything returned to normal.
 2015-01-15: 16:00:: Sending and receiving emails is taking too long and the mail server is under high load. We are working on a solution.
----

<<Anchor(2015-01-12-oenone-nfs-problems)>>

= Server oenone NFS problems =
'''STATUS:''' {{attachment:green.gif}}

 2015-01-12: 21:37:: We see NFS-related problems on server oenone again. Affected are the homes of all users from '''AUT''', '''BIWI''', '''IKT''' and '''VAW'''. This issue also affects related webpages and services.

 2015-01-13: 07:20:: The system has recovered and is back online.
----

<<Anchor(2014-11-30-Email-problems)>>

= Problems sending emails because of LDAP outage =
'''STATUS:''' {{attachment:green.gif}}

 2014-12-02: 16:15:: Set status to green as we had no other problems.
 2014-11-30: 23:15:: We have reconfigured dovecot to use another LDAP server. It currently looks as if sending emails with dovecot or roundcube is working again. However, since we do not know the cause of the outage, it is possible that this newly configured LDAP server could also crash.

 2014-11-30: 16:50:: One of the main LDAP servers (ldaps-rz-2) of Informatikdienste, used to authenticate the sending user, is no longer available.

----
<<Anchor(2014-11-22-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}

 2014-11-22: 14:00:: All services are back online.

 2014-11-22: 10:00-16:00:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss, please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Windows)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----
<<Anchor(2014-11-13-ausfall-kuehlung-serverraum)>>

= Server room ETZ D 96 down =
'''STATUS:''' {{attachment:green.gif}}

 2014-11-14: 07:00:: bender30 is back online.

 2014-11-13: 17:10:: All servers listed below should be back online, with the exception of bender30, which seems to have a more serious problem.

 2014-11-13: 15:45:: The cooling system in ETZ D 96 should be back to normal operation. We will progressively power on all servers listed below.

 2014-11-13: 15:24:: The cooling system in ETZ D 96 has temporarily been out of order. To prevent the servers from overheating, all racks in ETZ D 96 have automatically been shut down. Affected servers include:
  * arton*
  * autserv*
  * bender*
  * biwirender*
  * colombo*
  * ifhlux*
  * nariwork*
  * tik*x
  * vierzack*
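
The wildcard patterns in the list above can be checked against a hostname with a few lines of Python. This is only a minimal sketch: the function name `is_affected` and the hostnames used in the usage note are hypothetical, and the sketch assumes that `*` in the list stands for any run of characters in the short hostname.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list of affected servers above.
AFFECTED_PATTERNS = [
    "arton*", "autserv*", "bender*", "biwirender*", "colombo*",
    "ifhlux*", "nariwork*", "tik*x", "vierzack*",
]


def is_affected(hostname: str) -> bool:
    """Return True if the short hostname matches one of the patterns.

    Assumption: only the part before the first dot is relevant, and
    matching is done on the lower-cased name.
    """
    short = hostname.split(".", 1)[0].lower()
    return any(fnmatchcase(short, pattern) for pattern in AFFECTED_PATTERNS)
```

For example, `is_affected("bender30")` is true, while a name such as `drwho`, which matches none of the patterns, is reported as not affected.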

----

<<Anchor(2014-11-08-oenone-nfs-problems)>>

= Server oenone NFS problems =
'''STATUS:''' {{attachment:green.gif}}

 2014-11-07: 23:50:: We see NFS-related problems on server oenone again. Affected are the homes of all users from '''AUT''', '''BIWI''', '''IKT''' and '''VAW'''. This issue also affects related webpages and services.

 2014-11-08: 05:00:: The system has recovered and is back online.
----

<<Anchor(2014-08-21-YOSEMITE-shutdown-for-maintenance)>>
= YOSEMITE shutdown for maintenance =
'''STATUS:''' {{attachment:green.gif}}

 2014-08-21: 07:00:: Server YOSEMITE is back online

 2014-08-21: 07:00:: Server YOSEMITE is down for emergency maintenance


----


<<Anchor(2014-06-30-Webmail-Outage)>>
= Webmail outage =
'''STATUS:''' {{attachment:green.gif}}

 2014-06-30: 18:25:: Webmail is running normally again.
 2014-06-30: 17:30:: Our webmail server "roundcube" is currently down due to maintenance.

----

<<Anchor(2014-06-05-Security-Linux-clients-reboots)>>

= Security reboots of Linux clients and servers =
'''STATUS:''' {{attachment:green.gif}}

 2014-06-05:: Due to a new Linux kernel vulnerability all of our workstations, compute cluster servers and service servers will be rebooted.

----
<<Anchor(2014-05-25-Swisstime-Outage)>>

= Swisstime outage =
'''STATUS:''' {{attachment:green.gif}}

 2014-05-26: 17:00:: Our time server "swisstime" is currently offline due to a switch hardware failure.
 2014-05-27: 15:30:: The switch has been replaced and reconfigured.

----

<<Anchor(2014-04-17-OpenSSL-Heartbleed)>>

= OpenSSL Heartbleed vulnerability =
'''STATUS:''' {{attachment:green.gif}}

 2014-04-07: 21:00:: We became aware of a vulnerability in OpenSSL's support for the TLS/DTLS Heartbeat extension. Memory from either client or server can be recovered by an attacker. This vulnerability might allow an attacker to compromise the private key and other sensitive data in memory.
 2014-04-08: 06:30:: Relying on the updates provided by our distribution, we rolled out the fix; by this time, patched versions were installed on all vulnerable servers.
 2014-04-10: 08:00:: Updates were also rolled out on the client side.
 2014-04-14: 12:00:: The certificates of previously affected services on the vulnerable servers have been replaced.
 2014-04-16: 16:00:: Informatikdienste announce that they have also patched all relevant services and advise customers to change their passwords. We confirm that this step is '''strongly recommended'''. Note in particular that the vulnerability had already been exploited since November 2013 to collect passwords, session data, private keys, etc.
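
As a rough illustration of which OpenSSL releases were affected by the Heartbeat flaw, the following Python sketch classifies an upstream version string. It assumes the upstream numbering, in which releases 1.0.1 through 1.0.1f are vulnerable and 1.0.1g contains the fix; the function name is hypothetical. Distributions often backport the fix without changing the upstream version letter, so a packaged version cannot be judged this way and the distribution's security advisory remains authoritative.

```python
import re

# Matches plain upstream version strings of the 1.0.1 series only,
# e.g. "1.0.1" or "1.0.1e". Distribution package versions such as
# "1.0.1e-2+deb7u5" deliberately do not match (see the note above).
_UPSTREAM_101 = re.compile(r"^1\.0\.1([a-z]?)$")


def upstream_version_is_vulnerable(version: str) -> bool:
    """Return True for upstream OpenSSL 1.0.1 through 1.0.1f.

    Any string that is not a plain 1.0.1-series upstream version
    returns False, which means "not classified as vulnerable here",
    not "proven safe".
    """
    match = _UPSTREAM_101.match(version.strip())
    if match is None:
        return False
    letter = match.group(1)
    # No letter is the initial 1.0.1 release; letters a-f predate the fix.
    return letter == "" or letter <= "f"
```

The version string itself can be obtained with `openssl version` on the command line or from `ssl.OPENSSL_VERSION` in Python.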

----
<<Anchor(2014-04-13-no-sendings_through_webform)>>

= No Emails accepted from webservers =
'''STATUS:''' {{attachment:green.gif}}

 2014-04-14: 14:00:: The plugin which caused this malicious behavior has been deactivated. Email connections from the webcluster work now as intended.

 2014-04-13: 18:00:: A webform has been misused to send spam over our infrastructure. To make sure this does not happen again while we investigate the case, all emails sent from our webservers will be rejected. Unfortunately some scripts do not detect this rejection and still display a success message.

We apologize for this.

----
<<Anchor(2014-04-14-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}

 2014-04-12: 15:50:: All services are back online

 2014-04-12: 10:00-16:00:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss, please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Windows)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----


<<Anchor(2014-01-07-networkmaintenance-affecting-services)>>

= Network maintenance affecting some services =
'''STATUS:''' {{attachment:green.gif}}

Netzwerkdienste is performing a router update '''between 6:30 and 7:00 on 2014-01-07'''. This can cause a network downtime of about 15 minutes for the following services:

 * '''Subversion''' service on svn.ee.ethz.ch
 * '''Sympa mailing lists''' on lists.ee.ethz.ch (both mailing and web interface). Note: No mails will be lost

----
<<Anchor(2013-11-18-mysql-server-problem)>>

= MySQL Server downtime =
'''STATUS:''' {{attachment:green.gif}}

 2013-11-18: 12:00:: We are experiencing problems with the MySQL database cluster. All websites using the MySQL instance on remi.ee.ethz.ch are also affected.

 2013-11-18: 12:45:: The system is working again.

----

<<Anchor(2013-11-12-oenone-nfs-problems)>>

= Server oenone NFS problems =
'''STATUS:''' {{attachment:green.gif}}

 2013-11-12: 04:00:: We see NFS-related problems on server oenone again. Affected are the homes of all users from '''AUT''', '''BIWI''', '''IKT''' and '''VAW'''.

 2013-11-12: 07:40:: The system is back online.
----

<<Anchor(2013-10-16-oenone-nfs-problems)>>

= Server oenone NFS problems =
'''STATUS:''' {{attachment:green.gif}}

 2013-10-16: 02:00:: Starting from around 02:00 in the morning we had NFS-related problems on server oenone. Affected were the homes of all users from '''AUT''', '''BIWI''', '''IKT''' and '''VAW'''. Webpages of these institutes may also have been affected.

 2013-10-16: 03:00:: Server oenone recovered on its own and is running normally again.

----

<<Anchor(2013-09-28-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}

 2013-09-28: 14:30:: All services are back online

 2013-09-28: 10:00-16:00:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss, please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Windows)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----

<<Anchor(2013-09-05-oenone-servercrash)>>

= Server oenone crash =
'''STATUS:''' {{attachment:green.gif}}

 2013-09-04: 21:30:: The oenone NFS server crashed. This affects users of '''AUT''', '''BIWI''', '''IKT''' and '''VAW''', including the homes and webpages of these institutes.

 2013-09-05: 04:00:: We had to reset the server to bring it up again.

----

<<Anchor(2013-06-15-oenone-servercrash)>>

= Server oenone crash =
'''STATUS:''' {{attachment:green.gif}}

 2013-06-15: 08:30:: The oenone NFS server crashed. This affects users of '''AUT''', '''BIWI''', '''IKT''' and '''VAW''', including the homes and webpages of these institutes.

 2013-06-15: 11:45:: We had to reset the machine because the service providing the home directories was unresponsive.

----

<<Anchor(2013-05-16-VPP-Poster-Printer-ETZSPEZ)>>

= ETZSPEZ - HP 6100 encountered print quality issues =
'''STATUS:''' {{attachment:green.gif}}

 2013-05-30 12:00:: The printer has been fixed.

 2013-05-30 09:00:: A technician is repairing the printer right now. The printer will be offline for the next 4 hours.

 2013-05-16 12:00:: An external technician has been informed and will fix the printer in room ETZ J66 (ETZSPEZ) as soon as possible.

----

<<Anchor(2013-04-23-ID-NAS-problems)>>

= The Informatikdienste have some problems with their NAS =
'''STATUS:''' {{attachment:green.gif}}

 2013-04-23 14:42:: As a result of the ID-NAS malfunction, the VPP printers are currently not working properly.

 2013-04-24 07:00:: All ID Services should be back to normal.

----

<<Anchor(2013-03-06-network-outage-ETH)>>

= Outage Network Infrastructure ETH =
'''STATUS:''' {{attachment:green.gif}}

 2013-03-06 11:55:: Informatikdienste again has a global problem with the network infrastructure.
 2013-03-06 13:00:: All services should be back. We were informed about the cause by [[https://www1.ethz.ch/id/servicedesk/sysstat/index|Informatikdienste]]. A hardware load balancer in RZ crashed and was left in an undefined state. Therefore the failover to HCI did not work.

----

<<Anchor(2011-11-12-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}


 2013-02-23: 16:00:: All systems are back online.

 2013-02-23: 10:00-14:00:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss, please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Solaris, Windows)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----

<<Anchor(2013-02-15-network-outage-ETH)>>

= Outage Network Infrastructure ETH =
'''STATUS:''' {{attachment:green.gif}}

 2013-02-15 08:05:: Informatikdienste again has a global problem with the network infrastructure.

 2013-02-15 09:10:: The network is coming back, but problems are still present.

 2013-02-15 09:20:: The network is still not 100% recovered.

 2013-02-15 09:45:: The network is completely down again. We still don't have an update from Informatikdienste on what is going on.

 2013-02-15 10:45:: The network is still not stable.

 2013-02-15 12:10:: The network is coming back to normal; we are working on restoring the services.

 2013-02-15 12:30:: Most of our services are back online.

 2013-02-15 14:00:: We were informed about the cause by [[https://www1.ethz.ch/id/servicedesk/sysstat/index|Informatikdienste]]. Basically, a virtual firewall that was no longer needed was deleted by the network team at ID. As a consequence, all interfaces of all virtual firewalls went down, taking a large part of the ETH network with them. A reload of the firewall hardware did not help either, so they had to reinstall all appliances. See the status website for an explanation in German.
----

<<Anchor(2013-02-14-dns-problems)>>

= DNS Outage Network ETH =
'''STATUS:''' {{attachment:green.gif}}

 2013-02-14 14:35:: Currently DNS at ETH is down. ID will keep us informed. This affects all services.

 2013-02-14 14:45:: Services are stabilizing.

----

<<Anchor(2013-02-12-dns-problems)>>

= DNS Outage Network ETH =
'''STATUS:''' {{attachment:green.gif}}

 2013-02-12 16:30:: Currently DNS at ETH is down. ID will keep us informed. This affects all services.

 2013-02-12 18:50:: Services are stabilizing.

 2013-02-12 19:30:: Not all DNS names have been restored yet. However, most services affecting D-ITET customers should work again. More information will be published on the [[https://www1.ethz.ch/id/servicedesk/sysstat/index|Informatikdienste Statusseite]]. In case you experience any specific problems, please contact us at support@ee.ethz.ch

----

<<Anchor(2013-02-07-oenone-servercrash)>>

= Server oenone crash =
'''STATUS:''' {{attachment:green.gif}}

 2013-02-06: 23:00:: The oenone NFS server crashed. This affects users of '''AUT''', '''BIWI''', '''IKT''' and '''VAW''', including the homes and webpages of these institutes.

 2013-02-07: 07:10:: We had to reset the machine because the service providing the home directories was unresponsive.

----

<<Anchor(2013-02-05-people-webserver)>>

= Hanging people.ee.ethz.ch Webserver =
'''STATUS:''' {{attachment:green.gif}}

 2013-02-05: 06:00:: The webserver for people.ee.ethz.ch was hanging this morning and was not pingable anymore.

 2013-02-05: 07:00:: We reset the webserver and are now investigating the issue. It is probably kernel- and hardware-related.

----

<<Anchor(2013-01-14-oenone-servercrash)>>

= Server oenone crash =
'''STATUS:''' {{attachment:green.gif}}

 2013-01-14: 21:28:: oenone crashed. This affects users of '''AUT''', '''BIWI''', '''IKT''' and '''VAW'''.

 2013-01-14 21:52:: oenone rebooted and is running again.

----
<<Anchor(2012-11-15-Webserver Outage)>>

= Webserver Outage =
'''STATUS:''' {{attachment:green.gif}}

 2012-11-15: 14:30:: Erroneous deletion of some apache configuration files led to outages of the webservers '''oenone''' and '''yosemite''' today between 13:30 and 14:15.

----

<<Anchor(2012-11-13-VPP Service Outage)>>

= VPP Service Outage =
'''STATUS:''' {{attachment:green.gif}}

 2012-11-13: 07:30:: VPP announces a print service outage of about 15 minutes. This should resolve the latest VPP service issues.

----

<<Anchor(2012-11-09-Email Service Outage)>>

= Email Service Outage =
'''STATUS:''' {{attachment:green.gif}}

 2012-11-09: 10:05:: Emails cannot be sent or received. ISG.EE is working to resolve this problem as soon as possible.
 2012-11-09: 12:20:: We are still investigating what caused the outage of our mail server.
 2012-11-09: 15:30:: The amavis daemon (a high-performance interface between mailer (MTA) and AVI content checker) stopped working because its temporary directories were removed. It is not clear what removed these directories. We are still investigating this, but in the meantime the mailserver is up and running. '''NO EMAILS WERE LOST''' but mails were sent/received with a delay of 1-2 hours.

----

<<Anchor(2012-11-02-VPP Outage)>>

= VPP Outage =
'''STATUS:''' {{attachment:green.gif}}

 2012-11-02: 16:00:: Jobs sent to VPP Printers don't get printed. We are investigating this problem together with VPP.
 2012-11-05: 07:00:: Services up and running again.

----

<<Anchor(2012-10-25-Network Outage)>>

= Complete Network Outage =
'''STATUS:''' {{attachment:green.gif}}

 2012-10-25: 07:05 - 09:15:: Complete network outage. The cause is still unclear, but it might be a side effect of the replacement of defective router hardware announced yesterday.
 2012-10-25: 09:15:: Informatikdienste has posted a statement on its [[https://www1.ethz.ch/id/servicedesk/index|status page]]
----

<<Anchor(2012-10-16-Network Outage)>>

= Complete Network Outage =
'''STATUS:''' {{attachment:green.gif}}

 2012-10-16: 20:20 - 20:40 / 22:10 - 23:20 :: The complete network outage could be a side effect of a central router upgrade by Informatikdienste. We are investigating it.
 2012-10-17: 08:52 :: Informatikdienste has posted a statement on its [[https://www1.ethz.ch/id/servicedesk/index|status page]]

----

<<Anchor(2012-10-16-VPP Plotter at ETZSPEZ J66)>>

= Outage of poster printer at ETZ J 66 / ETZSPEZ - 16 October 2012 =
'''STATUS:''' {{attachment:green.gif}}

 2012-10-16: 08:30 - 12:30 :: The HP6100 plotter at ETZ J66 will be under maintenance and unavailable during that time.

----

<<Anchor(2012-10-10-lists.ee.ethz.ch_downtime)>>

= Mailinglist Downtime (lists.ee.ethz.ch) =
'''STATUS:''' {{attachment:green.gif}}

 2012-10-10: 09:00 - 17:00 :: Due to the migration of our mailing list software to sympa, we will take down the [[http://lists.ee.ethz.ch|lists.ee.ethz.ch]] website. Mails sent to lists.ee.ethz.ch will be put into a HOLD queue and delivered to the mailing list after the migration. So '''no mails get lost'''!

 2012-10-10: 17:04 :: Mailing lists converted. Services are up and running.

----

<<Anchor(2012-09-08-Ldap-Migration)>>

= LDAP Migration =
'''STATUS:''' {{attachment:green.gif}}

 2012-09-08: 10:00 - 17:00 :: No systems or services are available during the migration
 
 2012-09-08: 17:30 :: All systems are back online

----

<<Anchor(2012-08-17-Network-Problem-ETH)>>

= Network outage ETH =
'''STATUS:''' {{attachment:green.gif}}

 2012-08-17: 16:30:: There currently seem to be problems at the network level on the ETH network. No information is available yet.

 2012-08-17: 17:15:: Systems are back online.

----

<<Anchor(2012-06-28-power-outage)>>

= Power outage at ETH =
'''STATUS:''' {{attachment:green.gif}}

 2012-06-28: 19:17:: ETH had a power outage affecting many services. The D-ITET infrastructure was partially affected too.

 2012-06-29: 08:00:: We are currently working on resolving the outstanding issues and bringing the services that are still down back online.

 2012-06-29: 09:30:: Further information is now available on the ID status website https://www1.ethz.ch/id/servicedesk/sysstat/index

 2012-06-29: 10:05:: The cause was a transformer fire in the main building, which caused a power outage in the computer centres from 19:30 to 22:00.

----

<<Anchor(2012-06-20-oenone-servercrash)>>

= Server oenone crash =
'''STATUS:''' {{attachment:green.gif}}

 2012-06-20: 00:00:: oenone crashed. Affecting users of '''AUT''', '''BIWI''', '''COLLEGIUM''', '''IKT''' and '''VAW'''.

 2012-06-20: 00:30:: oenone rebooted and is running again.


(!) We are still investigating what caused the crash and will report further information here.

----

<<Anchor(2012-06-05-oenone-server-problem)>>

= oenone: hanging lockd affecting some User homes and webpages =
'''STATUS:''' {{attachment:green.gif}}

 2012-06-05: 21:37:: oenone is unresponsive. Affecting users of '''AUT''', '''BIWI''', '''COLLEGIUM''', '''IKT''' and '''VAW'''.

 2012-06-06: 00:00:: oenone was rebooted and is running again.

----

<<Anchor(2012-06-05-alumni-mailserver-problem)>>

= Outage on 2/3 of alumni.ethz.ch mailservers =
'''STATUS:''' {{attachment:green.gif}}

 2012-06-04: 16:00:: Our mailserver can't deliver emails to alumni.ethz.ch addresses. Reason: the receiving servers return temporary errors: "451 unable to verify user". It looks like something is misconfigured there.
 2012-06-05: 10:00:: It looks like all alumni servers hosted on tophost.ch have a problem. We have created a temporary transport map which delivers the mails to the alumni mailserver at genotec.ch (the only one that works). As long as you use our mailserver to send emails to alumni addresses, they will be delivered immediately.
----
<<Anchor(2012-05-13-oenone-server-problem)>>

= oenone: hanging lockd affecting some User homes and webpages =
'''STATUS:''' {{attachment:green.gif}}

 2012-05-13: 22:50:: oenone is unresponsive. Affecting users of '''AUT''', '''BIWI''', '''COLLEGIUM''', '''IKT''' and '''VAW'''.

 2012-05-14: 00:15:: oenone was rebooted and is running again.

----
<<Anchor(2012-05-08-drwho-server-problem)>>

= drwho: Main server outage affecting 64bit diskless Linux clients =
'''STATUS:''' {{attachment:green.gif}}

 2012-05-08: 11:50:: We are currently experiencing problems with one of our main servers. All 64bit diskless clients are affected. We are working on the problem. Furthermore, some svn repositories might be affected.

 2012-05-08: 13:15:: The system is going back to normal but needs some time to fully recover.

 2012-05-08: 13:45:: The system should be back to normal. We are still working on recovering a few individual hosts.

----


<<Anchor(2011-11-12-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}

 2012-03-13: 21:30 :: All systems are back online.

 2012-03-13: 19:00 - 22:00 :: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss, please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----


<<Anchor(2012-02-16-zaan-maintenance)>>

= Maintenance downtime of Server behind ipp2vpp.ee.ethz.ch (printing/licenses/dhcp for selfmanaged hosts) =
'''STATUS:''' {{attachment:green.gif}}

On '''2012-02-16''' at around '''8:00AM''' we have scheduled a maintenance downtime of zaan (which provides printing, the license server and the DHCP server for D-ITET). The downtime will be as short as possible. We plan to have it down for 45 minutes at most.

 '''2012-02-16 08:45:''' Server is up and running again.

----

<<Anchor(2012-02-13-oenone-crash)>>

= oenone home server crash =
'''STATUS:''' {{attachment:green.gif}}

 2012-02-13:: During the night, at around 22:40, '''oenone''', one of our home servers, crashed. Users with homes on oenone were affected; these are '''BIWI''', '''VAW''', '''Collegium Helveticum''', '''Control''', '''ISI''', '''IKT'''. The system was up again at 00:40.

We are sorry for the inconvenience caused and are investigating the problem.


----
<<Anchor(2012-01-31-oenone-emergency-reboot)>>

= Emergency reboot of ITET's server OENONE =
'''STATUS:''' {{attachment:green.gif}}

 2012-01-31: 21:00:: Emergency reboot of server OENONE due to an unresponsive storage area. Users concerned: '''BIWI, VAW, Collegium Helveticum, Control, ISI, IKT, EEH'''

 2012-01-31: 21:45:: Server oenone is up again and all services are running

----
<<Anchor(2012-01-23-linux-reboots)>>

= Emergency reboots of all Linux Clients and Servers =
'''STATUS:''' {{attachment:green.gif}}

 2012-01-23: 15:00:: Due to a critical issue we were forced to reboot all affected hosts. We are sorry for the short notice and the inconvenience caused.

 2012-01-23: 20:45:: All clients have been rebooted.

----
<<Anchor(2011-11-29-oenone-crash)>>

= Maintenance work on D-ITET's home servers TARDIS and OENONE =
'''STATUS:''' {{attachment:green.gif}}

 2011-12-17: 11:00 AM:: Successful reboot of TARDIS and OENONE.

 2011-12-17 10:00AM - 11:00AM:: The last installation of Oracle patches introduced a bug in the automount daemon that causes high CPU load on systems with a high number of auto-mounted file systems. We have investigated this problem together with Oracle. A bug fix is now available, but it requires a server reboot. We are therefore going to reboot TARDIS and OENONE at the time given above.

To prevent data corruption/loss, please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside


More details:
 * servers concerned: TARDIS, OENONE
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----
<<Anchor(2011-11-29-oenone-crash)>>

= oenone home server crash =
'''STATUS:''' {{attachment:green.gif}}

 2011-11-29:: During the night, at around 21:30, '''oenone''', one of our home servers, crashed. Users with homes on oenone were affected; these are '''BIWI''', '''VAW''', '''Collegium Helveticum''', '''Control''', '''ISI''', '''IKT'''. The system was up again at 23:00.

We are sorry for the inconvenience caused and are investigating the problem.


----

<<Anchor(2011-11-14-power-outage)>>

= Power outage affecting compute clusters =
'''STATUS:''' {{attachment:green.gif}}
 2011-11-14:: Due to a power outage some racks containing our compute clusters went down.

 2011-11-14 08:00:: All compute clusters should be up and running again.

----

<<Anchor(2011-11-12-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}

 2011-11-12: 05:50 PM:: The final reboot of TARDIS completed successfully.

 2011-11-12: 04:00 PM:: The final reboot of TARDIS is still outstanding due to a broken disk in TARDIS's internal RAID (boot device). The broken disk has been successfully replaced but the RAID is still syncing. The reboot is postponed until the sync process is finished and the reboot can safely be carried out. So be prepared for a short interruption today or tomorrow.

 2011-11-12: 04:00 PM:: All systems are back online.

 2011-11-12: 10:00 AM - 04:00 PM:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to upgrade specific storage software and install the latest Oracle patches on our main servers. The servers will be rebooted multiple times during this maintenance.

To prevent data corruption/loss please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)

----

<<Anchor(2011-10-14-cronbox-migration)>>
= Migration of cronbox.ee.ethz.ch to Debian Squeeze =
'''STATUS:''' {{attachment:green.gif}}
 2011-10-14: 07:30:: We plan to migrate the server behind cronbox.ee.ethz.ch to the new version of the Debian operating system. Expected downtime is from 07:30 up to around 10:00. The affected services are '''cron''' and '''ssh''' logins to the machine.
 2011-10-14: 11:00:: Migration completed.

----

<<Anchor(2011-09-30-matlab-license-server-down)>>
= Matlab License Server down =
'''STATUS:''' {{attachment:green.gif}}
 2011-09-30: 07:00:: The server from Informatikdienste that provides the license server service is currently down. You can track the current status on their [[https://www.komcenter.ethz.ch/home/idServices/show|ID-Service Status]] page under Lizenzen -> 1965@vnava.

 2011-09-30: 08:45:: The license server from Informatikdienste is now up again.

 2011-09-30: 15:30:: The license server lic-matlab.ethz.ch is unavailable again. Since this is outside our control, we cannot estimate when it will work again.

 2011-09-30: 16:00:: lic-matlab.ethz.ch is available again.

----

<<Anchor(2011-08-30-nfs-problems-on-oenone)>>
= NFS outage on oenone =
'''STATUS:''' {{attachment:green.gif}}
 2011-08-30: 02:00 AM - 08:30 AM:: Last night at around 02:00 the NFS services on '''oenone''' crashed. Users with homes on oenone were affected; these are '''BIWI''', '''VAW''', '''Collegium Helveticum''', '''Control''', '''IBT''', '''IKT'''. This crash also affected all web services that depend on oenone, as well as the mail server (at least for those whose home directory is on oenone). No mails are lost, as we put them into a hold queue!

 Update: 08:30:: oenone is now up and running again. All mails in the hold queue are now gradually being delivered. The web services are all available again as well.

We apologize for the inconvenience caused and are investigating the problem.

----

<<Anchor(2011-08-23-admin-ch-connection-problems)>>
= Connection problems to all admin.ch servers =
'''STATUS:''' {{attachment:green.gif}}
 2011-08-23: 03:00 PM:: All admin.ch websites and mail servers are reachable again.
 2011-08-23: 01:25 PM:: Currently all traffic to the admin.ch servers is disrupted. This includes the websites and also email connections. Mails you have sent are not lost, as our server keeps them until it can connect to the destination.

----

<<Anchor(2011-08-23-solaris-server-patching)>>

= Maintenance work on D-ITET's central IT infrastructure =
'''STATUS:''' {{attachment:green.gif}}

 2011-08-23: CANCELLATION:: This maintenance is cancelled due to an unresolved error in the Solaris operating system that was introduced by the latest patch set.

 2011-08-23: 7:00 PM - 10:00 PM:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to patch and reboot some of our Solaris servers.

To prevent data corruption/loss please do the following:
 * save all open files
 * close all running applications
 * log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
 * shut down your personal PC/Desktop
 * do not establish any connection from outside

More details:
 * servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
 * webpages hosted on these systems are NOT available
 * NO mail access, NO outgoing mails (incoming mails WON'T get lost)
----

<<Anchor(2011-08-19-switch-routing-loop)>>

= Routing problems on switch.ch network =
'''STATUS:''' {{attachment:green.gif}}
 2011-08-19: 05:00 PM:: Routing problems solved.
 2011-08-19: 02:22 PM:: Because of a routing problem on the switch network, all traffic to http://www.virginia.edu and their mail server is disrupted.

----

<<Anchor(2011-08-08-drwho-problems)>>

= Outage drwho.ee.ethz.ch =
'''STATUS:''' {{attachment:green.gif}}

 2011-08-13 09:00:: Since around 1:00 we have been experiencing problems on one of our main servers, affecting most of the services.

 2011-08-13 11:00:: All services back to normal.

----

<<Anchor(2011-08-08-dns_outage)>>

= ETH wide DNS outage =
'''STATUS:''' {{attachment:green.gif}}

 2011-08-08 10:00:: ETH-wide DNS outage. All services that use name resolution do not work. Our mail server temporarily rejects all incoming messages with {{{450 4.3.2 Service currently unavailable}}}. Properly configured mail servers should retry the message delivery later, so no mail is lost.

 2011-08-08 10:20:: DNS works again. All Services up and running.
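
The retry behaviour mentioned above follows from the SMTP reply code classes: a reply starting with 4 signals a temporary failure (the sender should queue and retry), while a reply starting with 5 signals a permanent failure (the sender bounces the message). The following is only a minimal illustrative sketch; the function name `classify_smtp_reply` and the returned labels are assumptions made for this example and are not part of our mail server configuration.

```python
def classify_smtp_reply(reply_line: str) -> str:
    """Classify an SMTP reply line by the first digit of its 3-digit code.

    Hypothetical helper for illustration only.
    """
    code = reply_line.strip()[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an SMTP reply line: {reply_line!r}")
    first_digit = code[0]
    if first_digit == "2":
        return "accepted"
    if first_digit == "3":
        return "intermediate"
    if first_digit == "4":
        # Transient failure: a well-behaved sender keeps the mail queued.
        return "temporary failure, retry later"
    if first_digit == "5":
        # Permanent failure: the sender gives up and bounces the mail.
        return "permanent failure, bounce"
    raise ValueError(f"unknown reply class: {reply_line!r}")


print(classify_smtp_reply("450 4.3.2 Service currently unavailable"))
```

During the outage every incoming message fell into the "temporary failure" class, which is why no mail should have been lost.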


----

<<Anchor(2011-06-22-colombo04)>>

= colombo04 not available =
'''STATUS:''' {{attachment:green.gif}}

 2011-06-23 11:00:: Power supply of colombo04 replaced. colombo04 is up and ready.

 2011-06-22 07:00:: The power supply of colombo04 broke. We are now waiting for a replacement from Oracle.

----

<<Anchor(2011-06-06-biwinas01)>>

= biwinas01 not available =
'''STATUS:''' {{attachment:green.gif}}

 2011-06-08 08:00:: biwinas01 is now back online.

 2011-06-07 15:00:: The hardware supplier returned the server today. They had to replace the following hardware:
 * 1 CPU
 * 1 power supply
 * 1 fan

We will test the server now and bring it back online as soon as possible.

 2011-06-06 15:00:: biwinas01 is currently out of order due to a hardware failure. It might take several days until biwinas01 is back online.

----
<<Anchor(2011-06-06-ifhlux11)>>

= ifhlux11 not available =
'''STATUS:''' {{attachment:green.gif}}

 2011-06-08 11:20:: ifhlux11 is now back online.

 2011-06-07 15:00:: We are in contact with the supplier. Unfortunately, the reason for the crash of ifhlux11 is not known yet.

 2011-06-06 15:00:: ifhlux11 is currently out of order due to a hardware failure. It might take several days until ifhlux11 is back online.

----
<<Anchor(2011-06-03-Cooling-System-Replacement)>>

= Coming soon: Outage of several IT services due to cooling system replacement =

'''STATUS:''' {{attachment:green.gif}}

 2011-06-03 15:00:: All servers are again up and running with the exceptions of '''biwinas01''' and '''ifhlux11''' (see above).

'''UPCOMING: 2011-06-03 09:00 - 2011-06-06 09:00'''

The refrigeration supply in the ETZ building will be replaced. One server room
has to be shut down completely (ETZ/D/96.2; all compute servers of the institutes).
The other server room will get an emergency cooling system (ETZ/F/66). This will,
however, not allow the cooling of all currently running servers within that room.
Consequently we will shut down as many servers as possible.

'''Basic services like email, user homes and other network shares, printing and login to
Windows or Linux workstations are not expected to be affected by this construction work.'''
In general, if you are not familiar with the server names or service terms below, you
should not be affected at all.

'''These services will not be available during the construction work times:'''
 * Shut down 2011-06-03 at '''09:00'''. Back online 2011-06-06 at 09:00:
  * Remote-Desktop-Access to the Windows Terminal Servers
   * quinn
   * sivi
   * vega7
  * All institute NAS servers (no access via Samba, link in home, ssh, etc.)
   * biwinas01-03
   * hamam01
   * ifenas01
   * ibtnas01
  * Most IFE compute servers
   * bernstein
   * coltrane
   * dylan
   * haydn
   * marley
   * mozart
  * Publication databases on sato (IFA)

 * Shut down 2011-06-03 at '''16:00'''. Back online 2011-06-06 at 09:00:
  * Computer rooms for students
   * ETZ D 61.1
   * ETZ D 61.2
   * ETZ D 96
  * All institute compute servers
   * autserv*
   * bender*
   * biwilux*
   * casseri*
   * colombo*
   * IFE compute servers cash and elvis
   * ifhlux*
   * nariwork*
   * tik*x
   * vierzack*

----
<<Anchor(2011-05-26-Firewall-Problem)>>

= Firewall Problem: some Services not reachable =
'''STATUS:''' {{attachment:green.gif}}

'''2011-05-26 07:15 - 2011-05-26 16:00'''

Due to problems with the firewall hardware, some services of the D-ITET-servers were not reachable.

----
<<Anchor(2011-05-19-polaris-and-zaan-reboot)>>

= Maintenance reboot of Servers behind login.ee.ethz.ch and ipp2vpp.ee.ethz.ch (printing/licenses) =
'''STATUS:''' {{attachment:green.gif}}

On '''2011-05-19''' around '''7:00AM''' we will perform a maintenance reboot of polaris (serving login.ee.ethz.ch) and zaan (serving printing at D-ITET). During this downtime it will not be possible to print via samba or cups.

----
<<Anchor(2011-05-09-galen-reboot)>>

= Maintenance reboot of Server behind people.ee.ethz.ch =
'''STATUS:''' {{attachment:green.gif}}

'''2011-05-10 07:45'''

On '''2011-05-10''' around '''7:00AM''' we will perform a maintenance reboot of galen. galen is the server hosting your personal homepage on people.ee.ethz.ch.

----

<<Anchor(2011-05-02-horde-outage)>>

= Horde Webmail outage =
'''STATUS:''' {{attachment:green.gif}}

'''2011-05-02 23:49 - 2011-05-03 02:27'''

The [[https://email.ee.ethz.ch|Horde Webmail Client]] had to be taken down for security reasons. It was not clear whether someone had used a zero-day exploit or a phished account to send spam over our server. It turned out that the attackers used a phished account.

/!\ '''Please Remember: We at ISG.EE will NEVER ask you for your password.'''

----

<<Anchor(2011-03-01-short-smtp-outage)>>

= Short SMTP outage =
'''STATUS:''' {{attachment:green.gif}}

 2011-02-28 14:49 - 15:04:: As a result of an LDAP failure we had to stop the mail server for 15 minutes to prevent the rejection of incoming emails. While the main server was down, our two backup MX servers collected incoming emails.
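
Backup MX servers can take over in this way because a sending mail server tries the MX records of a domain in order of ascending preference value and moves on to the next host when one is unreachable. The following is only a rough sketch of that selection logic; the host names, preference values and function names are hypothetical and do not describe our actual DNS records.

```python
def delivery_order(mx_records):
    """Return MX hosts in the order a sender would try them.

    mx_records is a list of (preference, host) tuples; a lower preference
    value is tried first. Real senders randomize among equal preferences;
    this sketch simply sorts them by name.
    """
    return [host for _preference, host in sorted(mx_records)]


def first_reachable(mx_records, is_up):
    """Return the first host that accepts mail, or None if all are down."""
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None  # the sender keeps the mail queued and retries later


# Hypothetical records: one main server and two backup MX servers.
records = [
    (20, "backup1.example.org"),
    (10, "main.example.org"),
    (30, "backup2.example.org"),
]
print(first_reachable(records, lambda host: host != "main.example.org"))
```

With the main server stopped, the sketch selects the first backup host, which mirrors what happened during the 15-minute outage.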

----

<<Anchor(2011-02-28-zhadum-crash)>>

= Windows Terminal Server zhadum out-of-operation =
'''STATUS:''' {{attachment:green.gif}}

 2011-02-28 15:00:: Please use the server '''vega7''' from now on. If you had access to zhadum before, you should also be able to access vega7. '''Please use your NETHZ username and password to log in.'''

 2011-02-28 14:00:: The hardware of the departmental Windows terminal server '''zhadum''' is broken. A replacement server should be available soon...

----
<<Anchor(2011-02-22-jabba-upgrade)>>

= Upgrade of Backup Server JABBA =
'''STATUS:''' {{attachment:green.gif}}

 2011-02-22 18:15:: The migration is finished and the new JABBA server is online.

 2011-02-22 7:00 AM - 7:00 PM (approx.):: We are going to upgrade the department's backup server JABBA. This upgrade includes changes in software as well as in hardware. During the upgrade no requests to restore lost data can be fulfilled.

'''The complete backup infrastructure and all associated services are NOT available during the upgrade'''

----
<<Anchor(2011-02-07-agilent-license-server-change)>>

= Agilent ADS/ICCAP License Server Change =
'''STATUS:''' {{attachment:green.gif}}

 2011-02-07 10:00:: As announced a week ago, the license server for Agilent ADS and ICCAP software has changed. If your client software still uses the old license server information, please make sure you change the license server to '''lic-agilent.ee.ethz.ch'''. The port stays the same.

----
<<Anchor(2011-01-31-svn-firewall)>>

= Subversion server not reachable from outside ETH Zurich =
'''STATUS:''' {{attachment:green.gif}}

 2011-01-31 16:30:: Our subversion server is now available again for users outside of ETH Zurich.

 2011-01-31: 16:15:: We have been told that the central firewall rules cannot be changed right now due to other (completely unrelated) problems. At the moment it is unknown when this will be fixed.

 2011-01-31: 15:30:: Our subversion server svn.ee.ethz.ch is at the moment not accessible through the svn:// protocol from outside ETH Zurich due to a firewall configuration error. We are working on it...

----
<<Anchor(2011-01-26-mailser-problems-after-patching)>>

= Mailserver problems while patching =
'''STATUS:''' {{attachment:green.gif}}

 2011-01-26: 11:00 AM:: During the server upgrade last night, a patch temporarily misconfigured the mail server. The server accepted incoming mails but could not place them into the users' mailboxes. EVERY such mail created a bounce. Because of this, our statement that no mails get lost while updating the servers is no longer fully true. Incoming mails that bounced have to be resent by the sender.

----
<<Anchor(2010-12-14-solaris-server-patching)>>

= Solaris Server Patching =
'''STATUS:''' {{attachment:green.gif}}

 2011-01-25: 10:30 PM:: All servers and services are back online

 2011-01-25: 7:00 PM - 10:00 PM:: To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to patch and reboot some of our Solaris servers.

Servers concerned: '''drwho''', '''tardis''', '''oenone''', '''spitfire''', '''yosemite''', '''malina'''.

To prevent data corruption/loss please do the following:

 * All diskless clients (Linux): please log out and shut down all DL clients
 * All Windows systems: please log out and shut down all Windows clients
 * All Mac systems: please log out and shut down all Mac clients
 * All user homes: please log out of these servers
 * No NFS or SAMBA access to user homes
 * No mail access, no outgoing mails (incoming mails WON'T get lost)
 * Webpages hosted on these systems are unavailable

----
<<Anchor(2010-12-19-delayed-email-delivery)>>

= Delayed Email delivery =
'''STATUS:''' {{attachment:green.gif}}

 2011-01-19: 11:00 AM - 4:30 PM:: As a result of a faulty [[http://lurker.clamav.net/thread/20110119.125839.2b4ce0e1.en.html|ClamAV signature File]] every Email that contained a PDF-file was marked as infected. Before we could resend the quarantined emails we had to fix the issue. No mail was lost and everything was resent.

 Update: 2011-01-20: 10:33 AM:: ClamAV signatures have been updated and tested. Everything is working as it should.

----
<<Anchor(2010-12-07-reboot-yosemite)>>

= Maintenance Reboot of Solaris Server Yosemite =
'''STATUS:''' {{attachment:green.gif}}

 2010-12-07: 7:30 AM:: Server yosemite has been rebooted successfully. All services are available.

 2010-12-07: 7:00 AM:: Due to a shortage of available memory we are forced to reboot the Solaris server yosemite. Downtime approx. 30 minutes.

----
<<Anchor(2010-11-26-servers-down)>>

= cooling water system outage on clusters =
'''STATUS:''' {{attachment:green.gif}}

 2010-11-26: 5:00 PM:: Host autserv02 is running as well. All hosts can be used.

 2010-11-26: 4:40 PM:: Server racks are cooled again; all hosts except autserv02 are running and can be used.

 2010-11-26: 4:00 PM:: Server racks are still down. --> Update follows at 5 PM or earlier

 2010-11-26: 3:10 PM:: One of the cooling water pumps installed in ETZ/D/96.2 is not working correctly. This forces some of the racks in this server room to shut down in order to protect the servers from thermal damage. '''Clusters from IFH, IBT, BIWI, TIK, IKT and VAW are affected.''' The facility management is working on solving the problem. --> Update follows at 4 PM

----
<<Anchor(2010-11-23-email-phishing-attack)>>

= email phishing attack =
'''STATUS:''' {{attachment:green.gif}}

 2010-11-23:: Yesterday between 18:20 and 19:50 about 320 phishing mails were sent to various users at D-ITET. The mails pretend to come from ''IT Support Group'' and carry the subject ''ISG.EE Webmail Alert''. The mail claims that ''spammers'' have compromised ''the'' ISG.EE Webmail Account and that you should provide your '''Username, Password''' and some '''Alternate Email'''. Please remember that the ISG.EE Team will '''NEVER ask you for your Password!''' If you have nevertheless replied to this phishing mail, please contact us '''immediately''' at support@ee.ethz.ch so that we can plan the next steps with you to keep your account safe.

----
<<Anchor(2010-11-17-oenone-crash)>>

= oenone home server crash =
'''STATUS:''' {{attachment:green.gif}}

 2010-11-17:: Last night at around 00:15, '''oenone''', one of our home servers, crashed. Users with homes on oenone were affected; these are '''BIWI''', '''VAW''', '''Collegium Helveticum''', '''Control''', '''IBT''', '''IKT'''. The server is now checking the file systems and coming up again.

We apologize for the inconvenience caused and are investigating the problem.

 Update: 08:00:: oenone is now up and running again.
 Update: 2010-11-18 07:30:: We opened a support case at Sun/Oracle for this server.

----
<<Anchor(2010-11-16-servers-down)>>

= cooling water system outage for some clusters =
'''STATUS:''' {{attachment:green.gif}}

 2010-11-16:: Last Friday evening one of the cooling water pumps installed in ETZ/D/96.2 stopped working correctly. This forced some of the racks in this server room to shut down in order to protect the servers from thermal damage. '''All clusters from IFH, IBT, BIWI, TIK, IKT and VAW were affected.'''

The facility management is working on solving the problem.

The servers are currently (08:35) down again. Even if they come up again, please do not use them for long-running computations, as we still do not know exactly when the technician will have solved the issue.

 Update: 2010-11-17 08:25:: The rack systems are now running with only one cooling water pump. A new pump has been ordered by the rack company.
 Update: 2010-11-18 16:00:: The planned replacement of the broken pump will take place on 25.11 or 26.11.

----
[[CategoryEDUC]]

General Information

Status-Key

  • green.gif: Resolved
  • orange.gif: Still working but with some errors
  • red.gif: Pending

Problems on core server drwho

STATUS: orange.gif

2015-02-27: 17:00
Planned maintenance downtime.
  • ITET's core server DRWHO is experiencing major problems with its disk controller and the associated disks. The affected hardware needs to be replaced by Oracle support. Because of this maintenance work we are going to shut down DRWHO on

    Friday, February 27th, between 17:00 and 22:00.

2015-02-13: 09:15
All services should now be available again. The maintenance downtime was much longer than planned. We apologize for any inconvenience this may have caused.
2015-02-13: 08:30
The problems do not seem to be resolved yet. The server is online, but some services are not running yet.
2015-02-13: 08:00
The firmware update took longer than planned. The server should be back online within the next few minutes.
2015-02-12: 11:00
The firmware needs to be updated. This will be done on Friday, 13th of February, between 06:30 and 07:30 hours.
2015-02-09: 14:30
The server is back online, we are still investigating the issue.
2015-02-09: 13:15
Affected are all diskless clients and services which depend on drwho (e.g. the email.ee.ethz.ch website).
2015-02-09: 12:55
The D-ITET server drwho.ee.ethz.ch is currently not reachable.


www.ee.ethz.ch and www.werkstatt.ee.ethz.ch - Websites down

STATUS: green.gif

2015-01-26: 18:53

Informatikdienste resolved the problems and http://www.ee.ethz.ch and http://www.werkstatt.ee.ethz.ch are now again online.

2015-01-26: 14:12
ID announces that the downtime could last until 18:30.
2015-01-26: 09:30
Informed ID Servicedesk about the issue.


lists.ee.ethz.ch - Composing has been temporarily disabled

STATUS: green.gif

2015-01-20: 22:05
We applied the security updates and re-enabled the composing area.
2015-01-20: 11:15
Composing of messages has been temporarily disabled due to a security flaw. People using the composing tool are able to view and send any file on the filesystem. Composing will be re-enabled after the package maintainers have updated sympa.


SMTP Email Sending: Sophos Problems

STATUS: green.gif

2015-01-15: 17:45
After the last virus-signature update of the Sophos antivirus software, we had a virus data checksum error. This was the reason for the delays. After we fixed this issue, everything returned to normal.
2015-01-15: 16:00
The sending and receiving of emails takes too much time and the mailserver is on high load. We are working on a solution.


Server oenone NFS problems

STATUS: green.gif

2015-01-12: 21:37

We are seeing NFS-related problems on server oenone again. Affected are the homes of all users from AUT, BIWI, IKT and VAW. This issue also affects related webpages and services.

2015-01-13: 07:20
The system has recovered and is back online.


Problems sending emails because of LDAP outage

STATUS: green.gif

2014-12-02: 16:15
Set status to green as we had no other problems.
2014-11-30: 23:15
We have reconfigured dovecot to use another LDAP server. Currently it looks as if sending emails with dovecot or roundcube is working again. However, since we do not know the cause of the outage, it is possible that this newly configured LDAP server could crash as well.
2014-11-30: 16:50
One of the main LDAP servers (ldaps-rz-2) of Informatikdienste, used to authenticate the sending user, is no longer available.


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2014-11-22: 14:00
All services are back online.
2014-11-22: 10:00-16:00
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Windows)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Server room ETZ D 96 down

STATUS: green.gif

2014-11-14: 07:00
bender30 online
2014-11-13: 17:10
All servers listed below should be back online, with the exception of bender30, which seems to have a more serious problem.
2014-11-13: 15:45
The cooling system in ETZ D 96 should be back to normal operation. We will progressively power on all servers listed below.
2014-11-13: 15:24
The cooling system in ETZ D 96 was temporarily out of order. To prevent the servers from overheating, all racks in ETZ D 96 were shut down automatically. Affected servers include:
  • arton*
  • autserv*
  • bender*
  • biwirender*
  • colombo*
  • ifhlux*
  • nariwork*
  • tik*x
  • vierzack*


Server oenone NFS problems

STATUS: green.gif

2014-11-07: 23:50

We are seeing NFS-related problems on server oenone again. Affected are the homes of all users from AUT, BIWI, IKT and VAW. This issue also affects related webpages and services.

2014-11-08: 05:00
The system has recovered and is back online.


YOSEMITE shutdown for maintenance

STATUS: green.gif

2014-08-21: 07:00
Server YOSEMITE is back online
2014-08-21: 07:00
Server YOSEMITE is down for emergency maintenance


Webmail outage

STATUS: green.gif

2014-06-30: 18:25
Webmail is running normally.
2014-06-30: 17:30
Our webmail server "roundcube" is currently down due to maintenance.


Swisstime outage

STATUS: green.gif

2014-06-05
Due to a new Linux kernel vulnerability all of our workstations, compute cluster servers and service servers will be rebooted.


Swisstime outage

STATUS: green.gif

2014-05-26: 17:00
Our time server "swisstime" is currently offline due to a switch hardware failure.
2014-05-27: 15:30
The switch has been replaced and reconfigured.


OpenSSL Heartbleed vulnerability

STATUS: green.gif

2014-04-07: 21:00
We became aware of a vulnerability in OpenSSL's support for the TLS/DTLS Heartbeat extension. Memory from either client or server can be recovered by an attacker. This vulnerability might allow an attacker to compromise the private key and other sensitive data in memory.
2014-04-08: 06:30
Relying on the updates provided by our distribution, we rolled out the update; by this time, patched versions were already running on all vulnerable servers.
2014-04-10: 08:00
Updates were also rolled out on the client side.
2014-04-14: 12:00
OpenSSL certificates have been replaced for the previously affected services on the vulnerable servers.
2014-04-16: 16:00

Informatikdienste announce that they have also patched all relevant services and advise customers to change their passwords. We confirm that this step is strongly recommended. Note in particular that the vulnerability has been exploited since November 2013 to collect passwords, session data, private keys, etc.
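
For readers who want to judge whether a given OpenSSL release falls into the affected range, the following minimal sketch may help. It assumes the upstream advisory for this vulnerability (CVE-2014-0160), according to which the releases 1.0.1 through 1.0.1f are affected and 1.0.1g contains the fix. The function name is hypothetical, beta releases are not handled, and distribution packages may carry a backported fix while still reporting an older version string, so the result is only a first indication.

```python
import re

# Matches upstream release strings such as "1.0.1e" or "0.9.8y".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*)$")


def is_heartbleed_affected(upstream_version: str) -> bool:
    """Return True if an upstream OpenSSL release is in the 1.0.1 - 1.0.1f range.

    Hypothetical helper; it does not know about distribution backports.
    """
    match = _VERSION_RE.match(upstream_version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {upstream_version!r}")
    major, minor, fix = (int(match.group(i)) for i in (1, 2, 3))
    letter = match.group(4)
    if (major, minor, fix) != (1, 0, 1):
        return False  # the 0.9.8 and 1.0.0 series are not affected
    return letter < "g"  # "" (plain 1.0.1) and "a".."f" are affected


for version in ("0.9.8y", "1.0.1e", "1.0.1g"):
    print(version, is_heartbleed_affected(version))
```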


No Emails accepted from webservers

STATUS: green.gif

2014-04-14: 14:00
The plugin which caused this malicious behavior has been deactivated. Email connections from the webcluster work now as intended.
2014-04-13: 18:00
A web form has been misused to send spam over our infrastructure. To make sure this does not happen again while we investigate the case, all emails sent from our webservers will be rejected. Unfortunately, some scripts do not notice this rejection and display a success message.

We apologize for this.
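
Scripts that send mail can avoid reporting a false success by checking the outcome of the SMTP transaction. The following is only a hedged sketch using the `sendmail()` method of Python's standard `smtplib` module; the helper name `send_form_mail`, the example addresses and the returned messages are assumptions for illustration and do not describe any script running on our webservers.

```python
import smtplib


def send_form_mail(smtp, sender, recipients, message):
    """Send a mail and report whether the server really accepted it.

    smtp is any object with an smtplib.SMTP-compatible sendmail() method.
    Returns a tuple (ok, detail). Hypothetical helper for illustration.
    """
    try:
        # sendmail() raises if nobody accepted the mail and otherwise
        # returns a dict with one entry per refused recipient.
        refused = smtp.sendmail(sender, recipients, message)
    except smtplib.SMTPException as exc:
        return False, f"mail rejected: {exc!r}"
    if refused:
        return False, f"refused recipients: {sorted(refused)}"
    return True, "mail accepted by server"
```

A web form would then show its success message only when the first element of the returned tuple is true, for example after calling `send_form_mail(smtplib.SMTP("localhost"), "form@example.org", ["user@example.org"], text)` with a placeholder server and placeholder addresses.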


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2014-04-12: 15:50
All services are back online
2014-04-12: 10:00-16:00
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Windows)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Network maintenance affecting some services

STATUS: green.gif

Netzwerkdienste is performing a router update between 6:30 and 7:00 on 2013-01-07. This can cause the network path to the following services to be down for about 15 minutes:

  • Subversion service on svn.ee.ethz.ch

  • Sympa mailinglists on lists.ee.ethz.ch (both mailing and webinterface). Note: No mails will be lost


MySQL Server downtime

STATUS: green.gif

2013-11-18: 12:00
We are experiencing problems with the MySQL database cluster. All websites using the MySQL instance on remi.ee.ethz.ch are also affected.
2013-11-18: 12:45
The system is working again.


Server oenone NFS problems

STATUS: green.gif

2013-11-12: 04:00

We are seeing NFS-related problems on server oenone again. Affected are the homes of all users from AUT, BIWI, IKT and VAW.

2013-11-12: 07:40
The system is back online.


Server oenone NFS problems

STATUS: green.gif

2013-10-16: 02:00

Starting from around 02:00 in the morning we had NFS related problems on server oenone. Affected were homes of all users from AUT, BIWI, IKT and VAW. Webpages of these institutes may also have been affected.

2013-10-16: 03:00
Server oenone recovered itself and is running as normal again.


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2013-09-28: 14:30
All services are back online
2013-09-28: 10:00-16:00
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install latest Oracle patches on our main servers. The servers will be rebooted during this maintenance

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Windows)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Server oenone crash

STATUS: green.gif

2013-09-04: 21:30

The oenone NFS server crashed, affecting the homes and webpages of users from AUT, BIWI, IKT and VAW.

2013-09-05: 04:00
We had to reset the server to bring it up again.


Server oenone crash

STATUS: green.gif

2013-06-15: 08:30

The oenone NFS server crashed, affecting the homes and webpages of users from AUT, BIWI, IKT and VAW.

2013-06-15: 11:45
We had to reset the machine, as the service providing the home directories was unresponsive.


ETZSPEZ - HP 6100 encountered print quality issues

STATUS: green.gif

2013-05-30 12:00
The Printer has been fixed.
2013-05-30 09:00
Technician is mending the printer right now. Printer is offline for the next 4 hours.
2013-05-16 12:00
An external technician has been informed and will fix the printer in the room ETZ J66 (ETZSPEZ) as soon as possible.


The Informatikdienste have some problems with their NAS

STATUS: green.gif

2013-04-23 14:42
As a result of the disruptions of the ID-NAS, the VPP printers are not working properly at the moment.
2013-04-24 07:00
All ID Services should be back to normal.


Outage Network Infrastructure ETH

STATUS: green.gif

2013-03-06 11:55
Informatikdienste again has a global problem with the network infrastructure.
2013-03-06 13:00

All services should be back. Informatikdienste informed us about the cause: a hardware load balancer in RZ crashed and was left in an undefined state. Therefore the failover to HCI did not work.


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2013-02-23: 16:00
All systems are back online.
2013-02-23: 10:00-14:00
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install latest Oracle patches on our main servers. The servers will be rebooted during this maintenance

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Solaris, Windows)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Outage Network Infrastructure ETH

STATUS: green.gif

2013-02-15 08:05
Informatikdienste again has a global problem with the network infrastructure.
2013-02-15 09:10
Network is coming back, still problems present.
2013-02-15 09:20
Network is still not 100% recovered.
2013-02-15 09:45
Network completely down again. We still do not have an update from Informatikdienste on what is going on.
2013-02-15 10:45
Network is still not stable.
2013-02-15 12:10
Network is coming back to normal, we are working on restoring the services.
2013-02-15 12:30
Most of our services are back online.
2013-02-15 14:00

Informatikdienste informed us about the cause. Basically, a virtual firewall that was no longer needed was deleted by the network team at ID. As a consequence, all interfaces of all virtual firewalls went down, bringing down a large part of the ETH network. A reload of the firewall hardware did not help on their side either, so they had to reinstall all appliances. See the status website for an explanation written in German.


DNS Outage Network ETH

STATUS: green.gif

2013-02-14 14:35
Currently DNS at ETH is down. ID will keep us informed. This affects all services.
2013-02-14 14:45
Services are stabilizing.


DNS Outage Network ETH

STATUS: green.gif

2013-02-12 16:30
Currently DNS at ETH is down. ID will keep us informed. This affects all services.
2013-02-12 18:50
Services are stabilizing.
2013-02-12 19:30

Not all DNS names are restored yet. However, most services affecting D-ITET customers should work again. More information will be published on the Informatikdienste status page. If you experience specific problems, please contact us at support@ee.ethz.ch


Server oenone crash

STATUS: green.gif

2013-02-06: 23:00

The oenone NFS server crashed, affecting the home directories and web pages of users at AUT, BIWI, IKT and VAW.

2013-02-07: 07:10
We needed to reset the machine as the service providing the home directories was unresponsive.


Hanging people.ee.ethz.ch Webserver

STATUS: green.gif

2013-02-05: 06:00
The webserver for people.ee.ethz.ch was hanging this morning and was not pingable anymore.
2013-02-05: 07:00
We reset the webserver and are now investigating the issue. It is probably kernel- and hardware-related.


Server oenone crash

STATUS: green.gif

2013-01-14: 21:28

oenone crashed, affecting users of AUT, BIWI, IKT and VAW.

2013-01-14 21:52
oenone rebooted and is running again.


Webserver Outage

STATUS: green.gif

2012-11-15: 14:30

Erroneous deletion of some apache configuration files led to outages of the webservers oenone and yosemite today between 13:30 and 14:15.

VPP Print Service Outage

STATUS: green.gif

2012-11-13: 07:30
VPP announces a print service outage of about 15 minutes. This should resolve the latest VPP service issues.


Email Service Outage

STATUS: green.gif

2012-11-09: 10:05
Emails cannot be sent or received. ISG.EE is working to resolve this problem as soon as possible.
2012-11-09: 12:20
We are currently still investigating what caused the outage of our mail server.
2012-11-09: 15:30

The amavis daemon (a high-performance interface between mailer (MTA) and AVI content checker) stopped working because its temporary directories were removed. It is not clear what removed these directories. We are still investigating this, but in the meantime the mailserver is up and running. NO EMAILS WERE LOST, but mails were sent/received with a delay of 1-2 hours.


VPP Outage

STATUS: green.gif

2012-11-02: 16:00
Jobs sent to VPP Printers don't get printed. We are investigating this problem together with VPP.
2012-11-05: 07:00
Services up and running again.


Complete Network Outage

STATUS: green.gif

2012-10-25: 07:05 - 09:15
Complete network outage. The cause is still unclear, but it might be a side effect of the router hardware replacement announced yesterday, which was needed because of defective hardware.
2012-10-25: 09:15

Informatikdienste have posted a statement on their status page.


Complete Network Outage

STATUS: green.gif

2012-10-16: 20:20 - 20:40 / 22:10 - 23:20
Complete network outage. It could be a side effect of a central router upgrade by Informatikdienste. We are investigating it.
2012-10-17: 08:52

Informatikdienste have posted a statement on their status page.


Outage Poster printer at ETL J 66 / ETZSPEZ - 16. October 2012

STATUS: green.gif

2012-10-16: 08:30 - 12:30
Plotter HP6100 at ETZ J66 will be under maintenance and is not available during that time.


Mailinglist Downtime (lists.ee.ethz.ch)

STATUS: green.gif

2012-10-10: 09:00 - 17:00

Due to the migration of our mailing list software to sympa, we will take down the lists.ee.ethz.ch website. Mails sent to lists.ee.ethz.ch will be put into a HOLD queue and delivered to the mailing list after the migration. So no mails will get lost!

2012-10-10: 17:04
Mailing lists converted. Services up and running.


LDAP Migration

STATUS: green.gif

2012-09-08: 10:00 - 17:00
No systems or services are available during the migration.
2012-09-08: 17:30
All systems are back online


Network outage ETH

STATUS: green.gif

2012-08-17: 16:30
There currently seem to be problems at the network level on the ETH network. No information is available yet.
2012-08-17: 17:15
Systems are back online.


Power outage at ETH

STATUS: green.gif

2012-06-28: 19:17
ETH had a power outage affecting many services. D-ITET infrastructure was partially affected too.
2012-06-29: 08:00
We are currently working on resolving the outstanding issues and bringing back online services which are still down.
2012-06-29: 09:30

Further information is now available on the ID status website https://www1.ethz.ch/id/servicedesk/sysstat/index.

2012-06-29: 10:05
The cause was a transformer fire in the main building, which caused a power outage in the computer centres from 19:30 to 22:00.


Server oenone crash

STATUS: green.gif

2012-06-20: 00:00

oenone crashed, affecting users of AUT, BIWI, COLLEGIUM, IKT and VAW.

2012-06-05: 00:30
oenone rebooted and is running again.

(!) We are still investigating what caused the crash and will report further information here.


oenone: hanging lockd affecting some User homes and webpages

STATUS: green.gif

2012-06-05: 21:37

oenone is unresponsive, affecting users of AUT, BIWI, COLLEGIUM, IKT and VAW.

2012-06-05: 00:00
oenone was rebooted and is running again.


Outage on 2/3 of alumni.ethz.ch mailservers

STATUS: green.gif

2012-06-04: 16:00
Our mailserver can't deliver emails to alumni.ethz.ch addresses. Reason: The receiving Servers have temporary errors: "451 unable to verify user". Looks like something is misconfigured there.
2012-06-05: 10:00
It looks like all alumni servers hosted on tophost.ch have a problem. We have created a temporary transport map which delivers the mails to the alumni mailserver at genotec.ch (the only one that works). As long as you use our mailserver to send emails to alumni addresses, they will be delivered immediately.
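
The idea behind such a temporary transport map can be sketched as a simple override table: mail for one recipient domain is forced to a fixed relay, and everything else follows the normal MX lookup. This is only an illustration in Python, not our actual mail server configuration; the relay hostname and the function name are assumptions made up for the example.

```python
from typing import Optional

# Hypothetical override table: recipient domain -> forced next-hop relay.
# The relay hostname is a placeholder, not the real alumni mail server.
TRANSPORT_OVERRIDES = {
    "alumni.ethz.ch": "mx-working.example.net",
}


def next_hop(recipient: str) -> Optional[str]:
    """Return the forced relay for a recipient, or None to use the normal MX lookup."""
    # Take everything after the last "@" and compare case-insensitively.
    domain = recipient.rsplit("@", 1)[-1].lower()
    return TRANSPORT_OVERRIDES.get(domain)
```

Removing the entry from the table restores the default behaviour once the other receiving servers work again.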


oenone: hanging lockd affecting some User homes and webpages

STATUS: green.gif

2012-05-13: 22:50

oenone is unresponsive, affecting users of AUT, BIWI, COLLEGIUM, IKT and VAW.

2012-05-14: 00:15
oenone was rebooted and is running again.


drwho: Main server outage affecting 64bit diskless Linux clients

STATUS: green.gif

2012-05-08: 11:50
We are currently experiencing problems with one of our main servers. All 64bit diskless clients are affected. We are working on the problem. Furthermore, some svn repositories might be affected.
2012-05-08: 13:15
The system is going back to normal but needs some time to fully recover.
2012-05-08: 13:45
The system should be back to normal. We are still working on recovering a few individual hosts.


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2012-03-13: 21:30
All systems are back online.
2012-03-13: 19:00 - 22:00
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to install the latest Oracle patches on our main servers. The servers will be rebooted during this maintenance.

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Maintenance downtime of Server behind ipp2vpp.ee.ethz.ch (printing/licenses/dhcp for selfmanaged hosts)

STATUS: green.gif

On 2012-02-16 at around 8:00AM we have scheduled a maintenance downtime of zaan (serving printing, the license server and the DHCP server for D-ITET). The downtime will be as short as possible. We plan to have it down for 45 minutes at most.

  • 2012-02-16 08:45: Server is up and running again.


oenone home server crash

STATUS: green.gif

2012-02-13

During the night at around 22:40 oenone, one of our home servers, crashed. Users with homes on oenone were affected; these are BIWI, VAW, Collegium Helveticum, Control, ISI, IKT. The system was up again at 00:40.

We are sorry for the inconvenience caused and are investigating the problem.


Emergency reboot of ITET's server OENONE

STATUS: green.gif

2012-01-31: 21:00

Emergency reboot of server OENONE due to an unresponsive storage area. Users concerned: BIWI, VAW, Collegium Helveticum, Control, ISI, IKT, EEH

2012-01-31: 21:45
Server oenone is up again and all services are running


Emergency reboots of all Linux Clients and Servers

STATUS: green.gif

2012-01-23: 15:00
Due to a critical issue we were forced to reboot all affected hosts. We are sorry for the short notice and the inconvenience caused to you.
2012-01-23: 20:45
All clients rebooted.


Maintenance work on D-ITET's home servers TARDIS and OENONE

STATUS: green.gif

2011-12-17: 11:00 AM
Successful reboot of TARDIS and OENONE.
2011-12-17 10:00AM - 11:00AM
During the last installation of Oracle patches a bug was introduced in the automount daemon, causing high CPU load on systems with a large number of auto-mounted file systems. We have investigated this problem together with Oracle. A bug fix is now available, but it requires a server reboot. Because of this requirement we are going to reboot TARDIS and OENONE.

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: TARDIS, OENONE
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


oenone home server crash

STATUS: green.gif

2011-11-29

During the night at around 21:30 oenone, one of our home servers, crashed. Users with homes on oenone were affected; these are BIWI, VAW, Collegium Helveticum, Control, ISI, IKT. The system was up again at 23:00.

We are sorry for the inconvenience caused and are investigating the problem.


Power outage affecting compute clusters

STATUS: green.gif

2011-11-14
Due to a power outage some racks containing our compute clusters went down.
2011-11-14 08:00
All compute clusters should be up and running again.


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2011-11-12: 05:50 PM
Final reboot of TARDIS successfully completed.
2011-11-12: 04:00 PM
The final reboot of TARDIS is still outstanding due to a broken disk in TARDIS's internal RAID (boot device). The broken disk has been replaced successfully, but the RAID is still syncing. The reboot is postponed until the sync process has finished and the reboot can be carried out safely. So be prepared for a short interruption today or tomorrow.
2011-11-12: 04:00 PM
All systems are back online.
2011-11-12: 10:00 AM - 04:00 PM
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to upgrade specific storage software and install the latest Oracle patches on our main servers. The servers will be rebooted multiple times during this maintenance.

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Routing problems on switch.ch network

STATUS: green.gif

2011-08-19: 05:00 PM
Routing Problems solved.
2011-08-19: 02:22 PM

Because of a routing problem on the switch network, all traffic to http://www.virginia.edu and their mailserver is disrupted.


Migration of cronbox.ee.ethz.ch to Debian Squeeze

STATUS: green.gif

2011-10-14: 07:30

We plan to migrate the server behind cronbox.ee.ethz.ch to the new version of the Debian operating system. Expected downtime is from 07:30 up to around 10:00. The affected services are cron and ssh logins to the machine.

2011-10-14: 11:00
Migration completed.


Matlab License Server down

STATUS: green.gif

2011-09-30: 07:00

The server from Informatikdienste providing the license server service is currently down. You can track the current status on their ID-Service Status page under Lizenzen -> 1965@vnava.

2011-09-30: 08:45
License server from Informatikdienste is now up again.
2011-09-30: 15:30
The license server lic-matlab.ethz.ch is unavailable again. As it is outside our control, we cannot estimate when it will work again.
2011-09-30: 16:00
lic-matlab.ethz.ch is available again.


NFS outage on oenone

STATUS: green.gif

2011-08-30: 02:00 AM - 08:30 AM

During the night at around 02:00 the NFS services on oenone crashed. Users with homes on oenone were affected; these are BIWI, VAW, Collegium Helveticum, Control, IBT, IKT. This crash also affected all web services which depend on oenone, as well as the mailserver (at least for those having their home directory on oenone). No mails are lost as we put them into a hold queue!

Update: 08:30
oenone is now up and running again. All mails in the hold queue are now being delivered gradually. The web services are also all available now.

We are sorry for the inconvenience caused and are investigating the problem.


Connection problems to all admin.ch servers

STATUS: green.gif

2011-08-23: 03:00 PM
All admin.ch websites and mailservers are reachable again.
2011-08-23: 01:25 PM
Currently all traffic to the admin.ch servers is disrupted. This includes the websites and also email connections. Your sent mails are not lost as our server keeps them until it can connect to the destination.


Maintenance work on D-ITET's central IT infrastructure

STATUS: green.gif

2011-08-23: CANCELLATION
The maintenance is cancelled due to an unresolved error within the Solaris operating system that was introduced by the latest patch set.
2011-08-23: 7:00 PM - 10:00 PM
To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to patch and reboot some of our Solaris servers.

To prevent data corruption/loss please do the following:

  • save all open files
  • close all running applications
  • log out of all ITET systems (Linux, Solaris, Windows, Mac OS X)
  • shut down your personal PC/Desktop
  • do not establish any connection from outside

More details:

  • servers concerned: DRWHO, TARDIS, OENONE, SPITFIRE, YOSEMITE, MALINA
  • webpages hosted on these systems are NOT available
  • NO mail access, NO outgoing mails (incoming mails WON'T get lost)


Outage drwho.ee.ethz.ch

STATUS: green.gif

2011-08-13 09:00
Since around 1:00 we have been experiencing problems on one of our main servers, affecting most of the services.
2011-08-13 11:00
All services back to normal.


ETH wide DNS outage

STATUS: green.gif

2011-08-08 10:00

ETH-wide DNS outage. All services using name resolution do not work. Our mailserver denies all incoming messages with a 450 4.3.2 Service currently unavailable. Properly configured mailservers should retry the message delivery later, so no mail is lost.

2011-08-08 10:20
DNS works again. All Services up and running.
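
Why no mail is lost with a 450 reply: in SMTP, the first digit of the three-digit reply code tells the sending server how to react. A 4xx code signals a temporary failure, so the sender keeps the message queued and retries later, whereas a 5xx code signals a permanent failure and leads to a bounce. A minimal Python sketch of that distinction follows; the function name is only illustrative and not part of any mail software.

```python
def classify_smtp_reply(line: str) -> str:
    """Classify an SMTP reply line by the first digit of its three-digit code."""
    code = line.strip()[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an SMTP reply line: {line!r}")
    if code[0] == "4":
        return "transient"  # sender should keep the mail queued and retry later
    if code[0] == "5":
        return "permanent"  # sender should give up and bounce the mail
    return "other"          # 2xx/3xx: success or intermediate reply
```

For the reply quoted above, classify_smtp_reply("450 4.3.2 Service currently unavailable") returns "transient".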


colombo04 not available

STATUS: green.gif

2011-06-23 11:00
Power supply of colombo04 replaced. colombo04 is up and ready.
2011-06-22 07:00
Power supply of colombo04 broke. Now waiting for a replacement from Oracle.


biwinas01 not available

STATUS: green.gif

2011-06-08 08:00
biwinas01 is now back online.
2011-06-07 15:00
The hardware supplier returned the server today. They had to replace the following hardware:
  • 1 CPU
  • 1 power supply
  • 1 fan

We will test the server now and bring it back online as soon as possible.

    2011-06-06 15:00
    biwinas01 is currently out of order due to a hardware failure. It might take several days until biwinas01 is back online.


    ifhlux11 not available

    STATUS: green.gif

    2011-06-08 11:20
    ifhlux11 is now back online.
    2011-06-07 15:00
    We are in contact with the supplier. Unfortunately, the reason for the crash of ifhlux11 is not known yet.
    2011-06-06 15:00
    ifhlux11 is currently out of order due to a hardware failure. It might take several days until ifhlux11 is back online.


    Coming soon: Outage of several IT services due to cooling system replacement

    STATUS: green.gif

    2011-06-03 15:00

    All servers are again up and running with the exceptions of biwinas01 and ifhlux11 (see above).

    UPCOMING: 2011-06-03 09:00 - 2011-06-06 09:00

    The refrigeration supply in the ETZ building will be replaced. One server room has to be shut down completely (ETZ/D/96.2; all compute servers of the institutes). The other server room (ETZ/F/66) will get an emergency cooling system. This will, however, not be sufficient to cool all servers currently running in that room. Consequently we will shut down as many servers as possible.

    Basic services like email, user homes and other network shares, printing and login to Windows or Linux workstations are not expected to be affected by this construction work. In general, if you are not familiar with the server names or service terms below, you should not be affected at all.

    These services will not be available during the construction work times:

    • Shut down 2011-06-03 at 09:00. Back online 2011-06-06 at 09:00:

      • Remote-Desktop-Access to the Windows Terminal Servers
        • quinn
        • sivi
        • vega7
      • All institute NAS servers (no access via Samba, link in home, ssh, etc.)
        • biwinas01-03
        • hamam01
        • ifenas01
        • ibtnas01
      • Most IFE compute servers
        • bernstein
        • coltrane
        • dylan
        • haydn
        • marley
        • mozart
      • Publication databases on sato (IFA)
    • Shut down 2011-06-03 at 16:00. Back online 2011-06-06 at 09:00:

      • Computer rooms for students
        • ETZ D 61.1
        • ETZ D 61.2
        • ETZ D 96
      • All institute compute servers
        • autserv*
        • bender*
        • biwilux*
        • casseri*
        • colombo*
        • IFE compute servers cash and elvis
        • ifhlux*
        • nariwork*
        • tik*x
        • vierzack*


    Firewall Problem: some Services not reachable

    STATUS: green.gif

    2011-05-26 07:15 - 2011-05-26 16:00

    Due to problems with the firewall hardware, some services of the D-ITET-servers were not reachable.


    Maintenance reboot of Servers behind login.ee.ethz.ch and ipp2vpp.ee.ethz.ch (printing/licenses)

    STATUS: green.gif

    On 2011-05-19 around 7:00AM we will perform a maintenance reboot of polaris (serving login.ee.ethz.ch) and zaan (serving printing at D-ITET). During this downtime it will not be possible to print via samba or cups.


    Maintenance reboot of Server behind people.ee.ethz.ch

    STATUS: green.gif

    2011-05-10 07:45

    On 2011-05-10 around 7:00AM we will perform a maintenance reboot of galen. galen is the Server serving your personal homepage on people.ee.ethz.ch.


    Horde Webmail outage

    STATUS: green.gif

    2011-05-02 23:49 - 2011-05-03 02:27

    The Horde Webmail Client had to be taken down for security reasons. It was not clear whether someone had used a zero-day exploit or a phished account to send spam over our server. It turned out that the attackers used a phished account.

    /!\ Please Remember: We at ISG.EE will NEVER ask you for your password.


    Short SMTP outage

    STATUS: green.gif

    2011-02-28 14:49 - 15:04
    As a result of an LDAP failure we had to stop the mail server for 15 minutes to prevent the rejection of incoming emails. While the main server was down, our two backup MX servers collected incoming emails.


    Windows Terminal Server zhadum out-of-operation

    STATUS: green.gif

    2011-02-28 15:00

    Please use the server vega7 from now on. If you had access to zhadum before, you should also be able to access vega7. Please use your NETHZ username and password to log in.

    2011-02-28 14:00

    The hardware of the departmental Windows terminal server zhadum is broken. A replacement server should be available soon...


    Upgrade of Backup Server JABBA

    STATUS: green.gif

    2011-02-22 18:15
    The migration is finished and the new JABBA server is online.
    2011-02-22 7:00 AM - 7:00 PM (approx.)
    We are going to upgrade the department's backup server JABBA. This upgrade includes changes in software as well as in hardware. During the upgrade no requests to restore lost data can be fulfilled.

    The complete backup infrastructure and all associated services are NOT available during the upgrade.


    Agilent ADS/ICCAP License Server Change

    STATUS: green.gif

    2011-02-07 10:00

    As announced a week ago, the license server for Agilent ADS and ICCAP software has changed. If your client software still uses the old license server information, please make sure you change the license server to lic-agilent.ee.ethz.ch. The port stays the same.


    Subversion server not reachable from outside ETH Zurich

    STATUS: green.gif

    2011-01-31 16:30
    Our subversion server is now available again for users outside of ETH Zurich.
    2011-01-31: 16:15
    We have been told that the central firewall rules cannot be changed right now due to other (completely unrelated) problems. At the moment it is unknown when this will be fixed.
    2011-01-31: 15:30
    Our subversion server svn.ee.ethz.ch is at the moment not accessible through the svn:// protocol from outside ETH Zurich due to a firewall configuration error. We are working on it...


    Mailserver problems while patching

    STATUS: green.gif

    2011-01-26: 11:00 AM
    During the server upgrade last night a patch temporarily misconfigured the mail server. The server accepted incoming mails but could not deliver them to the users' mailboxes. EVERY such mail created a bounce. Because of this, our statement that no mails get lost while updating the servers is no longer fully true. Incoming mails which bounced have to be resent by the sender.


    Solaris Server Patching

    STATUS: green.gif

    2011-01-25: 10:30 PM
    All servers and services are back online
    2011-01-25: 7:00 PM - 10:00 PM
    To keep our systems up to date with the newest software and security releases, we need to update our servers on a regular basis. For this reason we are going to patch and reboot some of our Solaris servers.

    Servers concerned: drwho, tardis, oenone, spitfire, yosemite, malina.

    To prevent data corruption/loss please do the following:

    • All diskless clients (Linux): please log out and shut down all DL clients
    • All Windows systems: please log out and shut down all Windows clients
    • All Mac systems: please log out and shut down all Mac clients
    • All user homes: please logout from these servers
    • No NFS or SAMBA access to user homes
    • No mail access, no outgoing mails (incoming mails WON'T get lost)
    • Webpages hosted on these systems are unavailable


    Delayed Email delivery

    STATUS: green.gif

    2011-01-19: 11:00 AM - 4:30 PM

    As a result of a faulty ClamAV signature File every Email that contained a PDF-file was marked as infected. Before we could resend the quarantined emails we had to fix the issue. No mail was lost and everything was resent.

    Update: 2011-01-20: 10:33 AM
    ClamAV signatures have been updated and tested. Everything is working as it should.


    Maintenance Reboot of Solaris Server Yosemite

    STATUS: green.gif

    2010-12-07: 7:30 AM
    Server yosemite has been rebooted successfully. All services are available.
    2010-12-07: 7:00 AM
    Due to a shortage of available memory we are forced to reboot the solaris server yosemite. Downtime approx. 30 minutes.


    cooling water system outage on clusters

    STATUS: green.gif

    2010-11-26: 5:00 PM
    Host autserv02 is running as well. All hosts can be used.
    2010-11-26: 4:40 PM
    Server racks are cooled again; all hosts except autserv02 are running and can be used.
    2010-11-26: 4:00 PM

    Server racks are still down. --> Update follows at 5 PM or earlier

    2010-11-26: 3:10 PM

    One of the cooling water pumps installed in ETZ/D/96.2 is not working correctly. This forces some of the racks in this server room to shut down in order to protect the servers from thermal damage. Clusters from IFH, IBT, BIWI, TIK, IKT and VAW are affected. The facility management is working on solving the problem. --> Update follows at 4 PM


    email phishing attack

    STATUS: green.gif

    2010-11-23

    Yesterday between 18:20 and 19:50 about 320 phishing mails were sent to different users at D-ITET. The mails pretend to come from IT Support Group and carry the subject ISG.EE Webmail Alert. They claim that spammers have compromised the ISG.EE Webmail Account and that you should provide your username, password and an alternate email address. Please remember that the ISG.EE team will NEVER ask you for your password! If you have nevertheless replied to this phishing mail, please contact us immediately at support@ee.ethz.ch so that we can plan the next steps with you to keep your account safe.


    oenone home server crash

    STATUS: green.gif

    2010-11-17

    During the night at around 00:15 oenone, one of our home servers, crashed. Users with homes on oenone were affected; these are BIWI, VAW, Collegium Helveticum, Control, IBT, IKT. The server is now checking the filesystems and coming up again.

    We are sorry for the inconvenience caused and are investigating the problem.

    Update: 08:00
    oenone is now up and running again.
    Update: 2010-11-18 07:30
    We opened a support case at Sun/Oracle for this server.


    cooling water system outage for some clusters

    STATUS: green.gif

    2010-11-16

    Last Friday evening one of the cooling water pumps installed in ETZ/D/96.2 stopped working correctly. This forced some of the racks in this server room to shut down in order to protect the servers from thermal damage. All clusters from IFH, IBT, BIWI, TIK, IKT and VAW were affected.

    The facility management is working on solving the problem.

    The servers are currently (08:35) down again. Even if they come up again, please do not use them for long-running computations, as we still do not know exactly when the technician will have solved the issue.

    Update: 2010-11-17 08:25
    The rack systems are now running with only one cooling water pump. A new pump has been ordered by the rack company.
    Update: 2010-11-18 16:00
    The planned replacement of the broken pump will take place on 25.11 or 26.11.


    CategoryEDUC

    Status (last edited 2023-10-16 11:24:17 by alders)