Tech:Incidents

This page lists all incidents on Orain. The newest incidents are listed at the top. Tracking started in April 2014. If anything is missing, please add it!

March 2015

 * Incidents/2015-03-16 - 1 hour of downtime due to Linux OOM killer on prod8 (this report includes information about some long-term issues too)

February 2015

 * Incidents/2015-02-23 - 1 hour, 35 minutes of downtime caused by runaway overload on MediaWiki application servers
 * Incidents/2015-02-15 - 2 hours of frequent 502 Bad Gateway errors due to prod9 HHVM
 * Incidents/2015-02-07 - 20 minutes of downtime due to corrupt extension testing and a bad DB list

January 2015

 * Incidents/2015-01-27 - 15-20 minutes of downtime due to an extension test gone wrong
 * Incidents/2015-01-23 - 4 hours of downtime due to DNS resolution issues

December 2014

 * Incidents/2014-12-prod3 - 2 hours of downtime from a prod3 failure; prod3 was removed from the cluster
 * Incidents/2014-12-hhvm - Several days of slow loading times and 504s

July 2014

 * Incidents/2014-07-prod3Reinstall - 3 days of downtime after a forced reinstall on prod3
 * Incidents/2014-07-07 - 4 hours of issues with the databases (cause unknown)

June 2014

 * Incidents/2014-06-ExtensionIssues - 12 days of issues with some extensions due to an issue with the Echo extension
 * Incidents/2014-06-14 - 20 minutes of downtime due to a bad DB list
 * Incidents/2014-06-12 - 14 hours of downtime due to prod4 suspension by RamNode

May 2014

 * Incidents/2014-05-12 - 18 hours of downtime due to prod4 suspension by RamNode

April 2014

 * Incidents/2014-04-Downtimes - 16 hours and 80 hours of downtime due to SSL cert and i18n / fpm issues