Tech:Server admin log

July 23

 * 22:53 JohnLewis: reboot prod3 - MySQL health and load is fluctuating massively

July 22

 * 19:30 JohnLewis: gave SELECT to archive, revision, user and recentchanges tables on all PUBLIC WIKIS for user 'useranalysis' on prod3. Account used by Cyberpower678 for his useranalysis tool. +1 for Orain-Community relations!
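
A grant like the one above can be generated rather than typed once per wiki. The sketch below is only a guess at the shape of that step, not the commands actually run: the `gen_grants` helper, the `localhost` host part and the database names passed in are illustrative assumptions. It prints statements and executes nothing.

```shell
# Print GRANT statements giving a read-only user SELECT on the four tables
# named in the log entry, for each wiki database passed as an argument.
# Nothing is executed; review the output, then pipe it to `mysql -p`.
# Usage: gen_grants USER HOST DB...
gen_grants() {
    user=$1
    host=$2   # assumption: the log does not say which host part was used
    shift 2
    for db in "$@"; do
        for table in archive revision user recentchanges; do
            printf 'GRANT SELECT ON `%s`.`%s` TO '\''%s'\''@'\''%s'\'';\n' \
                "$db" "$table" "$user" "$host"
        done
    done
}
```

For example, `gen_grants useranalysis localhost metawiki | mysql -p` would apply the four grants for a single wiki; the list of public wikis has to come from elsewhere.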

July 21

 * 15:42 JohnLewis: restart memcached (causing MediaWiki exceptions)

July 16

 * 20:00 JohnLewis: install prod5's basic requirements.
 * 18:55 JohnLewis: reboot prod4; irregular issues occurring
 * 16:05 JohnLewis: revert DNS back after fixing necessary issues relating to DNS
 * 15:20 JohnLewis: changed DNS for Orain directly to prod4 - broke all non-prod4 services

July 14

 * 21:09 JohnLewis: new SSL cert is installed and confirmed to be functioning correctly

July 13

 * 21:38 Addshore: Everything back up

July 8

 * 14:43 JohnLewis: mysql -p -e "drop database Techwritewiki"

July 7

 * 14:09 JohnLewis: speed seems to have improved. Need to further monitor the SQL downtimes however.
 * 14:08 JohnLewis: reboot prod3
 * 14:00 JohnLewis: prod3 is not responding to shell commands; matches downtimes with the farm

July 5

 * 17:00 JohnLewis: php createLocalAccount.php --wiki detectiveconanwiki --username KidProdigy
 * 16:58 JohnLewis: php migrateAccount.php --wiki metawiki --auto --homewiki metawiki --username KidProdigy
 * 13:45 JohnLewis: php maintenance/runJobs.php --wiki allthetropeswiki
 * 11:51 JohnLewis: mysql -p -e "drop database Revitestwiki"

June 28

 * 09:56 JohnLewis: confirmed security fix
 * 09:55 JohnLewis: reboot prod4 to force restart
 * 09:54 JohnLewis: manually patch OpenSSL to the latest release to fix a security issue (again)

June 20

 * 22:17 JohnLewis: updated php5-fpm
 * 22:10 JohnLewis: updated OpenSSL to a security fix release
 * 00:10 Addshore: Started a run of rebuildFileCache.php for ATT wiki in a SCREEN on prod4 to fix pages once CSS extension has been re enabled

June 19

 * 23:45 Addshore: Reenabled ansible on prod4
 * 23:30 Addshore: Ran update.php on all wikis

June 18

 * 22:24 Addshore: Ran update.php on all wikis
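
"Ran update.php on all wikis" recurs throughout this log. A minimal sketch of such a loop follows, assuming the database list is a text file with one wiki per line and the database name in the first `|`-separated field (inferred from the `metawiki|meta|` entry on June 14). The helper names and the list path are hypothetical; the `--wiki` and `--quick` flags are the ones used elsewhere in the log.

```shell
# Print the database name (first |-separated field) of every non-empty line.
list_wikis() {
    cut -d '|' -f 1 "$1" | grep -v '^$'
}

# Run update.php once per wiki in the list file given as $1.
# Set DRY_RUN=1 to print the commands instead of running them.
update_all_wikis() {
    dblist=$1
    list_wikis "$dblist" | while IFS= read -r wiki; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "php maintenance/update.php --wiki $wiki --quick"
        else
            # </dev/null stops php from consuming the rest of the wiki list
            php maintenance/update.php --wiki "$wiki" --quick </dev/null
        fi
    done
}
```

Running it with `DRY_RUN=1` first is a cheap way to spot a corrupt list before any schema update starts.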

June 15

 * 19:41 JohnLewis: mkdir OrainHacks; add a basic extension file and a .magic. file with LQT magicwords in. php rebuildLocalisationCache.php --force --wiki extloadwiki. Happy days! Now need to do it for the other 10 extensions disabled.

June 14

 * 19:45 Addshore: metawiki up, running the get db list script
 * 19:44 Addshore: DBlist is corrupt, replacing with "metawiki|meta|"
 * 19:35 Addshore: Removed Popups extension from mediawiki and reenabled ansible cron
 * 19:24 Addshore: all sites getting DB errors

June 11

 * 16:48 JohnLewis: disabled ansible to prevent ansible running while I do stuff (staggered committing)

June 10

 * 11:10 JohnLewis: added new .log files and rearranged the logging structure

June 9

 * 16:21 JohnLewis: upgrade spamassassin on prod1
 * 16:03 JohnLewis: php update.php --wiki jossewiki --quick

June 8

 * 0:00 JohnLewis: php deleteArchivedRevisions.php --wiki allthetropeswiki --delete

June 7

 * 21:36 JohnLewis: re-enable ansible
 * 21:30 JohnLewis: ran update.php on all wikis for MW 1.23 update
 * 19:35 JohnLewis: disabled ansible (for safety)
 * 16:08 JohnLewis: restarted memcached to clean up stuff
 * 16:00 JohnLewis: renamed 'spacetimewiki' database to 'timespacewiki'

June 3

 * 17:57 JohnLewis: purge torblock's node index

June 1

 * 15:34 JohnLewis: force password reset for "Stef99"
 * 15:33 JohnLewis: restart memcached

May 31

 * 12:46 addshore: ran update.php on ALL wikis
 * 12:43 addshore: updating to MW 1.22.7

May 30

 * 18:50 JohnLewis: remove 'notice' for CreateWiki on GitHub
 * 12:30 JohnLewis: ran ansible on prod4 to catch new nginx rules
 * 12:29 JohnLewis: change ufw rules on prod1 for mail
 * 01:02 JohnLewis: ufw allow 9300 and ufw allow 9200
 * 01:02 JohnLewis: playing tennis for elasticsearch on prod1. restarting it a bit.
 * 00:50 JohnLewis: remove elasticsearch from prod1
 * 00:25 JohnLewis: massive reduction in disk usage :D
 * 00:24 addshore: on prod4 rm /root/old

May 29

 * 22:53 JohnLewis: ran ansible on prod1; needed to get the port rule in
 * 22:17 JohnLewis: del
 * 22:17 JohnLewis: restarted nginx
 * 22:06 addshore: rebooting prod1
 * 18:44 JohnLewis: restarted nagios3 on prod1
 * 18:30 addshore: ansible successfully runs on prod1 now, adding to cron
 * 18:25 JohnLewis: prod1: nagios3 -v *
 * 18:02 addshore: update ansible to 1.6.2 on prod1
 * 18:01 addshore: update ansible to 1.6.2 on prod3
 * 18:01 JohnLewis: Removed 'notice' from OrainMessages calls from GitHub
 * 17:58 addshore: update ansible to 1.6.2 on prod4
 * 17:57 addshore: orainLog back up...
 * 13:00 - 17:00 - Addshore - Poking prod4 and ansible. Prod4 now again has ansible on a cronjob. There were multiple short downtimes during this time due to the poking of ufw (the firewall), but this was for the greater good!!!

May 24

 * 12:51 - JohnLewis - service php5-fpm restart
 * 12:46 - pingdom reports site down
 * 12:06 - JohnLewis - rename verkeerswiki to verkeerwiki. A bunch of SQL stuff.

May 23

 * 20:06 - JohnLewis - php createLocalAccount.php --wiki=espiralarchivowiki John

May 14

 * 16:07 - JohnLewis - php createLocalAccount.php --wiki=onepiecewiki Bocaniko
 * 16:01 - JohnLewis - clear apc cache

May 12

 * Recent downtime was caused by prod4 being suspended by the host; this is resolved.

May 11

 * 19:00 - JohnLewis: php reassignEdits.php --wiki allthetropeswiki 300154507a A300154507

May 09

 * 13:15 - addshore: added values for duplicity and AWS to prod3 vars
 * 13:15 - addshore: added AWS_BACKUPS_ACCESS_KEY_ID to prod3 vars.yml

April 15

 * prod2 died; we migrated to a new user and the setup was so hacky that nothing really worked. Kudu knows more about that than me.
 * A key server file became corrupted and the server crashed. That accounts for around 24 hours; we then moved to a new server and had to deal with a hacky setup, which accounts for the other 40-ish hours of downtime.
 * It's kinda bad that this downtime happened while we were still looking at the old downtime.. so :/
 * At least we know *why* this one occurred.

April 14

 * 16:37 Addshore: prod4 - Killed db loop scripts running i18n cache updates
 * 16:42 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (this is all that is ever needed as extload has everything loaded and i18n cache is shared)
 * 16:45 JohnLewis: Reboot prod4
 * 16:50 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (in a SCREEN)
 * 16:51 JohnLewis: root@prod4:/# /etc/init.d/apache2 stop
 * 16:51 JohnLewis: root@prod4:/# /etc/init.d/nginx start
 * 17:17 Addshore: Remove JohnLewis IP from deny hosts file for sshd again on prod3

April 9

 * 22:30 JohnLewis: re enabled ansible cron
 * 16:23 JohnLewis: disabled ansible cron (doing live work on prod2 for ATTwiki). I'll post a note when I'm done.

April 6

 * 13:16 JohnLewis: run update.php on dangsunsnwiki and cheer
 * 13:08 JohnLewis: eval.php some more stuff into my dangsunsnwiki account...
 * 13:06 JohnLewis: eval.php an email into my dangsunsnwiki account
 * 13:03 JohnLewis: get annoyed about things
 * 12:57 JohnLewis: rename buswiki -> dangsunsnwiki

April 5

 * 9:10 JohnLewis: prod2 nginx killed and restarted, i18n cache reloaded

April 4

 * 13:00 pingdom reports orain down

April 3

 * 17:10 JohnLewis: dropped centralnoticetestwiki database as all worked - not needed now
 * 17:06 JohnLewis: manually ran ansible
 * 16:41 JohnLewis: ran update.php on all wikis

April 1

 * 19:14 JohnLewis: restarted nginx (not an April fools)

March 30

 * 16:41 JohnLewis: manually ran ansible because Joe is right about my stupidity sometimes
 * 16:19 JohnLewis: drop temp database (used to fix some issues with importing)
 * 16:18 JohnLewis: run update.php on archivoespiral and metawiki
 * 16:15 JohnLewis: do a bunch of SQL stuff on prod3 to get archivoespiral working

March 29

 * 19:46 Addshore: Remove JL IP from prod2 deny hosts file

March 28

 * 21:30 Addshore: Remove JL IP from prod2 deny hosts file
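
Removing an address from a deny file by hand is easy to get subtly wrong, for instance by also deleting a longer address that merely starts with the same digits. The following is a minimal sketch, assuming a line-oriented deny file such as `/etc/hosts.deny`; the `unblock_ip` helper and the addresses in the usage note are hypothetical, and the log does not record how the removal was actually done.

```shell
# Remove every line of FILE that contains IP as a complete address,
# leaving a backup in FILE.bak. Usage: unblock_ip IP FILE
unblock_ip() {
    ip=$1
    file=$2
    cp "$file" "$file.bak" || return 1
    # Escape the dots so they match literally instead of "any character".
    pattern=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Require a non-address character (or line boundary) on both sides, so
    # 10.0.0.4 does not also match 10.0.0.40 or 110.0.0.4.
    grep -v -E "(^|[^0-9.])${pattern}(\$|[^0-9.])" "$file.bak" > "$file"
    return 0   # grep -v exits non-zero when no lines remain; not an error here
}
```

Usage would look like `unblock_ip 192.0.2.10 /etc/hosts.deny`. If a tool such as DenyHosts maintains the file, it may re-add the address from its own records unless those are cleaned or the address is whitelisted, which could explain why this entry repeats.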

March 18

 * 20:27 JohnLewis: re enabled ansible cron on prod2
 * 19:47 JohnLewis: disabled ansible cron on prod2

March 15

 * 22:47 kudu: Run fixDoubleRedirects.php on ATT

March 12

 * 17:35 JohnLewis: populated interwiki table on some databases

March 8

 * 14:32 JohnLewis: changed some centralauth database entries to suit wiki move

March 7

 * 22:40 JohnLewis: dumped allthetropeswiki for Arcane
 * 16:53 JohnLewis: manually updated ansible
 * 16:51 JohnLewis: renamed database trainwiki to reviwiki

March 4

 * 20:46 JohnLewis: ran CentralAuth's createLocalAccount.php for myself on a few wikis to fix things

March 2

 * 01:25 JohnLewis: ran update.php on all wikis
 * 00:54 JohnLewis: manually ran ansible again
 * 00:47 JohnLewis: manually updated ansible (debugging - yay)

March 1

 * 23:58 JohnLewis: manually ran ansible and update.php on jdwiki

February 26

 * 21:53 JohnLewis: ditto on metawiki
 * 21:52 JohnLewis: ran update on jh67wiki

February 25

 * 03:32 kudu: Ran fixDoubleRedirects.php on ATT

February 21

 * 23:54 addshore: ran update.php for pmr2014wiki
 * 23:48 addshore: prod2 uninstalled dvipng texlive-latex-base etc. cjk-latex
 * 23:30 addshore: .... all of which we have and work.... GAH!
 * 23:30 addshore: for the record it is stuck on.. Failed to parse(PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert))
 * 23:29 addshore: apt-get installed dvipng texlive-latex-base texlive-latex-extra tex-live-recommended cjk-latex while trying to fix Math, no success
 * 22:36 addshore: chmod and chown Math extension .. we should have this all pulled as www-data
 * 22:22 addshore: prod2 ran make in /usr/share/nginx/.orain.org/w/extensions/Math/math
 * 22:07 addshore: ran update.php on extload
 * 21:56 addshore: reenable prod2 ansible cron
 * 21:14 addshore: disabling ansible on prod2
 * 20:14 addshore: ran i18n cache rebuild
 * 20:14 JohnFLewis: rebooted prod2
 * 15:20 JohnLewis: manually update ansible
 * 10:34 addshore: ran update.php on all wikis

February 20

 * 21:57 JohnLewis: ran update.php on jdwiki
 * 21:11 addshore: reenabling ansible pull cron on prod2 after resolving issue 173
 * 20:06 addshore: comment out ansible pull from prod2 cron while I manually poke collection extension
 * 16:25 JohnLewis: Info: Mail is fully working with a final dovecot restart!
 * 16:20 JohnLewis: changed dovecot config and then restarted x3 (issues first two times)
 * 15:32 JohnLewis: restarted dovecot on prod1

February 19

 * 22:40 addshore: prod2 on mediawiki submodules ran git submodule foreach --recursive git config core.fileMode false - this also solves the dirty Elastica folder
 * 22:33 addshore: prod2 on mediawiki submodules ran git submodule foreach git config core.fileMode false
 * 22:26 addshore: prod2 chown www-data:www-data /w/extensions/*
 * 22:19 addshore: simplified the two ansible cronjobs on prod2
 * 22:13 addshore: rm /root/ans on prod2, this file is wrong!
 * 00:30 kudu: Ran rebuildtextindex.php on all wikis

February 13

 * 16:23 addshore: i18n cache broke, tried rebuilding off extload but the script wouldn't run, ran off metawiki first then off extloadwiki and everything returned to normal. The question remains why did the cache break in the first place and why could we not rebuild from extload wiki in the first place?
 * 16:20 addshore: MWEXCEPTIONS EVERYWHERE!

February 11

 * 03:11 kudu: Compress revisions on ATT using concat mode

February 8

 * 17:52 kudu: Disabled ufw on prod2, wasn't working well

February 7

 * 21:13 addshore: reenable ansible cron on prod2
 * 21:07 addshore: i18n rebuilt
 * 21:06 addshore: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php. - rebuild fails
 * 21:05 addshore: exceptions everywhere, disabling ansible cron on prod2
 * 21:03 JohnLewis: rebooted prod2
 * 20:19 addshore: re enable ansible cron on prod2
 * 21:00 Kudu: indexing for elastic search on prod3
 * 19:06 Kudu: Modified ufw settings on prod3 following this article: http://blog.kylemanna.com/linux/2013/04/26/ufw-vps/
 * 13:49 addshore: 504 Gateway Time-out on extload
 * 13:02 addshore: added another case to cc.php, apc clearing is working again
 * 12:38 addshore: running manual ansible pull
 * 12:33 addshore: disabling ansible cron on prod2

January 29

 * 23:14 - addshore - rename Promethiawiki db to promethiawiki. Also created Renaming_a_database to help people fix this in the future
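
Database renames come up several times in this log. MySQL has no working `RENAME DATABASE` statement, so a common approach is to create the new database and move each table across with `RENAME TABLE`. The sketch below illustrates that approach only; it is not taken from the Renaming_a_database page, and the `gen_rename_sql` helper is hypothetical. It prints SQL for review and leaves the old, now empty, database in place.

```shell
# Print SQL that moves every table named on stdin from database OLD to NEW.
# Usage: mysql -p -N -e 'SHOW TABLES' OLD | gen_rename_sql OLD NEW
gen_rename_sql() {
    old=$1
    new=$2
    printf 'CREATE DATABASE IF NOT EXISTS `%s`;\n' "$new"
    while IFS= read -r table; do
        [ -n "$table" ] || continue   # skip blank lines
        printf 'RENAME TABLE `%s`.`%s` TO `%s`.`%s`;\n' \
            "$old" "$table" "$new" "$table"
    done
}
```

Once the generated statements have been applied, any configuration or shared tables that still refer to the old name (the database list, for example) need updating separately.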

January 27

 * 18:48 - John - re enabled ansible pull
 * 18:36 - John - disabled ansible pull temporarily for now -- needs to be re-enabled shortly

January 26

 * 19:08 - John - prod3 is now set up. Waiting before enabling however.
 * 18:25 - John - finished installing basic things onto prod3.
 * 15:17 - addshore - attempted everything I could think of but think Kudu will have to fix this, no idea what he has done! For tracking see github issue
 * 14:40 - addshore - after trying to revert several changes ansible still doesn't seem to pull, now it appears to just be hanging once updating from the repo. i.e. no tasks are run
 * 14:08 - addshore - ansible pull broken caused by the 'Create User' task, investigating...

January 25

 * 18:18 - John - Ran update.php on techwiki per Addshore's request
 * 18:07 - John - Manually pull ansible to fix my stupidity

January 20

 * 3:14 Kudu (talk) Indexed mediawikitesterswiki pages in CirrusSearch and launched an indexing loop for ATT.

January 19

 * 19:51 - addshore - remove piping to log files for all ansible-pulls, since http://git.io/JMVAXQ we make ansible do it itself

January 18

 * 17:15 - addshore - remove unused /etc/nginx/sites-enabled/test file that was being included but not in playbook
 * 15:27 - addshore - ansible-pull now runs as a success again!
 * 15:25 - addshore - update ansible to 1.4.4
 * 15:22 - addshore - ran updatedb
 * 15:05 - addshore - failed_when was added in 1.4, so ansible needs updating in order for the below to work
 * 15:03 - addshore - ansible-pull >> ERROR: failed_when is not a legal parameter in an Ansible task or handler >> caused by my commit (in the process of fixing now..)

January 13

 * 19:17 - addshore - ansible-pull ran. Success!
 * 19:16 - addshore - For extensions/InterwikiMagic "git remote set-url origin https://gerrit.wikimedia.org/r/p/mediawiki/extensions/InterwikiMagic.git" was previously "https://git.wikimedia.org/git/mediawiki/extensions/InterwikiMagic.git"
 * 19:13 - addshore - ansible-pull ran, no longer falling over DPLforum, Instead falling over InterwikiMagic
 * 19:12 - addshore - For extensions/DPLforum "git remote set-url origin https://gerrit.wikimedia.org/r/p/mediawiki/extensions/DPLforum.git" was previously "https://git.wikimedia.org/git/mediawiki/extensions/DPLforum.git"
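
The same URL swap was needed for more than one extension, so it lends itself to a small helper. This is a sketch under assumptions: `to_gerrit_url` and `fix_origin` are hypothetical names, and the two URL prefixes are the ones quoted in the entries above, which may no longer be served.

```shell
# Map an old git.wikimedia.org clone URL to the gerrit.wikimedia.org form
# quoted in the log; any other URL is printed unchanged.
to_gerrit_url() {
    printf '%s\n' "$1" |
        sed 's#^https://git\.wikimedia\.org/git/#https://gerrit.wikimedia.org/r/p/#'
}

# Repoint origin for the repository in DIR if it still uses the old host.
# Usage: fix_origin extensions/DPLforum
fix_origin() {
    dir=$1
    current=$(git -C "$dir" config --get remote.origin.url) || return 1
    updated=$(to_gerrit_url "$current")
    if [ "$current" != "$updated" ]; then
        git -C "$dir" remote set-url origin "$updated"
    fi
}
```

Looping `fix_origin` over `extensions/*/` would catch the next extension before ansible-pull falls over it.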

January 11

 * 16:34 - addshore - Reports of pull not pulling, upon looking at the logs 1 task is failing. "unable to access 'https://git.wikimedia.org/git/mediawiki/extensions/DPLforum.git/': server certificate verification failed" thus "Unable to fetch in submodule path 'extensions/DPLforum'"

January 06

 * 18:50 - addshore - Manual pull to pickup this commit cleaning cronjobs before the next jobqueue run
 * 18:26 - addshore - Manually ran update.php across everything AGAIN as it was needed for the last commit.
 * 18:21 - addshore - Manual pull to pickup another commit fixing more of local settings, John again needs to be slapped!
 * 18:04 - addshore - Run maintenance.php for ALL wikis, I gather some runs have been missed during the whole cron not pulling ansible thing!
 * 17:49 - addshore - Manually stab the job queue. As far as I can tell from this commit the cron should work but I don't want to wait 10 mins to find out!
 * 17:39 - addshore - Manual ansible pull of this commit to fix broken local settings. John needs to be slapped for this commit
 * 17:25 - addshore - Manual ansible pull
 * 17:22 - addshore - uncomment ansible-pull line from crontab, Not sure who has done this..., Also change the cron to every 10 mins. As this may have been like this for a while I am bracing for some errors...

2014

 * Happy New Year! addshore 17:23, 6 January 2014 (UTC)

December 14

 * 18:30 Kudu (talk) Installed git from wheezy-backports, updated MediaWiki to 1.22 and ran update.php on all wikis.

December 1

 * 00:24 Kudu (talk) Chmodded /var/log/mediawiki to 770.

November 26

 * 03:12 Kudu (talk) Ran deleteBatch.php on a list of ATT redirects and ran deleteArchivedRevisions.php on All The Tropes.

November 23

 * 23:54 Kudu (talk) Re-chmodded the web directory and the MediaWiki log directory to 770 and set the git core.fileMode configuration option to false in the MediaWiki directory to stop it from messing with permissions.
 * 18:26 Kudu (talk) Ran deleteBatch.php on a list of Troper Tales pages and ran deleteArchivedRevisions.php on All The Tropes.

November 21

 * 00:25 addshore - manually run i18n cache update

November 17

 * 18:54 Addshore - update.php on all wikis
 * 18:48 Addshore - 'git stash' changes on extension/CentralNotice prod2. Not sure why the changes were there but they were stopping ansible from updating the extension

November 12

 * 02:47 Kudu (talk) Imported the new file description pages on ATT and ran deleteOldRevisions.php on the file pages thanks to some SQL/xargs magic.

November 11

 * 23:23 Kudu (talk) Imported the file description pages on ATT and ran deleteArchivedFiles.php and deleteOldRevisions.php on the file pages thanks to some SQL/xargs magic.
 * 22:11 Kudu (talk) Running importImages.php on ATT's missing and numeric images.

November 10

 * 15:42 Kudu (talk) Ran update.php and rebuildTitleKeys.php on all wikis.

November 07

 * 20:44 Addshore - Manually run i18n cache update as the cron isn't working
 * 20:09 Addshore - Reenable cron per the 12 commits just being this....
 * 19:59 Addshore - commenting out ansible-pull from prod2 crontab after somehow pushing 12 unexpected and unknown changes to github...
 * 19:52 Addshore Correct file owners and permissions for allthetropes images directory -R

November 02

 * 12:24 Addshore Rebuilding all caches to reflect file location moves due to upload hostname change

November 01

 * 09:43 Addshore Ran update.php on wikinambaswiki resolving db errors see here

October 20

 * 19:09 Kudu (talk) Set up CopperEgg server monitoring.

September 29

 * 10:11 Kudu (talk) Renamed the database `alleniawikiwiki` to `alleniawiki`.

July 31

 * 22:48 Kudu (talk) Changed MySQL parameters: table_open_cache=2500, thread_cache_size=48.
 * 22:37 Kudu (talk) Change MySQL parameters: long_query_time=1, query_cache_size=32M, slow_query_log=1, table_open_cache=400, thread_cache_size=4. Those are preliminary settings, they should be adjusted more carefully eventually.