Upgrades
Guidance for low-downtime upgrades.
- The non-active tools & bots servers should be updated first if possible. These can be rebooted freely.
- Decide whether to do a DC failover (only needed for larger upgrades with likely long downtime, such as major version changes).
- If not doing a DC failover, decide whether to fail over just the traffic; this will make services a bit faster, as there is a lot of latency to the DB.
- Databases should only be restarted carefully; they will come back in read-only mode.
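The decision points above can be sketched as a small helper. This is only an illustration: `Procedure` and `choose_procedure` are hypothetical names, and the two boolean inputs stand in for judgment calls made by whoever runs the upgrade.

```python
from enum import Enum


class Procedure(Enum):
    STANDARD = "Standard Procedure"
    TRAFFIC_FAILOVER = "Traffic Failover"
    DC_FAILOVER = "DC Failover"


def choose_procedure(long_downtime_expected: bool, fail_over_traffic: bool) -> Procedure:
    """Pick one of the three procedures described on this page."""
    # Larger upgrades with likely long downtime (e.g. major version changes)
    # call for a full DC failover.
    if long_downtime_expected:
        return Procedure.DC_FAILOVER
    # Otherwise, optionally fail over just the traffic.
    if fail_over_traffic:
        return Procedure.TRAFFIC_FAILOVER
    return Procedure.STANDARD
```

For example, `choose_procedure(False, True)` selects the Traffic Failover procedure.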
Standard Procedure
- Upgrade & Reboot the passive bots & tools server
- Upgrade & reboot the primary bots server
- depool the passive DB from Phab
- stop mysql on the passive DB, upgrade & reboot
- confirm the passive DB is replicating fine
- stop phd on the primary toolserver
- place phab & active DB in read only, repool the passive DB
- confirm replication is up to date and mark the active DB as down in Phab
- reboot the primary DB & Toolserver
- confirm replication is connected
- remove DB read only
- restore phab to RW with a master-replica setup.
- start phd on the primary toolserver
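The "confirm replication" steps above could be checked along these lines. This is a minimal sketch, assuming a MySQL or MariaDB replica whose `SHOW SLAVE STATUS` row has already been fetched into a mapping; the function name `replication_ok` and the `max_lag_seconds` parameter are hypothetical, and newer MySQL releases use different column names, so adjust the keys to match the server in use.

```python
from typing import Mapping


def replication_ok(status: Mapping[str, object], max_lag_seconds: int = 0) -> bool:
    """Return True if both replication threads run and lag is within bounds.

    `status` is one row of SHOW SLAVE STATUS as a column-name -> value mapping.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL means the lag is unknown, e.g. replication is not running.
        return False
    return int(lag) <= max_lag_seconds
```

With the default `max_lag_seconds=0` the check only passes when the replica is fully caught up, which matches the "replication is up to date" step before marking the active DB as down.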
Traffic Failover
- lower the load balancing session affinity
- Upgrade & Reboot the passive bots & tools server
- disable the primary bots server in the LB pool
- Upgrade & reboot the primary bots server
- enable the primary bots server in the LB pool
- depool the passive DB from Phab
- stop mysql on the passive DB, upgrade & reboot
- confirm the passive DB is replicating fine
- stop phd on the primary toolserver
- place phab & active DB in read only, repool the passive DB
- disable the primary toolserver in the LB pool
- confirm replication is up to date and mark the active DB as down in Phab
- reboot the primary DB & Toolserver
- confirm replication is connected
- remove DB read only
- enable the primary toolserver in the LB pool
- restore phab to RW with a master-replica setup.
- start phd on both toolservers
- raise the load balancing session affinity
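The disable / upgrade / enable pattern used for the LB pool above can be wrapped so that a server is only put back into the pool if the work on it succeeded. This is a sketch under assumptions: `depooled` is a hypothetical helper, and `set_enabled` stands in for whatever mechanism actually toggles a server in the load balancer.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def depooled(server: str, set_enabled: Callable[[str, bool], None]) -> Iterator[None]:
    """Disable `server` in the LB pool while the body runs.

    The server is re-enabled only if the body finishes without raising;
    on failure it stays out of the pool so a broken host gets no traffic.
    """
    set_enabled(server, False)
    yield
    # Reached only when the body completed without an exception.
    set_enabled(server, True)
```

Usage would look like `with depooled("primary-bots", set_enabled): upgrade_and_reboot()`, where both `"primary-bots"` and `upgrade_and_reboot` are placeholders.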
DC Failover
- Upgrade & Reboot the passive bots & tools server
- depool the passive DB from Phab
- stop mysql on the passive DB, upgrade & reboot
- confirm the passive DB is replicating fine
- repool the passive DB
- switch DNS to the passive DC
- disable services in the active DC
- run the backup scripts in the active DC
- stop phd on the primary toolserver
- restore backups for non replicated systems
- confirm replication is up to date and mark the active DB as down in Phab
- reverse the direction of replication
- promote the replica DB to master
- confirm replication status
- reboot the primary DB & Toolserver
- confirm replication is connected
- remove read only from the new master DB & Phab
- to fail back, repeat with roles swapped
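Because the order matters here (for example, replication must be confirmed before the replica is promoted), the list above could be driven by a runner that stops at the first step that fails. This is an illustrative sketch only: `run_steps` and the `Step` type are hypothetical, and each action is assumed to be a callable that returns `True` on success.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that reports success or failure.
Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the descriptions of the steps that completed successfully,
    so the operator can see exactly where the procedure halted.
    """
    done: List[str] = []
    for description, action in steps:
        if not action():
            break
        done.append(description)
    return done
```

Stopping early keeps a failed confirmation step from being followed by a promotion or a reboot.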