Wikimedia Cloud Services team/EnhancementProposals/Production Readiness Checklist
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.
A list of items that we should think about before deploying new services/components/changes in production.
This is a work in progress; feel free to contribute.
Support Lifecycle
Service Level
Document the support lifecycle of the solution (roadmap, timeline, expected SLA/SLO, etc.).
Infrastructure components
Document all components that make up the solution and their supported lifecycle for improvements, bug fixes, security fixes, etc.
Note whether we're using LTS versions, how long they're supported, by whom, etc.
Resource Limits
All resources consumed by users must have an upper limit to avoid unbounded resource usage.
- CPU/memory/disk usage should have clear limits per user/application
- API endpoints should be rate limited
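One common way to rate limit an API endpoint per user is a token bucket. The following is a minimal sketch, not a description of any existing WMCS service: the `TokenBucket` and `RateLimiter` names, the `rate` and `capacity` parameters, and the in-memory dictionary of buckets are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # upper limit on stored tokens
        self.clock = clock               # injectable clock, useful for tests
        self.tokens = self.capacity      # start with a full bucket
        self.last = self.clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Hypothetical per-user limiter: one bucket per user, kept in memory."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}

    def allow(self, user):
        bucket = self._buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[user] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(user)` and reject the request (for example with HTTP 429) when it returns `False`. A multi-host deployment would need shared state instead of a local dictionary, which this sketch does not cover.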
Performance Tests
It's necessary to understand what levels of performance the solution can provide.
User Facing
Benchmark and document critical user-facing points in the solution to understand system behavior globally.
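As an illustration of what such a benchmark might record, the sketch below times repeated calls to a user-facing operation and summarizes the latency distribution. The `benchmark` function, its `runs` parameter, and the choice of p50/p95/p99 as reported percentiles are assumptions for this example, not an agreed team standard.

```python
import statistics
import time


def benchmark(operation, runs=100, timer=time.perf_counter):
    """Call `operation` `runs` times and return latency percentiles in seconds."""
    samples = []
    for _ in range(runs):
        start = timer()
        operation()
        samples.append(timer() - start)
    samples.sort()
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": samples[-1],
    }
```

Here `operation` would be any zero-argument callable that exercises the critical path, such as a function issuing one request against a test deployment. Recording the results alongside the hardware and software versions used makes later runs comparable.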
Building Blocks
Benchmark and document low-level building blocks (servers, memory throughput, network-attached storage, local disks, network, etc.).
Documentation
User Documentation
- Documentation for the most common use cases
- Frequently Asked Questions page
- Contact page
Admin Documentation
- Infrastructure diagrams
- High-level overview of data flow, request timeline, etc.
- Runbooks
- How to deploy from scratch
- How to deploy new changes
- How to restart
- Creating/deleting/changing resources (users, apps, projects, or any other manageable object)
- Troubleshooting steps
Monitoring
Servers
Critical Service Components
Blackbox Monitoring
Whitebox Monitoring
Backups
See also
- Portal:Cloud VPS/Admin/Deployment confidence checklist
- How to deploy code
- mw:Manual:Pre-commit checklist
- mw:Review queue#Checklist/Process
- mw:Best practices for extensions
- mw:API:Client code/Gold standard
External Resources
- [[[List here other companies' production readiness checklists for comparison. ]]]