MSP SLA review.
Uptime, exclusions, and credits.
99.9% uptime - before you read the exclusions. Unlimited maintenance windows. Service credits capped at one month. Kontractually reviews managed service SLAs against your standard playbook so you know what you're actually committing to.
No credit card required. First 3 reviews free.
What changes when you review every SLA clause.
You commit to 99.9% uptime in a client SLA. The client's Azure region has a 4-hour outage. Your SLA excludes 'force majeure' but not third-party cloud outages. The client claims $15,000 in service credits for an outage you didn't cause and couldn't prevent.
Kontractually flags SLAs where third-party cloud provider outages are not explicitly excluded from uptime calculations. You add a specific carve-out for upstream provider outages beyond your control, with documentation requirements.
Your SLA says 99.5% uptime measured monthly. But the maintenance window clause allows unlimited scheduled maintenance with 24 hours' notice. The client argues that 6 maintenance windows in one month (totaling 18 hours) still count against uptime because the SLA doesn't explicitly exclude them.
Kontractually checks whether maintenance windows are explicitly excluded from uptime calculations and whether the frequency is capped. You set a maximum of 4 windows per month, a minimum of 48 hours' notice, and explicit exclusion from availability metrics.
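To see why the exclusion wording matters, here is a minimal sketch of the availability arithmetic for the scenario above. It assumes a 730-hour month and one common convention: excluded maintenance is removed from both the downtime and the measured service time. The names `MaintenanceWindow`, `availability_pct`, and `policy_violations` are hypothetical and do not describe how Kontractually works internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaintenanceWindow:
    hours: float          # length of the window
    notice_hours: float   # advance notice given to the client


def availability_pct(period_hours: float, unplanned_hours: float,
                     windows: List[MaintenanceWindow],
                     exclude_maintenance: bool) -> float:
    """Availability for the period, with or without the maintenance exclusion."""
    maintenance = sum(w.hours for w in windows)
    if exclude_maintenance:
        # Assumption: excluded maintenance shrinks the measured service time.
        measured = period_hours - maintenance
        downtime = unplanned_hours
    else:
        measured = period_hours
        downtime = unplanned_hours + maintenance
    return (1.0 - downtime / measured) * 100.0


def policy_violations(windows: List[MaintenanceWindow],
                      max_windows: int = 4,
                      min_notice_hours: float = 48.0) -> List[str]:
    """List breaches of the window cap and the minimum notice period."""
    issues = []
    if len(windows) > max_windows:
        issues.append(f"{len(windows)} windows exceeds cap of {max_windows}")
    for i, w in enumerate(windows, start=1):
        if w.notice_hours < min_notice_hours:
            issues.append(f"window {i}: only {w.notice_hours:g} h notice")
    return issues


if __name__ == "__main__":
    # Six 3-hour windows (18 hours total), each with 24 hours' notice.
    month = [MaintenanceWindow(hours=3.0, notice_hours=24.0) for _ in range(6)]
    print(availability_pct(730.0, 0.0, month, exclude_maintenance=False))
    print(availability_pct(730.0, 0.0, month, exclude_maintenance=True))
    print(policy_violations(month))
```

With no unplanned downtime at all, the same month either passes or misses a 99.5% target depending only on whether the 18 maintenance hours count.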
A client experiences repeated P2 incidents over 3 months. Service credits accumulate to $8,000. But you've also spent $25,000 in engineer time responding to incidents caused by the client's own infrastructure decisions. The SLA has no mechanism to address client-caused issues.
Kontractually flags SLAs without client responsibility clauses. You include provisions that exclude incidents caused by client infrastructure, unauthorized changes, or failure to follow agreed maintenance procedures from SLA calculations.
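A client responsibility clause can be pictured as a filter on which incidents count toward SLA downtime. The sketch below is illustrative only: the cause labels and the `Incident` and `countable_downtime_hours` names are assumptions, and a real SLA would define its own exclusion categories and evidence requirements.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass
class Incident:
    cause: str              # hypothetical cause label assigned at root-cause review
    downtime_hours: float


# Assumed labels mirroring the exclusions named above.
CLIENT_CAUSED: FrozenSet[str] = frozenset({
    "client_infrastructure",
    "unauthorized_change",
    "procedure_not_followed",
})


def countable_downtime_hours(incidents: Iterable[Incident],
                             excluded: FrozenSet[str] = CLIENT_CAUSED) -> float:
    """Sum downtime only for incidents whose cause is not excluded by the SLA."""
    return sum(i.downtime_hours for i in incidents if i.cause not in excluded)


if __name__ == "__main__":
    quarter = [
        Incident("msp_error", 0.5),
        Incident("client_infrastructure", 3.0),
        Incident("unauthorized_change", 1.5),
    ]
    print(countable_downtime_hours(quarter))
    print(countable_downtime_hours(quarter, excluded=frozenset()))
```

Passing an empty exclusion set shows the exposure under an SLA with no client responsibility clause.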
6 provisions to review in every MSP SLA.
Market standards for managed-service uptime vary by service tier, from 99.5% (business-hours monitoring) to 99.9% (24/7 managed services). For context, over an average 730-hour month, 99.5% allows approximately 3.65 hours of downtime; 99.9% allows about 44 minutes. The critical question is what the uptime excludes: once scheduled maintenance windows are carved out of the calculation, a 99.9% SLA can be closer to 98% in practice. Kontractually checks MSP SLAs for exclusions that effectively hollow out the stated uptime commitment.
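The downtime figures above follow from simple arithmetic, sketched below. The 730-hour month (8,760 hours divided by 12) is an assumption, since an SLA may define the measurement period differently, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730.0  # assumption: average month, 8760 hours / 12


def allowed_downtime_minutes(uptime_pct: float,
                             period_hours: float = HOURS_PER_MONTH) -> float:
    """Downtime budget, in minutes, implied by an uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * period_hours * 60.0


def effective_uptime_pct(downtime_hours: float,
                         period_hours: float = HOURS_PER_MONTH) -> float:
    """Uptime percentage actually delivered for a given total downtime."""
    return (1.0 - downtime_hours / period_hours) * 100.0


if __name__ == "__main__":
    for pct in (99.5, 99.9):
        print(f"{pct}% uptime -> {allowed_downtime_minutes(pct):.1f} min/month")
    # Roughly 15 hours of excluded maintenance lands near 98% delivered uptime.
    print(f"{effective_uptime_pct(15.0):.2f}% effective uptime")
```

Running the second function against the hours excluded by an SLA gives a quick estimate of the uptime a client actually experiences.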
Most MSPs depend on AWS, Azure, or GCP. A blanket exclusion for 'third-party outages' means the client bears the risk of cloud provider downtime even though the MSP selected the cloud provider. Better approach: the MSP commits to uptime based on their architecture design (redundancy, failover), not just the underlying cloud availability. The SLA should specify what redundancy measures the MSP maintains and whether single-region deployments are covered. Kontractually flags SLAs where cloud provider exclusions are broader than the MSP's redundancy design justifies.
Priority definitions must be objective and measurable, not subjective.
P1 (Critical): complete service outage affecting all users or a security breach in progress. Response time: 15-30 minutes; resolution target: 4 hours.
P2 (High): significant degradation affecting multiple users or a key business function. Response time: 1 hour; resolution target: 8 hours.
P3 (Medium): single user or non-critical function affected with a workaround available. Response time: 4 hours; resolution target: 24 hours.
The SLA should specify who determines priority classification (the client, the MSP, or jointly) and whether the classification can be escalated or downgraded during the incident. Kontractually flags SLAs with vague priority definitions or missing response time commitments for any priority level.
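One way to keep priority definitions measurable is to write them down as data and check them for gaps. The sketch below encodes the three levels above; it takes the upper bound of the 15-30 minute P1 range as the committed response time, which is an assumption, and the `PriorityTarget` and `missing_commitments` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PriorityTarget:
    definition: str
    response_minutes: Optional[int]   # None means the SLA states no commitment
    resolution_hours: Optional[int]


SLA_PRIORITIES: Dict[str, PriorityTarget] = {
    "P1": PriorityTarget(
        "Complete outage affecting all users, or security breach in progress",
        response_minutes=30,  # assumption: upper bound of the 15-30 minute range
        resolution_hours=4),
    "P2": PriorityTarget(
        "Significant degradation affecting multiple users or a key function",
        response_minutes=60, resolution_hours=8),
    "P3": PriorityTarget(
        "Single user or non-critical function, workaround available",
        response_minutes=240, resolution_hours=24),
}


def missing_commitments(priorities: Dict[str, PriorityTarget]) -> List[str]:
    """Report every priority level that lacks a response or resolution target."""
    gaps = []
    for level in sorted(priorities):
        target = priorities[level]
        if target.response_minutes is None:
            gaps.append(f"{level}: no response time commitment")
        if target.resolution_hours is None:
            gaps.append(f"{level}: no resolution target")
    return gaps


if __name__ == "__main__":
    print(missing_commitments(SLA_PRIORITIES))
```

A table like this also makes it obvious when a level exists in the ticketing system but has no commitment in the contract.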
Response time is the period from when an incident is reported to when the MSP acknowledges it and begins investigation. Resolution time is the period from report to when the issue is fully resolved and the service is restored. These are separate SLA metrics with different targets. The distinction matters because MSPs can meet response time SLAs by acknowledging tickets quickly without actually resolving them promptly. Best practice: track both metrics independently, with separate service credit triggers for each. Also define what 'resolution' means - is a workaround sufficient, or must the root cause be fixed? Kontractually checks whether SLAs clearly distinguish response from resolution and whether both have measurable targets.
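Tracking the two metrics independently might look like the sketch below, where both clocks start at the report time. The function name and the example timestamps are illustrative assumptions; the targets are the P2 figures from the priority definitions above.

```python
from datetime import datetime, timedelta
from typing import Dict


def sla_breaches(reported: datetime, acknowledged: datetime, resolved: datetime,
                 response_target: timedelta,
                 resolution_target: timedelta) -> Dict[str, bool]:
    """Evaluate response and resolution separately; both run from the report time."""
    return {
        "response": (acknowledged - reported) > response_target,
        "resolution": (resolved - reported) > resolution_target,
    }


if __name__ == "__main__":
    # A P2 ticket acknowledged in 20 minutes but resolved after 10.5 hours.
    result = sla_breaches(
        reported=datetime(2024, 3, 4, 9, 0),
        acknowledged=datetime(2024, 3, 4, 9, 20),
        resolved=datetime(2024, 3, 4, 19, 30),
        response_target=timedelta(hours=1),
        resolution_target=timedelta(hours=8),
    )
    print(result)
```

The example ticket meets the response target and misses the resolution target, which is the case a response-only SLA would never surface.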
Most SLAs require the client to claim service credits within a defined period (typically 30 days from the incident). If the client doesn't claim within that window, the credit is forfeited. From the MSP's perspective, this claim window is important - it limits retroactive exposure and ensures disputes are addressed while evidence is fresh. From the client's perspective, the SLA should include automated reporting of SLA performance so credits can be identified without manual tracking. Kontractually flags SLAs that lack a credit claim process, have unreasonably short claim windows, or don't require the MSP to report SLA performance metrics to the client.
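The claim window itself is a simple date check, sketched below. It assumes the window is counted in calendar days from the incident date and that a claim on the final day is still valid; a given SLA may count from the end of the billing month or in business days instead. The function name is hypothetical.

```python
from datetime import date, timedelta


def claim_is_timely(incident_date: date, claim_date: date,
                    window_days: int = 30) -> bool:
    """True if the claim falls within the window, counting the last day as valid."""
    deadline = incident_date + timedelta(days=window_days)
    return incident_date <= claim_date <= deadline


if __name__ == "__main__":
    incident = date(2024, 3, 1)
    print(claim_is_timely(incident, date(2024, 3, 31)))  # day 30 of the window
    print(claim_is_timely(incident, date(2024, 4, 1)))   # day 31, one day late
```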