Rini Biswas




SLA Translation

Metric to Metric Translation in Cloud Computing

 

A Service Level Agreement (SLA) is a contract in which both parties agree on specific parameters that define a guaranteed level of service. It plays an important part in measuring the performance of applications and infrastructure, and it assures the customer that the vendor will meet the agreed service level parameters.

For an application deployed in a cloud environment, SLA parameters depend on factors such as infrastructure performance, network performance, application performance and workload. To interlink these factors we propose three levels of metrics: Business Level Metrics, Application Level Metrics and Resource Level Metrics. Business Level Metrics include cost, usage, availability, overall response, utilization, etc. Application Level Metrics capture the parameters needed at the application level, such as application response time, concurrent users, throughput, think time and network availability. Resource Level Metrics include memory usage, disk space, etc.

How do we define Business Level Metrics? In the cloud, availability of resources, cost of usage and overall performance are important parameters that need to be tracked. Overall performance includes the response time, throughput and utilization of the whole application. Any violation of these parameters hurts the business, so a cloud provider needs to track these SLA parameters to avoid violations and keep customers satisfied. We use these parameters as the entry point to the SLA translation and call them the Business Level Metrics. Examples of Business Level Metrics are Response Time, Utilization, Availability, Data Recovery, Application Security, Throughput, Workload, Identity Management, etc.

Next come the Application Level Metrics, one step below the Business Level Metrics. Every parameter at the business level can be broken down into a number of other parameters, and we place these at the application level. Consider Response Time as a business level parameter. It is the response time of the whole application, which includes the response times of all the resources in that application as well as the network response times. Sometimes one application is linked to another application; in that case the overall figure is the sum of the response times of all the applications involved. Examples of Application Level Metrics are Application Response Time, Application Throughput, Application Utilization, Application Availability, Application Security Architecture, Authentication, etc.

Every application has a number of resources, such as a web server, an application server and a database server. For each of these resources we monitor the basic performance parameters at the resource level. These are the finest parameters, which cannot be broken down further, such as Free Memory, Pages/sec, CPU Wait Time and Total Swap. They can be measured directly using various tools.

Consider the parameters for Web Server Utilization at the Resource Level Metrics. The list could be endless, but we have tried to capture the important ones. At the resource level we can further classify the parameters by the memory, CPU and disk of each resource.

Here are the finest parameters we think are needed at the memory level of a resource:

· Free Memory
· Virtual Memory Size
· Resident Memory Size
· Available Memory
· Total Memory
· Used Memory
· Number of CPUs
· Cache Hit Rate
· Pages/sec
· Page Out per Second
· Page In per Second
· Page Free per Second
· Free Swap
· Used Swap
· Total Swap
· Heap Memory Committed
· Heap Memory Free
· Heap Memory Max
· Heap Memory Used
· Garbage Collection Frequency
· Page Faults/sec
· JVM Number Of Daemon Threads
· JVM Total Garbage Collection Count
· JVM Total Nursery Size
· Process: Working Set

 

We broke down every business level metric into its finest, most atomic parameters at the resource level, which can be measured easily with open source tools such as Hyperic and Ganglia. In this way we propose an SLA translation that starts with the business level at the highest node, followed by the application level and finally the resource level.

EXPLANATION:

Think of it as a tree structure in which the Business Level (B) sits at the highest point. The application level metrics (A1 and A2) come right below it. Every application has a number of resources, and each resource has its own set of parameters to be monitored. So we place the resource level metrics (R1, R2, R3 and R4) below the application metrics.

 

What we have derived, therefore, is that if we know all the parameters at the resource level, we can track the business level metrics.
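The tree described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `MetricNode` class, the split of R1 and R2 under A1 and R3 and R4 under A2, the sample readings, and the use of plain summation as the roll-up rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricNode:
    """One node in the SLA translation tree (hypothetical helper type)."""
    name: str
    children: List["MetricNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the names of all resource level (leaf) nodes below this node."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names

    def roll_up(self, readings: Dict[str, float]) -> float:
        """Aggregate leaf readings upward. Summation is an assumed rule."""
        if not self.children:
            return readings[self.name]
        return sum(child.roll_up(readings) for child in self.children)


# B at the top, A1 and A2 below it.
# Assumption: R1 and R2 belong to A1, R3 and R4 belong to A2.
tree = MetricNode("B", [
    MetricNode("A1", [MetricNode("R1"), MetricNode("R2")]),
    MetricNode("A2", [MetricNode("R3"), MetricNode("R4")]),
])

# Made-up readings, for illustration only.
readings = {"R1": 1.0, "R2": 2.0, "R3": 0.5, "R4": 1.5}

print(tree.leaves())
print(tree.roll_up(readings))
```

Summation suits an additive metric such as response time; other metrics (availability, for example) would likely need a different aggregation rule at each node.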

 

 

IMPLEMENTATION:

Here’s a graphical explanation of how we have captured one parameter from the SLA metrics.

SLA parameter – Performance of an application deployed in cloud

Performance includes Response Time, Throughput and Utilization, which are parameters of the cloud business level metric. Response Time is in turn derived from the server response time, database entry and network latency. Server Response Time includes parameters such as Processor Queue Length, Queue Time and % Privileged time total, which are parameters of the resource level metric.

Broadly speaking, then, the business metrics are derived from the application metrics, which in turn are derived from the resource level metrics.

 

SLA: Performance

    Business Level: Response Time
        Application Level: Application Response Time
            Resource Level: Server Response Time
                Processor Queue Length
                % Processor Time
                Queue Time
                Instructions per sec
                % Privileged time total
            Resource Level: Network Latency
                Maximum Jitter
                Packet Rate In and Out
                Network Latency
                Output Queue Length
            Resource Level: Database entry
                Reads to the database
                Writes to the database

    Business Level: Throughput
        Application Level: Application Throughput
            Resource Level: Web Server Throughput
                Invalid Login Attempts
                JTA Active Transactions
                JTA Committed Transactions
                JTA Total Transactions
                Login Attempts While Locked
                JTA Transactions Abandoned
                JTA Transaction Rollback Timeouts
            Resource Level: DB Server Throughput
                Client Roundtrips
                Logical Reads
                Physical Reads
                Physical Writes
                User Calls
                User Commits
                User Rollbacks
            Resource Level: Network Throughput
                Tcp Active Opens
                Tcp Attempt Fails
                Tcp Curr Estab
                Tcp Estab Resets
                Tcp In Errs
                Tcp In Segs
                Tcp Out Rsts
                Tcp Out Segs
                Tcp Outbound Connections

    Business Level: Utilization
        Application Level: Application Utilization
            Resource Level: Web Server Utilization
                ….
            Resource Level: DB Utilization
                …..
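The Response Time branch of the mapping above could be held in a nested dictionary and walked to list the resource level parameters behind a business level metric. This is a rough sketch under stated assumptions: the names `SLA_TRANSLATION` and `resource_parameters` are hypothetical, and only the Response Time branch is transcribed.

```python
from typing import Dict, List

# Business level -> application level -> resource level group -> parameters.
# Only the Response Time branch is transcribed here; the other branches
# would follow the same shape.
SLA_TRANSLATION: Dict[str, Dict[str, Dict[str, List[str]]]] = {
    "Response Time": {
        "Application Response Time": {
            "Server Response Time": [
                "Processor Queue Length",
                "% Processor Time",
                "Queue Time",
                "Instructions per sec",
                "% Privileged time total",
            ],
            "Network Latency": [
                "Maximum Jitter",
                "Packet Rate In and Out",
                "Network Latency",
                "Output Queue Length",
            ],
            "Database entry": [
                "Reads to the database",
                "Writes to the database",
            ],
        },
    },
}


def resource_parameters(business_metric: str) -> List[str]:
    """List every resource level parameter that feeds a business level metric."""
    params: List[str] = []
    for groups in SLA_TRANSLATION[business_metric].values():
        for names in groups.values():
            params.extend(names)
    return params


for name in resource_parameters("Response Time"):
    print(name)
```

A structure like this makes the direction of the translation explicit: starting from one business level entry, the walk ends at the parameters a monitoring tool would actually collect.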

 

 

USE CASE:

Consider a simple use case that shows how the different levels of metrics are mapped.

A customer requires that the response time of a given application be less than 6 seconds for a given request.

The application is deployed in the cloud and has one Web Server, one Application Server and one DB Server.

 

The overall response time of the application depends on the response times of the individual resources inside it. In this case that means the response time of the Web Server, the response time of the Application Server and the response time of the DB Server.

At the resource level, each of these response times depends on factors such as network latency and queue length at that server. At the application level we have the sum of the response times of the servers. The business level is the requirement as stated by the customer.

Thus the application meets the Business Level Metric only if:

Response time of Web Server + Response time of App Server + Response time of DB Server < 6 seconds.
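The condition above can be checked with a short Python sketch. The 6 second limit comes from the use case; the function names, the dictionary keys and the sample timings are hypothetical and exist only to illustrate the check.

```python
from typing import Dict

SLA_LIMIT_SECONDS = 6.0  # customer requirement stated in the use case


def total_response_time(per_server_seconds: Dict[str, float]) -> float:
    """Application level: sum of the response times of the servers."""
    return sum(per_server_seconds.values())


def meets_sla(per_server_seconds: Dict[str, float],
              limit: float = SLA_LIMIT_SECONDS) -> bool:
    """Business level: the summed response time must stay strictly below the limit."""
    return total_response_time(per_server_seconds) < limit


# Made-up measurements for the three servers, in seconds.
sample = {"web_server": 1.5, "app_server": 2.0, "db_server": 1.0}

print(total_response_time(sample))
print(meets_sla(sample))
```

The comparison is strict, matching the "less than 6 sec" wording of the requirement, so a total of exactly 6 seconds counts as a violation.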

CONCLUSION:

Our proposed SLA translation thus traces every business level parameter, such as Response Time, down to the most granular resource level parameters it depends on, such as queue length and network latency. To derive the Business Level Metrics effectively, we therefore need to know the Resource Level Metrics thoroughly.

 

More Stories By Rini Biswas

Rini is a Software Engineer at SETLabs, the R&D division of Infosys Technologies Ltd. She has more than 3 years of experience in the development of Cloud computing, Java and Java EE applications, etc. Joel works as a Technology Analyst at SETLabs, the R&D division of Infosys Technologies Ltd. He has close to 3 years of experience in the development of Cloud computing, Java and Java EE applications, Web 2.0, etc.