Digital Transformation with AIOps on Multicloud

Jonathan Chin
7 min read, Jul 6, 2020

AIOps stands for Artificial Intelligence for IT Operations. In layman’s terms, it means giving IT infrastructure the intelligence to automatically discover what its users (humans, machines, or applications) need, analyze the tradeoffs among those needs (who and what matters more or less to the business), and make the necessary changes (provisioning or allocating more or fewer resources) to balance those tradeoffs. The cycle then repeats continuously.

When AIOps is enabled for applications deployed on Multicloud, it keeps them running smoothly and efficiently by continuously balancing resources across the Multicloud environment. This can be automated through policies, without user intervention.

The Building Blocks of AIOps

Basic building blocks of an AIOps Engine

Today, an effective AIOps technology stack begins with visibility, that is, the discovery of your IT infrastructure, from containers, virtual machines and physical server clusters in your private cloud to the virtual instances or Kubernetes clusters you deploy on public clouds. Beyond the IT infrastructure, visibility into your applications’ health allows events to be correlated between these applications and the infrastructure they run on.

Insight refers to the application of machine learning (ML) and data science to the collected data. Because of the sheer volume of this data, it is often impossible for humans to analyze it manually or accurately.

Finally, the tradeoffs computed by the machine learning algorithms are converted into actions that are applied to your IT infrastructure on the fly. For example, one such action is scaling out compute resources (anywhere on the Multicloud) to improve the performance of an application that depends on those resources.
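To make the three building blocks concrete, here is a minimal Go sketch of one pass through the cycle. Everything in it is an assumption made for illustration: the `Metric` and `Action` types, the `analyze` function, and the fixed 80% and 20% thresholds are hypothetical stand-ins for what a real engine would derive from ML models and business priorities.

```go
package main

import "fmt"

// Metric is a hypothetical sample produced by the visibility layer.
type Metric struct {
	Workload       string
	CPUUtilization float64 // fraction of the CPU limit in use, 0.0 to 1.0
}

// Action is a hypothetical change that the action layer would apply.
type Action struct {
	Workload string
	Kind     string // "scale-up", "scale-down", or "none"
}

// analyze is the insight step. A real engine would weigh many signals;
// this stand-in uses two fixed, assumed thresholds.
func analyze(m Metric) Action {
	switch {
	case m.CPUUtilization > 0.80:
		return Action{m.Workload, "scale-up"}
	case m.CPUUtilization < 0.20:
		return Action{m.Workload, "scale-down"}
	default:
		return Action{m.Workload, "none"}
	}
}

// runCycle performs one pass: discover, analyze, then act on anything
// that needs a change.
func runCycle(discover func() []Metric, act func(Action)) {
	for _, m := range discover() {
		if a := analyze(m); a.Kind != "none" {
			act(a)
		}
	}
}

func main() {
	// Made-up sample data standing in for discovered workloads.
	discover := func() []Metric {
		return []Metric{{"frontend", 0.93}, {"cartservice", 0.45}, {"adservice", 0.05}}
	}
	act := func(a Action) { fmt.Printf("%s: %s\n", a.Workload, a.Kind) }
	runCycle(discover, act)
}
```

In a real deployment `runCycle` would be invoked repeatedly, which is the "cycle repeats" part of the definition.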

A practical look at AIOps

In the rest of this article, I will demonstrate a practical example of AIOps for Application Performance Management (APM) with a microservices application deployed in a container-based environment. In my example, I will use AppDynamics and Cisco Workload Optimization Manager (CWOM, also known as Turbonomic) for visibility into the state of my application and IT infrastructure.

AppDynamics is Application Performance Management software that manages the performance and availability of applications across multicloud. CWOM uses AI-driven analytics to optimize performance, compliance, and cost in real time. Together, AppDynamics and CWOM match application demand to infrastructure supply to continuously ensure application performance.

Both products can be deployed within your private cloud, or as a software as a service (SaaS) subscription on public clouds. I will be using the SaaS option for AppDynamics and the private cloud option for CWOM. I will also be deploying my demo application on a local Kubernetes cluster created in my private cloud. As such, my application and IT infrastructure are on a private cloud but, as I will explain later, they could equally be on any public cloud.

Visibility into Application and IT infrastructure state

First, I need an application. For this, I am using the Online Boutique microservices demo application. AppDynamics supports different deployment options and use cases, including code-level application performance monitoring, cloud monitoring, end user monitoring, and infrastructure visibility.

For this article, I am using code-level application performance monitoring: I instrument the Golang source code of the frontend microservice in the Online Boutique demo application with the AppDynamics Golang SDK. This involves adding the connection details and access key credentials of my AppDynamics SaaS controller to the main.go file, and adding calls to the Golang SDK in the application functions where performance monitoring is warranted.

Instrumenting Golang SDK into application
Initializing calls to Golang SDK in application functions

After completing these instrumentation steps, I built the Golang application source code into a Docker container image and deployed it, together with the rest of the Online Boutique application’s containers, on my Kubernetes cluster.

Online Boutique application deployed on local Kubernetes Cluster

The logs from the frontend container show that the AppDynamics Golang SDK agent started successfully. At this point, the agent establishes a connection to the AppDynamics controller using the AccessKey credentials that were added to the frontend’s main.go file earlier in this article.

AppDynamics SDK agent started

Logging in to my AppDynamics dashboard, I can now see the Golang SDK agent has connected successfully and the controller is receiving application performance information from the container.

AppDynamics Dashboard showing Application Performance Data from Golang SDK instrumented container

Next, I need to set up CWOM to communicate with the AppDynamics controller so that it can receive insights about my application. I will now log in to CWOM.

CWOM landing page dashboard

CWOM is agentless: all that is required for communication between CWOM and the AppDynamics controller is to enter my AppDynamics controller’s public address, username, and password in CWOM.

Adding AppDynamics controller into CWOM as a Target Configuration
Populating AppDynamics controller information into CWOM

After a short while, CWOM now has the Application Performance Insights regarding my Online Boutique application from my AppDynamics controller.

CWOM receiving Application Performance Insights from AppDynamics

The next step is to set up CWOM to gain visibility into my IT infrastructure, which in my case is the local Kubernetes cluster where my Online Boutique application is running. To achieve this, I deploy CWOM’s container pods in a non-default namespace on my Kubernetes cluster. These pods monitor the performance of the microservices running in Kubernetes Pods, as well as the efficiency of the underlying infrastructure. My CWOM credentials are set up in a ConfigMap object. Examples of the full Kubernetes YAML files are available on Turbonomic’s GitHub.

With this, the Visibility and Insight engines for my AIOps setup are complete.

AIOps with Visibility and Insights → Action

Finally, I am ready to validate that Visibility into my Application Performance and IT infrastructure leads to Insights, and that these Insights can be translated into Actions.

To trigger this, I need to generate increased load on my Online Boutique application so that its Application Performance starts to degrade. When this happens, Insights about the degradation should trigger the Insight engine (CWOM) to execute the Actions needed to adjust and balance the IT infrastructure and restore my Application Performance.

I increased the traffic sent by the load generator container pods to the frontend container pod of the Online Boutique application. After the increase, the application’s average response time rose significantly, from under 100 milliseconds to more than 300 milliseconds. In addition, the Application Transaction Scorecard on my AppDynamics controller shows Slow and Very Slow transactions reported by the Golang SDK agent in the application’s container.

Average Response Time after increased traffic load
Application Transaction Scorecard after increased load

Logging back into my CWOM software, I see new Critical (red) and Warning (yellow) alerts related to the containers running the affected Online Boutique application.

CWOM’s new pending Critical and Warning alerts related to Container affecting Application Performance

I will now create an Automation Policy in CWOM for Containers to dynamically balance resources to improve and restore Application Performance. After the policy is applied, CWOM executes vertical scaling to allocate more compute resources to the affected Container Pods.

After the scaling Actions are completed, the average response time of the Online Boutique application is gradually restored, and the Transaction Scorecard shows Normal transactions again.

Average Response Time is gradually restored after CWOM has completed resource scaling and balancing
Normalized Transaction Scorecard

Lastly, I wanted to show how the entire AIOps use case can be viewed from a “single pane of glass”. For this, I am using the REST APIs supported by AppDynamics and CWOM, together with some basic Node.js web programming.

Customized “AIOps Web Dashboard” with NodeJS and REST APIs

AIOps (Visibility, Insights and Actions) via AppDynamics and CWOM is validated to be working as I expected. What I have shown is based on a container environment set up in a private cloud, but the same use case works on the managed Kubernetes services of public cloud providers, such as EKS, GKE or AKS.

Jonathan Chin

Jonathan is an App Modernization Customer Engineer at Google Cloud, helping customers in their journey towards Cloud Native. He lives in Singapore.