New machine learning powered operations service
provides tailored recommendations to improve application
availability
Atlassian, PagerDuty, Fidelity Investments, and
Thomson Reuters among organizations using Amazon DevOps Guru
Today, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc.
company (NASDAQ: AMZN), announced the general availability of
Amazon DevOps Guru, a fully managed operations service that uses
machine learning to make it easier for developers to improve
application availability by automatically detecting operational
issues and recommending specific actions for remediation. Informed
by years of Amazon.com and AWS operational excellence, Amazon
DevOps Guru applies machine learning to automatically analyze data
like application metrics, logs, events, and traces for behaviors
that deviate from normal operating patterns. When Amazon DevOps
Guru identifies anomalous application behavior that could cause
potential outages or service disruptions, it alerts developers with
issue details to help them quickly understand the potential impact
and likely causes of the issue, with specific recommendations for
remediation. Developers can use remediation suggestions from Amazon
DevOps Guru to reduce time to resolution when issues arise and
improve application availability—all with no manual setup or
machine learning expertise required. There are no upfront costs or
commitments with Amazon DevOps Guru, and customers pay only for the
data Amazon DevOps Guru analyzes. To get started with Amazon DevOps
Guru, visit aws.amazon.com/devops-guru.
As more organizations move to cloud-based application deployment
and microservice architectures to scale their businesses,
applications have become increasingly distributed, and developers
need more automated practices to maintain application availability
and reduce the time and effort spent detecting, debugging, and
resolving operational issues. Application downtime events caused by
faulty code or config changes, unbalanced container clusters, or
resource exhaustion (e.g. CPU, memory, or disk) inevitably lead
to bad customer experiences and lost revenue. Companies invest a
considerable amount of developer resources, time, and money to
deploy multiple monitoring tools, often managed separately, and
then have to develop and maintain custom alerts for common issues
like spikes in load balancer errors or drops in application request
rates. Setting thresholds to identify and alert when application
resources are behaving abnormally is difficult to get right,
involves manual setup, and requires thresholds that must be
continually updated as application usage changes (e.g. an unusually
large number of requests during a sales promotion). If a threshold
is set too high, developers don’t see alarms until operational
performance is severely impacted. When a threshold is set too low,
developers get too many false positives, which they are prone to
ignore. Even when developers get alerted to a potential operational
issue, the process of identifying the root cause can still prove
difficult. Using existing tools, developers often have difficulty
triangulating the root cause of an operational issue from graphs
and alarms, and even when they are able to find the root cause,
they are often left without the right information to fix it. Each
troubleshooting attempt is a cold start in which teams must spend
hours or days identifying problems. This time-consuming, tedious
work delays the resolution of an operational failure and can
prolong application disruptions.
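The threshold trade-off described above can be illustrated with a minimal Python sketch. The error counts, the two threshold values, and the `alarms` helper are all hypothetical and chosen only for illustration; they do not come from any AWS tool.

```python
# Hypothetical per-minute load balancer error counts (illustrative only).
normal_day = [2, 3, 1, 4, 2, 3, 2, 4, 3, 2]
# During a sales promotion the usual level is higher, with one real spike (40).
promo_day = [9, 11, 10, 12, 9, 10, 11, 40, 10, 9]


def alarms(series, threshold):
    """Return the indexes of samples that exceed a fixed threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


low, high = 4, 50  # assumed thresholds: one tuned for a normal day, one set very high

print(alarms(normal_day, low))   # quiet on a normal day
print(alarms(promo_day, low))    # every sample fires once usage changes
print(alarms(promo_day, high))   # the real spike is missed entirely
```

With the low threshold, every promotion-day sample raises an alarm, so the genuine spike is buried in noise. With the high threshold, no alarm fires at all.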
Amazon DevOps Guru’s machine learning models leverage over 20
years of operational expertise in building, scaling, and
maintaining highly available applications for Amazon.com. This
gives Amazon DevOps Guru the ability to automatically detect
operational issues (e.g. missing or misconfigured alarms, early
warning of resource exhaustion, config changes that could lead to
outages), provide context on resources involved and related
events, and recommend remediation actions. With just a few clicks
in the Amazon DevOps Guru console, historical application and
infrastructure metrics like latency, error rates, and request rates
for resources are automatically ingested from a user’s AWS
applications and analyzed to establish normal operating bounds.
Amazon DevOps Guru then uses a pre-trained machine learning model
to identify deviations from this established baseline (e.g.
under-provisioned compute capacity, database I/O utilization,
memory leaks). When Amazon DevOps Guru analyzes system and
application data to automatically detect anomalies, it also groups
this data into operational insights that include anomalous metrics,
visualizations of application behavior over time, and
recommendations on actions for remediation—all easily viewable in
the Amazon DevOps Guru console. Amazon DevOps Guru also correlates
and groups related application and infrastructure metrics (e.g. web
application latency spikes, running out of disk space, bad code
deployments) to reduce redundant alarms and help focus users
on high-severity issues. Through a dashboard in the Amazon DevOps
Guru console, customers can see configuration change histories and
deployment events, along with system and user activity, to generate
a prioritized list of likely causes for an operational issue. To
help customers resolve issues quickly, Amazon DevOps
Guru provides intelligent recommendations with remediation steps
and integrates with AWS Systems Manager for runbook and
collaboration tooling, giving customers the ability to more
effectively maintain applications and manage infrastructure for
their deployments. For example, when an analytics application using
Amazon Relational Database Service (RDS) begins to exhibit degraded
latencies, Amazon DevOps Guru will detect the change by
automatically analyzing the relevant metrics across the application
stack, identify the underlying root cause (e.g. increased number of
concurrent compute instances writing to RDS), and provide a
recommendation to resolve the issue (e.g. increase the provisioned
RDS capacity and IOPS storage to handle the higher load).
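As a rough illustration of establishing normal operating bounds from historical metrics and flagging deviations, the following Python sketch uses a simple mean-and-standard-deviation band. This is a stand-in technique, not the pre-trained model Amazon DevOps Guru uses; the latency values, the `k` multiplier, and the function names are assumptions made for the example.

```python
from statistics import mean, stdev


def operating_bounds(history, k=3.0):
    """Derive (lower, upper) bounds as mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def deviations(history, recent, k=3.0):
    """Return (index, value) pairs from `recent` that fall outside the bounds."""
    lower, upper = operating_bounds(history, k)
    return [(i, v) for i, v in enumerate(recent) if v < lower or v > upper]


# Hypothetical request latencies in milliseconds.
history_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
recent_ms = [101, 99, 140, 100]

print(deviations(history_ms, recent_ms))  # [(2, 140)]
```

Because the bounds are recomputed from the history, they move as usage changes, which is the property a fixed threshold lacks.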
“Customers continue to ask AWS for more services that enable
them to take advantage of our decades of operational excellence in
improving application availability running Amazon.com,” said Swami
Sivasubramanian, Vice President, Amazon Machine Learning, AWS.
“With Amazon DevOps Guru, we have taken that expertise and built
specialized machine learning models to detect, troubleshoot, and
prevent operational issues long before they impact customers and
without dealing with cold starts each time an issue arises. Amazon
DevOps Guru immediately provides customers the benefits of
operational best practices we have learned running Amazon.com, and
we designed Amazon DevOps Guru to be so simple that turning it on
would be an easy choice for every AWS customer.”
With a few clicks in the AWS Management Console, customers can
enable Amazon DevOps Guru to begin analyzing account and
application activity within minutes to provide operational
insights. Amazon DevOps Guru gives customers a single-console
experience to visualize their operational data by summarizing
relevant data across multiple sources (e.g. AWS CloudTrail, Amazon
CloudWatch, AWS Config, AWS CloudFormation, AWS X-Ray) and reduces
the need to switch between multiple tools. Customers can also view
correlated operational events and contextual data for operational
insights within the Amazon DevOps Guru console and receive alerts
via Amazon SNS. Additionally, Amazon DevOps Guru supports API
endpoints through the AWS SDK, making it easy for Amazon Partner
Network Partners and customers to integrate Amazon DevOps Guru into
their existing solutions for ticketing, paging, and automatic
notification of engineers for high-severity issues. PagerDuty and
Atlassian are among the AWS Partners that have integrated Amazon
DevOps Guru into their operations monitoring and incident
management platforms, and customers who use their solutions can now
benefit from operational insights provided by Amazon DevOps Guru.
Amazon DevOps Guru is available today in US East (N. Virginia), US
East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia
Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe
(Ireland), and Europe (Stockholm), with availability in additional
regions in the coming months.
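For readers integrating through the AWS SDK, the following Python sketch shows one way to list ongoing insights and register an Amazon SNS topic using the boto3 `devops-guru` client. It assumes boto3 is installed, AWS credentials are configured, and Amazon DevOps Guru is enabled in the account. The helper names and the topic ARN are hypothetical, and pagination is omitted for brevity.

```python
def ongoing_reactive_insights(client):
    """Return (name, severity) pairs for ongoing reactive insights.

    `client` is expected to be a boto3 "devops-guru" client. Only the first
    page of results is read; a NextToken in the response is ignored here.
    """
    response = client.list_insights(
        StatusFilter={"Ongoing": {"Type": "REACTIVE"}}
    )
    return [
        (insight["Name"], insight["Severity"])
        for insight in response.get("ReactiveInsights", [])
    ]


def subscribe_topic(client, topic_arn):
    """Register an SNS topic as a notification channel and return its ID."""
    response = client.add_notification_channel(
        Config={"Sns": {"TopicArn": topic_arn}}
    )
    return response["Id"]


def main():
    # Imported here so the helpers above can be exercised without boto3.
    import boto3

    guru = boto3.client("devops-guru")
    for name, severity in ongoing_reactive_insights(guru):
        print(f"{severity}: {name}")
    # Hypothetical ARN; replace with a topic that exists in your account.
    subscribe_topic(guru, "arn:aws:sns:us-east-1:123456789012:example-topic")
```

Calling `main()` requires a real AWS account. Passing the client in as a parameter keeps the two helpers easy to test against a stub.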
Together with Amazon CodeGuru—a developer tool powered by
machine learning that provides intelligent recommendations for
improving code quality and identifying an application’s most
expensive lines of code—Amazon DevOps Guru provides customers the
automated benefits of machine learning for their operational data
so that developers can more easily improve application availability
and reliability.
Teams at more than 194,000 companies rely on Atlassian products
to make teamwork easier, and help them organize, discuss, and
complete their work. “Atlassian is excited that our customers are
implementing an AIOps strategy using Amazon DevOps Guru to manage
the operational performance of their cloud applications,” said Emel
Dogrusoz, Head of Product at Opsgenie. “With our new Opsgenie and
Jira Service Management integration, the right teams are notified
the instant Amazon DevOps Guru discovers a potential issue and
prioritizes it by the severity of the incident using machine
learning (ML). This integration ensures that every team can quickly
respond to, resolve using ML-powered recommendations, and learn
from every incident.”
Fidelity Investments helps over 35 million people feel more
confident in their most important financial goals, manages employee
benefit programs for over 22,000 businesses, and supports more than
13,500 financial institutions with innovative investment and
technology solutions to grow their businesses. “At Fidelity, we’re
leveraging cloud technologies to enhance our global customer
experience and improve the resiliency of our applications,” said
Keith Blizard, SVP of Public Cloud Services at Fidelity
Investments. “AIOps tools such as Amazon DevOps Guru are helping us
deliver more efficient experiences and more resilient platforms to
our customers.”
PagerDuty, Inc. (NYSE:PD) is a leader in digital operations
management. “PagerDuty is excited to further deepen our
collaboration with AWS in a new integration with Amazon DevOps
Guru. PagerDuty’s digital operations management platform was built
to drive a shift to DevOps culture, and we are delighted to
continue this commitment with this integration,” said Jonathan
Rende, SVP of Product at PagerDuty. “Harnessing Amazon DevOps
Guru’s machine learning capabilities, PagerDuty provides even more
real-time signal-to-action capabilities to our joint customers.
Through PagerDuty’s ingestion of Amazon SNS via Amazon DevOps Guru,
AWS customers can take real-time action on operational issues
before they become customer-impacting outages.”
Thomson Reuters is one of the world’s most trusted providers of
answers, helping professionals make confident decisions and run
better businesses. “Customer experience and satisfaction are our
top priorities. When multiple sources of alerts and monitoring
events are received, it can be challenging and time-consuming to
filter through the noise to identify customer-impacting incidents,”
said Steve Thoennes, Director of Site Reliability Engineering and
Cloud at Thomson Reuters. “With Amazon DevOps Guru, we are able to
leverage its ML-powered insights to provide clear paths for action
to reduce—and in many cases eliminate—the impact issues have on our
customers. The Amazon DevOps Guru integration with PagerDuty also
provides a direct path to quickly and efficiently deliver
recommendations to the right people at the right time, and we
anticipate significantly reduced operational downtime as a
result.”
HCL Technologies is a next-generation global technology company
that helps enterprises reimagine their businesses for the digital
age. Its technology products and services are built on four decades
of innovation, with a world-renowned management philosophy, a
strong culture of invention and risk-taking, and a relentless focus
on customer relationships. “We are always looking for ways to
reduce the amount of time our teams spend on resolving operational
issues, and we are now using Amazon DevOps Guru and leveraging its
ML-powered insights to help us identify, correlate, and remediate
operational issues quickly,” said Anchal Gupta, Senior Technical
Lead, DevOps at HCL Technologies. “With the insights Amazon DevOps
Guru provides, our teams can now quickly find issues without having
to start from scratch trying to root cause problems. Our IT team
has significantly reduced our mean time to recovery (MTTR), and
they are saving hours upon hours of time resolving issues—all the
while ensuring our customers have the best end-user experience
possible.”
605 is an independent TV measurement firm that offers
advertising and content measurement, full-funnel attribution, media
planning, optimization, and analytical solutions on top of its
multi-source viewership data set covering more than 21 million U.S.
households. “We have over a dozen AWS accounts and tens of
thousands of resources to monitor. Even with Infrastructure as Code
and creating dynamic alerts for these services, it is difficult to
manage and correlate metrics to quickly resolve issues,” said Jared
Williams, Director of DevOps at 605.tv. “With Amazon DevOps Guru,
we are confident that the alerts and notifications we receive are
accurate from the machine learning powered metrics correlated
across multiple services. Integrating Amazon DevOps Guru only took
minutes to implement, and it was a breeze to integrate with our
thousands of AWS CloudFormation stacks. Amazon DevOps Guru has
provided insights that help us focus our infrastructure
roadmap.”
About Amazon Web Services
For over 15 years, Amazon Web Services has been the world’s most
comprehensive and broadly adopted cloud platform. AWS has been
continually expanding its services to support virtually any cloud
workload, and it now has more than 200 fully featured services for
compute, storage, databases, networking, analytics, machine
learning and artificial intelligence (AI), Internet of Things
(IoT), mobile, security, hybrid, virtual and augmented reality (VR
and AR), media, and application development, deployment, and
management from 80 Availability Zones (AZs) within 25 geographic
regions, with announced plans for 15 more Availability Zones and
five more AWS Regions in Australia, India, Indonesia, Spain, and
Switzerland. Millions of customers—including the fastest-growing
startups, largest enterprises, and leading government
agencies—trust AWS to power their infrastructure, become more
agile, and lower costs. To learn more about AWS, visit
aws.amazon.com.
About Amazon
Amazon is guided by four principles: customer obsession rather
than competitor focus, passion for invention, commitment to
operational excellence, and long-term thinking. Amazon strives to
be Earth’s Most Customer-Centric Company, Earth’s Best Employer,
and Earth’s Safest Place to Work. Customer reviews, 1-Click
shopping, personalized recommendations, Prime, Fulfillment by
Amazon, AWS, Kindle Direct Publishing, Kindle, Career Choice, Fire
tablets, Fire TV, Amazon Echo, Alexa, Just Walk Out technology,
Amazon Studios, and The Climate Pledge are some of the things
pioneered by Amazon. For more information, visit
www.amazon.com/about and follow @AmazonNews.
View source version on businesswire.com: https://www.businesswire.com/news/home/20210504006186/en/
Amazon.com, Inc. Media Hotline Amazon-pr@amazon.com
www.amazon.com/pr