Amazon Textract uses machine learning to
automatically extract text and data, including from tables and
forms, in virtually any document – with no machine learning
experience required
The Globe and Mail, Met Office, PwC,
Healthfirst, UiPath, TeraDact, Ripcord, Kablamo, Vidado, Blue Prism,
and Alfresco among customers and partners using Amazon Textract
Today, Amazon Web Services, Inc. (AWS), an Amazon.com company
(NASDAQ: AMZN), announced the general availability of Amazon
Textract, a fully managed service that uses machine learning to
automatically extract text and data, including from tables and
forms, in virtually any document without the need for manual
review, custom code, or machine learning experience. Amazon
Textract goes beyond simple optical character recognition (OCR) to
identify the contents of fields in forms, information stored in
tables, and the context in which the information is presented, such
as a name or social security number from a tax form or the product
SKU or quantity in a warehouse from an inventory report. The
extracted text and data can be easily used to build smart searches
on large archives of documents, or can be loaded into a database
for use by applications, such as accounting, auditing, and
compliance software. Amazon Textract’s API supports multiple input
types, such as scans, PDFs, and photos, and customers can use it with
database and analytics services like Amazon Elasticsearch Service,
Amazon DynamoDB, and Amazon Athena and other machine learning
services like Amazon Comprehend, Amazon Comprehend Medical, Amazon
Translate, and Amazon SageMaker to derive deeper meaning from the
extracted text and data. To get started with Amazon Textract, visit
https://aws.amazon.com/textract.
Many companies extract text and data from files such as
contracts, expense reports, mortgage guarantees, fund prospectuses,
tax documents, hospital claims, and patient forms through manual
data entry or simple OCR software. This is a time-consuming and
often inaccurate process that produces an output requiring
extensive post-processing before it can be put in a format that is
usable by other applications. That’s because existing OCR
technologies are unable to recognize common layouts like forms and
tables, and only generate a lengthy and often inaccurate text dump.
What organizations want instead is the ability to accurately
identify and extract text and data from forms and tables in
documents of any format and from a variety of file types and
templates. Amazon Textract analyzes virtually any type of document,
automatically generating highly accurate text, form, and table
data. Amazon Textract identifies text and data from tables and
forms in documents – such as line items and totals from a
photographed receipt, tax information from a W2, or values from a
table in a scanned inventory report – and recognizes a range of
document formats, including those specific to financial services,
insurance, and healthcare, without requiring any customization or
human intervention. Amazon Textract makes it easy for customers to
accurately process millions of document pages in just a few hours,
significantly lowering document processing costs, and allowing
customers to focus on deriving business value from their text and
data instead of wasting time and effort on post-processing. Results
are delivered via an API that can be easily accessed and used
without requiring any machine learning experience.
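As a minimal sketch of how form data might be read from the API results, the Python below pairs each form label with its value. It assumes the response is a list of "Blocks" in which KEY_VALUE_SET blocks link to their value through a VALUE relationship and to their words through CHILD relationships. The function name form_fields and the sample data are hypothetical and hand-written for illustration; they are not real service output.

```python
def form_fields(blocks):
    """Map each form label to its value, given a list of Textract-style blocks."""
    by_id = {block["Id"]: block for block in blocks}

    def related(block, rel_type):
        # Follow one kind of relationship (CHILD or VALUE) to the linked blocks.
        return [
            by_id[block_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type
            for block_id in rel["Ids"]
        ]

    def text_of(block):
        # Join the WORD children of a key or value block into one string.
        words = [c["Text"] for c in related(block, "CHILD") if c["BlockType"] == "WORD"]
        return " ".join(words)

    fields = {}
    for block in blocks:
        is_key = block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", [])
        if is_key:
            values = related(block, "VALUE")
            fields[text_of(block)] = " ".join(text_of(v) for v in values)
    return fields


# Hand-written sample in the assumed response shape (illustrative only).
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Employee"},
    {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
]

fields = form_fields(sample_blocks)
print(fields)
```

The resulting dictionary can be loaded into a database row or passed to downstream software without further text parsing.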
“The power of Amazon Textract is that it accurately extracts
text and structured data from virtually any document with no
machine learning experience required. Subsequently, developers can
analyze and query the extracted text and data using our database
and analytics services like Amazon Elasticsearch Service, Amazon
DynamoDB, and Amazon Athena and integrate with other machine
learning services like Amazon Comprehend, Amazon Comprehend
Medical, Amazon Translate, and Amazon SageMaker to help customers
derive deeper meaning from the extracted text and data,” said Swami
Sivasubramanian, Vice President, Amazon Machine Learning. “In
addition to the integration with other AWS services, the rich
partner community developing around Amazon Textract makes it
possible for customers to gain real meaning from their file
collections, operate more efficiently, improve security compliance,
automate data entry, and facilitate faster business decisions.”
Amazon Textract takes scanned files stored in an Amazon S3
bucket, reads them, and returns data in the form of JSON text
annotated with the page number, section, form labels, and data
types. This data can then be used for a range of applications (e.g.
generating smart search indexes, redacting text in a massive
collection of forms, creating automated loan approval workflows,
using the data for regulatory compliance, and flagging fraud risk
for insurance claims). Customers can load the data into business
software, such as spreadsheets, databases, and payroll systems, or
they can analyze and query the data using Amazon Elasticsearch Service,
Amazon DynamoDB, Amazon Redshift, or Amazon Athena. Amazon Textract
is available today in US East (Ohio), US East (N. Virginia), US
West (Oregon), and EU (Ireland), and will expand to additional
regions in the coming year.
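The flow described above, from a file in an Amazon S3 bucket to JSON annotated with page numbers, can be sketched in Python. This is an illustration under stated assumptions rather than a reference implementation: it assumes the boto3 SDK and its Textract analyze_document call, and the helper names build_request, lines_by_page, and analyze are hypothetical. The sample response is hand-written in the assumed shape so the parsing step can run without AWS credentials.

```python
from collections import defaultdict


def build_request(bucket, key):
    """Build keyword arguments for an AnalyzeDocument call on an S3 object."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["TABLES", "FORMS"],
    }


def lines_by_page(response):
    """Group the text of LINE blocks by the page they appear on."""
    pages = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Assumption: a missing "Page" field means a single-page document.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


def analyze(bucket, key):
    """Call the service; requires AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the helpers above work offline

    client = boto3.client("textract")
    response = client.analyze_document(**build_request(bucket, key))
    return lines_by_page(response)


# Hand-written sample in the assumed response shape (illustrative only).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1", "Page": 1},
        {"BlockType": "LINE", "Id": "l1", "Page": 1, "Text": "Inventory Report"},
        {"BlockType": "LINE", "Id": "l2", "Page": 1, "Text": "SKU 1234"},
        {"BlockType": "LINE", "Id": "l3", "Page": 2, "Text": "Quantity 40"},
    ]
}

pages = lines_by_page(sample_response)
print(pages)
```

Large multi-page documents are typically submitted through the asynchronous StartDocumentAnalysis operation instead of a single synchronous call; the parsing step stays the same.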
The Globe and Mail is a national icon and Canada’s most
recognized media brand. "As a news media company, we rely on many
PDF or scanned-source documents such as FOIs (freedom of
information requests) that have important information contained in
tables that we previously couldn't access,” said Michael O’Neill,
Managing Director of Digital and Data Science at The Globe and
Mail. “These documents have been under-utilized because journalists
were not able to access them easily or didn't know they existed.
Using Amazon Textract, we are able to extract information from
tables in PDFs and easily output that data to CSV and offer easy
access to these documents by making them available for search
queries by our journalists. This increases efficient access to
information for our journalists tenfold."
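A table-to-CSV step of the kind described in this quote might look like the following Python sketch. It is not The Globe and Mail's actual pipeline. It assumes a Textract-style list of blocks in which a TABLE block lists its CELL blocks as children, each CELL carries 1-based RowIndex and ColumnIndex values, and each CELL lists its WORD blocks as children. The function name table_to_csv and the sample data are hypothetical.

```python
import csv
import io


def table_to_csv(blocks, table_id):
    """Render one TABLE block as CSV text using its CELL and WORD children."""
    by_id = {block["Id"]: block for block in blocks}

    def child_ids(block):
        return [
            block_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for block_id in rel["Ids"]
        ]

    # Collect cell text keyed by (row, column).
    cells = {}
    for cell_id in child_ids(by_id[table_id]):
        cell = by_id[cell_id]
        if cell["BlockType"] != "CELL":
            continue
        words = [by_id[w]["Text"] for w in child_ids(cell) if by_id[w]["BlockType"] == "WORD"]
        cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)

    if not cells:
        return ""

    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in range(1, n_rows + 1):
        # Cells the service did not report are written as empty strings.
        writer.writerow([cells.get((row, col), "") for col in range(1, n_cols + 1)])
    return out.getvalue()


# Hand-written sample in the assumed response shape (illustrative only).
sample_table = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Widget"},
    {"Id": "w4", "BlockType": "WORD", "Text": "3"},
]

csv_text = table_to_csv(sample_table, "t1")
print(csv_text)
```

Writing through the csv module rather than joining strings by hand keeps cells that contain commas or quotes correctly escaped.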
Met Office is the UK’s national weather service, and is a world
leader in providing weather and climate services. "We hope to use
Amazon Textract to digitize millions of historical weather
observations from document archives,” said Philip Brohan, Climate
Scientist at Met Office. “Making these observations available to
science will improve our understanding of climate variability and
change."
PwC helps organizations and individuals create value by
delivering quality in assurance, tax, and advisory services. “At
PwC, we work to provide our customers with intelligent automation
tools that help transform previously manual processes. We've
integrated Amazon Textract into our solution for the pharmaceutical
industry to automate document processing for various FDA forms like
MedWatch and CIOMS,” said Siddhartha Bhattacharya of PwC.
“Previously, people would manually review, edit, and process these
forms, each one taking hours. Amazon Textract has proven to be the
most efficient and accurate OCR solution available for these forms,
extracting all of the relevant information for review and
processing, and reducing time spent from hours down to
minutes.”
Healthfirst is a not-for-profit managed care organization and
one of the fastest growing health plans in New York with over 1.4M
diverse members and a network of more than 35,000 providers and
4,500 employees. “At Healthfirst, we are building data pipelines to
turn scanned medical charts into useful clinical information to
improve care coordination, drive quality outcomes, and ensure
appropriate reimbursement for members under our coverage,” said
Steve Prewitt, Chief Analytics Officer at Healthfirst. “We use
Amazon Textract and Amazon Comprehend Medical to glean real value
from unstructured data sources in an efficient way, resulting in
revenue savings 10-20 times more than our usual downstream
operation. By scaling up to analyze over 50,000 charts, we can find
undocumented diagnoses and refer around 5,000 members for the care
management they need.”
Informed, Inc. automates how financial institutions originate
loans and open bank accounts. "We have already used Amazon Textract
to analyze tens of thousands of loan documents on behalf of
financial institutions, and our own software-as-a-service offering
has been enhanced by the service, enabling us to identify 95% of
the defects in loan application packages and help banks reduce
their manual data entry,” said Justin Wickett, Founder and CEO,
Informed Inc. “Using Amazon Textract, our software gives financial
institutions real-time visibility into an applicant’s income based
on their pay stubs, bank statements, tax returns, and other
financial documents. We plan to expand the types of documents we
analyze using Amazon Textract in order to enable financial
institutions to take advantage of our machine learning models and
bring real-time decision-making efficiency to today's slow and
manual process."
Candor’s mission is to transform the archaic, time-consuming
process that burdens the mortgage industry. “We use OCR to extract
data from a wide variety of lender-required documents to verify
income, assets, property value, and more. Until now, the best OCR
solution read one page at the rate of 38.4 seconds, but Amazon
Textract achieves this in a fraction of that time,” said Tom
Showalter, Founder & CEO of Candor. “We’ve been able to use
Textract to accurately read complex, diverse documents such as bank
statements, pay stubs, and tax documents without additional
training or machine learning expertise, allowing our clients to
underwrite and close a loan in days, as opposed to weeks.”
UiPath is a leading Robotic Process Automation vendor providing
a complete software platform to help organizations efficiently
automate business processes. "Amazon Textract will further
differentiate UiPath's robotic process automation platform by
enhancing UiPath’s document understanding capabilities, enabling
our customers to unlock critical business data from documents,
transform that data into actionable business insights, and deliver
those insights into line-of-business and operational systems," said
Param Kahlon, Chief Product Officer of UiPath.
TeraDact allows customers to transform stored images and paper
documents into privacy-compliant, usable digital formats at scale.
“Amazon Textract’s smart docs platform feeds TeraDact’s patented
redaction services to automatically remove and secure sensitive
data. TeraDact customers can permanently remove this data so that
it can never be recovered or opt to replace sensitive data with
patented tokens which can be recovered by individuals with the
appropriate permissions. This is particularly useful in complying
with government mandates surrounding individual data privacy such
as GDPR,” said Tom Trobridge, COO, TeraDact.
Ripcord’s mission is to digitize and extract knowledge from
paper documents using vision-guided robotics, machine learning, and
advanced AI. This knowledge automates business processes and
workflows. “We’ve had tremendous success utilizing Amazon Textract
to augment our advanced entity extraction to benefit many
industries and uncover $4 billion in new pay. We look forward to
expanding our use of Amazon Textract across financial and
government services, healthcare and legal,” said Alex Fielding, CEO
of Ripcord.
Blue Prism develops Robotic Process Automation software to
provide businesses and organizations with a more agile virtual
workforce. “Blue Prism's connected-RPA can automate and perform
mission-critical processes, allowing customers the freedom to focus
on more creative, meaningful work. By using Amazon Textract, we’ve
given our digital workforce another powerful tool for automation.
Amazon Textract accurately analyzes data from various document
types using machine learning, which enhances the digital
transformation journey for our customers. Using additional AWS AI
services like Amazon Comprehend and Amazon Rekognition, we can
tackle challenges from added secure customer authentication
processes to fraud detection capabilities. The intelligence and
flexibility of Amazon Textract’s form data extraction can elevate
OCR to new levels in industries like financial services, retail,
manufacturing and transportation to name a few,” said Dave Moss,
CTO and Co-Founder of Blue Prism.
About Amazon Web Services
For 13 years, Amazon Web Services has been the world’s most
comprehensive and broadly adopted cloud platform. AWS offers over
165 fully featured services for compute, storage, databases,
networking, analytics, robotics, machine learning and artificial
intelligence (AI), Internet of Things (IoT), mobile, security,
hybrid, virtual and augmented reality (VR and AR), media, and
application development, deployment, and management from 66
Availability Zones (AZs) within 21 geographic regions, spanning the
U.S., Australia, Brazil, Canada, China, France, Germany, Hong Kong
Special Administrative Region, India, Ireland, Japan, Korea,
Singapore, Sweden, and the UK. Millions of customers, including the
fastest-growing startups, largest enterprises, and leading
government agencies, trust AWS to power their infrastructure, become
more agile, and lower costs. To learn more about AWS, visit
aws.amazon.com.
About Amazon
Amazon is guided by four principles: customer obsession rather
than competitor focus, passion for invention, commitment to
operational excellence, and long-term thinking. Customer reviews,
1-Click shopping, personalized recommendations, Prime, Fulfillment
by Amazon, AWS, Kindle Direct Publishing, Kindle, Fire tablets,
Fire TV, Amazon Echo, and Alexa are some of the products and
services pioneered by Amazon. For more information, visit
amazon.com/about and follow @AmazonNews.
View source version on businesswire.com: https://www.businesswire.com/news/home/20190529005985/en/
Amazon.com, Inc.
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr