AWS Announces Three Amazon EC2 Instances Powered by New AWS-Designed Chips

Inf2 instances, powered by new Inferentia2 chips, support large deep learning models (e.g., LLMs, image generation, and automated speech detection) with up to 175 billion parameters, while delivering the lowest cost per inference on Amazon EC2. Inf2 is the first inference-optimized Amazon EC2 instance that supports distributed inference, a technique that spreads large models across several chips to deliver the best performance for deep learning models with more than 100 billion parameters. Inf2 instances support stochastic rounding, a way of rounding probabilistically that enables high performance and higher accuracy as compared to legacy rounding modes. Inf2 instances support a wide range of data types including CFP8, which improves throughput and reduces power per inference, and FP32, which boosts performance of modules that have not yet taken advantage of lower precision data types. Customers can get started with Inf2 instances using AWS Neuron, the unified software development kit (SDK) for ML inference. AWS Neuron is integrated with popular ML frameworks like PyTorch and TensorFlow to help customers deploy their existing models to Inf2 instances with minimal code changes. Since splitting large models across several chips requires fast inter-chip communication, Inf2 instances support AWS’s high-speed, intra-instance interconnect, NeuronLink, offering 192 GB/s of ring connectivity. Inf2 instances offer up to 4x the throughput and up to 10x lower latency compared to current-generation Inf1 instances, and they also offer up to 45% better performance per watt compared to GPU-based instances. Inf2 instances are available today in preview. To learn more about Inf2 instances, visit aws.amazon.com/ec2/instance-types/inf2.
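The stochastic rounding mentioned above can be illustrated with a minimal Python sketch. It is not AWS's hardware or Neuron implementation: the function names `stochastic_round` and `mean_after_rounding` are hypothetical, and the sketch rounds to integers rather than to a low-precision floating-point format. The idea shown is that a value is rounded up with probability equal to its fractional part, so the rounded results average out to the original value, whereas round-to-nearest always maps 0.3 to 0.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to a neighboring integer, chosen probabilistically.

    Illustrative only (hypothetical helper, not an AWS API): x is rounded
    up with probability equal to its fractional part, otherwise down.
    """
    lower = math.floor(x)
    frac = x - lower  # fractional part, in [0, 1)
    return lower + (1 if rng.random() < frac else 0)


def mean_after_rounding(x: float, n: int, rng: random.Random) -> float:
    """Average of n stochastic roundings of x (hypothetical helper)."""
    return sum(stochastic_round(x, rng) for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    x = 0.3
    print("round-to-nearest:", round(x))  # always 0, so the 0.3 is lost
    print("stochastic mean: ", mean_after_rounding(x, 100_000, rng))  # near 0.3
```

Because the expected value of each rounded result equals the input, small contributions are preserved on average instead of being discarded every time, which is the general reason probabilistic rounding can help accuracy at low precision.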

The Water Institute is an independent, non-profit applied research organization that works across disciplines to advance science and develop integrated methods used to solve complex environmental and societal challenges. “The ability to make accurate, near-real-time numerical weather predictions to aid decision making is important to our clients. We’re excited to see Amazon EC2’s high performance computing offerings continue to evolve with the launch of Amazon EC2 Hpc7g instances,” said Zach Cobell, research engineer at The Water Institute. “With increased floating-point performance, higher efficiency using AWS Graviton3E processors, based on Arm architecture, and decreased inter-node latency using Elastic Fabric Adapter, we expect to continue to be able to deliver innovative and sustainable solutions across our computational portfolio.”

Arup is a global collective of designers, engineering and sustainability consultants, advisors and experts dedicated to sustainable development and to using imagination, technology and rigour to shape a better world. “We use AWS to run highly complex simulations to help our customers to build the next generation of high-rise buildings, stadiums, data-centres, and crucial infrastructure, along with assessing and providing insight into urban microclimates, global warming, and climate change that impacts the lives of so many people around the world,” said Dr. Sina Hassanli, senior engineer at Arup. “Our customers are constantly demanding faster, more accurate simulations at a lower cost to inform their designs at the early stages of development, and we are already anticipating how the introduction of Amazon EC2 Hpc7g instances with higher performance will help our customers innovate faster and more efficiently.”

HAProxy Technologies is the company behind HAProxy, the world’s fastest and most widely used software load balancer. “HAProxy powers modern application delivery at any scale and in any environment, providing the utmost performance, observability, and security for some of the most popular websites in the world,” said Willy Tarreau, lead developer at HAProxy. “When HAProxy tested Amazon EC2 C6gn instances, we found unprecedented performance for a software load balancer. We are excited about the new C7gn instances with Graviton3E and fifth generation AWS Nitro Cards and the networking performance improvements they will bring to our customers.”

Aerospike Inc.’s real-time data platform is designed for organizations to build applications that fight fraud, enable global digital payments, deliver hyper-personalized user experiences to tens of millions of customers, and more. “The Aerospike Real-time Data Platform is a shared-nothing, multithreaded, multimodal data platform designed to operate efficiently on a cluster of server nodes, exploiting modern hardware and network technologies to drive reliably fast performance at sub-millisecond speeds across petabytes of data,” said Lenley Hensarling, chief product officer at Aerospike. “In our recent real-time database read tests, we were pleased to see a significant improvement in transactions per second on Amazon EC2 C7gn instances featuring new AWS Nitro Cards compared to C6gn instances. We look forward to taking advantage of C7gn instances and future AWS infrastructure improvements as they become available.”

Qualtrics designs and develops experience management software. “At Qualtrics, our focus is building technology that closes experience gaps for customers, employees, brands, and products. To achieve that, we are developing complex multi-task, multi-modal deep learning models to launch new features, such as text classification, sequence tagging, discourse analysis, key-phrase extraction, topic extraction, clustering, and end-to-end conversation understanding,” said Aaron Colak, head of Core Machine Learning at Qualtrics. “As we utilize these more complex models in more applications, the volume of unstructured data grows, and we need more performant inference-optimized solutions that can meet these demands, such as Inf2 instances, to deliver the best experiences to our customers. We are excited about the new Inf2 instances, because it will not only allow us to achieve higher throughputs, while dramatically cutting latency, but also introduces features like distributed inference and enhanced dynamic input shape support, which will help us scale to meet the deployment needs as we push towards larger, more complex large models.”

Finch Computing is a natural language technology company providing artificial intelligence applications for government, financial services, and data integrator clients. “To meet our customers’ needs for real-time natural language processing, we develop state-of-the-art deep learning models that scale to large production workloads. We have to provide low-latency transactions and achieve high throughputs to process global data feeds. We already migrated many production workloads to Inf1 instances and achieved an 80% reduction in cost over GPUs,” said Franz Weckesser, chief architect at Finch Computing. “Now, we are developing larger, more complex models that enable deeper, more insightful meaning from written text. A lot of our customers need access to these insights in real-time and the performance on Inf2 instances will help us deliver lower latency and higher throughput over Inf1. With the Inf2 performance improvements and new Inf2 features, such as support for dynamic input sizes, we are improving our cost-efficiency, elevating the real-time customer experience, and helping our customers glean new insights from their data.”

About Amazon Web Services

For over 15 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud offering. AWS has been continually expanding its services to support virtually any cloud workload, and it now has more than 200 fully featured services for compute, storage, databases, networking, analytics, machine learning and artificial intelligence (AI), Internet of Things (IoT), mobile, security, hybrid, virtual and augmented reality (VR and AR), media, and application development, deployment, and management from 96 Availability Zones within 30 geographic regions, with announced plans for 15 more Availability Zones and five more AWS Regions in Australia, Canada, Israel, New Zealand, and Thailand. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—trust AWS to power their infrastructure, become more agile, and lower costs. To learn more about AWS, visit aws.amazon.com.

About Amazon

Amazon is guided by four principles: customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking. Amazon strives to be Earth’s Most Customer-Centric Company, Earth’s Best Employer, and Earth’s Safest Place to Work. Customer reviews, 1-Click shopping, personalized recommendations, Prime, Fulfillment by Amazon, AWS, Kindle Direct Publishing, Kindle, Career Choice, Fire tablets, Fire TV, Amazon Echo, Alexa, Just Walk Out technology, Amazon Studios, and The Climate Pledge are some of the things pioneered by Amazon. For more information, visit amazon.com/about and follow @AmazonNews.



Contact:

Amazon.com, Inc.
Media Hotline
Amazon-pr@amazon.com
www.amazon.com/pr


