Enterprises are increasingly developing applications to extract real-time insights from large data sets. Real-time analytics on Intel architecture is a vital piece of the Big Data puzzle, enabling prompt, actionable insights. Apache Spark is an open source framework that enables stream processing as well as fast queries on large data sets stored on a Hadoop cluster, and it supports new modes of analytics on big data platforms based on the Apache Hadoop ecosystem.
"Open source is undoubtedly the future of technological innovation and Big Data tools and processing are at the forefront of that wave," said Ion Stoica, CEO at Databricks. "Our collaboration with Intel will bring the unified Spark ecosystem to businesses of all sizes with new levels of analytic capabilities, real-time benefits, and simplicity."
"As more and more connected devices, including sensors, are introduced to the market, Big Data sets are growing exponentially every year, making processing and analyzing this data a more complex task," said Michael Greene, Intel Vice President, Intel Software and Services Group and General Manager of System Technologies and Optimization. "To find new trends and strong patterns from large complex data sets, a strong analytics foundation is needed. Our work with Databricks to advance these analytics capabilities on Intel® architecture by utilizing the rich capabilities of Spark will help our customers dive deeper into their data and derive real-time insights and benefits in the cloud."
Apache Spark is a tool for fast, iterative processing of large datasets, and it is compatible with existing Hadoop data. It can run in Hadoop clusters through YARN or in Spark's standalone mode, and it is designed to handle both batch processing and newer workloads such as streaming, interactive queries, and machine learning. Spark recently won the 2014 Gray Sort competition, a third-party benchmark measuring how fast a system can sort 100 TB of data (1 trillion records). It is built for scalability, stability, and performance, with the ability to process datasets ranging from gigabytes to terabytes to petabytes.
Read Michael Greene's blog post to learn more about this announcement: http://blogs.intel.com/evangelists/unlocking-the-promise-of-a-data-driven-world
About Databricks:
Databricks was founded by the team that created and continues to drive Apache Spark, the most active open source project in the Big Data ecosystem. Databricks' vision is to dramatically simplify big data processing and free users to focus on turning data into value. Databricks Cloud, a cloud platform built around Apache Spark, delivers on this vision by combining the power of Spark with a zero-management hosted platform and an initial set of applications built around common workflows. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, visit http://www.databricks.com.
For media inquiries: Suzanne Block 617-824-0981 databricksmg@merrittgrp.com