Accelerate analytics development and data science with IBM Analytics Engine
The launch of IBM® Analytics Engine marks the start of a new stage in the evolution of big data analytics—which makes it the perfect time for you to reconsider your analytics architecture.
If you are struggling to transform big data into business insight, or your company’s adoption of Hadoop and Spark seems to be stalling, please read on to learn more about what IBM Analytics Engine can do for you.
A landmark in your data science journey
IBM Analytics Engine represents another major milestone in the journey of IBM Watson® Data Platform to simplify analytics infrastructure, streamline workflows, and make it easier to unlock the business value of data science.
As the next generation of IBM’s Apache Spark and Apache Hadoop cloud service, it represents a paradigm shift from the previous Hadoop offering, IBM BigInsights® on Cloud—significantly improving the user experience for developers, data engineers, and data scientists.
Spin up Spark and Hadoop clusters in minutes
The primary goal of IBM Analytics Engine is to give your business a quick and simple way to deploy analytics applications.
As a data scientist or data engineer, you can log in to the intuitive web interface from anywhere and spin up a Spark cluster, or a Hadoop cluster with Hive, HBase, and other Hadoop ecosystem components, within minutes.
Meanwhile, for application developers, IBM Analytics Engine also makes it easy to embed Hadoop or Spark capabilities into your code. The solution’s REST APIs and command line interface make it possible for your applications to provision and manage clusters programmatically. As a result, it’s easy to operationalize your apps with predictive modeling capabilities and integrate new levels of insight into your everyday operations.
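As a rough illustration of what provisioning a cluster programmatically could involve, the sketch below assembles a JSON request body in Python. The function name and every field name (`name`, `compute_nodes`, `software`) are hypothetical placeholders chosen for this example, not the documented IBM Analytics Engine schema, and no request is actually sent.

```python
import json


def build_cluster_request(name, node_count, packages):
    """Assemble a JSON-serializable description of the cluster to create.

    All field names are hypothetical placeholders, not the real
    IBM Analytics Engine request schema.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one compute node")
    return {
        "name": name,
        "compute_nodes": node_count,
        # Sort so the same set of packages always yields the same request.
        "software": sorted(packages),
    }


body = build_cluster_request("nightly-scoring", 3, {"spark", "hive"})
payload = json.dumps(body, indent=2)
print(payload)
```

An application would send a body like this to the service's provisioning endpoint with its own credentials; the actual endpoint and schema should be taken from the service documentation.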
Increasing flexibility by separating compute from storage
Unlike a traditional Hadoop architecture, IBM Analytics Engine keeps compute and storage infrastructure separate. The data is stored in the IBM Cloud Object Storage service, and the Hadoop and Spark clusters connect to the object storage repository when they need to access it.
This separation of concerns gives you much greater flexibility and reliability. Because the clusters themselves are not involved in data storage, you can spin up a cluster environment for the duration of a single job, and delete it on completion—with no risk of data loss. Moreover, you can write a configuration script once and pass it to IBM Analytics Engine as you create new clusters, to apply the exact same configuration as your previous clusters.
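The create, run, delete pattern described above can be sketched in Python with a context manager. `FakeClusterService` is an in-memory stand-in invented for this example, not the real service client, and the keys in `SHARED_CONFIG` are likewise assumptions.

```python
from contextlib import contextmanager


class FakeClusterService:
    """In-memory stand-in for a cluster provisioning API (hypothetical)."""

    def __init__(self):
        self.active = {}
        self._next_id = 1

    def create(self, config):
        cluster_id = f"cluster-{self._next_id}"
        self._next_id += 1
        self.active[cluster_id] = dict(config)
        return cluster_id

    def delete(self, cluster_id):
        del self.active[cluster_id]


@contextmanager
def ephemeral_cluster(service, config):
    """Create a cluster, hand it to the caller, and always delete it."""
    cluster_id = service.create(config)
    try:
        yield cluster_id
    finally:
        # Runs even if the job raises, so no idle cluster is left behind.
        service.delete(cluster_id)


# One configuration, written once and reused for every new cluster.
SHARED_CONFIG = {"nodes": 2, "bootstrap_script": "setup.sh"}

service = FakeClusterService()
with ephemeral_cluster(service, SHARED_CONFIG) as cid:
    running_during_job = cid in service.active
    config_used = service.active[cid]

print(running_during_job, len(service.active))
```

Because the data lives in object storage rather than on the cluster, deleting the cluster in the `finally` step discards only compute resources.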
Only pay for what you use
Separating compute from storage also makes the solution more cost-effective than a permanent Hadoop cluster. With IBM Analytics Engine, you no longer have to keep a compute cluster running if you are not using it. Instead, you can deprovision it as soon as your job is finished, and spin up a new cluster the next time you need one.
As a result, IBM’s flexible cloud delivery model only requires you to pay for the compute resources that you are actually using—there is no longer any need for your organization to bear the costs of a fixed Hadoop infrastructure that may spend much of its time idle.
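To make the cost argument concrete, here is a small back-of-the-envelope calculation in Python. The hourly rate, node count, and job length are made-up figures for illustration only, not IBM pricing, and object storage charges are left out.

```python
def monthly_compute_cost(hourly_rate, nodes, hours):
    """Compute cost for `nodes` nodes running `hours` hours in a month."""
    return hourly_rate * nodes * hours


HOURLY_RATE = 0.50        # hypothetical price per node-hour
NODES = 4                 # hypothetical cluster size
HOURS_IN_MONTH = 30 * 24  # cluster left running all month
JOB_HOURS = 30 * 2        # a two-hour job run once a day

always_on = monthly_compute_cost(HOURLY_RATE, NODES, HOURS_IN_MONTH)
on_demand = monthly_compute_cost(HOURLY_RATE, NODES, JOB_HOURS)
savings = 1 - on_demand / always_on

print(f"always on: {always_on:.2f}")
print(f"on demand: {on_demand:.2f}")
print(f"compute savings: {savings:.0%}")
```

Under these assumptions the on-demand cluster runs for 60 hours instead of 720, so the compute bill is one twelfth of the always-on figure.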
Greater resilience and simpler disaster recovery
Unlike a Hadoop cluster, in which each node needs to balance the requirements of both storage and computation, IBM Cloud Object Storage focuses entirely on providing efficient, resilient big data storage for IBM Analytics Engine.
IBM Cloud Object Storage (COS) provides organizations with an innovative, secure and cost-effective way to store and manage data without creating multiple copies. IBM’s groundbreaking SecureSlice technology distributes data slices across storage systems for security and protection. This approach combines encryption, erasure coding and information dispersal of data to multiple locations, protecting the data without complex or expensive copies. The service is continuously available; it can tolerate even catastrophic regional outages without downtime or the need for customer intervention.
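The general idea of protecting data with slices rather than full copies can be shown with a toy Python example. This is not SecureSlice: it uses single XOR parity, a much simpler scheme that survives the loss of only one slice and does no encryption, and the function names are invented for this sketch.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data, k):
    """Split data into k equal slices plus one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def recover(slices, original_len):
    """Rebuild the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost slice")
    slices = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all surviving slices reproduces the missing one.
        slices[missing[0]] = reduce(xor_bytes, present)
    return b"".join(slices[:-1])[:original_len]


message = b"analytics data kept safe"
stored = disperse(message, k=3)
stored[1] = None  # simulate losing one storage location
restored = recover(stored, len(message))
print(restored == message)
```

Here three data slices plus one parity slice take about 1.33 times the original size, compared with 2 times or more for full replicas. Production erasure codes follow the same principle while tolerating several simultaneous losses.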
Resiliency options for your data workload
Regional: Data is dispersed and stored automatically in three IBM Cloud data centers within a single geographic region, providing high data durability and availability. There are no additional charges for leveraging all three data centers.
Cross-Region: Data is dispersed and stored automatically in IBM Cloud data centers across three geographic regions, helping ensure business continuity and data accessibility across multiple regions. There are no additional charges for leveraging data centers across all three regions. The Cross-Region service delivers built-in business continuity in the event of a regional outage, without the expensive and complex replication methods that other cloud vendors require.
Integrating your data science workflow
As part of IBM Watson Data Platform, IBM Analytics Engine is also integrated with data science services such as IBM Data Science Experience. This provides a powerful and flexible foundation that makes it easy to execute data science and machine learning workloads, and plays a key role in improving productivity across the broader data science workflow.
For example, via the Data Science Experience interface, you can connect a Jupyter notebook to a Spark runtime in IBM Analytics Engine, empowering you to run interactive queries seamlessly and explore data sets quickly. And for larger projects, where multiple members of your team need to work together, you can set up shared clusters that make it easy to collaborate and share insight.
Take the next step
IBM Analytics Engine offers you the power to revolutionize your approach to big data analytics, and deliver a fast return on investment to your business.
via IBM Cloud Blog https://ibm.co/2pQcNaA
November 2, 2017 at 11:09AM