Apache Spark MLlib 2.0 Preview: Data Science and Production

From the recent Spark Summit 2016 in San Francisco, the video presentation below by Joseph K. Bradley of Databricks give focus to “Apache Spark MLlib 2.0 Preview: Data Science and Production.” This talk highlights major improvements in Machine Learning (ML) targeted for Apache Spark 2.0. The MLlib 2.0 release focuses on ease of use for data science—both for casual and power users. Joseph discusses 3 key improvements: persisting models for production, customizing Pipelines, and improvements to models and APIs critical to data science.

MLlib simplifies moving ML models to production by adding full support for model and Pipeline persistence. Individual models—and entire Pipelines including feature transformations—can be built on one Spark deployment, saved, and loaded onto other Spark deployments for production and serving.
Users will find it much easier to implement custom feature transformers and models. Abstractions automatically handle input schema validation, as well as persistence for saving and loading models.
For statisticians and data scientists, MLlib has doubled down on Generalized Linear Models (GLMs), which are key algorithms for many use cases. MLlib now supports more GLM families and link functions, handles corner cases more gracefully, and provides more model statistics. Also, expanded language APIs allow data scientists using Python and R to call many more algorithms.

Finally, the presentation demonstrates these improvements live and show how they facilitate getting started with ML on Spark, customizing implementations, and moving to production.

For our reader’s convenience, here are the slides for the presentation:

Sign up for the free insideAI News newsletter.

Apache Spark MLlib 2.0 Preview: Data Science and Production

Sponsored Guest Articles

Webinar: Getting Started with Llama 3 on AMD Radeon and Instinct GPUs

White Papers

From complexity to clarity: Harnessing the power of AI/ML and risk-informed strategies to streamline clinical data management

Speak Your Mind Cancel reply

Featured RSS Feed

More News from insideHPC

Apache Spark MLlib 2.0 Preview: Data Science and Production

Sponsored Guest Articles

Webinar: Getting Started with Llama 3 on AMD Radeon and Instinct GPUs

White Papers

From complexity to clarity: Harnessing the power of AI/ML and risk-informed strategies to streamline clinical data management

Join Us On Social Media

Speak Your Mind Cancel reply

Related Posts

Featured RSS Feed

More News from insideHPC