Spark 101 Archives - insideAI News

Databricks Announces Major Contributions to Flagship Open Source Projects

July 2, 2022 by Editorial Team

Databricks announced that it will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation and open source all Delta Lake APIs as part of the Delta Lake 2.0 release. In addition, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployments. Finally, the company introduced Spark Connect, which enables the use of Spark on virtually any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse.

Filed Under: Big Data, Big Data Services, Big Data Software, Cloud, Databricks, Google News Feed, inside SPARK, Machine Learning, Main Feature, News / Analysis, Spark 101, Uncategorized Tagged With: Apache Spark, databricks, lakehouse, MLflow, Weekly Newsletter Articles

Databricks Launches SQL Analytics to Enable Cloud Data Warehousing on Data Lakes

November 14, 2020 by Editorial Team

Databricks, the data and AI company, announced the launch of SQL Analytics, which for the first time enables data analysts to perform workloads previously meant only for a data warehouse on a data lake. This expands the traditional scope of the data lake from data science and machine learning to include all data workloads including Business Intelligence (BI) and SQL.

Filed Under: Big Data, Big Data Software, Databricks, Featured, Google News Feed, inside SPARK, News / Analysis, Spark 101, Uncategorized Tagged With: analytics, cloud data warehous, data lake, data warehouse, databricks, SQL, Weekly Newsletter Articles

Top 5 Mistakes When Writing Spark Applications

January 7, 2018 by Editorial Team

In the presentation below from Spark Summit 2016, Mark Grover goes over the top 5 things that he’s seen in the field that prevent people from getting the most out of their Spark clusters. When some of these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster with the same clusters, the same data, just a different approach.

Filed Under: Big Data, Featured, Google News Feed, inside SPARK, News / Analysis, Spark 101, Uncategorized, Video Tagged With: Apache Spark, Weekly Newsletter Articles

The Data Scientist’s Guide to Apache Spark

December 27, 2017 by Editorial Team

Looking to dive deeper into the more cutting-edge machine learning use cases in Apache Spark? To successfully use Spark’s advanced analytics capabilities, including large-scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark, from our friends over at Databricks.

Filed Under: Big Data, Databricks, Featured, Google News Feed, inside SPARK, News / Analysis, Spark 101, Uncategorized Tagged With: Apache Spark, data scientist, Weekly Newsletter Articles

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

December 2, 2016 by Editorial Team

In the talk below, Michael Armbrust gives an overview of some of the exciting new APIs available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs bring the power of Catalyst, Spark SQL’s query optimizer, to all users of Spark.

Filed Under: Big Data, Big Data Software, Databricks, Google News Feed, inside SPARK, Main Feature, News / Analysis, Spark 101, Uncategorized Tagged With: Apache Spark, Weekly Newsletter Articles

Apache Spark MLlib 2.0 Preview: Data Science and Production

August 6, 2016 by Editorial Team

From the recent Spark Summit 2016 in San Francisco, the video presentation below by Joseph K. Bradley of Databricks focuses on “Apache Spark MLlib 2.0 Preview: Data Science and Production.”

Filed Under: Big Data, Databricks, inside SPARK, Machine Learning, Main Feature, Spark 101, Uncategorized, Video Tagged With: Apache Spark, Weekly Newsletter Articles

Large-Scale Deep Learning with TensorFlow

June 18, 2016 by Daniel Gutierrez

We bring you the keynote presentation below from the recent Spark Summit 2016 held in San Francisco on June 6-8. Speaker Jeff Dean joined Google in 1999 and is currently a Google Senior Fellow.

Filed Under: Big Data Software, Google News Feed, inside SPARK, Main Feature, News / Analysis, Spark 101, Uncategorized, Video Tagged With: Apache Spark, tensorflow, Weekly Newsletter Articles

Spark MLlib: Making Practical Machine Learning Easy and Scalable

November 23, 2015 by Daniel Gutierrez

In this talk, Xiangrui Meng of Databricks shares his experience in developing MLlib. The talk covers both the higher-level APIs (ML pipelines) that make MLlib easy to use and the lower-level optimizations that make MLlib scale to massive data sets.

Filed Under: Big Data, Databricks, Google News Feed, inside SPARK, Machine Learning, Main Feature, News / Analysis, Spark 101, Uncategorized Tagged With: Weekly Newsletter Articles

Advanced Apache Spark

November 13, 2015 by Daniel Gutierrez

Big data is going Spark crazy! Here’s a whopping 6-hour, intensive, fast-paced, and vendor-agnostic look at Spark Core presented by Sameer Farooqui, a client services engineer at Databricks.

Filed Under: Big Data, Big Data Software, Databricks, Education / Training, Google News Feed, inside SPARK, Main Feature, News / Analysis, Spark 101, Uncategorized, Video Tagged With: Weekly Newsletter Articles

Apache Spark is the Smartphone of Big Data

November 9, 2015 by Daniel Gutierrez

In this special guest feature, Denny Lee of Databricks talks about the versatility of Spark, essentially comparing it to the Swiss Army knife for your camping trip called Big Data/Analytics.

Filed Under: Big Data, Data Science, Databricks, Google News Feed, Industry Perspectives, inside SPARK, Machine Learning, News / Analysis, Opinion, Spark 101, Uncategorized
