Spark 101 Archives - insideAI News
https://insideainews.com/category/spark-101/
Illuminating AI's Frontiers: Your Go-To News Destination.
Tue, 28 Jun 2022 21:39:11 +0000

Databricks Announces Major Contributions to Flagship Open Source Projects
https://insideainews.com/2022/07/02/databricks-announces-major-contributions-to-flagship-open-source-projects/
Sat, 02 Jul 2022 13:00:00 +0000
Databricks announced that the company will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation and open source all Delta Lake APIs as part of the Delta Lake 2.0 release. In addition, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployments. Finally, the company introduced Spark Connect, to enable the use of Spark on virtually any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse.

Databricks Launches SQL Analytics to Enable Cloud Data Warehousing on Data Lakes
https://insideainews.com/2020/11/14/databricks-launches-sql-analytics-to-enable-cloud-data-warehousing-on-data-lakes/
Sat, 14 Nov 2020 14:00:00 +0000
Databricks, the data and AI company, announced the launch of SQL Analytics, which for the first time enables data analysts to perform workloads previously meant only for a data warehouse on a data lake. This expands the traditional scope of the data lake from data science and machine learning to include all data workloads including Business Intelligence (BI) and SQL.

Top 5 Mistakes When Writing Spark Applications
https://insideainews.com/2018/01/07/top-5-mistakes-writing-spark-applications/
Sun, 07 Jan 2018 16:30:46 +0000
In the presentation below from Spark Summit 2016, Mark Grover goes over the top 5 things that he's seen in the field that prevent people from getting the most out of their Spark clusters.
When some of these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster on the same clusters with the same data, just with a different approach.

The Data Scientist’s Guide to Apache Spark
https://insideainews.com/2017/12/27/data-scientists-guide-apache-spark/
Wed, 27 Dec 2017 16:30:17 +0000
Looking to dive deeper into the more cutting-edge machine learning use cases in Apache Spark? To successfully use Spark’s advanced analytics capabilities, including large-scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark, from our friends over at Databricks.

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming
https://insideainews.com/2016/12/02/structuring-apache-spark-2-0-sql-dataframes-datasets-and-streaming/
Fri, 02 Dec 2016 13:00:05 +0000
In the talk below, Michael Armbrust gives an overview of some of the exciting new APIs available in Spark 2.0, namely Datasets and Structured Streaming.
Together, these APIs are bringing the power of Catalyst, Spark SQL's query optimizer, to all users of Spark.

Apache Spark MLlib 2.0 Preview: Data Science and Production
https://insideainews.com/2016/08/06/apache-spark-mllib-2-0-preview-data-science-and-production/
Sat, 06 Aug 2016 13:00:10 +0000
From the recent Spark Summit 2016 in San Francisco, the video presentation below by Joseph K. Bradley of Databricks focuses on "Apache Spark MLlib 2.0 Preview: Data Science and Production."

Large-Scale Deep Learning with TensorFlow
https://insideainews.com/2016/06/18/large-scale-deep-learning-with-tensorflow/
Sat, 18 Jun 2016 12:00:10 +0000
We bring you the keynote presentation below from the recent Spark Summit 2016 held in San Francisco on June 6-8. Speaker Jeff Dean joined Google in 1999 and is currently a Google Senior Fellow.

Spark MLlib: Making Practical Machine Learning Easy and Scalable
https://insideainews.com/2015/11/23/spark-mllib-making-practical-machine-learning-easy-and-scalable/
Mon, 23 Nov 2015 21:00:34 +0000
In this talk, Xiangrui Meng of Databricks shares his experience in developing MLlib.
The talk covers both the higher-level APIs (ML pipelines) that make MLlib easy to use and the lower-level optimizations that make MLlib scale to massive data sets.

Advanced Apache Spark
https://insideainews.com/2015/11/13/advanced-apache-spark/
Fri, 13 Nov 2015 14:00:29 +0000
Big data is going Spark crazy! Here's a whopping 6-hour, intensive, fast-paced, and vendor-agnostic look at Spark Core presented by Sameer Farooqui, a client services engineer at Databricks.

Apache Spark is the Smartphone of Big Data
https://insideainews.com/2015/11/09/apache-spark-is-the-smartphone-of-big-data/
Mon, 09 Nov 2015 14:00:23 +0000