Databricks unveiled several Lakehouse AI innovations to accelerate generative AI application development. These include vector search, an LLM-optimized Model Serving service, and MLflow with an AI Gateway and prompt tools for easy integration.
The company’s CEO believes that generative AI will empower employees to work more productively while opening up new job categories.
Unified Data Platform
While many organizations still rely on generalized AI models such as general-purpose chatbots, more are realizing they need to ground generative AI in their own data. Doing this effectively requires an all-inclusive data platform that stores, manages, and provides access to all the information needed to train and run these types of models.
Databricks’ Unified Data and AI Platform was designed to make this type of generative AI possible for everyone by eliminating the data silos that have traditionally separated analytics, machine learning, and AI. It does this by combining the structured data management of traditional enterprise warehouses with the low-cost, flexible object storage of data lakes.
Databricks’ unified platform serves as the cornerstone of its mission to accelerate innovation and equip every person and team with top data and AI technologies. Powered by the open source industry leaders Apache Spark, Delta Lake, and MLflow, it supports an array of analytics and AI use cases, including data prep, machine learning, deep learning, and generative AI.
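As a rough illustration of the data-prep side of that workflow, the sketch below uses PySpark and Delta tables as they are typically used in a Databricks notebook; the table and column names are hypothetical:

```python
# Minimal data-prep sketch on the lakehouse; assumes a Databricks notebook where
# `spark` is already provided. Table and column names below are hypothetical.
from pyspark.sql import functions as F

# Read raw events from a Delta table registered in Unity Catalog.
raw = spark.read.table("main.sales.raw_events")

# Light cleanup: drop duplicates and keep only completed orders.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("status") == "completed")
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write the curated result back as a Delta table for downstream ML and BI.
clean.write.format("delta").mode("overwrite").saveAsTable("main.sales.curated_orders")
```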
Databricks offers record-setting performance for both structured and unstructured data, enabling global enterprises to run storage, analytics, and AI at scale, including advanced generative AI techniques such as large language models (LLMs). Its automated infrastructure management and performance optimization deliver low TCO, while seamless data integration reduces the need for replication and ETL.
Additionally, it provides a curated marketplace of machine learning models suited to many generative AI use cases. Users can fine-tune these models on enterprise data while retaining ownership of them, and deployment, governance, and monitoring remain simple and efficient through integration with MLflow, Unity Catalog, and Model Serving.
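A minimal sketch of what that integration can look like in code, assuming a Databricks workspace with Unity Catalog enabled; the registered model name is hypothetical, and a small public model stands in for one fine-tuned on enterprise data:

```python
# Hedged sketch: log a Hugging Face model to MLflow and register it in Unity
# Catalog so it can be governed and later served with Model Serving.
import mlflow
from transformers import pipeline

mlflow.set_registry_uri("databricks-uc")  # use Unity Catalog as the model registry

# Stand-in for a model fine-tuned on enterprise data; a small public model
# keeps the sketch runnable.
text_gen = pipeline("text-generation", model="distilgpt2")

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=text_gen,
        artifact_path="model",
        registered_model_name="main.models.support_assistant",  # hypothetical UC name
    )
```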
Databricks has strengthened its position with new generative AI features, the acquisition of MosaicML, and upgrades to key products such as unified search and discovery, data cataloging, metadata management, and security. These updates will let organizations quickly implement generative AI for their most crucial use cases without technical barriers limiting deployment timeframes.
Open Source Foundation Models
Generative AI and large language model (LLM) technologies could transform analytics and artificial intelligence for enterprise-wide use, far beyond the technical users and developers fluent in Python and SQL that Databricks serves today. McKinsey estimates suggest these models could automate tasks that currently take up to 70% of employees’ time. According to Joel Minnick, VP of product at Databricks, “Generative AI/LLM technology provides an exceptional opportunity to bring AI closer to business users while giving everyone access to insights they require.”
To this end, the company has made significant investments in tools that make deploying and using LLMs simpler for customers in production environments, including new Vector Search capabilities, fine-tuning in AutoML, and providing open source models with optimized model hosting to ensure high performance.
The Vector Search capability offers developers an effective way to increase the accuracy of generative AI responses through embedding search. It will automatically create vector indexes for models in MLflow and Unity Catalog, keep them up to date, and integrate seamlessly with Databricks’ existing governance and operations for AI/ML on the Lakehouse platform. In addition, a curated set of open source foundation models helps users find and refine models tailored to their use cases; examples include the Falcon-7B and MPT-7B instruction-following models and Stable Diffusion for image generation.
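To make the embedding-search idea concrete, here is a conceptual sketch using sentence-transformers and NumPy as stand-ins; it illustrates the retrieval step behind vector search rather than the Databricks Vector Search API itself, and the documents and query are invented examples:

```python
# Conceptual embedding-search sketch: encode documents and a query into vectors,
# then rank documents by cosine similarity to ground an LLM response.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Reset a user's password from the admin console.",
    "Quarterly revenue grew 12% year over year.",
    "Configure SSO with the identity provider.",
]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

query = "How do I set up single sign-on?"
query_vec = encoder.encode([query], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_vecs @ query_vec
best = int(np.argmax(scores))
print(docs[best])  # the most relevant passage to feed into the prompt
```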
Users will be able to leverage these foundation models to develop their own generative AI in the Lakehouse environment, with logs automatically uploaded to MLflow and Unity Catalog for easy sharing, deployment, and monitoring across their organization. This unified platform will support every step of the AI lifecycle, from data collection and preparation through model creation, deployment, and LLMOps, while giving users complete control over their production data.
These announcements follow a report published by MIT Technology Review Insights that found CIOs want to deploy emerging generative AI technologies without putting existing data, processes, governance, or culture at risk.
Low-Code AutoML
Databricks customers can now take advantage of MosaicML’s automation and optimization features to quickly build machine learning models and move them from experimentation into production. This expands access to AI and deepens collaboration between non-technical business users, data scientists, and engineers on developing AI applications.
The platform simplifies moving models from experimentation to production by consolidating operations, governance, and monitoring in one place. Large language models can also now be used across an array of analytics and AI applications.
AutoML’s low-code approach lets both technical and non-technical data analysts create LLMs on enterprise data with confidence, maintaining ownership of their models while speeding up development, improving domain-specific performance, and reducing risk.
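As a rough sketch of that low-code flow, the snippet below uses the Databricks AutoML Python API as it is typically invoked from a notebook; the table, target column, and timeout are hypothetical, and parameter details may vary by runtime version:

```python
# Minimal Databricks AutoML sketch: point it at a table and a target column and
# let it run trials, generate notebooks, and log everything to MLflow.
from databricks import automl

train_df = spark.read.table("main.marketing.customer_features")  # hypothetical table

summary = automl.classify(
    dataset=train_df,
    target_col="churned",      # hypothetical label column
    timeout_minutes=30,
)

print(summary.best_trial.model_path)  # URI of the best model logged in MLflow
```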
Additionally, the unified platform offers foundation models that address various generative AI use cases, including the MPT-7B and Falcon-7B instruction-following models and Stable Diffusion for image generation. A selection of open source models is also available through the Databricks Marketplace and optimized for Databricks Model Serving.
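For a sense of what working with one of these open instruction-following models looks like, here is a hedged sketch using the Hugging Face transformers library; it loads the raw model rather than Databricks’ optimized serving path and assumes a GPU for practical use:

```python
# Hedged sketch: run the open MPT-7B-Instruct model locally with transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mosaicml/mpt-7b-instruct",
    trust_remote_code=True,   # MPT ships custom model code
    device_map="auto",        # place weights on available GPU(s)
)

prompt = "Summarize the benefits of a lakehouse architecture in two sentences."
print(generator(prompt, max_new_tokens=120)[0]["generated_text"])
```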
Databricks continues to invest heavily in its existing platform, including new capabilities to support large language models within Lakehouse AI such as vector search, a curated collection of open source models, Lakehouse Monitoring, and MLflow 2.5, as well as a simpler UI that streamlines creating training datasets for neural networks. These improvements will shorten development time for generative AI applications, increase the trustworthiness of production models, and reduce development costs – and they could position the company for an IPO down the road, depending on market conditions and customer demand.
Lakehouse IQ
Databricks announced at the Data + AI Summit a set of capabilities designed to make developing and deploying generative AI easier for organizations. Lakehouse IQ is an intelligent knowledge engine that understands an organization’s data and culture and lets any employee with appropriate permissions search and query data using natural language. The platform is fully integrated with Databricks Unity Catalog to ensure this democratization adheres to internal security and governance policies.
These tools will enable a range of business applications, from natural language interactions with data to increasing productivity and driving innovation. However, it should be noted that generative AI capabilities rely heavily on data for accurate results; therefore a centralized data platform must be available.
Databricks’ unified platform will help maintain clean, high-quality data – an essential requirement for AI solutions. This greatly reduces the training time and resources needed for generative models, and it lets developers work with data that has been curated by experts and verified for accuracy.
Databricks recently unveiled an enhanced search capability that enables users to locate data and metadata quickly. It also integrates with other Databricks products such as Unity Catalog, notebooks, and Model Serving, giving users a more consistent experience across Databricks tools.
The recent MosaicML acquisition will further bolster Databricks’ generative AI initiatives. MosaicML has created foundation models, such as the MPT-7B instruction-following and summarization models, that will become available through the Databricks Marketplace; these models have also been optimized with Model Serving so they perform at their best when deployed in production environments.
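As an illustration of how an application might call such a deployed model, the sketch below posts to a Databricks Model Serving endpoint over REST; the workspace URL, endpoint name, and token are placeholders, and the request payload schema depends on the model being served:

```python
# Hedged sketch: query a model hosted on Databricks Model Serving over REST.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"   # placeholder
ENDPOINT = "mpt-7b-instruct"                                       # hypothetical endpoint name
TOKEN = "<databricks-personal-access-token>"                       # placeholder

response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"inputs": ["Draft a short summary of last quarter's support tickets."]},
)
response.raise_for_status()
print(response.json())
```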
Databricks is making significant investments in its generative AI offerings, which will enable it to expand beyond serving technical user and developer communities fluent in Python and SQL. Combined with Lakehouse IQ’s knowledge engine and model integration through Unity Catalog, data analytics and AI will become accessible to more enterprise organizations than ever.
Are you interested in building an AI solution with IoT Worlds? Contact us today.