Elastic launches scalable Search AI Lake for Gen AI and vector search

Elastic is dramatically rethinking how it scales data for generative AI, observability, and security with a new Search AI Lake technology announced today.

Elastic got its start with, and is perhaps best known for, its Elasticsearch search technology, which is built on top of the open source Apache Lucene project. In 2024, Elastic's use cases extend beyond search to include observability and security. The company and its technology have also increasingly been drawn into generative AI workflows: in May 2023, Elastic launched the Elasticsearch Relevance Engine (ESRE), combining the power of vector search with traditional search to improve search outcomes.

To date, storage for an Elastic deployment has typically been coupled with compute, which can be a barrier to scalability. The new Search AI Lake removes that barrier by decoupling storage and compute, allowing the technology to scale to massive data volumes while maintaining fast query performance for both regular data types and the vectors used in gen AI. Alongside the Search AI Lake, Elastic is also launching new serverless offerings for enterprise search, observability, and security. These offerings are built on top of the Search AI Lake and provide specialized user interfaces for each use case. The Search AI Lake and the serverless offerings are currently in tech preview.

“What we now have accomplished is complete decoupling of storage, ingestion and querying,” Ash Kulkarni, CEO of Elastic told VentureBeat in an exclusive interview. “What that lets us do is create a completely flexible serverless architecture, where Amazon S3 becomes the primary storage for Elastic in this new service offering and allows us to create a native style architecture that’s effectively infinitely scalable.”       


Not all data lakes are the same

The concept of backing a database with a data lake is not a new one.

Multiple vendors including Databricks and Snowflake have built large businesses enabling organizations to benefit from data lake and data lakehouse architectures. Kulkarni argued that the approach that Elastic is taking with Search AI Lake is very different from other vendors.

Kulkarni explained that Search AI Lake brings search capabilities to the data lake, allowing for real-time exploration and querying of data, without needing predefined schemas. He also noted that his company’s technology provides rapid query performance while maintaining massive scalability thanks to the decoupled architecture with storage and caching.

Additionally, Kulkarni said that Search AI Lake has native support for dense vectors and capabilities like hybrid search, faceted search, and relevance ranking that are important for applications like gen AI and Retrieval Augmented Generation (RAG).
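The hybrid search capability Kulkarni mentions combines lexical (BM25) relevance with vector similarity in a single request. As a minimal sketch, here is what such a request body can look like in Elasticsearch 8.x; the index field names, boosts, and query vector below are illustrative assumptions, not details from the announcement.

```python
# Hedged sketch: an Elasticsearch _search body that mixes a lexical
# (BM25) leg with an approximate kNN vector leg. Field names, boost
# weights, and the toy query vector are illustrative only.

def build_hybrid_search(query_text, query_vector, k=10):
    """Return a hybrid search request body combining BM25 and kNN scores."""
    return {
        "query": {  # lexical leg: classic full-text match, down-weighted
            "match": {"title": {"query": query_text, "boost": 0.3}}
        },
        "knn": {  # vector leg: approximate k-nearest-neighbor search
            "field": "title_embedding",   # assumed dense_vector field
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 5 * k,      # candidates examined per shard
            "boost": 0.7,
        },
    }

body = build_hybrid_search("scalable vector search", [0.12, -0.03, 0.88])
```

Scores from the two legs are summed after the boosts are applied, so the boost ratio controls how much the lexical and vector signals each contribute to the final ranking.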

No Iceberg or data lake table schema is part of Search AI Lake

In recent years, nearly all of the major data lake and data lakehouse vendors have embraced one or more data lake table formats. Popular formats include Apache Iceberg, Apache Hudi and Databricks Delta Lake. Elastic's Search AI Lake doesn't use any of those formats.

“Why a lot of vendors talk about data lake table formats, is because one of their biggest problems is data exploration, as being able to find what’s in the data lake has always been an issue,” Kulkarni said.

He explained that when data is put into a data lake table, unless there is some kind of metadata and the ability to search that metadata, it's difficult to find the right data. Elastic takes a different approach and simply makes everything in the Search AI Lake searchable. Since Elasticsearch allows for ad-hoc exploration of all data through its search capabilities, it doesn't have the same need as traditional databases to rely on specific table formats or metadata to understand what data is present.

Querying of the Search AI Lake is done via the existing Elasticsearch Query Language, which allows users to search in a federated manner across Elastic clusters. While there is no need for a data lake table format, the Search AI Lake does have a format of its own: the Elastic Common Schema (ECS).
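The federated querying described above works in Elasticsearch by addressing remote clusters with a `cluster_alias:index` pattern in the search target. A minimal sketch, where the cluster aliases and index names are illustrative assumptions:

```python
# Hedged sketch: building a cross-cluster ("federated") search target
# in Elasticsearch, which uses the "cluster_alias:index" convention.
# The cluster aliases and index pattern below are illustrative only.

def federated_index_pattern(index_pattern, remote_clusters):
    """Build an index target spanning the local and remote clusters."""
    targets = [index_pattern]  # local cluster first
    targets += [f"{cluster}:{index_pattern}" for cluster in remote_clusters]
    return ",".join(targets)

pattern = federated_index_pattern("logs-*", ["eu_cluster", "us_cluster"])
# pattern == "logs-*,eu_cluster:logs-*,us_cluster:logs-*"
```

The resulting pattern can be passed as the index target of an ordinary search request, so one query fans out across every named cluster.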

“The Elastic Common Schema is the way in which we store data in our open format,” Kulkarni said.

Elastic recently contributed ECS to the Linux Foundation's Cloud Native Computing Foundation (CNCF) to become an open standard schema for observability and security in the cloud.
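To make the idea of a common schema concrete, here is a sketch of a log event shaped with well-known ECS field names such as `@timestamp`, `host.name`, and `event.category`; the field values are invented for illustration.

```python
# Hedged sketch: a log event using common Elastic Common Schema (ECS)
# field names. The values are illustrative, not from the article.
import json
from datetime import datetime, timezone

event = {
    "@timestamp": datetime(2024, 5, 15, 12, 0, tzinfo=timezone.utc).isoformat(),
    "message": "Failed password for user admin",
    "event": {
        "category": ["authentication"],  # ECS categorization field (array)
        "outcome": "failure",            # one of success/failure/unknown
    },
    "host": {"name": "web-01"},
    "source": {"ip": "203.0.113.7"},
}

doc = json.dumps(event)  # ready to index as a JSON document
```

Because every producer uses the same field names, data from different sources can be queried and correlated without per-source mapping work.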

Enabling gen AI and vector search use cases

A key focus for the Search AI Lake is enabling generative AI and vector search use cases. The offering provides native support for dense vector types.
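In Elasticsearch terms, that native support surfaces as the `dense_vector` field type in an index mapping. A minimal sketch follows; the field names, dimension count, and similarity metric are illustrative assumptions that would need to match the embedding model actually used.

```python
# Hedged sketch: an index mapping that stores text alongside an
# embedding in a dense_vector field. Names and dims are illustrative.
mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},  # full-text searchable field
            "content_embedding": {
                "type": "dense_vector",   # Elasticsearch vector field type
                "dims": 384,              # must match the embedding model
                "index": True,            # enable approximate kNN search
                "similarity": "cosine",   # distance metric for ranking
            },
        }
    }
}
```

With a mapping like this in place, documents indexed with a `content_embedding` vector become targets for the kNN search shown earlier.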

Elastic has seen strong customer adoption of its platform for generative AI applications. Kulkarni noted that over the last several quarters, Elastic has added hundreds of new customers using the company's platform for RAG use cases.

“What you’re going to be able to see now is the most scalable vector implementation out there,” Kulkarni said.
