Kinetica’s Memory-First Approach to Tiered Storage: Maximizing Speed and Efficiency

A key challenge for any database, whether distributed or not, is the constant movement of data between a hard disk and system memory (RAM). This data transfer is often the source of significant performance overhead, as the speed difference between these two types of storage can be dramatic.

In an ideal scenario, all operational data would reside in memory, eliminating the need to read from or write to slower hard disks. Unfortunately, system memory (RAM) is substantially more expensive than disk storage, making it impractical to store all data in-memory, especially for large datasets.

So, how do we get the best performance out of this limited resource? The answer is simple: use a database that optimizes its memory use intelligently.


Prioritizing Hot and Warm Data

Kinetica takes a memory-first approach, using a tiered storage strategy that prioritizes high-speed VRAM (the memory co-located with GPUs) and RAM for the most frequently accessed data, often referred to as “hot” and “warm” data. This significantly reduces how often operations must pull data from slower disk storage, improving overall system performance.

The core of Kinetica’s tiered storage strategy lies in its flexible resource management. Users can define tier strategies that determine how data is prioritized across different storage layers. The database assigns eviction priorities for each data object, and when space utilization in a particular tier crosses a designated threshold (the high water mark), Kinetica initiates an eviction process. Less critical data is pushed to lower, disk-based tiers until space utilization falls to an acceptable level (the low water mark).
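To make the water-mark mechanics concrete, here is a minimal Python sketch of the general technique. The tier names, capacities, thresholds, and priority values are illustrative assumptions, not Kinetica’s actual implementation:

```python
# Illustrative sketch of watermark-driven tier eviction (not Kinetica's code).
# Objects carry an eviction priority; when a tier exceeds its high water mark,
# the lowest-priority objects are demoted until usage falls to the low mark.

from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    size_gb: float
    eviction_priority: int  # lower value = evicted first

class Tier:
    def __init__(self, name, capacity_gb, high_mark=0.9, low_mark=0.7, lower_tier=None):
        self.name = name
        self.capacity_gb = capacity_gb
        self.high_mark = high_mark    # fraction of capacity that triggers eviction
        self.low_mark = low_mark      # eviction stops once usage drops below this
        self.lower_tier = lower_tier  # e.g. RAM -> disk
        self.objects = []

    def used_gb(self):
        return sum(o.size_gb for o in self.objects)

    def add(self, obj):
        self.objects.append(obj)
        if self.used_gb() > self.high_mark * self.capacity_gb:
            self.evict()

    def evict(self):
        # Demote the lowest-priority objects until usage falls to the low mark.
        if self.lower_tier is None:
            return  # bottom tier: nowhere lower to demote
        self.objects.sort(key=lambda o: o.eviction_priority)
        while self.objects and self.used_gb() > self.low_mark * self.capacity_gb:
            self.lower_tier.add(self.objects.pop(0))

disk = Tier("disk", capacity_gb=10_000)
ram = Tier("ram", capacity_gb=100, lower_tier=disk)
ram.add(DataObject("cold_table", 60, eviction_priority=1))
ram.add(DataObject("hot_table", 50, eviction_priority=9))  # pushes cold_table to disk
print([o.name for o in ram.objects], [o.name for o in disk.objects])
```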

This movement of data—where less frequently used information is shifted to disk while keeping the most critical data in high-speed memory—ensures that Kinetica can always allocate sufficient RAM and VRAM for current high-priority workloads. By minimizing data retrieval from slower hard disks, Kinetica can sidestep the performance bottlenecks that typically plague database systems.

The Speed Advantage: Memory vs. Disk

The performance gains achieved by this approach are clear when you consider the speed differential between memory and disk storage. To put things into perspective, reading 64 bits from a hard disk can take up to 10 million nanoseconds (0.01 seconds). In contrast, fetching the same data from RAM takes about 100 nanoseconds—making RAM access 100,000 times faster.

Reads from RAM take about 100 nanoseconds

The reason for this stark difference lies in the nature of how data is accessed on these storage devices. Hard disks use mechanical parts to read data, relying on a moving head that accesses data in blocks of 4 kilobytes each. This block-based, mechanical retrieval method is inherently slower and incurs a delay every time data needs to be accessed. On the other hand, system memory allows direct access to each byte, without relying on mechanical parts, enabling consistent and rapid read/write operations.

Reads from a hard disk can take up to 0.01 seconds

Kinetica’s memory-first strategy leverages this inherent speed advantage by prioritizing RAM and VRAM for all “hot” data. This not only reduces the reliance on slower storage but also ensures that analytical operations can be performed without the bottleneck of data loading from disk.

Managing Costs Without Sacrificing Performance

While memory offers significant speed advantages, the flip side is its cost. Storing all data in RAM is prohibitively expensive for most organizations, especially when dealing with terabytes or petabytes of data. Kinetica balances this cost with two key approaches:

1. Intelligent Tiering: As mentioned earlier, Kinetica’s tiered storage automatically manages what data should stay in high-speed memory and what should be moved to lower-cost disk storage. This ensures that only the most crucial data occupies valuable memory resources at any given time.

2. Columnar Data Storage: Kinetica also employs a columnar data storage approach, which enhances compression and enables more efficient memory usage. By storing data in columns rather than rows, Kinetica can better optimize its memory footprint, allowing more data to be held in RAM without exceeding cost limits.
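To see why a columnar layout compresses better, note that values within a single column are homogeneous and often repetitive, so a general-purpose compressor finds far more redundancy than it does in row-interleaved data. A small, self-contained illustration using synthetic data (not Kinetica internals):

```python
# Compare compressed size of the same records stored row-wise vs column-wise.
import zlib, json, random

random.seed(0)
rows = [{"city": random.choice(["NYC", "LA", "SF"]),
         "status": random.choice(["ok", "error"]),
         "value": random.randint(0, 9)} for _ in range(10_000)]

# Row-oriented: records interleave heterogeneous fields.
row_bytes = json.dumps(rows).encode()

# Column-oriented: each column is a contiguous run of similar values.
cols = {k: [r[k] for r in rows] for k in rows[0]}
col_bytes = json.dumps(cols).encode()

print("row layout   :", len(zlib.compress(row_bytes)), "bytes compressed")
print("column layout:", len(zlib.compress(col_bytes)), "bytes compressed")
```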

Conclusion

By adopting a memory-first, tiered storage approach, Kinetica effectively addresses the inherent challenges of traditional database systems. Its ability to prioritize RAM and VRAM for the most frequently accessed data—while intelligently managing less critical data on lower-cost disk storage—allows for faster analytics without the performance penalty of constant disk access.

This approach ensures that Kinetica remains an efficient, high-performance database solution, capable of handling complex analytics at speed, without requiring prohibitively large memory investments. In essence, Kinetica provides the best of both worlds: the blazing speed of memory with the cost efficiency of tiered storage management.

The GPU strikes back: Why the GPU will be the premier compute device for analytics, not just AI

Since their inception, databases have been designed around the notion that compute is a scarce resource. They organize data in sophisticated data structures, on disk and in memory, so that when a query comes in, it consumes as few compute resources as possible.

The advent of the GPU as a general compute device marked the beginning of the end for application architectures centered on compute scarcity. So in 2010, Kinetica started rethinking database design around the idea that compute is an abundant resource. When you have the luxury of doing a thousand operations at once, you can focus on fully leveraging that compute capacity: feeding it as fast as possible, rethinking algorithms to use all the available power, and simplifying data structures so they can grow continuously as data flows in.

This simple idea proved to be incredibly powerful, and the GPU analytics era was born. The promise has always been dramatic: sharp reductions in data engineering and hardware footprint, paired with near-unlimited analytical flexibility and performance. This vision, however, has not been without key challenges.

Bandwidth and Memory

Over the past ten years, developers working to pair data analytics with GPU processing have had to wrestle with one central question: how do we get data to the GPU fast enough, and how do we keep it there? Remember, the GPU is still an auxiliary processor, so getting data into its native memory requires a transfer over a bus, and that bus has traditionally been slow and narrow by the standards of big data analytics.

The data-to-compute ratio for AI is skewed heavily toward computation: each data element receives a large amount of processing. Analytics workloads have much lower compute intensity but much higher I/O intensity. So once you’ve transferred the data into GPU memory, the GPU will blast through whatever analytic operation you’ve set up for it. The tricky part is getting the right data there, or having it already cached there, fast enough. When you do, you get paradigm-shifting performance; when you don’t, you might get a shoulder shrug’s worth of improvement, or worse.

Now back to that bus: for many years, most GPUs were hamstrung by the glacial pace of evolution of the PCI Express (PCIe) system bus. The PCIe 3.0 era seemed to last forever, but in the last few years the dam has finally broken. Enterprise hardware vendors are adopting the newer PCIe standards at a much faster pace, bringing dramatically higher throughput and transfer rates to the GPU.

[Chart: PCIe throughput by generation. Courtesy PCI-SIG.]
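As a back-of-the-envelope illustration of why the bus generation matters, the sketch below estimates transfer times over a 16-lane link using commonly cited nominal throughputs of roughly 16, 32, and 64 GB/s for PCIe 3.0, 4.0, and 5.0; real-world rates are lower due to protocol overhead:

```python
# Rough transfer-time estimate for moving a working set to GPU memory
# over a 16-lane PCIe link. Bandwidth figures are nominal per-generation
# maximums; actual throughput is lower.

PCIE_X16_GB_S = {"PCIe 3.0": 16, "PCIe 4.0": 32, "PCIe 5.0": 64}

working_set_gb = 188  # e.g. filling the 188 GB of an H100 NVL pair

for gen, bw in PCIE_X16_GB_S.items():
    print(f"{gen}: {working_set_gb / bw:.1f} s to transfer {working_set_gb} GB")
```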

This hardware trend on its own would be a tremendous harbinger of good times for GPU analytics. But pair it with another, more under-the-radar hardware trend, the explosion in VRAM capacity, and it signals an upcoming golden era. Take, for instance, Nvidia’s H100 NVL dual-GPU cards, which are geared toward staging large language models (LLMs): they carry 188 GB of total VRAM. The two main analytic bottlenecks for the GPU, getting data to it and storing data near it, are going away.

A clear path ahead

For analytics, the GPU is going to start asserting its dominance the way it already has for AI training. AI training is not especially data-heavy, but it is very compute-heavy. With analytics, the GPU now has all the components it needs to marry the data with the compute.

The Nvidia A100 and the H100 are the beginnings of this trend, and over the next ten years it will become more pronounced. The buses are getting faster, and VRAM capacity is growing to become on par with system DRAM. The GPU today is akin to what the Pentium CPU was in the ’90s: the premier compute device for the next fifteen years. You’re either able to leverage that, or you’re not.

Getting answers from your data now: Natural language-to-SQL

Generative AI has raised people’s expectations about how soon they can get answers to the novel questions they pose.  What amazes them first is how fast GenAI responds to them.  When people see this, they wonder why they can’t have a similar experience when asking questions about their enterprise data.  They don’t want to have to document their requirements, and then fight for resources to eventually be prioritized, only to find themselves waiting for their database teams to engineer the environment to answer yesterday’s questions.  They want to ask questions and get answers.

Ad hoc answers 

Most databases aren’t good at answering questions that no one anticipated beforehand. Consider the traditional methods folks have used for decades to optimize database processes and accelerate performance: indexing, denormalization, pre-aggregation, materialized views, partitioning, summarizing. All of it is about overcoming the performance limitations of traditional analytic databases.

Pre-engineering data essentially trades agility for performance. Whenever you pre-engineer data to improve performance for the queries you know and expect, you create technical debt that makes it harder to ask new and untried questions.

To deliver on the promise of supporting ad hoc queries, the database can’t rely on pre-engineering or pre-emptive indexing to ensure high performance. You need a kind of backstop. Kinetica provides this backstop in the form of a compute architecture that can scan all the data quickly. Kinetica was designed from the ground up to answer unknown, complex, ad hoc questions fast.

We brought this engine to the table before the generative AI discussion even began.

War room

Kinetica’s origins date back to 2009 at the US Dept. of Defense, in support of its mission to track the behavior and tactical movement patterns of suspected terrorists. The DoD had tried all the leading distributed analytic databases, including Teradata, Hadoop, and others, but their environments were too rigid to accommodate the dynamic nature of the questions that analysts were asking.

Our initial objective was to make it feasible to do a very complex set of filters and aggregates across a wide variety of different data sets.  These massive data sets included cell phone records, satellite imagery, email communications, and more.  We needed to apply variable criteria for specified time ranges, while merging continuously growing data sets and joining them by given geospatial zones.  And we were working against real-time data.

The queries themselves had very high cardinality, meaning they worked with high numbers of unique values at all times.  They did not lend themselves well to traditional performance improvements like indexing.  DoD analysts needed a way to quickly find the needle in the haystack.  They needed to slice-and-dice, fuse and filter their data sets, in an unlimited number of dimensions, without knowing in advance what the access patterns would be.  Beyond traditional OLAP type questions, analysts had questions involving movement over time and space.  They didn’t know what they were looking for; they were working off of clues, working against the clock to better understand their enemy.  If their answers didn’t come quickly, the situation would change and the opportunity would be gone. 

We looked at this problem from an entirely different perspective.  Traditional data engineering techniques are designed to avoid scanning all the data, because looking at every record and performing complex joins have historically been expensive and slow tasks.  What if scanning all the data wasn’t expensive and slow?  Thinking along these lines, we built an engine that could scan data more quickly and efficiently than other databases.

Our solution for the DoD was vectorized processing. We leveraged the power of GPUs, cutting the time spent scanning all the data in memory to far less than a traditional database needs just to run an index or a pipeline. It’s vectorized processing that gives us this backstop: the guarantee that every query will return results quickly, even without engineering the underlying data for performance.
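The essence of the approach is that a brute-force scan becomes cheap when every element is processed in parallel. Here is a tiny CPU-side analogy using NumPy’s vectorized filtering versus a row-at-a-time loop; it is illustrative only, since Kinetica performs this kind of scan on GPUs:

```python
# Row-at-a-time scan vs a vectorized scan over the same data.
import time
import numpy as np

values = np.random.rand(10_000_000)

start = time.perf_counter()
count_loop = sum(1 for v in values if v > 0.99)   # one value at a time
loop_s = time.perf_counter() - start

start = time.perf_counter()
count_vec = int(np.count_nonzero(values > 0.99))  # whole column at once
vec_s = time.perf_counter() - start

assert count_loop == count_vec
print(f"loop: {loop_s:.2f} s  vectorized: {vec_s:.3f} s")
```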

Just add language

Speed and agility are at the core of Kinetica, making it highly complementary to LLMs tasked with converting natural language to SQL (NL2SQL). Last May, with the launch of SQL-GPT, Kinetica became the first analytic database to offer NL2SQL. Each time you give SQL-GPT a question, the LLM translates that question into a SQL query that can then be run within Kinetica.

Today, most of what we’re seeing around using LLMs to convert language into SQL is centered on the application layer. These solutions are decoupled from the database that will ultimately execute the query. We believe that approach is flawed.

Kinetica’s approach includes our own native LLM that’s already aware of Kinetica syntax and analytic functions.  So when you use Kinetica’s LLM, you’re leveraging semantics that already reflect a deep understanding of the nuances of both SQL and Kinetica’s syntax, which includes vector search, time-series, graph, and geospatial analytics.  

To facilitate developers building applications, we added constructs within Kinetica’s database engine that allow users to interact more natively with an LLM. Through our SQL API users can quickly and easily define context objects, provide few-shot training samples and execute LLM output directly.
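In spirit, the flow looks like the sketch below: assemble a context (schema DDL plus few-shot question/SQL pairs), hand the user’s question to an LLM, and execute the returned SQL. The function and object names here are hypothetical stand-ins, not Kinetica’s actual API; consult Kinetica’s SQL API documentation for its real context-object syntax.

```python
# Generic NL2SQL flow (hypothetical names; not Kinetica's actual API).

CONTEXT = {
    "schema": "CREATE TABLE trips (pickup TIMESTAMP, fare DECIMAL, zone VARCHAR);",
    "samples": [  # few-shot examples steer the LLM toward the right dialect/tables
        ("What was the average fare last week?",
         "SELECT AVG(fare) FROM trips WHERE pickup >= NOW() - INTERVAL '7' DAY;"),
    ],
}

def build_prompt(question: str) -> str:
    shots = "\n".join(f"Q: {q}\nSQL: {s}" for q, s in CONTEXT["samples"])
    return f"{CONTEXT['schema']}\n{shots}\nQ: {question}\nSQL:"

def nl2sql(question: str, llm) -> str:
    return llm.complete(build_prompt(question))  # llm is any text-completion client

def answer(question: str, llm, db) -> list:
    sql = nl2sql(question, llm)
    return db.execute(sql)  # db is any SQL connection (e.g. a DB-API cursor)

class EchoLLM:  # trivial stand-in client so the sketch runs end to end
    def complete(self, prompt: str) -> str:
        return "SELECT AVG(fare) FROM trips;"

print(nl2sql("What was the average fare?", EchoLLM()))
```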

Typically, the LLMs you see used in AI today are tuned for creativity. One way they achieve this is through stochastic sampling, which amounts to making educated guesses: the text that appears to be produced spontaneously, never saying quite the same thing twice, is a side effect of making different guesses with each iteration.

That’s nice enough when you’re writing an essay. When you’re generating SQL, you don’t want spontaneity; you require unwavering consistency. You don’t want variations in your SQL; you want the generated code to be reliable and functional every time.

When Kinetica assembles a sequence of SQL commands for the output, rather than always choosing the single highest-probability word as the next word in the sequence, it explores multiple potential paths before picking the best one. Language models that generate natural-language text tend to avoid these methods, because their output ends up repeating itself. But when you’re creating SQL, you want that consistency.
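This contrast is essentially greedy decoding versus a beam search. Below is a toy illustration with a made-up next-token probability model (the real system scores SQL token sequences, not these stand-in words):

```python
# Toy contrast: greedy decoding picks the locally best token and can miss
# the globally best sequence; beam search keeps several candidate paths.
import math

# Stand-in next-token model: P(token | sequence so far). Hypothetical numbers.
MODEL = {
    (): {"SELECT": 0.6, "WITH": 0.4},
    ("SELECT",): {"*": 0.5, "name": 0.5},
    ("WITH",): {"cte": 1.0},
    ("SELECT", "*"): {"<end>": 1.0},
    ("SELECT", "name"): {"<end>": 1.0},
    ("WITH", "cte"): {"<end>": 1.0},
}

def greedy():
    seq, logp = (), 0.0
    while True:
        tok, p = max(MODEL[seq].items(), key=lambda kv: kv[1])
        logp += math.log(p)
        if tok == "<end>":
            return seq, logp
        seq += (tok,)

def beam_search(width=2):
    beams, finished = [((), 0.0)], []
    while beams:
        candidates = []
        for seq, logp in beams:
            for tok, p in MODEL[seq].items():
                if tok == "<end>":
                    finished.append((seq, logp + math.log(p)))
                else:
                    candidates.append((seq + (tok,), logp + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:width]
    return max(finished, key=lambda b: b[1])

print("greedy:", greedy())       # ('SELECT', '*'), locally best path, prob 0.30
print("beam:  ", beam_search())  # ('WITH', 'cte'), globally best path, prob 0.40
```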

Peanut butter and jelly

If NL2SQL is the peanut butter, the underlying database is the jelly. When an organization opens the gates and allows users to ask any question of their data, the SQL that gets produced will be complicated: queries with complex joins, correlated subqueries, and calls to sophisticated analytic functions such as graph algorithms. It won’t be easily controlled or managed. Traditional methods of optimization and planning will no longer work. You need a data engine with the horsepower to take on any query and return up-to-date results with a quick response time.

Now you can finally have a real conversation with your data.
