Intel’s Q1 2025 earnings press release talked up their new AI-enabled chips. But these are not selling. [Intel] In the earnings call, CFO Dave Zinsner mentioned they had “capacity constraints in In…
Gotta be real, LLMs for queries makes me uneasy. We’re already in a place where data modeling isn’t as common and people don’t put indexes or relationships between tables (and some tools didn’t really support those either), they might be alright at describing tables (Databricks has it baked in for better or worse for example, it’s usually pretty good at a quick summary of what a table is for), throwing an LLM on that doesn’t really inspire confidence.
If your data model is highly normalised, with fks everywhere, good naming and well documented, yeah totally I could see that helping, but if that’s the case you already have good governance practices (which all ML tools benefit from AFAIK). Without that, I’m totally dreading the queries, people already are totally capable of generating stuff that gives DBAs a headache, simple cases yeah maybe, but complex queries idk I’m not sold.
Data understanding is part of the job anyhow, that’s largely conceptual which maybe LLMs could work as an extension for, but I really wouldn’t trust it to generate full on queries in most of the environments I’ve seen, data is overwhelmingly super messy and orgs don’t love putting effort towards governance.
Personally, I’ve used llm tools more for interactive documentation, occasionally some base snippets for something I’m not familiar with, but found it doesn’t do great for more complex stuff.
And yeah, you’re right, people who know what they’re doing they’ll verify it, it’s a concern I think when people just blindly run it without verifying, which 100% happens.
I’ve done some work on natural language to SQL, both with older (like Bert) and current LLMs. It can do alright if there is a good schema and reasonable column names, but otherwise it can break down pretty quickly.
Thats before you get into the fact that SQL dialects are a really big issue for LLMs to begin with. They all looks so similar I’ve found it common for them to switch between them without warning.
Yeah I can totally understand that, Genie is databricks’ one and apparently it’s surprisingly decent at that, but it has access to a governance platform that traces column lineage on top of whatever descriptions and other metadata you give it, was pretty surprised with the accuracy in some of its auto generated descriptions though.
Yeah, the more data you have around the database the better, but that’s always been the issue with data governance - you need to stay on top of that or things start to degrade quickly.
When the governance is good, the LLM may be able to keep up, but will you know when things start to slip?
Gotta be real, LLMs for queries makes me uneasy. We’re already in a place where data modeling isn’t as common and people don’t put indexes or relationships between tables (and some tools didn’t really support those either), they might be alright at describing tables (Databricks has it baked in for better or worse for example, it’s usually pretty good at a quick summary of what a table is for), throwing an LLM on that doesn’t really inspire confidence.
If your data model is highly normalised, with fks everywhere, good naming and well documented, yeah totally I could see that helping, but if that’s the case you already have good governance practices (which all ML tools benefit from AFAIK). Without that, I’m totally dreading the queries, people already are totally capable of generating stuff that gives DBAs a headache, simple cases yeah maybe, but complex queries idk I’m not sold.
Data understanding is part of the job anyhow, that’s largely conceptual which maybe LLMs could work as an extension for, but I really wouldn’t trust it to generate full on queries in most of the environments I’ve seen, data is overwhelmingly super messy and orgs don’t love putting effort towards governance.
But that’s why I’m there.
I ask it for a query, then I go over it to confirm what it does and test it.
If you have no idea what you’re doing, you’re fucked with or without gpt. But if you do know what you’re doing, GPT will save you time.
I don’t understand how this is so controversial…
Personally, I’ve used llm tools more for interactive documentation, occasionally some base snippets for something I’m not familiar with, but found it doesn’t do great for more complex stuff.
And yeah, you’re right, people who know what they’re doing they’ll verify it, it’s a concern I think when people just blindly run it without verifying, which 100% happens.
I’ve done some work on natural language to SQL, both with older (like Bert) and current LLMs. It can do alright if there is a good schema and reasonable column names, but otherwise it can break down pretty quickly.
Thats before you get into the fact that SQL dialects are a really big issue for LLMs to begin with. They all looks so similar I’ve found it common for them to switch between them without warning.
Yeah I can totally understand that, Genie is databricks’ one and apparently it’s surprisingly decent at that, but it has access to a governance platform that traces column lineage on top of whatever descriptions and other metadata you give it, was pretty surprised with the accuracy in some of its auto generated descriptions though.
Yeah, the more data you have around the database the better, but that’s always been the issue with data governance - you need to stay on top of that or things start to degrade quickly.
When the governance is good, the LLM may be able to keep up, but will you know when things start to slip?
what in the utter fuck is this post