I want to create, sort, filter, query, update, etc. hierarchical data like JSON or XML or YAML with the same ease as a spreadsheet. Does such a thing exist?
jq might work.
One of the best organising tools I ever found. And still under active development.
I just checked this out. It’s not quite what I’m looking for right now but it does answer my question as asked. I can see it coming in handy later.
That’s generally what relational databases are for. You might try LibreOffice Base or sqlite.
As for JSON, XML, and YAML files in particular, there are tools for doing one-off queries/transformations against them, like jq and yq.
That’s generally what relational databases are for.
But they need rectangular structure. How do they work on tree structures, like OP has asked?
one-off queries/transformations
Again, that wasn’t the question.
The question reads like an XY problem, they describe DB functions for data structures so unless there’s some specific reason they can’t use a DB that’s the right answer. A “spreadsheet for data structures” describes a relational database.
But they need rectangular structure. How do they work on tree structures, like OP has asked?
Relationships. You don’t dump all your data in a single table. Take for instance the following sample JSON:
JSON
"users": [ { "id": 1, "name": "Alice", "email": "[email protected]", "favorites": { "games": [ { "title": "The Witcher 3", "platforms": [ { "name": "PC", "release_year": 2015, "rating": 9.8 }, { "name": "PS4", "release_year": 2015, "rating": 9.5 } ], "genres": ["RPG", "Action"] }, { "title": "Minecraft", "platforms": [ { "name": "PC", "release_year": 2011, "rating": 9.2 }, { "name": "Xbox One", "release_year": 2014, "rating": 9.0 } ], "genres": ["Sandbox", "Survival"] } ] } }, { "id": 2, "name": "Bob", "email": "[email protected]", "favorites": { "games": [ { "title": "Fortnite", "platforms": [ { "name": "PC", "release_year": 2017, "rating": 8.6 }, { "name": "PS5", "release_year": 2020, "rating": 8.5 } ], "genres": ["Battle Royale", "Action"] }, { "title": "Rocket League", "platforms": [ { "name": "PC", "release_year": 2015, "rating": 8.8 }, { "name": "Switch", "release_year": 2017, "rating": 8.9 } ], "genres": ["Sports", "Action"] } ] } } ] }
You’d structure that in SQL tables something like this:
Tables
dbo.users
user_id name email 1 Alice [email protected] 2 Bob [email protected] dbo.games
game_id title genre 1 The Witcher 3 RPG 2 Minecraft Sandbox 3 Fortnite Battle Royale 4 Rocket League Sports dbo.favorites
user_id game_id 1 1 1 2 2 3 2 4 dbo.platforms
platform_id game_id name release_year rating 1 1 PC 2015 9.8 2 1 PS4 2015 9.5 3 2 PC 2011 9.2 4 2 Xbox One 2014 9.0 5 3 PC 2017 8.6 6 3 PS5 2020 8.5 7 4 PC 2015 8.8 8 4 Switch 2017 8.9 The dbo.favorites table handles the many-to-many relationship between users and games; users can have as many favourite games as they want, and multiple users can have the same favourite game. The dbo.platforms handles one-to-many relationships; each record in this table represents a single release, but each game can have multiple releases on different platforms.
So the real question was, which tool to use in order to transform the JSON’s tree into these tables & relations?
(hopefully you didn’t just write this all up manually! :-))
I’ll second the SQL database here. Especially since most people who use a spreadsheet actually treat it as a database in the first place, and not as a way to lay out data in a 2D table.
But if a hierarchical table is really what’s desired, any visual database interface should do the trick.
TreeSheets?
Closest thing I know is Grist (https://www.getgrist.com/), because columns can contain lists of values or lists of references.
Probably not exactly what you’re looking for, but it served both my wife and I very well.
There’s a whole range of cli tools to extract and query structured data like that, but you might consider loading it into something like sqlite3 and treating it as a database because those formats are really not intended for queries, they’re designed for sharing data.
Visidata, maybe.
Visual Studio Code with a jq plugin maybe
Ms Access?