Independent Attribute Writes
"Currently, if your array consists of more than one attributes, TileDB requires you to provide values for all the attributes in each write operation." I'd love to be able to write attributes independently of each other.
Potentially integrate with Kerchunk to support ingesting mixed-shape arrays into a single xarray dataset, as discussed here: https://forum.tiledb.com/t/dataset-with-mixed-shapes/485/4 Preferably with support for lazy loading.
It would be cool to be able to combine multiple TileDB arrays into a single logical view. A simple and fast way would be to require the same schema and just combine the individual fragments so that the latest write wins. A more useful option would be to also allow concatenation of arrays along some dimension. In Dask, this could be exported as a single array with chunks aligned to the dataset borders. Even better, one could include filtering options or joining by keys. See also the discussion in https://github.com/TileDB-Inc/TileDB/issues/1475 .
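The two combination modes described above can be sketched in a few lines. This is only an illustration on in-memory data, not TileDB or Dask code: `merge_latest_wins` and `concat_with_borders` are hypothetical helper names, and a "fragment" is modeled here as a timestamp plus a dictionary of cell coordinates to values.

```python
import numpy as np

def merge_latest_wins(fragments):
    """Merge (timestamp, {coords: value}) fragments; later timestamps overwrite earlier ones."""
    merged = {}
    for _, cells in sorted(fragments, key=lambda f: f[0]):
        merged.update(cells)
    return merged

def concat_with_borders(arrays, axis=0):
    """Concatenate arrays along `axis` and report chunk sizes matching the original array borders."""
    chunks = tuple(a.shape[axis] for a in arrays)
    return np.concatenate(arrays, axis=axis), chunks

# Two hypothetical fragments with the same schema; cell (0, 1) is written twice.
frag_a = (1, {(0, 0): 1.0, (0, 1): 2.0})
frag_b = (2, {(0, 1): 9.0, (1, 1): 3.0})
merged = merge_latest_wins([frag_b, frag_a])

# Two hypothetical arrays concatenated along the first dimension.
combined, chunks = concat_with_borders([np.zeros((2, 3)), np.ones((4, 3))], axis=0)
print(merged, combined.shape, chunks)
```

The `chunks` tuple is the piece a Dask export would need in order to keep chunk boundaries aligned with the source arrays.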
Add slicing with strides
TileDB should support slicing with strides, which is quite common in NumPy and similar tools.
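For reference, this is the NumPy behavior the request refers to. The snippet is plain NumPy on an in-memory array that stands in for data stored in TileDB; it does not use any TileDB call.

```python
import numpy as np

# In-memory stand-in for an array stored in TileDB (assumption for illustration).
data = np.arange(20).reshape(4, 5)

# Every second row, and every second column starting at column 1.
strided = data[::2, 1::2]
print(strided)
```

With a store that only supports contiguous slicing, the same result can be obtained by reading the enclosing range and applying the stride in memory, at the cost of reading cells that are then discarded.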
Support axis labels
TileDB should support attaching axis labels (dataframes in their full generality), so that the user can slice the array based on arbitrary axis label predicates.
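A minimal sketch of what slicing by label predicates could mean, assuming one small label table per axis. Everything here is hypothetical and in-memory (the `select` helper, the label names, the data); it only shows how predicates on labels translate into integer positions.

```python
import numpy as np

# Hypothetical data: 4 sensors x 6 time steps.
values = np.arange(24).reshape(4, 6)

# Hypothetical label tables, one per axis: column name -> one entry per index.
row_labels = {
    "sensor": np.array(["a", "b", "c", "d"]),
    "site": np.array(["north", "south", "north", "south"]),
}
col_labels = {"hour": np.array([0, 4, 8, 12, 16, 20])}

def select(values, row_mask, col_mask):
    """Turn boolean label predicates into integer positions and slice the array."""
    rows = np.flatnonzero(row_mask)
    cols = np.flatnonzero(col_mask)
    return values[np.ix_(rows, cols)]

# Rows at the "north" site, columns with 8 <= hour < 20.
subset = select(
    values,
    row_labels["site"] == "north",
    (col_labels["hour"] >= 8) & (col_labels["hour"] < 20),
)
print(subset)
```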
TileDB currently performs only slicing. It should allow other computations, such as filters, group-by queries, and joins. This will help high-level applications push compute closer to storage.
TileDB does not allow arrays to change size. Appending new data and removing old data along a specific dimension (e.g., time) is useful for real-time applications.
Automatic Schema Recommendations
Hi, I've been using TileDB for over a year now, and one issue I keep running into is figuring out the best schema for a given array. Would it be possible to add a feature for recommending schemas? I imagine a function to which you pass a sample of your data and specify the dimension and attribute names and dtypes; it would then automatically run various combinations of dimension tiling, filters, capacities, and cell orders, and return how much storage each one occupied and how long it took to load all of the data and parts of it. It would also be useful if it pointed out flaws, such as irregular data density (when dealing with higher-dimensional sparse arrays) and a sub-optimal order of the dimensions. I suppose you already have something similar for internal testing. Best, Fred
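To make the idea concrete, here is a rough sketch of the kind of search loop described above. It is not a TileDB API: `recommend` and `tiles` are hypothetical names, the compressors come from the Python standard library as stand-ins for TileDB filters, and the timing covers compressing the tiles rather than loading them. A real version would write actual arrays and time full and partial reads.

```python
import itertools
import lzma
import time
import zlib

import numpy as np

def tiles(sample, extent):
    """Yield square tiles of side `extent` from a 2-D array, in row-major tile order."""
    for r in range(0, sample.shape[0], extent):
        for c in range(0, sample.shape[1], extent):
            yield np.ascontiguousarray(sample[r:r + extent, c:c + extent])

def recommend(sample, extents, compressors):
    """Try every (tile extent, compressor) pair and sort by compressed size, smallest first."""
    results = []
    for extent, (name, compress) in itertools.product(extents, compressors.items()):
        start = time.perf_counter()
        size = sum(len(compress(t.tobytes())) for t in tiles(sample, extent))
        elapsed = time.perf_counter() - start
        results.append(
            {"tile_extent": extent, "filter": name, "bytes": size, "seconds": elapsed}
        )
    return sorted(results, key=lambda row: row["bytes"])

# Hypothetical data sample: small integers stored as int32.
rng = np.random.default_rng(0)
sample = rng.integers(0, 16, size=(256, 256)).astype(np.int32)

report = recommend(
    sample,
    extents=[16, 64, 256],
    compressors={"zlib": zlib.compress, "lzma": lzma.compress},
)
for row in report:
    print(row)
```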