Automatic Schema Recommendations
Hi, I've been using TileDB for over a year now, and one issue I keep running into is figuring out the best schema for a given array. Would it be possible to add a feature for recommending schemas? I imagine a function to which you pass a sample of your data and specify the dimension and attribute names and dtypes. It would then automatically try various combinations of dimension tiling, filters, capacities, and cell orders, and report how much storage each one occupies and how long it takes to read the full array and partial slices. It would also be useful if it pointed out flaws, such as irregular data density (when dealing with higher-dimensional sparse arrays) and a sub-optimal order of the dimensions. I suppose you already have something similar for internal testing. Best, Fred
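To make the request concrete, here is a rough sketch of the kind of sweep I have in mind. It does not use the TileDB API at all: the `recommend` and `tile_bytes` functions are hypothetical, the standard-library compressors are only stand-ins for TileDB filters, and it measures compression time and size only, not read time. A real implementation would create actual arrays for each candidate schema.

```python
import bz2
import itertools
import lzma
import struct
import time
import zlib

# Hypothetical stand-ins for TileDB filters: stdlib compressors only.
COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def tile_bytes(grid, tile, order):
    """Yield the packed bytes of each square tile of a 2D list-of-lists grid."""
    rows, cols = len(grid), len(grid[0])
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r_idx = range(r0, min(r0 + tile, rows))
            c_idx = range(c0, min(c0 + tile, cols))
            if order == "row-major":
                cells = [grid[r][c] for r in r_idx for c in c_idx]
            else:  # col-major cell order inside the tile
                cells = [grid[r][c] for c in c_idx for r in r_idx]
            yield struct.pack(f"<{len(cells)}d", *cells)


def recommend(grid, tiles=(4, 16, 64), orders=("row-major", "col-major")):
    """Try every tile/order/compressor combination; return results sorted by size."""
    results = []
    for tile, order, (name, compress) in itertools.product(
        tiles, orders, COMPRESSORS.items()
    ):
        start = time.perf_counter()
        size = sum(len(compress(t)) for t in tile_bytes(grid, tile, order))
        elapsed = time.perf_counter() - start
        results.append(
            {"tile": tile, "order": order, "filter": name,
             "bytes": size, "seconds": elapsed}
        )
    return sorted(results, key=lambda r: r["bytes"])


# Sample data: a 64x64 grid whose values are constant along each row.
sample = [[float(r) for _ in range(64)] for r in range(64)]
report = recommend(sample)
for row in report[:3]:
    print(row)
```

The report is sorted by stored size, so the first rows are the most compact candidates for this particular sample.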
Independent Attribute Writes
"Currently, if your array consists of more than one attributes, TileDB requires you to provide values for all the attributes in each write operation." I'd love to be able to write attributes independently of each other.
The user should be able to modify the array schema (e.g., add/remove attributes, change filters, tile capacity, etc.) and time travel over the different versions.
TileDB does not allow arrays to change size. Appending new data and removing old data along a specific dimension (e.g., time) is useful for real-time applications.
Support for deletions
The user should be able to delete any number of cells. Currently this is possible by directly inserting "tombstones", but the deletion logic then falls on the higher-level application. TileDB should be able to support deletions natively.
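For context, the application-level tombstone workaround looks roughly like the sketch below. This is plain Python, not TileDB code: `CellStore`, `TOMBSTONE`, and the timestamped "fragments" list are hypothetical names that only mimic the idea of later writes overriding earlier ones, with the reader filtering out deleted cells.

```python
TOMBSTONE = object()  # hypothetical sentinel value marking a deleted cell


class CellStore:
    """Toy sparse store: timestamped writes, deletions recorded as tombstones."""

    def __init__(self):
        self._fragments = []  # list of (timestamp, {coords: value})

    def write(self, timestamp, cells):
        self._fragments.append((timestamp, dict(cells)))

    def delete(self, timestamp, coords):
        # A deletion is just another write whose values are tombstones.
        self.write(timestamp, {c: TOMBSTONE for c in coords})

    def read(self, at=None):
        """Merge fragments in timestamp order, then hide tombstoned cells."""
        merged = {}
        for ts, cells in sorted(self._fragments, key=lambda f: f[0]):
            if at is not None and ts > at:
                break
            merged.update(cells)
        return {c: v for c, v in merged.items() if v is not TOMBSTONE}


store = CellStore()
store.write(1, {(0, 0): 1.5, (0, 1): 2.5})
store.delete(2, [(0, 0)])
print(store.read())      # latest view, with the deleted cell hidden
print(store.read(at=1))  # view as of timestamp 1, before the deletion
```

Every reader has to know about the sentinel and apply the filter itself, which is exactly the logic this request asks TileDB to take over.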
Better Streaming Support
TileDB is already very good for storing most information, and I like that the goal is for it to become a 'Universal Storage Engine', but it currently neglects streaming-data applications, such as saving data every second or minute.
Support axes labels
TileDB should support attaching axes labels (dataframes in their full generality), so that the user can slice the array based on arbitrary axes label predicates.
TileDB currently performs only slicing. It should allow other computations, such as filters, group-by queries, and joins. This would help high-level applications push compute closer to storage.
LERC compression filter
It would be great if TileDB implemented LERC (Limited Error Raster Compression) [ https://github.com/Esri/lerc ] as a possible compressor for dense arrays.
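As a rough illustration of why a limited-error compressor is attractive, the sketch below quantizes values so that each one is reconstructed within a chosen maximum error, then compresses the resulting integers. This is not the LERC algorithm or its API: `encode` and `decode` are hypothetical helpers, zlib is a stand-in for the entropy coding, and the sketch assumes `max_error > 0` and a value range small enough for 32-bit unsigned integers.

```python
import math
import struct
import zlib


def encode(values, max_error):
    """Quantize to a grid of width 2 * max_error, then compress the integers."""
    step = 2.0 * max_error
    base = min(values)
    quantized = [int(round((v - base) / step)) for v in values]
    payload = struct.pack(f"<{len(quantized)}I", *quantized)
    return base, step, len(values), zlib.compress(payload)


def decode(base, step, count, blob):
    """Reverse the quantization; each value is within step / 2 of the original."""
    quantized = struct.unpack(f"<{count}I", zlib.decompress(blob))
    return [base + q * step for q in quantized]


values = [math.sin(i / 10) * 100 for i in range(1000)]
max_error = 0.01
base, step, count, blob = encode(values, max_error)
restored = decode(base, step, count, blob)

worst = max(abs(a - b) for a, b in zip(values, restored))
print(f"compressed bytes: {len(blob)}, raw float64 bytes: {8 * len(values)}")
print(f"worst reconstruction error: {worst:.6f}")
```

The reconstruction error stays within `max_error` (up to floating-point rounding), and the user trades that bounded loss for a smaller stored size.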