Store high-order tensors as 1d vectors #47

@thunterdb

Because Catalyst is optimized for 0d and 1d tensors, all tensors should be stored this way. Users can still pass arrays of arrays as inputs, but the outputs should be optimized for 1d arrays. This should be the recommended output for anything above 3d tensors.

This can be done only with a more flexible interpretation of the metadata.
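
To illustrate what the metadata would need to carry, here is a minimal sketch in plain Python, not the tensorframes implementation. It assumes the only extra metadata needed is the tensor shape; the helper names `infer_shape`, `flatten_row_major` and `unflatten_row_major` are hypothetical.

```python
from math import prod


def infer_shape(nested):
    """Return the shape of a regular nested list (a scalar has shape ())."""
    shape = []
    cur = nested
    while isinstance(cur, list):
        shape.append(len(cur))
        cur = cur[0] if cur else None
    return tuple(shape)


def flatten_row_major(nested):
    """Flatten a regular nested list into a 1d list, last index varying fastest."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten_row_major(item))
    return flat


def unflatten_row_major(flat, shape):
    """Rebuild the nested representation from the 1d storage and the shape."""
    if prod(shape) != len(flat):
        raise ValueError("shape %r does not match %d elements" % (shape, len(flat)))
    if len(shape) == 0:
        return flat[0]
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])  # number of elements in each sub-tensor
    return [
        unflatten_row_major(flat[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


tensor = [[1, 2, 3], [4, 5, 6]]
shape = infer_shape(tensor)          # kept in the column metadata
stored = flatten_row_major(tensor)   # what SQL would see: a 1d array
restored = unflatten_row_major(stored, shape)
```

With the shape kept next to the column, the 1d array is enough to recover the nested view whenever it is needed.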

One concern is that the storage layout as seen by SQL may differ from the interpretation used by tensorframes. On the positive side, this will simplify the low-level operations.

Expected modifications:

  • the default storage layout is row-major (designed so that a column-major option could be added later)
  • all operations should accept either nested arrays or flattened tensors as input
  • all operations should output flattened tensors for tensors with >= 2 dimensions -> this is a user-facing change
  • analyze will be the conversion point between flattened and nested representations, with an extra option compact_storage. This option will accept either a single boolean (all numerical types), the letter 'R' (all columns compacted in row order) or a list of column names (only these columns are compacted in row order). A dictionary could be supported later.
  • printschema will differentiate between tensors stored in 1d and in n dimensions
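
One possible way to interpret the compact_storage option, as a rough sketch only: the function `resolve_compact_columns` and its arguments are hypothetical, and the handling of `False` (compact nothing) and of unknown column names (raise an error) are assumptions that the description above does not settle.

```python
def resolve_compact_columns(option, columns, numeric_columns):
    """Return the names of the columns to store as flattened, row-major tensors.

    option          -- the compact_storage value: a bool, 'R', or a list of names
    columns         -- all column names of the dataframe, in order
    numeric_columns -- the subset of columns that hold numerical types
    """
    # bool is checked first so that True/False are never treated as anything else
    if isinstance(option, bool):
        # Assumption: True compacts every numerical column, False compacts none.
        return [c for c in columns if c in numeric_columns] if option else []
    if option == "R":
        # All columns compacted in row order.
        return list(columns)
    if isinstance(option, (list, tuple)):
        unknown = [c for c in option if c not in columns]
        if unknown:
            # Assumption: unknown names are an error rather than silently ignored.
            raise ValueError("unknown columns: %r" % unknown)
        return [c for c in columns if c in option]
    raise TypeError("compact_storage must be a bool, 'R', or a list of column names")


cols = ["id", "image", "label"]
numeric = ["image", "label"]
all_numeric = resolve_compact_columns(True, cols, numeric)
everything = resolve_compact_columns("R", cols, numeric)
only_image = resolve_compact_columns(["image"], cols, numeric)
```

A dictionary form could later be added as one more branch, for example to map each column name to its own layout.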
