
Serialisation: text and binary #59

Merged: @alphaville merged 6 commits into main from feature/58-serialise-binary on Dec 2, 2024

@alphaville (Member) commented on Dec 2, 2024

Main Changes

Support for binary encoding, which is more reliable (values are stored as raw bytes, so nothing is lost to decimal rounding), faster, and leads to smaller files than the text encoding.

  • Support for binary encoding (.bt files)
  • Renamed parseFromTextFile to parseFromFile, which reads both text and binary encodings
  • Updated README.md
  • Version 1.6.0 to be released
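As a quick illustration of the reliability point, the following minimal Python sketch round-trips a double through a short text form and through its raw 8 bytes. The 6-significant-digit text format is an assumption chosen for illustration, not necessarily what GPUtils writes to text files.

```python
import struct

x = 1.0 / 3.0  # a double with no short exact decimal representation

# Text round trip, assuming (hypothetically) that 6 significant digits are printed
text_roundtrip = float(f"{x:.6g}")

# Binary round trip: pack the 8 raw IEEE-754 bytes (little-endian), then unpack them
raw = struct.pack("<d", x)
binary_roundtrip = struct.unpack("<d", raw)[0]

print(x - text_roundtrip)    # non-zero: the text form lost precision
print(x - binary_roundtrip)  # 0.0: the bytes reproduce x exactly
print(len(raw))              # 8 bytes per double, whatever its value
```

A text encoding can also be lossless if 17 significant digits are printed (`float(f"{x:.17g}") == x` holds for any finite double), but then a single value can take up to 24 characters plus a separator, against a fixed 8 bytes in binary.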

Example

```cpp
/* Write to binary file */
auto r = DTensor<double>::createRandomTensor(3, 6, 4, -1, 1);
std::string fName = "tensor.bt"; // binary tensor file extension: .bt
r.saveToFile(fName);

/* Parse binary file */
auto recov = DTensor<double>::parseFromFile(fName);
auto err = r - recov;
std::cout << "max error : " << err.maxAbs();
```

Performance

| Encoding | Data type | Dimensions | Size (MB) | Write time (ms) | Read time (ms) |
|---|---|---|---:|---:|---:|
| Binary | double | 1000 x 1000 x 2 | 16.00 | 84 | 64 |
| Text | double | 1000 x 1000 x 2 | 38.78 | 4751 | 611 |
| Binary | float | 1000 x 1000 x 2 | 16.00 | 74 | 58 |
| Text | float | 1000 x 1000 x 2 | 38.78 | 4320 | 474 |
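The binary size for double follows directly from the file layout used in the Python snippets below: a 24-byte header (three 8-byte integers for rows, columns and matrices) plus 8 bytes per element. A minimal Python check, assuming "MB" in the table means 10^6 bytes:

```python
import os
import tempfile

import numpy as np

HEADER_BYTES = 3 * 8  # rows, columns, matrices as 8-byte unsigned integers

def expected_bt_size(nr, nc, nm, dt='d'):
    """Size in bytes of a .bt file: fixed header plus raw elements."""
    return HEADER_BYTES + nr * nc * nm * np.dtype(dt).itemsize

nr, nc, nm = 1000, 1000, 2
size = expected_bt_size(nr, nc, nm, 'd')
print(size, size / 1e6)  # 16000024 16.000024, i.e. about 16.00 MB

# Cross-check by writing such a file (zeros suffice; only the size matters)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.bt")
    with open(path, "wb") as f:
        for n in (nr, nc, nm):
            f.write(n.to_bytes(8, "little"))
        np.zeros(nr * nc * nm, dtype=np.dtype('d')).tofile(f)
    written = os.path.getsize(path)

print(written == size)  # True
```

The header adds only 24 bytes, so the binary file size is essentially the in-memory size of the tensor data.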

From GPUtils to Python

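To read a .bt file in Python, use the following function; it reads the three-value header and returns the tensor elements as a flat NumPy array:

```python
import numpy as np

def read_array_from_file(path, dt='d'):
    """Read a .bt file; returns the elements as a flat array (dt='f' for float)."""
    with open(path, 'rb') as f:
        # Header: rows, columns, matrices as unsigned 64-bit little-endian integers
        nr = int.from_bytes(f.read(8), byteorder='little', signed=False)
        nc = int.from_bytes(f.read(8), byteorder='little', signed=False)
        nm = int.from_bytes(f.read(8), byteorder='little', signed=False)
        # The rest of the file holds the nr * nc * nm tensor elements
        dat = np.fromfile(f, dtype=np.dtype(dt))
    return dat
```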
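`read_array_from_file` returns the elements as a flat array. To recover the tensor shape, the header values can be used to reshape it. The sketch below assumes the elements are stored in column-major order (row index varying fastest, then column, then matrix), which is our understanding of how GPUtils lays out tensors but is not stated in this PR; `read_tensor_from_file` is a hypothetical helper, not part of GPUtils.

```python
import os
import tempfile

import numpy as np

def read_tensor_from_file(path, dt='d'):
    """Hypothetical helper: read a .bt file into an (nr, nc, nm) NumPy array.

    Assumes a header of three little-endian uint64 values (rows, columns,
    matrices) followed by the elements in column-major order.
    """
    with open(path, 'rb') as f:
        nr, nc, nm = [int.from_bytes(f.read(8), byteorder='little', signed=False)
                      for _ in range(3)]
        dat = np.fromfile(f, dtype=np.dtype(dt))
    if dat.size != nr * nc * nm:
        raise ValueError(f"header says {nr * nc * nm} elements, file has {dat.size}")
    # order='F': the row index varies fastest, then the column, then the matrix
    return dat.reshape((nr, nc, nm), order='F')

# Usage: a 2 x 3 x 2 tensor whose stored elements are 0, 1, ..., 11
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small.bt")
    with open(path, 'wb') as f:
        for n in (2, 3, 2):
            f.write(n.to_bytes(8, 'little'))
        np.arange(12, dtype=np.dtype('d')).tofile(f)
    t = read_tensor_from_file(path)

print(t.shape)     # (2, 3, 2)
print(t[:, :, 0])  # first matrix, filled column by column: rows [0 2 4] and [1 3 5]
```

If GPUtils turns out to use a different element order, only the `order` argument of `reshape` needs to change.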

From Python to GPUtils

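```python
import numpy as np

nr, nc, nm = 5, 6, 2  # rows, columns, matrices
ne = nr * nc * nm     # total number of elements
x = np.linspace(-100.0, 100.0, ne, dtype=np.dtype('d'))
with open('p.bt', 'wb') as f:
    # Header: three unsigned 64-bit little-endian integers
    f.write(nr.to_bytes(8, 'little'))
    f.write(nc.to_bytes(8, 'little'))
    f.write(nm.to_bytes(8, 'little'))
    # Elements as raw doubles in the machine's native byte order
    x.tofile(f)
```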
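The snippet above writes a flat array. For an existing 2D or 3D NumPy array, a hypothetical helper `write_tensor_to_file` (not part of GPUtils) could take the dimensions from `arr.shape` and flatten the data in column-major order; as with reading, that element order is an assumption about what GPUtils expects.

```python
import os
import tempfile

import numpy as np

def write_tensor_to_file(path, arr, dt='d'):
    """Hypothetical helper: write a 2D or 3D NumPy array to a .bt file.

    Writes rows, columns and matrices as little-endian uint64 values, then the
    elements in column-major order (assumed to be what GPUtils expects).
    """
    arr = np.asarray(arr, dtype=np.dtype(dt))
    if arr.ndim == 2:
        arr = arr[:, :, np.newaxis]  # treat a matrix as a tensor with one matrix
    if arr.ndim != 3:
        raise ValueError("expected a 2D or 3D array")
    with open(path, 'wb') as f:
        for n in arr.shape:
            f.write(int(n).to_bytes(8, 'little'))
        # ravel(order='F') lists elements with the row index varying fastest
        arr.ravel(order='F').tofile(f)

# Usage: write a 5 x 6 x 2 tensor, then read back the header and the elements
a = np.linspace(-100.0, 100.0, 60).reshape((5, 6, 2), order='F')
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'p.bt')
    write_tensor_to_file(path, a)
    with open(path, 'rb') as f:
        header = [int.from_bytes(f.read(8), 'little') for _ in range(3)]
        body = np.fromfile(f, dtype=np.dtype('d'))

print(header)  # [5, 6, 2]
print(np.array_equal(body, np.linspace(-100.0, 100.0, 60)))  # True
```

For this example the bytes written are the same as those produced by the flat `x` in the snippet above, since `a` was built from the same linspace in column-major order.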

Associated Issues

@alphaville marked this pull request as ready for review on December 2, 2024, 15:19
@ruairimoran (Collaborator) left a comment:

lgtm

@alphaville merged commit bdb2bc4 into main on Dec 2, 2024
@alphaville deleted the feature/58-serialise-binary branch on December 3, 2024, 15:18


Development: merging this pull request may close the issue "Support serialisation to binary files".