Remove transform module and update FCSData class and mef and compensate modules #344
Description
After some reflection in #340, @castillohair and I agreed the transform module should be removed and code previously interfacing it should be simplified.
Tasks:
- Remove `transform` module.
- Modify `FCSData` to automatically transform data to RFI units.
- Add `FCSData.transform()` function to support generic transformations.
- Consolidate `compensate.get_transform_fxn()` and `transform.to_compensated()` and return all unmixed fluorescence signals.
- Consolidate `mef.get_transform_fxn()` and `transform.to_mef()`. Update `excel_ui` accordingly.
Unresolved issues:
- What form should the consolidated `compensate` function take?
  - What data structure(s) should be returned? (E.g., a `numpy` array, `pd.DataFrame` or `FCSData` object for each single-fluorophore control?)
  - Would a `full_output` flag (like with the `gate` module functions) provide better control over the amount of information returned?
  - Should the user be able to specify their own spillover matrix?
- What form should the consolidated `mef` function take?
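To make the open questions about `compensate` concrete, here is a minimal sketch of one possible shape. It is not FlowCal code: the names `compensate_sketch` and `CompensateOutput` are hypothetical, the data is a plain `numpy` array with events as rows, and the spillover matrix is assumed to be supplied by the user with `spillover[i, j]` giving the fraction of fluorophore `i` detected in channel `j`. The `full_output` flag returns a namedtuple here purely for illustration.

```python
import collections

import numpy as np

# Hypothetical container for the extended output; not part of FlowCal.
CompensateOutput = collections.namedtuple(
    "CompensateOutput", ["compensated", "spillover"])


def compensate_sketch(data, spillover, full_output=False):
    """Unmix fluorescence signals given a user-supplied spillover matrix.

    Assumes observed = true @ spillover, with events as rows and
    channels as columns.
    """
    data = np.asarray(data, dtype=float)
    spillover = np.asarray(spillover, dtype=float)
    n_channels = data.shape[1]
    if spillover.shape != (n_channels, n_channels):
        raise ValueError("spillover must be square with one row per channel")

    # Solve true @ spillover = observed for `true` without forming the
    # explicit inverse: spillover.T @ true.T = observed.T
    compensated = np.linalg.solve(spillover.T, data.T).T

    if full_output:
        return CompensateOutput(compensated=compensated, spillover=spillover)
    return compensated
```

With this shape, the default call returns only the unmixed signals, while `full_output=True` also hands back the matrix that was applied. Whether the return type should instead be a `pd.DataFrame` or an `FCSData` object is exactly the question left open above.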
Relevant discussion from #340:
I've become skeptical about the need to have a dedicated module for "transformations". I think this came out of our old view that it was worth distinguishing between "channel" units and "a.u.", and therefore having a module that transformed between these two. But having worked with a lot of flow cytometry data, including data from more modern instruments which are stored directly in a.u., I started seeing channel units as an intermediate step that should not be used for anything. If present-day me had to remake FlowCal from scratch, I'd probably have FCSData objects be directly converted to a.u. upon loading, and eliminate the transform module, since we never used it for anything other than the `to_rfi()` function. That way FCSData objects from old and new instruments will be automatically in a.u., improving consistency.
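For readers unfamiliar with the channel-to-a.u. step discussed here, the following is a rough sketch of the conversion for a logarithmic amplifier, assuming the FCS-standard convention `a.u. = offset * 10**(decades * channel / range)`. The function name `channel_to_au` and its parameters are hypothetical and this is not FlowCal's implementation; linear channels (`decades == 0`) are simply passed through unchanged, which ignores any linear gain.

```python
import numpy as np


def channel_to_au(channel, decades, offset, channel_range):
    """Convert raw channel numbers to arbitrary units (a.u.).

    Hypothetical helper. `decades` and `offset` play the role of the two
    log-amplifier parameters stored per channel in an FCS file, and
    `channel_range` is the number of channel values (e.g. 1024).
    """
    channel = np.asarray(channel, dtype=float)
    if decades == 0:
        # Linear amplifier: values are already proportional to signal.
        # (Simplification: any linear gain factor is ignored here.)
        return channel
    # Logarithmic amplifier: exponentiate back to a linear scale.
    return offset * 10.0 ** (decades * channel / channel_range)
```

For example, with 4 decades, an offset of 1, and a range of 1024, channel 512 sits halfway up the log scale and maps to 100 a.u.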
I've been reflecting on the `transform` module. Some thoughts:
- It's always hard for me to remember how the MEF transformation traces its way through the `mef` and `transform` modules. I would be in favor of simplifying its derivation and exposure to the user.
- I think I originally thought there were going to be a lot more transformations we would want to support (e.g., log, logicle, etc.). In practice, those have largely manifested themselves in the `plot` module.
- The `transform` module is still useful for `FCSData` bookkeeping (e.g., making a copy of the `FCSData` object and updating `FCSData.range()`). I could envision this functionality being absorbed into `FCSData`, though (e.g., via a `FCSData.transform()` function).
- The `transform` module is also still useful for applying transforms to non-`FCSData` data (e.g., a `numpy` array) in a standardized way. I don't know how many users use non-`FCSData` data, though. Moreover, if `transform.to_mef()` and `transform.to_compensated()` were moved back to their respective modules, those functions could still be written to support non-`FCSData` arrays.
- I agree `transform.to_rfi()` might make more sense as an internal processing step of `FCSData` and doesn't really need to be exposed to the user as overtly as it currently is.
- The `mef` module currently kind of bends over backwards to interface with the `transform` module (e.g., by producing a transformation function via `mef.get_transform_fxn()`). If that driving rationale is removed, the primary `mef` module interface point could possibly be simplified. I'm not sure what form it (and `compensate`) should take to simplify them, though. (Do we still return transformation functions? Do the modules provide functions that operate directly on data, like `gate` module functions?)

Upon reflection, I currently favor removing `transform`, updating `FCSData`, and simplifying the common `mef` and `compensate` module interfaces.
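The bookkeeping described above (copy the data, apply a function to selected channels, update the per-channel range) could be absorbed into a method roughly as follows. This is only a sketch under stated assumptions: `EventsSketch` is a hypothetical stand-in for `FCSData`, channels are addressed by integer index, and the transformation is assumed to be monotonically increasing so that transforming the range endpoints gives the new range.

```python
import numpy as np


class EventsSketch:
    """Hypothetical stand-in for FCSData: events plus per-channel ranges."""

    def __init__(self, events, ranges):
        self.events = np.asarray(events, dtype=float)
        self.ranges = [tuple(r) for r in ranges]

    def transform(self, fxn, channels):
        """Return a transformed copy; the original object is left untouched.

        `fxn` is assumed to be vectorized and monotonically increasing.
        """
        new_events = self.events.copy()
        new_ranges = list(self.ranges)
        for ch in channels:
            # Transform the event values of this channel.
            new_events[:, ch] = fxn(new_events[:, ch])
            # Keep the range consistent by transforming its endpoints.
            endpoints = fxn(np.array(self.ranges[ch], dtype=float))
            new_ranges[ch] = tuple(float(v) for v in endpoints)
        return EventsSketch(new_events, new_ranges)
```

A call such as `data.transform(lambda x: 2 * x, channels=[1])` would then return a new object with channel 1 and its range doubled, leaving `data` unchanged.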