Generator rewrite#22
Conversation
Used to test both axes against the x-axis, not comparing the y-axis
This proved to be tricky. The code for compound generator has gotten quite complicated.
This functionality went AWOL briefly during the rewrite.
Tests that point preparation for ~100 million points (before region filtering) happens within a few seconds.
Nested generators should always alternate back and forth (from their perspective). Adds the constraint that any set of flattened axes must all share a common alternate_direction setting to allows reversal of a full dimension. Adds altenerate_direction to LissajousGenerator.
Done to remove confusion when we start adding dataset indexes to points.
93f356d to
88d745d
Compare
|
@thomascobb - this is mostly for your review, rather than merging at this stage (docs and merge conflicts need sorting out) |
Allows the same numpy-like arrays to be used in Jython. Not everything is perfectly implemented and it's not as performant, but at least it'll work.
It makes the code less clear, but saves on memory as new return arrays do not have to be created. This is particularly significant when it comes to Jython as the JVM may be memory constrained.
f13e7fb to
f2bf5b9
Compare
187a531 to
6b24fa9
Compare
i.e "x >= 0 && x < size" becomes "x >= 0 && x <= size"
This changes the alternating case slightly (may start in a different direction)
6b24fa9 to
aa400d9
Compare
GDYendell
left a comment
There was a problem hiding this comment.
The generators and ROI masks all look good. The prepare function in compoundgenerator is pretty huge and hard to follow; it could do with being broken up a bit. prepare and get_point would both be easier to follow with some more descriptive variable names. It might also be worthwhile increasing the Landscape strictness, as I am getting some lint on PyCharm.
Otherwise looks good! Are there any problems getting this to work with GDA?
| excluders = list(self.excluders) | ||
| generators = list(self.generators) | ||
|
|
||
| # special case if we have rectangular regions on line generators |
There was a problem hiding this comment.
Why would we be given a grid with a RectangularROI? Shouldn't they just provide the smaller grid directly?
There was a problem hiding this comment.
Apparently the users specify in the GUI a bounding box (grid, circle, polygon), and a fill pattern (raster, spiral, lissajous), and there are multiple bits of GUI code that can do this, so it's better to detect the grid in a rectangle bit here...
| for generator in generators: | ||
| generator.produce_points() | ||
| self.axes_points.update(generator.points) | ||
| self.axes_points_lower.update(generator.points_lower) |
There was a problem hiding this comment.
We only need the bounds for the lowest generator, so there is a lot of extra generation and storing going on here. Is speed still an issue with this code?
| - generators.index(gen_2) | ||
| if gen_diff < -1 or gen_diff > 1: | ||
| raise ValueError( | ||
| "Excluders must be defined on axes that are adjacent in " \ |
There was a problem hiding this comment.
Is this going to be a problem for some users? I can't remember the use case for trying to ensure we could run Excluders on any pair of axes.
There was a problem hiding this comment.
It's not a problem at present, we can revisit it if it's a problem in the future...
| if gen_diff == 1: | ||
| gen_1, gen_2 = gen_2, gen_1 | ||
| axis_1, axis_2 = axis_2, axis_1 | ||
| gen_diff = -1 |
There was a problem hiding this comment.
This doesn't seem to be used.
| repeat *= len(dim["indicies"]) | ||
| self.num = repeat | ||
| for dim in self.dimensions: | ||
| l = len(dim["indicies"]) |
There was a problem hiding this comment.
This l looks like a 1; slightly confusing...
|
|
||
|
|
||
| ##### | ||
| # first check if region spans two dimensions - merge if so |
| "Generators tied by regions must have the same " \ | ||
| "alternate_direction setting") | ||
| # merge "inner" into "outer" | ||
| if dim_diff == -1: |
There was a problem hiding this comment.
... Or is dim_diff == 0 the case for spiral and lissajous where one generator already contains the axes for a region? Does the list of dims have to match the list of excluders?
| for dim in self.dimensions: | ||
| indicies = dim["indicies"] | ||
| i = n // dim["repeat"] | ||
| r = i // len(indicies) |
| r = i // len(indicies) | ||
| i %= len(indicies) | ||
| k = indicies[i] | ||
| dim_reverse = False |
|
"Dimension" refers to a "collapsed" set of generators that are connected by regions. So two scannables that form a grid that are then filtered by a (non-rectangular) region will be merged into one "dimension". This merging and subsequent mask generation is why there's the restriction on the axes a region can span - I don't know how to expand the mask arrays appropriately otherwise. Perhaps with more time, thought, and whiteboards, the restriction could be lifted. The "special case" for grids with rectangular regions was asked for recently. As for
There's more cleanup to be done in addition to the linter stuff (removal of now unused iterators from non-compound generators, contains_point from regions, etc). The main problem with GDA is the numpy requirement - there is a jython numpy emulation (in scisoftpy) but there may be complications getting that included. It doesn't perform as well either. But with that (and some minor changes in GDA) it appears to work. |
|
Will it now be possible now to apply mutators before the excluders? Initially we decided that users would have to put up with points being randomly offset outside of the ROI because otherwise we would have to generate the points first. Since we have them all now at the start this could possibly be accounted for. |
|
We actually want the mutators to be run after the excluders, otherwise we might have a different number of points in each iteration of a ROI'd scan, not useful if we want to run the same scan at different temperatures with different random offsets...
This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. |
I think this change makes that harder, not easier. The fundamental idea in this pull request is to vectorise the "contains_point" step and delay creation of the |
CompoundGenerator is now very different to other Generators. The name "Generator" may now be inappropriate for "regular" generators.
The documentation structure needs to be reviewed, as it doesn't obviously follow. Some changes may be desirable (particularly wrt produce_points on Generators as it is side-effecting and doesn't return anything). The nature of the index attribute of points from CompoundGenerator is also not spelled out.
|
@thomascobb At 33 commits this is getting to be a rather silly pull request. |
|
Well the pull request is called "rewrite"... |
thomascobb
left a comment
There was a problem hiding this comment.
Another partial review, I'll continue another time...
| @@ -34,6 +34,9 @@ def contains_point(self, d): | |||
|
|
|||
| stop = self.stop[axis] | ||
| d = stop - start | ||
| if self.num == 1: | ||
| self.points[axis_name] = np.array([start]) |
There was a problem hiding this comment.
Point taken, lets leave it as it is
| @@ -21,7 +21,7 @@ def __init__(self, name, units, start, stop, num, alternate_direction=False): | |||
| units (str): The scannable units. E.g. "mm" | |||
| start (float/list(float)): The first position to be generated. | |||
There was a problem hiding this comment.
Actually, scrap that, let's leave it as it is for now
| self.names = names | ||
| self.units = units | ||
| self.points = None | ||
| self.points_lower = None |
There was a problem hiding this comment.
Where are points_lower and points_upper used?
There was a problem hiding this comment.
And could we call them positions rather than points please? Keeps it consistent with their use in Point
| yield p | ||
| def produce_points(self): | ||
| self.points = {} | ||
| self.bounds = {} |
There was a problem hiding this comment.
Can you declare self.bounds in __init__ please
| d = self.phase_diff | ||
| fx = lambda t: x0 + A * np.sin(a * 2*m.pi * t/self.num + d) | ||
| fy = lambda t: y0 + B * np.sin(b * 2*m.pi * t/self.num) | ||
| x = fx(np.arange(self.num)) |
There was a problem hiding this comment.
Now you've put it like this, can we push some of this to the base class? We could have produce_points() take an array of indexes (either np.arange(self.num) or np.arange(self.num + 1) - 0.5) and return the points or the bounds array. This would remove duplication in the Generators, which is good because we may want to write lots of them...
There was a problem hiding this comment.
Maybe the base class could look like this:
positions = None
bounds = None
def prepare_array(self, index_array):
raise NotImplementedError()
def prepare_positions(self):
self.positions = self.prepare_array(np.arange(self.num))
def prepare_bounds(self):
self.bounds = self.prepare_array(np.arange(self.num + 1) - 0.5)Then the majority of generators only have to implement prepare_array()
There was a problem hiding this comment.
Seems like a reasonable abstraction - line generator is the only one that doesn't currently work that way but it should be an easy change.
|
|
||
| for excluder in excluders: | ||
| axis_1, axis_2 = excluder.scannables | ||
| gen_1 = [g for g in generators if axis_1 in g.axes][0] |
There was a problem hiding this comment.
This code appears a lot, should we have a find_generator(axis_name) function?
| - generators.index(gen_2) | ||
| if gen_diff < -1 or gen_diff > 1: | ||
| raise ValueError( | ||
| "Excluders must be defined on axes that are adjacent in " \ |
There was a problem hiding this comment.
It's not a problem at present, we can revisit it if it's a problem in the future...
| for point in iterator: | ||
| yield point | ||
| it = (self.get_point(n) for n in range_(self.num)) | ||
| for m in self.mutators: |
There was a problem hiding this comment.
Mutators need to be applied in get_point(), not in iterator()
There was a problem hiding this comment.
I guess that means Mutator.mutate(iterator) should turn into Mutator.mutate(point, point_number)
| generators = [] | ||
| for generator in d['generators']: | ||
| generators.append(Generator.from_dict(generator)) | ||
| class Dimension(object): |
There was a problem hiding this comment.
Can this go in dimension.py please?
Replaces generator.produce_points with generator.prepare_bounds and prepare_positions that call into the "virtual" method (implemented by classes deriving generator) prepare_array that accepts an index array used to produce points.
Prevents weird inconsistencies when prepare is called multiple times (e.g. when passed repeatedly to plot_generator)
In Python3 3.0 // 2 returns 1.0, not 1
Requires a rewrite of RandomOffsetMutator, which currently does not alter the bounds of a point. Doing so would require rethinking mutators a little bit so the bound information could be updated. The random offset generation is now required to be deterministic based on the points passed (since the points can now be generated in a random order) and must be fully consistent. Fixes CompoundGenerator to call mutators in get_point.
5df47c3 to
3d490e7
Compare
RandomOffsetMutator can now adjust a points boundaries consistently (once again), but this requires passing the linear index for a point to the mutate method (due to the potential for random access). The Random class is not being used at the moment.
36175e7 to
c4031a8
Compare
It is now expected for this class to become public API (after some changes that are not part of this commit)
The first line of the comment sometimes replaces the test name during large test runs, which is unhelpful in this case.
Changes the way generators work and hence the interface most of the project. Points from generators are now calculated at the start as a numpy array, allowing vectorised operations to be performed when applying excluders to filter points. This allows us to answer questions regarding the size and dimensions of scans without having to generate all the point objects (which can be very slow and expensive) in exchange for having to hold large-ish mask arrays in memory.
Rewrites generators to calculate, for each axis, an array of all positions.
Significant changes required to compound generator to handle this different
process, including some restrictions (regions can only be defined over axes
in consecutive generators, generators connected by regions must have the same
alternate_direction setting).
Documentation still to be updated and the compound generator code wants some
tidying. CompoundGenerator.get_point(..) does not apply mutators as the
interface to mutators has not been changed yet.