Generator rewrite by c-mita · Pull Request #22 · DiamondLightSource/scanpointgenerator

c-mita · 2016-11-14T11:47:47Z

Rewrites generators to calculate, for each axis, an array of all positions.

Significant changes required to compound generator to handle this different
process, including some restrictions (regions can only be defined over axes
in consecutive generators, generators connected by regions must have the same
alternate_direction setting).

Documentation still to be updated and the compound generator code wants some
tidying. CompoundGenerator.get_point(..) does not apply mutators as the
interface to mutators has not been changed yet.

Nested generators should always alternate back and forth (from their perspective). Adds the constraint that any set of flattened axes must all share a common alternate_direction setting to allow reversal of a full dimension. Adds alternate_direction to LissajousGenerator.

c-mita · 2016-11-14T11:55:59Z

@thomascobb - this is mostly for your review, rather than merging at this stage (docs and merge conflicts need sorting out)

GDYendell

The generators and ROI masks all look good. The prepare function in compoundgenerator is pretty huge and hard to follow; it could do with being broken up a bit. prepare and get_point would both be easier to follow with some more descriptive variable names. It might also be worthwhile increasing the Landscape strictness, as I am getting some lint on PyCharm.

Otherwise looks good! Are there any problems getting this to work with GDA?

GDYendell · 2016-11-21T16:18:27Z

+        excluders = list(self.excluders)
+        generators = list(self.generators)
+
+        # special case if we have rectangular regions on line generators


Why would we be given a grid with a RectangularROI? Shouldn't they just provide the smaller grid directly?

Apparently the users specify in the GUI a bounding box (grid, circle, polygon), and a fill pattern (raster, spiral, lissajous), and there are multiple bits of GUI code that can do this, so it's better to detect the grid in a rectangle bit here...

GDYendell · 2016-11-21T16:34:29Z

+        for generator in generators:
+            generator.produce_points()
+            self.axes_points.update(generator.points)
+            self.axes_points_lower.update(generator.points_lower)


We only need the bounds for the lowest generator, so there is a lot of extra generation and storing going on here. Is speed still an issue with this code?

GDYendell · 2016-11-21T16:38:17Z

+                - generators.index(gen_2)
+            if gen_diff < -1 or gen_diff > 1:
+                raise ValueError(
+                    "Excluders must be defined on axes that are adjacent in " \


Is this going to be a problem for some users? I can't remember the use case for trying to ensure we could run Excluders on any pair of axes.

It's not a problem at present, we can revisit it if it's a problem in the future...

GDYendell · 2016-11-21T16:38:19Z

+            if gen_diff == 1:
+                gen_1, gen_2 = gen_2, gen_1
+                axis_1, axis_2 = axis_2, axis_1
+                gen_diff = -1


This doesn't seem to be used.

GDYendell · 2016-11-22T10:10:00Z

+            repeat *= len(dim["indicies"])
+        self.num = repeat
+        for dim in self.dimensions:
+            l = len(dim["indicies"])


This l looks like a 1; slightly confusing...

GDYendell · 2016-11-22T10:24:57Z

+
+
+            #####
+            # first check if region spans two dimensions - merge if so


Aren't they always 2D?

GDYendell · 2016-11-22T10:29:01Z

+                    "Generators tied by regions must have the same " \
+                            "alternate_direction setting")
+            # merge "inner" into "outer"
+            if dim_diff == -1:


... Or is dim_diff == 0 the case for spiral and lissajous where one generator already contains the axes for a region? Does the list of dims have to match the list of excluders?

GDYendell · 2016-11-22T10:43:03Z

+        for dim in self.dimensions:
+            indicies = dim["indicies"]
+            i = n // dim["repeat"]
+            r = i // len(indicies)


This isn't used.

GDYendell · 2016-11-22T10:43:10Z

+            r = i // len(indicies)
+            i %= len(indicies)
+            k = indicies[i]
+            dim_reverse = False


c-mita · 2016-11-22T11:38:40Z

"Dimension" refers to a "collapsed" set of generators that are connected by regions. So two scannables that form a grid that are then filtered by a (non-rectangular) region will be merged into one "dimension".
Initially one is created for each generator, regions cause them to be merged.

This merging and subsequent mask generation is why there's the restriction on the axes a region can span - I don't know how to expand the mask arrays appropriately otherwise. Perhaps with more time, thought, and whiteboards, the restriction could be lifted.
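A minimal sketch of the merge-and-mask idea described above, assuming numpy and a circular region. `merge_axes` and `circle_mask` are hypothetical names for illustration, not the functions in this pull request. Two axes tied by a region are flattened into one "dimension", the region is evaluated once over the flattened arrays, and only the surviving indices are kept.

```python
import numpy as np

def merge_axes(outer, inner):
    """Flatten two axes into one dimension; the inner axis varies fastest."""
    outer_flat = np.repeat(outer, len(inner))
    inner_flat = np.tile(inner, len(outer))
    return outer_flat, inner_flat

def circle_mask(x, y, centre=(0.0, 0.0), radius=1.0):
    """Vectorised stand-in for calling contains_point on every (x, y)."""
    dx = x - centre[0]
    dy = y - centre[1]
    return dx * dx + dy * dy <= radius * radius

y_positions = np.linspace(-1.0, 1.0, 3)
x_positions = np.linspace(-1.0, 1.0, 3)

ys, xs = merge_axes(y_positions, x_positions)
mask = circle_mask(xs, ys)
indices = np.flatnonzero(mask)  # entries of the merged dimension that survive

print(len(indices))  # the point count is known before any Point is created
```

The sketch relies on the two axes being adjacent, so that a single repeat/tile pair lines them up; that is one way to read the restriction on which axes a region can span.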

The "special case" for grids with rectangular regions was asked for recently.

As for prepare being long and confusing - yeah, it is. It doubled in length over the course of this pull request.
The main sections that could be broken up are:

1. Handle special case for line generators with rectangular regions
2. Create dimensions
3. Merge dimensions
4. Create dimension masks

Each of these steps (excluding 1.) modifies the "dimension" structure - merely breaking those steps into separate functions without addressing that is even more distasteful. But that is a minor obstacle that can be overcome by rearranging the data structures.

There's more cleanup to be done in addition to the linter stuff (removal of now unused iterators from non-compound generators, contains_point from regions, etc). get_point doesn't apply the effect of mutators either - fine if you only ever generate points via iterator, but it seems silly to prevent random access solutions when the functionality is there.
And obviously, the documentation needs a lot of updating.

The main problem with GDA is the numpy requirement - there is a jython numpy emulation (in scisoftpy) but there may be complications getting that included. It doesn't perform as well either. But with that (and some minor changes in GDA) it appears to work.

GDYendell · 2016-11-22T11:46:14Z

Will it now be possible to apply mutators before the excluders? Initially we decided that users would have to put up with points being randomly offset outside of the ROI because otherwise we would have to generate the points first. Since we have them all now at the start this could possibly be accounted for.

thomascobb · 2016-11-22T11:56:23Z

We actually want the mutators to be run after the excluders, otherwise we might have a different number of points in each iteration of a ROI'd scan, not useful if we want to run the same scan at different temperatures with different random offsets...

c-mita · 2016-11-22T11:57:43Z

Will it now be possible to apply mutators before the excluders?

I think this change makes that harder, not easier.

The fundamental idea in this pull request is to vectorise the "contains_point" step and delay creation of the Point object until it's actually required.
What happens here is that you don't have "all the points" already - you have a means of only generating the points that are contained in regions based on some large mask arrays and then some weird indexing. You also know exactly how many will be generated ahead of time along each dimension.
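The "weird indexing" can be sketched roughly as follows. This is an illustration under assumptions, not the code in compoundgenerator.py: `dimension_indices` and the "size"/"alternate" keys are hypothetical names. Each dimension only needs to know how many points it kept after masking; a linear point number is then split into one index per dimension, reversing direction on alternate passes.

```python
def dimension_indices(n, dimensions):
    """Map linear point number n to one index per dimension (outermost first).

    Each dimension is a dict with "size" (points kept after masking) and
    "alternate" (reverse direction on every other pass).
    """
    # repeat = number of points generated before a dimension advances by one
    repeats = []
    repeat = 1
    for dim in reversed(dimensions):
        repeats.insert(0, repeat)
        repeat *= dim["size"]
    if not 0 <= n < repeat:
        raise IndexError("point number out of range")

    result = []
    for dim, rep in zip(dimensions, repeats):
        i = n // rep
        passes = i // dim["size"]    # completed sweeps of this dimension
        i %= dim["size"]
        if dim["alternate"] and passes % 2 == 1:
            i = dim["size"] - 1 - i  # odd sweeps run backwards
        result.append(i)
    return result

dims = [{"size": 2, "alternate": False}, {"size": 3, "alternate": True}]
path = [dimension_indices(n, dims) for n in range(6)]
print(path)
```

Each returned index would then be looked up in that dimension's list of surviving mask indices to find the actual axis positions, which is what allows random access without building every Point.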

CompoundGenerator is now very different to other Generators. The name "Generator" may now be inappropriate for "regular" generators.

The documentation structure needs to be reviewed, as it doesn't obviously follow. Some changes may be desirable (particularly wrt produce_points on Generators as it is side-effecting and doesn't return anything). The nature of the index attribute of points from CompoundGenerator is also not spelled out.

c-mita · 2016-12-06T17:13:15Z

@thomascobb At 33 commits this is getting to be a rather silly pull request.

thomascobb · 2016-12-14T15:42:00Z

Well the pull request is called "rewrite"...

thomascobb

Another partial review, I'll continue another time...

thomascobb · 2016-12-14T15:46:28Z

@@ -34,6 +34,9 @@ def contains_point(self, d):



Can we delete contains_point now?

thomascobb · 2016-12-14T15:48:32Z

+            stop = self.stop[axis]
+            d = stop - start
+            if self.num == 1:
+                self.points[axis_name] = np.array([start])


Point taken, let's leave it as it is

thomascobb · 2016-12-14T15:50:36Z

@@ -21,7 +21,7 @@ def __init__(self, name, units, start, stop, num, alternate_direction=False):
            units (str): The scannable units. E.g. "mm"
            start (float/list(float)): The first position to be generated.


Actually, scrap that, let's leave it as it is for now

thomascobb · 2016-12-14T15:52:07Z

        self.names = names
        self.units = units
+        self.points = None
+        self.points_lower = None


Where are points_lower and points_upper used?

And could we call them positions rather than points please? Keeps it consistent with their use in Point

thomascobb · 2016-12-14T15:52:23Z

-            yield p
+    def produce_points(self):
+        self.points = {}
+        self.bounds = {}


Can you declare self.bounds in __init__ please

thomascobb · 2016-12-14T16:28:52Z

+        d = self.phase_diff
+        fx = lambda t: x0 + A * np.sin(a * 2*m.pi * t/self.num + d)
+        fy = lambda t: y0 + B * np.sin(b * 2*m.pi * t/self.num)
+        x = fx(np.arange(self.num))


Now you've put it like this, can we push some of this to the base class? We could have produce_points() take an array of indexes (either np.arange(self.num) or np.arange(self.num + 1) - 0.5) and return the points or the bounds array. This would remove duplication in the Generators, which is good because we may want to write lots of them...

Maybe the base class could look like this:

    positions = None
    bounds = None

    def prepare_array(self, index_array):
        raise NotImplementedError()

    def prepare_positions(self):
        self.positions = self.prepare_array(np.arange(self.num))

    def prepare_bounds(self):
        self.bounds = self.prepare_array(np.arange(self.num + 1) - 0.5)

Then the majority of generators only have to implement prepare_array()

Seems like a reasonable abstraction - line generator is the only one that doesn't currently work that way but it should be an easy change.
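As a sketch of how a line generator might fit that abstraction, assuming a single axis and numpy: the class and attribute names below follow the suggestion above, but the body of `prepare_array` and the single-point handling are assumptions, not the code that was eventually merged.

```python
import numpy as np

class Generator(object):
    positions = None
    bounds = None

    def prepare_array(self, index_array):
        raise NotImplementedError()

    def prepare_positions(self):
        self.positions = self.prepare_array(np.arange(self.num))

    def prepare_bounds(self):
        # half-integer indexes give the boundaries between neighbouring points
        self.bounds = self.prepare_array(np.arange(self.num + 1) - 0.5)

class LineGenerator(Generator):
    def __init__(self, start, stop, num):
        self.start = float(start)
        self.stop = float(stop)
        self.num = num

    def prepare_array(self, index_array):
        if self.num == 1:
            # assumption: a single point sits at start, step spans the range
            step = self.stop - self.start
        else:
            step = (self.stop - self.start) / (self.num - 1)
        return self.start + index_array * step

line = LineGenerator(0, 4, 5)
line.prepare_positions()
line.prepare_bounds()
print(line.positions, line.bounds)
```

Because positions and bounds come from the same function, a new generator only has to describe its path as a function of a (possibly fractional) index.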

thomascobb · 2016-12-14T16:42:49Z

+
+        for excluder in excluders:
+            axis_1, axis_2 = excluder.scannables
+            gen_1 = [g for g in generators if axis_1 in g.axes][0]


This code appears a lot, should we have a find_generator(axis_name) function?
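One possible shape for such a helper, as a sketch only: `find_generator` here is a free function taking the generator list, and `FakeGenerator` is a hypothetical stand-in exposing just the `axes` attribute that the lookup needs.

```python
class FakeGenerator(object):
    """Stand-in with only the attribute the lookup needs."""
    def __init__(self, axes):
        self.axes = axes

def find_generator(generators, axis_name):
    """Return the first generator that defines axis_name."""
    for generator in generators:
        if axis_name in generator.axes:
            return generator
    raise ValueError("No generator defines axis %r" % (axis_name,))

generators = [FakeGenerator(["z"]), FakeGenerator(["y"]), FakeGenerator(["x"])]
gen_1 = find_generator(generators, "y")
gen_2 = find_generator(generators, "x")
gen_diff = generators.index(gen_1) - generators.index(gen_2)
print(gen_diff)
```

Raising a ValueError with the axis name replaces the bare IndexError that the `[...][0]` idiom produces when an excluder names an unknown axis.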

thomascobb · 2016-12-14T16:43:30Z

+                - generators.index(gen_2)
+            if gen_diff < -1 or gen_diff > 1:
+                raise ValueError(
+                    "Excluders must be defined on axes that are adjacent in " \


It's not a problem at present, we can revisit it if it's a problem in the future...

thomascobb · 2016-12-14T16:46:02Z

-            for point in iterator:
-                yield point
+        it = (self.get_point(n) for n in range_(self.num))
+        for m in self.mutators:


Mutators need to be applied in get_point(), not in iterator()

I guess that means Mutator.mutate(iterator) should turn into Mutator.mutate(point, point_number)

thomascobb · 2016-12-14T16:46:49Z

-        generators = []
-        for generator in d['generators']:
-            generators.append(Generator.from_dict(generator))
+class Dimension(object):


Can this go in dimension.py please?

Replaces generator.produce_points with generator.prepare_bounds and prepare_positions that call into the "virtual" method (implemented by classes deriving generator) prepare_array that accepts an index array used to produce points.

Requires a rewrite of RandomOffsetMutator, which currently does not alter the bounds of a point. Doing so would require rethinking mutators a little bit so the bound information could be updated. The random offset generation is now required to be deterministic based on the points passed (since the points can now be generated in a random order) and must be fully consistent. Fixes CompoundGenerator to call mutators in get_point.

RandomOffsetMutator can now adjust a point's boundaries consistently (once again), but this requires passing the linear index for a point to the mutate method (due to the potential for random access). The Random class is not being used at the moment.
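A rough sketch of why the linear index is needed, under stated assumptions: the offset for a point is derived only from a seed and the point's index, and each boundary is taken as the midpoint between neighbouring mutated positions. `offset_for` and `mutated_point` are hypothetical names, and the midpoint rule is an assumption rather than a description of RandomOffsetMutator.

```python
import random

def offset_for(seed, index, max_offset):
    """Offset that depends only on (seed, index), never on call order."""
    rng = random.Random(seed * 1000003 + index)
    return rng.uniform(-max_offset, max_offset)

def mutated_point(position_at, index, seed, max_offset):
    """Return (lower, position, upper) for the point at a linear index."""
    def moved(i):
        return position_at(i) + offset_for(seed, i, max_offset)

    here = moved(index)
    lower = (moved(index - 1) + here) / 2.0  # shared with the previous point
    upper = (here + moved(index + 1)) / 2.0  # shared with the next point
    return lower, here, upper

def line(i):
    """Unmutated positions 0.0, 1.0, 2.0, ..."""
    return 1.0 * i

p5 = mutated_point(line, 5, seed=1, max_offset=0.1)
p4 = mutated_point(line, 4, seed=1, max_offset=0.1)
print(p4, p5)
```

Asking for point 5 before point 4 gives the same values as asking in order, and the upper bound of one point matches the lower bound of the next.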

Changes the way generators work and hence the interface of most of the project. Points from generators are now calculated up front as numpy arrays, allowing vectorised operations to be performed when applying excluders to filter points. This allows us to answer questions regarding the size and dimensions of scans without having to generate all the point objects (which can be very slow and expensive) in exchange for having to hold large-ish mask arrays in memory.

c-mita added 13 commits November 9, 2016 12:03

Add means of pre-calculating value arrays in generators

0db41f7

Adjust SectorROI contains_point calculation

f861456

Fix PointROI contains_point bug

4d77d8e

Used to test both axes against the x-axis, not comparing the y-axis

Add mask generation for points in regions

2ba7de3

Rewrite compound generator to handle new point generation mechanism

9b026a4

Handle generators that alternate directions in compound generator

4310c6a

This proved to be tricky. The code for compound generator has gotten quite complicated.

Call mutators in CompoundGenerator.iterator

ed7f078

This functionality went AWOL briefly during the rewrite.

Rewrite tests for CompoundGenerator after its big change

c428a68

Fix up other tests after Generator changes

8739983

Add time-sensitive test for compound generator for a large scan.

ae15b8d

Tests that point preparation for ~100 million points (before region filtering) happens within a few seconds.

Rename internal index dict to dimension in compound generator

0082b8f

Done to remove confusion when we start adding dataset indexes to points.

Add dimension indexes to points in CompoundGenerator

88d745d

c-mita force-pushed the generation_rewrite branch from 93f356d to 88d745d Compare November 14, 2016 11:49

c-mita added 2 commits November 15, 2016 10:20

Use scisoftpy in Jython instead of numpy

deae585

Allows the same numpy-like arrays to be used in Jython. Not everything is perfectly implemented and it's not as performant, but at least it'll work.

Use in-place array operators in roi mask calculations

f2bf5b9

It makes the code less clear, but saves on memory as new return arrays do not have to be created. This is particularly significant when it comes to Jython as the JVM may be memory constrained.
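To illustrate the trade-off, here is a sketch of a circular mask written with in-place operators, assuming numpy. `circle_mask_in_place` is a hypothetical name and the commit's actual code is not shown in this thread. Two scratch arrays are allocated once and then reused, instead of creating a new temporary for every arithmetic step.

```python
import numpy as np

def circle_mask_in_place(x, y, centre, radius):
    """Circle mask that reuses two scratch arrays for all intermediate steps."""
    dx = np.subtract(x, centre[0])  # first scratch array; x itself is untouched
    dy = np.subtract(y, centre[1])  # second scratch array; y itself is untouched
    dx *= dx                        # dx now holds (x - cx)**2
    dy *= dy                        # dy now holds (y - cy)**2
    dx += dy                        # dx now holds the squared distance
    return dx <= radius * radius

x = np.array([0.0, 1.0, 2.0, 0.5])
y = np.array([0.0, 0.0, 0.0, 0.5])
mask = circle_mask_in_place(x, y, centre=(0.0, 0.0), radius=1.0)
print(mask)
```

The one-line form `(x - cx)**2 + (y - cy)**2 <= r**2` is easier to read but allocates several more full-size temporaries, which matters for arrays with millions of entries.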

c-mita force-pushed the generation_rewrite branch from f13e7fb to f2bf5b9 Compare November 15, 2016 12:58

Permit omission of the alternate setting on outermost generator

f6938bf

c-mita force-pushed the generation_rewrite branch 2 times, most recently from 187a531 to 6b24fa9 Compare November 21, 2016 14:48

c-mita added 2 commits November 21, 2016 15:09

Change RectangularROI to include points on both sides of boundary

7aabf9f

i.e. "x >= 0 && x < size" becomes "x >= 0 && x <= size"
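In mask form the inclusive test might look like the sketch below, assuming numpy and an unrotated rectangle. `rectangle_mask` and its parameters are hypothetical, not the RectangularROI interface.

```python
import numpy as np

def rectangle_mask(x, y, start, width, height):
    """True where (x, y) lies inside the rectangle, edges included."""
    x_ok = (x >= start[0]) & (x <= start[0] + width)
    y_ok = (y >= start[1]) & (y <= start[1] + height)
    return x_ok & y_ok

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 0.0])
mask = rectangle_mask(x, y, start=(0.0, 0.0), width=2.0, height=1.0)
print(mask)
```

With the inclusive comparison, a line of points that ends exactly on the far edge of the rectangle keeps its last point.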

Do not merge dimensions with rectangular regions over line generators

aa400d9

This changes the alternating case slightly (may start in a different direction)

c-mita force-pushed the generation_rewrite branch from 6b24fa9 to aa400d9 Compare November 21, 2016 15:09

Handle single point case in LineGenerator

831dbd7

GDYendell reviewed Nov 22, 2016

c-mita added 6 commits December 5, 2016 14:34

Fix LineGenerator point calculation to ensure float division

5de5594

Add produce_points to Generator and stop CompoundGenerator deriving it.

8ed4063

CompoundGenerator is now very different to other Generators. The name "Generator" may now be inappropriate for "regular" generators.

Merge remote-tracking branch 'origin/master' into generation_rewrite

dc05577

Remove ArrayGenerator

903d2ad

Only apply bounds to innermost generator in CompoundGenerator

1460605

thomascobb reviewed Dec 14, 2016

c-mita added 5 commits December 15, 2016 15:07

Rename generator.points to generator.positions

a142209

Reset internal state in CompoundGenerator.prepare

6235893

Prevents weird inconsistencies when prepare is called multiple times (e.g. when passed repeatedly to plot_generator)

Force integer divisions to actually return integers

e6aadbb

In Python3 3.0 // 2 returns 1.0, not 1
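A small illustration of the problem: floor division with a float operand yields a float, which cannot be used as a sequence index until it is converted. The variable names are only for the example.

```python
values = [10, 20, 30]

half = 3.0 // 2  # floor division keeps the float type: 1.0
try:
    values[half]
    failed = False
except TypeError:
    failed = True  # a float is rejected as a list index

index = int(3.0 // 2)  # explicit conversion gives a usable index
print(half, failed, values[index])
```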

c-mita force-pushed the generation_rewrite branch from 5df47c3 to 3d490e7 Compare December 21, 2016 16:43

c-mita added 3 commits January 9, 2017 16:09

Fix documentation following changes to generator point production.

2501c05

Remove redundant attributes in generator __init__'s

c4031a8

c-mita force-pushed the generation_rewrite branch from 36175e7 to c4031a8 Compare February 20, 2017 10:36

c-mita added 7 commits February 20, 2017 13:28

Move Dimension from compoundgenerator.py to new dimension.py

daf887f

It is now expected for this class to become public API (after some changes that are not part of this commit)

Rename generator.num to generator.size

16b1d28

Fix spelling of indices in compoundgenerator.py

d1d8465

Make CompoundGenerator.prepare a no-op on successive calls

58b441b

Add test class for Dimension

c13849c

Use Dimension size attribute instead of multiplying generator sizes

3f3cd50

Move comment on polygonal roi test

a1775d7

The first line of the comment sometimes replaces the test name during large test runs, which is unhelpful in this case.

c-mita merged commit 868f35e into master Feb 22, 2017

c-mita deleted the generation_rewrite branch March 14, 2017 13:50

c-mita mentioned this pull request Mar 17, 2017

CompoundGenerator performance test #39

Open


