
Generator rewrite #22

Merged
c-mita merged 48 commits into master from generation_rewrite on Feb 22, 2017

Conversation

@c-mita
Contributor

@c-mita c-mita commented Nov 14, 2016

Rewrites generators to calculate, for each axis, an array of all positions.

Significant changes required to compound generator to handle this different
process, including some restrictions (regions can only be defined over axes
in consecutive generators, generators connected by regions must have the same
alternate_direction setting).

Documentation still to be updated and the compound generator code wants some
tidying. CompoundGenerator.get_point(..) does not apply mutators as the
interface to mutators has not been changed yet.
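
To illustrate what "an array of all positions for each axis" could look like, here is a minimal sketch. It is not the code from this branch: the function name `line_positions` and the returned dict layout are assumptions made for the example.

```python
import numpy as np

def line_positions(name, start, stop, num):
    """Return {axis name: array of all positions} for one linear axis.

    Hypothetical helper; the real generators store this on the instance.
    """
    if num == 1:
        return {name: np.array([float(start)])}
    step = (stop - start) / float(num - 1)
    # One vectorised expression replaces a per-point Python loop.
    return {name: start + step * np.arange(num)}

positions = line_positions("x", 0.0, 1.0, 5)
```

The whole axis is computed in one numpy expression, so later steps (such as region filtering) can operate on arrays rather than on individual point objects.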

c-mita added 13 commits November 9, 2016 12:03
Used to test both axes against the x-axis, not comparing the y-axis
This proved to be tricky. The code for compound generator has
gotten quite complicated.
This functionality went AWOL briefly during the rewrite.
Tests that point preparation for ~100 million points (before region
filtering) happens within a few seconds.
Nested generators should always alternate back and forth (from their
perspective).

Adds the constraint that any set of flattened axes must all share a
common alternate_direction setting to allow reversal of a full dimension.

Adds alternate_direction to LissajousGenerator.
Done to remove confusion when we start adding dataset indexes to points.
@c-mita c-mita force-pushed the generation_rewrite branch from 93f356d to 88d745d Compare November 14, 2016 11:49
@c-mita
Contributor Author

c-mita commented Nov 14, 2016

@thomascobb - this is mostly for your review, rather than merging at this stage (docs and merge conflicts need sorting out)

Allows the same numpy-like arrays to be used in Jython. Not everything
is perfectly implemented and it's not as performant, but at least it'll
work.
It makes the code less clear, but saves on memory as new return arrays
do not have to be created. This is particularly significant when it
comes to Jython as the JVM may be memory constrained.
@c-mita c-mita force-pushed the generation_rewrite branch from f13e7fb to f2bf5b9 Compare November 15, 2016 12:58
@c-mita c-mita force-pushed the generation_rewrite branch 2 times, most recently from 187a531 to 6b24fa9 Compare November 21, 2016 14:48
i.e. "x >= 0 && x < size" becomes "x >= 0 && x <= size"
This changes the alternating case slightly (may start in a different
direction)
@c-mita c-mita force-pushed the generation_rewrite branch from 6b24fa9 to aa400d9 Compare November 21, 2016 15:09
Contributor

@GDYendell GDYendell left a comment


The generators and ROI masks all look good. The prepare function in compoundgenerator is pretty huge and hard to follow; it could do with being broken up a bit. Both prepare and get_point would be easier to follow with more descriptive variable names. It might also be worthwhile increasing the Landscape strictness, as I am getting some lint warnings in PyCharm.

Otherwise looks good! Are there any problems getting this to work with GDA?

excluders = list(self.excluders)
generators = list(self.generators)

# special case if we have rectangular regions on line generators
Contributor

Why would we be given a grid with a RectangularROI? Shouldn't they just provide the smaller grid directly?


Apparently the users specify in the GUI a bounding box (grid, circle, polygon), and a fill pattern (raster, spiral, lissajous), and there are multiple bits of GUI code that can do this, so it's better to detect the grid in a rectangle bit here...
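
One way to see why this special case is cheap: an axis-aligned rectangle over two line axes is separable, so each axis can be trimmed on its own and the result is simply a smaller grid. The sketch below assumes an unrotated rectangle; `trim_axis` and the bounds used are hypothetical, and the thread does not say the branch implements it this way.

```python
import numpy as np

def trim_axis(positions, low, high):
    """Keep only the positions of one axis that fall inside [low, high]."""
    return positions[(positions >= low) & (positions <= high)]

x = np.linspace(0.0, 4.0, 5)   # 0, 1, 2, 3, 4
y = np.linspace(0.0, 2.0, 3)   # 0, 1, 2

# Hypothetical axis-aligned rectangle: 1 <= x <= 3 and 0 <= y <= 1.
x_trimmed = trim_axis(x, 1.0, 3.0)
y_trimmed = trim_axis(y, 0.0, 1.0)
```

No 2D mask is needed in this case, because the two trimmed axes still form a full rectangular grid.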

for generator in generators:
    generator.produce_points()
    self.axes_points.update(generator.points)
    self.axes_points_lower.update(generator.points_lower)
Contributor

We only need the bounds for the lowest generator, so there is a lot of extra generation and storing going on here. Is speed still an issue with this code?

- generators.index(gen_2)
if gen_diff < -1 or gen_diff > 1:
raise ValueError(
"Excluders must be defined on axes that are adjacent in " \
Contributor

Is this going to be a problem for some users? I can't remember the use case for trying to ensure we could run Excluders on any pair of axes.


It's not a problem at present, we can revisit it if it's a problem in the future...

if gen_diff == 1:
    gen_1, gen_2 = gen_2, gen_1
    axis_1, axis_2 = axis_2, axis_1
    gen_diff = -1
Contributor

This doesn't seem to be used.

repeat *= len(dim["indicies"])
self.num = repeat
for dim in self.dimensions:
l = len(dim["indicies"])
Contributor

This l looks like a 1; slightly confusing...



#####
# first check if region spans two dimensions - merge if so
Contributor

Aren't they always 2D?

"Generators tied by regions must have the same " \
"alternate_direction setting")
# merge "inner" into "outer"
if dim_diff == -1:
Contributor

... Or is dim_diff == 0 the case for spiral and lissajous where one generator already contains the axes for a region? Does the list of dims have to match the list of excluders?

for dim in self.dimensions:
indicies = dim["indicies"]
i = n // dim["repeat"]
r = i // len(indicies)
Contributor

This isn't used.

r = i // len(indicies)
i %= len(indicies)
k = indicies[i]
dim_reverse = False
Contributor

And this.

@c-mita
Contributor Author

c-mita commented Nov 22, 2016

"Dimension" refers to a "collapsed" set of generators that are connected by regions. So two scannables that form a grid and are then filtered by a (non-rectangular) region will be merged into one "dimension".
Initially one dimension is created for each generator; regions cause them to be merged.

This merging and subsequent mask generation is why there's the restriction on the axes a region can span - I don't know how to expand the mask arrays appropriately otherwise. Perhaps with more time, thought, and whiteboards, the restriction could be lifted.
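
As a rough picture of what merging two axes into one "dimension" and building its mask might involve, here is a small sketch. The circular region, the variable names, and the choice of `np.tile`/`np.repeat` for the expansion are assumptions for illustration, not the branch's actual code.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 3)   # inner axis: -1, 0, 1
y = np.linspace(-1.0, 1.0, 3)   # outer axis: -1, 0, 1

# Expand both axes to the full 3x3 grid, outer axis varying slowest.
x_full = np.tile(x, len(y))
y_full = np.repeat(y, len(x))

# Hypothetical circular region of radius 1 centred on the origin.
mask = x_full ** 2 + y_full ** 2 <= 1.0

# Indices into the merged dimension that survive the region.
indices = np.flatnonzero(mask)
```

The expansion is straightforward when the two axes are adjacent (one repeats, the other tiles). With other generators sitting between them the mask would have to be expanded over those as well, which is plausibly the difficulty the restriction avoids.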

The "special case" for grids with rectangular regions was asked for recently.

As for prepare being long and confusing - yeah, it is. It doubled in length over the course of this pull request.
The main sections that could be broken up are:

  1. Handle special case for line generators with rectangular regions
  2. Create dimensions
  3. Merge dimensions
  4. Create dimension masks

Each of these steps (excluding 1) modifies the "dimension" structure, so merely breaking them into separate functions without addressing that would be even more distasteful. But that is a minor obstacle that can be overcome by rearranging the data structures.

There's more cleanup to be done in addition to the linter stuff (removal of now unused iterators from non-compound generators, contains_point from regions, etc). get_point doesn't apply the effect of mutators either - fine if you only ever generate points via iterator, but it seems silly to prevent random access solutions when the functionality is there.
And obviously, the documentation needs a lot of updating.

The main problem with GDA is the numpy requirement. There is a Jython numpy emulation (in scisoftpy), but there may be complications getting that included, and it doesn't perform as well either. But with that (and some minor changes in GDA) it appears to work.

@GDYendell
Contributor

Will it now be possible to apply mutators before the excluders? Initially we decided that users would have to put up with points being randomly offset outside of the ROI because otherwise we would have to generate the points first. Since we have them all now at the start this could possibly be accounted for.

@thomascobb

We actually want the mutators to be run after the excluders, otherwise we might have a different number of points in each iteration of a ROI'd scan, not useful if we want to run the same scan at different temperatures with different random offsets...
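
A toy example of the ordering argument: if the region filter runs first and the random offset second, every run keeps the same number of points regardless of the seed. The region (x <= 0.55), the offset size, and the function name are all made up for the sketch.

```python
import numpy as np

def scan_points(seed):
    """Filter a 1D line by a region, then apply a random offset."""
    x = np.linspace(0.0, 1.0, 11)
    kept = x[x <= 0.55]                     # excluder step (hypothetical region)
    rng = np.random.RandomState(seed)
    # Mutator step: offsets differ per seed, the point count does not.
    return kept + rng.uniform(-0.05, 0.05, size=kept.shape)

run_a = scan_points(seed=1)
run_b = scan_points(seed=2)
```

Reversing the two steps would let an offset push a point across the region boundary, so the count could change from one run to the next.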


@c-mita
Contributor Author

c-mita commented Nov 22, 2016

> Will it now be possible to apply mutators before the excluders?

I think this change makes that harder, not easier.

The fundamental idea in this pull request is to vectorise the "contains_point" step and delay creation of the Point object until it's actually required.
What happens here is that you don't have "all the points" already - you have a means of only generating the points that are contained in regions based on some large mask arrays and then some weird indexing. You also know exactly how many will be generated ahead of time along each dimension.
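
The "weird indexing" can be pictured as decomposing a linear point number into one index per dimension. The sketch below is an assumption-laden simplification: the index arrays are invented, `dimension_indices` is a hypothetical name, and alternate_direction handling is left out entirely.

```python
import numpy as np

# Surviving indices per dimension, outermost first (hypothetical values).
dim_indices = [np.array([0, 1]),            # e.g. an unmasked 2-point axis
               np.array([1, 3, 4, 5, 7])]   # e.g. a masked 3x3 grid

sizes = [len(ix) for ix in dim_indices]
num_points = int(np.prod(sizes))   # total is known before any Point exists

def dimension_indices(n):
    """Map a linear point number to one index per dimension."""
    result = []
    for ix in reversed(dim_indices):   # innermost dimension varies fastest
        n, i = divmod(n, len(ix))
        result.append(int(ix[i]))
    return result[::-1]
```

Only when a particular point is requested do the per-dimension indices get looked up in the position arrays, which is what delays creation of the Point object.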

CompoundGenerator is now very different to other Generators. The name
"Generator" may now be inappropriate for "regular" generators.
The documentation structure needs to be reviewed, as it doesn't
obviously follow. Some changes may be desirable (particularly wrt
produce_points on Generators as it is side-effecting and doesn't return
anything).

The nature of the index attribute of points from CompoundGenerator is
also not spelled out.
@c-mita
Contributor Author

c-mita commented Dec 6, 2016

@thomascobb At 33 commits this is getting to be a rather silly pull request.

@thomascobb

Well the pull request is called "rewrite"...


@thomascobb thomascobb left a comment


Another partial review, I'll continue another time...

@@ -34,6 +34,9 @@ def contains_point(self, d):


Can we delete contains_point now?

stop = self.stop[axis]
d = stop - start
if self.num == 1:
self.points[axis_name] = np.array([start])

Point taken, let's leave it as it is

@@ -21,7 +21,7 @@ def __init__(self, name, units, start, stop, num, alternate_direction=False):
units (str): The scannable units. E.g. "mm"
start (float/list(float)): The first position to be generated.

Actually, scrap that, let's leave it as it is for now

self.names = names
self.units = units
self.points = None
self.points_lower = None

Where are points_lower and points_upper used?


And could we call them positions rather than points please? Keeps it consistent with their use in Point

yield p
def produce_points(self):
self.points = {}
self.bounds = {}

Can you declare self.bounds in __init__ please

d = self.phase_diff
fx = lambda t: x0 + A * np.sin(a * 2*m.pi * t/self.num + d)
fy = lambda t: y0 + B * np.sin(b * 2*m.pi * t/self.num)
x = fx(np.arange(self.num))

Now you've put it like this, can we push some of this to the base class? We could have produce_points() take an array of indexes (either np.arange(self.num) or np.arange(self.num + 1) - 0.5) and return the points or the bounds array. This would remove duplication in the Generators, which is good because we may want to write lots of them...


Maybe the base class could look like this:

positions = None
bounds = None

def prepare_array(self, index_array):
    raise NotImplementedError()

def prepare_positions(self):
    self.positions = self.prepare_array(np.arange(self.num))

def prepare_bounds(self):
    self.bounds = self.prepare_array(np.arange(self.num + 1) - 0.5)

Then the majority of generators only have to implement prepare_array()
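
To make the proposal concrete, here is a sketch of the base class above together with one subclass. `SimpleLine` is a hypothetical single-axis generator written only for this example; it is not the project's LineGenerator.

```python
import numpy as np

class Generator(object):
    positions = None
    bounds = None

    def prepare_array(self, index_array):
        raise NotImplementedError()

    def prepare_positions(self):
        self.positions = self.prepare_array(np.arange(self.num))

    def prepare_bounds(self):
        # Half-integer indices give the boundaries between points.
        self.bounds = self.prepare_array(np.arange(self.num + 1) - 0.5)

class SimpleLine(Generator):
    """Hypothetical single-axis line, for illustration only."""

    def __init__(self, start, stop, num):
        self.start, self.stop, self.num = start, stop, num

    def prepare_array(self, index_array):
        step = (self.stop - self.start) / float(max(self.num - 1, 1))
        return self.start + step * index_array

line = SimpleLine(0.0, 4.0, 5)
line.prepare_positions()
line.prepare_bounds()
```

Because positions and bounds come from the same function evaluated at different indices, the bounds array has one more element than the positions array and each point sits midway (in index space) between its two bounds.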

Contributor Author

Seems like a reasonable abstraction - line generator is the only one that doesn't currently work that way but it should be an easy change.


for excluder in excluders:
axis_1, axis_2 = excluder.scannables
gen_1 = [g for g in generators if axis_1 in g.axes][0]

This code appears a lot, should we have a find_generator(axis_name) function?
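
A possible shape for such a helper is sketched below. The signature (taking the generator list explicitly) and the error message are assumptions; `FakeGenerator` is a stand-in that only provides the `axes` attribute used by the lookup.

```python
class FakeGenerator(object):
    """Stand-in with only an `axes` attribute, for illustration."""

    def __init__(self, axes):
        self.axes = axes

def find_generator(generators, axis_name):
    """Return the first generator that scans the given axis."""
    for g in generators:
        if axis_name in g.axes:
            return g
    # The list-comprehension-plus-[0] form would raise IndexError here.
    raise ValueError("No generator found for axis %r" % axis_name)

gens = [FakeGenerator(["y"]), FakeGenerator(["x"])]
```

Besides removing duplication, a named helper gives one place to report a clear error when an excluder refers to an axis that no generator provides.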


for point in iterator:
yield point
it = (self.get_point(n) for n in range_(self.num))
for m in self.mutators:

Mutators need to be applied in get_point(), not in iterator()


I guess that means Mutator.mutate(iterator) should turn into Mutator.mutate(point, point_number)
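
A sketch of how that interface change could look from the caller's side follows. `Point` here is a minimal stand-in, `ShiftMutator` is a made-up mutator, and the body of `get_point` is a placeholder for the real lookup; only the `mutate(point, point_number)` call shape comes from the comment above.

```python
class Point(object):
    """Minimal stand-in for the project's Point class."""

    def __init__(self, positions):
        self.positions = dict(positions)

class ShiftMutator(object):
    """Hypothetical mutator that shifts every axis by a fixed amount."""

    def __init__(self, shift):
        self.shift = shift

    def mutate(self, point, point_number):
        for axis in point.positions:
            point.positions[axis] += self.shift
        return point

def get_point(n, mutators):
    point = Point({"x": float(n)})   # placeholder for the real lookup
    for m in mutators:
        point = m.mutate(point, n)   # per-point call, no iterator needed
    return point
```

With a per-point `mutate`, random access through `get_point` and sequential access through an iterator would both go through the same code path.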

generators = []
for generator in d['generators']:
generators.append(Generator.from_dict(generator))
class Dimension(object):

Can this go in dimension.py please?

Replaces generator.produce_points with generator.prepare_bounds and
prepare_positions that call into the "virtual" method (implemented by
classes deriving generator) prepare_array that accepts an index array
used to produce points.
Prevents weird inconsistencies when prepare is called multiple times
(e.g. when passed repeatedly to plot_generator)
In Python3 3.0 // 2 returns 1.0, not 1
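
A small illustration of the pitfall: floor division with a float operand yields a float, so a result that is later used as an index needs an explicit conversion. The variable names are only for the example.

```python
half = 3.0 // 2          # float floor division gives a float
index = int(3.0 // 2)    # explicit conversion gives an int usable as an index
values = ["a", "b", "c"]
chosen = values[index]
```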
Requires a rewrite of RandomOffsetMutator, which currently does not
alter the bounds of a point. Doing so would require rethinking mutators
a little bit so the bound information could be updated.
The random offset generation is now required to be deterministic based
on the points passed (since the points can now be generated in a random
order) and must be fully consistent.

Fixes CompoundGenerator to call mutators in get_point.
@c-mita c-mita force-pushed the generation_rewrite branch from 5df47c3 to 3d490e7 Compare December 21, 2016 16:43
c-mita added 3 commits January 9, 2017 16:09
RandomOffsetMutator can now adjust a point's boundaries consistently
(once again), but this requires passing the linear index for a point to
the mutate method (due to the potential for random access).

The Random class is not being used at the moment.
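
One way to get offsets that are deterministic per point, regardless of the order in which points are requested, is to derive the random state from the linear index. The sketch below uses the standard library's `random.Random` and an arbitrary seed-mixing constant; this is a swapped-in technique for illustration and not necessarily how RandomOffsetMutator does it.

```python
import random

def offset_for(seed, point_number, max_offset):
    """Offset that depends only on (seed, point_number), not on call order."""
    # Assumed mixing scheme: any injective-enough combination would do.
    rng = random.Random(seed * 1000003 + point_number)
    return rng.uniform(-max_offset, max_offset)

in_order = [offset_for(1, n, 0.1) for n in (0, 1, 2)]
shuffled = {n: offset_for(1, n, 0.1) for n in (2, 0, 1)}
```

Since a neighbouring point's offset can be recomputed from its index alone, a shared boundary between two points could in principle be derived the same way from either side.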
@c-mita c-mita force-pushed the generation_rewrite branch from 36175e7 to c4031a8 Compare February 20, 2017 10:36
@c-mita c-mita merged commit 868f35e into master Feb 22, 2017
c-mita added a commit that referenced this pull request Feb 22, 2017
Changes the way generators work and hence the interface of most of the project.
Points from generators are now calculated at the start as a numpy array,
allowing vectorised operations to be performed when applying excluders to
filter points.

This allows us to answer questions regarding the size and dimensions of scans
without having to generate all the point objects (which can be very slow and
expensive) in exchange for having to hold large-ish mask arrays in memory.
@c-mita c-mita deleted the generation_rewrite branch March 14, 2017 13:50
3 participants