Skip to content

Conversation

@ctSkennerton
Copy link
Contributor

Hi,

Would you be interested in having a few extra basic statistics measures added into Polymath?

Best,
Connor

sum := 0.

self do: [ :i | i < 0
ifTrue: [Error new signal: 'The harmonic mean cannot be calculated on negative numbers'] ifFalse: [ sum := (sum + (1 / i))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a guard clause.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this to use a guard clause.

@@ -0,0 +1,37 @@
Extension { #name : #Collection }

{ #category : #'*Math-Statistics' }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeStinckwich I suggested that geometric and harmonic mean computation should be in our Statistics package after some quick research on Wikipedia. Would you agree? Are there alternatives?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is an arithmetic-geometric mean already implemented in ArbitraryPrecisionFloat.
This is absolutely not specific to statistics package.
Eventually, for small utilities like that, we can tolerate duplications, or we may extract those common utilities in a specific package...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the agm: is implemented in ArbitraryPrecisionFloat, my understanding is that you want this generalized to a collection, right? In this case, there is no conflict, just two implementations.

Copy link
Contributor

@nicolas-cellier-aka-nice nicolas-cellier-aka-nice Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, case of miss-reading, you don't want an arithmetic-geometric mean, you just want indepedent arithmetic, geometric, harmonic means? Sorry for the noise.

Also add in a test to make sure that passing in a negative number causes an error.
"{ 4. 1. 1 / 32} geometricMean >>> (1/2)"
"#(3.14 1 4.56 0.333) geometricMean >>> 1.4776945822943937"

^(self reduce: [ :a :b | a * b ] ) raisedToFraction: 1 / self size.
Copy link
Contributor

@nicolas-cellier-aka-nice nicolas-cellier-aka-nice Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this may result in Float overflow/underflow on large collections.
An alternative is to sum the logarithms. (a*b*c)^(1/3)=exp(ln(a*b*c))^(1/3)=exp((ln(a)+ln(b)+ln(c))/3)
That is something like ^((self collect: #ln) sum / self size) exp.
The price is that we may answer an inexact Float instead of an exact Integer in some edge cases.
For example{1/3. 2/3. 8/3. 16/3} geometricMean would answer 1.3333333333333333 instead of (4/3)...
Also note that the geometricMean may not be well defined for negative numbers, likewise the harmonicMean...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I have updated the implementation with your suggestion and added a guard clause similar to harmonicMean.

* Add in guard clause to prevent use with negative numbers as the behaviour is not well defined.
* Prevent a potential float under/overflow by summing the logarithms as suggested by Nicolas Cellier.
@hemalvarambhia
Copy link
Contributor

hemalvarambhia commented Jan 23, 2021

Wondering whether this introduction of geometric and harmonic means should be an extension to the Collection class. Here I plead ignorance. Could an expert enlighten me? For example, if we have a collection of strings, does a geometric mean make sense?

As an alternative, my thinking is we introduce the idea of a Statistical Sample that is backed by a collection as it makes the concept explicit Thoughts?

@ctSkennerton
Copy link
Contributor Author

I agree, it only makes sense if the collection contains a numeric type. When I was implementing these I took inspiration from the statistical methods already defined on collections that similarly only work if they contain numbers. I'm new to smalltalk so I don't know what's considered good style in this case.

@SergeStinckwich SergeStinckwich added this to the v1.0.3 milestone Feb 11, 2021
@hemalvarambhia
Copy link
Contributor

@ctSkennerton hi there. Could you try to extract a Statistical Sample class as a refactor. I'd love to see its impact on the code

"#(2.5, 3, 10) harmonicMean >>> 3.6"
| sum |

self detect: [ :i | i < 0 ] ifFound: [ Error new signal: 'The harmonic mean cannot be calculated on negative numbers' ] .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe test i <= 0 here too

"#(3.14 1 4.56 0.333) geometricMean >>> 1.4776945822943937"
"{1/3. 2/3. 8/3. 16/3} geometricMean >>> 1.3333333333333335"

self detect: [ :i | i < 0 ] ifFound: [ Error new signal: 'The geometric mean should not be calculated on negative numbers' ] .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe test i <= 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I've changed the logic to be <=0

@ctSkennerton
Copy link
Contributor Author

@hemalvarambhia Maybe I can sketch out what a StatisticalSample class would be to see if I understand correctly. I'm thinking there would have a number of class methods for the various operations that take a collection as an input parameter.

StatisticalSample class >>> harmonicMean: aCollection
StatisticalSample class >>> geometricMean: aCollection
StatisticalSample class >>> mode: aCollection

(This could then be further developed with other measures as needed)

What do you think?

@hemalvarambhia
Copy link
Contributor

@hemalvarambhia Maybe I can sketch out what a StatisticalSample class would be to see if I understand correctly. I'm thinking there would have a number of class methods for the various operations that take a collection as an input parameter.

StatisticalSample class >>> harmonicMean: aCollection
StatisticalSample class >>> geometricMean: aCollection
StatisticalSample class >>> mode: aCollection

(This could then be further developed with other measures as needed)

What do you think?

You've got the idea. My thinking is that the methods would be instance ones, with the collection being a state variable.

@ctSkennerton
Copy link
Contributor Author

Ok, I've converted the code to use a new class PMStatisticalSample

@hemalvarambhia
Copy link
Contributor

hemalvarambhia commented Mar 8, 2021

Merge at your leisure, @ctSkennerton. @nicolas-cellier-aka-nice are you happy that your comments have been addressed?

@ctSkennerton
Copy link
Contributor Author

@hemalvarambhia I don't have permission to merge.

@hemalvarambhia hemalvarambhia merged commit 28ed5fa into PolyMathOrg:master Mar 8, 2021
@hemalvarambhia
Copy link
Contributor

@ctSkennerton merged. Thank you for contributing. Looking forward to your next one. :)

@hemalvarambhia
Copy link
Contributor

@ctSkennerton love the documentation by the way. Very thorough!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants