WIP: Faster implementation of linreg #16545
I have changed the implementation of linreg following the discussion with @andreasnoack on julia-users. I would appreciate any feedback on the implementation, including the general preference for the return type. Currently I am returning an Array, but @andreasnoack has indicated that a Tuple might be preferred, both for a potential speed increase (which I currently can't find evidence of) and as a more natural output, since the user might want to use only one of the two returned coefficients. I like being able to do vector arithmetic on the output, but the longer I think about it, the more agnostic I am becoming. I have also tried to ensure that this pull request is rebased and follows all the other good git etiquette, but I am not sure I have done it correctly. Tips and feedback appreciated.
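To make the Array versus Tuple trade-off concrete, here is a minimal sketch. The two functions are hypothetical stand-ins for the candidate return types, not the `linreg` from this pull request, and the snippet assumes a recent Julia version.

```julia
# Hypothetical stand-ins for the two candidate return types discussed above;
# neither is the actual `linreg` from this pull request.
coeffs_as_array() = [1.0, 2.0]   # Vector{Float64}
coeffs_as_tuple() = (1.0, 2.0)   # Tuple{Float64, Float64}

# Destructuring works the same way for both return types.
a1, b1 = coeffs_as_array()
a2, b2 = coeffs_as_tuple()

# Picking a single coefficient is also identical.
slope_from_array = coeffs_as_array()[2]
slope_from_tuple = coeffs_as_tuple()[2]

# Vector arithmetic is direct on the array; a tuple can be collected first.
doubled_array = 2 * coeffs_as_array()
doubled_tuple = 2 * collect(coeffs_as_tuple())
```

Either return type supports unpacking into two names, so the main practical difference for callers is whether arithmetic on the whole result needs an explicit `collect`.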
Thanks for the non- The other reason is that
I can easily move back to a Wouldn't having to use two functions for this lead to having the extra traversal of Your point about the type promotion is very interesting. I hadn't thought of that. I am struck by the difficulty of doing this correctly, especially if, like both my proposals allow for using Finally do you have any final insight on why the
@@ -24,3 +24,4 @@
*.ji

.DS_Store
get_toolchain.log
I don't think this should be in here.
I agree. I cannot for the life of me figure out how to remove that file from the pull request (or requests, as it seems to have combined both). I really just want the changes to `linalg/generics.jl` and `tests/linalg/generics.jl`. But man, every time I try to follow the guides on rebasing or amending ... it nukes everything. Hardest 10 line change to code I have had to do. I might be too dumb to use git :(
I'm not opposed to adding this file to gitignore, it comes from some windows-specific setup scripts. It could maybe be a separate commit or PR but rebasing it out can be slightly complicated.
So so complicated. I was about to give up programming and go back to a slide rule...
Okay. I have cleaned up all the nonsense from the previous pulls. No more merge garbage, or messing with the upstream .gitignore. I will wait a bit to see if there are any more opinions on the two versions of the code. If not, I will make the changes suggested by @andreasnoack, as he has a better feeling for the statistical functions in Base.
Output:

- `[b0, b1]` - the coefficients for the line of best fit such that yhat = b0 + b1*x
- `(a, b)` - the coefficients for the line of best fit satisfying a + b*x
I actually prefer the previous wording and use of `b0` and `b1`. If it were me, I would just change the square brackets to parentheses and leave the description unchanged.
I am happy to use either; let's see if there is any consensus. I changed this so that it was consistent with the example given in the `Base.rst` file. The change to parentheses is meant to match the code soon; I just can't figure out how to use `@test_approx_equal_eps` with a tuple return instead of the array...
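One way to compare a tuple result against reference values within a tolerance is to check it element by element. The sketch below is not the test added in this pull request: it assumes a recent Julia version with the `Test` standard library and uses `isapprox` rather than the `@test_approx_equal_eps` macro mentioned above. The values of `result`, `expected`, and `tol` are made up for illustration.

```julia
using Test

# Hypothetical result and reference values, for illustration only.
result = (1.0000001, 1.9999999)
expected = (1.0, 2.0)
tol = 1e-6

# Compare the tuples element by element with an absolute tolerance.
@test all(map((r, e) -> isapprox(r, e; atol=tol), result, expected))
```

Mapping over both tuples keeps the tuple return type intact, so the test does not need the function to return an array.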
LGTM. Do you consider it ready? If so, then please squash the commits (it might be a little fight with git the first time you do it, but you'll have to go through this process sooner or later).
Why not squash with the GitHub UI on merge?
Is this docstring intended to replace the old one (in
@KristofferC I didn't know about that option. So if an owner merges the PR, he can do the squash?
Yes, when you press merge you can choose to squash it (and edit the commit message if you choose). It will also avoid the merge commit.
See https://github.com/blog/2141-squash-your-commits - when you click "Merge pull request," but before you click "Confirm merge," there's an option you can select. It's sticky though, so keep an eye on it since you might not want to squash commits from every single PR, for those that are broken into good separate (all passing) pieces that may be worth bisecting within later.
@tkelman There was no docstring before, but there is an entry in @andreasnoack I am happy to try and figure out how to do the squash if that is preferred ... learned more about git than I ever wanted in the last couple of days ...
The contents of So I'd recommend deleting the one from
@tkelman okay I have merged the docstring and the information in @andreasnoack fixing up the docstrings I see the case for weighted
ref #15136 for the most recent discussion on docstring guidelines
I'd forgotten the weighted version. Actually, it shouldn't be here. We have moved all the weighted statistics functions to
@tkelman I am not sure what is happening but the @andreasnoack I have removed the weighted version, once I get the documentation stuff fixed up I will open a pull request in
Okay I believe everything is good to go! Let me know if all the recent changes look good and I can figure out how to squash the commits.
There's also
I always do
So I have my local copy of the branch rebased and squashed. But I don't understand how to push it over this current pull request. I tried, but I get strange error messages about the name of the upstream branch. Am I missing something simple?
You need to force push over the
OMG I hate git. Disregard all the commit noise ...
Okay ready to rock.
"""
linreg(x, y)

Perform simple linear regression using Ordinary Least Squares. Return `a` and `b` such
Returns? Actually, never mind; I see the guidelines I linked to earlier recommend "Return"...
Return a tuple instead of an array

Updated and added tests to deal with the tuple return change and argument types.
Ensure new linreg docstring plays nice with doc system
Moved documentation out of `helpdb` into docstring.
Remove weighted linreg from Base.

Thanks for the contribution.
Move from using the general matrix division operator to an explicit loop to calculate the OLS without extra memory allocation.
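
The idea in this commit message can be illustrated with a small sketch. This is not the code merged in the pull request: `linreg_loop` is a hypothetical name, and the two-pass centered formulation is an assumption about one reasonable way to compute the OLS coefficients in a loop without building a matrix for the division operator. It assumes a recent Julia version.

```julia
# Hypothetical sketch of a loop-based simple linear regression.
# Returns the tuple (a, b) such that a + b*x is the least squares fit to y.
function linreg_loop(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    n == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    n > 1 || throw(ArgumentError("need at least two observations"))

    # First pass: means of x and y.
    mx = sum(x) / n
    my = sum(y) / n

    # Second pass: accumulate centered sums in scalars, with no temporary arrays.
    sxx = zero(mx)
    sxy = zero(mx * my)
    for (xi, yi) in zip(x, y)
        dx = xi - mx
        sxx += dx * dx
        sxy += dx * (yi - my)
    end

    b = sxy / sxx      # slope
    a = my - b * mx    # intercept
    return (a, b)
end

# Usage: data generated from y = 1 + 2x.
a, b = linreg_loop([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Centering on the means before accumulating is a design choice made here for numerical stability; a single-pass version that accumulates raw sums is also possible. The sketch returns a tuple, which matches the return type this thread settled on.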