Skip to content

copy shared columns before setkey#3768

Merged
mattdowle merged 10 commits intomasterfrom
setkey_copies
Aug 27, 2019
Merged

copy shared columns before setkey#3768
mattdowle merged 10 commits intomasterfrom
setkey_copies

Conversation

@MichaelChirico
Copy link
Copy Markdown
Member

@MichaelChirico MichaelChirico commented Aug 15, 2019

Closes #3496
Closes #3766

See also: follow-up @ #3791

@codecov
Copy link
Copy Markdown

codecov bot commented Aug 15, 2019

Codecov Report

Merging #3768 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3768      +/-   ##
==========================================
+ Coverage   99.41%   99.41%   +<.01%     
==========================================
  Files          71       71              
  Lines       13197    13224      +27     
==========================================
+ Hits        13120    13147      +27     
  Misses         77       77
Impacted Files Coverage Δ
R/setkey.R 100% <100%> (ø) ⬆️
src/utils.c 100% <100%> (ø) ⬆️
R/data.table.R 100% <100%> (ø) ⬆️
src/reorder.c 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81af9b6...f7031ba. Read the comment docs.

@MichaelChirico
Copy link
Copy Markdown
Member Author

Wasn't sure about the changes to data.table.R, but incidentally, they fix #2245. So I'll leave those changes & add a test to close that.

@MichaelChirico MichaelChirico changed the title Closes #3496 and #3766 -- detect & block setkey from double-reordering identical columns Closes #3496, #3766, and #2245 -- detect & block setkey from double-reordering identical columns Aug 18, 2019
Copy link
Copy Markdown
Member

@jangorecki jangorecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe we can try revert changes to data.table.R and see if it will still do what is needed?

@mattdowle mattdowle added this to the 1.12.4 milestone Aug 26, 2019
…ause. Faster too by avoiding address() at R level which creates character strings for addresses and avoiding hash table of duplicated(); only noticeable for very many columns.
@mattdowle mattdowle changed the title Closes #3496, #3766, and #2245 -- detect & block setkey from double-reordering identical columns copy shared columns before setkey Aug 27, 2019
@mattdowle
Copy link
Copy Markdown
Member

Great fix. I just followed up by creating copySharedColumns at C level. Should be faster when there are many columns by avoiding creating a character vector of each column's address to be passed to duplicate() which then creates a hash table on the character strings. We can use this copySharedColumns at other places at C level too; e.g. in :=. Since I guess there's also a problem with using := on a shared memory column (both columns get updated)?
As you wrote in the heading comment, it should be fine to return shared memory columns from j. So yes the data.table.R changes in this PR (other than the part which fixes .SD lock) could be reverted as you said.

return ans;
}

void copySharedColumns(SEXP x) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@MichaelChirico
Copy link
Copy Markdown
Member Author

OK. Would have to sit back down with this to figure which part exactly is needed to solve #2245...

@MichaelChirico
Copy link
Copy Markdown
Member Author

actually I think what makes sense is to remove the stuff in data.table.R to it's own PR. the issues are really separate now & deserve separate PRs

@renkun-ken
Copy link
Copy Markdown
Member

I notice that I raised #2162 long time ago which is a similar but different case. I guess it's impossible to also address that at the moment?

@MichaelChirico
Copy link
Copy Markdown
Member Author

Thanks Kun. Your issue in fact looks the same and should be solved by this PR.

@mattdowle mattdowle merged commit 0f39178 into master Aug 27, 2019
@mattdowle mattdowle deleted the setkey_copies branch August 27, 2019 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect setkey result error using function setkey

4 participants