Skip to content

use lapply in subsetBy #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 23, 2015
Merged

use lapply in subsetBy #63

merged 1 commit into from
Sep 23, 2015

Conversation

sgibb
Copy link
Collaborator

@sgibb sgibb commented Sep 23, 2015

Dear Laurent,

this PR replace the for loop in subsetBy (that contains a growing vector) with a lapply based solution.
This results in a slightly faster subsetting:

library("git2r")
library("devtools")

repo <- repository("MSnbase")

checkout(repo, branch="master")
load_all("MSnbase")
load_all("synapter")

load(file.path(system.file("data", package="synapterdata"), "ups25a.rda"))
ms25a <- as(ups25a, "MSnSet")

system.time({ma <- topN(ms25a, groupBy = fData(ms25a)$protein.Accession, n = 3) })
#   user  system elapsed
#  2.064   0.000   2.066

checkout(repo, branch="lapply")
load_all("MSnbase")
system.time({la <- topN(ms25a, groupBy = fData(ms25a)$protein.Accession, n = 3) })
#    user  system elapsed
#   1.200   0.008   1.207

checkout(repo, branch="split")
load_all("MSnbase")
system.time({sp <- topN(ms25a, groupBy = fData(ms25a)$protein.Accession, n = 3) })
#    user  system elapsed
#   0.692   0.004   0.695

This approach doesn't change anything (all topN tests pass) and has (hopefully) no negative side effects.

As you could see there is another branch split https://github.com/lgatto/MSnbase/blob/split/R/utils.R#L324-L338 that uses a split approach that is even faster.
The split approach returns its results in a different order (order of the levels of group) in contrast to the original one and to the lapply approach (order of appearance in group). That's why some unit tests fail and why I don't offer the split solution as PR. I think I will need your help to ensure everything is correct in the split branch.

Verified

This commit was signed with the committer’s verified signature.
spencerschrock Spencer Schrock
@lgatto
Copy link
Owner

lgatto commented Sep 23, 2015

Great, thanks!
I don't see a reason why split wouldn't be adequate - there is no specification that states the order of the output. Would it be possible to explicitly set the levels to reflect their order of appearance?

lgatto pushed a commit that referenced this pull request Sep 23, 2015

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
use lapply in subsetBy
@lgatto lgatto merged commit 18ab75e into master Sep 23, 2015
@sgibb sgibb deleted the lapply branch September 24, 2015 16:48
@sgibb sgibb mentioned this pull request Sep 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants