-
-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement CLI scorer params #366
Conversation
using index_scorer<Wand>::index_scorer; | ||
|
||
static float doc_term_weight(uint64_t freq, float norm_len) | ||
bm25(const Wand& wdata, const float b, const float k1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering, maybe we can at least put a link reference to some description of BM25 for explanation of b
and k1
. I imagine for someone just coming to see the code, it's pretty confusing.
@@ -16,13 +16,13 @@ template <typename Wand> | |||
struct pl2: public index_scorer<Wand> { | |||
using index_scorer<Wand>::index_scorer; | |||
|
|||
static constexpr float c = 1; | |||
pl2(const Wand& wdata, const float c) : index_scorer<Wand>(wdata), m_c(c) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same idea as for BM25 only for c
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some verbose descriptions, maybe we can clean them up.
include/pisa/scorer/scorer.hpp
Outdated
@@ -11,26 +11,37 @@ | |||
#include "quantized.hpp" | |||
#include "spdlog/spdlog.h" | |||
|
|||
struct ScorerParams { | |||
ScorerParams(std::string s_name = "default") : name(s_name) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments:
- There shouldn't be a default,
"default"
doesn't mean anything does it? I imagine you did that so you can initialize it in CLI with something. I suggest just initialize it toScoreParams("")
there and leave this to require a name. - The constructor should be
explicit
. - You should move the contents of
s_name
:name(std::move(s_name))
. - Maybe let's expand the name of the argument to
scorer_name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks. Why do we need the std::move
in this case? Is there some 'best practice' here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short answer is that you avoid unnecessary copy.
Your constructor gets s_name
as value, so it will be destructed after the constructor goes out of scope. If you don't move, then an additional copy will be made for name
constructor. std::move
effectively moves out any allocated memory to an rvalue reference, while leaving s_name
in undetermined but valid "empty" state.
It's important to note that small strings will be stored on stack because of small string optimization, so chances are it will be byte-copied anyway given our short names, but if that happens, std::move
won't have any disadvantages over a copy.
Another point is that alternatively, we could simply pass by reference to the constructor:
ScorerParams(std::string const& s_name) : name(s_name) {}
This will avoid making an additional copy as well. However, in this case, we'll always be making one copy, while pass-by-value can avoid making any if the argument is an rvalue itself (like a temporary):
ScorerParams params("bm25");
// or
ScorerParams params(std::move(another_variable));
Again, in our case this whole thing is not very important simply because any overhead is negligible here. But as you suggested, this is kind of a best practice pattern when holding member values of non-trivially-copyable types.
Also, I believe clang-tidy will complain :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have just that one cosmetic comment about the documentation.
include/pisa/scorer/bm25.hpp
Outdated
// Implements the Okapi BM25 model. k1 and b are both free parameters which | ||
// alter the weight given to different aspects of the calculation. | ||
// We adopt the defaults recommended by the following resource - A. Trotman, | ||
// X-F. Jia, and M. Crane: "Towards an Efficient and Effective Search Engine," | ||
// in Proceedings of the SIGIR 2012 Workshop on Open Source Information | ||
// Retrieval (OSIR), 2012. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use triple slash ///
to indicate it's a piece of documentation, and remove the empty line between the comment and the struct.
The reason is that if we ever decide to use a API doc generator, such as doxygen, it will be automatically usable.
include/pisa/scorer/dph.hpp
Outdated
// Implements the DPH model. This model is parameter free. | ||
// See the following resource for further information - G. Amati, E. Ambrosi, | ||
// M Bianchi, C Gaibisso, and G Gambosi: "FUB, IASI-CNR and University of Tor | ||
// Vergata at TREC 2007 Blog Track," in Proceedings of the 16th Text REtrieval | ||
// Conference (TREC), 2007. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as bm25.
include/pisa/scorer/pl2.hpp
Outdated
// Implements the PL2 model. c is a free parameter. | ||
// See the following resource for further information - G. Amati: "Probabalistic | ||
// models for information retrieval based on divergence from randomness." PhD | ||
// Thesis, University of Glasgow, 2003. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as bm25.
include/pisa/scorer/qld.hpp
Outdated
// Implements the Query Liklihood model with Dirichlet smoothing. | ||
// This model has a smoothing parameter, mu. | ||
// See the following resources for further information - J. M. Ponte, and | ||
// W. B. Croft: "A Language Modeling Approach to Information Retrieval," in | ||
// Proceedings of SIGIR, 1998. | ||
// Also see: C. Zhai and J. Lafferty: "A Study of Smoothing Methods for Language | ||
// Models Applied to Ad Hoc Information Retrieval," in Proceedings of SIGIR, | ||
// 2001. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as bm25.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic 🎉
Adds command line options for scorers.