-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revised Hash functions incorporating changes in the main Stdlib repository. #573
Conversation
Bringing over codes fro original hash_functions branch [ticket: X]
Brought over codes from the original hash_functions branch. [ticket: X]
Brought over code from original hash_functions [ticket: X]
Modified so it doesn't check overflow of integers. [ticket: X]
Added the gcc compiler flag no-range-checks [ticket: X]
Added files with hash function modules to the list of fypp files to be processed. [ticket: X]
Added hash_functions to the test directories [ticket: X]
Added documentation for the hash functions. [ticket: X]
Removed two filesusedd in an older version of hash validation. [ticket: X]
Added generate_key_array.f90 generate_hash_arrays.f90 and hash_validity_test..f90 to implement a new version of the validity test. [ticket: X]
Modified the Makefile so it would compile the new validation. [ticket: X]
Added README>md that is documentation for the code in the hash function validation directory. [ticket: X]
Implemented fixes to various files addressing problems brought up by Jeremie Vandenplas. [ticket: X]
Change a compile flag for gfortran from -fwrapv to -fno-range-check. [ticket: X]
Changed from parentheses to square brackets. [ticket: X]
Note the previous PR is here (with initial discussions): #554 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the previous pull request, this looks very impressive.
Most of my comments are quite minor.
I appreciate that you've added some regression tests. For now they are outside of the regular stdlib test-suite (due to dependency on C/C++), and I think we should discuss with others whether that is the right approach for now, or whether now is the time to start supporting C/C++ dependencies. I don't have a deep enough knowledge on the software engineering implications to make strong statements about this.
Changed three files "CMakeLists.txt", "stdlib_hash_functions.md", and "stdlib_64_bith_hash_functions.fypp"in response to @gareth-nx's comments [ticket: X]
Incorporated more changes in stdlib_hash_functions.md based on comments by @garreth-nx. [ticket: X]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wclodius2 . Here are some comments on the specs.
In general it is an impressive amount of work, with well detailed documentation.
I will now focus on the code
Renamed stdlib_hash_functions.md to stdlib_hash_procedures.md. [ticket: X]
Renamed stdlib_hash_functions.md to stdlib_hash_proceedures.md effectively deleting it. [ticket: X]
Co-authored-by: Jeremie Vandenplas <[email protected]>
Changed documentation in ways suggested by Jeremie's comments. [ticket: X]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some additional comments
src/stdlib_32_bit_nmhashes.fypp
Outdated
!! The algorithms come in multiple versions, depending on whether the | ||
!! vectorized instructions SSE2 or AVX2 are available. As neither instruction | ||
!! is available in portable Fortran 2008, the algorithms that do not use these | ||
!! instructions are used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for further improvements: would it be possible that the compiler select the right code when e.g., mentioning the option for supporting AVX2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure if a preprocessor can inquire of its existence from the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to preprocessing, could be to use a logical parameter and rely upon the compiler's ability to remove dead code (similar to the little-endian parameter). A second option is to pack the different implementations in submodules, which are then substituted at link time based upon a flag to the build system. But personally I'm happy with the portable version.
interface spooky_init | ||
|
||
module subroutine spookysubhash_init( self, seed ) | ||
type(spooky_subhash), intent(out) :: self | ||
integer(int_hash), intent(in) :: seed(2) | ||
end subroutine spookysubhash_init | ||
|
||
end interface spooky_init | ||
|
||
|
||
interface spooky_update | ||
|
||
module subroutine spookyhash_update( spooky, key ) | ||
type(spooky_subhash), intent(out) :: spooky | ||
integer(int8), intent(in) :: key(0:) | ||
end subroutine spookyhash_update | ||
|
||
end interface spooky_update | ||
|
||
|
||
interface spooky_final | ||
|
||
module subroutine spookyhash_final(spooky, hash_code) | ||
type(spooky_subhash), intent(inout) :: spooky | ||
integer(int_hash), intent(inout) :: hash_code(2) | ||
end subroutine spookyhash_final | ||
|
||
end interface spooky_final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am correct, there are private
interfaces. If so, could they be moved to the submodules? Or what is the reasons to keep them here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the basis of an incremental hasher, and were originally public.I eventually decided that including the incremental hasher complicated the API in ways that were inappropriate for an initial release, though they could be easily be made public.
I am having some problems with git. I accepted a couple of of suggestions of @jvdp1 to modify the
I didn't find the 'Note about fast-forwards' useful so I tried pulling the remote, and ended up pulling https://github.com/fortran-lang/stdlib git pull origin master
hint: Pulling without specifying how to reconcile divergent branches is
hint: discouraged. You can squelch this message by running one of the following
hint: commands sometime before your next pull:
hint:
hint: git config pull.rebase false # merge (the default strategy)
hint: git config pull.rebase true # rebase
hint: git config pull.ff only # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 34 (delta 16), reused 23 (delta 14), pack-reused 0
Unpacking objects: 100% (34/34), 10.37 KiB | 442.00 KiB/s, done.
From https://github.com/fortran-lang/stdlib
* branch master -> FETCH_HEAD
57cfaf0..3af3259 master -> origin/master
Auto-merging src/Makefile.manual
Auto-merging src/CMakeLists.txt
Auto-merging doc/specs/index.md
Auto-merging Makefile.manual
Auto-merging CMakeLists.txt
Merge made by the 'recursive' strategy.
.github/PULL_REQUEST_TEMPLATE.md | 4 +++
.github/workflows/CI.yml | 4 ++-
API-doc-FORD-file.md | 3 ++
CMakeLists.txt | 12 ++++++--
Makefile.manual | 6 ++++
VERSION | 1 +
ci/fpm-deployment.sh | 6 ++++
ci/fpm.toml | 2 +-
doc/specs/index.md | 1 +
doc/specs/stdlib_version.md | 64 +++++++++++++++++++++++++++++++++++++++++++
src/CMakeLists.txt | 4 +++
src/Makefile.manual | 3 +-
src/common.fypp | 3 ++
src/stdlib_version.fypp | 65 ++++++++++++++++++++++++++++++++++++++++++++
14 files changed, 173 insertions(+), 5 deletions(-)
create mode 100644 .github/PULL_REQUEST_TEMPLATE.md
create mode 100644 VERSION
create mode 100644 doc/specs/stdlib_version.md
create mode 100644 src/stdlib_version.fypp So now my local copy is up to date WRT https://github.com/fortran-lang/stdlib, but not WRT https://github.com/wclodius2/stdlib/tree/hash_functions2. Another
Any suggestions? |
After commiting suggestions on github, you must pull the remote branch into your local one as follows: git pull william hash_functions2 If there are no conflicts, then all changes will be merged in your local branch, and you can then push your branch. |
More for consistency with existing specs, so far we were using always lowercase for procedure names. I don't really mind, but this is something that was noticeable on the first glance. |
I opened a PR against the @wclodius2 branch for modifying all names. hopefully I missed none of them... |
Co-authored-by: Jeremie Vandenplas <[email protected]>
Update the description of testing procedures
Changed capitalized words to lowercase + some typos
I have now merged in
The result passes all checks, and I believe is ready to merge into the stdlib master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, it looks great. I'm happy to get it merged.
A final suggestion, we should add the new modules to the CHANGELOG.md
.
Updated CHANGELOG.md to reflect the new modules stdlib_hash_32bit and stdlib_hash_64bit. [ticket: X]
I believe I have properly added the modules to |
I think this is ready to be merged? Any outstanding issues @milancurcic @ivan-pi ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added the names of the public procedures in the CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no major comments anymore. Thanks for all of your work, especially William.
I think I noticed some small formatting issues in the itemized sections of the markdown document, but would need to render it first. These can be dealt with in a separate PR.
@milancurcic I believe all your comments were answered. However, I am afraid I could miss some. Could you check please, and merge this PR if it it the case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you William and reviewers for this great addition!
Numerous changes were made in the main repository to deal with the lack of double precision in gfortran's implementation for the Mac M1 processor, and a few other issues, meant that the hash functions source code needed to be merged with the main repository. This proved overwhelming so I did a clone of the main repository, copied the changes for the hash function codes into that, made further changes in the code in response to comments on the original PR, and generated a new PR. As with the original
I have created a number of hash functions for incorporation in the Fortran standard library by translating from C or C++ a number of algorithms recommended by Reini Urban who maintains a well respected code, SMHasher, for testing the properties and performance of hash functions. The codes can hash default character strings and integer vectors. The hash functions are: