Erratic failure of in-tree build for stdlib_bitset_64 #383

Closed
awvwgk opened this issue Apr 10, 2021 · 5 comments · Fixed by #388
Labels
bug Something isn't working

Comments

@awvwgk
Member

awvwgk commented Apr 10, 2021

This test seems to fail erratically, from time to time, in the in-tree build with GCC 10.2 on Ubuntu 20.04:

  5/31 Test  #4: stdlib_bitset_64 .................***Failed    0.01 sec

Test string operations: from_string, read_bitset, to_string, and write_bitset
 from_string transferred bitstring_0 properly into set0
 from_string transferred bitstring_all properly into set1
 read_bitset_string failed with bitstring_0 as expected.
 read_bitset_string transferred "s33b" // bitstring_0 properly into set3
 read_bitset_string transferred "s33b" // bitstring_all properly into set4.
 to_string properly converted the set0 value
 to_string properly converted the set1 value
 write_bitset_string properly converted the set0 value
 write_bitset_string properly converted the set1 value

 Test bitset I/O: input, read_bitset, output, and write_bitset
 Transfer to and from units using plain write_bitset_unit and read_bitset_unit succeeded.
 Transfer to and from units using write_bitset_unit and read_bitset_unit with advance=="no" succeeded.
 Transfer to and from units using output and input succeeded.
ERROR STOP TEST_IO transfer to and from units using  stream output and input failed.

Error termination. Backtrace:
#0  0x7fd60cd20d3a
#1  0x7fd60cd21849
#2  0x7fd60cd22f77
#3  0x560e72feed04
#4  0x560e72feb522
#5  0x560e72fef8b3
#6  0x7fd60cb350b2
#7  0x560e72fea1fd
#8  0xffffffffffffffff
@awvwgk awvwgk added the bug Something isn't working label Apr 10, 2021
@Romendakil

If that is a test that writes and reads files, this could be an occasional race condition in which the test suite tries to access the same file from two tests at the same time and hiccups.

@awvwgk
Member Author

awvwgk commented Apr 10, 2021

Quite possible; it seems to fail in an I/O operation here.

@wclodius2
Contributor

As the creator of the bitsets modules, it is my responsibility to fix this. FWIW, I have gfortran 10.2 on my Mac and have not seen this problem in my testing, but I do not test as regularly as the test harness does. A few questions:

  1. Does it always fail with the message: "ERROR STOP TEST_IO transfer to and from units using stream output and input failed." or does it sometimes fail with another message?
  2. Do any of the tests use a recent version of ifort or its successor, Intel oneAPI? If they do, do any of these also fail?

I have been busy with other things, but I have noticed comments on other aspects of the bitsets modules and might as well address those while I fix this. First, there seemed to be a desire for user-defined derived-type I/O (UDDTIO). What features did users want in UDDTIO? Second, users seemed to want longer names for some of the derived types, with either _t or _type appended. Was there a consensus on which types should get the longer names and which suffix should be used? Are there any other issues with the modules?

@awvwgk
Member Author

awvwgk commented Apr 11, 2021

So far the failure seems to be limited to the in-tree build with CMake, which we currently run only for GCC 10.2 on Ubuntu 20.04; the out-of-tree builds seem to be fine.

The failure does not always happen at the same place, here is another log:
https://github.com/fortran-lang/stdlib/runs/2315519471#step:13:7658

  Test bitset I/O: input, read_bitset, output, and write_bitset
 Transfer to and from units using plain write_bitset_unit and read_bitset_unit succeeded.
 Transfer to and from units using write_bitset_unit and read_bitset_unit with advance=="no" succeeded.
 Transfer to and from units using output and input succeeded.
STDLIB_BITSETS % INPUT: Failure on a READ statement for UNIT.
ERROR STOP A failure occurred in a READ statement.

Because the failure is erratic, it is somewhat hard to pin down why it happens.

@wclodius2
Contributor

I think I know what the problem is. The two test programs, test_stdlib_bitset_64.f90 and test_stdlib_bitset_large.f90, were derived from one another by cut and paste, so they sometimes use the same file names; they also run in the same directory and may run simultaneously. As a result, they may try to access the same file at the same time. In particular, both reported errors occur while test_stdlib_bitset_64.f90 accesses test.bin, which has the same name in both programs. This should be fixed. However, if this is the problem, I am slightly surprised that it shows up only in test_stdlib_bitset_64.f90 and not also in test_stdlib_bitset_large.f90.

There is a slight chance that the failure is instead caused by reading a file immediately after it has been written and closed, in which case some FLUSH statements may be necessary. However, FLUSH statements would clutter the code and make it harder to read, so I am reluctant to try this fix unless the other fix is unsuccessful.
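A minimal sketch of the renaming fix, assuming each test program gets its own data-file name so that two test executables run in parallel by CTest never open the same file. The file name test_bitset_64.bin and the integer payload are hypothetical stand-ins: the real tests write bitsets through the stdlib_bitsets I/O procedures, which are not reproduced here. The round trip assumes unformatted stream access, in the spirit of the failing "stream output and input" step.

```fortran
! Hypothetical sketch: write a value with stream access, close the file,
! reopen it, and read the value back, using a file name unique to this test.
program stream_roundtrip
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    ! Hypothetical per-test name; test_stdlib_bitset_large.f90 would use a
    ! different one (e.g. derived from its own test name) instead of test.bin.
    character(*), parameter :: filename = 'test_bitset_64.bin'
    integer(int64) :: written, read_back
    integer :: unit, stat

    written = 12345678901_int64

    ! Write the payload with unformatted stream access, then close the file.
    open(newunit=unit, file=filename, access='stream', form='unformatted', &
         status='replace', action='write', iostat=stat)
    if (stat /= 0) error stop 'open for write failed'
    write(unit) written
    close(unit)

    ! Reopen the same file and read the payload back.
    open(newunit=unit, file=filename, access='stream', form='unformatted', &
         status='old', iostat=stat)
    if (stat /= 0) error stop 'open for read failed'
    read(unit, iostat=stat) read_back
    ! Remove the file so nothing is left behind for another test to trip over.
    close(unit, status='delete')
    if (stat /= 0) error stop 'read failed'

    if (read_back /= written) error stop 'stream round trip failed'
    print '(a)', 'stream round trip succeeded'
end program stream_roundtrip
```

Alternatives not discussed in the thread: opening with status='scratch' lets the processor pick a unique, temporary file, and CTest's RESOURCE_LOCK test property can keep the two bitset tests from running at the same time.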

@LKedward LKedward linked a pull request Apr 17, 2021 that will close this issue