
ctest reports Failed for tests that actually pass #461

Closed
milancurcic opened this issue Nov 5, 2017 · 5 comments
@milancurcic

  • OpenCoarrays Version: 1.9.2
  • Fortran Compiler: gfortran-7.2.1
  • C compiler used for building lib: gcc-7.2.1
  • Installation method: From source
  • Output of uname -a: x86_64 GNU/Linux
  • MPI library being used: openmpi-2.0.2-2
  • Machine architecture and number of physical cores: Single VCPU cloud instance of E5-2650 v4
  • Version of CMake: 3.9.1

Observed Behavior

ctest reports:

      Start  1: initialize_mpi
 1/47 Test  #1: initialize_mpi ...................   Passed    0.58 sec
      Start  2: register
 2/47 Test  #2: register .........................   Passed    0.60 sec
      Start  3: register_vector
 3/47 Test  #3: register_vector ..................   Passed    0.61 sec
      Start  4: register_alloc_vector
 4/47 Test  #4: register_alloc_vector ............   Passed    0.64 sec
      Start  5: allocate_as_barrier
 5/47 Test  #5: allocate_as_barrier ..............   Passed    1.66 sec
      Start  6: allocate_as_barrier_proc
 6/47 Test  #6: allocate_as_barrier_proc .........   Passed    1.63 sec
      Start  7: register_alloc_comp_1
 7/47 Test  #7: register_alloc_comp_1 ............   Passed    0.64 sec
      Start  8: register_alloc_comp_2
 8/47 Test  #8: register_alloc_comp_2 ............   Passed    0.64 sec
      Start  9: register_alloc_comp_3
 9/47 Test  #9: register_alloc_comp_3 ............   Passed    0.66 sec
      Start 10: async_comp_alloc
10/47 Test #10: async_comp_alloc .................   Passed    0.68 sec
      Start 11: async_comp_alloc_2
11/47 Test #11: async_comp_alloc_2 ...............   Passed    0.67 sec
      Start 12: comp_allocated_1
12/47 Test #12: comp_allocated_1 .................   Passed    0.72 sec
      Start 13: comp_allocated_2
13/47 Test #13: comp_allocated_2 .................   Passed    0.73 sec
      Start 14: get_array
14/47 Test #14: get_array ........................   Passed  661.22 sec
      Start 15: get_self
15/47 Test #15: get_self .........................   Passed    0.76 sec
      Start 16: send_array
16/47 Test #16: send_array .......................   Passed  661.50 sec
      Start 17: get_with_offset_1d
17/47 Test #17: get_with_offset_1d ...............   Passed    0.65 sec
      Start 18: whole_get_array
18/47 Test #18: whole_get_array ..................   Passed    0.74 sec
      Start 19: strided_get
19/47 Test #19: strided_get ......................   Passed    0.63 sec
      Start 20: strided_sendget
20/47 Test #20: strided_sendget ..................   Passed    0.69 sec
      Start 21: co_sum
21/47 Test #21: co_sum ...........................   Passed    0.60 sec
      Start 22: co_broadcast
22/47 Test #22: co_broadcast .....................   Passed    0.73 sec
      Start 23: co_min
23/47 Test #23: co_min ...........................   Passed    0.60 sec
      Start 24: co_max
24/47 Test #24: co_max ...........................   Passed    0.60 sec
      Start 25: syncall
25/47 Test #25: syncall ..........................   Passed    1.62 sec
      Start 26: syncimages
26/47 Test #26: syncimages .......................   Passed    0.68 sec
      Start 27: syncimages2
27/47 Test #27: syncimages2 ......................   Passed    0.59 sec
      Start 28: duplicate_syncimages
28/47 Test #28: duplicate_syncimages .............   Passed    0.57 sec
      Start 29: co_reduce
29/47 Test #29: co_reduce ........................   Passed    0.59 sec
      Start 30: co_reduce_res_im
30/47 Test #30: co_reduce_res_im .................   Passed    0.64 sec
      Start 31: co_reduce_string
31/47 Test #31: co_reduce_string .................   Passed    0.64 sec
      Start 32: syncimages_status
32/47 Test #32: syncimages_status ................   Passed    0.58 sec
      Start 33: sync_ring_abort_np3
33/47 Test #33: sync_ring_abort_np3 ..............***Failed  Required regular expression not found.Regex=[Test passed.
]  0.57 sec
      Start 34: sync_ring_abort_np7
34/47 Test #34: sync_ring_abort_np7 ..............***Failed  Required regular expression not found.Regex=[Test passed.
]  0.56 sec
      Start 35: simpleatomics
35/47 Test #35: simpleatomics ....................   Passed    0.63 sec
      Start 36: hello_multiverse
36/47 Test #36: hello_multiverse .................   Passed    0.62 sec
      Start 37: coarray_burgers_pde
37/47 Test #37: coarray_burgers_pde ..............   Passed   62.80 sec
      Start 38: co_heat
38/47 Test #38: co_heat ..........................   Passed  239.46 sec
      Start 39: asynchronous_hello_world
39/47 Test #39: asynchronous_hello_world .........   Passed    0.77 sec
      Start 40: source-alloc-no-sync
40/47 Test #40: source-alloc-no-sync .............   Passed    0.62 sec
      Start 41: allocatable_p2p_event_post
41/47 Test #41: allocatable_p2p_event_post .......***Failed  Required regular expression not found.Regex=[Test passed.
]  0.54 sec
      Start 42: static_event_post_issue_293
42/47 Test #42: static_event_post_issue_293 ......***Failed  Required regular expression not found.Regex=[Test passed.
]  0.59 sec
      Start 43: co_reduce-factorial
43/47 Test #43: co_reduce-factorial ..............   Passed    0.63 sec
      Start 44: co_reduce-factorial-int8
44/47 Test #44: co_reduce-factorial-int8 .........   Passed    0.62 sec
      Start 45: co_reduce-factorial-int64
45/47 Test #45: co_reduce-factorial-int64 ........   Passed    0.63 sec
      Start 46: image_status_test_1
46/47 Test #46: image_status_test_1 ..............   Passed    0.58 sec
      Start 47: test-installation-scripts.sh
47/47 Test #47: test-installation-scripts.sh .....   Passed    0.35 sec

91% tests passed, 4 tests failed out of 47

Total Test time (real) = 1655.12 sec

The following tests FAILED:
         33 - sync_ring_abort_np3 (Failed)
         34 - sync_ring_abort_np7 (Failed)
         41 - allocatable_p2p_event_post (Failed)
         42 - static_event_post_issue_293 (Failed)
Errors while running CTest

Another potential issue is the run-time of tests 14 and 16 (11 minutes). I am not sure whether this is expected or not. Ideally unit tests should each be short (O(seconds)).

Expected Behavior

ctest should report that all tests succeeded, because tests 33, 34, 41, 42 actually pass when executed directly with cafrun.
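For context on the failure mode: CTest's `PASS_REGULAR_EXPRESSION` property marks a test as passed only when the regex (here, `Test passed.`) matches the test's captured stdout. The mechanics can be sketched with grep (a simulation only; the sample strings are illustrative, not captured from the real test binaries):

```shell
# Simulate CTest's PASS_REGULAR_EXPRESSION logic: a test "passes" only
# if its captured output matches the required regex ("Test passed.").
check() {
  if printf '%s\n' "$1" | grep -q 'Test passed\.'; then
    echo Passed
  else
    echo 'Failed  Required regular expression not found.'
  fi
}
check 'Test passed.'                                      # direct run: regex matches
check 'ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE'  # what ctest captured instead
```

So a test binary can exit cleanly when run by hand yet still be flagged `***Failed` by ctest if the expected line never reaches the output ctest captures.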

Steps to Reproduce

See above.

@zbeekman
Collaborator

zbeekman commented Nov 6, 2017

Tests 14, 16, and 38 are all taking a MIGHTY long time. Is this on AWS? This is definitely indicative of something fishy.

Can you do me a favor and run the tests either with make check or ctest --output-on-failure so I can see why CTest thinks the tests are failing?

Any chance you might be able to try with MPICH or open-mpi 2.1.x? (or 3.0.0, but I've done no testing of this yet).

@milancurcic
Author

milancurcic commented Nov 6, 2017

Hi Zaak,

This is on DigitalOcean.

  1. With mpich-2.3.8, all tests pass. get_array and send_array still take as long as with openmpi, so this is likely not related to the MPI implementation.

  2. I ran ctest --output-on-failure with the openmpi build (some output from ctest and the test programs themselves is interleaved):

      Start  1: initialize_mpi
 1/47 Test  #1: initialize_mpi ...................   Passed    0.72 sec
      Start  2: register
 2/47 Test  #2: register .........................   Passed    0.73 sec
      Start  3: register_vector
 3/47 Test  #3: register_vector ..................   Passed    0.73 sec
      Start  4: register_alloc_vector
 4/47 Test  #4: register_alloc_vector ............   Passed    0.78 sec
      Start  5: allocate_as_barrier
 5/47 Test  #5: allocate_as_barrier ..............   Passed    1.78 sec
      Start  6: allocate_as_barrier_proc
 6/47 Test  #6: allocate_as_barrier_proc .........   Passed    1.74 sec
      Start  7: register_alloc_comp_1
 7/47 Test  #7: register_alloc_comp_1 ............   Passed    0.74 sec
      Start  8: register_alloc_comp_2
 8/47 Test  #8: register_alloc_comp_2 ............   Passed    0.77 sec
      Start  9: register_alloc_comp_3
 9/47 Test  #9: register_alloc_comp_3 ............   Passed    0.78 sec
      Start 10: async_comp_alloc
10/47 Test #10: async_comp_alloc .................   Passed    0.79 sec
      Start 11: async_comp_alloc_2
11/47 Test #11: async_comp_alloc_2 ...............   Passed    0.77 sec
      Start 12: comp_allocated_1
12/47 Test #12: comp_allocated_1 .................   Passed    0.83 sec
      Start 13: comp_allocated_2
13/47 Test #13: comp_allocated_2 .................   Passed    0.85 sec
      Start 14: get_array
14/47 Test #14: get_array ........................   Passed  668.07 sec
      Start 15: get_self
15/47 Test #15: get_self .........................   Passed    0.92 sec
      Start 16: send_array
16/47 Test #16: send_array .......................   Passed  668.81 sec
      Start 17: get_with_offset_1d
17/47 Test #17: get_with_offset_1d ...............   Passed    0.76 sec
      Start 18: whole_get_array
18/47 Test #18: whole_get_array ..................   Passed    0.89 sec
      Start 19: strided_get
19/47 Test #19: strided_get ......................   Passed    0.74 sec
      Start 20: strided_sendget
20/47 Test #20: strided_sendget ..................   Passed    0.83 sec
      Start 21: co_sum
21/47 Test #21: co_sum ...........................   Passed    0.72 sec
      Start 22: co_broadcast
22/47 Test #22: co_broadcast .....................   Passed    0.86 sec
      Start 23: co_min
23/47 Test #23: co_min ...........................   Passed    0.72 sec
      Start 24: co_max
24/47 Test #24: co_max ...........................   Passed    0.70 sec
      Start 25: syncall
25/47 Test #25: syncall ..........................   Passed    1.71 sec
      Start 26: syncimages
26/47 Test #26: syncimages .......................   Passed    0.78 sec
      Start 27: syncimages2
27/47 Test #27: syncimages2 ......................   Passed    0.71 sec
      Start 28: duplicate_syncimages
28/47 Test #28: duplicate_syncimages .............   Passed    0.71 sec
      Start 29: co_reduce
29/47 Test #29: co_reduce ........................   Passed    0.70 sec
      Start 30: co_reduce_res_im
30/47 Test #30: co_reduce_res_im .................   Passed    0.72 sec
      Start 31: co_reduce_string
31/47 Test #31: co_reduce_string .................   Passed    0.72 sec
      Start 32: syncimages_status
32/47 Test #32: syncimages_status ................   Passed    0.67 sec
      Start 33: sync_ring_abort_np3
33/47 Test #33: sync_ring_abort_np3 ..............***Failed  Required regular expression not found.Regex=[Test passed.
]  0.65 sec
ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

      Start 34: sync_ring_abort_np7
34/47 Test #34: sync_ring_abort_np7 ..............***Failed  Required regular expression not found.Regex=[Test passed.
]  0.66 sec
ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

      Start 35: simpleatomics
35/47 Test #35: simpleatomics ....................   Passed    0.74 sec
      Start 36: hello_multiverse
36/47 Test #36: hello_multiverse .................   Passed    0.76 sec
      Start 37: coarray_burgers_pde
37/47 Test #37: coarray_burgers_pde ..............   Passed   63.67 sec
      Start 38: co_heat
38/47 Test #38: co_heat ..........................   Passed  242.52 sec
      Start 39: asynchronous_hello_world
39/47 Test #39: asynchronous_hello_world .........   Passed    0.85 sec
      Start 40: source-alloc-no-sync
40/47 Test #40: source-alloc-no-sync .............   Passed    0.72 sec
      Start 41: allocatable_p2p_event_post
41/47 Test #41: allocatable_p2p_event_post .......***Failed  Required regular expression not found.Regex=[Test passed.
]  0.65 sec
ERROR STOP num_images() >= 4 required for even_post_1 test
ERROR STOP num_images() >= 4 required for even_post_1 test
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[fortran-in-action.localdomain:03153] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[fortran-in-action.localdomain:03153] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

      Start 42: static_event_post_issue_293
42/47 Test #42: static_event_post_issue_293 ......***Failed  Required regular expression not found.Regex=[Test passed.
]  0.67 sec
ERROR STOP exposing issue 293 requires num_images() >= 3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
ERROR STOP exposing issue 293 requires num_images() >= 3
[fortran-in-action.localdomain:03165] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[fortran-in-action.localdomain:03165] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

      Start 43: co_reduce-factorial
43/47 Test #43: co_reduce-factorial ..............   Passed    0.71 sec
      Start 44: co_reduce-factorial-int8
44/47 Test #44: co_reduce-factorial-int8 .........   Passed    0.73 sec
      Start 45: co_reduce-factorial-int64
45/47 Test #45: co_reduce-factorial-int64 ........   Passed    0.72 sec
      Start 46: image_status_test_1
46/47 Test #46: image_status_test_1 ..............   Passed    0.67 sec
      Start 47: test-installation-scripts.sh
47/47 Test #47: test-installation-scripts.sh .....   Passed    1.46 sec

91% tests passed, 4 tests failed out of 47

Total Test time (real) = 1678.99 sec

The following tests FAILED:
         33 - sync_ring_abort_np3 (Failed)
         34 - sync_ring_abort_np7 (Failed)
         41 - allocatable_p2p_event_post (Failed)
         42 - static_event_post_issue_293 (Failed)
Errors while running CTest

It looks like for tests 33 and 34, the error message is: ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE.

For tests 41 and 42, it looks like they are not being run with a sufficient num_images(). However, when I manually execute these tests with the correct number of images, they pass.
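The failure mode for 41 and 42 can be sketched as follows: the test programs guard on num_images() and ERROR STOP when launched with too few images, so the "Test passed." line ctest requires never appears. A simulation of that guard logic (not the actual Fortran source; the message text is abbreviated from the log above):

```shell
# Simulate the num_images() guard in the event_post tests: with fewer
# images than required, the test aborts before printing "Test passed."
run_test() {  # $1 = images launched, $2 = images required
  if [ "$1" -lt "$2" ]; then
    echo "ERROR STOP num_images() >= $2 required"
  else
    echo "Test passed."
  fi
}
run_test 2 4   # under-provisioned launch, as ctest apparently did
run_test 4 4   # manual launch with enough images, e.g. via cafrun -np 4
```

This matches the symptom: the test is fine, but the image count ctest launches it with is too small, so ctest never sees the pass message.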

@zbeekman
Collaborator

zbeekman commented Nov 10, 2017 via email

@zbeekman
Collaborator

This is really #267 coming back to bite us. I have, more or less, a fix worked up.

@zbeekman
Collaborator

@milancurcic Any chance you can test my fix from the master branch on DigitalOcean? Or do you need to wait for the next release (which should be soon)?
