ctest reports Failed for tests that actually pass #461
Comments
Tests 14, 16 and 38 are all taking a MIGHTY long time. Is this on AWS? This is definitely indicative of something fishy. Can you do me a favor and run the tests with a different MPI? Any chance you might be able to try with MPICH or Open MPI 2.1.x? (Or 3.0.0, but I've done no testing of this yet.)
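For reference, a rebuild against a second MPI could look roughly like this (a sketch only: the MPICH install prefix and the -j 2 job count are assumptions, adjust for the actual machine):

# Put the other MPI's compiler wrappers first so CMake's FindMPI picks them up.
export PATH=/usr/local/mpich/bin:$PATH
mkdir build-mpich && cd build-mpich
FC=gfortran CC=gcc cmake ..
make -j 2
ctest --output-on-failure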
Hi Zaak, This is on DigitalOcean.
It looks like for tests 33 and 34, the error message is ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE. For tests 41 and 42, it looks like they are not being run with a sufficient num_images().
This is because the tests seem to be run with too few images. I think this is due to some logic in the top-level CMakeLists.txt.
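For illustration, the kind of top-level logic in question might look roughly like the following. This is a hypothetical sketch, not the actual OpenCoarrays CMakeLists.txt; the helper name, the clamping rule, and the pass regex are assumptions, meant only to show how a small droplet could end up launching tests 41 and 42 with too few images, and why an ERROR STOP then surfaces in ctest as "Required regular expression not found".

include(ProcessorCount)
ProcessorCount(N_CPU)

function(add_caf_test name num_images)
  # Clamp the requested image count to the detected processor count;
  # on a 2-core machine this starves tests that require 3, 4, or 7 images.
  if(num_images GREATER N_CPU)
    set(num_images ${N_CPU})
  endif()
  add_test(NAME ${name}
           COMMAND cafrun -np ${num_images} $<TARGET_FILE:${name}>)
  # ctest marks the test Failed unless this string appears in the output,
  # so an early ERROR STOP is reported as a regex mismatch rather than a crash.
  set_property(TEST ${name} PROPERTY PASS_REGULAR_EXPRESSION "Test passed.")
endfunction()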
…On Mon, Nov 6, 2017 at 1:43 PM Milan Curcic ***@***.***> wrote:
Hi Zaak,
This is on DigitalOcean.
1. With mpich-2.3.8, all tests pass. get_array and send_array still take as long as with openmpi, so this is likely not related to the MPI implementation.
2. I ran ctest --output-on-failure with the openmpi build (some output from ctest and the individual programs is mixed up):
Start 1: initialize_mpi
1/47 Test #1: initialize_mpi ................... Passed 0.72 sec
Start 2: register
2/47 Test #2: register ......................... Passed 0.73 sec
Start 3: register_vector
3/47 Test #3: register_vector .................. Passed 0.73 sec
Start 4: register_alloc_vector
4/47 Test #4: register_alloc_vector ............ Passed 0.78 sec
Start 5: allocate_as_barrier
5/47 Test #5: allocate_as_barrier .............. Passed 1.78 sec
Start 6: allocate_as_barrier_proc
6/47 Test #6: allocate_as_barrier_proc ......... Passed 1.74 sec
Start 7: register_alloc_comp_1
7/47 Test #7: register_alloc_comp_1 ............ Passed 0.74 sec
Start 8: register_alloc_comp_2
8/47 Test #8: register_alloc_comp_2 ............ Passed 0.77 sec
Start 9: register_alloc_comp_3
9/47 Test #9: register_alloc_comp_3 ............ Passed 0.78 sec
Start 10: async_comp_alloc
10/47 Test #10: async_comp_alloc ................. Passed 0.79 sec
Start 11: async_comp_alloc_2
11/47 Test #11: async_comp_alloc_2 ............... Passed 0.77 sec
Start 12: comp_allocated_1
12/47 Test #12: comp_allocated_1 ................. Passed 0.83 sec
Start 13: comp_allocated_2
13/47 Test #13: comp_allocated_2 ................. Passed 0.85 sec
Start 14: get_array
14/47 Test #14: get_array ........................ Passed 668.07 sec
Start 15: get_self
15/47 Test #15: get_self ......................... Passed 0.92 sec
Start 16: send_array
16/47 Test #16: send_array ....................... Passed 668.81 sec
Start 17: get_with_offset_1d
17/47 Test #17: get_with_offset_1d ............... Passed 0.76 sec
Start 18: whole_get_array
18/47 Test #18: whole_get_array .................. Passed 0.89 sec
Start 19: strided_get
19/47 Test #19: strided_get ...................... Passed 0.74 sec
Start 20: strided_sendget
20/47 Test #20: strided_sendget .................. Passed 0.83 sec
Start 21: co_sum
21/47 Test #21: co_sum ........................... Passed 0.72 sec
Start 22: co_broadcast
22/47 Test #22: co_broadcast ..................... Passed 0.86 sec
Start 23: co_min
23/47 Test #23: co_min ........................... Passed 0.72 sec
Start 24: co_max
24/47 Test #24: co_max ........................... Passed 0.70 sec
Start 25: syncall
25/47 Test #25: syncall .......................... Passed 1.71 sec
Start 26: syncimages
26/47 Test #26: syncimages ....................... Passed 0.78 sec
Start 27: syncimages2
27/47 Test #27: syncimages2 ...................... Passed 0.71 sec
Start 28: duplicate_syncimages
28/47 Test #28: duplicate_syncimages ............. Passed 0.71 sec
Start 29: co_reduce
29/47 Test #29: co_reduce ........................ Passed 0.70 sec
Start 30: co_reduce_res_im
30/47 Test #30: co_reduce_res_im ................. Passed 0.72 sec
Start 31: co_reduce_string
31/47 Test #31: co_reduce_string ................. Passed 0.72 sec
Start 32: syncimages_status
32/47 Test #32: syncimages_status ................ Passed 0.67 sec
Start 33: sync_ring_abort_np3
33/47 Test #33: sync_ring_abort_np3 ..............***Failed Required regular expression not found.Regex=[Test passed.
] 0.65 sec
ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Start 34: sync_ring_abort_np7
34/47 Test #34: sync_ring_abort_np7 ..............***Failed Required regular expression not found.Regex=[Test passed.
] 0.66 sec
ERROR STOP Error: stat_var /= STAT_STOPPED_IMAGE:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Start 35: simpleatomics
35/47 Test #35: simpleatomics .................... Passed 0.74 sec
Start 36: hello_multiverse
36/47 Test #36: hello_multiverse ................. Passed 0.76 sec
Start 37: coarray_burgers_pde
37/47 Test #37: coarray_burgers_pde .............. Passed 63.67 sec
Start 38: co_heat
38/47 Test #38: co_heat .......................... Passed 242.52 sec
Start 39: asynchronous_hello_world
39/47 Test #39: asynchronous_hello_world ......... Passed 0.85 sec
Start 40: source-alloc-no-sync
40/47 Test #40: source-alloc-no-sync ............. Passed 0.72 sec
Start 41: allocatable_p2p_event_post
41/47 Test #41: allocatable_p2p_event_post .......***Failed Required regular expression not found.Regex=[Test passed.
] 0.65 sec
ERROR STOP num_images() >= 4 required for even_post_1 test
ERROR STOP num_images() >= 4 required for even_post_1 test
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[fortran-in-action.localdomain:03153] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[fortran-in-action.localdomain:03153] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Start 42: static_event_post_issue_293
42/47 Test #42: static_event_post_issue_293 ......***Failed Required regular expression not found.Regex=[Test passed.
] 0.67 sec
ERROR STOP exposing issue 293 requires num_images() >= 3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
ERROR STOP exposing issue 293 requires num_images() >= 3
[fortran-in-action.localdomain:03165] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[fortran-in-action.localdomain:03165] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Start 43: co_reduce-factorial
43/47 Test #43: co_reduce-factorial .............. Passed 0.71 sec
Start 44: co_reduce-factorial-int8
44/47 Test #44: co_reduce-factorial-int8 ......... Passed 0.73 sec
Start 45: co_reduce-factorial-int64
45/47 Test #45: co_reduce-factorial-int64 ........ Passed 0.72 sec
Start 46: image_status_test_1
46/47 Test #46: image_status_test_1 .............. Passed 0.67 sec
Start 47: test-installation-scripts.sh
47/47 Test #47: test-installation-scripts.sh ..... Passed 1.46 sec
91% tests passed, 4 tests failed out of 47
Total Test time (real) = 1678.99 sec
It looks like for tests 33 and 34, the error message is: ERROR STOP
Error: stat_var /= STAT_STOPPED_IMAGE.
For tests 41 and 42, it looks like they are not being run with sufficient
num_images(). However I can go in and manually execute these tests with
the correct number of images.
This is really #267 coming back to bite us. I have, more or less, a fix worked up.
@milancurcic Any chance you can test my fix from the master branch on DigitalOcean? Or do you need to wait for the next release (which should be soon)?
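Testing the fix from master would amount to the usual clone-and-build steps, something like (the gfortran/gcc toolchain and -j 2 job count are assumptions):

git clone https://github.com/sourceryinstitute/OpenCoarrays.git
cd OpenCoarrays
mkdir build && cd build
FC=gfortran CC=gcc cmake ..
make -j 2
ctest --output-on-failure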
uname -a: x86_64 GNU/Linux
Observed Behavior
ctest reports tests 33, 34, 41, and 42 as Failed (91% tests passed, 4 tests failed out of 47).
Another potential issue is the run-time of tests 14 and 16 (11 minutes). I am not sure whether this is expected or not. Ideally unit tests should each be short (O(seconds)).
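One way to look at those two in isolation is to re-run just them with verbose output (a sketch; assumes the same build directory and a generous timeout):

ctest -R "^get_array$|^send_array$" -V --timeout 1200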
Expected Behavior
ctest should report that all tests succeeded, because tests 33, 34, 41, 42 actually pass when executed directly with cafrun.
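For example (the exact locations of the test binaries in the build tree are an assumption here):

cafrun -np 3 ./bin/sync_ring_abort_np3
cafrun -np 7 ./bin/sync_ring_abort_np7
cafrun -np 4 ./bin/allocatable_p2p_event_post
cafrun -np 3 ./bin/static_event_post_issue_293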
Steps to Reproduce
See above.