Parallel linalg #67
@zbeekman you have a lot of experience with co-arrays, is there a way to do this? |
Would this API also include shared-memory parallelization? Especially if it is based on BLAS/LAPACK. |
In the above I was thinking of distributed memory parallelization (MPI, co-arrays, ...). What are the options for shared-memory parallelization in Fortran? I am aware of |
In general, one should be able to implement parallel LA algorithms using coarrays. The coarray implementation may be shared memory, distributed memory, hybrid, etc.; the standard doesn't specify. Part of the point of coarrays is to have a simpler API and programming model that can be divorced from the underlying implementation.

The trickier question is, perhaps, what should the interface look like? How much ownership and control should the client code have over the objects? Should the user create and pass coarrays? Or should there be a global array view that makes it appear as though you're working with normal arrays?

Last I checked there were some non-trivial issues with the coarray specification in the standard that make it challenging or impossible to use in some applications, especially computations on unstructured meshes and some other graph and graph-like algorithms. I don't recall the details, but I believe Salvatore Filippone (PSBLAS author) submitted a proposal to J3 to resolve it, or at least to highlight the issue in the standard.

If I remember correctly, without Parallel Studio Cluster Edition the Intel Fortran compiler provides a shared-memory coarray implementation; a Cluster Edition license unlocks the MPI back end (or at least the SDK/compile-time pieces). Using coarrays is nice because it abstracts away the back end: OpenCoarrays' main back end is MPI, but we have an experimental/partial one based on OpenSHMEM, and at one point in the past we were using GASNet.

So I think coarrays are a natural and good choice for parallelism, but a few issues remain:
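To make the "client-owned coarray" option above concrete, here is a minimal sketch of the simplest parallel linear-algebra kernel, a distributed dot product, where the user declares and passes the coarrays directly. The block length `nloc` is an assumption for illustration; `co_sum` is the Fortran 2018 collective reduction.

```fortran
program coarray_dot_sketch
  ! Hedged sketch: each image owns one block of the two vectors;
  ! co_sum (Fortran 2018) reduces the per-image partial products.
  implicit none
  integer, parameter :: nloc = 1000     ! local block length (assumed)
  real :: x(nloc)[*], y(nloc)[*]        ! client-owned coarrays
  real :: s
  call random_number(x); call random_number(y)
  s = dot_product(x, y)                 ! local partial product
  call co_sum(s)                        ! reduce across all images
  if (this_image() == 1) print *, 'global dot =', s
end program coarray_dot_sketch
```

The alternative "global array view" design would hide declarations like `x(nloc)[*]` behind an opaque handle type, at the cost of a heavier API. |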
OpenMP is nice because of its built-in conditional compilation and its support for GPUs and accelerators. Managing thread affinity and avoiding other threading issues is certainly tricky, however. |
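The built-in conditional compilation mentioned above refers to OpenMP's `!$` sentinel: lines starting with it are compiled only when OpenMP is enabled, so one source file serves both serial and threaded builds. A minimal sketch:

```fortran
program openmp_sentinel_sketch
  ! Sketch of OpenMP conditional compilation in free-form Fortran.
  ! The "!$" lines are ordinary comments to a non-OpenMP compiler.
  !$ use omp_lib
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)
  integer :: i
  call random_number(x)
  !$omp parallel do
  do i = 1, n
     y(i) = 2.0 * x(i)
  end do
  !$omp end parallel do
  !$ print *, 'threads available:', omp_get_max_threads()
end program openmp_sentinel_sketch
```
|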
The book by Numrich - Parallel Programming with Co-arrays discusses an API for both sparse and dense linear algebra using co-arrays. I know that for PSBLAS they recently developed a co-array backend. A recent article discusses the topic (a draft is available somewhere on GitHub). |
If we can use or adopt parts of PSBLAS that would be nice, rather than reinventing the wheel. |
@zbeekman I was led to believe at the latest J3 meeting that co-arrays can be used today with GFortran, Intel and Cray for anything that MPI can be used for, including unstructured meshes (that was my first question to them). But I haven't used co-arrays myself yet. My understanding is also that you can mix and match co-arrays with MPI, is that correct?

I would go ahead and try to figure out what the API should look like using co-arrays, and if we like it, we can work towards putting it into stdlib. If we can't agree on a good way due to fundamental limitations of co-arrays, then let's submit proposals to the J3 committee to fix it. I would think exposing co-arrays directly to the user would be the natural lowest-level API, similar to the serial linalg API that just operates on arrays. Then we can always see if there is some optional higher-level API, whether object oriented or some global object (state?), similarly to how there can be an optional OO API on top of the serial linalg. Let's brainstorm this more on some example.

@ivan-pi thanks for the pointers --- both links contain very useful info. They have done a lot of thinking about this, so we should see if we can use their API. |
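One way to picture the "expose co-arrays directly" option: a stdlib routine could take coarray dummy arguments, so each image passes its local block and the routine handles communication internally. All names here are invented for illustration; this is a shape sketch, not a proposed implementation:

```fortran
module stdlib_parallel_linalg_sketch
  ! Hypothetical lowest-level API sketch: coarray dummy arguments
  ! mirror the serial linalg API that operates on plain arrays.
  implicit none
contains
  subroutine psolve(a, b, x)
    real, intent(in)  :: a(:,:)[*]   ! local block of A on this image
    real, intent(in)  :: b(:)[*]     ! local block of the right-hand side
    real, intent(out) :: x(:)[*]     ! local block of the solution
    ! A real implementation (ScaLAPACK backend, reference coarray
    ! code, ...) would go here; placeholder only:
    x = b
    sync all
  end subroutine psolve
end module stdlib_parallel_linalg_sketch
```

An OO layer could then wrap such routines in a distributed-matrix type without changing this lower level. |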
@certik I usually rely on Sparse BLAS for such operations (http://www.netlib.org/utk/people/JackDongarra/etemplates/node381.html), mainly with the MKL version. |
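For reference, the kernel those Sparse BLAS routines optimize is the CSR matrix-vector product; a plain serial version is only a few lines (this is the textbook CSR kernel, not MKL's API):

```fortran
subroutine csr_matvec(n, val, col, rowptr, x, y)
  ! Serial sparse matrix-vector product y = A*x in CSR format:
  ! val holds nonzeros row by row, col their column indices, and
  ! rowptr(i):rowptr(i+1)-1 delimits row i's entries.
  implicit none
  integer, intent(in)  :: n, col(:), rowptr(:)
  real,    intent(in)  :: val(:), x(:)
  real,    intent(out) :: y(n)
  integer :: i, k
  do i = 1, n
     y(i) = 0.0
     do k = rowptr(i), rowptr(i+1) - 1
        y(i) = y(i) + val(k) * x(col(k))
     end do
  end do
end subroutine csr_matvec
```
|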
I think it would be a good start. |
I have not found the library anywhere, and the book doesn't offer any link either. It mostly contains the subroutine prototypes, a description of the variables, and some discussion of the API design. |
Yes, this is more or less true. I don't remember the particular issue, but I recall that @sfilippone found a subtlety in the standard that caused a large headache/impediment in realizing the more complex data structures/machinery needed for unstructured meshes. I cannot immediately recall the details. Maybe the OpenCoarrays repo has issues discussing this, or maybe Salvatore can remind me here.
Yes, in theory this should be true. One complication is that if coarrays are implemented via MPI, then the compiler-provided Fortran runtime is responsible for initializing MPI. This may not be ideal in certain situations. I think we implemented a configure-time option in OpenCoarrays to return the global communicator to the user or delay MPI_Init() and let the user call it. I'd have to double check. |
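The initialization subtlety above is the usual sticking point when mixing models: if the coarray runtime has already brought MPI up, calling `MPI_Init` again is an error. A defensive sketch of user code that works either way (assuming an `mpi_f08` MPI implementation is available):

```fortran
program mixed_coarray_mpi_sketch
  ! Sketch: check whether the coarray runtime already initialized
  ! MPI before calling MPI_Init ourselves.
  use mpi_f08
  implicit none
  logical :: already_up
  call MPI_Initialized(already_up)
  if (.not. already_up) call MPI_Init()
  ! ... MPI calls and coarray operations can now coexist ...
  if (this_image() == 1) print *, 'MPI was already up: ', already_up
end program mixed_coarray_mpi_sketch
```

Whether the coarray images and MPI ranks line up, and who finalizes MPI, is implementation-dependent, which is exactly the portability wrinkle being discussed. |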
Hi there
Zaak is correct, there is a problem with the standard. The problem arises as soon as you want to have a coarray component of a derived type: if you have such a component in a derived type, which itself may be in a derived type, etc., you have a hierarchy of "container" objects which ultimately includes a coarray. With the current standard, it is forbidden for any of the containers to be ALLOCATABLE (whereas the coarray component itself is pretty much forced to be allocatable). This implies that the set of entities that may either be a coarray or contain a coarray component has to be fixed at compile time.
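As a sketch of the restriction being described (type names invented), the coarray component itself must be allocatable, yet no container in the chain above it may be:

```fortran
module coarray_component_restriction_sketch
  implicit none
  type :: inner
     ! A coarray component must be allocatable with deferred coshape:
     real, allocatable :: data(:)[:]
  end type inner
  type :: outer
     type(inner) :: fixed_ok            ! allowed: non-allocatable container
     ! type(inner), allocatable :: bad  ! rejected by the current standard:
     !                                  ! an entity with a coarray ultimate
     !                                  ! component shall not be allocatable
  end type outer
end module coarray_component_restriction_sketch
```

This is why dynamically sized collections of such objects, as needed for unstructured meshes, are so awkward under the current rules.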
I have proposed a change in the standard to lift this restriction; I did not attend the latest meetings of the committee, but my colleague Damian Rouson, who coauthored the proposal, did attend, and as far as I understand the proposed change was approved. How long until it is supported in compilers, I have no idea.

Hope this helps,
Salvatore
|
The modern Fortran API for serial linear algebra (#10) seems natural.
How would that be extended to work in parallel using co-arrays? If there is a similarly "natural" parallel API for linear algebra in modern Fortran, then that would be a good candidate for inclusion into stdlib, and we can have different backends that do the work (ScaLAPACK, ..., perhaps even our own simpler reference implementation using co-arrays directly). That way, if somebody writes a faster 3rd-party library, it could be plugged in as a backend and user codes would not need to change, because they would already be using the stdlib API for parallel linear algebra.
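The pluggable-backend idea could be as simple as a dispatch layer behind the stdlib-facing routine; all names below are invented to illustrate the shape, not a design proposal:

```fortran
module psolve_dispatch_sketch
  ! Hypothetical sketch: user code calls parallel_solve; the backend
  ! is chosen at build/configure time and can be swapped without
  ! touching user code.
  implicit none
  integer, parameter :: BACKEND_REFERENCE = 1, BACKEND_SCALAPACK = 2
  integer :: backend = BACKEND_REFERENCE
contains
  subroutine parallel_solve(a, b, x)
    real, intent(in)  :: a(:,:)[*], b(:)[*]   ! local blocks per image
    real, intent(out) :: x(:)[*]
    select case (backend)
    case (BACKEND_REFERENCE)
       ! simple reference implementation using co-arrays directly
    case (BACKEND_SCALAPACK)
       ! translate local blocks to ScaLAPACK's 2-D block-cyclic layout
    end select
  end subroutine parallel_solve
end module psolve_dispatch_sketch
```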