
Commit 2ee46a3

Merge commit 'cfa3db3f' into amd-main
* commit 'cfa3db3f':
  Fixed bug in mixed-dt gemm introduced in e9da642.
  Removed support for 3m, 4m induced methods.
  Updated do_sde.sh to get SDE from GitHub.
  Disable SDE testing of old AMD microarchitectures.
  Fixed substitution bug in configure.
  Allow use of 1m with mixing of row/col-pref ukrs.

AMD-Internal: [CPUPL-2698]
Change-Id: I961f0066243cf26aeb2e174e388b470133cc4a5f
2 parents: db2e353 + cfa3db3


180 files changed: +2315 -17805 lines


CREDITS

+1
````diff
@@ -92,6 +92,7 @@ but many others have contributed code and feedback, including
 Nathaniel Smith @njsmith
 Shaden Smith @ShadenSmith
 Tyler Smith @tlrmchlsmth (The University of Texas at Austin)
+Snehith @ArcadioN09
 Paul Springer @springer13 (RWTH Aachen University)
 Adam J. Stewart @adamjstewart (University of Illinois at Urbana-Champaign)
 Vladimir Sukarev
````

configure

+17-2
````diff
@@ -729,13 +729,21 @@ read_registry_file()
 if [ "${mem}" != "${mems_mem}" ]; then
 
 #clist="${config_registry[$config]}"
-clist=$(query_array "config_registry" ${config})
+clisttmp=$(query_array "config_registry" ${config})
 
 # Replace the current config with its constituent config set,
 # canonicalize whitespace, and then remove duplicate config
 # set names, if they exist. Finally, update the config registry
 # with the new config list.
-newclist=$(echo -e "${clist}" | sed -e "s/${mem}/${mems_mem}/g")
+# NOTE: WE must use substitute_words() rather than a simple sed
+# expression because we need to avoid matching partial strings.
+# For example, if clist above contains "foo bar barsk" and we use
+# sed to substitute "bee boo" as the members of "bar", the
+# result would (incorrectly) be "foo bee boo bee boosk",
+# which would then get reduced, via rm_duplicate_words(), to
+# "foo bee boo boosk".
+#newclist=$(echo -e "${clist}" | sed -e "s/${mem}/${mems_mem}/g")
+newclist=$(substitute_words "${mem}" "${mems_mem}" "${clisttmp}")
 newclist=$(canonicalize_ws "${newclist}")
 newclist=$(rm_duplicate_words "${newclist}")
 
````
````diff
@@ -818,6 +826,13 @@ read_registry_file()
 # canonicalize whitespace, and then remove duplicate kernel
 # set names, if they exist. Finally, update the kernel registry
 # with the new kernel list.
+# NOTE: WE must use substitute_words() rather than a simple sed
+# expression because we need to avoid matching partial strings.
+# For example, if klist above contains "foo bar barsk" and we use
+# sed to substitute "bee boo" as the members of "bar", the
+# result would (incorrectly) be "foo bee boo bee boosk",
+# which would then get reduced, via rm_duplicate_words(), to
+# "foo bee boo boosk".
 #newklist=$(echo -e "${klisttmp}" | sed -e "s/${ker}/${kers_ker}/g")
 newklist=$(substitute_words "${ker}" "${kers_ker}" "${klisttmp}")
 newklist=$(canonicalize_ws "${newklist}")
````
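The NOTE comments in `configure` explain why a plain `sed` substitution was replaced by whole-word substitution. The shell body of `substitute_words()` is not part of this diff, so the following is only a minimal C sketch of the whole-word semantics the comment describes; the function name `substitute_words_c` and its buffer-based interface are hypothetical. As a side effect it also joins words with single spaces (in `configure` that step is done separately by `canonicalize_ws()`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C analogue of configure's substitute_words(): every
   whitespace-separated word in `list` that is exactly equal to `word` is
   replaced by `repl`; all other words are copied unchanged. Partial matches
   (e.g. "barsk" when word is "bar") are never rewritten, unlike the old
   sed "s/bar/.../g" approach. Output words are joined by single spaces. */
static void substitute_words_c( const char* word, const char* repl,
                                const char* list, char* out, size_t cap )
{
    const size_t wlen = strlen( word );
    size_t       n    = 0;   /* bytes written to out so far */
    const char*  p    = list;

    assert( cap > 0 );
    out[0] = '\0';

    while ( *p != '\0' )
    {
        /* Skip whitespace between words. */
        while ( *p == ' ' || *p == '\t' || *p == '\n' ) ++p;
        if ( *p == '\0' ) break;

        /* Locate the end of the current word. */
        const char* q = p;
        while ( *q != '\0' && *q != ' ' && *q != '\t' && *q != '\n' ) ++q;
        const size_t len = ( size_t )( q - p );

        /* Whole-word match only: lengths must agree as well as contents. */
        const char* src  = p;
        size_t      slen = len;
        if ( len == wlen && strncmp( p, word, wlen ) == 0 )
        {
            src  = repl;
            slen = strlen( repl );
        }

        int k = snprintf( out + n, cap - n, "%s%.*s",
                          n > 0 ? " " : "", ( int )slen, src );
        if ( k < 0 || ( size_t )k >= cap - n )
            break;   /* out is full; it remains NUL-terminated */
        n += ( size_t )k;

        p = q;
    }
}
```

With the example from the comment, `substitute_words_c( "bar", "bee boo", "foo bar barsk", buf, sizeof buf )` leaves `"foo bee boo barsk"` in `buf`, whereas the old `sed` expression produced `"foo bee boo bee boosk"`.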

docs/BLISObjectAPI.md

-7
````diff
@@ -2336,16 +2336,9 @@ char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt )
 ```
 
 Possible implementation (ie: the `ind_t method` argument) types are:
-* `BLIS_3MH`: Implementation based on the 3m method applied at the highest level, outside the 5th loop around the microkernel.
-* `BLIS_3M1`: Implementation based on the 3m method applied within the 1st loop around the microkernel.
-* `BLIS_4MH`: Implementation based on the 4m method applied at the highest level, outside the 5th loop around the microkernel.
-* `BLIS_4M1B`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that the 1st loop is fissured into two loops, the first of which multiplies the real part of the current micropanel of packed matrix B (against all real and imaginary parts of packed matrix A), and the second of which multiplies the imaginary part of the current micropanel of packed matrix B.
-* `BLIS_4M1A`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that real and imaginary components of the current micropanels are completely used before proceeding to the next virtual microkernel invocation.
 * `BLIS_1M`: Implementation based on the 1m method. (This is the default induced method when real domain kernels are present but complex kernels are missing.)
 * `BLIS_NAT`: Implementation based on "native" execution (ie: NOT an induced method).
 
-**NOTE**: `BLIS_3M3` and `BLIS_3M2` have been deprecated from the `typedef enum` of `ind_t`, and `BLIS_4M1B` is also effectively no longer available, though the `typedef enum` value still exists.
-
 Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_string()`) are:
 * `BLIS_REFERENCE_UKERNEL` (`"refrnce"`): This value is returned when the queried microkernel is provided by the reference implementation.
 * `BLIS_VIRTUAL_UKERNEL` (`"virtual"`): This value is returned when the queried microkernel is driven by a the "virtual" microkernel provided by an induced method. This happens for any `method` value that is not `BLIS_NAT` (ie: native), but only applies to the complex domain.
````
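After this change, `BLIS_1M` is the only induced method left; per the text above, it is the default when real-domain kernels exist but complex kernels do not. The following C sketch shows only the arithmetic identity that makes this kind of method possible: multiplying by a complex scalar is the same as applying a 2x2 real matrix to a real 2-vector. It is not BLIS's packing or microkernel code, and the type `cplx_d` and the function `zaxpy_via_real` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical double-precision complex type; BLIS defines its own. */
typedef struct { double real; double imag; } cplx_d;

/* c += a * b using only real multiply-adds. The complex scalar
   a = ar + i*ai is expanded into the real 2x2 block
       [ ar  -ai ]
       [ ai   ar ]
   and applied to the real 2-vector [ br ; bi ]. */
static void zaxpy_via_real( cplx_d a, cplx_d b, cplx_d* c )
{
    assert( c != NULL );

    const double A[2][2] = { { a.real, -a.imag },
                             { a.imag,  a.real } };
    const double x[2]    = { b.real, b.imag };
    double       y[2]    = { c->real, c->imag };

    /* Ordinary real matrix-vector multiply-accumulate: y += A x. */
    for ( int i = 0; i < 2; ++i )
        for ( int j = 0; j < 2; ++j )
            y[i] += A[i][j] * x[j];

    c->real = y[0];
    c->imag = y[1];
}
```

For a = 1+2i and b = 3+4i starting from c = 0, the call leaves c = -5+10i, which equals (1+2i)(3+4i). Applying the same idea to whole micropanels, with the expansion folded into packing (compare the 1e/1r packm kernels kept elsewhere in this commit), is roughly how a real microkernel can serve complex `gemm`; the exact BLIS packing formats are not shown here.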

docs/BLISTypedAPI.md

-7
````diff
@@ -2015,16 +2015,9 @@ char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt )
 ```
 
 Possible implementation (ie: the `ind_t method` argument) types are:
-* `BLIS_3MH`: Implementation based on the 3m method applied at the highest level, outside the 5th loop around the microkernel.
-* `BLIS_3M1`: Implementation based on the 3m method applied within the 1st loop around the microkernel.
-* `BLIS_4MH`: Implementation based on the 4m method applied at the highest level, outside the 5th loop around the microkernel.
-* `BLIS_4M1B`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that the 1st loop is fissured into two loops, the first of which multiplies the real part of the current micropanel of packed matrix B (against all real and imaginary parts of packed matrix A), and the second of which multiplies the imaginary part of the current micropanel of packed matrix B.
-* `BLIS_4M1A`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that real and imaginary components of the current micropanels are completely used before proceeding to the next virtual microkernel invocation.
 * `BLIS_1M`: Implementation based on the 1m method. (This is the default induced method when real domain kernels are present but complex kernels are missing.)
 * `BLIS_NAT`: Implementation based on "native" execution (ie: NOT an induced method).
 
-**NOTE**: `BLIS_3M3` and `BLIS_3M2` have been deprecated from the `typedef enum` of `ind_t`, and `BLIS_4M1B` is also effectively no longer available, though the `typedef enum` value still exists.
-
 Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_string()`) are:
 * `BLIS_REFERENCE_UKERNEL` (`"refrnce"`): This value is returned when the queried microkernel is provided by the reference implementation.
 * `BLIS_VIRTUAL_UKERNEL` (`"virtual"`): This value is returned when the queried microkernel is driven by a the "virtual" microkernel provided by an induced method. This happens for any `method` value that is not `BLIS_NAT` (ie: native), but only applies to the complex domain.
````

docs/Sandboxes.md

+20-32
````diff
@@ -17,13 +17,9 @@ Simply put, a sandbox in BLIS provides an alternative implementation to the
 `gemm` operation.
 
 To get a little more specific, a sandbox provides an alternative implementation
-to the function `bli_gemmnat()`, which is the object-based API call for
-computing the `gemm` operation via native execution.
-
-**Note**: Native execution simply means that an induced method will not be used.
-It's what you probably already think of when you think of implementing the
-`gemm` operation: a series of loops around an optimized (usually assembly-based)
-microkernel with some packing functions thrown in at various levels.
+to the function `bli_gemm_ex()`, which is the
+[expert interface](BLISObjectAPI.md##basic-vs-expert-interfaces) for calling the
+[object-based API](BLISObjectAPI.md#gemm) for the `gemm` operation.
 
 Why sandboxes? Sometimes you want to experiment with tweaks or changes to
 the `gemm` operation, but you want to do so in a simple environment rather than
````
````diff
@@ -45,18 +41,11 @@ corresponds to a sub-directory of `sandbox` named `gemmlike`. (Reminder: the
 `auto` argument is the configuration target and thus unrelated to
 sandboxes.)
 
-NOTE: If you want your sandbox implementation to handle *all* problem
-sizes and shapes, you'll need to disable the skinny/unpacked "sup"
-sub-framework within BLIS, which is enabled by default. This can be
-done by passing the `--disable-sup-handling` option to configure:
-```
-$ ./configure --enable-sandbox=gemmlike --disable-sup-handling auto
-```
-If you leave sup enabled, the sup implementation will, at runtime, detect
-and handle certain smaller problem sizes upstream of where BLIS calls
-`bli_gemmnat()` while all other problems will fall to your sandbox
-implementation. Thus, you should only leave sup enabled if you are fine
-with those smaller problems being handled by sup.
+NOTE: Using your own sandbox implementation means that BLIS will call your
+sandbox for *all* problem sizes and shapes, for *all* datatypes supported
+by BLIS. If you intend to only implement a subset of this functionality
+within your sandbox, you should be sure to redirect execution back into
+the core framework for the parts that you don't wish to reimplement yourself.
 
 As `configure` runs, you should get output that includes lines
 similar to:
````
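The new NOTE says a sandbox receives every `gemm` call, for all sizes, shapes, and datatypes, and should hand back whatever it does not reimplement. A minimal C sketch of that routing pattern follows. Everything in it is hypothetical: `dt_t`, `gemm_problem_t`, `my_dgemm()`, and `core_gemm()` stand in for BLIS's real types and for whichever core-framework entry point a sandbox would actually call, and the size threshold is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for BLIS's datatype tag and gemm operands. */
typedef enum { DT_FLOAT, DT_DOUBLE, DT_SCOMPLEX, DT_DCOMPLEX } dt_t;
typedef struct { dt_t dt; long m, n, k; } gemm_problem_t;

/* Records which path handled the call, so the routing can be checked. */
typedef enum { ROUTE_SANDBOX, ROUTE_CORE } route_t;

/* Hypothetical fallback into the unmodified core framework. */
static route_t core_gemm( const gemm_problem_t* p )
{
    (void)p;
    return ROUTE_CORE;
}

/* Hypothetical sandbox implementation that only supports large real
   double-precision problems. */
static route_t my_dgemm( const gemm_problem_t* p )
{
    (void)p;
    return ROUTE_SANDBOX;
}

/* Sandbox entry point: it sees *every* gemm call, so anything outside
   the reimplemented subset is redirected back to the core framework. */
static route_t sandbox_gemm( const gemm_problem_t* p )
{
    assert( p != NULL );
    const long min_dim = 256;  /* arbitrary example threshold */

    if ( p->dt == DT_DOUBLE &&
         p->m >= min_dim && p->n >= min_dim && p->k >= min_dim )
        return my_dgemm( p );

    return core_gemm( p );
}
```

Keeping the routing test in one place makes it easy to widen the sandbox's coverage later without touching the fallback path.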
````diff
@@ -67,13 +56,12 @@ configure: sandbox/gemmlike
 And when you build BLIS, the last files to be compiled will be the source
 code in the specified sandbox:
 ```
-Compiling obj/haswell/sandbox/gemmlike/bli_gemmnat.o ('haswell' CFLAGS for sandboxes)
 Compiling obj/haswell/sandbox/gemmlike/bls_gemm.o ('haswell' CFLAGS for sandboxes)
 Compiling obj/haswell/sandbox/gemmlike/bls_gemm_bp_var1.o ('haswell' CFLAGS for sandboxes)
 ...
 ```
 That's it! After the BLIS library is built, it will contain your chosen
-sandbox's implementation of `bli_gemmnat()` instead of the default
+sandbox's implementation of `bli_gemm_ex()` instead of the default BLIS
 implementation.
 
 ## Sandbox rules
````
````diff
@@ -97,15 +85,15 @@ Note that `blis.h` already contains all of its definitions inside of an
 `extern "C"` block, so you should be able to `#include "blis.h"` from your
 C++11 source code without any issues.
 
-3. All of your code to replace BLIS's default implementation of `bli_gemmnat()`
+3. All of your code to replace BLIS's default implementation of `bli_gemm_ex()`
 should reside in the named sandbox directory, or some directory therein.
 (Obviously.) For example, the "gemmlike" sandbox is located in
 `sandbox/gemmlike`. All of the code associated with this sandbox will be
 contained within `sandbox/gemmlike`. Note that you absolutely *may* include
 additional code and interfaces within the sandbox, if you wish -- code and
 interfaces that are not directly or indirectly needed for satisfying the
 the "contract" set forth by the sandbox (i.e., including a local definition
-of`bli_gemmnat()`).
+of`bli_gemm_ex()`).
 
 4. The *only* header file that is required of your sandbox is `bli_sandbox.h`.
 It must be named `bli_sandbox.h` because `blis.h` will `#include` this file
````
````diff
@@ -119,12 +107,12 @@ you should only place things (e.g. prototypes or type definitions) in
 (b) an *application* that calls your sandbox-enabled BLIS library.
 Usually, neither of these situations will require any of your local definitions
 since those local definitions are only needed to define your sandbox
-implementation of `bli_gemmnat()`, and this function is already prototyped by
+implementation of `bli_gemm_ex()`, and this function is already prototyped by
 BLIS. *But if you are adding additional APIs and/or operations to the sandbox
-that are unrelated to `bli_gemmnat()`, then you'll want to `#include` those
+that are unrelated to `bli_gemm_ex()`, then you'll want to `#include` those
 function prototypes from within `bli_sandbox.h`*
 
-5. Your definition of `bli_gemmnat()` should be the **only function you define**
+5. Your definition of `bli_gemm_ex()` should be the **only function you define**
 in your sandbox that begins with `bli_`. If you define other functions that
 begin with `bli_`, you risk a namespace collision with existing framework
 functions. To guarantee safety, please prefix your locally-defined sandbox
````
````diff
@@ -147,9 +135,9 @@ For example, with a BLIS sandbox you **can** do the following kinds of things:
 kernels, which can already be customized within each sub-configuration);
 - try inlining your functions manually;
 - pivot away from using `obj_t` objects at higher algorithmic level (such as
-immediately after calling `bli_gemmnat()`) to try to avoid some overhead;
+immediately after calling `bli_gemm_ex()`) to try to avoid some overhead;
 - create experimental implementations of new BLAS-like operations (provided
-that you also provide an implementation of `bli_gemmnat()`).
+that you also provide an implementation of `bli_gemm_ex()`).
 
 You **cannot**, however, use a sandbox to do the following kinds of things:
 - define new datatypes (half-precision, quad-precision, short integer, etc.)
````
````diff
@@ -167,17 +155,17 @@ Another important limitation is the fact that the build system currently uses
 # Example framework CFLAGS used by 'haswell' sub-configuration
 -O3 -Wall -Wno-unused-function -Wfatal-errors -fPIC -std=c99
 -D_POSIX_C_SOURCE=200112L -I./include/haswell -I./frame/3/
--I./frame/ind/ukernels/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/
--I./frame/include -DBLIS_VERSION_STRING=\"0.3.2-51\"
+-I./frame/1m/ -I./frame/1f/ -I./frame/1/ -I./frame/include
+-DBLIS_VERSION_STRING=\"0.3.2-51\"
 ```
 which are likely more general-purpose than the `CFLAGS` used for, say,
 optimized kernels or even reference kernels.
 ```
 # Example optimized kernel CFLAGS used by 'haswell' sub-configuration
 -O3 -mavx2 -mfma -mfpmath=sse -march=core-avx2 -Wall -Wno-unused-function
 -Wfatal-errors -fPIC -std=c99 -D_POSIX_C_SOURCE=200112L -I./include/haswell
--I./frame/3/ -I./frame/ind/ukernels/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/
--I./frame/include -DBLIS_VERSION_STRING=\"0.3.2-51\"
+-I./frame/3/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/ -I./frame/include
+-DBLIS_VERSION_STRING=\"0.3.2-51\"
 ```
 (To see precisely which flags are being employed for any given file, enable
 verbosity at compile-time via `make V=1`.) Compiling sandboxes with these more
````

docs/Testsuite.md

+1-6
````diff
@@ -128,11 +128,6 @@ sdcz # Datatype(s) to test:
 300 # Problem size: maximum to test
 100 # Problem size: increment between experiments
 # Complex level-3 implementations to test
-1 # 3mh ('1' = enable; '0' = disable)
-1 # 3m1 ('1' = enable; '0' = disable)
-1 # 4mh ('1' = enable; '0' = disable)
-1 # 4m1b ('1' = enable; '0' = disable)
-1 # 4m1a ('1' = enable; '0' = disable)
 1 # 1m ('1' = enable; '0' = disable)
 1 # native ('1' = enable; '0' = disable)
 1 # Simulate application-level threading:
````
````diff
@@ -169,7 +164,7 @@ _**Test gemm with mixed-precision operands?**_ This boolean determines whether `
 
 _**Problem size.**_ These values determine the first problem size to test, the maximum problem size to test, and the increment between problem sizes. Note that the maximum problem size only bounds the range of problem sizes; it is not guaranteed to be tested. Example: If the initial problem size is 128, the maximum is 1000, and the increment is 64, then the last problem size to be tested will be 960.
 
-_**Complex level-3 implementations to test.**_ With the exception of the switch marked `native`, these switches control whether experimental complex domain implementations are tested (when applicable). These implementations employ induced methods complex matrix multiplication and apply to some (though not all) of the level-3 operations. If you don't know what these are, you can ignore them. The `native` switch corresponds to native execution of complex domain level-3 operations, which we test by default. We also test the `1m` method, since it is the induced method of choice when complex microkernels are not available. Note that all of these induced method tests (including `native`) are automatically disabled if the `c` and `z` datatypes are disabled.
+_**Complex level-3 implementations to test.**_ This section lists which complex domain implementations of level-3 operations are tested. If you don't know what these are, you can ignore them. The `native` switch corresponds to native execution of complex domain level-3 operations, which we test by default. We also test the `1m` method, since it is the induced method of choice when optimized complex microkernels are not available. Note that all of these induced method tests (including `native`) are automatically disabled if the `c` and `z` datatypes are disabled.
 
 _**Simulate application-level threading.**_ This setting specifies the number of threads the testsuite will spawn, and is meant to allow the user to exercise BLIS as a multithreaded application might if it were to make multiple concurrent calls to BLIS operations. (Note that the threading controlled by this option is orthogonal to, and has no effect on, whatever multithreading may be employed _within_ BLIS, as specified by the environment variables described in the [Multithreading](Multithreading.md) documentation.) When this option is set to 1, the testsuite is run with only one thread. When set to n > 1 threads, the spawned threads will parallelize (in round-robin fashion) the total set of tests specified by the testsuite input files, executing them in roughly the same order as that of a sequential execution.
 
````
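The _Problem size_ paragraph in the Testsuite.md hunk describes sizes that start at the initial value and grow by the increment while they stay at or below the maximum. That rule reduces to integer arithmetic; the C sketch below (helper names are hypothetical, not part of the testsuite) reproduces the documented 128/1000/64 example ending at 960.

```c
#include <assert.h>

/* Number of problem sizes tested: first, first+inc, first+2*inc, ...,
   stopping before a size would exceed max. Returns 0 if first > max. */
static unsigned num_problem_sizes( unsigned first, unsigned max, unsigned inc )
{
    assert( inc > 0 );
    if ( first > max ) return 0;
    return ( max - first ) / inc + 1;
}

/* Last (largest) problem size tested. The maximum only bounds the range:
   it is reached exactly only when (max - first) is a multiple of inc. */
static unsigned last_problem_size( unsigned first, unsigned max, unsigned inc )
{
    assert( num_problem_sizes( first, max, inc ) > 0 );
    return first + ( num_problem_sizes( first, max, inc ) - 1 ) * inc;
}
```

For the documented example, (1000 - 128) / 64 = 13 increments fit, so 14 sizes are tested (128, 192, ..., 960) and the maximum of 1000 itself is never reached.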

frame/1m/bli_l1m_ft_ker.h

-26
````diff
@@ -110,28 +110,6 @@ typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
 
 INSERT_GENTDEF( unpackm_cxk )
 
-// packm_3mis_ker
-// packm_4mi_ker
-
-#undef GENTDEF
-#define GENTDEF( ctype, ch, opname, tsuf ) \
-\
-typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
-( \
-conj_t conja, \
-dim_t cdim, \
-dim_t n, \
-dim_t n_max, \
-ctype* restrict kappa, \
-ctype* restrict a, inc_t inca, inc_t lda, \
-ctype* restrict p, inc_t is_p, inc_t ldp, \
-cntx_t* restrict cntx \
-);
-
-INSERT_GENTDEF( packm_cxk_3mis )
-INSERT_GENTDEF( packm_cxk_4mi )
-
-// packm_rih_ker
 // packm_1er_ker
 
 #undef GENTDEF
````
````diff
@@ -150,12 +128,8 @@ typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
 cntx_t* restrict cntx \
 );
 
-INSERT_GENTDEF( packm_cxk_rih )
 INSERT_GENTDEF( packm_cxk_1er )
 
 
-
-
-
 #endif
 
````
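The removed `packm_cxk_3mis`, `packm_cxk_4mi`, and `packm_cxk_rih` typedefs were stamped out from a single `GENTDEF` template by `INSERT_GENTDEF`, which instantiates it per datatype. The C sketch below re-creates that token-pasting pattern in simplified form; `MY_PASTE4`, `MY_GENTDEF`, and `MY_INSERT_GENTDEF` are hypothetical stand-ins, not BLIS's actual macro definitions, and only the two real types are instantiated.

```c
#include <assert.h>

/* Two-level paste so that macro arguments are expanded before ##. */
#define MY_PASTE4_( a, b, c, d ) a ## b ## c ## d
#define MY_PASTE4( a, b, c, d )  MY_PASTE4_( a, b, c, d )

/* Template: one function-pointer typedef named <ch><opname>_ker<tsuf>. */
#define MY_GENTDEF( ctype, ch, opname, tsuf ) \
typedef void (*MY_PASTE4( ch, opname, _ker, tsuf ))( int n, ctype* x );

/* Instantiate the template once per datatype (real types only here). */
#define MY_INSERT_GENTDEF( opname ) \
MY_GENTDEF( float,  s, opname, _ft ) \
MY_GENTDEF( double, d, opname, _ft )

/* Expands to typedefs named sscal2_ker_ft and dscal2_ker_ft. */
MY_INSERT_GENTDEF( scal2 )

/* Example kernels matching the generated signatures: x[i] *= 2. */
static void my_sscal2( int n, float* x )
{
    assert( n >= 0 );
    for ( int i = 0; i < n; ++i ) x[i] *= 2.0f;
}

static void my_dscal2( int n, double* x )
{
    assert( n >= 0 );
    for ( int i = 0; i < n; ++i ) x[i] *= 2.0;
}
```

Because every datatype's typedef comes from one template, deleting an operation such as `packm_cxk_4mi` only requires removing its `INSERT_GENTDEF` line (and the template if nothing else uses it), which is the shape of the deletions in this hunk.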

frame/1m/bli_l1m_ker.h

-45
````diff
@@ -74,51 +74,6 @@ INSERT_GENTPROT_BASIC0( unpackm_14xk_ker_name )
 INSERT_GENTPROT_BASIC0( unpackm_16xk_ker_name )
 
 
-// 3mis packm kernels
-
-#undef GENTPROT
-#define GENTPROT PACKM_3MIS_KER_PROT
-
-INSERT_GENTPROT_BASIC0( packm_2xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_4xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_6xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_8xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_10xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_12xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_14xk_3mis_ker_name )
-INSERT_GENTPROT_BASIC0( packm_16xk_3mis_ker_name )
-
-
-// 4mi packm kernels
-
-#undef GENTPROT
-#define GENTPROT PACKM_4MI_KER_PROT
-
-INSERT_GENTPROT_BASIC0( packm_2xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_4xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_6xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_8xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_10xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_12xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_14xk_4mi_ker_name )
-INSERT_GENTPROT_BASIC0( packm_16xk_4mi_ker_name )
-
-
-// rih packm kernels
-
-#undef GENTPROT
-#define GENTPROT PACKM_RIH_KER_PROT
-
-INSERT_GENTPROT_BASIC0( packm_2xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_4xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_6xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_8xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_10xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_12xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_14xk_rih_ker_name )
-INSERT_GENTPROT_BASIC0( packm_16xk_rih_ker_name )
-
-
 // 1e/1r packm kernels
 
 #undef GENTPROT
````
