[TVM][RUST] Heap corruption when using FFI for `BackendParallelLaunch` #1226

nhynes · 2018-06-04T20:03:23Z

When using a FFI for TVMBackendParallelLaunch, even heap allocating a single byte corrupts the resulting computation.

One possible cause is that there's some unintentional malloc/free happening when constructing the flambda closure.
Another (probably more likely) possibility is that I've incorrectly set a struct field wrong somewhere. Parallel for basic TVM ops works, after all.

For reference, Rust uses jemalloc.

Steps to reproduce

curl https://sh.rustup.rs -sSf | sh
rustup default nightly-2018-04-11
git clone https://github.com/nhynes/tvm-rs

cd tvm-rs/tests/test_nnvm
export TVM_NUM_THREADS=0
cargo run  # works
cargo run --features par-launch-alloc  # does not work

The text was updated successfully, but these errors were encountered:

nhynes · 2018-09-26T05:44:11Z

This is why we can't have nice things

tqchen · 2018-09-27T01:46:32Z

can you elaborate a bit on this?

nhynes · 2018-09-27T03:35:43Z

When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version), 32-byte (VEX.256 encoded version) or 64-byte (EVEX.512 encoded version) boundary or a general-protection exception (#GP) will be generated.

For parallel lambdas which use a temporal workspace, LLVM will emit load/store instructions which specify an alignment e.g.,

  %70 = getelementptr inbounds float, float* %4, i64 %69
  %71 = bitcast float* %70 to <8 x float>*
  store <8 x float> %68, <8 x float>* %71, align 32, !tbaa !149

where %4 is the temporal workspace.
The llvm ops get translated into MOVAPS which #GP when when the workspace has the wrong alignment.

tqchen · 2018-09-27T03:50:43Z

This sounds like a bug to me in the parallel code generator(or the workspace allocator)which do not respect the alignment requirements

tqchen · 2018-09-27T03:51:34Z

Specially, we do require temporal workspace also align to a minimum alignment and the compiler takes advantage of that

nhynes · 2018-09-27T04:01:27Z

sounds like a bug to me in the parallel code generator(or the workspace allocator)

Yes, this was most definitely a bug in my implementation of the workspace allocator. It's fixed now, so threading works in the rust runtime. Figuring this out required bisecting the generated llvm code with debugging statements, which isn't the best developer experience, but I suppose this is unavoidable.

tqchen · 2018-09-27T04:08:11Z

Gotcha, maybe we should put a heavy comment on workspace and DeviceAPI interface to warn this potential issue to the developers. This is certainly something that I would have overlooked as well

nhynes · 2018-09-27T04:12:56Z

~~PR incoming!~~ Actually, it literally says right there in c_backend_api.h:

/*!
 * \brief Backend function to allocate temporal workspace.
 *
 * \note The result allocate spaced is ensured to be aligned to kTempAllocaAlignment.
 *
 * ...
 */

This is totally my fault.

tqchen changed the title ~~[TVM] Heap corruption when using FFI for BackendParallelLaunch~~ [TVM][RUST] Heap corruption when using FFI for BackendParallelLaunch Jun 4, 2018

ehsanmok mentioned this issue Aug 15, 2018

Add rust runtime #1597

Merged

nhynes closed this as completed Sep 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TVM][RUST] Heap corruption when using FFI for `BackendParallelLaunch` #1226

[TVM][RUST] Heap corruption when using FFI for `BackendParallelLaunch` #1226

nhynes commented Jun 4, 2018 •

edited

Loading

nhynes commented Sep 26, 2018

tqchen commented Sep 27, 2018

nhynes commented Sep 27, 2018

tqchen commented Sep 27, 2018

tqchen commented Sep 27, 2018

nhynes commented Sep 27, 2018

tqchen commented Sep 27, 2018

nhynes commented Sep 27, 2018 •

edited

Loading

[TVM][RUST] Heap corruption when using FFI for BackendParallelLaunch #1226

[TVM][RUST] Heap corruption when using FFI for BackendParallelLaunch #1226

Comments

nhynes commented Jun 4, 2018 • edited Loading

Steps to reproduce

nhynes commented Sep 26, 2018

tqchen commented Sep 27, 2018

nhynes commented Sep 27, 2018

tqchen commented Sep 27, 2018

tqchen commented Sep 27, 2018

nhynes commented Sep 27, 2018

tqchen commented Sep 27, 2018

nhynes commented Sep 27, 2018 • edited Loading

[TVM][RUST] Heap corruption when using FFI for `BackendParallelLaunch` #1226

[TVM][RUST] Heap corruption when using FFI for `BackendParallelLaunch` #1226

nhynes commented Jun 4, 2018 •

edited

Loading

nhynes commented Sep 27, 2018 •

edited

Loading