Mismatch column count in ScanActor (expected) vs CS (received) and out of range vector access in TPC-DS Q9 #15845

Open
iddqdex opened this issue Mar 17, 2025 · 4 comments

@iddqdex
Collaborator

iddqdex commented Mar 17, 2025

Reproduction:

ulimit -c unlimited
cd ~/g/ydb/tests/functional/tpc/large
ya make -tttF test_tpcds.py::TestTpcdsS1::test_tpcds[?]
ls test-results/py3test/test_tpcds/testing_out_stuff/test_tpcds.py.TestTpcdsS1.test_tpcds.1/cluster/slot_1/

Stack:

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/extra_disk_1/.ya/build/build_root/k20t/000007/ydb/apps/ydbd/ydbd server --supp'.
Program terminated with signal SIGILL, Illegal instruction.
#0  std::__y1::vector<NYql::NUdf::TUnboxedValue, NKikimr::NMiniKQL::TMKQLAllocator<NYql::NUdf::TUnboxedValue, (NKikimr::NMiniKQL::EMemorySubPool)0>>::operator[][abi:fe190000](unsigned long) (this=<optimized out>, __n=<optimized out>) at /-S/contrib/libs/cxxsupp/libcxx/include/vector:1448

warning: 1448   /-S/contrib/libs/cxxsupp/libcxx/include/vector: No such file or directory
[Current thread is 1 (LWP 1566338)]
[arc] Arcadia GDB pretty-printers enabled
(gdb) bt
#0  std::__y1::vector<NYql::NUdf::TUnboxedValue, NKikimr::NMiniKQL::TMKQLAllocator<NYql::NUdf::TUnboxedValue, (NKikimr::NMiniKQL::EMemorySubPool)0>>::operator[][abi:fe190000](unsigned long) (this=<optimized out>, __n=<optimized out>) at /home/iddqd/git/ydb/contrib/libs/cxxsupp/libcxx/include/vector:1448
#1  0x000000001559f026 in NKikimr::NMiniKQL::TKqpScanComputeContext::TScanData::TBlockBatchReader::AddData (this=0x6c319b15700, dataAccessor=..., holderFactory=...)
#2  0x000000001559f5ff in NKikimr::NMiniKQL::TKqpScanComputeContext::TScanData::AddData (this=0x6c37940b488, batch=...,
    shardId=Printer initialization error: <class 'AttributeError'>, 'NoneType' object has no attribute 'dereference'
..., holderFactory=...) at /home/iddqd/git/ydb/ydb/core/kqp/runtime/kqp_scan_data.cpp:722
#3  0x000000001960eb64 in NKikimr::NKqp::NScanPrivate::TKqpScanComputeActor::Handle (this=this@entry=0x6c355fe8e80, ev=Python Exception <class 'gdb.error'>: There is no member named T_.
)
    at /home/iddqd/git/ydb/ydb/core/kqp/compute_actor/kqp_scan_compute_actor.cpp:182
#4  0x0000000019611e48 in NKikimr::NKqp::NScanPrivate::TKqpScanComputeActor::StateFunc (this=0x6c355fe8e80, ev=TAutoPtr<NActors::IEventHandle> = {...})
    at /home/iddqd/git/ydb/ydb/core/kqp/compute_actor/kqp_scan_compute_actor.h:79
#5  0x000000000aa6ce35 in NActors::IActor::Receive (this=0x6c355fe8e80, ev=TAutoPtr<NActors::IEventHandle> = {...})
    at /home/iddqd/git/ydb/ydb/library/actors/core/actor.cpp:280
#6  0x000000000aaa4711 in NActors::TExecutorThread::Execute (this=this@entry=0x6c33f40fa00, mailbox=mailbox@entry=0x6c31b1a6900, isTailExecution=<optimized out>)
    at /home/iddqd/git/ydb/ydb/library/actors/core/executor_thread.cpp:269
#7  0x000000000aaa8212 in NActors::TExecutorThread::ProcessExecutorPool()::$_0::operator()(NActors::TMailbox*, bool) const (this=this@entry=0x7f6fd9e22ea0,
    mailbox=mailbox@entry=0x6c31b1a6900, isTailExecution=false) at /home/iddqd/git/ydb/ydb/library/actors/core/executor_thread.cpp:460
#8  0x000000000aaa7e61 in NActors::TExecutorThread::ProcessExecutorPool (this=this@entry=0x6c33f40fa00) at /home/iddqd/git/ydb/ydb/library/actors/core/executor_thread.cpp:512
#9  0x000000000aaa89db in NActors::TExecutorThread::ThreadProc (this=0x6c33f40fa00) at /home/iddqd/git/ydb/ydb/library/actors/core/executor_thread.cpp:538
#10 0x0000000009d400ad in (anonymous namespace)::TPosixThread::ThreadProxy (arg=0x6c33d709480) at /home/iddqd/git/ydb/util/system/thread.cpp:244
#11 0x00007f6fecb90609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f6fecab5353 in clone () from /lib/x86_64-linux-gnu/libc.so.6

#13580

@Hor911
Collaborator

Hor911 commented Mar 18, 2025

A single Q9 run also reproduces the problem:

ya make -tttF test_tpcds.py::TestTpcdsS1::test_tpcds[9]

@maximyurchuk
Copy link
Collaborator

Why doesn't this reproduce on our perf clusters? Maybe someone knows (@Hor911?)

@Hor911
Collaborator

Hor911 commented Mar 18, 2025

  1. There is a genuine out-of-bounds access in the code
  2. It goes unnoticed with build=release and for some reason fires only in debug
  3. I want to insert a Y_ENSURE now and check on the cluster how many queries hit it
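
To illustrate the kind of check mentioned in point 3, here is a minimal sketch of an explicit bounds check that fails the same way in debug and release builds. `SKETCH_ENSURE` and `CheckedAt` are hypothetical names invented for this example; this is not the real `Y_ENSURE` macro and not the actual YDB code.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an ensure-style check; NOT the real Y_ENSURE.
// Throws std::runtime_error with the source location when the condition fails.
#define SKETCH_ENSURE(cond, msg)                                   \
    do {                                                           \
        if (!(cond)) {                                             \
            std::ostringstream out;                                \
            out << __FILE__ << ":" << __LINE__ << ": " << msg;     \
            throw std::runtime_error(out.str());                   \
        }                                                          \
    } while (false)

// Bounds-checked element access (hypothetical helper).
// Unlike a plain operator[], an out-of-range index is always reported.
template <typename T>
T& CheckedAt(std::vector<T>& values, std::size_t index) {
    SKETCH_ENSURE(index < values.size(),
                  "index " << index << " out of range, size " << values.size());
    return values[index];
}
```

With a check like this, an out-of-range index surfaces as an exception that can be logged or counted, instead of undefined behavior that may pass silently.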

@Hor911
Copy link
Collaborator

Hor911 commented Mar 18, 2025

The problem is here: https://github.com/ydb-platform/ydb/blob/main/ydb/core/kqp/runtime/kqp_scan_data.cpp#L691

CS sends TEvSendData with an empty DataIndexes field and 3 internal columns:

_yql_plan_step
_yql_tx_id
_yql_write_id

TKqpScanComputeActor expects no columns, which results in an out-of-range access to the batchValues vector
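
The mismatch can be sketched in isolation as follows. All types and names here (`TIncomingBatch`, `FillBatchValues`) are hypothetical stand-ins, not the real YDB classes; the sketch only assumes that the reader sizes its value vector by the number of columns it expects and then writes one value per received column.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a batch received from the column shard.
struct TIncomingBatch {
    std::vector<std::string> ColumnNames;
};

// Writes one placeholder value per received column into batchValues,
// which the caller has sized by the number of columns it expects.
// Rejects the batch up front if it carries more columns than expected,
// instead of indexing past the end of batchValues.
std::size_t FillBatchValues(const TIncomingBatch& batch, std::vector<int>& batchValues) {
    if (batch.ColumnNames.size() > batchValues.size()) {
        throw std::length_error(
            "received " + std::to_string(batch.ColumnNames.size()) +
            " columns, expected at most " + std::to_string(batchValues.size()));
    }
    for (std::size_t i = 0; i < batch.ColumnNames.size(); ++i) {
        batchValues[i] = static_cast<int>(i); // placeholder for the real column value
    }
    return batch.ColumnNames.size();
}
```

In this sketch, a batch carrying `_yql_plan_step`, `_yql_tx_id`, and `_yql_write_id` against an empty `batchValues` is rejected with an exception rather than written out of range.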

@Hor911 Hor911 changed the title coredump in tpcds in localtests Mismatch column count in ScanActor (expected) vs CS (received) and out of range vector access in TPC-DS Q9 Mar 18, 2025