MDEV-38212 MDEV-37686 Breaks Parallel Replication #4462

andrelkin · 2025-11-28T18:37:03Z

Re-fixing MDEV-37686.

NULLifying THD::rgi_slave pointer in start_new_trans ctor harmed the parallel slave conflict detection.
Now instead of NULLifying of this member a finer approach is taken to optionally screen THD::rgi_slave when it is attempted to be accessed within start_new_trans context, that is when such out-of-band transaction is run by a slave thread.

The start_new_trans is allowed of course to access server local non-replicated temporary tables, should it ever need that.

The original MDEV-37686 aspect is tested with 12.3 branch's rpl.create_or_replace_mix2.
Any possible side effects to temporary table replication is controlled by existing rpl suite tests.
There is no need for a specific mtr test to prove the parallel slave side is back to normal, as THD::rgi_slave is again available to concurrent transactions.

The Jira issue number for this PR is: MDEV-______

Description

TODO: fill description here

Release Notes

TODO: What should the release notes say about this change?
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.

How can this PR be tested?

TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended.
Consult the documentation on "Writing good test cases".

If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

Re-fixing MDEV-37686. NULLifying `THD::rgi_slave` pointer in `start_new_trans` ctor harmed the parallel slave conflict detection. Now instead of `NULL`ifying of this member a finer approach is taken to optionally screen `THD::rgi_slave` when it is attempted to be accessed within `start_new_trans` context, that is when such out-of-band transaction is run by a slave thread. The start_new_trans is allowed of course to access server local non-replicated temporary tables, should it ever need that. The original MDEV-37686 aspect is tested with 12.3 branch's rpl.create_or_replace_mix2. Any possible side effects to temporary table replication is controlled by existing rpl suite tests. There is no need for a specific mtr test to prove the parallel slave side is back to normal, as `THD::rgi_slave` is again available to concurrent transactions.

CLAassistant · 2025-11-28T18:37:10Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

bnestere

If I understand correctly, this patch basically tricks the execution context into sometimes thinking it is a master thread, and other times on a slave thread, i.e. when in the context of a sub-transaction started via start_new_trans. Then, for each usage of the rgi that may involve being tricked, we have to go through an intermediary to see if we should do this trickery or not.

There are a few issues I see with this approach:

It generally seems bug prone to have to go through each usage of rgi that is vulnerable to this trickery
On the master, both parent and sub-transactions both use THD::temporary_tables; then on the slave this would mean the parent transaction would use rgi based temporary tables and then the sub-transactions would use THD::temporary_tables. This inconsistency seems bug prone.
Can there be nested start_new_trans? This would reset is_new_trans for the deepest sub-transaction when it is commit, and parent trans would then see the wrong value.

I'd think it would be more consistent for sub-transactions to continue using rgi based temporary tables, and follow the convention that @vuvova mentions on MDEV-37686:

The code path is almost the same, but there's no crash here. Because "creating a separate transaction" includes saving and resetting the list of temporary tables in the THD.

IIUC, the original problem from MDEV-37686 was that when the sub-transaction (i.e. the one started via start_new_trans) committed, it over-stretched in what it closed because the sub-transaction had too much insight into the parent transaction context. So my thought is, wouldn't it be cleaner to just have start_new_trans correctly save/reset the temporary tables for a slave?

And for a more general question, is there another problem on the parallel slave, s.t. any sub-transactions committed via start_new_trans can't be rolled back via the retry mechanism?

andrelkin · 2025-12-02T19:25:15Z

Thanks for the questions, Brandon!

sometimes thinking it is a master thread, and other times on a slave thread

As the core part of the patch is
s/if (rgi_slave)/if (get_rgi_slave())/
which is affects the slave side only, so I'd leave the master aside.

You are up to the point when quote Serg's

Because "creating a separate transaction" includes saving and resetting the list of temporary tables in the THD.

So start_new_trans() always takes care of doing that. Effectively it's isolation from the main transaction.
And despite of this measure the slave applier manages to see or modify the main transaction's temporary tables list
'cos of it uses a different code path (that the "master") for accessing them.

One might start thinking I bet, about why won't start_new_trans() always present itself as a "master" transaction.
One of the reason of MDEV-37686 failing approach was that, reading into too much (which is over-relying on)

start_new_trans::org_thd

I made a mistake to think it was so indeed. But actually the new trans does not allocate any new THD (most probably out of performance consideration)
... and start_new_trans() inherits THD::rgi_info only for trouble.
Which is the slave tmp tables path unfortunately is also concerned with THD::rgi_info as is the parallel slave conflict detector.

So the patch deals with the slave path, and since a new THD was apparently (I did not ask the question though) discarded by design, I could not come up with
anything more viable than of blocking THD::rgi_info::rli::save_temporary_tables access to the start_new_trans() slave applier.

There are a few issues I see with this approach:

To p.1 vulnerability is that it's not future-proof wrt start_new_trans() potential to reach something else of the slave applier context when
it will, e.g due to human mistake, regress to direct if (rgi_slave). An obvious solution is to enforce use of the new get_rgi_slave() all around.
I share your grievances about p.2, my "apology" is said above, hopefully the Global Temporary Table is the salvation in middle term run..

To p.3, nesting seems to be fine though sharing THD instance is akin of rope-walking.

wouldn't it be cleaner to just have start_new_trans correctly save/reset the temporary tables for a slave?

I'd stay for more: start_new_trans should not have access to anything (but explicitly allowed) of parent transaction(s)' contexts.
Ideally it should base one own THD instance. Could those be pre-allocated in a cache to address the performance argument (I forecast)..?

bnestere · 2025-12-02T20:35:35Z

One might start thinking I bet, about why won't start_new_trans() always present itself as a "master" transaction.

My thought is the opposite, that start_new_trans shouldn't present as a "master" transaction, if it is run by a slave thread. But rather, the rgi state should be temporarily adjusted to account for the new transaction state, and the new transaction should then just be run with the context of the updated RGI.

Ideally it should base one own THD instance. Could those be pre-allocated in a cache to address the performance argument (I forecast)..?

And I'm not so sure, many of the uses of start_new_trans are rather small, single-purpose sub-transactions. For example, Query_log_event::do_apply_event() calls my_tz_find() which uses a start_new_trans to read timezone tables. If that has to be done "as a new transaction", then I think it makes more sense to do it on the same THD and to have the rgi state just updated to reflect a temporary new transaction state. And similarly for finding a stored procedure to execute, which also uses a start_new_trans. And it may be important for future start_new_trans uses to have access to the rgi context. It seems strange to me to try to hide it.

I'd stay for more: start_new_trans should not have access to anything (but explicitly allowed) of parent transaction(s)' contexts.

And to this, I agree it shouldn't have access to the parent transaction context, but I do think it should have access to the rgi context (which would just adapted to hide the parent transaction-specific info); if it exists.

andrelkin · 2025-12-03T09:32:22Z

One might start thinking I bet, about why won't start_new_trans() always present itself as a "master" transaction.

My thought is the opposite, that start_new_trans shouldn't present as a "master" transaction, if it is run by a slave thread. But rather, the rgi state should be temporarily adjusted to account for the new transaction state

start_new_trans() just has to be isolated from parent transactions, should they be slave applier transactions, and therefore is neutral to either master or slave role (I tried to convey that with "master") .
Hence it need not THD::rgi_info which we can see through your

example, Query_log_event::do_apply_event() calls my_tz_find() which uses a start_new_trans to read timezone tables. If that has to be done "as a new transaction", then I think it makes more sense to do it on the same THD and to have the rgi state just updated

It does not matter what is the main trx' applier, any of them would use the same code path never intending to access THD::rgi_info.

As to reusing the parent THD it is stated already that such current practice is an unfortunate compromise;
the inherited THD::rgi_info is clear vulnerability for the parent isolation, manifested of course
by MDEV-37686.

bnestere · 2025-12-03T15:18:24Z

start_new_trans() just has to be isolated from parent transactions, should they be slave applier transactions, and therefore is neutral to either master or slave role (I tried to convey that with "master") .

I don't follow the logic jump from "isolated from parent transactions" to "therefore is neutral to either master or slave role".

The new start_new_trans being isolated from parent transaction data/context is pretty straightforward and I agree. But that doesn't sound the same to me as isolated from slave-applier-context when being applied via a slave thread. I'd think it would make potential bugs/problems/complexities induced from these start_new_trans harder to diagnose/debug. One example that comes to mind is innodb deadlock checking, say if a start_new_trans gets hung on a lock. Granted, your patch doesn't touch that logic, so it would still access rgi, but that then is an inconsistency, where sometimes we would use the rgi, and other times not, while within the start_new_trans context

bnestere · 2025-12-03T20:46:42Z

After our online call, I understand better what you are saying, @andrelkin. What I previously called a sub-transaction is a bit a misnomer, where it would be more effectively thought of as an asynchronous transaction, which could in theory run in its own separate thread (where there would be no rgi context). I agree with this idea, but I'm not sure this patch is a good way to "emulate" that idea. I think that should be done cleanly, and I think this approach is error prone.

As the original issue of MDEV-37686 was with temporary table access, can this patch be limited in scope to only temporary tables? I wonder if instead, the THD::has_temporary_tables() function could be replaced with a function pointer.

I envision something like:

Update the function signature to be a function pointer, rather than function, say bool (THD::*has_temporary_tables)()
There is already a function for normal transactions, THD::has_thd_temporary_tables(). We would need to define a slave-specific function for checking if there are temporary tables

bool THD::has_rgi_temporary_tables()
{
  Some_raii_to_lock_rgi_slave_rli_data_lock lock_raii;
  return rgi_slave->rli->save_temporary_table;
}

By default, a THD would have its function pointer using the THD temp tables, i.e. has_temporary_tables= has_thd_temporary_tables. When a slave applier thread is created, it would reset the function pointer to slave version, i.e. has_temporary_tables= has_rgi_temporary_tables. When start_new_trans creates a new transaction, the function pointer would be reset back to the normal THD variant for temporary tables; and then start_new_trans::restore_old_transaction() would reset it back to the slave version.

The advantages I see of this approach are:

I think it is more extensible, as (a) new pieces of code wouldn't have to be aware of how to get the rgi (i.e. how to invoke the accessor and under what parameters), and (b) other possibly problematic if (rgi) return lorem; else return ipsam; could adopt the same pattern.
It never hides anything, it only resets the behavior for the new transaction to the default behavior. I think this has less risk of running into some invalid server state.
THD::has_temporary_tables() receives a general optimization by removing an if conditional, because the boolean-checker is setup initially based on the context of the thread.

However, it still leaves the problem that rgi exists for our asynchronous transaction, which we can tackle more cleanly in the coming year.

andrelkin · 2025-12-04T13:29:35Z

I wonder if instead, the THD::has_temporary_tables() function could be replaced with a function pointer.

You must've assumed THD::has_temporary_tables() plays a central role in the slave temp table access mechanics.
But it's not and the following stack

#0  THD::lock_temporary_tables (this=0x52c0002f0218) at temporary_tables.cc:1554
#1  0x0000567d64ea40b0 in THD::drop_temporary_table (this=0x52c0002f0218, table=0x519000116c98, is_trans=0x0, delete_table=false) at temporary_tables.cc:649
#2  0x0000567d64a39c2b in mysql_alter_table (thd=0x52c0002f0218, new_db=0x52c0002f4fc8, new_name=0x52c0002f5428, create_info=0x74df656f02e0, table_list=0x52d0005324d8, recreate_info=0x74df656f0070, alter_info=0x74df656f0130, order_num=0, order=0x0, ignore=false, if_exists=false) at sql_table.cc:11714
#3  0x0000567d64c111f3 in Sql_cmd_alter_table::execute (this=0x52d000532c10, thd=0x52c0002f0218) at sql_alter.cc:688
#4  0x0000567d647258a0 in mysql_execute_command (thd=0x52c0002f0218, is_called_from_prepared_stmt=false) at sql_parse.cc:6207
#5  0x0000567d64732dcc in mysql_parse (thd=0x52c0002f0218, rawbuf=0x50e000073163 "ALTER TABLE t1 ENGINE=InnoDB", length=28, parser_state=0x74df65510d60) at sql_parse.cc:8229
#6  0x0000567d6548043e in Query_log_event::do_apply_event (this=0x5130000ca5d8, rgi=0x51d000465080, query_arg=0x50e000073163 "ALTER TABLE t1 ENGINE=InnoDB", q_len_arg=28) at log_event_server.cc:2206

presents a scenario of accessing the slave temp table repository without visiting the function. Similar one exists for DROP.

However your point can be furthered. Much central role play lock/unlock temporary tables. So the current patch's temporary_table.cc hunks seem to be optimizable
into only three \footnote{actually mtr uncovered counters like

if (rgi_slave)
    slave_open_temp_tables++;

require more one or two} aforementioned functions. (Actually in principle its lock/unlock pair as has_temporary_tables is re-definable through them).
And I made a 12.3.0 branch that is testable with rpl.create_or_replace_mix2.

Given that the issue can't be covered by changes around a single function, and at the same time
the mere lock/unlock 's checking of a flag (currently implemented as rgi_info::is_start_new_trans) suffices, let me skip commenting on pp.2-3.

BUT! Indeed this "instead" of yours

which could in theory run in its own separate thread

certainly deserves further interrogation! :-) We might as well engage existing general server worker thread for that..

bnestere · 2025-12-04T15:16:00Z

You must've assumed THD::has_temporary_tables() plays a central role in the slave temp table access mechanics. But it's not and the following stack

Ah I suppose I was thinking of MDEV-37686 in-particular, where the condition that lead to the crash was has_temporary_tables(). But you are right, the general behavior needs to be considered.

And on that note, now that you mention..

\footnote{actually mtr uncovered counters like

it also looks like conditions revolving tmp_tables aren't just using THD::rgi_slave, but also THD::slave_thread, which likely adds another layer of complexity to this fix..

E.g, in the same function, THD::open_temporary_table(TMP_TABLE_SHARE*, const Lex_ident_table&), we check for slave usage in both different ways:

  /*
    In replication, temporary tables are not confined to a single
    thread/THD.
  */
  if (slave_thread)
    flags|= HA_OPEN_GLOBAL_TMP_TABLE;

  ...

  /* Increment Slave_open_temp_table_definitions status variable count. */
  if (rgi_slave)
    slave_open_temp_tables++;

and THD::slave_thread is used elsewhere too..

I'd think we need to make those uses consistent with this fix as well.

andrelkin · 2025-12-04T15:22:04Z

BUT! Indeed this "instead" of yours

which could in theory run in its own separate thread

certainly deserves further interrogation! :-) We might as well engage existing general server worker thread for that..

which makes me immediately scout on.
Judging by a limited number of use cases of start_new_trans it looks really feasible.
E.g (the line numbers from the 12.3 branch of mine)

Let it be CREATE-TABLE (it does matter that it's the slave thread is executing it)

#12 0x000064a9657f9fbd in Query_log_event::do_apply_event (this=0x5130000e0918, rgi=0x51d0002ff880, query_arg=0x50f000032b53 "CREATE TABLE t1 AS SELECT 1 AS f1", q_len_arg=33) at log_event_server.cc:2131

the new trans starts in delete_statistics_for_table()

#0  start_new_trans::start_new_trans (this=0x758031d7a0f0, thd=0x52c000240218) at sql_class.cc:6660
#1  0x000064a964c26c4b in delete_statistics_for_table (thd=0x52c000240218, db=0x52d00055a580, tab=0x52d00055a590) at sql_statistics.cc:3421
#2  0x000064a964c67d33 in create_table_impl (thd=0x52c000240218, ddl_log_state_create=0x52d00055be18, ddl_log_state_rm=0x52d00055be38, orig_db=..., orig_table_name=..., db=const LEX_CSTRING 0x52d00055a540 "test" length 4, table_name=const LEX_CSTRING 0x52d00055a500 "t1" length 2, path=const LEX_CSTRING 0x758031a10890 "./test/t1" length 9, options=..., create_info=0x758031a74360, alter_info=0x758031a741a0, create_table_mode=0, is_trans=0x0, key_info=0x758031a10830, key_count=0x758031a10820, frm=0x758031a10850) at sql_table.cc:4789

where the old stats is removed. I wonder whether the removal could be carried out in parallel with the rest of CREATE handling? Sure it admits race in that the new table and old stats coexist for "short" period of time.
But cut out of the out-of-band trx from the parent transaction is compelling to me due clear logics separation, simplicity etc.

@vuvova may cut this proposal short though...

andrelkin requested review from bnestere, montywi and vuvova November 28, 2025 18:37

svoj added the MariaDB Corporation label Nov 28, 2025

bnestere requested changes Dec 2, 2025

View reviewed changes

andrelkin mentioned this pull request Dec 5, 2025

MDEV-38254 refixing MDEV-37686 #4472

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

MDEV-38212 MDEV-37686 Breaks Parallel Replication #4462

MDEV-38212 MDEV-37686 Breaks Parallel Replication #4462

andrelkin commented Nov 28, 2025

Uh oh!

CLAassistant commented Nov 28, 2025

Uh oh!

bnestere left a comment •

edited

Loading

Uh oh!

andrelkin commented Dec 2, 2025 •

edited

Loading

Uh oh!

bnestere commented Dec 2, 2025

Uh oh!

andrelkin commented Dec 3, 2025 •

edited

Loading

Uh oh!

bnestere commented Dec 3, 2025 •

edited

Loading

Uh oh!

bnestere commented Dec 3, 2025

Uh oh!

andrelkin commented Dec 4, 2025 •

edited

Loading

Uh oh!

bnestere commented Dec 4, 2025

Uh oh!

andrelkin commented Dec 4, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

6 participants

Uh oh!

MDEV-38212 MDEV-37686 Breaks Parallel Replication #4462

Are you sure you want to change the base?

MDEV-38212 MDEV-37686 Breaks Parallel Replication #4462

Conversation

andrelkin commented Nov 28, 2025

Description

Release Notes

How can this PR be tested?

Basing the PR against the correct MariaDB version

PR quality check

Uh oh!

CLAassistant commented Nov 28, 2025

Uh oh!

bnestere left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andrelkin commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bnestere commented Dec 2, 2025

Uh oh!

andrelkin commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bnestere commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bnestere commented Dec 3, 2025

Uh oh!

andrelkin commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bnestere commented Dec 4, 2025

Uh oh!

andrelkin commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

6 participants

bnestere left a comment •

edited

Loading

andrelkin commented Dec 2, 2025 •

edited

Loading

andrelkin commented Dec 3, 2025 •

edited

Loading

bnestere commented Dec 3, 2025 •

edited

Loading

andrelkin commented Dec 4, 2025 •

edited

Loading

andrelkin commented Dec 4, 2025 •

edited

Loading