From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 21 Sep 2021 23:12:50 +0200
Subject: [PATCH] rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT

Valentin reported warnings about suspicious RCU usage on RT kernels. Those
happen when offloading of RCU callbacks is enabled:

  WARNING: suspicious RCU usage
  5.13.0-rt1 #20 Not tainted
  -----------------------------
  kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!

  rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
  rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
  rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)

The reason is that rcu_rdp_is_offloaded() is invoked without one of the
required protections on RT enabled kernels because local_bh_disable() does
not disable preemption on RT.

Valentin proposed to add a local lock to the code in question, but that's
suboptimal in several aspects:

  1) local locks add extra code to !RT kernels for no value.

  2) All possible callsites have to be audited and amended when affected,
     possibly at an outer function level due to lock nesting issues.

  3) As the local lock has to be taken in the outer functions, it has to be
     released and reacquired in the inner code sections which might
     voluntarily schedule, e.g. rcu_do_batch().

Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
rcu_rdp_is_offloaded() in the variable declaration section right at the top
of the functions. But the actual usage of the result is either within a
section which provides the required protections or after such a section.

So the obvious solution is to move the invocation into the code sections
which provide the proper protections, which solves the problem for RT and
does not have any impact on !RT kernels.
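
The pattern can be illustrated with a small user-space model. This is only
a sketch: every name in it (model_rdp, model_lock(), model_is_offloaded()
and so on) is hypothetical and not a kernel API, and a plain flag stands in
for the real protections. It shows only that moving the read below the lock
acquisition makes the unprotected read go away while the returned value
stays the same.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_rdp {
	bool locked;		/* stands in for one of the required protections */
	bool offloaded;		/* stands in for the RCU_NOCB offloaded state */
	int unsafe_reads;	/* counts reads made without protection */
};

static void model_lock(struct model_rdp *rdp)
{
	rdp->locked = true;
}

static void model_unlock(struct model_rdp *rdp)
{
	rdp->locked = false;
}

/* Models rcu_rdp_is_offloaded(): records a read made without protection. */
static bool model_is_offloaded(struct model_rdp *rdp)
{
	if (!rdp->locked)
		rdp->unsafe_reads++;
	return rdp->offloaded;
}

/* Old pattern: read in the declaration section, before the lock is taken. */
static bool read_at_declaration(struct model_rdp *rdp)
{
	const bool offloaded = model_is_offloaded(rdp);

	model_lock(rdp);
	model_unlock(rdp);
	return offloaded;
}

/* New pattern: declare first, read only once the lock is held. */
static bool read_under_lock(struct model_rdp *rdp)
{
	bool offloaded;

	model_lock(rdp);
	offloaded = model_is_offloaded(rdp);
	model_unlock(rdp);
	return offloaded;
}
```

In this model the old pattern records one unprotected read per call and
the new pattern records none.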

Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/rcu/tree.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded, needwake = false;
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
 	    rdp->gpwrap) {
 
@@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data
 	int div;
 	bool __maybe_unused empty;
 	unsigned long flags;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded;
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count = 0;
@@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data
 	rcu_nocb_lock(rdp);
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
 	pending = rcu_segcblist_n_cbs(&rdp->cblist);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	div = READ_ONCE(rcu_divisor);
 	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 	bl = max(rdp->blimit, pending >> div);