From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: sched: Delay task stack freeing on RT
Date: Tue, 28 Sep 2021 14:24:30 +0200

Anything which is done on behalf of a dead task at the end of
finish_task_switch() is preventing the incoming task from doing useful
work. While it is benefitial for fork heavy workloads to recycle the task
stack quickly, this is a latency source for real-time tasks.

Therefore delay the stack cleanup on RT enabled kernels.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210928122411.593486363@linutronix.de
---
 kernel/exit.c       |    5 +++++
 kernel/fork.c       |    5 ++++-
 kernel/sched/core.c |    8 ++++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

@ kernel/exit.c:175 @ static void delayed_put_task_struct(stru
 	kprobe_flush_task(tsk);
 	perf_event_delayed_put(tsk);
 	trace_sched_process_free(tsk);
+
+	/* RT enabled kernels delay freeing the VMAP'ed task stack */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		put_task_stack(tsk);
+
 	put_task_struct(tsk);
 }
 
--- a/kernel/fork.c
+++ b/kernel/fork.c
@ kernel/exit.c:292 @ static inline void free_thread_stack(str
 			return;
 		}
 
-		vfree_atomic(tsk->stack);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			vfree_atomic(tsk->stack);
+		else
+			vfree(tsk->stack);
 		return;
 	}
 #endif
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@ kernel/exit.c:4848 @ static struct rq *finish_task_switch(str
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 
-		/* Task is done with its stack. */
-		put_task_stack(prev);
+		/*
+		 * Release VMAP'ed task stack immediate for reuse. On RT
+		 * enabled kernels this is delayed for latency reasons.
+		 */
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			put_task_stack(prev);
 
 		put_task_struct_rcu_user(prev);
 	}