From: Carsten Emde <C.Emde@osadl.org> Date: Sat, 9 Jul 2018 15:47:14 +0100 Subject: [PATCH] tracing: Add latency histograms This patch provides a recording mechanism to store data of potential and effective sources of system latencies. The recordings separately determine the latency caused by a delayed timer expiration, by a delayed wakeup of the related user space program, by a delayed context switch as well as by the sum of the two first and by the sum of all three. The histograms can be configured, enabled and reset individually. The data are accessible via the debug filesystem. For details please consult Documentation/trace/latency-histograms.txt. Signed-off-by: Carsten Emde <C.Emde@osadl.org> --- Documentation/trace/latency-histograms.txt | 221 +++ fs/proc/array.c | 4 include/linux/hrtimer_types.h | 4 include/linux/kthread.h | 1 include/linux/sched.h | 16 include/linux/workqueue.h | 1 include/trace/events/hist.h | 49 include/trace/events/latency_hist.h | 30 kernel/kthread.c | 12 kernel/time/hrtimer.c | 23 kernel/trace/Kconfig | 154 ++ kernel/trace/Makefile | 5 kernel/trace/helper_latency_hist.h | 33 kernel/trace/latency_hist.c | 1639 +++++++++++++++++++++++++++++ kernel/trace/trace_irqsoff.c | 6 kernel/trace/trace_preemptirq.c | 31 kernel/workqueue.c | 32 17 files changed, 2226 insertions(+), 35 deletions(-) Index: linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt =================================================================== @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:4 @ + Using the Linux Kernel Latency Histograms + by Carsten Emde + + +This document gives a short explanation how to enable, configure and use +latency histograms. Latency histograms are primarily relevant in the +context of real-time enabled kernels (CONFIG_PREEMPT_RT_FULL) +and are used in the quality management of the Linux real-time +capabilities. + + +* Purpose of latency histograms + +A latency histogram continuously accumulates latency data and counts +the number of times a particular latency value is hit. The granularity +of the latency histograms described herein is one microsecond. +There are two types of histograms +- potential sources of latencies +- effective latencies + + +* Potential sources of latencies + +Potential sources of latencies are code segments where interrupts, +preemption or both are disabled (aka critical sections). To create +histograms of potential sources of latency, the kernel stores the time +stamp at the start of a critical section, determines the time elapsed +when the end of the section is reached, and increments the frequency +counter of that latency value - irrespective of whether any concurrently +running process is affected by this latency or not. +- Configuration items (in the Kernel hacking/Tracers submenu) + Preemption-off Latency Histogram -> CONFIG_PREEMPT_OFF_HIST + Interrupt-off Latency Histogram -> CONFIG_INTERRUPT_OFF_HIST +- Requires + Enable trace events for preempt and irq disable/enable -> + CONFIG_PREEMPTIRQ_EVENTS + + +* Effective latencies + +Effective latencies are actually occurring during wakeup of a process. +To determine effective latencies, the kernel stores the time stamp when +a process is scheduled to be woken up, and determines the duration of +the wakeup time shortly before control is passed over to this process. +However, simply measuring the interval between enqueuing and wakeup may +not be appropriate in cases when a process is scheduled as a result of a +timer expiration. The timer may have missed its deadline, e.g. due to +disabled interrupts, but this latency would not be registered. +Therefore, the offsets of missed timers are recorded in a separate +histogram. If both missed timer offset and wakeup latency are configured +and enabled, a third histogram may be enabled that records the overall +latency as a sum of the timer offset and the wakeup latency. This +histogram is called "timerandwakeup". Finally, after the scheduling +operation is completed, the process still needs to undergo context +switch until the complete control is returned to user space and the +effective preemption latency can be determined. For this purpose, a +third base histogram is provided that starts the measurement at the +beginning of the context switch and stops when the related systems call +returns. If all three histograms missed timer offset, wakeup and context +switch latency are enabled, a fifth histogram may be enabled that +records the overall latency as a sum of the missed timer offset, wakeup +and context switch latency. This histogram is called +"timerwakeupswitch". +- Configuration items (in the Kernel hacking/Tracers submenu) + Missed Timer Offsets Histogram -> CONFIG_MISSED_TIMER_OFFSETS_HIST + Scheduling Latency Histogram -> CONFIG_WAKEUP_LATENCY_HIST + Context Switch Time Histogram -> CONFIG_SWITCHTIME_HIST +- Requires + Scheduling Latency Tracer -> CONFIG_SCHED_TRACER + + +* Usage + +The interface to the administration of the latency histograms is located +in the debugfs file system. To mount it, either enter + +mount -t sysfs nodev /sys +mount -t debugfs nodev /sys/kernel/debug + +from shell command line level, or add + +nodev /sys sysfs defaults 0 0 +nodev /sys/kernel/debug debugfs defaults 0 0 + +to the file /etc/fstab. All latency histogram related files are then +available in the directory /sys/kernel/debug/latency_hist. A particular +histogram type is enabled by writing non-zero to the related variable in +the /sys/kernel/debug/latency_hist/enable directory. Select +"preemptirqsoff" for the histograms of potential sources of latencies +and "wakeup" for histograms of effective latencies etc. The histogram +data - one per CPU - are available in the files + +/sys/kernel/debug/latency_hist/preemptoff/CPUx +/sys/kernel/debug/latency_hist/irqsoff/CPUx +/sys/kernel/debug/latency_hist/preemptirqsoff/CPUx +/sys/kernel/debug/latency_hist/missed_timer_offsets/CPUx +/sys/kernel/debug/latency_hist/wakeup/CPUx +/sys/kernel/debug/latency_hist/wakeup/sharedprio/CPUx +/sys/kernel/debug/latency_hist/timerandwakeup/CPUx +/sys/kernel/debug/latency_hist/switchtime/CPUx +/sys/kernel/debug/latency_hist/timerwakeupswitch/CPUx + +The following script can be used to enable all histograms of potential latencies: + +#!/bin/sh +enabledir=/sys/kernel/debug/latency_hist/enable +for i in missed_timer_offsets wakeup timerandwakeup switchtime timerwakeupswitch +do + enable=$enabledir/$i + if test -f $enable + then + echo 1 >$enable + fi +done + + +The histograms are reset by writing non-zero to the file "reset" in a +particular latency directory. To reset all latency data, use + +#!/bin/sh + +HISTDIR=/sys/kernel/debug/latency_hist + +if test -d $HISTDIR +then + cd $HISTDIR + for i in `find . | grep /reset$` + do + echo 1 >$i + done +fi + + +* Data format + +Latency data are stored with a resolution of one microsecond. The +maximum latency is 10,240 microseconds. The data are only valid, if the +underflow and the overflow registers are empty. Every output line +contains the latency in microseconds in the first row and the number of +samples in the second row. To display only lines with at least one +latency recording use, for example, + +grep -v " 0$" /sys/kernel/debug/latency_hist/preemptoff/CPU0 + +#Minimum latency: 0 microseconds. +#Average latency: 0 microseconds. +#Maximum latency: 25 microseconds. +#Total samples: 3104770694 +There are 0 samples lower than 0 microseconds. +#There are 0 samples greater or equal than 10240 microseconds +#usecs samples + 0 2984486876 + 1 49843506 + 2 58219047 + 3 5348126 + 4 2187960 + 5 3388262 + 6 959289 + 7 208294 + 8 40420 + 9 4485 + 10 14918 + 11 18340 + 12 25052 + 13 19455 + 14 5602 + 15 969 + 16 47 + 17 18 + 18 14 + 19 1 + 20 3 + 21 2 + 22 5 + 23 2 + 25 1 + + +* Wakeup latency of a selected process + +To only collect wakeup latency data of a particular process, write the +PID of the requested process to + +/sys/kernel/debug/latency_hist/wakeup/pid + +PIDs are not considered, if this variable is set to 0. + + +* Details of the process with the highest wakeup latency so far + +Selected data of the process that suffered from the highest wakeup +latency that occurred in a particular CPU are available in the file + +/sys/kernel/debug/latency_hist/wakeup/max_latency-CPUx. + +In addition, other relevant system data at the time when the +latency occurred are given. + +The format of the data is (all in one line): +<PID> <Priority> <Latency> (<Timeroffset>,<Wakeupoffset>) \ +<Command> <- <PID> <Priority> <Command> <Timestamp> + +The value of <Timeroffset> is only relevant in the combined timer +and wakeup latency recording. In the wakeup recording, it is +always 0, in the missed_timer_offsets recording, it is the same +as <Latency>. Similarly, the value of <Wakeupoffset> is only +relevant in the combined timer, wakeup and context switch +recording. Otherwise, it is 0 as above. + +When retrospectively searching for the origin of a latency and +tracing was not enabled, it may be helpful to know the name and +some basic data of the task that (finally) was switching to the +late real-tlme task. In addition to the victim's data, also the +data of the possible culprit are therefore displayed after the +"<-" symbol. + +Finally, the timestamp of the time when the latency occurred +in <seconds>.<microseconds> after the most recent system boot +is provided. + +These data are also reset when the wakeup histogram is reset. Index: linux-6.17.1-rt5-ras6/fs/proc/array.c =================================================================== --- linux-6.17.1-rt5-ras6.orig/fs/proc/array.c +++ linux-6.17.1-rt5-ras6/fs/proc/array.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:101 @ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape) { - char tcomm[64]; + char tcomm[MAXFULLTASKNAME]; /* * Test before PF_KTHREAD because all workqueue worker threads are @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:117 @ void proc_task_name(struct seq_file *m, if (escape) seq_escape_str(m, tcomm, ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\"); else - seq_printf(m, "%.64s", tcomm); + seq_printf(m, "%.*s", (int) sizeof(tcomm), tcomm); } /* Index: linux-6.17.1-rt5-ras6/include/linux/hrtimer_types.h =================================================================== --- linux-6.17.1-rt5-ras6.orig/include/linux/hrtimer_types.h +++ linux-6.17.1-rt5-ras6/include/linux/hrtimer_types.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:32 @ enum hrtimer_restart { * @function: timer expiry callback function * @base: pointer to the timer base (per cpu and per clock) * @state: state information (See bit values above) + * @praecox: timer expiry time if expired at the time of programming * @is_rel: Set if the timer was armed relative * @is_soft: Set if hrtimer will be expired in soft interrupt context. * @is_hard: Set if hrtimer will be expired in hard interrupt context @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:46 @ struct hrtimer { enum hrtimer_restart (*__private function)(struct hrtimer *); struct hrtimer_clock_base *base; u8 state; +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + ktime_t praecox; +#endif u8 is_rel; u8 is_soft; u8 is_hard; Index: linux-6.17.1-rt5-ras6/include/linux/kthread.h =================================================================== --- linux-6.17.1-rt5-ras6.orig/include/linux/kthread.h +++ linux-6.17.1-rt5-ras6/include/linux/kthread.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:37 @ struct task_struct *kthread_create_on_cp const char *namefmt); void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk); +void get_kthread_comm_nolock(char *buf, size_t buf_size, struct task_struct *tsk); bool set_kthread_struct(struct task_struct *p); void kthread_set_per_cpu(struct task_struct *k, int cpu); Index: linux-6.17.1-rt5-ras6/include/linux/sched.h =================================================================== --- linux-6.17.1-rt5-ras6.orig/include/linux/sched.h +++ linux-6.17.1-rt5-ras6/include/linux/sched.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:90 @ struct user_event_mm; #include <linux/sched/ext.h> +#define MAXFULLTASKNAME 64 + /* * Task state bitmask. NOTE! These bits are also * encoded in fs/proc/array.c: get_task_state(). @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:1513 @ struct task_struct { #ifdef CONFIG_TRACING /* Bitmask and counter of trace recursion: */ unsigned long trace_recursion; +#ifdef CONFIG_WAKEUP_LATENCY_HIST + u64 preempt_timestamp_hist; +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + long timer_offset; +#endif +#endif /* CONFIG_WAKEUP_LATENCY_HIST */ +#ifdef CONFIG_SWITCHTIME_HIST + u64 switchtime_timestamp_hist; + struct task_struct *switchtime_prev; +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + unsigned long wakeup_latency; +#endif +#endif /* CONFIG_SWITCHTIME_HIST */ #endif /* CONFIG_TRACING */ #ifdef CONFIG_KCOV Index: linux-6.17.1-rt5-ras6/include/linux/workqueue.h =================================================================== --- linux-6.17.1-rt5-ras6.orig/include/linux/workqueue.h +++ linux-6.17.1-rt5-ras6/include/linux/workqueue.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:642 @ extern void show_all_workqueues(void); extern void show_freezable_workqueues(void); extern void show_one_workqueue(struct workqueue_struct *wq); extern void wq_worker_comm(char *buf, size_t size, struct task_struct *task); +extern void wq_worker_comm_nolock(char *buf, size_t size, struct task_struct *task); /** * queue_work - queue work on a workqueue Index: linux-6.17.1-rt5-ras6/include/trace/events/hist.h =================================================================== --- /dev/null +++ linux-6.17.1-rt5-ras6/include/trace/events/hist.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:4 @ +/* SPDX-License-Identifier: GPL-2.0-only */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM hist + +#if !defined(_TRACE_HIST_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HIST_H + +#include "latency_hist.h" +#include <linux/tracepoint.h> + +#ifndef CONFIG_MISSED_TIMER_OFFSETS_HIST +#define trace_hrtimer_interrupt(a, b, c, d) +#else +TRACE_EVENT(hrtimer_interrupt, + + TP_PROTO(int cpu, long long offset, struct task_struct *curr, + struct task_struct *task), + + TP_ARGS(cpu, offset, curr, task), + + TP_STRUCT__entry( + __field(int, cpu) + __field(long long, offset) + __array(char, ccomm, TASK_COMM_LEN) + __field(int, cprio) + __array(char, tcomm, TASK_COMM_LEN) + __field(int, tprio) + ), + + TP_fast_assign( + __entry->cpu = cpu; + __entry->offset = offset; + memcpy(__entry->ccomm, curr->comm, TASK_COMM_LEN); + __entry->cprio = curr->prio; + memcpy(__entry->tcomm, task != NULL ? task->comm : "<none>", + task != NULL ? TASK_COMM_LEN : 7); + __entry->tprio = task != NULL ? task->prio : -1; + ), + + TP_printk("cpu=%d offset=%lld curr=%s[%d] thread=%s[%d]", + __entry->cpu, __entry->offset, __entry->ccomm, + __entry->cprio, __entry->tcomm, __entry->tprio) +); +#endif + +#endif /* _TRACE_HIST_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> Index: linux-6.17.1-rt5-ras6/include/trace/events/latency_hist.h =================================================================== --- /dev/null +++ linux-6.17.1-rt5-ras6/include/trace/events/latency_hist.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:4 @ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _LATENCY_HIST_H +#define _LATENCY_HIST_H + +enum hist_action { + IRQS_ON, + PREEMPT_ON, + TRACE_STOP, + IRQS_OFF, + PREEMPT_OFF, + TRACE_START, +}; + +static char *actions[] = { + "IRQS_ON", + "PREEMPT_ON", + "TRACE_STOP", + "IRQS_OFF", + "PREEMPT_OFF", + "TRACE_START", +}; + +static inline char *getaction(int action) +{ + if (action >= 0 && action <= ARRAY_SIZE(actions)) + return actions[action]; + return "unknown"; +} + +#endif /* _LATENCY_HIST_H */ Index: linux-6.17.1-rt5-ras6/kernel/kthread.c =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/kthread.c +++ linux-6.17.1-rt5-ras6/kernel/kthread.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:118 @ void get_kthread_comm(char *buf, size_t strscpy_pad(buf, kthread->full_name, buf_size); } +void get_kthread_comm_nolock(char *buf, size_t buf_size, struct task_struct *tsk) +{ + struct kthread *kthread = to_kthread(tsk); + + if (!kthread || !kthread->full_name) { + strscpy(buf, tsk->comm, buf_size); + return; + } + + strscpy_pad(buf, kthread->full_name, buf_size); +} + bool set_kthread_struct(struct task_struct *p) { struct kthread *kthread; Index: linux-6.17.1-rt5-ras6/kernel/time/hrtimer.c =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/time/hrtimer.c +++ linux-6.17.1-rt5-ras6/kernel/time/hrtimer.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:49 @ #include <linux/uaccess.h> #include <trace/events/timer.h> +#include <trace/events/hist.h> #include "tick-internal.h" @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:1271 @ static int __hrtimer_start_range_ns(stru new_base = base; } +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + { + ktime_t now = new_base->get_time(); + + if (ktime_to_ns(tim) < ktime_to_ns(now)) + timer->praecox = now; + else + timer->praecox = ktime_set(0, 0); + } +#endif + first = enqueue_hrtimer(timer, new_base, mode); if (!force_local) { /* @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:1805 @ static void __run_hrtimer(struct hrtimer base->running = NULL; } +static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer); + static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now, unsigned long flags, unsigned int active_mask) { @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:1824 @ static void __hrtimer_run_queues(struct timer = container_of(node, struct hrtimer, node); + trace_hrtimer_interrupt(raw_smp_processor_id(), + ktime_to_ns(ktime_sub(ktime_to_ns(timer->praecox) ? + timer->praecox : hrtimer_get_expires(timer), + basenow)), + current, + timer->function == hrtimer_wakeup ? + container_of(timer, struct hrtimer_sleeper, + timer)->task : NULL); + /* * The immediate goal for using the softexpires is * minimizing wakeups, not running timers at the Index: linux-6.17.1-rt5-ras6/kernel/trace/Kconfig =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/trace/Kconfig +++ linux-6.17.1-rt5-ras6/kernel/trace/Kconfig @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:395 @ config IRQSOFF_TRACER bool "Interrupts-off Latency Tracer" default n depends on TRACE_IRQFLAGS_SUPPORT - select TRACE_IRQFLAGS - select GENERIC_TRACER + depends on TRACE_IRQFLAGS + depends on GENERIC_TRACER select TRACER_MAX_TRACE select RING_BUFFER_ALLOW_SWAP select TRACER_SNAPSHOT @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:436 @ config PREEMPT_TRACER echo 0 > /sys/kernel/tracing/tracing_max_latency (Note that kernel size and overhead increase with this option - enabled. This option and the irqs-off timing option can be + enabled. This option and the preempt-off timing option can be used together or separately.) +config PREEMPTIRQ_EVENTS + bool "Enable trace events for preempt and irq disable/enable" + depends on TRACE_IRQFLAGS + depends on DEBUG_PREEMPT || !PROVE_LOCKING + depends on TRACING + default n + help + Enable tracing of disable and enable events for preemption and irqs. + For tracing preempt disable/enable events, DEBUG_PREEMPT must be + enabled. For tracing irq disable/enable events, PROVE_LOCKING must + be disabled. + +config PREEMPT_OFF_HIST + bool "Preemption-off Latency Histogram" + depends on PREEMPTIRQ_EVENTS + depends on DEBUG_PREEMPT + help + This option generates continuously updated histograms (one per cpu) + of the duration of time periods with preemption disabled. The + histograms are disabled by default. To enable them, write a non-zero + number to + + /sys/kernel/debug/latency_hist/enable/preemptirqsoff + + If INTERRUPT_OFF_HIST is also selected, additional histograms (one + per cpu) are generated that accumulate the duration of time periods + when both interrupts and preemption are disabled. The histogram data + will be located in the debug file system at + + /sys/kernel/debug/latency_hist/preemptoff + +config INTERRUPT_OFF_HIST + bool "Interrupt-off Latency Histogram" + depends on PREEMPTIRQ_EVENTS + help + This option generates continuously updated histograms (one per cpu) + of the duration of time periods with interrupts disabled. The + histograms are disabled by default. To enable them, write a non-zero + number to + + /sys/kernel/debug/latency_hist/enable/preemptirqsoff + + If PREEMPT_OFF_HIST is also selected, additional histograms (one + per cpu) are generated that accumulate the duration of time periods + when both interrupts and preemption are disabled. The histogram data + will be located in the debug file system at + + /sys/kernel/debug/latency_hist/irqsoff + +config MISSED_TIMER_OFFSETS_HIST + depends on HIGH_RES_TIMERS + select GENERIC_TRACER + bool "Missed Timer Offsets Histogram" + help + Generate a histogram of missed timer offsets in microseconds. The + histograms are disabled by default. To enable them, write a non-zero + number to + + /sys/kernel/debug/latency_hist/enable/missed_timer_offsets + + The histogram data will be located in the debug file system at + + /sys/kernel/debug/latency_hist/missed_timer_offsets + + If both Scheduling Latency Histogram and Missed Timer Offsets + Histogram are selected, additional histogram data will be collected + that contain, in addition to the wakeup latency, the timer latency, in + case the wakeup was triggered by an expired timer. These histograms + are available in the + + /sys/kernel/debug/latency_hist/timerandwakeup + + directory. They reflect the apparent interrupt and scheduling latency + and are best suitable to determine the worst-case latency of a given + system. To enable these histograms, write a non-zero number to + + /sys/kernel/debug/latency_hist/enable/timerandwakeup + config SCHED_TRACER bool "Scheduling Latency Tracer" select GENERIC_TRACER @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:527 @ config SCHED_TRACER This tracer tracks the latency of the highest priority task to be scheduled in, starting from the point it has woken up. +config WAKEUP_LATENCY_HIST + bool "Scheduling Latency Histogram" + depends on SCHED_TRACER + help + This option generates continuously updated histograms (one per cpu) + of the scheduling latency of the highest priority task. + The histograms are disabled by default. To enable them, write a + non-zero number to + + /sys/kernel/debug/latency_hist/enable/wakeup + + Two different algorithms are used, one to determine the latency of + processes that exclusively use the highest priority of the system and + another one to determine the latency of processes that share the + highest system priority with other processes. The former is used to + improve hardware and system software, the latter to optimize the + priority design of a given system. The histogram data will be + located in the debug file system at + + /sys/kernel/debug/latency_hist/wakeup + + and + + /sys/kernel/debug/latency_hist/wakeup/sharedprio + + If both Scheduling Latency Histogram and Missed Timer Offsets + Histogram are selected, additional histogram data will be collected + that contain, in addition to the wakeup latency, the timer latency, in + case the wakeup was triggered by an expired timer. These histograms + are available in the + + /sys/kernel/debug/latency_hist/timerandwakeup + + directory. They reflect the apparent interrupt and scheduling latency + and are best suitable to determine the worst-case latency of a given + system. To enable these histograms, write a non-zero number to + + /sys/kernel/debug/latency_hist/enable/timerandwakeup + +config SWITCHTIME_HIST + depends on HIGH_RES_TIMERS + depends on HAVE_SYSCALL_TRACEPOINTS + depends on WAKEUP_LATENCY_HIST + select GENERIC_TRACER + select FTRACE_SYSCALLS + bool "Context Switch Time Histogram" + help + Generate a histogram of the context switch duration per CPU. The + histograms are disabled by default. To enable them, write a non-zero + number to + + /sys/kernel/debug/latency_hist/enable/switchtime + + The histogram data will be located in the debug file system at + + /sys/kernel/debug/latency_hist/switchtime + + If all three Missed Timer Offsets, Scheduling Latency and Context + Switch Time Histogram are selected, additional histogram data will + be collected from the sum of the timer, wakeup and switchtime + latency, again per CPU. These histograms are available in the + + /sys/kernel/debug/latency_hist/timerwakeupswitch + + directory. They reflect the apparent interrupt and scheduling latency + and are best suitable to determine the worst-case latency of a given + system. To enable these histograms, write a non-zero number to + + /sys/kernel/debug/tracing/latency_hist/enable/timerwakeupswitch + config HWLAT_TRACER bool "Tracer to detect hardware latencies (like SMIs)" select GENERIC_TRACER Index: linux-6.17.1-rt5-ras6/kernel/trace/Makefile =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/trace/Makefile +++ linux-6.17.1-rt5-ras6/kernel/trace/Makefile @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:66 @ obj-$(CONFIG_IRQSOFF_TRACER) += trace_ir obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o obj-$(CONFIG_HWLAT_TRACER) += trace_hwlat.o +obj-$(CONFIG_INTERRUPT_OFF_HIST) += latency_hist.o +obj-$(CONFIG_PREEMPT_OFF_HIST) += latency_hist.o +obj-$(CONFIG_MISSED_TIMER_OFFSETS_HIST) += latency_hist.o +obj-$(CONFIG_WAKEUP_LATENCY_HIST) += latency_hist.o +obj-$(CONFIG_SWITCHTIME_HIST) += latency_hist.o obj-$(CONFIG_OSNOISE_TRACER) += trace_osnoise.o obj-$(CONFIG_NOP_TRACER) += trace_nop.o obj-$(CONFIG_STACK_TRACER) += trace_stack.o Index: linux-6.17.1-rt5-ras6/kernel/trace/helper_latency_hist.h =================================================================== --- /dev/null +++ linux-6.17.1-rt5-ras6/kernel/trace/helper_latency_hist.h @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:4 @ +#ifndef _LINUX_KERNEL_HELPER_LATENCY_HIST_H +#define _LINUX_KERNEL_HELPER_LATENCY_HIST_H + +/* + * Use regular trace points on architectures that implement noinstr + * tooling: these calls will only happen with RCU enabled, which can + * use a regular tracepoint. + * + * On older architectures, RCU may not be watching in idle. In that + * case, wake up RCU to watch while calling the tracepoint. These + * aren't NMI-safe - so exclude NMI contexts: + */ +#ifdef CONFIG_ARCH_WANTS_NO_INSTR +#define trace(point, args) trace_##point(args) +#else +#define trace(point, args) \ + do { \ + if (trace_##point##_enabled()) { \ + bool exit_rcu = false; \ + if (in_nmi()) \ + break; \ + if (!IS_ENABLED(CONFIG_TINY_RCU) && \ + is_idle_task(current)) { \ + ct_irq_enter(); \ + exit_rcu = true; \ + } \ + trace_##point(args); \ + if (exit_rcu) \ + ct_irq_exit(); \ + } \ + } while (0) +#endif +#endif /* _LINUX_KERNEL_HELPER_LATENCY_HIST_H */ Index: linux-6.17.1-rt5-ras6/kernel/trace/latency_hist.c =================================================================== --- /dev/null +++ linux-6.17.1-rt5-ras6/kernel/trace/latency_hist.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:4 @ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Add support for histograms of preemption-off latency and + * interrupt-off latency and wakeup latency, it depends on + * Real-Time Preemption Support. + * + * Copyright (C) 2005 MontaVista Software, Inc. + * Yi Yang <yyang@ch.mvista.com> + * + * Converted to work with the new latency tracer. + * Copyright (C) 2008 Red Hat, Inc. + * Steven Rostedt <srostedt@redhat.com> + * + */ +#include <linux/module.h> +#include <linux/debugfs.h> +#include <linux/seq_file.h> +#include <linux/percpu.h> +#include <linux/kallsyms.h> +#include <linux/uaccess.h> +#include <linux/sched.h> +#include <linux/sched/rt.h> +#include <linux/slab.h> +#include <linux/atomic.h> +#include <trace/syscall.h> +#include <asm/div64.h> +#include <asm/syscall.h> + +#include "trace.h" +#include <trace/events/sched.h> +#include <trace/events/preemptirq.h> + +extern struct tracepoint __tracepoint_sys_exit; + +#define NSECS_PER_USECS 1000L + +#define CREATE_TRACE_POINTS +#include <trace/events/hist.h> + +/* + * FIELD_SIZEOF - get the size of a struct's field + * @t: the target struct + * @f: the target struct's field + * Return: the size of @f in the struct definition without having a + * declared instance of @t. + */ +#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f)) + +enum { + IRQSOFF_LATENCY = 0, + PREEMPTOFF_LATENCY, + PREEMPTIRQSOFF_LATENCY, + WAKEUP_LATENCY, + WAKEUP_LATENCY_SHAREDPRIO, + MISSED_TIMER_OFFSETS, + TIMERANDWAKEUP_LATENCY, + SWITCHTIME, + TIMERWAKEUPSWITCH_LATENCY, + MAX_LATENCY_TYPE, +}; + +#define MAX_ENTRY_NUM 10240 + +struct hist_data { + atomic_t hist_mode; /* 0 log, 1 don't log */ + long offset; /* set it to MAX_ENTRY_NUM/2 for a bipolar scale */ + long min_lat; + long max_lat; + unsigned long long below_hist_bound_samples; + unsigned long long above_hist_bound_samples; + long long accumulate_lat; + unsigned long long total_samples; + unsigned long long hist_array[MAX_ENTRY_NUM]; +}; + +struct enable_data { + int latency_type; + int enabled; +}; + +static char *latency_hist_dir_root = "latency_hist"; + +#ifdef CONFIG_INTERRUPT_OFF_HIST +static DEFINE_PER_CPU(struct hist_data, irqsoff_hist); +static char *irqsoff_hist_dir = "irqsoff"; +static DEFINE_PER_CPU(cycles_t, hist_irqsoff_start); +static DEFINE_PER_CPU(int, hist_irqsoff_counting); +#endif + +#ifdef CONFIG_PREEMPT_OFF_HIST +static DEFINE_PER_CPU(struct hist_data, preemptoff_hist); +static char *preemptoff_hist_dir = "preemptoff"; +static DEFINE_PER_CPU(cycles_t, hist_preemptoff_start); +static DEFINE_PER_CPU(int, hist_preemptoff_counting); +#endif + +#if defined(CONFIG_PREEMPT_OFF_HIST) && defined(CONFIG_INTERRUPT_OFF_HIST) +static DEFINE_PER_CPU(struct hist_data, preemptirqsoff_hist); +static char *preemptirqsoff_hist_dir = "preemptirqsoff"; +static DEFINE_PER_CPU(cycles_t, hist_preemptirqsoff_start); +static DEFINE_PER_CPU(int, hist_preemptirqsoff_counting); +#endif + +#if defined(CONFIG_PREEMPT_OFF_HIST) || defined(CONFIG_INTERRUPT_OFF_HIST) +static notrace void probe_preemptirqsoff_hist(void *v, unsigned long ip, + unsigned long parent_ip); +static struct enable_data preemptirqsoff_enabled_data = { + .latency_type = PREEMPTIRQSOFF_LATENCY, + .enabled = 0, +}; +#endif + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) +struct maxlatproc_data { + char comm[FIELD_SIZEOF(struct task_struct, comm)]; + char current_comm[MAXFULLTASKNAME]; + int pid; + int current_pid; + int prio; + int current_prio; + long latency; + unsigned long timer_offset; + long wakeup_latency; + u64 timestamp; + int nr; +}; +#endif + +#ifdef CONFIG_WAKEUP_LATENCY_HIST +static DEFINE_PER_CPU(struct hist_data, wakeup_latency_hist); +static DEFINE_PER_CPU(struct hist_data, wakeup_latency_hist_sharedprio); +static char *wakeup_latency_hist_dir = "wakeup"; +static char *wakeup_latency_hist_dir_sharedprio = "sharedprio"; +static notrace void probe_wakeup_latency_hist_start(void *v, + struct task_struct *p); +static notrace void probe_wakeup_latency_hist_stop(void *v, + bool preempt, struct task_struct *prev, struct task_struct *next, + unsigned int prev_state); +#ifdef CONFIG_SMP +static notrace void probe_sched_migrate_task(void *, + struct task_struct *task, int cpu); +#endif +static struct enable_data wakeup_latency_enabled_data = { + .latency_type = WAKEUP_LATENCY, + .enabled = 0, +}; +static DEFINE_PER_CPU(struct maxlatproc_data, wakeup_maxlatproc); +static DEFINE_PER_CPU(struct maxlatproc_data, wakeup_maxlatproc_sharedprio); +static DEFINE_PER_CPU(struct task_struct *, wakeup_task); +static DEFINE_PER_CPU(int, wakeup_sharedprio); +static DEFINE_PER_CPU(unsigned long, wakeup_pid); +#endif + +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST +static DEFINE_PER_CPU(struct hist_data, missed_timer_offsets); +static char *missed_timer_offsets_dir = "missed_timer_offsets"; +static notrace void probe_hrtimer_interrupt(void *v, int cpu, + long long offset, struct task_struct *curr, struct task_struct *task); +static struct enable_data missed_timer_offsets_enabled_data = { + .latency_type = MISSED_TIMER_OFFSETS, + .enabled = 0, +}; +static DEFINE_PER_CPU(struct maxlatproc_data, missed_timer_offsets_maxlatproc); +static DEFINE_PER_CPU(unsigned long, missed_timer_offsets_pid); +#endif + +#ifdef CONFIG_SWITCHTIME_HIST +static DEFINE_PER_CPU(struct hist_data, switchtime); +static char *switchtime_dir = "switchtime"; +static notrace void probe_syscall_exit(void *v, struct pt_regs *regs, long ret); +static struct enable_data switchtime_enabled_data = { + .latency_type = SWITCHTIME, + .enabled = 0, +}; +static DEFINE_PER_CPU(struct maxlatproc_data, switchtime_maxlatproc); +static DEFINE_PER_CPU(struct task_struct *, switchtime_task); +static DEFINE_PER_CPU(unsigned long, switchtime_pid); +#endif + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) +static DEFINE_PER_CPU(struct hist_data, timerandwakeup_latency_hist); +static char *timerandwakeup_latency_hist_dir = "timerandwakeup"; +static struct enable_data timerandwakeup_enabled_data = { + .latency_type = TIMERANDWAKEUP_LATENCY, + .enabled = 0, +}; +static DEFINE_PER_CPU(struct maxlatproc_data, timerandwakeup_maxlatproc); +#endif + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) && \ + defined(CONFIG_SWITCHTIME_HIST) +static DEFINE_PER_CPU(struct hist_data, timerwakeupswitch_latency_hist); +static char *timerwakeupswitch_latency_hist_dir = "timerwakeupswitch"; +static struct enable_data timerwakeupswitch_enabled_data = { + .latency_type = TIMERWAKEUPSWITCH_LATENCY, + .enabled = 0, +}; +static DEFINE_PER_CPU(struct maxlatproc_data, timerwakeupswitch_maxlatproc); +#endif + +static void fulltaskname(struct task_struct *p, char *tcomm, int size) +{ + if (p->flags & PF_WQ_WORKER) + wq_worker_comm_nolock(tcomm, size, p); + else if (p->flags & PF_KTHREAD) + get_kthread_comm_nolock(tcomm, size, p); + else + strscpy_pad(tcomm, p->comm, size); +} + +static void notrace latency_hist(int latency_type, int cpu, long latency, + unsigned long timer_offset, long wakeup_latency, + u64 stop, struct task_struct *p, int nr) +{ + struct hist_data *my_hist; +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) || \ + defined(CONFIG_SWITCHTIME_HIST) + struct maxlatproc_data *mp = NULL; +#endif + + if (!cpu_possible(cpu) || latency_type < 0 || + latency_type >= MAX_LATENCY_TYPE) + return; + + switch (latency_type) { +#ifdef CONFIG_INTERRUPT_OFF_HIST + case IRQSOFF_LATENCY: + my_hist = &per_cpu(irqsoff_hist, cpu); + break; +#endif +#ifdef CONFIG_PREEMPT_OFF_HIST + case PREEMPTOFF_LATENCY: + my_hist = &per_cpu(preemptoff_hist, cpu); + break; +#endif +#if defined(CONFIG_PREEMPT_OFF_HIST) && defined(CONFIG_INTERRUPT_OFF_HIST) + case PREEMPTIRQSOFF_LATENCY: + my_hist = &per_cpu(preemptirqsoff_hist, cpu); + break; +#endif +#ifdef CONFIG_WAKEUP_LATENCY_HIST + case WAKEUP_LATENCY: + my_hist = &per_cpu(wakeup_latency_hist, cpu); + mp = &per_cpu(wakeup_maxlatproc, cpu); + break; + case WAKEUP_LATENCY_SHAREDPRIO: + my_hist = &per_cpu(wakeup_latency_hist_sharedprio, cpu); + mp = &per_cpu(wakeup_maxlatproc_sharedprio, cpu); + break; +#endif +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + case MISSED_TIMER_OFFSETS: + my_hist = &per_cpu(missed_timer_offsets, cpu); + mp = &per_cpu(missed_timer_offsets_maxlatproc, cpu); + break; +#endif +#ifdef CONFIG_SWITCHTIME_HIST + case SWITCHTIME: + my_hist = &per_cpu(switchtime, cpu); + mp = &per_cpu(switchtime_maxlatproc, cpu); + break; +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + case TIMERANDWAKEUP_LATENCY: + my_hist = &per_cpu(timerandwakeup_latency_hist, cpu); + mp = &per_cpu(timerandwakeup_maxlatproc, cpu); + break; +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) && \ + defined(CONFIG_SWITCHTIME_HIST) + case TIMERWAKEUPSWITCH_LATENCY: + my_hist = &per_cpu(timerwakeupswitch_latency_hist, cpu); + mp = &per_cpu(timerwakeupswitch_maxlatproc, cpu); + break; +#endif + + default: + return; + } + + latency += my_hist->offset; + + if (atomic_read(&my_hist->hist_mode) == 0) + return; + + if (latency < 0 || latency >= MAX_ENTRY_NUM) { + if (latency < 0) + my_hist->below_hist_bound_samples++; + else + my_hist->above_hist_bound_samples++; + } else + my_hist->hist_array[latency]++; + + if (unlikely(latency > my_hist->max_lat || + my_hist->min_lat == LONG_MAX)) { +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) || \ + defined(CONFIG_SWITCHTIME_HIST) + if (latency_type == WAKEUP_LATENCY || + latency_type == WAKEUP_LATENCY_SHAREDPRIO || + latency_type == MISSED_TIMER_OFFSETS || + latency_type == TIMERANDWAKEUP_LATENCY || + latency_type == SWITCHTIME || + latency_type == TIMERWAKEUPSWITCH_LATENCY) { + char tcomm[MAXFULLTASKNAME]; + + if (latency_type == SWITCHTIME || + latency_type == TIMERWAKEUPSWITCH_LATENCY) { + fulltaskname(p, tcomm, sizeof(tcomm)); + strncpy(mp->current_comm, tcomm, + sizeof(mp->current_comm)); + strncpy(mp->comm, current->comm, + sizeof(mp->comm)); + mp->current_pid = task_pid_nr(p); + mp->pid = task_pid_nr(current); + mp->current_prio = p->prio; + mp->prio = current->prio; + } else { + strncpy(mp->comm, p->comm, sizeof(mp->comm)); + fulltaskname(current, tcomm, sizeof(tcomm)); + strncpy(mp->current_comm, tcomm, + sizeof(mp->current_comm)); + mp->pid = task_pid_nr(p); + mp->current_pid = task_pid_nr(current); + mp->prio = p->prio; + mp->current_prio = current->prio; + } + mp->latency = latency; + mp->timer_offset = timer_offset; + mp->wakeup_latency = wakeup_latency; + mp->timestamp = stop; + mp->nr = nr; + } +#endif + my_hist->max_lat = latency; + } + if (unlikely(latency < my_hist->min_lat)) + my_hist->min_lat = latency; + my_hist->total_samples++; + my_hist->accumulate_lat += latency; +} + +static void *l_start(struct seq_file *m, loff_t *pos) +{ + loff_t *index_ptr = NULL; + loff_t index = *pos; + struct hist_data *my_hist = m->private; + + if (index == 0) { + char minstr[32], avgstr[32], maxstr[32]; + + atomic_dec(&my_hist->hist_mode); + + if (likely(my_hist->total_samples)) { + long avg = (long) div64_s64(my_hist->accumulate_lat, + my_hist->total_samples); + snprintf(minstr, sizeof(minstr), "%ld", + my_hist->min_lat - my_hist->offset); + snprintf(avgstr, sizeof(avgstr), "%ld", + avg - my_hist->offset); + snprintf(maxstr, sizeof(maxstr), "%ld", + my_hist->max_lat - my_hist->offset); + } else { + strcpy(minstr, "<undef>"); + strcpy(avgstr, minstr); + strcpy(maxstr, minstr); + } + + seq_printf(m, "#Minimum latency: %s microseconds\n" + "#Average latency: %s microseconds\n" + "#Maximum latency: %s microseconds\n" + "#Total samples: %llu\n" + "#There are %llu samples lt %ld microseconds.\n" + "#There are %llu samples ge %ld microseconds.\n" + "#usecs\t%16s\n", + minstr, avgstr, maxstr, + my_hist->total_samples, + my_hist->below_hist_bound_samples, + -my_hist->offset, + my_hist->above_hist_bound_samples, + MAX_ENTRY_NUM - my_hist->offset, + "samples"); + } + if (index < MAX_ENTRY_NUM) { + index_ptr = kmalloc(sizeof(loff_t), GFP_KERNEL); + if (index_ptr) + *index_ptr = index; + } + + return index_ptr; +} + +static void *l_next(struct seq_file *m, void *p, loff_t *pos) +{ + loff_t *index_ptr = p; + struct hist_data *my_hist = m->private; + + if (++*pos >= MAX_ENTRY_NUM) { + atomic_inc(&my_hist->hist_mode); + return NULL; + } + *index_ptr = *pos; + return index_ptr; +} + +static void l_stop(struct seq_file *m, void *p) +{ + kfree(p); +} + +static int l_show(struct seq_file *m, void *p) +{ + int index = *(loff_t *) p; + struct hist_data *my_hist = m->private; + + seq_printf(m, "%6ld\t%16llu\n", index - my_hist->offset, + my_hist->hist_array[index]); + return 0; +} + +static const struct seq_operations latency_hist_seq_op = { + .start = l_start, + .next = l_next, + .stop = l_stop, + .show = l_show +}; + +static int latency_hist_open(struct inode *inode, struct file *file) +{ + int ret; + + ret = seq_open(file, &latency_hist_seq_op); + if (!ret) { + struct seq_file *seq = file->private_data; + + seq->private = inode->i_private; + } + return ret; +} + +static const struct file_operations latency_hist_fops = { + .open = latency_hist_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) +static void clear_maxlatprocdata(struct maxlatproc_data *mp) +{ + mp->comm[0] = mp->current_comm[0] = '\0'; + mp->prio = mp->current_prio = mp->pid = mp->current_pid = + mp->latency = mp->timer_offset = mp->wakeup_latency = -1; + mp->timestamp = 0; + mp->nr = 0; +} +#endif + +static void hist_reset(struct hist_data *hist) +{ + atomic_dec(&hist->hist_mode); + + memset(hist->hist_array, 0, sizeof(hist->hist_array)); + hist->below_hist_bound_samples = 0ULL; + hist->above_hist_bound_samples = 0ULL; + hist->min_lat = LONG_MAX; + hist->max_lat = LONG_MIN; + hist->total_samples = 0ULL; + hist->accumulate_lat = 0LL; + + atomic_inc(&hist->hist_mode); +} + +static ssize_t +latency_hist_reset(struct file *file, const char __user *a, + size_t size, loff_t *off) +{ + int cpu; + struct hist_data *hist = NULL; +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + struct maxlatproc_data *mp = NULL; +#endif + off_t latency_type = (off_t) file->private_data; + + for_each_online_cpu(cpu) { + + switch (latency_type) { +#ifdef CONFIG_PREEMPT_OFF_HIST + case PREEMPTOFF_LATENCY: + hist = &per_cpu(preemptoff_hist, cpu); + break; +#endif +#ifdef CONFIG_INTERRUPT_OFF_HIST + case IRQSOFF_LATENCY: + hist = &per_cpu(irqsoff_hist, cpu); + break; +#endif +#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST) + case PREEMPTIRQSOFF_LATENCY: + hist = &per_cpu(preemptirqsoff_hist, cpu); + break; +#endif +#ifdef CONFIG_WAKEUP_LATENCY_HIST + case WAKEUP_LATENCY: + hist = &per_cpu(wakeup_latency_hist, cpu); + mp = &per_cpu(wakeup_maxlatproc, cpu); + break; + case WAKEUP_LATENCY_SHAREDPRIO: + hist = &per_cpu(wakeup_latency_hist_sharedprio, cpu); + mp = &per_cpu(wakeup_maxlatproc_sharedprio, cpu); + break; +#endif +#ifdef CONFIG_SWITCHTIME_HIST + case SWITCHTIME: + hist = &per_cpu(switchtime, cpu); + mp = &per_cpu(switchtime_maxlatproc, cpu); + break; +#endif +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + case MISSED_TIMER_OFFSETS: + hist = &per_cpu(missed_timer_offsets, cpu); + mp = &per_cpu(missed_timer_offsets_maxlatproc, cpu); + break; +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + case TIMERANDWAKEUP_LATENCY: + hist = &per_cpu(timerandwakeup_latency_hist, cpu); + mp = &per_cpu(timerandwakeup_maxlatproc, cpu); + break; +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) && \ + defined(CONFIG_SWITCHTIME_HIST) + case TIMERWAKEUPSWITCH_LATENCY: + hist = &per_cpu(timerwakeupswitch_latency_hist, cpu); + mp = &per_cpu(timerwakeupswitch_maxlatproc, cpu); + break; +#endif + } + + hist_reset(hist); +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) || \ + defined(CONFIG_SWITCHTIME_HIST) + if (latency_type == WAKEUP_LATENCY || + latency_type == WAKEUP_LATENCY_SHAREDPRIO || + latency_type == MISSED_TIMER_OFFSETS || + latency_type == TIMERANDWAKEUP_LATENCY || + latency_type == SWITCHTIME || + latency_type == TIMERWAKEUPSWITCH_LATENCY) + clear_maxlatprocdata(mp); +#endif + } + + return size; +} + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) || \ + defined(CONFIG_SWITCHTIME_HIST) +static ssize_t +show_pid(struct file *file, char __user *ubuf, size_t cnt, loff_t *ppos) +{ + char buf[64]; + int r; + unsigned long *this_pid = file->private_data; + + r = snprintf(buf, sizeof(buf), "%lu\n", *this_pid); + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); +} + +static ssize_t do_pid(struct file *file, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + char buf[64]; + unsigned long pid; + unsigned long *this_pid = file->private_data; + + if (cnt >= sizeof(buf)) + return -EINVAL; + + if (copy_from_user(&buf, ubuf, cnt)) + return -EFAULT; + + buf[cnt] = '\0'; + + if (kstrtoul(buf, 10, &pid)) + return -EINVAL; + + *this_pid = pid; + + return cnt; +} +#endif + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) +static ssize_t +show_maxlatproc(struct file *file, char __user *ubuf, size_t cnt, loff_t *ppos) +{ + int r; + struct maxlatproc_data *mp = file->private_data; + int strmaxlen = (TASK_COMM_LEN * 2) + 32 + (8 * 8); + unsigned long long t; + unsigned long usecs, secs; + char *buf; + + if (mp->pid == -1 || mp->current_pid == -1) { + buf = "(none)\n"; + return simple_read_from_buffer(ubuf, cnt, ppos, buf, + strlen(buf)); + } + + buf = kmalloc(strmaxlen, GFP_KERNEL); + if (buf == NULL) + return -ENOMEM; + + t = ns2usecs(mp->timestamp); + usecs = do_div(t, USEC_PER_SEC); + secs = (unsigned long) t; + r = snprintf(buf, strmaxlen, + "%d %d %ld (%ld,%ld) %s <- %d %d %s %lu.%06lu", mp->pid, + MAX_RT_PRIO-1 - mp->prio, mp->latency, mp->timer_offset, + mp->wakeup_latency, mp->comm, mp->current_pid, + MAX_RT_PRIO-1 - mp->current_prio, mp->current_comm, + secs, usecs); +#ifdef CONFIG_SWITCHTIME_HIST + if (mp->nr >= 0 && mp->nr < NR_syscalls) { + int oldr = r; + char *start = buf + r, *syspos; + + r += snprintf(start, strmaxlen - r, " %ps", + (void *) sys_call_table[mp->nr]); + syspos = strstr(start, "sys_"); + if (syspos != NULL) { + strncpy(start + 1, syspos + 4, strmaxlen - oldr - 1); + r = oldr + strlen(start); + } + } +#endif + r += snprintf(buf + r, strmaxlen - r, "\n"); + + r = simple_read_from_buffer(ubuf, cnt, ppos, buf, r); + kfree(buf); + return r; +} +#endif + +static ssize_t +show_enable(struct file *file, char __user *ubuf, size_t cnt, loff_t *ppos) +{ + char buf[64]; + struct enable_data *ed = file->private_data; + int r; + + r = snprintf(buf, sizeof(buf), "%d\n", ed->enabled); + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); +} + +enum { + PREEMPT_DISABLE = 0, + PREEMPT_ENABLE, + IRQ_DISABLE, + IRQ_ENABLE, +}; +#ifdef CONFIG_PREEMPT_OFF_HIST +static int preempt_disable_action = PREEMPT_DISABLE; +static int preempt_enable_action = PREEMPT_ENABLE; +#endif +#ifdef CONFIG_INTERRUPT_OFF_HIST +static int irq_disable_action = IRQ_DISABLE; +static int irq_enable_action = IRQ_ENABLE; +#endif + +static ssize_t +do_enable(struct file *file, const char __user *ubuf, size_t cnt, loff_t *ppos) +{ + char buf[64]; + long enable; + struct enable_data *ed = file->private_data; + + if (cnt >= sizeof(buf)) + return -EINVAL; + + if (copy_from_user(&buf, ubuf, cnt)) + return -EFAULT; + + buf[cnt] = 0; + + if (kstrtoul(buf, 10, &enable)) + return -EINVAL; + + if ((enable && ed->enabled) || (!enable && !ed->enabled)) + return cnt; + + if (enable) { + int ret; + + switch (ed->latency_type) { +#if defined(CONFIG_PREEMPT_OFF_HIST) || defined(CONFIG_INTERRUPT_OFF_HIST) + case PREEMPTIRQSOFF_LATENCY: +#ifdef CONFIG_PREEMPT_OFF_HIST + ret = register_trace_preempt_disable( + probe_preemptirqsoff_hist, (void *) + &preempt_disable_action); + if (ret) { + pr_info("Couldn't register preempt_disable probe\n"); + return ret; + } + ret = register_trace_preempt_enable( + probe_preemptirqsoff_hist, + (void *) &preempt_enable_action); + if (ret) { + pr_info("Couldn't register preempt_enable probe\n"); + unregister_trace_preempt_disable( + probe_preemptirqsoff_hist, + (void *) &preempt_disable_action); + return ret; + } +#endif +#ifdef CONFIG_INTERRUPT_OFF_HIST + ret = register_trace_irq_disable( + probe_preemptirqsoff_hist, + (void *) &irq_disable_action); + if (ret) { + pr_info("Couldn't register irq_disable probe\n"); + return ret; + } + ret = register_trace_irq_enable( + probe_preemptirqsoff_hist, + (void *) &irq_enable_action); + if (ret) { + pr_info("Couldn't register irq_enable probe\n"); + unregister_trace_irq_disable( + probe_preemptirqsoff_hist, + (void *) &irq_disable_action); + return ret; + } +#endif + break; +#endif +#ifdef CONFIG_WAKEUP_LATENCY_HIST + case WAKEUP_LATENCY: + ret = register_trace_sched_wakeup( + probe_wakeup_latency_hist_start, NULL); + if (ret) { + pr_info("Couldn't register sched_wakeup probe\n"); + return ret; + } + ret = register_trace_sched_wakeup_new( + probe_wakeup_latency_hist_start, NULL); + if (ret) { + pr_info("Couldn't register sched_wakeup_new probe\n"); + unregister_trace_sched_wakeup( + probe_wakeup_latency_hist_start, NULL); + return ret; + } + ret = register_trace_sched_switch( + probe_wakeup_latency_hist_stop, NULL); + if (ret) { + pr_info("Couldn't register sched_switch probe\n"); + unregister_trace_sched_wakeup( + probe_wakeup_latency_hist_start, NULL); + unregister_trace_sched_wakeup_new( + probe_wakeup_latency_hist_start, NULL); + return ret; + } +#ifdef CONFIG_SMP + ret = register_trace_sched_migrate_task( + probe_sched_migrate_task, NULL); + if (ret) { + pr_info("Couldn't register sched_migrate_task probe\n"); + unregister_trace_sched_wakeup( + probe_wakeup_latency_hist_start, NULL); + unregister_trace_sched_wakeup_new( + probe_wakeup_latency_hist_start, NULL); + unregister_trace_sched_switch( + probe_wakeup_latency_hist_stop, NULL); + return ret; + } +#endif + break; +#endif +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + case MISSED_TIMER_OFFSETS: + ret = register_trace_hrtimer_interrupt( + probe_hrtimer_interrupt, NULL); + if (ret) { + pr_info("Couldn't register hrtimer IRQ probe\n"); + return ret; + } + break; +#endif +#ifdef CONFIG_SWITCHTIME_HIST + case SWITCHTIME: + if (!wakeup_latency_enabled_data.enabled) + return -EINVAL; + ret = tracepoint_probe_register(&__tracepoint_sys_exit, + probe_syscall_exit, NULL); + if (ret) { + pr_info("Couldn't register sys_exit probe\n"); + return ret; + } + break; +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + case TIMERANDWAKEUP_LATENCY: + if (!wakeup_latency_enabled_data.enabled || + !missed_timer_offsets_enabled_data.enabled) + return -EINVAL; + break; +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) && \ + defined(CONFIG_SWITCHTIME_HIST) + case TIMERWAKEUPSWITCH_LATENCY: + if (!wakeup_latency_enabled_data.enabled || + !missed_timer_offsets_enabled_data.enabled || + !switchtime_enabled_data.enabled) + return -EINVAL; + break; +#endif + default: + break; + } + } else { + switch (ed->latency_type) { +#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST) + case PREEMPTIRQSOFF_LATENCY: + { + int cpu; + +#ifdef CONFIG_PREEMPT_OFF_HIST + unregister_trace_preempt_disable( + probe_preemptirqsoff_hist, + (void *) &preempt_disable_action); + unregister_trace_preempt_enable( + probe_preemptirqsoff_hist, + (void *) &preempt_enable_action); +#endif +#ifdef CONFIG_INTERRUPT_OFF_HIST + unregister_trace_irq_disable( + probe_preemptirqsoff_hist, + (void *) &irq_disable_action); + unregister_trace_irq_enable( + probe_preemptirqsoff_hist, + (void *) &irq_enable_action); +#endif + for_each_online_cpu(cpu) { +#ifdef CONFIG_INTERRUPT_OFF_HIST + per_cpu(hist_irqsoff_counting, + cpu) = 0; +#endif +#ifdef CONFIG_PREEMPT_OFF_HIST + per_cpu(hist_preemptoff_counting, + cpu) = 0; +#endif +#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST) + per_cpu(hist_preemptirqsoff_counting, + cpu) = 0; +#endif + } + } + break; +#endif +#ifdef CONFIG_WAKEUP_LATENCY_HIST + case WAKEUP_LATENCY: + { + int cpu; + + unregister_trace_sched_wakeup( + probe_wakeup_latency_hist_start, NULL); + unregister_trace_sched_wakeup_new( + probe_wakeup_latency_hist_start, NULL); + unregister_trace_sched_switch( + probe_wakeup_latency_hist_stop, NULL); +#ifdef CONFIG_SMP + unregister_trace_sched_migrate_task( + probe_sched_migrate_task, NULL); +#endif + + for_each_online_cpu(cpu) { + per_cpu(wakeup_task, cpu) = NULL; + per_cpu(wakeup_sharedprio, cpu) = 0; + } + } +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + timerandwakeup_enabled_data.enabled = 0; +#endif +#ifdef CONFIG_SWITCHTIME_HIST + switchtime_enabled_data.enabled = 0; +#endif +#ifdef CONFIG_TIMERWAKEUPSWITCH_LATENCY_HIST + timerwakeupswitch_enabled_data.enabled = 0; +#endif + break; +#endif +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + case MISSED_TIMER_OFFSETS: + unregister_trace_hrtimer_interrupt( + probe_hrtimer_interrupt, NULL); +#ifdef CONFIG_WAKEUP_LATENCY_HIST + timerandwakeup_enabled_data.enabled = 0; +#endif +#ifdef CONFIG_TIMERWAKEUPSWITCH_LATENCY_HIST + timerwakeupswitch_enabled_data.enabled = 0; +#endif + break; +#endif +#ifdef CONFIG_SWITCHTIME_HIST + case SWITCHTIME: + { + int cpu; + + tracepoint_probe_unregister( + &__tracepoint_sys_exit, + probe_syscall_exit, NULL); + + for_each_online_cpu(cpu) + per_cpu(switchtime_task, cpu) = NULL; + } +#ifdef CONFIG_TIMERWAKEUPSWITCH_LATENCY_HIST + timerwakeupswitch_enabled_data.enabled = 0; +#endif + break; +#endif + default: + break; + } + } + ed->enabled = enable; + return cnt; +} + +static const struct file_operations latency_hist_reset_fops = { + .open = tracing_open_generic, + .write = latency_hist_reset, +}; + +static const struct file_operations enable_fops = { + .open = tracing_open_generic, + .read = show_enable, + .write = do_enable, +}; + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) +static const struct file_operations pid_fops = { + .open = tracing_open_generic, + .read = show_pid, + .write = do_pid, +}; + +static const struct file_operations maxlatproc_fops = { + .open = tracing_open_generic, + .read = show_maxlatproc, +}; +#endif + +#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST) +static notrace void probe_preemptirqsoff_hist(void *v, unsigned long ip, + unsigned long parent_ip) +{ + int cpu = raw_smp_processor_id(); + int time_set = 0; + int action = *((int *) v); + + if (action == PREEMPT_DISABLE || action == IRQ_DISABLE) { + u64 start = 0ULL; + + if (!preempt_count() && !irqs_disabled()) + return; + +#ifdef CONFIG_INTERRUPT_OFF_HIST + if ((action == IRQ_DISABLE) && + !per_cpu(hist_irqsoff_counting, cpu)) { + per_cpu(hist_irqsoff_counting, cpu) = 1; + start = ftrace_now(cpu); + time_set++; + per_cpu(hist_irqsoff_start, cpu) = start; + } +#endif + +#ifdef CONFIG_PREEMPT_OFF_HIST + if ((action == PREEMPT_DISABLE) && + !per_cpu(hist_preemptoff_counting, cpu)) { + per_cpu(hist_preemptoff_counting, cpu) = 1; + if (!(time_set++)) + start = ftrace_now(cpu); + per_cpu(hist_preemptoff_start, cpu) = start; + } +#endif + +#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST) + if (per_cpu(hist_irqsoff_counting, cpu) && + per_cpu(hist_preemptoff_counting, cpu) && + !per_cpu(hist_preemptirqsoff_counting, cpu)) { + per_cpu(hist_preemptirqsoff_counting, cpu) = 1; + if (!time_set) + start = ftrace_now(cpu); + per_cpu(hist_preemptirqsoff_start, cpu) = start; + } +#endif + } else { + u64 stop = 0ULL; + +#ifdef CONFIG_INTERRUPT_OFF_HIST + if ((action == IRQ_ENABLE) && + per_cpu(hist_irqsoff_counting, cpu)) { + u64 start = per_cpu(hist_irqsoff_start, cpu); + + stop = ftrace_now(cpu); + time_set++; + if (start) { + long latency = ((long) (stop - start)) / + NSECS_PER_USECS; + + latency_hist(IRQSOFF_LATENCY, cpu, latency, 0, + 0, stop, NULL, -1); + } + per_cpu(hist_irqsoff_counting, cpu) = 0; + } +#endif + +#ifdef CONFIG_PREEMPT_OFF_HIST + if ((action == PREEMPT_ENABLE) && + per_cpu(hist_preemptoff_counting, cpu)) { + u64 start = per_cpu(hist_preemptoff_start, cpu); + + if (!(time_set++)) + stop = ftrace_now(cpu); + if (start) { + long latency = ((long) (stop - start)) / + NSECS_PER_USECS; + + latency_hist(PREEMPTOFF_LATENCY, cpu, latency, + 0, 0, stop, NULL, -1); + } + per_cpu(hist_preemptoff_counting, cpu) = 0; + } +#endif + +#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST) + if ((!per_cpu(hist_irqsoff_counting, cpu) || + !per_cpu(hist_preemptoff_counting, cpu)) && + per_cpu(hist_preemptirqsoff_counting, cpu)) { + u64 start = per_cpu(hist_preemptirqsoff_start, cpu); + + if (!time_set) + stop = ftrace_now(cpu); + if (start) { + long latency = ((long) (stop - start)) / + NSECS_PER_USECS; + + latency_hist(PREEMPTIRQSOFF_LATENCY, cpu, + latency, 0, 0, stop, NULL, -1); + } + per_cpu(hist_preemptirqsoff_counting, cpu) = 0; + } +#endif + } +} +#endif + +#ifdef CONFIG_WAKEUP_LATENCY_HIST +static DEFINE_RAW_SPINLOCK(wakeup_lock); +#ifdef CONFIG_SWITCHTIME_HIST +static DEFINE_RAW_SPINLOCK(switchtime_lock); +#endif +#ifdef CONFIG_SMP +static notrace void probe_sched_migrate_task(void *v, struct task_struct *task, + int cpu) +{ + int old_cpu = task_cpu(task); + + if (cpu != old_cpu) { + unsigned long flags; + struct task_struct *cpu_wakeup_task; + + raw_spin_lock_irqsave(&wakeup_lock, flags); + cpu_wakeup_task = per_cpu(wakeup_task, old_cpu); + if (task == cpu_wakeup_task) { + put_task_struct(cpu_wakeup_task); + per_cpu(wakeup_task, old_cpu) = NULL; + cpu_wakeup_task = per_cpu(wakeup_task, cpu) = task; + get_task_struct(cpu_wakeup_task); + } + raw_spin_unlock_irqrestore(&wakeup_lock, flags); + +#ifdef CONFIG_SWITCHTIME_HIST + { + struct task_struct *cpu_switchtime_task; + + raw_spin_lock_irqsave(&switchtime_lock, flags); + cpu_switchtime_task = per_cpu(switchtime_task, old_cpu); + if (task == cpu_switchtime_task) { + put_task_struct(cpu_switchtime_task); + per_cpu(switchtime_task, old_cpu) = NULL; + cpu_switchtime_task = + per_cpu(switchtime_task, cpu) = task; + get_task_struct(cpu_switchtime_task); + } + raw_spin_unlock_irqrestore(&switchtime_lock, flags); + } +#endif + } +} +#endif /* CONFIG_SMP */ + +static notrace void probe_wakeup_latency_hist_start(void *v, + struct task_struct *p) +{ + unsigned long flags; + struct task_struct *curr = current; + int cpu = task_cpu(p); + struct task_struct *cpu_wakeup_task; + unsigned long cpu_wakeup_pid; + +#ifdef CONFIG_SWITCHTIME_HIST + struct task_struct *cpu_switchtime_task; + unsigned long cpu_switchtime_pid = per_cpu(switchtime_pid, cpu); + + if (!cpu_switchtime_pid) { + raw_spin_lock_irqsave(&switchtime_lock, flags); + cpu_switchtime_task = per_cpu(switchtime_task, cpu); + if (cpu_switchtime_task && + p->prio < cpu_switchtime_task->prio) { + cpu_switchtime_task->switchtime_timestamp_hist = 0; + put_task_struct(cpu_switchtime_task); + per_cpu(switchtime_task, cpu) = NULL; + } + raw_spin_unlock_irqrestore(&switchtime_lock, flags); + } +#endif + + raw_spin_lock_irqsave(&wakeup_lock, flags); + cpu_wakeup_task = per_cpu(wakeup_task, cpu); + cpu_wakeup_pid = per_cpu(wakeup_pid, cpu); + + if (cpu_wakeup_pid) { + if ((cpu_wakeup_task && p->prio == cpu_wakeup_task->prio) || + p->prio == curr->prio) + per_cpu(wakeup_sharedprio, cpu) = 1; + if (likely(cpu_wakeup_pid != task_pid_nr(p))) + goto out; + } else { + if (likely(!rt_task(p)) || + (cpu_wakeup_task && p->prio > cpu_wakeup_task->prio) || + p->prio > curr->prio) + goto out; + if ((cpu_wakeup_task && p->prio == cpu_wakeup_task->prio) || + p->prio == curr->prio) + per_cpu(wakeup_sharedprio, cpu) = 1; + } + + if (cpu_wakeup_task) + put_task_struct(cpu_wakeup_task); + cpu_wakeup_task = per_cpu(wakeup_task, cpu) = p; + get_task_struct(cpu_wakeup_task); + cpu_wakeup_task->preempt_timestamp_hist = + ftrace_now(raw_smp_processor_id()); +out: + raw_spin_unlock_irqrestore(&wakeup_lock, flags); +} + +static notrace void probe_wakeup_latency_hist_stop(void *v, + bool preempt, struct task_struct *prev, struct task_struct *next, + unsigned int prev_state) +{ + unsigned long flags; + int cpu = task_cpu(next); + long latency; + u64 stop; + struct task_struct *cpu_wakeup_task; + int hit = 0; + unsigned long cpu_switchtime_pid; + unsigned long cpu_wakeup_pid = per_cpu(wakeup_pid, cpu); + + raw_spin_lock_irqsave(&wakeup_lock, flags); + + cpu_wakeup_task = per_cpu(wakeup_task, cpu); + + if (cpu_wakeup_task == NULL) + goto out; + + /* Already running? */ + if (unlikely(current == cpu_wakeup_task)) + goto out_reset; + + if (next != cpu_wakeup_task) { + if (next->prio < cpu_wakeup_task->prio) + goto out_reset; + + if (next->prio == cpu_wakeup_task->prio && + task_cpu(next) == task_cpu(cpu_wakeup_task)) + per_cpu(wakeup_sharedprio, cpu) = 1; + + goto out; + } + + if (current->prio == cpu_wakeup_task->prio && + task_cpu(current) == task_cpu(cpu_wakeup_task)) + per_cpu(wakeup_sharedprio, cpu) = 1; + + /* + * The task we are waiting for is about to be switched to. + * Calculate latency and store it in histogram. + */ + stop = ftrace_now(raw_smp_processor_id()); + + latency = ((long) (stop - next->preempt_timestamp_hist)) / + NSECS_PER_USECS; + + if (per_cpu(wakeup_sharedprio, cpu)) { + latency_hist(WAKEUP_LATENCY_SHAREDPRIO, cpu, latency, 0, 0, + stop, next, -1); + per_cpu(wakeup_sharedprio, cpu) = 0; + } + if (!per_cpu(wakeup_sharedprio, cpu) || + (cpu_wakeup_pid && task_pid_nr(next) == cpu_wakeup_pid)) { + latency_hist(WAKEUP_LATENCY, cpu, latency, 0, 0, stop, next, + -1); +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + if (timerandwakeup_enabled_data.enabled) { + latency_hist(TIMERANDWAKEUP_LATENCY, cpu, + next->timer_offset + latency, next->timer_offset, 0, + stop, next, -1); + } + hit = 1; +#endif + } + +out_reset: + put_task_struct(cpu_wakeup_task); + per_cpu(wakeup_task, cpu) = NULL; +out: + raw_spin_unlock_irqrestore(&wakeup_lock, flags); + cpu_switchtime_pid = per_cpu(switchtime_pid, cpu); + +#ifdef CONFIG_SWITCHTIME_HIST + if (hit && switchtime_enabled_data.enabled && + (!cpu_switchtime_pid || + task_pid_nr(next) == cpu_switchtime_pid)) { + unsigned long flags; + struct task_struct *cpu_switchtime_task; + + raw_spin_lock_irqsave(&switchtime_lock, flags); + + cpu_switchtime_task = per_cpu(switchtime_task, cpu); + if (cpu_switchtime_task != NULL) + put_task_struct(cpu_switchtime_task); + cpu_switchtime_task = next; + per_cpu(switchtime_task, cpu) = cpu_switchtime_task; + get_task_struct(cpu_switchtime_task); + + raw_spin_unlock_irqrestore(&switchtime_lock, flags); + + next->switchtime_timestamp_hist = stop; + next->wakeup_latency = latency; + next->switchtime_prev = current; + } +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + else + next->timer_offset = 0; +#endif + +#endif +} +#endif + +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST +static notrace void probe_hrtimer_interrupt(void *v, int cpu, + long long latency_ns, struct task_struct *curr, + struct task_struct *task) +{ +#ifdef CONFIG_SWITCHTIME_HIST + unsigned long cpu_switchtime_pid; +#endif + + if (latency_ns <= 0 && task != NULL && rt_task(task) && + (task->prio < curr->prio || + (task->prio == curr->prio && + !cpumask_test_cpu(cpu, task->cpus_ptr)))) { + int nextcpu = -1; + long latency; + u64 now; + unsigned long cpu_missed_timer_offsets_pid = + per_cpu(missed_timer_offsets_pid, cpu); + + if (cpu_missed_timer_offsets_pid) { + if (likely(cpu_missed_timer_offsets_pid != + task_pid_nr(task))) + return; + } + + if (cpumask_weight(task->cpus_ptr) == 1) + nextcpu = cpumask_first(task->cpus_ptr); + +#ifdef CONFIG_SWITCHTIME_HIST + cpu_switchtime_pid = per_cpu(switchtime_pid, cpu); + if (!cpu_switchtime_pid && nextcpu != -1) { + unsigned long flags; + struct task_struct *cpu_switchtime_task; + + raw_spin_lock_irqsave(&switchtime_lock, flags); + cpu_switchtime_task = + per_cpu(switchtime_task, nextcpu); + if (cpu_switchtime_task && + task->prio < cpu_switchtime_task->prio) { + cpu_switchtime_task->switchtime_timestamp_hist = + 0; + put_task_struct(cpu_switchtime_task); + per_cpu(switchtime_task, nextcpu) = NULL; + } + raw_spin_unlock_irqrestore(&switchtime_lock, flags); + } +#endif + + now = ftrace_now(cpu); + if (nextcpu == -1) + nextcpu = cpu; + latency = (long) div_s64(-latency_ns, NSECS_PER_USECS); + latency_hist(MISSED_TIMER_OFFSETS, nextcpu, latency, 0, 0, now, + task, -1); +#ifdef CONFIG_WAKEUP_LATENCY_HIST + task->timer_offset = latency; +#endif + } +} +#endif + +#ifdef CONFIG_SWITCHTIME_HIST +static notrace void probe_syscall_exit(void *v, struct pt_regs *regs, long ret) +{ + unsigned long flags; + int cpu = task_cpu(current); + struct task_struct *cpu_switchtime_task; + unsigned long cpu_switchtime_pid; + + raw_spin_lock_irqsave(&switchtime_lock, flags); + + cpu_switchtime_task = per_cpu(switchtime_task, cpu); + if (cpu_switchtime_task == NULL) + goto out; + + cpu_switchtime_pid = per_cpu(switchtime_pid, cpu); + if (current == cpu_switchtime_task) { + unsigned long latency; + u64 stop; + int syscall_nr; + + if (cpu_switchtime_pid) { + if (likely(cpu_switchtime_pid != + task_pid_nr(current))) + goto out; + } + + /* + * The task we are waiting for exited from system call. + * Calculate latency since start of context switch and store it + * in histogram. + */ + stop = ftrace_now(raw_smp_processor_id()); + + latency = ((long) (stop - current->switchtime_timestamp_hist)) / + NSECS_PER_USECS; + + syscall_nr = syscall_get_nr(current, regs); + + current->switchtime_timestamp_hist = 0; + + if ((!cpu_switchtime_pid && + task_pid_nr(current) == cpu_switchtime_pid) || + current->prio < current->switchtime_prev->prio) { + latency_hist(SWITCHTIME, cpu, latency, 0, 0, stop, + current->switchtime_prev, syscall_nr); +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + if (timerwakeupswitch_enabled_data.enabled) + latency_hist(TIMERWAKEUPSWITCH_LATENCY, cpu, + current->timer_offset + + current->wakeup_latency + latency, + current->timer_offset, + current->wakeup_latency, stop, + current->switchtime_prev, syscall_nr); +#endif + } + current->timer_offset = current->wakeup_latency = 0; + } + put_task_struct(cpu_switchtime_task); + per_cpu(switchtime_task, cpu) = NULL; + +out: + raw_spin_unlock_irqrestore(&switchtime_lock, flags); +} +#endif + +static __init int latency_hist_init(void) +{ + struct dentry *latency_hist_root = NULL; + struct dentry *dentry; +#ifdef CONFIG_WAKEUP_LATENCY_HIST + struct dentry *dentry_sharedprio; +#endif + struct dentry *entry; + struct dentry *enable_root; + int i = 0; + struct hist_data *my_hist; + char name[64]; + char *cpufmt = "CPU%d"; + char *cpufmt_pid = "pid-CPU%d"; +#if defined(CONFIG_WAKEUP_LATENCY_HIST) || \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + char *cpufmt_maxlatproc = "max_latency-CPU%d"; + struct maxlatproc_data *mp = NULL; +#endif + + latency_hist_root = debugfs_create_dir(latency_hist_dir_root, NULL); + enable_root = debugfs_create_dir("enable", latency_hist_root); + +#ifdef CONFIG_INTERRUPT_OFF_HIST + dentry = debugfs_create_dir(irqsoff_hist_dir, latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(irqsoff_hist, i), &latency_hist_fops); + my_hist = &per_cpu(irqsoff_hist, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)IRQSOFF_LATENCY, &latency_hist_reset_fops); +#endif + +#ifdef CONFIG_PREEMPT_OFF_HIST + dentry = debugfs_create_dir(preemptoff_hist_dir, + latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(preemptoff_hist, i), &latency_hist_fops); + my_hist = &per_cpu(preemptoff_hist, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)PREEMPTOFF_LATENCY, &latency_hist_reset_fops); +#endif + +#if defined(CONFIG_INTERRUPT_OFF_HIST) && defined(CONFIG_PREEMPT_OFF_HIST) + dentry = debugfs_create_dir(preemptirqsoff_hist_dir, + latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(preemptirqsoff_hist, i), &latency_hist_fops); + my_hist = &per_cpu(preemptirqsoff_hist, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)PREEMPTIRQSOFF_LATENCY, &latency_hist_reset_fops); +#endif + +#if defined(CONFIG_INTERRUPT_OFF_HIST) || defined(CONFIG_PREEMPT_OFF_HIST) + entry = debugfs_create_file("preemptirqsoff", 0644, + enable_root, (void *)&preemptirqsoff_enabled_data, + &enable_fops); +#endif + +#ifdef CONFIG_WAKEUP_LATENCY_HIST + dentry = debugfs_create_dir(wakeup_latency_hist_dir, + latency_hist_root); + dentry_sharedprio = debugfs_create_dir( + wakeup_latency_hist_dir_sharedprio, dentry); + + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(wakeup_latency_hist, i), + &latency_hist_fops); + my_hist = &per_cpu(wakeup_latency_hist, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + + entry = debugfs_create_file(name, 0444, dentry_sharedprio, + &per_cpu(wakeup_latency_hist_sharedprio, i), + &latency_hist_fops); + my_hist = &per_cpu(wakeup_latency_hist_sharedprio, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + + sprintf(name, cpufmt_maxlatproc, i); + + mp = &per_cpu(wakeup_maxlatproc, i); + entry = debugfs_create_file(name, 0444, dentry, mp, + &maxlatproc_fops); + clear_maxlatprocdata(mp); + + mp = &per_cpu(wakeup_maxlatproc_sharedprio, i); + entry = debugfs_create_file(name, 0444, dentry_sharedprio, mp, + &maxlatproc_fops); + clear_maxlatprocdata(mp); + + sprintf(name, cpufmt_pid, i); + entry = debugfs_create_file(name, 0644, dentry, + &per_cpu(wakeup_pid, i), &pid_fops); + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)WAKEUP_LATENCY, &latency_hist_reset_fops); + entry = debugfs_create_file("reset", 0644, dentry_sharedprio, + (void *)WAKEUP_LATENCY_SHAREDPRIO, &latency_hist_reset_fops); + entry = debugfs_create_file("wakeup", 0644, + enable_root, (void *)&wakeup_latency_enabled_data, + &enable_fops); +#endif + +#ifdef CONFIG_MISSED_TIMER_OFFSETS_HIST + dentry = debugfs_create_dir(missed_timer_offsets_dir, + latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(missed_timer_offsets, i), &latency_hist_fops); + my_hist = &per_cpu(missed_timer_offsets, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + + sprintf(name, cpufmt_maxlatproc, i); + mp = &per_cpu(missed_timer_offsets_maxlatproc, i); + entry = debugfs_create_file(name, 0444, dentry, mp, + &maxlatproc_fops); + clear_maxlatprocdata(mp); + + sprintf(name, cpufmt_pid, i); + entry = debugfs_create_file(name, 0644, dentry, + &per_cpu(missed_timer_offsets_pid, i), &pid_fops); + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)MISSED_TIMER_OFFSETS, &latency_hist_reset_fops); + entry = debugfs_create_file("missed_timer_offsets", 0644, + enable_root, (void *)&missed_timer_offsets_enabled_data, + &enable_fops); +#endif + +#ifdef CONFIG_SWITCHTIME_HIST + dentry = debugfs_create_dir(switchtime_dir, + latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(switchtime, i), &latency_hist_fops); + my_hist = &per_cpu(switchtime, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + + sprintf(name, cpufmt_maxlatproc, i); + mp = &per_cpu(switchtime_maxlatproc, i); + entry = debugfs_create_file(name, 0444, dentry, mp, + &maxlatproc_fops); + clear_maxlatprocdata(mp); + + sprintf(name, cpufmt_pid, i); + entry = debugfs_create_file(name, 0644, dentry, + &per_cpu(switchtime_pid, i), &pid_fops); + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)SWITCHTIME, &latency_hist_reset_fops); + entry = debugfs_create_file("switchtime", 0644, + enable_root, (void *)&switchtime_enabled_data, + &enable_fops); +#endif + +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) + dentry = debugfs_create_dir(timerandwakeup_latency_hist_dir, + latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(timerandwakeup_latency_hist, i), + &latency_hist_fops); + my_hist = &per_cpu(timerandwakeup_latency_hist, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + + sprintf(name, cpufmt_maxlatproc, i); + mp = &per_cpu(timerandwakeup_maxlatproc, i); + entry = debugfs_create_file(name, 0444, dentry, mp, + &maxlatproc_fops); + clear_maxlatprocdata(mp); + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)TIMERANDWAKEUP_LATENCY, &latency_hist_reset_fops); + entry = debugfs_create_file("timerandwakeup", 0644, + enable_root, (void *)&timerandwakeup_enabled_data, + &enable_fops); +#endif +#if defined(CONFIG_WAKEUP_LATENCY_HIST) && \ + defined(CONFIG_MISSED_TIMER_OFFSETS_HIST) && \ + defined(CONFIG_SWITCHTIME_HIST) + dentry = debugfs_create_dir(timerwakeupswitch_latency_hist_dir, + latency_hist_root); + for_each_possible_cpu(i) { + sprintf(name, cpufmt, i); + entry = debugfs_create_file(name, 0444, dentry, + &per_cpu(timerwakeupswitch_latency_hist, i), + &latency_hist_fops); + my_hist = &per_cpu(timerwakeupswitch_latency_hist, i); + atomic_set(&my_hist->hist_mode, 1); + my_hist->min_lat = LONG_MAX; + + sprintf(name, cpufmt_maxlatproc, i); + mp = &per_cpu(timerwakeupswitch_maxlatproc, i); + entry = debugfs_create_file(name, 0444, dentry, mp, + &maxlatproc_fops); + clear_maxlatprocdata(mp); + } + entry = debugfs_create_file("reset", 0644, dentry, + (void *)TIMERWAKEUPSWITCH_LATENCY, &latency_hist_reset_fops); + entry = debugfs_create_file("timerwakeupswitch", 0644, + enable_root, (void *)&timerwakeupswitch_enabled_data, + &enable_fops); +#endif + return 0; +} + +device_initcall(latency_hist_init); Index: linux-6.17.1-rt5-ras6/kernel/trace/trace_irqsoff.c =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/trace/trace_irqsoff.c +++ linux-6.17.1-rt5-ras6/kernel/trace/trace_irqsoff.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:18 @ #include <linux/module.h> #include <linux/ftrace.h> #include <linux/kprobes.h> +#include <trace/events/hist.h> #include "trace.h" +#include "helper_latency_hist.h" #include <trace/events/preemptirq.h> @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:474 @ void start_critical_timings(void) { if (preempt_trace(preempt_count()) || irq_trace()) start_critical_timing(CALLER_ADDR0, CALLER_ADDR1); + trace(preempt_disable, TP_ARGS(TRACE_START, 1)); } EXPORT_SYMBOL_GPL(start_critical_timings); NOKPROBE_SYMBOL(start_critical_timings); void stop_critical_timings(void) { + trace(preempt_disable, TP_ARGS(TRACE_STOP, 0)); if (preempt_trace(preempt_count()) || irq_trace()) stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1); } @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:645 @ static void irqsoff_tracer_stop(struct t */ void tracer_hardirqs_on(unsigned long a0, unsigned long a1) { + trace(preempt_disable, TP_ARGS(IRQS_ON, 0)); if (!preempt_trace(preempt_count()) && irq_trace()) stop_critical_timing(a0, a1); } @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:655 @ void tracer_hardirqs_off(unsigned long a { if (!preempt_trace(preempt_count()) && irq_trace()) start_critical_timing(a0, a1); + trace(preempt_disable, TP_ARGS(IRQS_OFF, 1)); } NOKPROBE_SYMBOL(tracer_hardirqs_off); Index: linux-6.17.1-rt5-ras6/kernel/trace/trace_preemptirq.c =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/trace/trace_preemptirq.c +++ linux-6.17.1-rt5-ras6/kernel/trace/trace_preemptirq.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:15 @ #include <linux/kprobes.h> #include <linux/hardirq.h> #include "trace.h" +#include "helper_latency_hist.h" #define CREATE_TRACE_POINTS #include <trace/events/preemptirq.h> -/* - * Use regular trace points on architectures that implement noinstr - * tooling: these calls will only happen with RCU enabled, which can - * use a regular tracepoint. - * - * On older architectures, RCU may not be watching in idle. In that - * case, wake up RCU to watch while calling the tracepoint. These - * aren't NMI-safe - so exclude NMI contexts: - */ -#ifdef CONFIG_ARCH_WANTS_NO_INSTR -#define trace(point, args) trace_##point(args) -#else -#define trace(point, args) \ - do { \ - if (trace_##point##_enabled()) { \ - bool exit_rcu = false; \ - if (in_nmi()) \ - break; \ - if (!IS_ENABLED(CONFIG_TINY_RCU) && \ - is_idle_task(current)) { \ - ct_irq_enter(); \ - exit_rcu = true; \ - } \ - trace_##point(args); \ - if (exit_rcu) \ - ct_irq_exit(); \ - } \ - } while (0) -#endif - #ifdef CONFIG_TRACE_IRQFLAGS /* Per-cpu variable to prevent redundant calls when IRQs already off */ static DEFINE_PER_CPU(int, tracing_irq_cpu); Index: linux-6.17.1-rt5-ras6/kernel/workqueue.c =================================================================== --- linux-6.17.1-rt5-ras6.orig/kernel/workqueue.c +++ linux-6.17.1-rt5-ras6/kernel/workqueue.c @ linux-6.17.1-rt5-ras6/Documentation/trace/latency-histograms.txt:6524 @ void wq_worker_comm(char *buf, size_t si mutex_unlock(&wq_pool_attach_mutex); } +/* used to get worker information for latency histograms */ +void wq_worker_comm_nolock(char *buf, size_t size, struct task_struct *task) +{ + int off; + + /* always show the actual comm */ + off = strscpy(buf, task->comm, size); + if (off < 0) + return; + + if (task->flags & PF_WQ_WORKER) { + struct worker *worker = kthread_data(task); + struct worker_pool *pool = worker->pool; + + if (pool) { + /* + * ->desc tracks information (wq name or + * set_worker_desc()) for the latest execution. If + * current, prepend '+', otherwise '-'. + */ + if (worker->desc[0] != '\0') { + if (worker->current_work) + scnprintf(buf + off, size - off, "+%s", + worker->desc); + else + scnprintf(buf + off, size - off, "-%s", + worker->desc); + } + } + } +} + #ifdef CONFIG_SMP /*