diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
index 57300db4b5ff..31c99382994e 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
@@ -56,8 +56,8 @@ sections.
RCU-preempt Expedited Grace Periods
-CONFIG_PREEMPT=y kernels implement RCU-preempt.
-The overall flow of the handling of a given CPU by an RCU-preempt
+CONFIG_PREEMPT=y and CONFIG_PREEMPT_RT=y kernels implement
+RCU-preempt. The overall flow of the handling of a given CPU by an RCU-preempt
expedited grace period is shown in the following diagram:
@@ -140,8 +140,8 @@ or offline, among other things.
RCU-sched Expedited Grace Periods
-CONFIG_PREEMPT=n kernels implement RCU-sched.
-The overall flow of the handling of a given CPU by an RCU-sched
+CONFIG_PREEMPT=n and CONFIG_PREEMPT_RT=n kernels implement
+RCU-sched. The overall flow of the handling of a given CPU by an RCU-sched
expedited grace period is shown in the following diagram:
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 467251f7fef6..348c5db1ff2b 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -106,7 +106,7 @@ big RCU read-side critical section.
Production-quality implementations of rcu_read_lock() and
rcu_read_unlock() are extremely lightweight, and in
fact have exactly zero overhead in Linux kernels built for production
-use with CONFIG_PREEMPT=n .
+use with CONFIG_PREEMPTION=n .
This guarantee allows ordering to be enforced with extremely low
@@ -1499,7 +1499,7 @@ costs have plummeted.
However, as I learned from Matt Mackall's
bloatwatch
efforts, memory footprint is critically important on single-CPU systems with
-non-preemptible (CONFIG_PREEMPT=n ) kernels, and thus
+non-preemptible (CONFIG_PREEMPTION=n ) kernels, and thus
tiny RCU
was born.
Josh Triplett has since taken over the small-memory banner with his
@@ -1887,7 +1887,7 @@ constructs, there are limitations.
Implementations of RCU for which rcu_read_lock()
and rcu_read_unlock() generate no code, such as
-Linux-kernel RCU when CONFIG_PREEMPT=n , can be
+Linux-kernel RCU when CONFIG_PREEMPTION=n , can be
nested arbitrarily deeply.
After all, there is no overhead.
Except that if all these instances of rcu_read_lock()
@@ -2229,7 +2229,7 @@ be a no-op.
However, once the scheduler has spawned its first kthread, this early
boot trick fails for synchronize_rcu() (as well as for
-synchronize_rcu_expedited() ) in CONFIG_PREEMPT=y
+synchronize_rcu_expedited() ) in CONFIG_PREEMPTION=y
kernels.
The reason is that an RCU read-side critical section might be preempted,
which means that a subsequent synchronize_rcu() really does have
@@ -2568,7 +2568,7 @@ the following:
If the compiler did make this transformation in a
-CONFIG_PREEMPT=n kernel build, and if get_user() did
+CONFIG_PREEMPTION=n kernel build, and if get_user() did
page fault, the result would be a quiescent state in the middle
of an RCU read-side critical section.
This misplaced quiescent state could result in line 4 being
@@ -2906,7 +2906,7 @@ in conjunction with the
The real-time-latency response requirements are such that the
traditional approach of disabling preemption across RCU
read-side critical sections is inappropriate.
-Kernels built with CONFIG_PREEMPT=y therefore
+Kernels built with CONFIG_PREEMPTION=y therefore
use an RCU implementation that allows RCU read-side critical
sections to be preempted.
This requirement made its presence known after users made it
@@ -3064,7 +3064,7 @@ includes
rcu_barrier_bh() , and
rcu_read_lock_bh_held() .
However, the update-side APIs are now simple wrappers for other RCU
-flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt
+flavors, namely RCU-sched in CONFIG_PREEMPTION=n kernels and RCU-preempt
otherwise.
@@ -3088,12 +3088,12 @@ of an RCU read-side critical section can be a quiescent state.
Therefore, RCU-sched was created, which follows “classic”
RCU in that an RCU-sched grace period waits for for pre-existing
interrupt and NMI handlers.
-In kernels built with CONFIG_PREEMPT=n , the RCU and RCU-sched
+In kernels built with CONFIG_PREEMPTION=n , the RCU and RCU-sched
APIs have identical implementations, while kernels built with
-CONFIG_PREEMPT=y provide a separate implementation for each.
+CONFIG_PREEMPTION=y provide a separate implementation for each.
-Note well that in CONFIG_PREEMPT=y kernels,
+Note well that in CONFIG_PREEMPTION=y kernels,
rcu_read_lock_sched() and rcu_read_unlock_sched()
disable and re-enable preemption, respectively.
This means that if there was a preemption attempt during the
@@ -3302,12 +3302,12 @@ The tasks-RCU API is quite compact, consisting only of
call_rcu_tasks() ,
synchronize_rcu_tasks() , and
rcu_barrier_tasks() .
-In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted,
+In CONFIG_PREEMPTION=n kernels, trampolines cannot be preempted,
so these APIs map to
call_rcu() ,
synchronize_rcu() , and
rcu_barrier() , respectively.
-In CONFIG_PREEMPT=y kernels, trampolines can be preempted,
+In CONFIG_PREEMPTION=y kernels, trampolines can be preempted,
and these three APIs are therefore implemented by separate functions
that check for voluntary context switches.
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index e98ff261a438..087dc6c22c37 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -210,8 +210,8 @@ over a rather long period of time, but improvements are always welcome!
the rest of the system.
7. As of v4.20, a given kernel implements only one RCU flavor,
- which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y.
- If the updater uses call_rcu() or synchronize_rcu(),
+ which is RCU-sched for PREEMPTION=n and RCU-preempt for
+ PREEMPTION=y. If the updater uses call_rcu() or synchronize_rcu(),
then the corresponding readers my use rcu_read_lock() and
rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
or any pair of primitives that disables and re-enables preemption,
diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt
index a2782df69732..5aa93c215af4 100644
--- a/Documentation/RCU/rcubarrier.txt
+++ b/Documentation/RCU/rcubarrier.txt
@@ -6,8 +6,8 @@ RCU (read-copy update) is a synchronization mechanism that can be thought
of as a replacement for read-writer locking (among other things), but with
very low-overhead readers that are immune to deadlock, priority inversion,
and unbounded latency. RCU read-side critical sections are delimited
-by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT
-kernels, generate no code whatsoever.
+by rcu_read_lock() and rcu_read_unlock(), which, in
+non-CONFIG_PREEMPTION kernels, generate no code whatsoever.
This means that RCU writers are unaware of the presence of concurrent
readers, so that RCU updates to shared data must be undertaken quite
@@ -303,10 +303,10 @@ Answer: This cannot happen. The reason is that on_each_cpu() has its last
to smp_call_function() and further to smp_call_function_on_cpu(),
causing this latter to spin until the cross-CPU invocation of
rcu_barrier_func() has completed. This by itself would prevent
- a grace period from completing on non-CONFIG_PREEMPT kernels,
+ a grace period from completing on non-CONFIG_PREEMPTION kernels,
since each CPU must undergo a context switch (or other quiescent
state) before the grace period can complete. However, this is
- of no use in CONFIG_PREEMPT kernels.
+ of no use in CONFIG_PREEMPTION kernels.
Therefore, on_each_cpu() disables preemption across its call
to smp_call_function() and also across the local call to
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index f48f4621ccbc..bd510771b75e 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -20,7 +20,7 @@ o A CPU looping with preemption disabled.
o A CPU looping with bottom halves disabled.
-o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
+o For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is
really expected and desirable behavior, you might need to add
some calls to cond_resched().
@@ -39,7 +39,7 @@ o Anything that prevents RCU's grace-period kthreads from running.
result in the "rcu_.*kthread starved for" console-log message,
which will include additional debugging information.
-o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+o A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 7e1a8721637a..7e03e8f80b29 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -648,9 +648,10 @@ Quick Quiz #1: Why is this argument naive? How could a deadlock
This section presents a "toy" RCU implementation that is based on
"classic RCU". It is also short on performance (but only for updates) and
-on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
-kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
-are the same as those shown in the preceding section, so they are omitted.
+on features such as hotplug CPU and the ability to run in
+CONFIG_PREEMPTION kernels. The definitions of rcu_dereference() and
+rcu_assign_pointer() are the same as those shown in the preceding
+section, so they are omitted.
void rcu_read_lock(void) { }
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009ca..0329a4d3fa9e 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
This should be used on systems where stalls for minor page faults are an
acceptable trade for large contiguous free memory. Set to 0 to prevent
compaction from moving pages that are unevictable. Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
+to compaction, which would block the task from becomming active until the fault
+is resolved.
dirty_background_bytes
diff --git a/Documentation/printk-ringbuffer.txt b/Documentation/printk-ringbuffer.txt
new file mode 100644
index 000000000000..6bde5dbd8545
--- /dev/null
+++ b/Documentation/printk-ringbuffer.txt
@@ -0,0 +1,377 @@
+struct printk_ringbuffer
+------------------------
+John Ogness
+
+Overview
+~~~~~~~~
+As the name suggests, this ring buffer was implemented specifically to serve
+the needs of the printk() infrastructure. The ring buffer itself is not
+specific to printk and could be used for other purposes. _However_, the
+requirements and semantics of printk are rather unique. If you intend to use
+this ring buffer for anything other than printk, you need to be very clear on
+its features, behavior, and pitfalls.
+
+Features
+^^^^^^^^
+The printk ring buffer has the following features:
+
+- single global buffer
+- resides in initialized data section (available at early boot)
+- lockless readers
+- supports multiple writers
+- supports multiple non-consuming readers
+- safe from any context (including NMI)
+- groups bytes into variable length blocks (referenced by entries)
+- entries tagged with sequence numbers
+
+Behavior
+^^^^^^^^
+Since the printk ring buffer readers are lockless, there exists no
+synchronization between readers and writers. Basically writers are the tasks
+in control and may overwrite any and all committed data at any time and from
+any context. For this reason readers can miss entries if they are overwritten
+before the reader was able to access the data. The reader API implementation
+is such that reader access to entries is atomic, so there is no risk of
+readers having to deal with partial or corrupt data. Also, entries are
+tagged with sequence numbers so readers can recognize if entries were missed.
+
+Writing to the ring buffer consists of 2 steps. First a writer must reserve
+an entry of desired size. After this step the writer has exclusive access
+to the memory region. Once the data has been written to memory, it needs to
+be committed to the ring buffer. After this step the entry has been inserted
+into the ring buffer and assigned an appropriate sequence number.
+
+Once committed, a writer must no longer access the data directly. This is
+because the data may have been overwritten and no longer exists. If a
+writer must access the data, it should either keep a private copy before
+committing the entry or use the reader API to gain access to the data.
+
+Because of how the data backend is implemented, entries that have been
+reserved but not yet committed act as barriers, preventing future writers
+from filling the ring buffer beyond the location of the reserved but not
+yet committed entry region. For this reason it is *important* that writers
+perform both reserve and commit as quickly as possible. Also, be aware that
+preemption and local interrupts are disabled and writing to the ring buffer
+is processor-reentrant locked during the reserve/commit window. Writers in
+NMI contexts can still preempt any other writers, but as long as these
+writers do not write a large amount of data with respect to the ring buffer
+size, this should not become an issue.
+
+API
+~~~
+
+Declaration
+^^^^^^^^^^^
+The printk ring buffer can be instantiated as a static structure:
+
+ /* declare a static struct printk_ringbuffer */
+ #define DECLARE_STATIC_PRINTKRB(name, szbits, cpulockptr)
+
+The value of szbits specifies the size of the ring buffer in bits. The
+cpulockptr field is a pointer to a prb_cpulock struct that is used to
+perform processor-reentrant spin locking for the writers. It is specified
+externally because it may be used for multiple ring buffers (or other
+code) to synchronize writers without risk of deadlock.
+
+Here is an example of a declaration of a printk ring buffer specifying a
+32KB (2^15) ring buffer:
+
+....
+DECLARE_STATIC_PRINTKRB_CPULOCK(rb_cpulock);
+DECLARE_STATIC_PRINTKRB(rb, 15, &rb_cpulock);
+....
+
+If writers will be using multiple ring buffers and the ordering of that usage
+is not clear, the same prb_cpulock should be used for both ring buffers.
+
+Writer API
+^^^^^^^^^^
+The writer API consists of 2 functions. The first is to reserve an entry in
+the ring buffer, the second is to commit that data to the ring buffer. The
+reserved entry information is stored within a provided `struct prb_handle`.
+
+ /* reserve an entry */
+ char *prb_reserve(struct prb_handle *h, struct printk_ringbuffer *rb,
+ unsigned int size);
+
+ /* commit a reserved entry to the ring buffer */
+ void prb_commit(struct prb_handle *h);
+
+Here is an example of a function to write data to a ring buffer:
+
+....
+int write_data(struct printk_ringbuffer *rb, char *data, int size)
+{
+ struct prb_handle h;
+ char *buf;
+
+ buf = prb_reserve(&h, rb, size);
+ if (!buf)
+ return -1;
+ memcpy(buf, data, size);
+ prb_commit(&h);
+
+ return 0;
+}
+....
+
+Pitfalls
+++++++++
+Be aware that prb_reserve() can fail. A retry might be successful, but it
+depends entirely on whether or not the next part of the ring buffer to
+overwrite belongs to reserved but not yet committed entries of other writers.
+Writers can use the prb_inc_lost() function to allow readers to notice that a
+message was lost.
+
+Reader API
+^^^^^^^^^^
+The reader API utilizes a `struct prb_iterator` to track the reader's
+position in the ring buffer.
+
+ /* declare a pre-initialized static iterator for a ring buffer */
+ #define DECLARE_STATIC_PRINTKRB_ITER(name, rbaddr)
+
+ /* initialize iterator for a ring buffer (if static macro NOT used) */
+ void prb_iter_init(struct prb_iterator *iter,
+ struct printk_ringbuffer *rb, u64 *seq);
+
+ /* make a deep copy of an iterator */
+ void prb_iter_copy(struct prb_iterator *dest,
+ struct prb_iterator *src);
+
+ /* non-blocking, advance to next entry (and read the data) */
+ int prb_iter_next(struct prb_iterator *iter, char *buf,
+ int size, u64 *seq);
+
+ /* blocking, advance to next entry (and read the data) */
+ int prb_iter_wait_next(struct prb_iterator *iter, char *buf,
+ int size, u64 *seq);
+
+ /* position iterator at the entry seq */
+ int prb_iter_seek(struct prb_iterator *iter, u64 seq);
+
+ /* read data at current position */
+ int prb_iter_data(struct prb_iterator *iter, char *buf,
+ int size, u64 *seq);
+
+Typically prb_iter_data() is not needed because the data can be retrieved
+directly with prb_iter_next().
+
+Here is an example of a non-blocking function that will read all the data in
+a ring buffer:
+
+....
+void read_all_data(struct printk_ringbuffer *rb, char *buf, int size)
+{
+ struct prb_iterator iter;
+ u64 prev_seq = 0;
+ u64 seq;
+ int ret;
+
+ prb_iter_init(&iter, rb, NULL);
+
+ for (;;) {
+ ret = prb_iter_next(&iter, buf, size, &seq);
+ if (ret > 0) {
+ if (seq != ++prev_seq) {
+ /* "seq - prev_seq" entries missed */
+ prev_seq = seq;
+ }
+ /* process buf here */
+ } else if (ret == 0) {
+ /* hit the end, done */
+ break;
+ } else if (ret < 0) {
+ /*
+ * iterator is invalid, a writer overtook us, reset the
+ * iterator and keep going, entries were missed
+ */
+ prb_iter_init(&iter, rb, NULL);
+ }
+ }
+}
+....
+
+Pitfalls
+++++++++
+The reader's iterator can become invalid at any time because the reader was
+overtaken by a writer. Typically the reader should reset the iterator back
+to the current oldest entry (which will be newer than the entry the reader
+was at) and continue, noting the number of entries that were missed.
+
+Utility API
+^^^^^^^^^^^
+Several functions are available as convenience for external code.
+
+ /* query the size of the data buffer */
+ int prb_buffer_size(struct printk_ringbuffer *rb);
+
+ /* skip a seq number to signify a lost record */
+ void prb_inc_lost(struct printk_ringbuffer *rb);
+
+ /* processor-reentrant spin lock */
+ void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store);
+
+ /* processor-reentrant spin unlock */
+ void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store);
+
+Pitfalls
+++++++++
+Although the value returned by prb_buffer_size() does represent an absolute
+upper bound, the amount of data that can be stored within the ring buffer
+is actually less because of the additional storage space of a header for each
+entry.
+
+The prb_lock() and prb_unlock() functions can be used to synchronize between
+ring buffer writers and other external activities. The function of a
+processor-reentrant spin lock is to disable preemption and local interrupts
+and synchronize against other processors. It does *not* protect against
+multiple contexts of a single processor, i.e NMI.
+
+Implementation
+~~~~~~~~~~~~~~
+This section describes several of the implementation concepts and details to
+help developers better understand the code.
+
+Entries
+^^^^^^^
+All ring buffer data is stored within a single static byte array. The reason
+for this is to ensure that any pointers to the data (past and present) will
+always point to valid memory. This is important because the lockless readers
+may be preempted for long periods of time and when they resume may be working
+with expired pointers.
+
+Entries are identified by start index and size. (The start index plus size
+is the start index of the next entry.) The start index is not simply an
+offset into the byte array, but rather a logical position (lpos) that maps
+directly to byte array offsets.
+
+For example, for a byte array of 1000, an entry may have have a start index
+of 100. Another entry may have a start index of 1100. And yet another 2100.
+All of these entry are pointing to the same memory region, but only the most
+recent entry is valid. The other entries are pointing to valid memory, but
+represent entries that have been overwritten.
+
+Note that due to overflowing, the most recent entry is not necessarily the one
+with the highest lpos value. Indeed, the printk ring buffer initializes its
+data such that an overflow happens relatively quickly in order to validate the
+handling of this situation. The implementation assumes that an lpos (unsigned
+long) will never completely wrap while a reader is preempted. If this were to
+become an issue, the seq number (which never wraps) could be used to increase
+the robustness of handling this situation.
+
+Buffer Wrapping
+^^^^^^^^^^^^^^^
+If an entry starts near the end of the byte array but would extend beyond it,
+a special terminating entry (size = -1) is inserted into the byte array and
+the real entry is placed at the beginning of the byte array. This can waste
+space at the end of the byte array, but simplifies the implementation by
+allowing writers to always work with contiguous buffers.
+
+Note that the size field is the first 4 bytes of the entry header. Also note
+that calc_next() always ensures that there are at least 4 bytes left at the
+end of the byte array to allow room for a terminating entry.
+
+Ring Buffer Pointers
+^^^^^^^^^^^^^^^^^^^^
+Three pointers (lpos values) are used to manage the ring buffer:
+
+ - _tail_: points to the oldest entry
+ - _head_: points to where the next new committed entry will be
+ - _reserve_: points to where the next new reserved entry will be
+
+These pointers always maintain a logical ordering:
+
+ tail <= head <= reserve
+
+The reserve pointer moves forward when a writer reserves a new entry. The
+head pointer moves forward when a writer commits a new entry.
+
+The reserve pointer cannot overwrite the tail pointer in a wrap situation. In
+such a situation, the tail pointer must be "pushed forward", thus
+invalidating that oldest entry. Readers identify if they are accessing a
+valid entry by ensuring their entry pointer is `>= tail && < head`.
+
+If the tail pointer is equal to the head pointer, it cannot be pushed and any
+reserve operation will fail. The only resolution is for writers to commit
+their reserved entries.
+
+Processor-Reentrant Locking
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The purpose of the processor-reentrant locking is to limit the interruption
+scenarios of writers to 2 contexts. This allows for a simplified
+implementation where:
+
+- The reserve/commit window only exists on 1 processor at a time. A reserve
+ can never fail due to uncommitted entries of other processors.
+
+- When committing entries, it is trivial to handle the situation when
+ subsequent entries have already been committed, i.e. managing the head
+ pointer.
+
+Performance
+~~~~~~~~~~~
+Some basic tests were performed on a quad Intel(R) Xeon(R) CPU E5-2697 v4 at
+2.30GHz (36 cores / 72 threads). All tests involved writing a total of
+32,000,000 records at an average of 33 bytes each. Each writer was pinned to
+its own CPU and would write as fast as it could until a total of 32,000,000
+records were written. All tests involved 2 readers that were both pinned
+together to another CPU. Each reader would read as fast as it could and track
+how many of the 32,000,000 records it could read. All tests used a ring buffer
+of 16KB in size, which holds around 350 records (header + data for each
+entry).
+
+The only difference between the tests is the number of writers (and thus also
+the number of records per writer). As more writers are added, the time to
+write a record increases. This is because data pointers, modified via cmpxchg,
+and global data access in general become more contended.
+
+1 writer
+^^^^^^^^
+ runtime: 0m 18s
+ reader1: 16219900/32000000 (50%) records
+ reader2: 16141582/32000000 (50%) records
+
+2 writers
+^^^^^^^^^
+ runtime: 0m 32s
+ reader1: 16327957/32000000 (51%) records
+ reader2: 16313988/32000000 (50%) records
+
+4 writers
+^^^^^^^^^
+ runtime: 0m 42s
+ reader1: 16421642/32000000 (51%) records
+ reader2: 16417224/32000000 (51%) records
+
+8 writers
+^^^^^^^^^
+ runtime: 0m 43s
+ reader1: 16418300/32000000 (51%) records
+ reader2: 16432222/32000000 (51%) records
+
+16 writers
+^^^^^^^^^^
+ runtime: 0m 54s
+ reader1: 16539189/32000000 (51%) records
+ reader2: 16542711/32000000 (51%) records
+
+32 writers
+^^^^^^^^^^
+ runtime: 1m 13s
+ reader1: 16731808/32000000 (52%) records
+ reader2: 16735119/32000000 (52%) records
+
+Comments
+^^^^^^^^
+It is particularly interesting to compare/contrast the 1-writer and 32-writer
+tests. Despite the writing of the 32,000,000 records taking over 4 times
+longer, the readers (which perform no cmpxchg) were still unable to keep up.
+This shows that the memory contention between the increasing number of CPUs
+also has a dramatic effect on readers.
+
+It should also be noted that in all cases each reader was able to read >=50%
+of the records. This means that a single reader would have been able to keep
+up with the writer(s) in all cases, becoming slightly easier as more writers
+are added. This was the purpose of pinning 2 readers to 1 CPU: to observe how
+maximum reader performance changes.
diff --git a/Documentation/trace/ftrace-uses.rst b/Documentation/trace/ftrace-uses.rst
index 1fbc69894eed..1e0020b0bc74 100644
--- a/Documentation/trace/ftrace-uses.rst
+++ b/Documentation/trace/ftrace-uses.rst
@@ -146,7 +146,7 @@ FTRACE_OPS_FL_RECURSION_SAFE
itself or any nested functions that those functions call.
If this flag is set, it is possible that the callback will also
- be called with preemption enabled (when CONFIG_PREEMPT is set),
+ be called with preemption enabled (when CONFIG_PREEMPTION is set),
but this is not guaranteed.
FTRACE_OPS_FL_IPMODIFY
diff --git a/arch/Kconfig b/arch/Kconfig
index 238dccfa7691..a886cbe86efc 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -31,6 +31,7 @@ config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
depends on HAVE_OPROFILE
+ depends on !PREEMPT_RT
select RING_BUFFER
select RING_BUFFER_ALLOW_SWAP
help
diff --git a/arch/alpha/include/asm/spinlock_types.h b/arch/alpha/include/asm/spinlock_types.h
index 1d5716bc060b..6883bc952d22 100644
--- a/arch/alpha/include/asm/spinlock_types.h
+++ b/arch/alpha/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef _ALPHA_SPINLOCK_TYPES_H
#define _ALPHA_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
typedef struct {
volatile unsigned int lock;
} arch_spinlock_t;
diff --git a/arch/arc/kernel/entry.S b/arch/arc/kernel/entry.S
index ea74a1eee5d9..29a45c2f0b17 100644
--- a/arch/arc/kernel/entry.S
+++ b/arch/arc/kernel/entry.S
@@ -331,11 +331,11 @@ resume_user_mode_begin:
resume_kernel_mode:
; Disable Interrupts from this point on
- ; CONFIG_PREEMPT: This is a must for preempt_schedule_irq()
- ; !CONFIG_PREEMPT: To ensure restore_regs is intr safe
+ ; CONFIG_PREEMPTION: This is a must for preempt_schedule_irq()
+ ; !CONFIG_PREEMPTION: To ensure restore_regs is intr safe
IRQ_DISABLE r9
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
; Can't preempt if preemption disabled
GET_CURR_THR_INFO_FROM_SP r10
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 05c9bbfe444d..d3f02889760b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -32,6 +32,7 @@ config ARM
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_RT
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
@@ -64,7 +65,7 @@ config ARM
select HARDIRQS_SW_RESEND
select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
- select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
+ select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU && !PREEMPT_RT
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
@@ -103,6 +104,7 @@ config ARM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+ select HAVE_PREEMPT_LAZY
select HAVE_RCU_TABLE_FREE if SMP && ARM_LPAE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RSEQ
diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 46d41140df27..c421b5b81946 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -23,6 +23,8 @@
#endif
#ifndef __ASSEMBLY__
+#include
+
struct irqaction;
struct pt_regs;
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index 5976958647fe..a37c0803954b 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef __ASM_SPINLOCK_TYPES_H
#define __ASM_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
#define TICKET_SHIFT 16
typedef struct {
diff --git a/arch/arm/include/asm/switch_to.h b/arch/arm/include/asm/switch_to.h
index d3e937dcee4d..285e6248454f 100644
--- a/arch/arm/include/asm/switch_to.h
+++ b/arch/arm/include/asm/switch_to.h
@@ -4,13 +4,20 @@
#include
+#if defined CONFIG_PREEMPT_RT && defined CONFIG_HIGHMEM
+void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p);
+#else
+static inline void
+switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) { }
+#endif
+
/*
* For v7 SMP cores running a preemptible kernel we may be pre-empted
* during a TLB maintenance operation, so execute an inner-shareable dsb
* to ensure that the maintenance completes in case we migrate to another
* CPU.
*/
-#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7)
+#if defined(CONFIG_PREEMPTION) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7)
#define __complete_pending_tlbi() dsb(ish)
#else
#define __complete_pending_tlbi()
@@ -26,6 +33,7 @@ extern struct task_struct *__switch_to(struct task_struct *, struct thread_info
#define switch_to(prev,next,last) \
do { \
__complete_pending_tlbi(); \
+ switch_kmaps(prev, next); \
last = __switch_to(prev,task_thread_info(prev), task_thread_info(next)); \
} while (0)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 0d0d5178e2c3..13be9c2137e2 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -46,6 +46,7 @@ struct cpu_context_save {
struct thread_info {
unsigned long flags; /* low level flags */
int preempt_count; /* 0 => preemptable, <0 => bug */
+ int preempt_lazy_count; /* 0 => preemptable, <0 => bug */
mm_segment_t addr_limit; /* address limit */
struct task_struct *task; /* main task structure */
__u32 cpu; /* cpu */
@@ -139,7 +140,8 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
#define TIF_SYSCALL_TRACE 4 /* syscall trace active */
#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */
#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */
-#define TIF_SECCOMP 7 /* seccomp syscall filtering active */
+#define TIF_SECCOMP 8 /* seccomp syscall filtering active */
+#define TIF_NEED_RESCHED_LAZY 7
#define TIF_NOHZ 12 /* in adaptive nohz mode */
#define TIF_USING_IWMMXT 17
@@ -149,6 +151,7 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
@@ -164,7 +167,8 @@ extern int vfp_restore_user_hwstate(struct user_vfp *,
* Change these and you break ASM code in entry-common.S
*/
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
- _TIF_NOTIFY_RESUME | _TIF_UPROBE)
+ _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
+ _TIF_NEED_RESCHED_LAZY)
#endif /* __KERNEL__ */
#endif /* __ASM_ARM_THREAD_INFO_H */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index c773b829ee8e..f3a0e1cd1f04 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -53,6 +53,7 @@ int main(void)
BLANK();
DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
+ DEFINE(TI_PREEMPT_LAZY, offsetof(struct thread_info, preempt_lazy_count));
DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit));
DEFINE(TI_TASK, offsetof(struct thread_info, task));
DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index a874b753397e..1e689a727cb9 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -204,13 +204,20 @@ __irq_svc:
svc_entry
irq_handler
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
ldr r8, [tsk, #TI_PREEMPT] @ get preempt count
- ldr r0, [tsk, #TI_FLAGS] @ get flags
teq r8, #0 @ if preempt count != 0
+ bne 1f @ return from exeption
+ ldr r0, [tsk, #TI_FLAGS] @ get flags
+ tst r0, #_TIF_NEED_RESCHED @ if NEED_RESCHED is set
+ blne svc_preempt @ preempt!
+
+ ldr r8, [tsk, #TI_PREEMPT_LAZY] @ get preempt lazy count
+ teq r8, #0 @ if preempt lazy count != 0
movne r0, #0 @ force flags to 0
- tst r0, #_TIF_NEED_RESCHED
+ tst r0, #_TIF_NEED_RESCHED_LAZY
blne svc_preempt
+1:
#endif
svc_exit r5, irq = 1 @ return from exception
@@ -219,14 +226,20 @@ ENDPROC(__irq_svc)
.ltorg
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
svc_preempt:
mov r8, lr
1: bl preempt_schedule_irq @ irq en/disable is done inside
ldr r0, [tsk, #TI_FLAGS] @ get new tasks TI_FLAGS
tst r0, #_TIF_NEED_RESCHED
+ bne 1b
+ tst r0, #_TIF_NEED_RESCHED_LAZY
reteq r8 @ go again
- b 1b
+ ldr r0, [tsk, #TI_PREEMPT_LAZY] @ get preempt lazy count
+ teq r0, #0 @ if preempt lazy count != 0
+ beq 1b
+ ret r8 @ go again
+
#endif
__und_fault:
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 271cb8a1eba1..fd039b1b3731 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -53,7 +53,9 @@ __ret_fast_syscall:
cmp r2, #TASK_SIZE
blne addr_limit_check_failed
ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
- tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
+ tst r1, #((_TIF_SYSCALL_WORK | _TIF_WORK_MASK) & ~_TIF_SECCOMP)
+ bne fast_work_pending
+ tst r1, #_TIF_SECCOMP
bne fast_work_pending
@@ -90,8 +92,11 @@ __ret_fast_syscall:
cmp r2, #TASK_SIZE
blne addr_limit_check_failed
ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
- tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
+ tst r1, #((_TIF_SYSCALL_WORK | _TIF_WORK_MASK) & ~_TIF_SECCOMP)
+ bne do_slower_path
+ tst r1, #_TIF_SECCOMP
beq no_work_pending
+do_slower_path:
UNWIND(.fnend )
ENDPROC(ret_fast_syscall)
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index ab2568996ddb..50d43ce823bf 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -649,7 +649,8 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
*/
trace_hardirqs_off();
do {
- if (likely(thread_flags & _TIF_NEED_RESCHED)) {
+ if (likely(thread_flags & (_TIF_NEED_RESCHED |
+ _TIF_NEED_RESCHED_LAZY))) {
schedule();
} else {
if (unlikely(!user_mode(regs)))
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 46e1be9e57a8..581caf6493a9 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -682,11 +682,9 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
break;
case IPI_CPU_BACKTRACE:
- printk_nmi_enter();
irq_enter();
nmi_cpu_backtrace(regs);
irq_exit();
- printk_nmi_exit();
break;
default:
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 97a512551b21..1e70e7227f0f 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -250,6 +250,8 @@ void show_stack(struct task_struct *tsk, unsigned long *sp)
#ifdef CONFIG_PREEMPT
#define S_PREEMPT " PREEMPT"
+#elif defined(CONFIG_PREEMPT_RT)
+#define S_PREEMPT " PREEMPT_RT"
#else
#define S_PREEMPT ""
#endif
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 0ee8fc4b4672..dc8f152f3556 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -135,13 +135,13 @@ flush_levels:
and r1, r1, #7 @ mask of the bits for current cache only
cmp r1, #2 @ see what cache we have at this level
blt skip @ skip if no cache, or just i-cache
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
save_and_disable_irqs_notrace r9 @ make cssr&csidr read atomic
#endif
mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr
isb @ isb to sych the new cssr&csidr
mrc p15, 1, r1, c0, c0, 0 @ read the new csidr
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
restore_irqs_notrace r9
#endif
and r2, r1, #7 @ extract the length of the cache lines
diff --git a/arch/arm/mm/cache-v7m.S b/arch/arm/mm/cache-v7m.S
index a0035c426ce6..1bc3a0a50753 100644
--- a/arch/arm/mm/cache-v7m.S
+++ b/arch/arm/mm/cache-v7m.S
@@ -183,13 +183,13 @@ flush_levels:
and r1, r1, #7 @ mask of the bits for current cache only
cmp r1, #2 @ see what cache we have at this level
blt skip @ skip if no cache, or just i-cache
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
save_and_disable_irqs_notrace r9 @ make cssr&csidr read atomic
#endif
write_csselr r10, r1 @ set current cache level
isb @ isb to sych the new cssr&csidr
read_ccsidr r1 @ read the new csidr
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
restore_irqs_notrace r9
#endif
and r2, r1, #7 @ extract the length of the cache lines
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index bd0f4821f7e1..865fd8fbfe59 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -414,6 +414,9 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
if (addr < TASK_SIZE)
return do_page_fault(addr, fsr, regs);
+ if (interrupts_enabled(regs))
+ local_irq_enable();
+
if (user_mode(regs))
goto bad_area;
@@ -481,6 +484,9 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
static int
do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
{
+ if (interrupts_enabled(regs))
+ local_irq_enable();
+
do_bad_area(addr, fsr, regs);
return 0;
}
diff --git a/arch/arm/mm/highmem.c b/arch/arm/mm/highmem.c
index a76f8ace9ce6..eb4a361d47d7 100644
--- a/arch/arm/mm/highmem.c
+++ b/arch/arm/mm/highmem.c
@@ -31,6 +31,11 @@ static inline pte_t get_fixmap_pte(unsigned long vaddr)
return *ptep;
}
+static unsigned int fixmap_idx(int type)
+{
+ return FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
+}
+
void *kmap(struct page *page)
{
might_sleep();
@@ -51,12 +56,13 @@ EXPORT_SYMBOL(kunmap);
void *kmap_atomic(struct page *page)
{
+ pte_t pte = mk_pte(page, kmap_prot);
unsigned int idx;
unsigned long vaddr;
void *kmap;
int type;
- preempt_disable();
+ preempt_disable_nort();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -76,7 +82,7 @@ void *kmap_atomic(struct page *page)
type = kmap_atomic_idx_push();
- idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
+ idx = fixmap_idx(type);
vaddr = __fix_to_virt(idx);
#ifdef CONFIG_DEBUG_HIGHMEM
/*
@@ -90,7 +96,10 @@ void *kmap_atomic(struct page *page)
* in place, so the contained TLB flush ensures the TLB is updated
* with the new mapping.
*/
- set_fixmap_pte(idx, mk_pte(page, kmap_prot));
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = pte;
+#endif
+ set_fixmap_pte(idx, pte);
return (void *)vaddr;
}
@@ -103,44 +112,75 @@ void __kunmap_atomic(void *kvaddr)
if (kvaddr >= (void *)FIXADDR_START) {
type = kmap_atomic_idx();
- idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
+ idx = fixmap_idx(type);
if (cache_is_vivt())
__cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE);
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = __pte(0);
+#endif
#ifdef CONFIG_DEBUG_HIGHMEM
BUG_ON(vaddr != __fix_to_virt(idx));
- set_fixmap_pte(idx, __pte(0));
#else
(void) idx; /* to kill a warning */
#endif
+ set_fixmap_pte(idx, __pte(0));
kmap_atomic_idx_pop();
} else if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
/* this address was obtained through kmap_high_get() */
kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
}
pagefault_enable();
- preempt_enable();
+ preempt_enable_nort();
}
EXPORT_SYMBOL(__kunmap_atomic);
void *kmap_atomic_pfn(unsigned long pfn)
{
+ pte_t pte = pfn_pte(pfn, kmap_prot);
unsigned long vaddr;
int idx, type;
struct page *page = pfn_to_page(pfn);
- preempt_disable();
+ preempt_disable_nort();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
type = kmap_atomic_idx_push();
- idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
+ idx = fixmap_idx(type);
vaddr = __fix_to_virt(idx);
#ifdef CONFIG_DEBUG_HIGHMEM
BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
#endif
- set_fixmap_pte(idx, pfn_pte(pfn, kmap_prot));
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = pte;
+#endif
+ set_fixmap_pte(idx, pte);
return (void *)vaddr;
}
+#if defined CONFIG_PREEMPT_RT
+void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p)
+{
+ int i;
+
+ /*
+ * Clear @prev's kmap_atomic mappings
+ */
+ for (i = 0; i < prev_p->kmap_idx; i++) {
+ int idx = fixmap_idx(i);
+
+ set_fixmap_pte(idx, __pte(0));
+ }
+ /*
+ * Restore @next_p's kmap_atomic mappings
+ */
+ for (i = 0; i < next_p->kmap_idx; i++) {
+ int idx = fixmap_idx(i);
+
+ if (!pte_none(next_p->kmap_pte[i]))
+ set_fixmap_pte(idx, next_p->kmap_pte[i]);
+ }
+}
+#endif
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a0bc9bbb92f3..0ba73283452c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -35,32 +35,32 @@ config ARM64
select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_NMI_SAFE_CMPXCHG
- select ARCH_INLINE_READ_LOCK if !PREEMPT
- select ARCH_INLINE_READ_LOCK_BH if !PREEMPT
- select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT
- select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPT
- select ARCH_INLINE_READ_UNLOCK if !PREEMPT
- select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPT
- select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPT
- select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPT
- select ARCH_INLINE_WRITE_LOCK if !PREEMPT
- select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPT
- select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPT
- select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPT
- select ARCH_INLINE_WRITE_UNLOCK if !PREEMPT
- select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPT
- select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPT
- select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
- select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPT
- select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPT
- select ARCH_INLINE_SPIN_LOCK if !PREEMPT
- select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPT
- select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPT
- select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPT
- select ARCH_INLINE_SPIN_UNLOCK if !PREEMPT
- select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPT
- select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPT
- select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPT
+ select ARCH_INLINE_READ_LOCK if !PREEMPTION
+ select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
+ select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
+ select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
+ select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
+ select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
+ select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
+ select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
+ select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
+ select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
+ select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
+ select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
+ select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
+ select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
+ select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
+ select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
+ select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
+ select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
+ select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
+ select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
+ select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
+ select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
+ select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
+ select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
+ select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
+ select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_KEEP_MEMBLOCK
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_QUEUED_RWLOCKS
@@ -69,6 +69,7 @@ config ARM64
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG
select ARCH_SUPPORTS_NUMA_BALANCING
+ select ARCH_SUPPORTS_RT
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
select ARCH_WANT_FRAME_POINTERS
@@ -159,6 +160,7 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+ select HAVE_PREEMPT_LAZY
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_RCU_TABLE_FREE
diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
index e273faca924f..999da59f03a9 100644
--- a/arch/arm64/crypto/sha256-glue.c
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -97,7 +97,7 @@ static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
* input when running on a preemptible kernel, but process the
* data block by block instead.
*/
- if (IS_ENABLED(CONFIG_PREEMPT) &&
+ if (IS_ENABLED(CONFIG_PREEMPTION) &&
chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE)
chunk = SHA256_BLOCK_SIZE -
sctx->count % SHA256_BLOCK_SIZE;
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index b8cf7c85ffa2..2cc0dd8bd9f7 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -699,8 +699,8 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
* where is optional, and marks the point where execution will resume
* after a yield has been performed. If omitted, execution resumes right after
* the endif_yield_neon invocation. Note that the entire sequence, including
- * the provided patchup code, will be omitted from the image if CONFIG_PREEMPT
- * is not defined.
+ * the provided patchup code, will be omitted from the image if
+ * CONFIG_PREEMPTION is not defined.
*
* As a convenience, in the case where no patchup code is required, the above
* sequence may be abbreviated to
@@ -728,7 +728,7 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
.endm
.macro if_will_cond_yield_neon
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
get_current_task x0
ldr x0, [x0, #TSK_TI_PREEMPT]
sub x0, x0, #PREEMPT_DISABLE_OFFSET
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index befe37d4bc0e..53d846f1bfe7 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -91,6 +91,7 @@ alternative_cb_end
void kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
+void kvm_compute_layout(void);
static inline unsigned long __kern_hyp_va(unsigned long v)
{
diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index d49951647014..3b19db2db1ed 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -70,20 +70,43 @@ static inline bool __preempt_count_dec_and_test(void)
* interrupt occurring between the non-atomic READ_ONCE/WRITE_ONCE
* pair.
*/
- return !pc || !READ_ONCE(ti->preempt_count);
+ if (!pc || !READ_ONCE(ti->preempt_count))
+ return true;
+#ifdef CONFIG_PREEMPT_LAZY
+ if ((pc & ~PREEMPT_NEED_RESCHED))
+ return false;
+ if (current_thread_info()->preempt_lazy_count)
+ return false;
+ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
+#else
+ return false;
+#endif
}
static inline bool should_resched(int preempt_offset)
{
+#ifdef CONFIG_PREEMPT_LAZY
+ u64 pc = READ_ONCE(current_thread_info()->preempt_count);
+ if (pc == preempt_offset)
+ return true;
+
+ if ((pc & ~PREEMPT_NEED_RESCHED) != preempt_offset)
+ return false;
+
+ if (current_thread_info()->preempt_lazy_count)
+ return false;
+ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
+#else
u64 pc = READ_ONCE(current_thread_info()->preempt_count);
return pc == preempt_offset;
+#endif
}
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
void preempt_schedule(void);
#define __preempt_schedule() preempt_schedule()
void preempt_schedule_notrace(void);
#define __preempt_schedule_notrace() preempt_schedule_notrace()
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
#endif /* __ASM_PREEMPT_H */
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 18782f0c4721..6672b05350b4 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -5,10 +5,6 @@
#ifndef __ASM_SPINLOCK_TYPES_H
#define __ASM_SPINLOCK_TYPES_H
-#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
-# error "please don't include this file directly"
-#endif
-
#include
#include
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 4e3ed702bec7..d537305c1bfe 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -29,6 +29,7 @@ struct thread_info {
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
u64 ttbr0; /* saved TTBR0_EL1 */
#endif
+ int preempt_lazy_count; /* 0 => preemptable, <0 => bug */
union {
u64 preempt_count; /* 0 => preemptible, <0 => bug */
struct {
@@ -63,6 +64,7 @@ void arch_release_task_struct(struct task_struct *tsk);
#define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */
#define TIF_UPROBE 4 /* uprobe breakpoint or singlestep */
#define TIF_FSCHECK 5 /* Check FS is USER_DS on return */
+#define TIF_NEED_RESCHED_LAZY 6
#define TIF_NOHZ 7
#define TIF_SYSCALL_TRACE 8 /* syscall trace active */
#define TIF_SYSCALL_AUDIT 9 /* syscall auditing */
@@ -83,6 +85,7 @@ void arch_release_task_struct(struct task_struct *tsk);
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE)
+#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
@@ -97,8 +100,9 @@ void arch_release_task_struct(struct task_struct *tsk);
#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
_TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
- _TIF_UPROBE | _TIF_FSCHECK)
+ _TIF_UPROBE | _TIF_FSCHECK | _TIF_NEED_RESCHED_LAZY)
+#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)
#define _TIF_SYSCALL_WORK (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
_TIF_NOHZ | _TIF_SYSCALL_EMU)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 214685760e1c..f7975b079551 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -30,6 +30,7 @@ int main(void)
BLANK();
DEFINE(TSK_TI_FLAGS, offsetof(struct task_struct, thread_info.flags));
DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count));
+ DEFINE(TSK_TI_PREEMPT_LAZY, offsetof(struct task_struct, thread_info.preempt_lazy_count));
DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit));
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index cf3bd2976e57..b25f63365080 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -669,7 +669,7 @@ el1_irq:
irq_handler
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
ldr x24, [tsk, #TSK_TI_PREEMPT] // get preempt count
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
/*
@@ -679,9 +679,18 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
mrs x0, daif
orr x24, x24, x0
alternative_else_nop_endif
- cbnz x24, 1f // preempt count != 0 || NMI return path
- bl arm64_preempt_schedule_irq // irq en/disable is done inside
+
+ cbz x24, 1f // (need_resched + count) == 0
+ cbnz w24, 2f // count != 0
+
+ ldr w24, [tsk, #TSK_TI_PREEMPT_LAZY] // get preempt lazy count
+ cbnz w24, 2f // preempt lazy count != 0
+
+ ldr x0, [tsk, #TSK_TI_FLAGS] // get flags
+ tbz x0, #TIF_NEED_RESCHED_LAZY, 2f // needs rescheduling?
1:
+ bl arm64_preempt_schedule_irq // irq en/disable is done inside
+2:
#endif
#ifdef CONFIG_ARM64_PSEUDO_NMI
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 04b982a2799e..0da24b606bbe 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -213,6 +213,16 @@ static void sve_free(struct task_struct *task)
__sve_free(task);
}
+static void *sve_free_atomic(struct task_struct *task)
+{
+ void *sve_state = task->thread.sve_state;
+
+ WARN_ON(test_tsk_thread_flag(task, TIF_SVE));
+
+ task->thread.sve_state = NULL;
+ return sve_state;
+}
+
/*
* TIF_SVE controls whether a task can use SVE without trapping while
* in userspace, and also the way a task's FPSIMD/SVE state is stored
@@ -1010,6 +1020,7 @@ void fpsimd_thread_switch(struct task_struct *next)
void fpsimd_flush_thread(void)
{
int vl, supported_vl;
+ void *mem = NULL;
if (!system_supports_fpsimd())
return;
@@ -1022,7 +1033,7 @@ void fpsimd_flush_thread(void)
if (system_supports_sve()) {
clear_thread_flag(TIF_SVE);
- sve_free(current);
+ mem = sve_free_atomic(current);
/*
* Reset the task vector length as required.
@@ -1056,6 +1067,7 @@ void fpsimd_flush_thread(void)
}
put_cpu_fpsimd_context();
+ kfree(mem);
}
/*
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index ddb757b2c3e5..71ad20ebc2f3 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -903,7 +903,7 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
/* Check valid user FS if needed */
addr_limit_user_check();
- if (thread_flags & _TIF_NEED_RESCHED) {
+ if (thread_flags & _TIF_NEED_RESCHED_MASK) {
/* Unmask Debug and SError for the next task */
local_daif_restore(DAIF_PROCCTX_NOIRQ);
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 102dc3e7f2e1..bac8da81af2e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -31,6 +31,7 @@
#include
#include
#include
+#include
#include
#include
@@ -39,6 +40,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -408,6 +410,8 @@ static void __init hyp_mode_check(void)
"CPU: CPUs started in inconsistent modes");
else
pr_info("CPU: All CPU(s) started at EL1\n");
+ if (IS_ENABLED(CONFIG_KVM_ARM_HOST))
+ kvm_compute_layout();
}
void __init smp_cpus_done(unsigned int max_cpus)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 4e3e9d9c8151..05b68961a110 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -143,9 +143,12 @@ void show_stack(struct task_struct *tsk, unsigned long *sp)
#ifdef CONFIG_PREEMPT
#define S_PREEMPT " PREEMPT"
+#elif defined(CONFIG_PREEMPT_RT)
+#define S_PREEMPT " PREEMPT_RT"
#else
#define S_PREEMPT ""
#endif
+
#define S_SMP " SMP"
static int __die(const char *str, int err, struct pt_regs *regs)
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index 2cf7d4b606c3..dab1fea4752a 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -22,7 +22,7 @@ static u8 tag_lsb;
static u64 tag_val;
static u64 va_mask;
-static void compute_layout(void)
+__init void kvm_compute_layout(void)
{
phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
u64 hyp_va_msb;
@@ -110,9 +110,6 @@ void __init kvm_update_va_mask(struct alt_instr *alt,
BUG_ON(nr_inst != 5);
- if (!has_vhe() && !va_mask)
- compute_layout();
-
for (i = 0; i < nr_inst; i++) {
u32 rd, rn, insn, oinsn;
@@ -156,9 +153,6 @@ void kvm_patch_vector_branch(struct alt_instr *alt,
return;
}
- if (!va_mask)
- compute_layout();
-
/*
* Compute HYP VA by using the same computation as kern_hyp_va()
*/
diff --git a/arch/c6x/kernel/entry.S b/arch/c6x/kernel/entry.S
index 4332a10aec6c..fb154d19625b 100644
--- a/arch/c6x/kernel/entry.S
+++ b/arch/c6x/kernel/entry.S
@@ -18,7 +18,7 @@
#define DP B14
#define SP B15
-#ifndef CONFIG_PREEMPT
+#ifndef CONFIG_PREEMPTION
#define resume_kernel restore_all
#endif
@@ -287,7 +287,7 @@ work_notifysig:
;; is a little bit different
;;
ENTRY(ret_from_exception)
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
MASK_INT B2
#endif
@@ -557,7 +557,7 @@ ENDPROC(_nmi_handler)
;;
;; Jump to schedule() then return to ret_from_isr
;;
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
resume_kernel:
GET_THREAD_INFO A12
LDW .D1T1 *+A12(THREAD_INFO_PREEMPT_COUNT),A1
@@ -582,7 +582,7 @@ preempt_schedule:
B .S2 preempt_schedule_irq
#endif
ADDKPC .S2 preempt_schedule,B3,4
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
ENTRY(enable_exception)
DINT
diff --git a/arch/csky/kernel/entry.S b/arch/csky/kernel/entry.S
index 4349528fbf38..ff908d28f0a0 100644
--- a/arch/csky/kernel/entry.S
+++ b/arch/csky/kernel/entry.S
@@ -279,7 +279,7 @@ ENTRY(csky_irq)
zero_fp
psrset ee
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
mov r9, sp /* Get current stack pointer */
bmaski r10, THREAD_SHIFT
andn r9, r10 /* Get thread_info */
@@ -296,7 +296,7 @@ ENTRY(csky_irq)
mov a0, sp
jbsr csky_do_IRQ
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
subi r12, 1
stw r12, (r9, TINFO_PREEMPT)
cmpnei r12, 0
diff --git a/arch/h8300/kernel/entry.S b/arch/h8300/kernel/entry.S
index 4ade5f8299ba..c6e289b5f1f2 100644
--- a/arch/h8300/kernel/entry.S
+++ b/arch/h8300/kernel/entry.S
@@ -284,12 +284,12 @@ badsys:
mov.l er0,@(LER0:16,sp)
bra resume_userspace
-#if !defined(CONFIG_PREEMPT)
+#if !defined(CONFIG_PREEMPTION)
#define resume_kernel restore_all
#endif
ret_from_exception:
-#if defined(CONFIG_PREEMPT)
+#if defined(CONFIG_PREEMPTION)
orc #0xc0,ccr
#endif
ret_from_interrupt:
@@ -319,7 +319,7 @@ work_resched:
restore_all:
RESTORE_ALL /* Does RTE */
-#if defined(CONFIG_PREEMPT)
+#if defined(CONFIG_PREEMPTION)
resume_kernel:
mov.l @(TI_PRE_COUNT:16,er4),er0
bne restore_all:8
diff --git a/arch/hexagon/include/asm/spinlock_types.h b/arch/hexagon/include/asm/spinlock_types.h
index 19d233497ba5..de72fb23016d 100644
--- a/arch/hexagon/include/asm/spinlock_types.h
+++ b/arch/hexagon/include/asm/spinlock_types.h
@@ -8,10 +8,6 @@
#ifndef _ASM_SPINLOCK_TYPES_H
#define _ASM_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
typedef struct {
volatile unsigned int lock;
} arch_spinlock_t;
diff --git a/arch/hexagon/kernel/vm_entry.S b/arch/hexagon/kernel/vm_entry.S
index 4023fdbea490..554371d92bed 100644
--- a/arch/hexagon/kernel/vm_entry.S
+++ b/arch/hexagon/kernel/vm_entry.S
@@ -265,12 +265,12 @@ event_dispatch:
* should be in the designated register (usually R19)
*
* If we were in kernel mode, we don't need to check scheduler
- * or signals if CONFIG_PREEMPT is not set. If set, then it has
+ * or signals if CONFIG_PREEMPTION is not set. If set, then it has
* to jump to a need_resched kind of block.
- * BTW, CONFIG_PREEMPT is not supported yet.
+ * BTW, CONFIG_PREEMPTION is not supported yet.
*/
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
R0 = #VM_INT_DISABLE
trap1(#HVM_TRAP1_VMSETIE)
#endif
diff --git a/arch/ia64/include/asm/spinlock_types.h b/arch/ia64/include/asm/spinlock_types.h
index 6e345fefcdca..681408d6816f 100644
--- a/arch/ia64/include/asm/spinlock_types.h
+++ b/arch/ia64/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef _ASM_IA64_SPINLOCK_TYPES_H
#define _ASM_IA64_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
typedef struct {
volatile unsigned int lock;
} arch_spinlock_t;
diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index a9992be5718b..2ac926331500 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -670,12 +670,12 @@ GLOBAL_ENTRY(ia64_leave_syscall)
*
* p6 controls whether current_thread_info()->flags needs to be check for
* extra work. We always check for extra work when returning to user-level.
- * With CONFIG_PREEMPT, we also check for extra work when the preempt_count
+ * With CONFIG_PREEMPTION, we also check for extra work when the preempt_count
* is 0. After extra work processing has been completed, execution
* resumes at ia64_work_processed_syscall with p6 set to 1 if the extra-work-check
* needs to be redone.
*/
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
RSM_PSR_I(p0, r2, r18) // disable interrupts
cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
(pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
@@ -685,7 +685,7 @@ GLOBAL_ENTRY(ia64_leave_syscall)
(pUStk) mov r21=0 // r21 <- 0
;;
cmp.eq p6,p0=r21,r0 // p6 <- pUStk || (preempt_count == 0)
-#else /* !CONFIG_PREEMPT */
+#else /* !CONFIG_PREEMPTION */
RSM_PSR_I(pUStk, r2, r18)
cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
(pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk
@@ -814,12 +814,12 @@ GLOBAL_ENTRY(ia64_leave_kernel)
*
* p6 controls whether current_thread_info()->flags needs to be check for
* extra work. We always check for extra work when returning to user-level.
- * With CONFIG_PREEMPT, we also check for extra work when the preempt_count
+ * With CONFIG_PREEMPTION, we also check for extra work when the preempt_count
* is 0. After extra work processing has been completed, execution
* resumes at .work_processed_syscall with p6 set to 1 if the extra-work-check
* needs to be redone.
*/
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
RSM_PSR_I(p0, r17, r31) // disable interrupts
cmp.eq p0,pLvSys=r0,r0 // pLvSys=0: leave from kernel
(pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
@@ -1120,7 +1120,7 @@ skip_rbs_switch:
/*
* On entry:
- * r20 = ¤t->thread_info->pre_count (if CONFIG_PREEMPT)
+ * r20 = ¤t->thread_info->pre_count (if CONFIG_PREEMPTION)
* r31 = current->thread_info->flags
* On exit:
* p6 = TRUE if work-pending-check needs to be redone
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index b8356edbde65..a6d6a0556f08 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -841,7 +841,7 @@ static int __kprobes pre_kprobes_handler(struct die_args *args)
return 1;
}
-#if !defined(CONFIG_PREEMPT)
+#if !defined(CONFIG_PREEMPTION)
if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) {
/* Boost up -- we can execute copied instructions directly */
ia64_psr(regs)->ri = p->ainsn.slot;
diff --git a/arch/m68k/coldfire/entry.S b/arch/m68k/coldfire/entry.S
index 52d312d5b4d4..d43a02795a4a 100644
--- a/arch/m68k/coldfire/entry.S
+++ b/arch/m68k/coldfire/entry.S
@@ -108,7 +108,7 @@ ret_from_exception:
btst #5,%sp@(PT_OFF_SR) /* check if returning to kernel */
jeq Luser_return /* if so, skip resched, signals */
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
movel %sp,%d1 /* get thread_info pointer */
andl #-THREAD_SIZE,%d1 /* at base of kernel stack */
movel %d1,%a0
diff --git a/arch/microblaze/kernel/entry.S b/arch/microblaze/kernel/entry.S
index 4e1b567becd6..253f2b702b4c 100644
--- a/arch/microblaze/kernel/entry.S
+++ b/arch/microblaze/kernel/entry.S
@@ -728,7 +728,7 @@ no_intr_resched:
bri 6f;
/* MS: Return to kernel state. */
2:
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
lwi r11, CURRENT_TASK, TS_THREAD_INFO;
/* MS: get preempt_count from thread info */
lwi r5, r11, TI_PREEMPT_COUNT;
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6ecdc690f733..d84e3284d43f 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2587,7 +2587,7 @@ config MIPS_CRC_SUPPORT
#
config HIGHMEM
bool "High Memory Support"
- depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA
+ depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA && !PREEMPT_RT
config CPU_SUPPORTS_HIGHMEM
bool
diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index feb069cbf44e..655f40ddb6d1 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -63,7 +63,7 @@
.endm
.macro local_irq_disable reg=t0
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
lw \reg, TI_PRE_COUNT($28)
addi \reg, \reg, 1
sw \reg, TI_PRE_COUNT($28)
@@ -73,7 +73,7 @@
xori \reg, \reg, 1
mtc0 \reg, CP0_STATUS
irq_disable_hazard
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
lw \reg, TI_PRE_COUNT($28)
addi \reg, \reg, -1
sw \reg, TI_PRE_COUNT($28)
diff --git a/arch/mips/kernel/entry.S b/arch/mips/kernel/entry.S
index 5469d43b6966..4849a48afc0f 100644
--- a/arch/mips/kernel/entry.S
+++ b/arch/mips/kernel/entry.S
@@ -19,7 +19,7 @@
#include
#include
-#ifndef CONFIG_PREEMPT
+#ifndef CONFIG_PREEMPTION
#define resume_kernel restore_all
#else
#define __ret_from_irq ret_from_exception
@@ -27,7 +27,7 @@
.text
.align 5
-#ifndef CONFIG_PREEMPT
+#ifndef CONFIG_PREEMPTION
FEXPORT(ret_from_exception)
local_irq_disable # preempt stop
b __ret_from_irq
@@ -53,7 +53,7 @@ resume_userspace:
bnez t0, work_pending
j restore_all
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
resume_kernel:
local_irq_disable
lw t0, TI_PRE_COUNT($28)
diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index fbd68329737f..abbb74e46f05 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -61,7 +61,7 @@ config GENERIC_HWEIGHT
config GENERIC_LOCKBREAK
def_bool y
- depends on PREEMPT
+ depends on PREEMPTION
config TRACE_IRQFLAGS_SUPPORT
def_bool y
diff --git a/arch/nds32/kernel/ex-exit.S b/arch/nds32/kernel/ex-exit.S
index 1df02a793364..6a2966c2d8c8 100644
--- a/arch/nds32/kernel/ex-exit.S
+++ b/arch/nds32/kernel/ex-exit.S
@@ -72,7 +72,7 @@
restore_user_regs_last
.endm
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
.macro preempt_stop
.endm
#else
@@ -158,7 +158,7 @@ no_work_pending:
/*
* preemptive kernel
*/
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
resume_kernel:
gie_disable
lwi $t0, [tsk+#TSK_TI_PREEMPT]
diff --git a/arch/nios2/kernel/entry.S b/arch/nios2/kernel/entry.S
index 1e515ccd698e..3d8d1d0bcb64 100644
--- a/arch/nios2/kernel/entry.S
+++ b/arch/nios2/kernel/entry.S
@@ -365,7 +365,7 @@ ENTRY(ret_from_interrupt)
ldw r1, PT_ESTATUS(sp) /* check if returning to kernel */
TSTBNZ r1, r1, ESTATUS_EU, Luser_return
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
GET_THREAD_INFO r1
ldw r4, TI_PREEMPT_COUNT(r1)
bne r4, r0, restore_all
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 0c29d6cb2c8d..9b44f06b263e 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -82,7 +82,7 @@ config STACK_GROWSUP
config GENERIC_LOCKBREAK
bool
default y
- depends on SMP && PREEMPT
+ depends on SMP && PREEMPTION
config ARCH_HAS_ILOG2_U32
bool
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 873bf3434da9..755240ce671e 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -942,14 +942,14 @@ intr_restore:
rfi
nop
-#ifndef CONFIG_PREEMPT
+#ifndef CONFIG_PREEMPTION
# define intr_do_preempt intr_restore
-#endif /* !CONFIG_PREEMPT */
+#endif /* !CONFIG_PREEMPTION */
.import schedule,code
intr_do_resched:
/* Only call schedule on return to userspace. If we're returning
- * to kernel space, we may schedule if CONFIG_PREEMPT, otherwise
+ * to kernel space, we may schedule if CONFIG_PREEMPTION, otherwise
* we jump back to intr_restore.
*/
LDREG PT_IASQ0(%r16), %r20
@@ -981,7 +981,7 @@ intr_do_resched:
* and preempt_count is 0. otherwise, we continue on
* our merry way back to the current running task.
*/
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
.import preempt_schedule_irq,code
intr_do_preempt:
rsm PSW_SM_I, %r0 /* disable interrupts */
@@ -1001,7 +1001,7 @@ intr_do_preempt:
nop
b,n intr_restore /* ssm PSW_SM_I done by intr_restore */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
/*
* External interrupts.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ad620637cbd1..0f537df67a95 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -106,7 +106,7 @@ config LOCKDEP_SUPPORT
config GENERIC_LOCKBREAK
bool
default y
- depends on SMP && PREEMPT
+ depends on SMP && PREEMPTION
config GENERIC_HWEIGHT
bool
@@ -144,6 +144,7 @@ config PPC
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_RT
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_WANT_IPC_PARSE_VERSION
@@ -221,6 +222,7 @@ config PPC
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+ select HAVE_PREEMPT_LAZY
select HAVE_RCU_TABLE_FREE
select HAVE_MMU_GATHER_PAGE_SIZE
select HAVE_REGS_AND_STACK_ACCESS_API
@@ -398,7 +400,7 @@ menu "Kernel options"
config HIGHMEM
bool "High memory support"
- depends on PPC32
+ depends on PPC32 && !PREEMPT_RT
source "kernel/Kconfig.hz"
diff --git a/arch/powerpc/include/asm/spinlock_types.h b/arch/powerpc/include/asm/spinlock_types.h
index 87adaf13b7e8..7305cb6a53e4 100644
--- a/arch/powerpc/include/asm/spinlock_types.h
+++ b/arch/powerpc/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef _ASM_POWERPC_SPINLOCK_TYPES_H
#define _ASM_POWERPC_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
typedef struct {
volatile unsigned int slock;
} arch_spinlock_t;
diff --git a/arch/powerpc/include/asm/stackprotector.h b/arch/powerpc/include/asm/stackprotector.h
index 1c8460e23583..b1653c160bab 100644
--- a/arch/powerpc/include/asm/stackprotector.h
+++ b/arch/powerpc/include/asm/stackprotector.h
@@ -24,7 +24,11 @@ static __always_inline void boot_init_stack_canary(void)
unsigned long canary;
/* Try to get a semi random initial value. */
+#ifdef CONFIG_PREEMPT_RT
+ canary = (unsigned long)&canary;
+#else
canary = get_random_canary();
+#endif
canary ^= mftb();
canary ^= LINUX_VERSION_CODE;
canary &= CANARY_MASK;
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 8e1d0195ac36..594fdb569e54 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -30,6 +30,8 @@
struct thread_info {
int preempt_count; /* 0 => preemptable,
<0 => BUG */
+ int preempt_lazy_count; /* 0 => preemptable,
+ <0 => BUG */
unsigned long local_flags; /* private flags for thread */
#ifdef CONFIG_LIVEPATCH
unsigned long *livepatch_sp;
@@ -80,11 +82,12 @@ void arch_setup_new_exec(void);
#define TIF_SINGLESTEP 8 /* singlestepping active */
#define TIF_NOHZ 9 /* in adaptive nohz mode */
#define TIF_SECCOMP 10 /* secure computing */
-#define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */
-#define TIF_NOERROR 12 /* Force successful syscall return */
+
+#define TIF_NEED_RESCHED_LAZY 11 /* lazy rescheduling necessary */
+#define TIF_SYSCALL_TRACEPOINT 12 /* syscall tracepoint instrumentation */
+
#define TIF_NOTIFY_RESUME 13 /* callback before returning to user */
#define TIF_UPROBE 14 /* breakpointed or single-stepping */
-#define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */
#define TIF_EMULATE_STACK_STORE 16 /* Is an instruction emulation
for stack store? */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
@@ -93,6 +96,9 @@ void arch_setup_new_exec(void);
#endif
#define TIF_POLLING_NRFLAG 19 /* true if poll_idle() is polling TIF_NEED_RESCHED */
#define TIF_32BIT 20 /* 32 bit binary */
+#define TIF_RESTOREALL 21 /* Restore all regs (implies NOERROR) */
+#define TIF_NOERROR 22 /* Force successful syscall return */
+
/* as above, but as bit values */
#define _TIF_SYSCALL_TRACE (1<preempt_count */
lwz r0,TI_PREEMPT(r2)
cmpwi 0,r0,0 /* if non-zero, just restore regs and return */
bne restore_kuap
andi. r8,r8,_TIF_NEED_RESCHED
+ bne+ 1f
+ lwz r0,TI_PREEMPT_LAZY(r2)
+ cmpwi 0,r0,0 /* if non-zero, just restore regs and return */
+ bne restore_kuap
+ lwz r0,TI_FLAGS(r2)
+ andi. r0,r0,_TIF_NEED_RESCHED_LAZY
beq+ restore_kuap
+1:
lwz r3,_MSR(r1)
andi. r0,r3,MSR_EE /* interrupts off? */
beq restore_kuap /* don't schedule if so */
@@ -922,7 +931,7 @@ resume_kernel:
*/
bl trace_hardirqs_on
#endif
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
restore_kuap:
kuap_restore r1, r2, r9, r10, r0
@@ -1225,7 +1234,7 @@ global_dbcr0:
#endif /* !(CONFIG_4xx || CONFIG_BOOKE) */
do_work: /* r10 contains MSR_KERNEL here */
- andi. r0,r9,_TIF_NEED_RESCHED
+ andi. r0,r9,_TIF_NEED_RESCHED_MASK
beq do_user_signal
do_resched: /* r10 contains MSR_KERNEL here */
@@ -1246,7 +1255,7 @@ recheck:
SYNC
MTMSRD(r10) /* disable interrupts */
lwz r9,TI_FLAGS(r2)
- andi. r0,r9,_TIF_NEED_RESCHED
+ andi. r0,r9,_TIF_NEED_RESCHED_MASK
bne- do_resched
andi. r0,r9,_TIF_USER_WORK_MASK
beq restore_user
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3fd3ef352e3f..dfd96e87d7b6 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -240,7 +240,9 @@ system_call_exit:
ld r9,TI_FLAGS(r12)
li r11,-MAX_ERRNO
- andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
+ lis r0,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)@h
+ ori r0,r0,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)@l
+ and. r0,r9,r0
bne- .Lsyscall_exit_work
andi. r0,r8,MSR_FP
@@ -363,25 +365,25 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
/* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr.
If TIF_NOERROR is set, just save r3 as it is. */
- andi. r0,r9,_TIF_RESTOREALL
+ andis. r0,r9,_TIF_RESTOREALL@h
beq+ 0f
REST_NVGPRS(r1)
b 2f
0: cmpld r3,r11 /* r11 is -MAX_ERRNO */
blt+ 1f
- andi. r0,r9,_TIF_NOERROR
+ andis. r0,r9,_TIF_NOERROR@h
bne- 1f
ld r5,_CCR(r1)
neg r3,r3
oris r5,r5,0x1000 /* Set SO bit in CR */
std r5,_CCR(r1)
1: std r3,GPR3(r1)
-2: andi. r0,r9,(_TIF_PERSYSCALL_MASK)
+2: andis. r0,r9,(_TIF_PERSYSCALL_MASK)@h
beq 4f
/* Clear per-syscall TIF flags if any are set. */
- li r11,_TIF_PERSYSCALL_MASK
+ lis r11,(_TIF_PERSYSCALL_MASK)@h
addi r12,r12,TI_FLAGS
3: ldarx r10,0,r12
andc r10,r10,r11
@@ -786,7 +788,7 @@ _GLOBAL(ret_from_except_lite)
bl restore_math
b restore
#endif
-1: andi. r0,r4,_TIF_NEED_RESCHED
+1: andi. r0,r4,_TIF_NEED_RESCHED_MASK
beq 2f
bl restore_interrupts
SCHEDULE_USER
@@ -846,12 +848,20 @@ resume_kernel:
bne- 0b
1:
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* Check if we need to preempt */
+ lwz r8,TI_PREEMPT(r9)
+ cmpwi 0,r8,0 /* if non-zero, just restore regs and return */
+ bne restore
andi. r0,r4,_TIF_NEED_RESCHED
+ bne+ check_count
+
+ andi. r0,r4,_TIF_NEED_RESCHED_LAZY
beq+ restore
+ lwz r8,TI_PREEMPT_LAZY(r9)
+
/* Check that preempt_count() == 0 and interrupts are enabled */
- lwz r8,TI_PREEMPT(r9)
+check_count:
cmpwi cr0,r8,0
bne restore
ld r0,SOFTE(r1)
@@ -877,7 +887,7 @@ resume_kernel:
li r10,MSR_RI
mtmsrd r10,1 /* Update machine state */
#endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
.globl fast_exc_return_irq
fast_exc_return_irq:
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index add67498c126..201ddd5d3f15 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -679,10 +679,12 @@ void *mcheckirq_ctx[NR_CPUS] __read_mostly;
void *softirq_ctx[NR_CPUS] __read_mostly;
void *hardirq_ctx[NR_CPUS] __read_mostly;
+#ifndef CONFIG_PREEMPT_RT
void do_softirq_own_stack(void)
{
call_do_softirq(softirq_ctx[smp_processor_id()]);
}
+#endif
irq_hw_number_t virq_to_hw(unsigned int virq)
{
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index f4e4a1926a7a..36bd3e7f381e 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -37,6 +37,7 @@
* We store the saved ksp_limit in the unused part
* of the STACK_FRAME_OVERHEAD
*/
+#ifndef CONFIG_PREEMPT_RT
_GLOBAL(call_do_softirq)
mflr r0
stw r0,4(r1)
@@ -52,6 +53,7 @@ _GLOBAL(call_do_softirq)
stw r10,THREAD+KSP_LIMIT(r2)
mtlr r0
blr
+#endif
/*
* void call_do_irq(struct pt_regs *regs, void *sp);
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index ff20c253f273..bba77ded0931 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -27,6 +27,7 @@
.text
+#ifndef CONFIG_PREEMPT_RT
_GLOBAL(call_do_softirq)
mflr r0
std r0,16(r1)
@@ -37,6 +38,7 @@ _GLOBAL(call_do_softirq)
ld r0,16(r1)
mtlr r0
blr
+#endif
_GLOBAL(call_do_irq)
mflr r0
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9432fc6af28a..676f239c82b0 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -171,7 +171,6 @@ extern void panic_flush_kmsg_start(void)
extern void panic_flush_kmsg_end(void)
{
- printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
@@ -261,12 +260,17 @@ static char *get_mmu_str(void)
static int __die(const char *str, struct pt_regs *regs, long err)
{
+ const char *pr = "";
+
printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
+ if (IS_ENABLED(CONFIG_PREEMPTION))
+ pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";
+
printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s %s\n",
IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE",
PAGE_SIZE / 1024, get_mmu_str(),
- IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
+ pr,
IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "",
debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index af3c15a1d41e..8ae46c5945d0 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -181,11 +181,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
wd_smp_unlock(&flags);
- printk_safe_flush();
- /*
- * printk_safe_flush() seems to require another print
- * before anything actually goes out to console.
- */
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 711fca9bc6f0..dd3babcb3b16 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -178,6 +178,7 @@ config KVM_E500MC
config KVM_MPIC
bool "KVM in-kernel MPIC emulation"
depends on KVM && E500
+ depends on !PREEMPT_RT
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQFD
select HAVE_KVM_IRQ_ROUTING
diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c
index 2735ec90414d..359231cea8cd 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -738,8 +738,8 @@ static int ps3_notification_read_write(struct ps3_notification_device *dev,
}
pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
- res = wait_event_interruptible(dev->done.wait,
- dev->done.done || kthread_should_stop());
+ res = swait_event_interruptible_exclusive(dev->done.wait,
+ dev->done.done || kthread_should_stop());
if (kthread_should_stop())
res = -EINTR;
if (res) {
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index b4ce9d472dfe..07fafa7a98a3 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -24,6 +24,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -177,6 +178,7 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
}
static DEFINE_PER_CPU(__be64 *, tce_page);
+static DEFINE_LOCAL_IRQ_LOCK(tcp_page_lock);
static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
long npages, unsigned long uaddr,
@@ -198,8 +200,8 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
direction, attrs);
}
- local_irq_save(flags); /* to protect tcep and the page behind it */
-
+ /* to protect tcep and the page behind it */
+ local_lock_irqsave(tcp_page_lock, flags);
tcep = __this_cpu_read(tce_page);
/* This is safe to do since interrupts are off when we're called
@@ -209,7 +211,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
/* If allocation fails, fall back to the loop implementation */
if (!tcep) {
- local_irq_restore(flags);
+ local_unlock_irqrestore(tcp_page_lock, flags);
return tce_build_pSeriesLP(tbl->it_index, tcenum,
tbl->it_page_shift,
npages, uaddr, direction, attrs);
@@ -244,7 +246,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
tcenum += limit;
} while (npages > 0 && !rc);
- local_irq_restore(flags);
+ local_unlock_irqrestore(tcp_page_lock, flags);
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
ret = (int)rc;
@@ -415,13 +417,14 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
DMA_BIDIRECTIONAL, 0);
}
- local_irq_disable(); /* to protect tcep and the page behind it */
+ /* to protect tcep and the page behind it */
+ local_lock_irq(tcp_page_lock);
tcep = __this_cpu_read(tce_page);
if (!tcep) {
tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
if (!tcep) {
- local_irq_enable();
+ local_unlock_irq(tcp_page_lock);
return -ENOMEM;
}
__this_cpu_write(tce_page, tcep);
@@ -467,7 +470,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
/* error cleanup: caller will clear whole range */
- local_irq_enable();
+ local_unlock_irq(tcp_page_lock);
return rc;
}
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 8ca479831142..8f482313e560 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -155,7 +155,7 @@ _save_context:
REG_L x2, PT_SP(sp)
.endm
-#if !IS_ENABLED(CONFIG_PREEMPT)
+#if !IS_ENABLED(CONFIG_PREEMPTION)
.set resume_kernel, restore_all
#endif
@@ -269,7 +269,7 @@ restore_all:
RESTORE_ALL
sret
-#if IS_ENABLED(CONFIG_PREEMPT)
+#if IS_ENABLED(CONFIG_PREEMPTION)
resume_kernel:
REG_L s0, TASK_TI_PREEMPT_COUNT(tp)
bnez s0, restore_all
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 43a81d0ad507..33d968175038 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -30,7 +30,7 @@ config GENERIC_BUG_RELATIVE_POINTERS
def_bool y
config GENERIC_LOCKBREAK
- def_bool y if PREEMPT
+ def_bool y if PREEMPTTION
config PGSTE
def_bool y if KVM
diff --git a/arch/s390/include/asm/preempt.h b/arch/s390/include/asm/preempt.h
index b5ea9e14c017..6ede29907fbf 100644
--- a/arch/s390/include/asm/preempt.h
+++ b/arch/s390/include/asm/preempt.h
@@ -130,11 +130,11 @@ static inline bool should_resched(int preempt_offset)
#endif /* CONFIG_HAVE_MARCH_Z196_FEATURES */
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
extern asmlinkage void preempt_schedule(void);
#define __preempt_schedule() preempt_schedule()
extern asmlinkage void preempt_schedule_notrace(void);
#define __preempt_schedule_notrace() preempt_schedule_notrace()
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
#endif /* __ASM_PREEMPT_H */
diff --git a/arch/s390/include/asm/spinlock_types.h b/arch/s390/include/asm/spinlock_types.h
index cfed272e4fd5..8e28e8176ec8 100644
--- a/arch/s390/include/asm/spinlock_types.h
+++ b/arch/s390/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef __ASM_SPINLOCK_TYPES_H
#define __ASM_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
typedef struct {
int lock;
} __attribute__ ((aligned (4))) arch_spinlock_t;
diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c
index 34bdc60c0b11..8e38447be465 100644
--- a/arch/s390/kernel/dumpstack.c
+++ b/arch/s390/kernel/dumpstack.c
@@ -194,6 +194,8 @@ void die(struct pt_regs *regs, const char *str)
regs->int_code >> 17, ++die_counter);
#ifdef CONFIG_PREEMPT
pr_cont("PREEMPT ");
+#elif defined(CONFIG_PREEMPT_RT)
+ pr_cont("PREEMPT_RT ");
#endif
pr_cont("SMP ");
if (debug_pagealloc_enabled())
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index c544b7a11ebb..9584e743102b 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -800,7 +800,7 @@ ENTRY(io_int_handler)
.Lio_work:
tm __PT_PSW+1(%r11),0x01 # returning to user ?
jo .Lio_work_user # yes -> do resched & signal
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
# check for preemptive scheduling
icm %r0,15,__LC_PREEMPT_COUNT
jnz .Lio_restore # preemption is disabled
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index f356ee674d89..9ece111b0254 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -108,7 +108,7 @@ config GENERIC_CALIBRATE_DELAY
config GENERIC_LOCKBREAK
def_bool y
- depends on SMP && PREEMPT
+ depends on SMP && PREEMPTION
config ARCH_SUSPEND_POSSIBLE
def_bool n
diff --git a/arch/sh/include/asm/spinlock_types.h b/arch/sh/include/asm/spinlock_types.h
index e82369f286a2..22ca9a98bbb8 100644
--- a/arch/sh/include/asm/spinlock_types.h
+++ b/arch/sh/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef __ASM_SH_SPINLOCK_TYPES_H
#define __ASM_SH_SPINLOCK_TYPES_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
typedef struct {
volatile unsigned int lock;
} arch_spinlock_t;
diff --git a/arch/sh/kernel/cpu/sh5/entry.S b/arch/sh/kernel/cpu/sh5/entry.S
index de68ffdfffbf..81c8b64b977f 100644
--- a/arch/sh/kernel/cpu/sh5/entry.S
+++ b/arch/sh/kernel/cpu/sh5/entry.S
@@ -86,7 +86,7 @@
andi r6, ~0xf0, r6; \
putcon r6, SR;
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
# define preempt_stop() CLI()
#else
# define preempt_stop()
@@ -884,7 +884,7 @@ ret_from_exception:
/* Check softirqs */
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
pta ret_from_syscall, tr0
blink tr0, ZERO
diff --git a/arch/sh/kernel/entry-common.S b/arch/sh/kernel/entry-common.S
index 4a8ec9e40cc2..9bac5bbb67f3 100644
--- a/arch/sh/kernel/entry-common.S
+++ b/arch/sh/kernel/entry-common.S
@@ -41,7 +41,7 @@
*/
#include
-#if defined(CONFIG_PREEMPT)
+#if defined(CONFIG_PREEMPTION)
# define preempt_stop() cli ; TRACE_IRQS_OFF
#else
# define preempt_stop()
@@ -84,7 +84,7 @@ ENTRY(ret_from_irq)
get_current_thread_info r8, r0
bt resume_kernel ! Yes, it's from kernel, go back soon
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
bra resume_userspace
nop
ENTRY(resume_kernel)
diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c
index 5717c7cbdd97..c4e46252377e 100644
--- a/arch/sh/kernel/irq.c
+++ b/arch/sh/kernel/irq.c
@@ -148,6 +148,7 @@ void irq_ctx_exit(int cpu)
hardirq_ctx[cpu] = NULL;
}
+#ifndef CONFIG_PREEMPT_RT
void do_softirq_own_stack(void)
{
struct thread_info *curctx;
@@ -175,6 +176,7 @@ void do_softirq_own_stack(void)
"r5", "r6", "r7", "r8", "r9", "r15", "t", "pr"
);
}
+#endif
#else
static inline void handle_one_irq(unsigned int irq)
{
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 18e9fb6fcf1b..7b9b3a954a76 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -276,7 +276,7 @@ config US3_MC
config GENERIC_LOCKBREAK
bool
default y
- depends on SPARC64 && SMP && PREEMPT
+ depends on SPARC64 && SMP && PREEMPTION
config NUMA
bool "NUMA support"
diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index 3ec9f1402aad..eb21682abfcb 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -854,6 +854,7 @@ void __irq_entry handler_irq(int pil, struct pt_regs *regs)
set_irq_regs(old_regs);
}
+#ifndef CONFIG_PREEMPT_RT
void do_softirq_own_stack(void)
{
void *orig_sp, *sp = softirq_stack[smp_processor_id()];
@@ -868,6 +869,7 @@ void do_softirq_own_stack(void)
__asm__ __volatile__("mov %0, %%sp"
: : "r" (orig_sp));
}
+#endif
#ifdef CONFIG_HOTPLUG_CPU
void fixup_irqs(void)
diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S
index 29aa34f11720..c5fd4b450d9b 100644
--- a/arch/sparc/kernel/rtrap_64.S
+++ b/arch/sparc/kernel/rtrap_64.S
@@ -310,7 +310,7 @@ kern_rtt_restore:
retry
to_kernel:
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
ldsw [%g6 + TI_PRE_COUNT], %l5
brnz %l5, kern_fpucheck
ldx [%g6 + TI_FLAGS], %l5
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8ef85139553f..4b77b3273051 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -90,6 +90,7 @@ config X86
select ARCH_SUPPORTS_ACPI
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
+ select ARCH_SUPPORTS_RT
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
@@ -132,8 +133,8 @@ config X86
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
- select HAVE_ARCH_JUMP_LABEL
- select HAVE_ARCH_JUMP_LABEL_RELATIVE
+ select HAVE_ARCH_JUMP_LABEL if !PREEMPT_RT
+ select HAVE_ARCH_JUMP_LABEL_RELATIVE if !PREEMPT_RT
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
@@ -199,6 +200,7 @@ config X86
select HAVE_PCI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+ select HAVE_PREEMPT_LAZY
select HAVE_RCU_TABLE_FREE if PARAVIRT
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 3e707e81afdb..0e80da8190be 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -387,14 +387,14 @@ static int ecb_encrypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, true);
- kernel_fpu_begin();
while ((nbytes = walk.nbytes)) {
+ kernel_fpu_begin();
aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
nbytes & AES_BLOCK_MASK);
+ kernel_fpu_end();
nbytes &= AES_BLOCK_SIZE - 1;
err = skcipher_walk_done(&walk, nbytes);
}
- kernel_fpu_end();
return err;
}
@@ -409,14 +409,14 @@ static int ecb_decrypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, true);
- kernel_fpu_begin();
while ((nbytes = walk.nbytes)) {
+ kernel_fpu_begin();
aesni_ecb_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
nbytes & AES_BLOCK_MASK);
+ kernel_fpu_end();
nbytes &= AES_BLOCK_SIZE - 1;
err = skcipher_walk_done(&walk, nbytes);
}
- kernel_fpu_end();
return err;
}
@@ -431,14 +431,14 @@ static int cbc_encrypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, true);
- kernel_fpu_begin();
while ((nbytes = walk.nbytes)) {
+ kernel_fpu_begin();
aesni_cbc_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
nbytes & AES_BLOCK_MASK, walk.iv);
+ kernel_fpu_end();
nbytes &= AES_BLOCK_SIZE - 1;
err = skcipher_walk_done(&walk, nbytes);
}
- kernel_fpu_end();
return err;
}
@@ -453,14 +453,14 @@ static int cbc_decrypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, true);
- kernel_fpu_begin();
while ((nbytes = walk.nbytes)) {
+ kernel_fpu_begin();
aesni_cbc_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr,
nbytes & AES_BLOCK_MASK, walk.iv);
+ kernel_fpu_end();
nbytes &= AES_BLOCK_SIZE - 1;
err = skcipher_walk_done(&walk, nbytes);
}
- kernel_fpu_end();
return err;
}
@@ -510,18 +510,20 @@ static int ctr_crypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, true);
- kernel_fpu_begin();
while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
+ kernel_fpu_begin();
aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
nbytes & AES_BLOCK_MASK, walk.iv);
+ kernel_fpu_end();
nbytes &= AES_BLOCK_SIZE - 1;
err = skcipher_walk_done(&walk, nbytes);
}
if (walk.nbytes) {
+ kernel_fpu_begin();
ctr_crypt_final(ctx, &walk);
+ kernel_fpu_end();
err = skcipher_walk_done(&walk, 0);
}
- kernel_fpu_end();
return err;
}
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 384ccb00f9e1..2f8df8ef8644 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -46,7 +46,7 @@ static inline void cast5_fpu_end(bool fpu_enabled)
static int ecb_crypt(struct skcipher_request *req, bool enc)
{
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct cast5_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
@@ -61,7 +61,7 @@ static int ecb_crypt(struct skcipher_request *req, bool enc)
u8 *wsrc = walk.src.virt.addr;
u8 *wdst = walk.dst.virt.addr;
- fpu_enabled = cast5_fpu_begin(fpu_enabled, &walk, nbytes);
+ fpu_enabled = cast5_fpu_begin(false, &walk, nbytes);
/* Process multi-block batch */
if (nbytes >= bsize * CAST5_PARALLEL_BLOCKS) {
@@ -90,10 +90,9 @@ static int ecb_crypt(struct skcipher_request *req, bool enc)
} while (nbytes >= bsize);
done:
+ cast5_fpu_end(fpu_enabled);
err = skcipher_walk_done(&walk, nbytes);
}
-
- cast5_fpu_end(fpu_enabled);
return err;
}
@@ -197,7 +196,7 @@ static int cbc_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct cast5_ctx *ctx = crypto_skcipher_ctx(tfm);
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct skcipher_walk walk;
unsigned int nbytes;
int err;
@@ -205,12 +204,11 @@ static int cbc_decrypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, false);
while ((nbytes = walk.nbytes)) {
- fpu_enabled = cast5_fpu_begin(fpu_enabled, &walk, nbytes);
+ fpu_enabled = cast5_fpu_begin(false, &walk, nbytes);
nbytes = __cbc_decrypt(ctx, &walk);
+ cast5_fpu_end(fpu_enabled);
err = skcipher_walk_done(&walk, nbytes);
}
-
- cast5_fpu_end(fpu_enabled);
return err;
}
@@ -277,7 +275,7 @@ static int ctr_crypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct cast5_ctx *ctx = crypto_skcipher_ctx(tfm);
- bool fpu_enabled = false;
+ bool fpu_enabled;
struct skcipher_walk walk;
unsigned int nbytes;
int err;
@@ -285,13 +283,12 @@ static int ctr_crypt(struct skcipher_request *req)
err = skcipher_walk_virt(&walk, req, false);
while ((nbytes = walk.nbytes) >= CAST5_BLOCK_SIZE) {
- fpu_enabled = cast5_fpu_begin(fpu_enabled, &walk, nbytes);
+ fpu_enabled = cast5_fpu_begin(false, &walk, nbytes);
nbytes = __ctr_crypt(&walk, ctx);
+ cast5_fpu_end(fpu_enabled);
err = skcipher_walk_done(&walk, nbytes);
}
- cast5_fpu_end(fpu_enabled);
-
if (walk.nbytes) {
ctr_crypt_final(&walk, ctx);
err = skcipher_walk_done(&walk, 0);
diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
index 388f95a4ec24..f8e9ee81c39d 100644
--- a/arch/x86/crypto/chacha_glue.c
+++ b/arch/x86/crypto/chacha_glue.c
@@ -127,7 +127,6 @@ static int chacha_simd_stream_xor(struct skcipher_walk *walk,
const struct chacha_ctx *ctx, const u8 *iv)
{
u32 *state, state_buf[16 + 2] __aligned(8);
- int next_yield = 4096; /* bytes until next FPU yield */
int err = 0;
BUILD_BUG_ON(CHACHA_STATE_ALIGN != 16);
@@ -140,20 +139,14 @@ static int chacha_simd_stream_xor(struct skcipher_walk *walk,
if (nbytes < walk->total) {
nbytes = round_down(nbytes, walk->stride);
- next_yield -= nbytes;
}
chacha_dosimd(state, walk->dst.virt.addr, walk->src.virt.addr,
nbytes, ctx->nrounds);
- if (next_yield <= 0) {
- /* temporarily allow preemption */
- kernel_fpu_end();
- kernel_fpu_begin();
- next_yield = 4096;
- }
-
+ kernel_fpu_end();
err = skcipher_walk_done(walk, walk->nbytes - nbytes);
+ kernel_fpu_begin();
}
return err;
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index d15b99397480..36c28b295b8e 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -24,7 +24,7 @@ int glue_ecb_req_128bit(const struct common_glue_ctx *gctx,
void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
const unsigned int bsize = 128 / 8;
struct skcipher_walk walk;
- bool fpu_enabled = false;
+ bool fpu_enabled;
unsigned int nbytes;
int err;
@@ -37,7 +37,7 @@ int glue_ecb_req_128bit(const struct common_glue_ctx *gctx,
unsigned int i;
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- &walk, fpu_enabled, nbytes);
+ &walk, false, nbytes);
for (i = 0; i < gctx->num_funcs; i++) {
func_bytes = bsize * gctx->funcs[i].num_blocks;
@@ -55,10 +55,9 @@ int glue_ecb_req_128bit(const struct common_glue_ctx *gctx,
if (nbytes < bsize)
break;
}
+ glue_fpu_end(fpu_enabled);
err = skcipher_walk_done(&walk, nbytes);
}
-
- glue_fpu_end(fpu_enabled);
return err;
}
EXPORT_SYMBOL_GPL(glue_ecb_req_128bit);
@@ -101,7 +100,7 @@ int glue_cbc_decrypt_req_128bit(const struct common_glue_ctx *gctx,
void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
const unsigned int bsize = 128 / 8;
struct skcipher_walk walk;
- bool fpu_enabled = false;
+ bool fpu_enabled;
unsigned int nbytes;
int err;
@@ -115,7 +114,7 @@ int glue_cbc_decrypt_req_128bit(const struct common_glue_ctx *gctx,
u128 last_iv;
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- &walk, fpu_enabled, nbytes);
+ &walk, false, nbytes);
/* Start of the last block. */
src += nbytes / bsize - 1;
dst += nbytes / bsize - 1;
@@ -147,10 +146,10 @@ int glue_cbc_decrypt_req_128bit(const struct common_glue_ctx *gctx,
done:
u128_xor(dst, dst, (u128 *)walk.iv);
*(u128 *)walk.iv = last_iv;
+ glue_fpu_end(fpu_enabled);
err = skcipher_walk_done(&walk, nbytes);
}
- glue_fpu_end(fpu_enabled);
return err;
}
EXPORT_SYMBOL_GPL(glue_cbc_decrypt_req_128bit);
@@ -161,7 +160,7 @@ int glue_ctr_req_128bit(const struct common_glue_ctx *gctx,
void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
const unsigned int bsize = 128 / 8;
struct skcipher_walk walk;
- bool fpu_enabled = false;
+ bool fpu_enabled;
unsigned int nbytes;
int err;
@@ -175,7 +174,7 @@ int glue_ctr_req_128bit(const struct common_glue_ctx *gctx,
le128 ctrblk;
fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
- &walk, fpu_enabled, nbytes);
+ &walk, false, nbytes);
be128_to_le128(&ctrblk, (be128 *)walk.iv);
@@ -199,11 +198,10 @@ int glue_ctr_req_128bit(const struct common_glue_ctx *gctx,
}
le128_to_be128((be128 *)walk.iv, &ctrblk);
+ glue_fpu_end(fpu_enabled);
err = skcipher_walk_done(&walk, nbytes);
}
- glue_fpu_end(fpu_enabled);
-
if (nbytes) {
le128 ctrblk;
u128 tmp;
@@ -301,8 +299,14 @@ int glue_xts_req_128bit(const struct common_glue_ctx *gctx,
tweak_fn(tweak_ctx, walk.iv, walk.iv);
while (nbytes) {
+ fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit,
+ &walk, fpu_enabled,
+ nbytes < bsize ? bsize : nbytes);
nbytes = __glue_xts_req_128bit(gctx, crypt_ctx, &walk);
+ glue_fpu_end(fpu_enabled);
+ fpu_enabled = false;
+
err = skcipher_walk_done(&walk, nbytes);
nbytes = walk.nbytes;
}
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3f8e22615812..26934d55c14a 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -130,7 +130,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
#define EXIT_TO_USERMODE_LOOP_FLAGS \
(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
- _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING)
+ _TIF_NEED_RESCHED_MASK | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING)
static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
{
@@ -145,9 +145,16 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
/* We have work to do. */
local_irq_enable();
- if (cached_flags & _TIF_NEED_RESCHED)
+ if (cached_flags & _TIF_NEED_RESCHED_MASK)
schedule();
+#ifdef ARCH_RT_DELAYS_SIGNAL_SEND
+ if (unlikely(current->forced_info.si_signo)) {
+ struct task_struct *t = current;
+ force_sig_info(&t->forced_info);
+ t->forced_info.si_signo = 0;
+ }
+#endif
if (cached_flags & _TIF_UPROBE)
uprobe_notify_resume(regs);
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 390edb763826..0e62c515eb54 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1106,8 +1106,26 @@ restore_all:
restore_all_kernel:
#ifdef CONFIG_PREEMPTION
DISABLE_INTERRUPTS(CLBR_ANY)
+ # preempt count == 0 + NEED_RS set?
cmpl $0, PER_CPU_VAR(__preempt_count)
+#ifndef CONFIG_PREEMPT_LAZY
jnz .Lno_preempt
+#else
+ jz test_int_off
+
+ # atleast preempt count == 0 ?
+ cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count)
+ jne .Lno_preempt
+
+ movl PER_CPU_VAR(current_task), %ebp
+ cmpl $0,TASK_TI_preempt_lazy_count(%ebp) # non-zero preempt_lazy_count ?
+ jnz .Lno_preempt
+
+ testl $_TIF_NEED_RESCHED_LAZY, TASK_TI_flags(%ebp)
+ jz .Lno_preempt
+
+test_int_off:
+#endif
testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ?
jz .Lno_preempt
call preempt_schedule_irq
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 2ba3d53ac5b1..d859d2fe7421 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -671,7 +671,23 @@ retint_kernel:
btl $9, EFLAGS(%rsp) /* were interrupts off? */
jnc 1f
cmpl $0, PER_CPU_VAR(__preempt_count)
+#ifndef CONFIG_PREEMPT_LAZY
jnz 1f
+#else
+ jz do_preempt_schedule_irq
+
+ # atleast preempt count == 0 ?
+ cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count)
+ jnz 1f
+
+ movq PER_CPU_VAR(current_task), %rcx
+ cmpl $0, TASK_TI_preempt_lazy_count(%rcx)
+ jnz 1f
+
+ btl $TIF_NEED_RESCHED_LAZY,TASK_TI_flags(%rcx)
+ jnc 1f
+do_preempt_schedule_irq:
+#endif
call preempt_schedule_irq
1:
#endif
@@ -1075,6 +1091,7 @@ EXPORT_SYMBOL(native_load_gs_index)
jmp 2b
.previous
+#ifndef CONFIG_PREEMPT_RT
/* Call softirq on interrupt stack. Interrupts are off. */
ENTRY(do_softirq_own_stack)
pushq %rbp
@@ -1085,6 +1102,7 @@ ENTRY(do_softirq_own_stack)
leaveq
ret
ENDPROC(do_softirq_own_stack)
+#endif
#ifdef CONFIG_XEN_PV
idtentry hypervisor_callback xen_do_hypervisor_callback has_error_code=0
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b774c52e5411..1786e4d9f70e 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -23,6 +23,7 @@ extern void kernel_fpu_begin(void);
extern void kernel_fpu_end(void);
extern bool irq_fpu_usable(void);
extern void fpregs_mark_activate(void);
+extern void kernel_fpu_resched(void);
/*
* Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 3d4cb83a8828..1f28e9f3dc8e 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -89,17 +89,48 @@ static __always_inline void __preempt_count_sub(int val)
* a decrement which hits zero means we have no preempt_count and should
* reschedule.
*/
-static __always_inline bool __preempt_count_dec_and_test(void)
+static __always_inline bool ____preempt_count_dec_and_test(void)
{
return GEN_UNARY_RMWcc("decl", __preempt_count, e, __percpu_arg([var]));
}
+static __always_inline bool __preempt_count_dec_and_test(void)
+{
+ if (____preempt_count_dec_and_test())
+ return true;
+#ifdef CONFIG_PREEMPT_LAZY
+ if (preempt_count())
+ return false;
+ if (current_thread_info()->preempt_lazy_count)
+ return false;
+ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
+#else
+ return false;
+#endif
+}
+
/*
* Returns true when we need to resched and can (barring IRQ state).
*/
static __always_inline bool should_resched(int preempt_offset)
{
+#ifdef CONFIG_PREEMPT_LAZY
+ u32 tmp;
+ tmp = raw_cpu_read_4(__preempt_count);
+ if (tmp == preempt_offset)
+ return true;
+
+ /* preempt count == 0 ? */
+ tmp &= ~PREEMPT_NEED_RESCHED;
+ if (tmp != preempt_offset)
+ return false;
+ /* XXX PREEMPT_LOCK_OFFSET */
+ if (current_thread_info()->preempt_lazy_count)
+ return false;
+ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
+#else
return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
+#endif
}
#ifdef CONFIG_PREEMPTION
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index 33d3c88a7225..65b9a5d4132b 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -28,6 +28,19 @@ typedef struct {
#define SA_IA32_ABI 0x02000000u
#define SA_X32_ABI 0x01000000u
+/*
+ * Because some traps use the IST stack, we must keep preemption
+ * disabled while calling do_trap(), but do_trap() may call
+ * force_sig_info() which will grab the signal spin_locks for the
+ * task, which in PREEMPT_RT are mutexes. By defining
+ * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set
+ * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the
+ * trap.
+ */
+#if defined(CONFIG_PREEMPT_RT)
+#define ARCH_RT_DELAYS_SIGNAL_SEND
+#endif
+
#ifndef CONFIG_COMPAT
typedef sigset_t compat_sigset_t;
#endif
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 9804a7957f4e..afc3d1e17841 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -65,7 +65,7 @@
*/
static __always_inline void boot_init_stack_canary(void)
{
- u64 canary;
+ u64 uninitialized_var(canary);
u64 tsc;
#ifdef CONFIG_X86_64
@@ -76,8 +76,14 @@ static __always_inline void boot_init_stack_canary(void)
* of randomness. The TSC only matters for very early init,
* there it already has some randomness on most systems. Later
* on during the bootup the random pool has true entropy too.
+ * For preempt-rt we need to weaken the randomness a bit, as
+ * we can't call into the random generator from atomic context
+ * due to locking constraints. We just leave canary
+ * uninitialized and use the TSC based randomness on top of it.
*/
+#ifndef CONFIG_PREEMPT_RT
get_random_bytes(&canary, sizeof(canary));
+#endif
tsc = rdtsc();
canary += tsc + (tsc << 32UL);
canary &= CANARY_MASK;
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f9453536f9bb..f346f24140be 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -56,17 +56,24 @@ struct task_struct;
struct thread_info {
unsigned long flags; /* low level flags */
u32 status; /* thread synchronous flags */
+ int preempt_lazy_count; /* 0 => lazy preemptable
+ <0 => BUG */
};
#define INIT_THREAD_INFO(tsk) \
{ \
.flags = 0, \
+ .preempt_lazy_count = 0, \
}
#else /* !__ASSEMBLY__ */
#include
+#define GET_THREAD_INFO(reg) \
+ _ASM_MOV PER_CPU_VAR(cpu_current_top_of_stack),reg ; \
+ _ASM_SUB $(THREAD_SIZE),reg ;
+
#endif
/*
@@ -92,6 +99,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_NEED_RESCHED_LAZY 18 /* lazy rescheduling necessary */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +130,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -159,6 +168,8 @@ struct thread_info {
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
+#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)
+
#define STACK_WARN (THREAD_SIZE/8)
/*
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 4b6301946f45..76a6b7553a0f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1725,7 +1725,7 @@ static bool io_apic_level_ack_pending(struct mp_chip_data *data)
return false;
}
-static inline bool ioapic_irqd_mask(struct irq_data *data)
+static inline bool ioapic_prepare_move(struct irq_data *data)
{
/* If we are moving the IRQ we need to mask it */
if (unlikely(irqd_is_setaffinity_pending(data))) {
@@ -1736,9 +1736,9 @@ static inline bool ioapic_irqd_mask(struct irq_data *data)
return false;
}
-static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
+static inline void ioapic_finish_move(struct irq_data *data, bool moveit)
{
- if (unlikely(masked)) {
+ if (unlikely(moveit)) {
/* Only migrate the irq if the ack has been received.
*
* On rare occasions the broadcast level triggered ack gets
@@ -1773,11 +1773,11 @@ static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
}
}
#else
-static inline bool ioapic_irqd_mask(struct irq_data *data)
+static inline bool ioapic_prepare_move(struct irq_data *data)
{
return false;
}
-static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
+static inline void ioapic_finish_move(struct irq_data *data, bool moveit)
{
}
#endif
@@ -1786,11 +1786,11 @@ static void ioapic_ack_level(struct irq_data *irq_data)
{
struct irq_cfg *cfg = irqd_cfg(irq_data);
unsigned long v;
- bool masked;
+ bool moveit;
int i;
irq_complete_move(cfg);
- masked = ioapic_irqd_mask(irq_data);
+ moveit = ioapic_prepare_move(irq_data);
/*
* It appears there is an erratum which affects at least version 0x11
@@ -1845,7 +1845,7 @@ static void ioapic_ack_level(struct irq_data *irq_data)
eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
}
- ioapic_irqd_unmask(irq_data, masked);
+ ioapic_finish_move(irq_data, moveit);
}
static void ioapic_ir_ack_level(struct irq_data *irq_data)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 5c7ee3df4d0b..24bcc28f68d4 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -38,6 +38,10 @@ static void __used common(void)
#endif
BLANK();
+#ifdef CONFIG_PREEMPT_LAZY
+ OFFSET(TASK_TI_flags, task_struct, thread_info.flags);
+ OFFSET(TASK_TI_preempt_lazy_count, task_struct, thread_info.preempt_lazy_count);
+#endif
OFFSET(TASK_addr_limit, task_struct, thread.addr_limit);
BLANK();
@@ -92,6 +96,7 @@ static void __used common(void)
BLANK();
DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
+ DEFINE(_PREEMPT_ENABLED, PREEMPT_ENABLED);
/* TLB state for the entry code */
OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 1c2f9baf8483..e7fd451ff9d2 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -77,12 +77,13 @@ EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
__visible void __irq_entry hv_stimer0_vector_handler(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
+ u64 ip = regs ? instruction_pointer(regs) : 0;
entering_irq();
inc_irq_stat(hyperv_stimer0_count);
if (hv_stimer0_handler)
hv_stimer0_handler();
- add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0);
+ add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0, ip);
ack_APIC_irq();
exiting_irq();
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index cd8839027f66..e55ddeaf23ea 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -119,6 +119,18 @@ void kernel_fpu_end(void)
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
+void kernel_fpu_resched(void)
+{
+ WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
+
+ if (should_resched(PREEMPT_OFFSET)) {
+ kernel_fpu_end();
+ cond_resched();
+ kernel_fpu_begin();
+ }
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_resched);
+
/*
* Save the FPU state (mark it for reload if necessary):
*
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index a759ca97cd01..a74a7606806c 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -131,6 +131,7 @@ int irq_init_percpu_irqstack(unsigned int cpu)
return 0;
}
+#ifndef CONFIG_PREEMPT_RT
void do_softirq_own_stack(void)
{
struct irq_stack *irqstk;
@@ -147,6 +148,7 @@ void do_softirq_own_stack(void)
call_on_stack(__do_softirq, isp);
}
+#endif
void handle_irq(struct irq_desc *desc, struct pt_regs *regs)
{
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index b8ceec4974fe..39510443d996 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -38,6 +38,7 @@
#include
#include
#include
+#include
#include
#include
@@ -196,6 +197,35 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
}
EXPORT_SYMBOL_GPL(start_thread);
+#ifdef CONFIG_PREEMPT_RT
+static void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p)
+{
+ int i;
+
+ /*
+ * Clear @prev's kmap_atomic mappings
+ */
+ for (i = 0; i < prev_p->kmap_idx; i++) {
+ int idx = i + KM_TYPE_NR * smp_processor_id();
+ pte_t *ptep = kmap_pte - idx;
+
+ kpte_clear_flush(ptep, __fix_to_virt(FIX_KMAP_BEGIN + idx));
+ }
+ /*
+ * Restore @next_p's kmap_atomic mappings
+ */
+ for (i = 0; i < next_p->kmap_idx; i++) {
+ int idx = i + KM_TYPE_NR * smp_processor_id();
+
+ if (!pte_none(next_p->kmap_pte[i]))
+ set_pte(kmap_pte - idx, next_p->kmap_pte[i]);
+ }
+}
+#else
+static inline void
+switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) { }
+#endif
+
/*
* switch_to(x,y) should switch tasks from x to y.
@@ -266,6 +296,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
switch_to_extra(prev_p, next_p);
+ switch_kmaps(prev_p, next_p);
+
/*
* Leave lazy mode, flushing any hypercalls made here.
* This must be done before restoring TLS segments so
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 12e83297ea02..59f00d73b71d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7280,6 +7280,14 @@ int kvm_arch_init(void *opaque)
goto out;
}
+#ifdef CONFIG_PREEMPT_RT
+ if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+ pr_err("RT requires X86_FEATURE_CONSTANT_TSC\n");
+ r = -EOPNOTSUPP;
+ goto out;
+ }
+#endif
+
r = -ENOMEM;
x86_fpu_cache = kmem_cache_create("x86_fpu", sizeof(struct fpu),
__alignof__(struct fpu), SLAB_ACCOUNT,
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 0a1898b8552e..8606f1758207 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -33,10 +33,11 @@ EXPORT_SYMBOL(kunmap);
*/
void *kmap_atomic_prot(struct page *page, pgprot_t prot)
{
+ pte_t pte = mk_pte(page, prot);
unsigned long vaddr;
int idx, type;
- preempt_disable();
+ preempt_disable_nort();
pagefault_disable();
if (!PageHighMem(page))
@@ -46,7 +47,10 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
idx = type + KM_TYPE_NR*smp_processor_id();
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
BUG_ON(!pte_none(*(kmap_pte-idx)));
- set_pte(kmap_pte-idx, mk_pte(page, prot));
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = pte;
+#endif
+ set_pte(kmap_pte-idx, pte);
arch_flush_lazy_mmu_mode();
return (void *)vaddr;
@@ -89,6 +93,9 @@ void __kunmap_atomic(void *kvaddr)
* is a bad idea also, in case the page changes cacheability
* attributes or becomes a protected page in a hypervisor.
*/
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = __pte(0);
+#endif
kpte_clear_flush(kmap_pte-idx, vaddr);
kmap_atomic_idx_pop();
arch_flush_lazy_mmu_mode();
@@ -101,7 +108,7 @@ void __kunmap_atomic(void *kvaddr)
#endif
pagefault_enable();
- preempt_enable();
+ preempt_enable_nort();
}
EXPORT_SYMBOL(__kunmap_atomic);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 6748b4c2baff..54e2f28fe189 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -46,6 +46,7 @@ EXPORT_SYMBOL_GPL(iomap_free);
void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
{
+ pte_t pte = pfn_pte(pfn, prot);
unsigned long vaddr;
int idx, type;
@@ -55,7 +56,12 @@ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
type = kmap_atomic_idx_push();
idx = type + KM_TYPE_NR * smp_processor_id();
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
- set_pte(kmap_pte - idx, pfn_pte(pfn, prot));
+ WARN_ON(!pte_none(*(kmap_pte - idx)));
+
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = pte;
+#endif
+ set_pte(kmap_pte - idx, pte);
arch_flush_lazy_mmu_mode();
return (void *)vaddr;
@@ -106,6 +112,9 @@ iounmap_atomic(void __iomem *kvaddr)
* is a bad idea also, in case the page changes cacheability
* attributes or becomes a protected page in a hypervisor.
*/
+#ifdef CONFIG_PREEMPT_RT
+ current->kmap_pte[type] = __pte(0);
+#endif
kpte_clear_flush(kmap_pte-idx, vaddr);
kmap_atomic_idx_pop();
}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e6a9edc5baaf..66f96f21a7b6 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -708,7 +708,7 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
(void *)info, 1);
else
on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func_remote,
- (void *)info, 1, GFP_ATOMIC, cpumask);
+ (void *)info, 1, cpumask);
}
/*
diff --git a/arch/xtensa/include/asm/spinlock_types.h b/arch/xtensa/include/asm/spinlock_types.h
index 64c9389254f1..dc846323b1cd 100644
--- a/arch/xtensa/include/asm/spinlock_types.h
+++ b/arch/xtensa/include/asm/spinlock_types.h
@@ -2,10 +2,6 @@
#ifndef __ASM_SPINLOCK_TYPES_H
#define __ASM_SPINLOCK_TYPES_H
-#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
-# error "please don't include this file directly"
-#endif
-
#include
#include
diff --git a/arch/xtensa/kernel/entry.S b/arch/xtensa/kernel/entry.S
index 1f07876ea2ed..7728623713ac 100644
--- a/arch/xtensa/kernel/entry.S
+++ b/arch/xtensa/kernel/entry.S
@@ -525,7 +525,7 @@ common_exception_return:
call4 schedule # void schedule (void)
j 1b
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
6:
_bbci.l a4, TIF_NEED_RESCHED, 4f
diff --git a/arch/xtensa/kernel/traps.c b/arch/xtensa/kernel/traps.c
index 4a6c495ce9b6..2e7cf56412a0 100644
--- a/arch/xtensa/kernel/traps.c
+++ b/arch/xtensa/kernel/traps.c
@@ -524,12 +524,15 @@ DEFINE_SPINLOCK(die_lock);
void die(const char * str, struct pt_regs * regs, long err)
{
static int die_counter;
+ const char *pr = "";
+
+ if (IS_ENABLED(CONFIG_PREEMPTION))
+ pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";
console_verbose();
spin_lock_irq(&die_lock);
- pr_info("%s: sig: %ld [#%d]%s\n", str, err, ++die_counter,
- IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "");
+ pr_info("%s: sig: %ld [#%d]%s\n", str, err, ++die_counter, pr);
show_regs(regs);
if (!user_mode(regs))
show_stack(NULL, (unsigned long*)regs->areg[1]);
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9df50fb507ca..10244a1f8fee 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -9,6 +9,7 @@
#include
#include
#include
+#include
#include "blk.h"
@@ -116,7 +117,7 @@ static void ioc_release_fn(struct work_struct *work)
spin_unlock(&q->queue_lock);
} else {
spin_unlock_irqrestore(&ioc->lock, flags);
- cpu_relax();
+ cpu_chill();
spin_lock_irqsave_nested(&ioc->lock, flags, 1);
}
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b748d1e63f9c..f834f3ef46c7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -611,9 +611,17 @@ static void __blk_mq_complete_request(struct request *rq)
return;
}
- cpu = get_cpu();
+ cpu = get_cpu_light();
+ /*
+ * Avoid SMP function calls for completions because they acquire
+ * sleeping spinlocks on RT.
+ */
+#ifdef CONFIG_PREEMPT_RT
+ shared = true;
+#else
if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
shared = cpus_share_cache(cpu, ctx->cpu);
+#endif
if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
rq->csd.func = __blk_mq_complete_request_remote;
@@ -623,7 +631,7 @@ static void __blk_mq_complete_request(struct request *rq)
} else {
q->mq_ops->complete(rq);
}
- put_cpu();
+ put_cpu_light();
}
static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
@@ -1477,14 +1485,14 @@ static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
return;
if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
- int cpu = get_cpu();
+ int cpu = get_cpu_light();
if (cpumask_test_cpu(cpu, hctx->cpumask)) {
__blk_mq_run_hw_queue(hctx);
- put_cpu();
+ put_cpu_light();
return;
}
- put_cpu();
+ put_cpu_light();
}
kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 457d9ba3eb20..4debe1124b99 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -42,17 +42,13 @@ static __latent_entropy void blk_done_softirq(struct softirq_action *h)
static void trigger_softirq(void *data)
{
struct request *rq = data;
- unsigned long flags;
struct list_head *list;
- local_irq_save(flags);
list = this_cpu_ptr(&blk_cpu_done);
list_add_tail(&rq->ipi_list, list);
if (list->next == &rq->ipi_list)
raise_softirq_irqoff(BLOCK_SOFTIRQ);
-
- local_irq_restore(flags);
}
/*
@@ -91,6 +87,7 @@ static int blk_softirq_cpu_dead(unsigned int cpu)
this_cpu_ptr(&blk_cpu_done));
raise_softirq_irqoff(BLOCK_SOFTIRQ);
local_irq_enable();
+ preempt_check_resched_rt();
return 0;
}
@@ -142,6 +139,7 @@ void __blk_complete_request(struct request *req)
goto do_local;
local_irq_restore(flags);
+ preempt_check_resched_rt();
}
static __init int blk_softirq_init(void)
diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index 927760b316a4..fb1e305aa398 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -36,6 +36,7 @@ static struct workqueue_struct *cryptd_wq;
struct cryptd_cpu_queue {
struct crypto_queue queue;
struct work_struct work;
+ spinlock_t qlock;
};
struct cryptd_queue {
@@ -105,6 +106,7 @@ static int cryptd_init_queue(struct cryptd_queue *queue,
cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu);
crypto_init_queue(&cpu_queue->queue, max_cpu_qlen);
INIT_WORK(&cpu_queue->work, cryptd_queue_worker);
+ spin_lock_init(&cpu_queue->qlock);
}
pr_info("cryptd: max_cpu_qlen set to %d\n", max_cpu_qlen);
return 0;
@@ -129,8 +131,10 @@ static int cryptd_enqueue_request(struct cryptd_queue *queue,
struct cryptd_cpu_queue *cpu_queue;
refcount_t *refcnt;
- cpu = get_cpu();
- cpu_queue = this_cpu_ptr(queue->cpu_queue);
+ cpu_queue = raw_cpu_ptr(queue->cpu_queue);
+ spin_lock_bh(&cpu_queue->qlock);
+ cpu = smp_processor_id();
+
err = crypto_enqueue_request(&cpu_queue->queue, request);
refcnt = crypto_tfm_ctx(request->tfm);
@@ -146,7 +150,7 @@ static int cryptd_enqueue_request(struct cryptd_queue *queue,
refcount_inc(refcnt);
out_put_cpu:
- put_cpu();
+ spin_unlock_bh(&cpu_queue->qlock);
return err;
}
@@ -162,16 +166,11 @@ static void cryptd_queue_worker(struct work_struct *work)
cpu_queue = container_of(work, struct cryptd_cpu_queue, work);
/*
* Only handle one request at a time to avoid hogging crypto workqueue.
- * preempt_disable/enable is used to prevent being preempted by
- * cryptd_enqueue_request(). local_bh_disable/enable is used to prevent
- * cryptd_enqueue_request() being accessed from software interrupts.
*/
- local_bh_disable();
- preempt_disable();
+ spin_lock_bh(&cpu_queue->qlock);
backlog = crypto_get_backlog(&cpu_queue->queue);
req = crypto_dequeue_request(&cpu_queue->queue);
- preempt_enable();
- local_bh_enable();
+ spin_unlock_bh(&cpu_queue->qlock);
if (!req)
return;
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 1a8564a79d8d..9dc3cc8beb2a 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -113,12 +113,20 @@ ssize_t zcomp_available_show(const char *comp, char *buf)
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
- return *get_cpu_ptr(comp->stream);
+ struct zcomp_strm *zstrm;
+
+ zstrm = *get_local_ptr(comp->stream);
+ spin_lock(&zstrm->zcomp_lock);
+ return zstrm;
}
void zcomp_stream_put(struct zcomp *comp)
{
- put_cpu_ptr(comp->stream);
+ struct zcomp_strm *zstrm;
+
+ zstrm = *this_cpu_ptr(comp->stream);
+ spin_unlock(&zstrm->zcomp_lock);
+ put_local_ptr(zstrm);
}
int zcomp_compress(struct zcomp_strm *zstrm,
@@ -168,6 +176,7 @@ int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
pr_err("Can't allocate a compression stream\n");
return -ENOMEM;
}
+ spin_lock_init(&zstrm->zcomp_lock);
*per_cpu_ptr(comp->stream, cpu) = zstrm;
return 0;
}
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 1806475b919d..b2c1db8a75ec 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -10,6 +10,7 @@ struct zcomp_strm {
/* compression/decompression buffer */
void *buffer;
struct crypto_comp *tfm;
+ spinlock_t zcomp_lock;
};
/* dynamic per-device compression frontend */
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 36d49159140f..a87413c29dff 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -55,6 +55,40 @@ static void zram_free_page(struct zram *zram, size_t index);
static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio);
+#ifdef CONFIG_PREEMPT_RT
+static void zram_meta_init_table_locks(struct zram *zram, size_t num_pages)
+{
+ size_t index;
+
+ for (index = 0; index < num_pages; index++)
+ spin_lock_init(&zram->table[index].lock);
+}
+
+static int zram_slot_trylock(struct zram *zram, u32 index)
+{
+ int ret;
+
+ ret = spin_trylock(&zram->table[index].lock);
+ if (ret)
+ __set_bit(ZRAM_LOCK, &zram->table[index].flags);
+ return ret;
+}
+
+static void zram_slot_lock(struct zram *zram, u32 index)
+{
+ spin_lock(&zram->table[index].lock);
+ __set_bit(ZRAM_LOCK, &zram->table[index].flags);
+}
+
+static void zram_slot_unlock(struct zram *zram, u32 index)
+{
+ __clear_bit(ZRAM_LOCK, &zram->table[index].flags);
+ spin_unlock(&zram->table[index].lock);
+}
+
+#else
+
+static void zram_meta_init_table_locks(struct zram *zram, size_t num_pages) { }
static int zram_slot_trylock(struct zram *zram, u32 index)
{
@@ -70,6 +104,7 @@ static void zram_slot_unlock(struct zram *zram, u32 index)
{
bit_spin_unlock(ZRAM_LOCK, &zram->table[index].flags);
}
+#endif
static inline bool init_done(struct zram *zram)
{
@@ -1154,6 +1189,7 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
if (!huge_class_size)
huge_class_size = zs_huge_class_size(zram->mem_pool);
+ zram_meta_init_table_locks(zram, num_pages);
return true;
}
@@ -1216,6 +1252,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
unsigned long handle;
unsigned int size;
void *src, *dst;
+ struct zcomp_strm *zstrm;
zram_slot_lock(zram, index);
if (zram_test_flag(zram, index, ZRAM_WB)) {
@@ -1246,6 +1283,7 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
size = zram_get_obj_size(zram, index);
+ zstrm = zcomp_stream_get(zram->comp);
src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
if (size == PAGE_SIZE) {
dst = kmap_atomic(page);
@@ -1253,14 +1291,13 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
kunmap_atomic(dst);
ret = 0;
} else {
- struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
dst = kmap_atomic(page);
ret = zcomp_decompress(zstrm, src, size, dst);
kunmap_atomic(dst);
- zcomp_stream_put(zram->comp);
}
zs_unmap_object(zram->mem_pool, handle);
+ zcomp_stream_put(zram->comp);
zram_slot_unlock(zram, index);
/* Should NEVER happen. Return bio error if it does. */
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index f2fd46daa760..7e4dd447e1dd 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -63,6 +63,7 @@ struct zram_table_entry {
unsigned long element;
};
unsigned long flags;
+ spinlock_t lock;
#ifdef CONFIG_ZRAM_MEMORY_TRACKING
ktime_t ac_time;
#endif
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 75a8f7f57269..156e7db0fe32 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1305,28 +1305,27 @@ static __u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
return *ptr;
}
-void add_interrupt_randomness(int irq, int irq_flags)
+void add_interrupt_randomness(int irq, int irq_flags, __u64 ip)
{
struct entropy_store *r;
struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
- struct pt_regs *regs = get_irq_regs();
unsigned long now = jiffies;
cycles_t cycles = random_get_entropy();
__u32 c_high, j_high;
- __u64 ip;
unsigned long seed;
int credit = 0;
if (cycles == 0)
- cycles = get_reg(fast_pool, regs);
+ cycles = get_reg(fast_pool, NULL);
c_high = (sizeof(cycles) > 4) ? cycles >> 32 : 0;
j_high = (sizeof(now) > 4) ? now >> 32 : 0;
fast_pool->pool[0] ^= cycles ^ j_high ^ irq;
fast_pool->pool[1] ^= now ^ c_high;
- ip = regs ? instruction_pointer(regs) : _RET_IP_;
+ if (!ip)
+ ip = _RET_IP_;
fast_pool->pool[2] ^= ip;
fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
- get_reg(fast_pool, regs);
+ get_reg(fast_pool, NULL);
fast_mix(fast_pool);
add_interrupt_bench(cycles);
diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c
index 1784530b8387..c08cbb306636 100644
--- a/drivers/char/tpm/tpm-dev-common.c
+++ b/drivers/char/tpm/tpm-dev-common.c
@@ -20,7 +20,6 @@
#include "tpm-dev.h"
static struct workqueue_struct *tpm_dev_wq;
-static DEFINE_MUTEX(tpm_dev_wq_lock);
static ssize_t tpm_dev_transmit(struct tpm_chip *chip, struct tpm_space *space,
u8 *buf, size_t bufsiz)
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index e7df342a317d..f89057fdc053 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -49,6 +49,31 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
return container_of(data, struct tpm_tis_tcg_phy, priv);
}
+#ifdef CONFIG_PREEMPT_RT
+/*
+ * Flushes previous write operations to chip so that a subsequent
+ * ioread*()s won't stall a cpu.
+ */
+static inline void tpm_tis_flush(void __iomem *iobase)
+{
+ ioread8(iobase + TPM_ACCESS(0));
+}
+#else
+#define tpm_tis_flush(iobase) do { } while (0)
+#endif
+
+static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
+{
+ iowrite8(b, iobase + addr);
+ tpm_tis_flush(iobase);
+}
+
+static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr)
+{
+ iowrite32(b, iobase + addr);
+ tpm_tis_flush(iobase);
+}
+
static bool interrupts = true;
module_param(interrupts, bool, 0444);
MODULE_PARM_DESC(interrupts, "Enable interrupts");
@@ -146,7 +171,7 @@ static int tpm_tcg_write_bytes(struct tpm_tis_data *data, u32 addr, u16 len,
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);
while (len--)
- iowrite8(*value++, phy->iobase + addr);
+ tpm_tis_iowrite8(*value++, phy->iobase, addr);
return 0;
}
@@ -173,7 +198,7 @@ static int tpm_tcg_write32(struct tpm_tis_data *data, u32 addr, u32 value)
{
struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data);
- iowrite32(value, phy->iobase + addr);
+ tpm_tis_iowrite32(value, phy->iobase, addr);
return 0;
}
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index f35a53ce8988..d02f9c8d2848 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -434,6 +434,13 @@ config ATMEL_TCB_CLKSRC
help
Support for Timer Counter Blocks on Atmel SoCs.
+config ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
+ bool "TC Block use 32 KiHz clock"
+ depends on ATMEL_TCB_CLKSRC
+ default y
+ help
+ Select this to use 32 KiHz base clock rate as TC block clock.
+
config CLKSRC_EXYNOS_MCT
bool "Exynos multi core timer driver" if COMPILE_TEST
depends on ARM || ARM64
diff --git a/drivers/clocksource/timer-atmel-tcb.c b/drivers/clocksource/timer-atmel-tcb.c
index 7427b07495a8..ce6627ee161f 100644
--- a/drivers/clocksource/timer-atmel-tcb.c
+++ b/drivers/clocksource/timer-atmel-tcb.c
@@ -28,8 +28,7 @@
* this 32 bit free-running counter. the second channel is not used.
*
* - The third channel may be used to provide a 16-bit clockevent
- * source, used in either periodic or oneshot mode. This runs
- * at 32 KiHZ, and can handle delays of up to two seconds.
+ * source, used in either periodic or oneshot mode.
*
* REVISIT behavior during system suspend states... we should disable
* all clocks and save the power. Easily done for clockevent devices,
@@ -143,6 +142,8 @@ static unsigned long notrace tc_delay_timer_read32(void)
struct tc_clkevt_device {
struct clock_event_device clkevt;
struct clk *clk;
+ bool clk_enabled;
+ u32 freq;
void __iomem *regs;
};
@@ -151,15 +152,26 @@ static struct tc_clkevt_device *to_tc_clkevt(struct clock_event_device *clkevt)
return container_of(clkevt, struct tc_clkevt_device, clkevt);
}
-/* For now, we always use the 32K clock ... this optimizes for NO_HZ,
- * because using one of the divided clocks would usually mean the
- * tick rate can never be less than several dozen Hz (vs 0.5 Hz).
- *
- * A divided clock could be good for high resolution timers, since
- * 30.5 usec resolution can seem "low".
- */
static u32 timer_clock;
+static void tc_clk_disable(struct clock_event_device *d)
+{
+ struct tc_clkevt_device *tcd = to_tc_clkevt(d);
+
+ clk_disable(tcd->clk);
+ tcd->clk_enabled = false;
+}
+
+static void tc_clk_enable(struct clock_event_device *d)
+{
+ struct tc_clkevt_device *tcd = to_tc_clkevt(d);
+
+ if (tcd->clk_enabled)
+ return;
+ clk_enable(tcd->clk);
+ tcd->clk_enabled = true;
+}
+
static int tc_shutdown(struct clock_event_device *d)
{
struct tc_clkevt_device *tcd = to_tc_clkevt(d);
@@ -167,8 +179,14 @@ static int tc_shutdown(struct clock_event_device *d)
writel(0xff, regs + ATMEL_TC_REG(2, IDR));
writel(ATMEL_TC_CLKDIS, regs + ATMEL_TC_REG(2, CCR));
+ return 0;
+}
+
+static int tc_shutdown_clk_off(struct clock_event_device *d)
+{
+ tc_shutdown(d);
if (!clockevent_state_detached(d))
- clk_disable(tcd->clk);
+ tc_clk_disable(d);
return 0;
}
@@ -181,9 +199,9 @@ static int tc_set_oneshot(struct clock_event_device *d)
if (clockevent_state_oneshot(d) || clockevent_state_periodic(d))
tc_shutdown(d);
- clk_enable(tcd->clk);
+ tc_clk_enable(d);
- /* slow clock, count up to RC, then irq and stop */
+ /* count up to RC, then irq and stop */
writel(timer_clock | ATMEL_TC_CPCSTOP | ATMEL_TC_WAVE |
ATMEL_TC_WAVESEL_UP_AUTO, regs + ATMEL_TC_REG(2, CMR));
writel(ATMEL_TC_CPCS, regs + ATMEL_TC_REG(2, IER));
@@ -203,12 +221,12 @@ static int tc_set_periodic(struct clock_event_device *d)
/* By not making the gentime core emulate periodic mode on top
* of oneshot, we get lower overhead and improved accuracy.
*/
- clk_enable(tcd->clk);
+ tc_clk_enable(d);
- /* slow clock, count up to RC, then irq and restart */
+ /* count up to RC, then irq and restart */
writel(timer_clock | ATMEL_TC_WAVE | ATMEL_TC_WAVESEL_UP_AUTO,
regs + ATMEL_TC_REG(2, CMR));
- writel((32768 + HZ / 2) / HZ, tcaddr + ATMEL_TC_REG(2, RC));
+ writel((tcd->freq + HZ / 2) / HZ, tcaddr + ATMEL_TC_REG(2, RC));
/* Enable clock and interrupts on RC compare */
writel(ATMEL_TC_CPCS, regs + ATMEL_TC_REG(2, IER));
@@ -234,9 +252,13 @@ static struct tc_clkevt_device clkevt = {
.features = CLOCK_EVT_FEAT_PERIODIC |
CLOCK_EVT_FEAT_ONESHOT,
/* Should be lower than at91rm9200's system timer */
+#ifdef CONFIG_ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
.rating = 125,
+#else
+ .rating = 200,
+#endif
.set_next_event = tc_next_event,
- .set_state_shutdown = tc_shutdown,
+ .set_state_shutdown = tc_shutdown_clk_off,
.set_state_periodic = tc_set_periodic,
.set_state_oneshot = tc_set_oneshot,
},
@@ -256,8 +278,11 @@ static irqreturn_t ch2_irq(int irq, void *handle)
return IRQ_NONE;
}
-static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx)
+static const u8 atmel_tcb_divisors[5] = { 2, 8, 32, 128, 0, };
+
+static int __init setup_clkevents(struct atmel_tc *tc, int divisor_idx)
{
+ unsigned divisor = atmel_tcb_divisors[divisor_idx];
int ret;
struct clk *t2_clk = tc->clk[2];
int irq = tc->irq[2];
@@ -278,7 +303,11 @@ static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx)
clkevt.regs = tc->regs;
clkevt.clk = t2_clk;
- timer_clock = clk32k_divisor_idx;
+ timer_clock = divisor_idx;
+ if (!divisor)
+ clkevt.freq = 32768;
+ else
+ clkevt.freq = clk_get_rate(t2_clk) / divisor;
clkevt.clkevt.cpumask = cpumask_of(0);
@@ -289,7 +318,7 @@ static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx)
return ret;
}
- clockevents_config_and_register(&clkevt.clkevt, 32768, 1, 0xffff);
+ clockevents_config_and_register(&clkevt.clkevt, clkevt.freq, 1, 0xffff);
return ret;
}
@@ -346,8 +375,6 @@ static void __init tcb_setup_single_chan(struct atmel_tc *tc, int mck_divisor_id
writel(ATMEL_TC_SYNC, tcaddr + ATMEL_TC_BCR);
}
-static const u8 atmel_tcb_divisors[5] = { 2, 8, 32, 128, 0, };
-
static const struct of_device_id atmel_tcb_of_match[] = {
{ .compatible = "atmel,at91rm9200-tcb", .data = (void *)16, },
{ .compatible = "atmel,at91sam9x5-tcb", .data = (void *)32, },
@@ -467,7 +494,11 @@ static int __init tcb_clksrc_init(struct device_node *node)
goto err_disable_t1;
/* channel 2: periodic and oneshot timer support */
+#ifdef CONFIG_ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK
ret = setup_clkevents(&tc, clk32k_divisor_idx);
+#else
+ ret = setup_clkevents(&tc, best_divisor_idx);
+#endif
if (ret)
goto err_unregister_clksrc;
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index d58ce664da84..1b2e9f19c98d 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -18,6 +18,7 @@
#include
#include
+#include
/*
* Size of a cn_msg followed by a proc_event structure. Since the
@@ -40,10 +41,11 @@ static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC };
/* proc_event_counts is used as the sequence number of the netlink message */
static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
+static DEFINE_LOCAL_IRQ_LOCK(send_msg_lock);
static inline void send_msg(struct cn_msg *msg)
{
- preempt_disable();
+ local_lock(send_msg_lock);
msg->seq = __this_cpu_inc_return(proc_event_counts) - 1;
((struct proc_event *)msg->data)->cpu = smp_processor_id();
@@ -56,7 +58,7 @@ static inline void send_msg(struct cn_msg *msg)
*/
cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);
- preempt_enable();
+ local_unlock(send_msg_lock);
}
void proc_fork_connector(struct task_struct *task)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f6df6ef1b0fb..6ac054a13d0b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -214,7 +214,7 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
return 0;
retry:
- seq = read_seqcount_begin(&resv->seq);
+ seq = read_seqbegin(&resv->seq);
rcu_read_lock();
fobj = rcu_dereference(resv->fence);
@@ -223,7 +223,7 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
else
shared_count = 0;
fence_excl = rcu_dereference(resv->fence_excl);
- if (read_seqcount_retry(&resv->seq, seq)) {
+ if (read_seqretry(&resv->seq, seq)) {
rcu_read_unlock();
goto retry;
}
@@ -1192,12 +1192,12 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
robj = buf_obj->resv;
while (true) {
- seq = read_seqcount_begin(&robj->seq);
+ seq = read_seqbegin(&robj->seq);
rcu_read_lock();
fobj = rcu_dereference(robj->fence);
shared_count = fobj ? fobj->shared_count : 0;
fence = rcu_dereference(robj->fence_excl);
- if (!read_seqcount_retry(&robj->seq, seq))
+ if (!read_seqretry(&robj->seq, seq))
break;
rcu_read_unlock();
}
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 709002515550..b0c0d4fe0b33 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -49,12 +49,6 @@
DEFINE_WD_CLASS(reservation_ww_class);
EXPORT_SYMBOL(reservation_ww_class);
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
/**
* dma_resv_list_alloc - allocate fence list
* @shared_max: number of fences we need space for
@@ -103,8 +97,7 @@ void dma_resv_init(struct dma_resv *obj)
{
ww_mutex_init(&obj->lock, &reservation_ww_class);
- __seqcount_init(&obj->seq, reservation_seqcount_string,
- &reservation_seqcount_class);
+ seqlock_init(&obj->seq);
RCU_INIT_POINTER(obj->fence, NULL);
RCU_INIT_POINTER(obj->fence_excl, NULL);
}
@@ -234,8 +227,7 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
fobj = dma_resv_get_list(obj);
count = fobj->shared_count;
- preempt_disable();
- write_seqcount_begin(&obj->seq);
+ write_seqlock(&obj->seq);
for (i = 0; i < count; ++i) {
@@ -255,8 +247,7 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
/* pointer update must be visible before we extend the shared_count */
smp_store_mb(fobj->shared_count, count);
- write_seqcount_end(&obj->seq);
- preempt_enable();
+ write_sequnlock(&obj->seq);
dma_fence_put(old);
}
EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -283,14 +274,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
if (fence)
dma_fence_get(fence);
- preempt_disable();
- write_seqcount_begin(&obj->seq);
- /* write_seqcount_begin provides the necessary memory barrier */
+ write_seqlock(&obj->seq);
+ /* write_seqlock provides the necessary memory barrier */
RCU_INIT_POINTER(obj->fence_excl, fence);
if (old)
old->shared_count = 0;
- write_seqcount_end(&obj->seq);
- preempt_enable();
+ write_sequnlock(&obj->seq);
/* inplace update, no shared fences */
while (i--)
@@ -368,13 +357,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
src_list = dma_resv_get_list(dst);
old = dma_resv_get_excl(dst);
- preempt_disable();
- write_seqcount_begin(&dst->seq);
- /* write_seqcount_begin provides the necessary memory barrier */
+ write_seqlock(&dst->seq);
+ /* write_seqlock provides the necessary memory barrier */
RCU_INIT_POINTER(dst->fence_excl, new);
RCU_INIT_POINTER(dst->fence, dst_list);
- write_seqcount_end(&dst->seq);
- preempt_enable();
+ write_sequnlock(&dst->seq);
dma_resv_list_free(src_list);
dma_fence_put(old);
@@ -414,7 +401,7 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
shared_count = i = 0;
rcu_read_lock();
- seq = read_seqcount_begin(&obj->seq);
+ seq = read_seqbegin(&obj->seq);
fence_excl = rcu_dereference(obj->fence_excl);
if (fence_excl && !dma_fence_get_rcu(fence_excl))
@@ -456,7 +443,7 @@ int dma_resv_get_fences_rcu(struct dma_resv *obj,
}
}
- if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
+ if (i != shared_count || read_seqretry(&obj->seq, seq)) {
while (i--)
dma_fence_put(shared[i]);
dma_fence_put(fence_excl);
@@ -507,7 +494,7 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
retry:
shared_count = 0;
- seq = read_seqcount_begin(&obj->seq);
+ seq = read_seqbegin(&obj->seq);
rcu_read_lock();
i = -1;
@@ -553,7 +540,7 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj,
rcu_read_unlock();
if (fence) {
- if (read_seqcount_retry(&obj->seq, seq)) {
+ if (read_seqretry(&obj->seq, seq)) {
dma_fence_put(fence);
goto retry;
}
@@ -607,7 +594,7 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
retry:
ret = true;
shared_count = 0;
- seq = read_seqcount_begin(&obj->seq);
+ seq = read_seqbegin(&obj->seq);
if (test_all) {
unsigned i;
@@ -627,7 +614,7 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
break;
}
- if (read_seqcount_retry(&obj->seq, seq))
+ if (read_seqretry(&obj->seq, seq))
goto retry;
}
@@ -639,7 +626,7 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all)
if (ret < 0)
goto retry;
- if (read_seqcount_retry(&obj->seq, seq))
+ if (read_seqretry(&obj->seq, seq))
goto retry;
}
}
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b299e22b7532..354953ef4be3 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -68,7 +68,7 @@ struct mm_struct efi_mm = {
struct workqueue_struct *efi_rts_wq;
-static bool disable_runtime;
+static bool disable_runtime = IS_ENABLED(CONFIG_PREEMPT_RT);
static int __init setup_noefi(char *arg)
{
disable_runtime = true;
@@ -94,6 +94,9 @@ static int __init parse_efi_cmdline(char *str)
if (parse_option_str(str, "noruntime"))
disable_runtime = true;
+ if (parse_option_str(str, "runtime"))
+ disable_runtime = false;
+
return 0;
}
early_param("efi", parse_efi_cmdline);
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e67c194c2aca..5f6dcbbeb0c1 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -397,7 +397,7 @@ config DRM_R128
config DRM_I810
tristate "Intel I810"
- # !PREEMPT because of missing ioctl locking
+ # !PREEMPTION because of missing ioctl locking
depends on DRM && AGP && AGP_INTEL && (!PREEMPTION || BROKEN)
help
Choose this option if you have an Intel I810 graphics card. If M is
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index f3fa271e3394..d5e5b04f5bed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -252,11 +252,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
new->shared_count = k;
/* Install the new fence list, seqcount provides the barriers */
- preempt_disable();
- write_seqcount_begin(&resv->seq);
+ write_seqlock(&resv->seq);
RCU_INIT_POINTER(resv->fence, new);
- write_seqcount_end(&resv->seq);
- preempt_enable();
+ write_sequnlock(&resv->seq);
/* Drop the references to the removed fences or move them to ef_list */
for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c
index cae25e493128..da71ac8dea35 100644
--- a/drivers/gpu/drm/i915/display/intel_sprite.c
+++ b/drivers/gpu/drm/i915/display/intel_sprite.c
@@ -38,6 +38,7 @@
#include
#include
#include
+#include
#include "i915_drv.h"
#include "i915_trace.h"
@@ -80,6 +81,8 @@ int intel_usecs_to_scanlines(const struct drm_display_mode *adjusted_mode,
#define VBLANK_EVASION_TIME_US 100
#endif
+static DEFINE_LOCAL_IRQ_LOCK(pipe_update_lock);
+
/**
* intel_pipe_update_start() - start update of a set of display registers
* @new_crtc_state: the new crtc state
@@ -129,7 +132,7 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
DRM_ERROR("PSR idle timed out 0x%x, atomic update may fail\n",
psr_status);
- local_irq_disable();
+ local_lock_irq(pipe_update_lock);
crtc->debug.min_vbl = min;
crtc->debug.max_vbl = max;
@@ -153,11 +156,11 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
break;
}
- local_irq_enable();
+ local_unlock_irq(pipe_update_lock);
timeout = schedule_timeout(timeout);
- local_irq_disable();
+ local_lock_irq(pipe_update_lock);
}
finish_wait(wq, &wait);
@@ -190,7 +193,7 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
return;
irq_disable:
- local_irq_disable();
+ local_lock_irq(pipe_update_lock);
}
/**
@@ -226,7 +229,7 @@ void intel_pipe_update_end(struct intel_crtc_state *new_crtc_state)
new_crtc_state->base.event = NULL;
}
- local_irq_enable();
+ local_unlock_irq(pipe_update_lock);
if (intel_vgpu_active(dev_priv))
return;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
index 25235ef630c1..eaa8735aabee 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
@@ -75,7 +75,6 @@ busy_check_writer(const struct dma_fence *fence)
return __busy_set_if_active(fence, __busy_write_id);
}
-
int
i915_gem_busy_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
@@ -110,7 +109,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
*
*/
retry:
- seq = raw_read_seqcount(&obj->base.resv->seq);
+ /* XXX raw_read_seqcount() does not wait for the WRTIE to finish */
+ seq = read_seqbegin(&obj->base.resv->seq);
/* Translate the exclusive fence to the READ *and* WRITE engine */
args->busy =
@@ -129,7 +129,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
}
}
- if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq))
+ if (args->busy && read_seqretry(&obj->base.resv->seq, seq))
goto retry;
err = 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 09c68dda2098..55317081d48b 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -120,7 +120,6 @@ __dma_fence_signal__notify(struct dma_fence *fence,
struct dma_fence_cb *cur, *tmp;
lockdep_assert_held(fence->lock);
- lockdep_assert_irqs_disabled();
list_for_each_entry_safe(cur, tmp, list, node) {
INIT_LIST_HEAD(&cur->node);
@@ -134,9 +133,10 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
const ktime_t timestamp = ktime_get();
struct intel_context *ce, *cn;
struct list_head *pos, *next;
+ unsigned long flags;
LIST_HEAD(signal);
- spin_lock(&b->irq_lock);
+ spin_lock_irqsave(&b->irq_lock, flags);
if (b->irq_armed && list_empty(&b->signalers))
__intel_breadcrumbs_disarm_irq(b);
@@ -182,30 +182,23 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
}
}
- spin_unlock(&b->irq_lock);
+ spin_unlock_irqrestore(&b->irq_lock, flags);
list_for_each_safe(pos, next, &signal) {
struct i915_request *rq =
list_entry(pos, typeof(*rq), signal_link);
struct list_head cb_list;
- spin_lock(&rq->lock);
+ spin_lock_irqsave(&rq->lock, flags);
list_replace(&rq->fence.cb_list, &cb_list);
__dma_fence_signal__timestamp(&rq->fence, timestamp);
__dma_fence_signal__notify(&rq->fence, &cb_list);
- spin_unlock(&rq->lock);
+ spin_unlock_irqrestore(&rq->lock, flags);
i915_request_put(rq);
}
}
-void intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
-{
- local_irq_disable();
- intel_engine_breadcrumbs_irq(engine);
- local_irq_enable();
-}
-
static void signal_irq_work(struct irq_work *work)
{
struct intel_engine_cs *engine =
@@ -275,7 +268,6 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
bool i915_request_enable_breadcrumb(struct i915_request *rq)
{
lockdep_assert_held(&rq->lock);
- lockdep_assert_irqs_disabled();
if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
@@ -325,7 +317,6 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
lockdep_assert_held(&rq->lock);
- lockdep_assert_irqs_disabled();
/*
* We must wait for b->irq_lock so that we know the interrupt handler
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 926272b5a0ca..66c0380fc275 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -357,7 +357,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine);
void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
-void intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine);
void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
static inline void
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 65b5ca74b394..0e48a3d8ea22 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -38,12 +38,15 @@ static int __engine_unpark(struct intel_wakeref *wf)
}
#if IS_ENABLED(CONFIG_LOCKDEP)
+#include
+
+static DEFINE_LOCAL_IRQ_LOCK(timeline_lock);
static inline unsigned long __timeline_mark_lock(struct intel_context *ce)
{
unsigned long flags;
- local_irq_save(flags);
+ local_lock_irqsave(timeline_lock, flags);
mutex_acquire(&ce->timeline->mutex.dep_map, 2, 0, _THIS_IP_);
return flags;
@@ -53,7 +56,7 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
unsigned long flags)
{
mutex_release(&ce->timeline->mutex.dep_map, 0, _THIS_IP_);
- local_irq_restore(flags);
+ local_unlock_irqrestore(timeline_lock, flags);
}
#else
diff --git a/drivers/gpu/drm/i915/gt/intel_hangcheck.c b/drivers/gpu/drm/i915/gt/intel_hangcheck.c
index 05d042cdefe2..7ca67aac27d4 100644
--- a/drivers/gpu/drm/i915/gt/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/intel_hangcheck.c
@@ -283,7 +283,7 @@ static void hangcheck_elapsed(struct work_struct *work)
for_each_engine(engine, gt->i915, id) {
struct hangcheck hc;
- intel_engine_signal_breadcrumbs(engine);
+ intel_engine_breadcrumbs_irq(engine);
hangcheck_load_sample(engine, &hc);
hangcheck_accumulate_sample(engine, &hc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 8cea42379dd7..05f965e2e132 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -695,7 +695,7 @@ static void reset_finish_engine(struct intel_engine_cs *engine)
engine->reset.finish(engine);
intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL);
- intel_engine_signal_breadcrumbs(engine);
+ intel_engine_breadcrumbs_irq(engine);
}
static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4193a9970251..f20f1a424828 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -983,6 +983,7 @@ bool i915_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe,
spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
/* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+ preempt_disable_rt();
/* Get optional system timestamp before query. */
if (stime)
@@ -1034,6 +1035,7 @@ bool i915_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe,
*etime = ktime_get();
/* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+ preempt_enable_rt();
spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 49d498882cf6..b94d2a1af222 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -205,14 +205,14 @@ static void remove_from_engine(struct i915_request *rq)
* check that the rq still belongs to the newly locked engine.
*/
locked = READ_ONCE(rq->engine);
- spin_lock(&locked->active.lock);
+ spin_lock_irq(&locked->active.lock);
while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
spin_unlock(&locked->active.lock);
spin_lock(&engine->active.lock);
locked = engine;
}
list_del(&rq->sched.link);
- spin_unlock(&locked->active.lock);
+ spin_unlock_irq(&locked->active.lock);
}
static bool i915_request_retire(struct i915_request *rq)
@@ -272,8 +272,6 @@ static bool i915_request_retire(struct i915_request *rq)
active->retire(active, rq);
}
- local_irq_disable();
-
/*
* We only loosely track inflight requests across preemption,
* and so we may find ourselves attempting to retire a _completed_
@@ -282,7 +280,7 @@ static bool i915_request_retire(struct i915_request *rq)
*/
remove_from_engine(rq);
- spin_lock(&rq->lock);
+ spin_lock_irq(&rq->lock);
i915_request_mark_complete(rq);
if (!i915_request_signaled(rq))
dma_fence_signal_locked(&rq->fence);
@@ -297,9 +295,7 @@ static bool i915_request_retire(struct i915_request *rq)
__notify_execute_cb(rq);
}
GEM_BUG_ON(!list_empty(&rq->execute_cb));
- spin_unlock(&rq->lock);
-
- local_irq_enable();
+ spin_unlock_irq(&rq->lock);
remove_from_client(rq);
list_del(&rq->link);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 24f2944da09d..b8ee3cd14323 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -2,6 +2,10 @@
#if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
#define _I915_TRACE_H_
+#ifdef CONFIG_PREEMPT_RT
+#define NOTRACE
+#endif
+
#include
#include
#include
@@ -721,7 +725,7 @@ DEFINE_EVENT(i915_request, i915_request_add,
TP_ARGS(rq)
);
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
DEFINE_EVENT(i915_request, i915_request_submit,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index f9f74150d0d7..a7d8914a25ab 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -1823,6 +1823,7 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe,
struct radeon_device *rdev = dev->dev_private;
/* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+ preempt_disable_rt();
/* Get optional system timestamp before query. */
if (stime)
@@ -1915,6 +1916,7 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe,
*etime = ktime_get();
/* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+ preempt_enable_rt();
/* Decode into vertical and horizontal scanout position. */
*vpos = position & 0x1fff;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
index e5252ef3812f..6941689085ed 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
@@ -169,10 +169,8 @@ void vmw_fifo_ping_host(struct vmw_private *dev_priv, uint32_t reason)
{
u32 *fifo_mem = dev_priv->mmio_virt;
- preempt_disable();
if (cmpxchg(fifo_mem + SVGA_FIFO_BUSY, 0, 1) == 0)
vmw_write(dev_priv, SVGA_REG_SYNC, reason);
- preempt_enable();
}
void vmw_fifo_release(struct vmw_private *dev_priv, struct vmw_fifo_state *fifo)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index cabcb66e7c5e..666a987fd17b 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -18,6 +18,7 @@
#include
#include
#include
+#include
#include "hv_trace.h"
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 2d2568dac2a6..a4557c7eda79 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -22,6 +22,7 @@
#include
#include
#include
+#include
#include
#include
@@ -1259,6 +1260,8 @@ static void vmbus_isr(void)
void *page_addr = hv_cpu->synic_event_page;
struct hv_message *msg;
union hv_synic_event_flags *event;
+ struct pt_regs *regs = get_irq_regs();
+ u64 ip = regs ? instruction_pointer(regs) : 0;
bool handled = false;
if (unlikely(page_addr == NULL))
@@ -1303,7 +1306,7 @@ static void vmbus_isr(void)
tasklet_schedule(&hv_cpu->msg_dpc);
}
- add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0);
+ add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0, ip);
}
/*
diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index ce9429ca6dde..29ccbd6acf43 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -64,6 +64,7 @@ config LEDS_TRIGGER_BACKLIGHT
config LEDS_TRIGGER_CPU
bool "LED CPU Trigger"
+ depends on !PREEMPT_RT
help
This allows LEDs to be controlled by active CPUs. This shows
the active CPUs across an array of LEDs so you can see which
diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index 6dfa653d30db..29c35eb52414 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -2,6 +2,7 @@
config BCACHE
tristate "Block device as cache"
+ depends on !PREEMPT_RT
select CRC64
help
Allows a block device to be used as cache for other devices; uses
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 02acd5d5a848..f95bfebbe9bf 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2058,8 +2058,9 @@ static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
struct raid5_percpu *percpu;
unsigned long cpu;
- cpu = get_cpu();
+ cpu = get_cpu_light();
percpu = per_cpu_ptr(conf->percpu, cpu);
+ spin_lock(&percpu->lock);
if (test_bit(STRIPE_OP_BIOFILL, &ops_request)) {
ops_run_biofill(sh);
overlap_clear++;
@@ -2118,7 +2119,8 @@ static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
if (test_and_clear_bit(R5_Overlap, &dev->flags))
wake_up(&sh->raid_conf->wait_for_overlap);
}
- put_cpu();
+ spin_unlock(&percpu->lock);
+ put_cpu_light();
}
static void free_stripe(struct kmem_cache *sc, struct stripe_head *sh)
@@ -6825,6 +6827,7 @@ static int raid456_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
__func__, cpu);
return -ENOMEM;
}
+ spin_lock_init(&per_cpu_ptr(conf->percpu, cpu)->lock);
return 0;
}
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..cca69edd669d 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -634,6 +634,7 @@ struct r5conf {
int recovery_disabled;
/* per cpu variables */
struct raid5_percpu {
+ spinlock_t lock; /* Protection for -RT */
struct page *spare_page; /* Used when checking P/Q in raid6 */
void *scribble; /* space for constructing buffer
* lists and performing address
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index f1f61419fd29..56d4c1e91c27 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -585,7 +585,7 @@ config VIDEO_MESON_G12A_AO_CEC
config CEC_GPIO
tristate "Generic GPIO-based CEC driver"
- depends on PREEMPT || COMPILE_TEST
+ depends on PREEMPTION || COMPILE_TEST
select CEC_CORE
select CEC_PIN
select GPIOLIB
diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 4190f9ed5313..9ed715e9be40 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -19,7 +19,6 @@
#include
#include
#include
-#include
#include
#include
#include
@@ -34,7 +33,6 @@ struct fixed_mdio_bus {
struct fixed_phy {
int addr;
struct phy_device *phydev;
- seqcount_t seqcount;
struct fixed_phy_status status;
bool no_carrier;
int (*link_update)(struct net_device *, struct fixed_phy_status *);
@@ -80,19 +78,17 @@ static int fixed_mdio_read(struct mii_bus *bus, int phy_addr, int reg_num)
list_for_each_entry(fp, &fmb->phys, node) {
if (fp->addr == phy_addr) {
struct fixed_phy_status state;
- int s;
-
- do {
- s = read_seqcount_begin(&fp->seqcount);
- fp->status.link = !fp->no_carrier;
- /* Issue callback if user registered it. */
- if (fp->link_update)
- fp->link_update(fp->phydev->attached_dev,
- &fp->status);
- /* Check the GPIO for change in status */
- fixed_phy_update(fp);
- state = fp->status;
- } while (read_seqcount_retry(&fp->seqcount, s));
+
+ fp->status.link = !fp->no_carrier;
+
+ /* Issue callback if user registered it. */
+ if (fp->link_update)
+ fp->link_update(fp->phydev->attached_dev,
+ &fp->status);
+
+ /* Check the GPIO for change in status */
+ fixed_phy_update(fp);
+ state = fp->status;
return swphy_read_reg(reg_num, &state);
}
@@ -150,8 +146,6 @@ static int fixed_phy_add_gpiod(unsigned int irq, int phy_addr,
if (!fp)
return -ENOMEM;
- seqcount_init(&fp->seqcount);
-
if (irq != PHY_POLL)
fmb->mii_bus->irq[phy_addr] = irq;
diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index e753f43e0162..7e53a0ba5776 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -693,8 +693,8 @@ static void ezusb_req_ctx_wait(struct ezusb_priv *upriv,
while (!ctx->done.done && msecs--)
udelay(1000);
} else {
- wait_event_interruptible(ctx->done.wait,
- ctx->done.done);
+ swait_event_interruptible_exclusive(ctx->done.wait,
+ ctx->done.done);
}
break;
default:
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 1d667eb730e1..a82b8c05169e 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -123,115 +123,38 @@ int __weak of_node_to_nid(struct device_node *np)
}
#endif
-/*
- * Assumptions behind phandle_cache implementation:
- * - phandle property values are in a contiguous range of 1..n
- *
- * If the assumptions do not hold, then
- * - the phandle lookup overhead reduction provided by the cache
- * will likely be less
- */
+#define OF_PHANDLE_CACHE_BITS 7
+#define OF_PHANDLE_CACHE_SZ BIT(OF_PHANDLE_CACHE_BITS)
-static struct device_node **phandle_cache;
-static u32 phandle_cache_mask;
+static struct device_node *phandle_cache[OF_PHANDLE_CACHE_SZ];
-/*
- * Caller must hold devtree_lock.
- */
-static void __of_free_phandle_cache(void)
+static u32 of_phandle_cache_hash(phandle handle)
{
- u32 cache_entries = phandle_cache_mask + 1;
- u32 k;
-
- if (!phandle_cache)
- return;
-
- for (k = 0; k < cache_entries; k++)
- of_node_put(phandle_cache[k]);
-
- kfree(phandle_cache);
- phandle_cache = NULL;
+ return hash_32(handle, OF_PHANDLE_CACHE_BITS);
}
-int of_free_phandle_cache(void)
-{
- unsigned long flags;
-
- raw_spin_lock_irqsave(&devtree_lock, flags);
-
- __of_free_phandle_cache();
-
- raw_spin_unlock_irqrestore(&devtree_lock, flags);
-
- return 0;
-}
-#if !defined(CONFIG_MODULES)
-late_initcall_sync(of_free_phandle_cache);
-#endif
-
/*
* Caller must hold devtree_lock.
*/
-void __of_free_phandle_cache_entry(phandle handle)
+void __of_phandle_cache_inv_entry(phandle handle)
{
- phandle masked_handle;
+ u32 handle_hash;
struct device_node *np;
if (!handle)
return;
- masked_handle = handle & phandle_cache_mask;
+ handle_hash = of_phandle_cache_hash(handle);
- if (phandle_cache) {
- np = phandle_cache[masked_handle];
- if (np && handle == np->phandle) {
- of_node_put(np);
- phandle_cache[masked_handle] = NULL;
- }
- }
-}
-
-void of_populate_phandle_cache(void)
-{
- unsigned long flags;
- u32 cache_entries;
- struct device_node *np;
- u32 phandles = 0;
-
- raw_spin_lock_irqsave(&devtree_lock, flags);
-
- __of_free_phandle_cache();
-
- for_each_of_allnodes(np)
- if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
- phandles++;
-
- if (!phandles)
- goto out;
-
- cache_entries = roundup_pow_of_two(phandles);
- phandle_cache_mask = cache_entries - 1;
-
- phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
- GFP_ATOMIC);
- if (!phandle_cache)
- goto out;
-
- for_each_of_allnodes(np)
- if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) {
- of_node_get(np);
- phandle_cache[np->phandle & phandle_cache_mask] = np;
- }
-
-out:
- raw_spin_unlock_irqrestore(&devtree_lock, flags);
+ np = phandle_cache[handle_hash];
+ if (np && handle == np->phandle)
+ phandle_cache[handle_hash] = NULL;
}
void __init of_core_init(void)
{
struct device_node *np;
- of_populate_phandle_cache();
/* Create the kset, and register existing nodes */
mutex_lock(&of_mutex);
@@ -241,8 +164,11 @@ void __init of_core_init(void)
pr_err("failed to register existing nodes\n");
return;
}
- for_each_of_allnodes(np)
+ for_each_of_allnodes(np) {
__of_attach_node_sysfs(np);
+ if (np->phandle && !phandle_cache[of_phandle_cache_hash(np->phandle)])
+ phandle_cache[of_phandle_cache_hash(np->phandle)] = np;
+ }
mutex_unlock(&of_mutex);
/* Symlink in /proc as required by userspace ABI */
@@ -1223,36 +1149,29 @@ struct device_node *of_find_node_by_phandle(phandle handle)
{
struct device_node *np = NULL;
unsigned long flags;
- phandle masked_handle;
+ u32 handle_hash;
if (!handle)
return NULL;
+ handle_hash = of_phandle_cache_hash(handle);
+
raw_spin_lock_irqsave(&devtree_lock, flags);
- masked_handle = handle & phandle_cache_mask;
-
- if (phandle_cache) {
- if (phandle_cache[masked_handle] &&
- handle == phandle_cache[masked_handle]->phandle)
- np = phandle_cache[masked_handle];
- if (np && of_node_check_flag(np, OF_DETACHED)) {
- WARN_ON(1); /* did not uncache np on node removal */
- of_node_put(np);
- phandle_cache[masked_handle] = NULL;
- np = NULL;
- }
+ if (phandle_cache[handle_hash] &&
+ handle == phandle_cache[handle_hash]->phandle)
+ np = phandle_cache[handle_hash];
+ if (np && of_node_check_flag(np, OF_DETACHED)) {
+ WARN_ON(1); /* did not uncache np on node removal */
+ phandle_cache[handle_hash] = NULL;
+ np = NULL;
}
if (!np) {
for_each_of_allnodes(np)
if (np->phandle == handle &&
!of_node_check_flag(np, OF_DETACHED)) {
- if (phandle_cache) {
- /* will put when removed from cache */
- of_node_get(np);
- phandle_cache[masked_handle] = np;
- }
+ phandle_cache[handle_hash] = np;
break;
}
}
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 49b16f76d78e..08fd823edac9 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -276,7 +276,7 @@ void __of_detach_node(struct device_node *np)
of_node_set_flag(np, OF_DETACHED);
/* race with of_find_node_by_phandle() prevented by devtree_lock */
- __of_free_phandle_cache_entry(np->phandle);
+ __of_phandle_cache_inv_entry(np->phandle);
}
/**
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 24786818e32e..de0042bc7701 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -85,14 +85,12 @@ int of_resolve_phandles(struct device_node *tree);
#endif
#if defined(CONFIG_OF_DYNAMIC)
-void __of_free_phandle_cache_entry(phandle handle);
+void __of_phandle_cache_inv_entry(phandle handle);
#endif
#if defined(CONFIG_OF_OVERLAY)
void of_overlay_mutex_lock(void);
void of_overlay_mutex_unlock(void);
-int of_free_phandle_cache(void);
-void of_populate_phandle_cache(void);
#else
static inline void of_overlay_mutex_lock(void) {};
static inline void of_overlay_mutex_unlock(void) {};
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 1688f576ee8a..88cdcb080e38 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -976,8 +976,6 @@ static int of_overlay_apply(const void *fdt, struct device_node *tree,
goto err_free_overlay_changeset;
}
- of_populate_phandle_cache();
-
ret = __of_changeset_apply_notify(&ovcs->cset);
if (ret)
pr_err("overlay apply changeset entry notify error %d\n", ret);
@@ -1220,17 +1218,9 @@ int of_overlay_remove(int *ovcs_id)
list_del(&ovcs->ovcs_list);
- /*
- * Disable phandle cache. Avoids race condition that would arise
- * from removing cache entry when the associated node is deleted.
- */
- of_free_phandle_cache();
-
ret_apply = 0;
ret = __of_changeset_revert_entries(&ovcs->cset, &ret_apply);
- of_populate_phandle_cache();
-
if (ret) {
if (ret_apply)
devicetree_state_flags |= DTSF_REVERT_FAIL;
diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index 2c9c3061894b..90d6eb310e70 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -52,10 +52,11 @@ struct switchtec_user {
enum mrpc_state state;
- struct completion comp;
+ wait_queue_head_t cmd_comp;
struct kref kref;
struct list_head list;
+ bool cmd_done;
u32 cmd;
u32 status;
u32 return_code;
@@ -77,7 +78,7 @@ static struct switchtec_user *stuser_create(struct switchtec_dev *stdev)
stuser->stdev = stdev;
kref_init(&stuser->kref);
INIT_LIST_HEAD(&stuser->list);
- init_completion(&stuser->comp);
+ init_waitqueue_head(&stuser->cmd_comp);
stuser->event_cnt = atomic_read(&stdev->event_cnt);
dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
@@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
kref_get(&stuser->kref);
stuser->read_len = sizeof(stuser->data);
stuser_set_state(stuser, MRPC_QUEUED);
- reinit_completion(&stuser->comp);
+ stuser->cmd_done = false;
list_add_tail(&stuser->list, &stdev->mrpc_queue);
mrpc_cmd_submit(stdev);
@@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct switchtec_dev *stdev)
memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
stuser->read_len);
out:
- complete_all(&stuser->comp);
+ stuser->cmd_done = true;
+ wake_up_interruptible(&stuser->cmd_comp);
list_del_init(&stuser->list);
stuser_put(stuser);
stdev->mrpc_busy = 0;
@@ -494,10 +496,11 @@ static ssize_t switchtec_dev_read(struct file *filp, char __user *data,
mutex_unlock(&stdev->mrpc_mutex);
if (filp->f_flags & O_NONBLOCK) {
- if (!try_wait_for_completion(&stuser->comp))
+ if (!READ_ONCE(stuser->cmd_done))
return -EAGAIN;
} else {
- rc = wait_for_completion_interruptible(&stuser->comp);
+ rc = wait_event_interruptible(stuser->cmd_comp,
+ stuser->cmd_done);
if (rc < 0)
return rc;
}
@@ -545,7 +548,7 @@ static __poll_t switchtec_dev_poll(struct file *filp, poll_table *wait)
struct switchtec_dev *stdev = stuser->stdev;
__poll_t ret = 0;
- poll_wait(filp, &stuser->comp.wait, wait);
+ poll_wait(filp, &stuser->cmd_comp, wait);
poll_wait(filp, &stdev->event_wq, wait);
if (lock_mutex_and_test_alive(stdev))
@@ -553,7 +556,7 @@ static __poll_t switchtec_dev_poll(struct file *filp, poll_table *wait)
mutex_unlock(&stdev->mrpc_mutex);
- if (try_wait_for_completion(&stuser->comp))
+ if (READ_ONCE(stuser->cmd_done))
ret |= EPOLLIN | EPOLLRDNORM;
if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
@@ -1106,7 +1109,8 @@ static void stdev_kill(struct switchtec_dev *stdev)
/* Wake up and kill any users waiting on an MRPC request */
list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
- complete_all(&stuser->comp);
+ stuser->cmd_done = true;
+ wake_up_interruptible(&stuser->cmd_comp);
list_del_init(&stuser->list);
stuser_put(stuser);
}
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 25dae9f0b205..2acc1f6da902 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -1452,11 +1452,11 @@ static int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
static int fcoe_alloc_paged_crc_eof(struct sk_buff *skb, int tlen)
{
struct fcoe_percpu_s *fps;
- int rc;
+ int rc, cpu = get_cpu_light();
- fps = &get_cpu_var(fcoe_percpu);
+ fps = &per_cpu(fcoe_percpu, cpu);
rc = fcoe_get_paged_crc_eof(skb, tlen, fps);
- put_cpu_var(fcoe_percpu);
+ put_cpu_light();
return rc;
}
@@ -1641,11 +1641,11 @@ static inline int fcoe_filter_frames(struct fc_lport *lport,
return 0;
}
- stats = per_cpu_ptr(lport->stats, get_cpu());
+ stats = per_cpu_ptr(lport->stats, get_cpu_light());
stats->InvalidCRCCount++;
if (stats->InvalidCRCCount < 5)
printk(KERN_WARNING "fcoe: dropping frame with CRC error\n");
- put_cpu();
+ put_cpu_light();
return -EINVAL;
}
@@ -1686,7 +1686,7 @@ static void fcoe_recv_frame(struct sk_buff *skb)
*/
hp = (struct fcoe_hdr *) skb_network_header(skb);
- stats = per_cpu_ptr(lport->stats, get_cpu());
+ stats = per_cpu_ptr(lport->stats, get_cpu_light());
if (unlikely(FC_FCOE_DECAPS_VER(hp) != FC_FCOE_VER)) {
if (stats->ErrorFrames < 5)
printk(KERN_WARNING "fcoe: FCoE version "
@@ -1718,13 +1718,13 @@ static void fcoe_recv_frame(struct sk_buff *skb)
goto drop;
if (!fcoe_filter_frames(lport, fp)) {
- put_cpu();
+ put_cpu_light();
fc_exch_recv(lport, fp);
return;
}
drop:
stats->ErrorFrames++;
- put_cpu();
+ put_cpu_light();
kfree_skb(skb);
}
diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c
index 07a0dadc75bf..a5f996001a89 100644
--- a/drivers/scsi/fcoe/fcoe_ctlr.c
+++ b/drivers/scsi/fcoe/fcoe_ctlr.c
@@ -826,7 +826,7 @@ static unsigned long fcoe_ctlr_age_fcfs(struct fcoe_ctlr *fip)
INIT_LIST_HEAD(&del_list);
- stats = per_cpu_ptr(fip->lp->stats, get_cpu());
+ stats = per_cpu_ptr(fip->lp->stats, get_cpu_light());
list_for_each_entry_safe(fcf, next, &fip->fcfs, list) {
deadline = fcf->time + fcf->fka_period + fcf->fka_period / 2;
@@ -862,7 +862,7 @@ static unsigned long fcoe_ctlr_age_fcfs(struct fcoe_ctlr *fip)
sel_time = fcf->time;
}
}
- put_cpu();
+ put_cpu_light();
list_for_each_entry_safe(fcf, next, &del_list, list) {
/* Removes fcf from current list */
diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
index 52e866659853..78c75a231422 100644
--- a/drivers/scsi/libfc/fc_exch.c
+++ b/drivers/scsi/libfc/fc_exch.c
@@ -821,10 +821,10 @@ static struct fc_exch *fc_exch_em_alloc(struct fc_lport *lport,
}
memset(ep, 0, sizeof(*ep));
- cpu = get_cpu();
+ cpu = get_cpu_light();
pool = per_cpu_ptr(mp->pool, cpu);
spin_lock_bh(&pool->lock);
- put_cpu();
+ put_cpu_light();
/* peek cache of free slot */
if (pool->left != FC_XID_UNKNOWN) {
diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c
index ddb4a973c698..e18758a8e911 100644
--- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
@@ -63,7 +63,7 @@ static int max_id __read_mostly;
/* Array of zone pointers */
static struct zone_device **zones;
/* Serializes interrupt notification, work and hotplug */
-static DEFINE_SPINLOCK(pkg_temp_lock);
+static DEFINE_RAW_SPINLOCK(pkg_temp_lock);
/* Protects zone operation in the work function against hotplug removal */
static DEFINE_MUTEX(thermal_zone_mutex);
@@ -266,12 +266,12 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
u64 msr_val, wr_val;
mutex_lock(&thermal_zone_mutex);
- spin_lock_irq(&pkg_temp_lock);
+ raw_spin_lock_irq(&pkg_temp_lock);
++pkg_work_cnt;
zonedev = pkg_temp_thermal_get_dev(cpu);
if (!zonedev) {
- spin_unlock_irq(&pkg_temp_lock);
+ raw_spin_unlock_irq(&pkg_temp_lock);
mutex_unlock(&thermal_zone_mutex);
return;
}
@@ -285,7 +285,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
}
enable_pkg_thres_interrupt();
- spin_unlock_irq(&pkg_temp_lock);
+ raw_spin_unlock_irq(&pkg_temp_lock);
/*
* If tzone is not NULL, then thermal_zone_mutex will prevent the
@@ -310,7 +310,7 @@ static int pkg_thermal_notify(u64 msr_val)
struct zone_device *zonedev;
unsigned long flags;
- spin_lock_irqsave(&pkg_temp_lock, flags);
+ raw_spin_lock_irqsave(&pkg_temp_lock, flags);
++pkg_interrupt_cnt;
disable_pkg_thres_interrupt();
@@ -322,7 +322,7 @@ static int pkg_thermal_notify(u64 msr_val)
pkg_thermal_schedule_work(zonedev->cpu, &zonedev->work);
}
- spin_unlock_irqrestore(&pkg_temp_lock, flags);
+ raw_spin_unlock_irqrestore(&pkg_temp_lock, flags);
return 0;
}
@@ -368,9 +368,9 @@ static int pkg_temp_thermal_device_add(unsigned int cpu)
zonedev->msr_pkg_therm_high);
cpumask_set_cpu(cpu, &zonedev->cpumask);
- spin_lock_irq(&pkg_temp_lock);
+ raw_spin_lock_irq(&pkg_temp_lock);
zones[id] = zonedev;
- spin_unlock_irq(&pkg_temp_lock);
+ raw_spin_unlock_irq(&pkg_temp_lock);
return 0;
}
@@ -407,7 +407,7 @@ static int pkg_thermal_cpu_offline(unsigned int cpu)
}
/* Protect against work and interrupts */
- spin_lock_irq(&pkg_temp_lock);
+ raw_spin_lock_irq(&pkg_temp_lock);
/*
* Check whether this cpu was the current target and store the new
@@ -439,9 +439,9 @@ static int pkg_thermal_cpu_offline(unsigned int cpu)
* To cancel the work we need to drop the lock, otherwise
* we might deadlock if the work needs to be flushed.
*/
- spin_unlock_irq(&pkg_temp_lock);
+ raw_spin_unlock_irq(&pkg_temp_lock);
cancel_delayed_work_sync(&zonedev->work);
- spin_lock_irq(&pkg_temp_lock);
+ raw_spin_lock_irq(&pkg_temp_lock);
/*
* If this is not the last cpu in the package and the work
* did not run after we dropped the lock above, then we
@@ -452,7 +452,7 @@ static int pkg_thermal_cpu_offline(unsigned int cpu)
pkg_thermal_schedule_work(target, &zonedev->work);
}
- spin_unlock_irq(&pkg_temp_lock);
+ raw_spin_unlock_irq(&pkg_temp_lock);
/* Final cleanup if this is the last cpu */
if (lastcpu)
diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
index 33ad9d6de532..b29aed97a54d 100644
--- a/drivers/tty/serial/8250/8250.h
+++ b/drivers/tty/serial/8250/8250.h
@@ -130,12 +130,55 @@ static inline void serial_dl_write(struct uart_8250_port *up, int value)
up->dl_write(up, value);
}
+static inline void serial8250_set_IER(struct uart_8250_port *up,
+ unsigned char ier)
+{
+ struct uart_port *port = &up->port;
+ unsigned int flags;
+ bool is_console;
+
+ is_console = uart_console(port);
+
+ if (is_console)
+ console_atomic_lock(&flags);
+
+ serial_out(up, UART_IER, ier);
+
+ if (is_console)
+ console_atomic_unlock(flags);
+}
+
+static inline unsigned char serial8250_clear_IER(struct uart_8250_port *up)
+{
+ struct uart_port *port = &up->port;
+ unsigned int clearval = 0;
+ unsigned int prior;
+ unsigned int flags;
+ bool is_console;
+
+ is_console = uart_console(port);
+
+ if (up->capabilities & UART_CAP_UUE)
+ clearval = UART_IER_UUE;
+
+ if (is_console)
+ console_atomic_lock(&flags);
+
+ prior = serial_port_in(port, UART_IER);
+ serial_port_out(port, UART_IER, clearval);
+
+ if (is_console)
+ console_atomic_unlock(flags);
+
+ return prior;
+}
+
static inline bool serial8250_set_THRI(struct uart_8250_port *up)
{
if (up->ier & UART_IER_THRI)
return false;
up->ier |= UART_IER_THRI;
- serial_out(up, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
return true;
}
@@ -144,7 +187,7 @@ static inline bool serial8250_clear_THRI(struct uart_8250_port *up)
if (!(up->ier & UART_IER_THRI))
return false;
up->ier &= ~UART_IER_THRI;
- serial_out(up, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
return true;
}
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 2675771a03a0..c9fa80dbbd4c 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -274,10 +274,8 @@ static void serial8250_backup_timeout(struct timer_list *t)
* Must disable interrupts or else we risk racing with the interrupt
* based handler.
*/
- if (up->port.irq) {
- ier = serial_in(up, UART_IER);
- serial_out(up, UART_IER, 0);
- }
+ if (up->port.irq)
+ ier = serial8250_clear_IER(up);
iir = serial_in(up, UART_IIR);
@@ -300,7 +298,7 @@ static void serial8250_backup_timeout(struct timer_list *t)
serial8250_tx_chars(up);
if (up->port.irq)
- serial_out(up, UART_IER, ier);
+ serial8250_set_IER(up, ier);
spin_unlock_irqrestore(&up->port.lock, flags);
@@ -578,6 +576,14 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
#ifdef CONFIG_SERIAL_8250_CONSOLE
+static void univ8250_console_write_atomic(struct console *co, const char *s,
+ unsigned int count)
+{
+ struct uart_8250_port *up = &serial8250_ports[co->index];
+
+ serial8250_console_write_atomic(up, s, count);
+}
+
static void univ8250_console_write(struct console *co, const char *s,
unsigned int count)
{
@@ -663,6 +669,7 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
static struct console univ8250_console = {
.name = "ttyS",
+ .write_atomic = univ8250_console_write_atomic,
.write = univ8250_console_write,
.device = uart_console_device,
.setup = univ8250_console_setup,
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index aa0e216d5ead..8f711afadd4b 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -57,9 +57,18 @@ int fsl8250_handle_irq(struct uart_port *port)
/* Stop processing interrupts on input overrun */
if ((orig_lsr & UART_LSR_OE) && (up->overrun_backoff_time_ms > 0)) {
+ unsigned int ca_flags;
unsigned long delay;
+ bool is_console;
+ is_console = uart_console(port);
+
+ if (is_console)
+ console_atomic_lock(&ca_flags);
up->ier = port->serial_in(port, UART_IER);
+ if (is_console)
+ console_atomic_unlock(ca_flags);
+
if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) {
port->ops->stop_rx(port);
} else {
diff --git a/drivers/tty/serial/8250/8250_ingenic.c b/drivers/tty/serial/8250/8250_ingenic.c
index 424c07c5f629..47f1482bd818 100644
--- a/drivers/tty/serial/8250/8250_ingenic.c
+++ b/drivers/tty/serial/8250/8250_ingenic.c
@@ -146,6 +146,8 @@ OF_EARLYCON_DECLARE(x1000_uart, "ingenic,x1000-uart",
static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value)
{
+ unsigned int flags;
+ bool is_console;
int ier;
switch (offset) {
@@ -167,7 +169,12 @@ static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value)
* If we have enabled modem status IRQs we should enable
* modem mode.
*/
+ is_console = uart_console(p);
+ if (is_console)
+ console_atomic_lock(&flags);
ier = p->serial_in(p, UART_IER);
+ if (is_console)
+ console_atomic_unlock(flags);
if (ier & UART_IER_MSI)
value |= UART_MCR_MDCE | UART_MCR_FCM;
diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index 2b59a4305077..3f42bccf4ca8 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -212,12 +212,37 @@ static void mtk8250_shutdown(struct uart_port *port)
static void mtk8250_disable_intrs(struct uart_8250_port *up, int mask)
{
- serial_out(up, UART_IER, serial_in(up, UART_IER) & (~mask));
+ struct uart_port *port = &up->port;
+ unsigned int flags;
+ unsigned int ier;
+ bool is_console;
+
+ is_console = uart_console(port);
+
+ if (is_console)
+ console_atomic_lock(&flags);
+
+ ier = serial_in(up, UART_IER);
+ serial_out(up, UART_IER, ier & (~mask));
+
+ if (is_console)
+ console_atomic_unlock(flags);
}
static void mtk8250_enable_intrs(struct uart_8250_port *up, int mask)
{
- serial_out(up, UART_IER, serial_in(up, UART_IER) | mask);
+ struct uart_port *port = &up->port;
+ unsigned int flags;
+ unsigned int ier;
+
+ if (uart_console(port))
+ console_atomic_lock(&flags);
+
+ ier = serial_in(up, UART_IER);
+ serial_out(up, UART_IER, ier | mask);
+
+ if (uart_console(port))
+ console_atomic_unlock(flags);
}
static void mtk8250_set_flow_ctrl(struct uart_8250_port *up, int mode)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 5b673077639b..1fc608598382 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -721,7 +721,7 @@ static void serial8250_set_sleep(struct uart_8250_port *p, int sleep)
serial_out(p, UART_EFR, UART_EFR_ECB);
serial_out(p, UART_LCR, 0);
}
- serial_out(p, UART_IER, sleep ? UART_IERX_SLEEP : 0);
+ serial8250_set_IER(p, sleep ? UART_IERX_SLEEP : 0);
if (p->capabilities & UART_CAP_EFR) {
serial_out(p, UART_LCR, UART_LCR_CONF_MODE_B);
serial_out(p, UART_EFR, efr);
@@ -1390,7 +1390,7 @@ static void serial8250_stop_rx(struct uart_port *port)
up->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
up->port.read_status_mask &= ~UART_LSR_DR;
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
serial8250_rpm_put(up);
}
@@ -1408,7 +1408,7 @@ static void __do_stop_tx_rs485(struct uart_8250_port *p)
serial8250_clear_and_reinit_fifos(p);
p->ier |= UART_IER_RLSI | UART_IER_RDI;
- serial_port_out(&p->port, UART_IER, p->ier);
+ serial8250_set_IER(p, p->ier);
}
}
static enum hrtimer_restart serial8250_em485_handle_stop_tx(struct hrtimer *t)
@@ -1616,7 +1616,7 @@ static void serial8250_disable_ms(struct uart_port *port)
mctrl_gpio_disable_ms(up->gpios);
up->ier &= ~UART_IER_MSI;
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
}
static void serial8250_enable_ms(struct uart_port *port)
@@ -1632,7 +1632,7 @@ static void serial8250_enable_ms(struct uart_port *port)
up->ier |= UART_IER_MSI;
serial8250_rpm_get(up);
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
serial8250_rpm_put(up);
}
@@ -2040,14 +2040,7 @@ static void serial8250_put_poll_char(struct uart_port *port,
struct uart_8250_port *up = up_to_u8250p(port);
serial8250_rpm_get(up);
- /*
- * First save the IER then disable the interrupts
- */
- ier = serial_port_in(port, UART_IER);
- if (up->capabilities & UART_CAP_UUE)
- serial_port_out(port, UART_IER, UART_IER_UUE);
- else
- serial_port_out(port, UART_IER, 0);
+ ier = serial8250_clear_IER(up);
wait_for_xmitr(up, BOTH_EMPTY);
/*
@@ -2060,7 +2053,7 @@ static void serial8250_put_poll_char(struct uart_port *port,
* and restore the IER
*/
wait_for_xmitr(up, BOTH_EMPTY);
- serial_port_out(port, UART_IER, ier);
+ serial8250_set_IER(up, ier);
serial8250_rpm_put(up);
}
@@ -2375,7 +2368,7 @@ void serial8250_do_shutdown(struct uart_port *port)
*/
spin_lock_irqsave(&port->lock, flags);
up->ier = 0;
- serial_port_out(port, UART_IER, 0);
+ serial8250_set_IER(up, 0);
spin_unlock_irqrestore(&port->lock, flags);
synchronize_irq(port->irq);
@@ -2662,7 +2655,7 @@ serial8250_do_set_termios(struct uart_port *port, struct ktermios *termios,
if (up->capabilities & UART_CAP_RTOIE)
up->ier |= UART_IER_RTOIE;
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
if (up->capabilities & UART_CAP_EFR) {
unsigned char efr = 0;
@@ -3126,7 +3119,7 @@ EXPORT_SYMBOL_GPL(serial8250_set_defaults);
#ifdef CONFIG_SERIAL_8250_CONSOLE
-static void serial8250_console_putchar(struct uart_port *port, int ch)
+static void serial8250_console_putchar_locked(struct uart_port *port, int ch)
{
struct uart_8250_port *up = up_to_u8250p(port);
@@ -3134,6 +3127,18 @@ static void serial8250_console_putchar(struct uart_port *port, int ch)
serial_port_out(port, UART_TX, ch);
}
+static void serial8250_console_putchar(struct uart_port *port, int ch)
+{
+ struct uart_8250_port *up = up_to_u8250p(port);
+ unsigned int flags;
+
+ wait_for_xmitr(up, UART_LSR_THRE);
+
+ console_atomic_lock(&flags);
+ serial8250_console_putchar_locked(port, ch);
+ console_atomic_unlock(flags);
+}
+
/*
* Restore serial console when h/w power-off detected
*/
@@ -3155,6 +3160,32 @@ static void serial8250_console_restore(struct uart_8250_port *up)
serial8250_out_MCR(up, UART_MCR_DTR | UART_MCR_RTS);
}
+void serial8250_console_write_atomic(struct uart_8250_port *up,
+ const char *s, unsigned int count)
+{
+ struct uart_port *port = &up->port;
+ unsigned int flags;
+ unsigned int ier;
+
+ console_atomic_lock(&flags);
+
+ touch_nmi_watchdog();
+
+ ier = serial8250_clear_IER(up);
+
+ if (atomic_fetch_inc(&up->console_printing)) {
+ uart_console_write(port, "\n", 1,
+ serial8250_console_putchar_locked);
+ }
+ uart_console_write(port, s, count, serial8250_console_putchar_locked);
+ atomic_dec(&up->console_printing);
+
+ wait_for_xmitr(up, BOTH_EMPTY);
+ serial8250_set_IER(up, ier);
+
+ console_atomic_unlock(flags);
+}
+
/*
* Print a string to the serial port trying not to disturb
* any possible real use of the port...
@@ -3167,26 +3198,13 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
struct uart_port *port = &up->port;
unsigned long flags;
unsigned int ier;
- int locked = 1;
touch_nmi_watchdog();
serial8250_rpm_get(up);
+ spin_lock_irqsave(&port->lock, flags);
- if (oops_in_progress)
- locked = spin_trylock_irqsave(&port->lock, flags);
- else
- spin_lock_irqsave(&port->lock, flags);
-
- /*
- * First save the IER then disable the interrupts
- */
- ier = serial_port_in(port, UART_IER);
-
- if (up->capabilities & UART_CAP_UUE)
- serial_port_out(port, UART_IER, UART_IER_UUE);
- else
- serial_port_out(port, UART_IER, 0);
+ ier = serial8250_clear_IER(up);
/* check scratch reg to see if port powered off during system sleep */
if (up->canary && (up->canary != serial_port_in(port, UART_SCR))) {
@@ -3194,14 +3212,16 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
up->canary = 0;
}
+ atomic_inc(&up->console_printing);
uart_console_write(port, s, count, serial8250_console_putchar);
+ atomic_dec(&up->console_printing);
/*
* Finally, wait for transmitter to become empty
* and restore the IER
*/
wait_for_xmitr(up, BOTH_EMPTY);
- serial_port_out(port, UART_IER, ier);
+ serial8250_set_IER(up, ier);
/*
* The receive handling will happen properly because the
@@ -3213,8 +3233,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
if (up->msr_saved_flags)
serial8250_modem_status(up);
- if (locked)
- spin_unlock_irqrestore(&port->lock, flags);
+ spin_unlock_irqrestore(&port->lock, flags);
serial8250_rpm_put(up);
}
@@ -3235,6 +3254,7 @@ static unsigned int probe_baud(struct uart_port *port)
int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
{
+ struct uart_8250_port *up = up_to_u8250p(port);
int baud = 9600;
int bits = 8;
int parity = 'n';
@@ -3243,6 +3263,8 @@ int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
if (!port->iobase && !port->membase)
return -ENODEV;
+ atomic_set(&up->console_printing, 0);
+
if (options)
uart_parse_options(options, &baud, &parity, &bits, &flow);
else if (probe)
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 16720c97a4dd..662adcb2d619 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2212,18 +2212,24 @@ pl011_console_write(struct console *co, const char *s, unsigned int count)
{
struct uart_amba_port *uap = amba_ports[co->index];
unsigned int old_cr = 0, new_cr;
- unsigned long flags;
+ unsigned long flags = 0;
int locked = 1;
clk_enable(uap->clk);
- local_irq_save(flags);
+ /*
+ * local_irq_save(flags);
+ *
+ * This local_irq_save() is nonsense. If we come in via sysrq
+ * handling then interrupts are already disabled. Aside of
+ * that the port.sysrq check is racy on SMP regardless.
+ */
if (uap->port.sysrq)
locked = 0;
else if (oops_in_progress)
- locked = spin_trylock(&uap->port.lock);
+ locked = spin_trylock_irqsave(&uap->port.lock, flags);
else
- spin_lock(&uap->port.lock);
+ spin_lock_irqsave(&uap->port.lock, flags);
/*
* First save the CR then disable the interrupts
@@ -2249,8 +2255,7 @@ pl011_console_write(struct console *co, const char *s, unsigned int count)
pl011_write(old_cr, uap, REG_CR);
if (locked)
- spin_unlock(&uap->port.lock);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&uap->port.lock, flags);
clk_disable(uap->clk);
}
diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c
index 6420ae581a80..0f4f41ed9ffa 100644
--- a/drivers/tty/serial/omap-serial.c
+++ b/drivers/tty/serial/omap-serial.c
@@ -1307,13 +1307,10 @@ serial_omap_console_write(struct console *co, const char *s,
pm_runtime_get_sync(up->dev);
- local_irq_save(flags);
- if (up->port.sysrq)
- locked = 0;
- else if (oops_in_progress)
- locked = spin_trylock(&up->port.lock);
+ if (up->port.sysrq || oops_in_progress)
+ locked = spin_trylock_irqsave(&up->port.lock, flags);
else
- spin_lock(&up->port.lock);
+ spin_lock_irqsave(&up->port.lock, flags);
/*
* First save the IER then disable the interrupts
@@ -1342,8 +1339,7 @@ serial_omap_console_write(struct console *co, const char *s,
pm_runtime_mark_last_busy(up->dev);
pm_runtime_put_autosuspend(up->dev);
if (locked)
- spin_unlock(&up->port.lock);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&up->port.lock, flags);
}
static int __init
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index f8bcfc506f4a..11c1978c6372 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1719,7 +1719,7 @@ static void ffs_data_put(struct ffs_data *ffs)
pr_info("%s(): freeing\n", __func__);
ffs_data_clear(ffs);
BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
- waitqueue_active(&ffs->ep0req_completion.wait) ||
+ swait_active(&ffs->ep0req_completion.wait) ||
waitqueue_active(&ffs->wait));
destroy_workqueue(ffs->io_completion_wq);
kfree(ffs->dev_name);
diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index 238f555fe494..1f332d0db93d 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf, unsigned len)
spin_unlock_irq (&epdata->dev->lock);
if (likely (value == 0)) {
- value = wait_event_interruptible (done.wait, done.done);
+ value = swait_event_interruptible_exclusive(done.wait, done.done);
if (value != 0) {
spin_lock_irq (&epdata->dev->lock);
if (likely (epdata->ep != NULL)) {
@@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf, unsigned len)
usb_ep_dequeue (epdata->ep, epdata->req);
spin_unlock_irq (&epdata->dev->lock);
- wait_event (done.wait, done.done);
+ swait_event_exclusive(done.wait, done.done);
if (epdata->status == -ECONNRESET)
epdata->status = -EINTR;
} else {
diff --git a/drivers/video/backlight/Kconfig b/drivers/video/backlight/Kconfig
index 40676be2e46a..d09396393724 100644
--- a/drivers/video/backlight/Kconfig
+++ b/drivers/video/backlight/Kconfig
@@ -99,7 +99,7 @@ config LCD_TOSA
config LCD_HP700
tristate "HP Jornada 700 series LCD Driver"
- depends on SA1100_JORNADA720_SSP && !PREEMPT
+ depends on SA1100_JORNADA720_SSP && !PREEMPTION
default y
help
If you have an HP Jornada 700 series handheld (710/720/728)
@@ -228,7 +228,7 @@ config BACKLIGHT_HP680
config BACKLIGHT_HP700
tristate "HP Jornada 700 series Backlight Driver"
- depends on SA1100_JORNADA720_SSP && !PREEMPT
+ depends on SA1100_JORNADA720_SSP && !PREEMPTION
default y
help
If you have an HP Jornada 700 series,
diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
index 98a9d6892d98..6ad87b5c95ed 100644
--- a/drivers/xen/preempt.c
+++ b/drivers/xen/preempt.c
@@ -8,7 +8,7 @@
#include
#include
-#ifndef CONFIG_PREEMPT
+#ifndef CONFIG_PREEMPTION
/*
* Some hypercalls issued by the toolstack can take many 10s of
@@ -39,4 +39,4 @@ asmlinkage __visible void xen_maybe_preempt_hcall(void)
__this_cpu_write(xen_in_preemptible_hcall, true);
}
}
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c
index d94e2b7cddff..5197f7c3d31f 100644
--- a/fs/afs/dir_silly.c
+++ b/fs/afs/dir_silly.c
@@ -210,7 +210,7 @@ int afs_silly_iput(struct dentry *dentry, struct inode *inode)
struct dentry *alias;
int ret;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
_enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5acf5c507ec2..ad1ccf94e4b8 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -179,7 +179,7 @@ btrfs_device_set_##name(struct btrfs_device *dev, u64 size) \
write_seqcount_end(&dev->data_seqcount); \
preempt_enable(); \
}
-#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
#define BTRFS_DEVICE_GETSET_FUNCS(name) \
static inline u64 \
btrfs_device_get_##name(const struct btrfs_device *dev) \
diff --git a/fs/buffer.c b/fs/buffer.c
index 22d8ac4a8c40..4a4ea8de4e22 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -275,8 +275,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
* decide that the page is now completely done.
*/
first = page_buffers(page);
- local_irq_save(flags);
- bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+ spin_lock_irqsave(&first->b_uptodate_lock, flags);
clear_buffer_async_read(bh);
unlock_buffer(bh);
tmp = bh;
@@ -289,8 +288,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
}
tmp = tmp->b_this_page;
} while (tmp != bh);
- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
/*
* If none of the buffers had errors and they are all
@@ -302,8 +300,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
return;
still_busy:
- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
return;
}
@@ -331,8 +328,7 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
}
first = page_buffers(page);
- local_irq_save(flags);
- bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+ spin_lock_irqsave(&first->b_uptodate_lock, flags);
clear_buffer_async_write(bh);
unlock_buffer(bh);
@@ -344,14 +340,12 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
}
tmp = tmp->b_this_page;
}
- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
end_page_writeback(page);
return;
still_busy:
- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
return;
}
EXPORT_SYMBOL(end_buffer_async_write);
@@ -1404,7 +1398,7 @@ static bool has_bh_in_lru(int cpu, void *dummy)
void invalidate_bh_lrus(void)
{
- on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1, GFP_KERNEL);
+ on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
}
EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
@@ -3365,6 +3359,7 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
if (ret) {
INIT_LIST_HEAD(&ret->b_assoc_buffers);
+ spin_lock_init(&ret->b_uptodate_lock);
preempt_disable();
__this_cpu_inc(bh_accounting.nr);
recalc_bh_state();
diff --git a/fs/cifs/readdir.c b/fs/cifs/readdir.c
index 3925a7bfc74d..33f7723fb83e 100644
--- a/fs/cifs/readdir.c
+++ b/fs/cifs/readdir.c
@@ -80,7 +80,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *name,
struct inode *inode;
struct super_block *sb = parent->d_sb;
struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
cifs_dbg(FYI, "%s: for %s\n", __func__, name->name);
diff --git a/fs/dcache.c b/fs/dcache.c
index b2a7f1765f0b..d0267103d4b3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2484,9 +2484,10 @@ EXPORT_SYMBOL(d_rehash);
static inline unsigned start_dir_add(struct inode *dir)
{
+ preempt_disable_rt();
for (;;) {
- unsigned n = dir->i_dir_seq;
- if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
+ unsigned n = dir->__i_dir_seq;
+ if (!(n & 1) && cmpxchg(&dir->__i_dir_seq, n, n + 1) == n)
return n;
cpu_relax();
}
@@ -2494,26 +2495,30 @@ static inline unsigned start_dir_add(struct inode *dir)
static inline void end_dir_add(struct inode *dir, unsigned n)
{
- smp_store_release(&dir->i_dir_seq, n + 2);
+ smp_store_release(&dir->__i_dir_seq, n + 2);
+ preempt_enable_rt();
}
static void d_wait_lookup(struct dentry *dentry)
{
- if (d_in_lookup(dentry)) {
- DECLARE_WAITQUEUE(wait, current);
- add_wait_queue(dentry->d_wait, &wait);
- do {
- set_current_state(TASK_UNINTERRUPTIBLE);
- spin_unlock(&dentry->d_lock);
- schedule();
- spin_lock(&dentry->d_lock);
- } while (d_in_lookup(dentry));
- }
+ struct swait_queue __wait;
+
+ if (!d_in_lookup(dentry))
+ return;
+
+ INIT_LIST_HEAD(&__wait.task_list);
+ do {
+ prepare_to_swait_exclusive(dentry->d_wait, &__wait, TASK_UNINTERRUPTIBLE);
+ spin_unlock(&dentry->d_lock);
+ schedule();
+ spin_lock(&dentry->d_lock);
+ } while (d_in_lookup(dentry));
+ finish_swait(dentry->d_wait, &__wait);
}
struct dentry *d_alloc_parallel(struct dentry *parent,
const struct qstr *name,
- wait_queue_head_t *wq)
+ struct swait_queue_head *wq)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = in_lookup_hash(parent, hash);
@@ -2527,7 +2532,7 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
retry:
rcu_read_lock();
- seq = smp_load_acquire(&parent->d_inode->i_dir_seq);
+ seq = smp_load_acquire(&parent->d_inode->__i_dir_seq);
r_seq = read_seqbegin(&rename_lock);
dentry = __d_lookup_rcu(parent, name, &d_seq);
if (unlikely(dentry)) {
@@ -2555,7 +2560,7 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
}
hlist_bl_lock(b);
- if (unlikely(READ_ONCE(parent->d_inode->i_dir_seq) != seq)) {
+ if (unlikely(READ_ONCE(parent->d_inode->__i_dir_seq) != seq)) {
hlist_bl_unlock(b);
rcu_read_unlock();
goto retry;
@@ -2628,7 +2633,7 @@ void __d_lookup_done(struct dentry *dentry)
hlist_bl_lock(b);
dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
__hlist_bl_del(&dentry->d_u.d_in_lookup_hash);
- wake_up_all(dentry->d_wait);
+ swake_up_all(dentry->d_wait);
dentry->d_wait = NULL;
hlist_bl_unlock(b);
INIT_HLIST_NODE(&dentry->d_u.d_alias);
@@ -3141,6 +3146,8 @@ __setup("dhash_entries=", set_dhash_entries);
static void __init dcache_init_early(void)
{
+ unsigned int loop;
+
/* If hashes are distributed across NUMA nodes, defer
* hash allocation until vmalloc space is available.
*/
@@ -3157,11 +3164,16 @@ static void __init dcache_init_early(void)
NULL,
0,
0);
+
+ for (loop = 0; loop < (1U << d_hash_shift); loop++)
+ INIT_HLIST_BL_HEAD(dentry_hashtable + loop);
+
d_hash_shift = 32 - d_hash_shift;
}
static void __init dcache_init(void)
{
+ unsigned int loop;
/*
* A constructor could be added for stable state like the lists,
* but it is probably not worth it because of the cache nature
@@ -3185,6 +3197,10 @@ static void __init dcache_init(void)
NULL,
0,
0);
+
+ for (loop = 0; loop < (1U << d_hash_shift); loop++)
+ INIT_HLIST_BL_HEAD(dentry_hashtable + loop);
+
d_hash_shift = 32 - d_hash_shift;
}
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 339453ac834c..2f2eefb59299 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -565,12 +565,12 @@ static int ep_poll_wakeup_proc(void *priv, void *cookie, int call_nests)
static void ep_poll_safewake(wait_queue_head_t *wq)
{
- int this_cpu = get_cpu();
+ int this_cpu = get_cpu_light();
ep_call_nested(&poll_safewake_ncalls,
ep_poll_wakeup_proc, NULL, wq, (void *) (long) this_cpu);
- put_cpu();
+ put_cpu_light();
}
#else
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2cc9f2168b9e..33610b1bbac7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -87,11 +87,10 @@ static void ext4_finish_bio(struct bio *bio)
}
bh = head = page_buffers(page);
/*
- * We check all buffers in the page under BH_Uptodate_Lock
+ * We check all buffers in the page under b_uptodate_lock
* to avoid races with other end io clearing async_write flags
*/
- local_irq_save(flags);
- bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
+ spin_lock_irqsave(&head->b_uptodate_lock, flags);
do {
if (bh_offset(bh) < bio_start ||
bh_offset(bh) + bh->b_size > bio_end) {
@@ -103,8 +102,7 @@ static void ext4_finish_bio(struct bio *bio)
if (bio->bi_status)
buffer_io_error(bh);
} while ((bh = bh->b_this_page) != head);
- bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&head->b_uptodate_lock, flags);
if (!under_io) {
fscrypt_free_bounce_page(bounce_page);
end_page_writeback(page);
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 0ce39658a620..5508d92e3f8f 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -958,3 +958,11 @@ int __fscache_check_consistency(struct fscache_cookie *cookie,
return -ESTALE;
}
EXPORT_SYMBOL(__fscache_check_consistency);
+
+void __init fscache_cookie_init(void)
+{
+ int i;
+
+ for (i = 0; i < (1 << fscache_cookie_hash_shift) - 1; i++)
+ INIT_HLIST_BL_HEAD(&fscache_cookie_hash[i]);
+}
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 59c2494efda3..f9625eb89853 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -145,6 +145,7 @@ static int __init fscache_init(void)
ret = -ENOMEM;
goto error_cookie_jar;
}
+ fscache_cookie_init();
fscache_root = kobject_create_and_add("fscache", kernel_kobj);
if (!fscache_root)
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index 6a40f75a0d25..6e813ecf8f51 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -158,7 +158,7 @@ static int fuse_direntplus_link(struct file *file,
struct inode *dir = d_inode(parent);
struct fuse_conn *fc;
struct inode *inode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
if (!o->nodeid) {
/*
diff --git a/fs/inode.c b/fs/inode.c
index c5267a4db0f5..4ef9239abd82 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -157,7 +157,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
inode->i_bdev = NULL;
inode->i_cdev = NULL;
inode->i_link = NULL;
- inode->i_dir_seq = 0;
+ inode->__i_dir_seq = 0;
inode->i_rdev = 0;
inode->dirtied_when = 0;
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 88146008b3e3..227618a50f41 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -482,10 +482,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (jh->b_committed_data) {
struct buffer_head *bh = jh2bh(jh);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
jbd2_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
}
jbd2_journal_refile_buffer(journal, jh);
}
@@ -920,6 +920,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
transaction_t *cp_transaction;
struct buffer_head *bh;
int try_to_free = 0;
+ bool drop_ref;
jh = commit_transaction->t_forget;
spin_unlock(&journal->j_list_lock);
@@ -929,7 +930,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
* done with it.
*/
get_bh(bh);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
J_ASSERT_JH(jh, jh->b_transaction == commit_transaction);
/*
@@ -1029,8 +1030,10 @@ void jbd2_journal_commit_transaction(journal_t *journal)
try_to_free = 1;
}
JBUFFER_TRACE(jh, "refile or unfile buffer");
- __jbd2_journal_refile_buffer(jh);
- jbd_unlock_bh_state(bh);
+ drop_ref = __jbd2_journal_refile_buffer(jh);
+ spin_unlock(&jh->b_state_lock);
+ if (drop_ref)
+ jbd2_journal_put_journal_head(jh);
if (try_to_free)
release_buffer_page(bh); /* Drops bh reference */
else
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b7c5819bfc41..e0c9ffac2f0e 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -362,7 +362,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
/* keep subsequent assertions sane */
atomic_set(&new_bh->b_count, 1);
- jbd_lock_bh_state(bh_in);
+ spin_lock(&jh_in->b_state_lock);
repeat:
/*
* If a new transaction has already done a buffer copy-out, then
@@ -404,13 +404,13 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
if (need_copy_out && !done_copy_out) {
char *tmp;
- jbd_unlock_bh_state(bh_in);
+ spin_unlock(&jh_in->b_state_lock);
tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
if (!tmp) {
brelse(new_bh);
return -ENOMEM;
}
- jbd_lock_bh_state(bh_in);
+ spin_lock(&jh_in->b_state_lock);
if (jh_in->b_frozen_data) {
jbd2_free(tmp, bh_in->b_size);
goto repeat;
@@ -463,7 +463,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
__jbd2_journal_file_buffer(jh_in, transaction, BJ_Shadow);
spin_unlock(&journal->j_list_lock);
set_buffer_shadow(bh_in);
- jbd_unlock_bh_state(bh_in);
+ spin_unlock(&jh_in->b_state_lock);
return do_escape | (done_copy_out << 1);
}
@@ -2390,6 +2390,8 @@ static struct journal_head *journal_alloc_journal_head(void)
ret = kmem_cache_zalloc(jbd2_journal_head_cache,
GFP_NOFS | __GFP_NOFAIL);
}
+ if (ret)
+ spin_lock_init(&ret->b_state_lock);
return ret;
}
@@ -2509,17 +2511,23 @@ static void __journal_remove_journal_head(struct buffer_head *bh)
J_ASSERT_BH(bh, buffer_jbd(bh));
J_ASSERT_BH(bh, jh2bh(jh) == bh);
BUFFER_TRACE(bh, "remove journal_head");
+
+ /* Unlink before dropping the lock */
+ bh->b_private = NULL;
+ jh->b_bh = NULL; /* debug, really */
+ clear_buffer_jbd(bh);
+}
+
+static void journal_release_journal_head(struct journal_head *jh, size_t b_size)
+{
if (jh->b_frozen_data) {
printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__);
- jbd2_free(jh->b_frozen_data, bh->b_size);
+ jbd2_free(jh->b_frozen_data, b_size);
}
if (jh->b_committed_data) {
printk(KERN_WARNING "%s: freeing b_committed_data\n", __func__);
- jbd2_free(jh->b_committed_data, bh->b_size);
+ jbd2_free(jh->b_committed_data, b_size);
}
- bh->b_private = NULL;
- jh->b_bh = NULL; /* debug, really */
- clear_buffer_jbd(bh);
journal_free_journal_head(jh);
}
@@ -2537,9 +2545,11 @@ void jbd2_journal_put_journal_head(struct journal_head *jh)
if (!jh->b_jcount) {
__journal_remove_journal_head(bh);
jbd_unlock_bh_journal_head(bh);
+ journal_release_journal_head(jh, bh->b_size);
__brelse(bh);
- } else
+ } else {
jbd_unlock_bh_journal_head(bh);
+ }
}
/*
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 90453309345d..c66452a2c201 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -877,7 +877,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
start_lock = jiffies;
lock_buffer(bh);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
/* If it takes too long to lock the buffer, trace it */
time_lock = jbd2_time_diff(start_lock, jiffies);
@@ -927,7 +927,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
error = -EROFS;
if (is_handle_aborted(handle)) {
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
goto out;
}
error = 0;
@@ -991,7 +991,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
*/
if (buffer_shadow(bh)) {
JBUFFER_TRACE(jh, "on shadow: sleep");
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE);
goto repeat;
}
@@ -1012,7 +1012,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
JBUFFER_TRACE(jh, "generate frozen data");
if (!frozen_buffer) {
JBUFFER_TRACE(jh, "allocate memory for buffer");
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
frozen_buffer = jbd2_alloc(jh2bh(jh)->b_size,
GFP_NOFS | __GFP_NOFAIL);
goto repeat;
@@ -1031,7 +1031,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
jh->b_next_transaction = transaction;
done:
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
/*
* If we are about to journal a buffer, then any revoke pending on it is
@@ -1173,7 +1173,7 @@ int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
* that case: the transaction must have deleted the buffer for it to be
* reused here.
*/
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
J_ASSERT_JH(jh, (jh->b_transaction == transaction ||
jh->b_transaction == NULL ||
(jh->b_transaction == journal->j_committing_transaction &&
@@ -1208,7 +1208,7 @@ int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
jh->b_next_transaction = transaction;
spin_unlock(&journal->j_list_lock);
}
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
/*
* akpm: I added this. ext3_alloc_branch can pick up new indirect
@@ -1279,13 +1279,13 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
committed_data = jbd2_alloc(jh2bh(jh)->b_size,
GFP_NOFS|__GFP_NOFAIL);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
if (!jh->b_committed_data) {
/* Copy out the current buffer contents into the
* preserved, committed copy. */
JBUFFER_TRACE(jh, "generate b_committed data");
if (!committed_data) {
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
goto repeat;
}
@@ -1293,7 +1293,7 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
committed_data = NULL;
memcpy(jh->b_committed_data, bh->b_data, bh->b_size);
}
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
out:
jbd2_journal_put_journal_head(jh);
if (unlikely(committed_data))
@@ -1394,16 +1394,16 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
*/
if (jh->b_transaction != transaction &&
jh->b_next_transaction != transaction) {
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
J_ASSERT_JH(jh, jh->b_transaction == transaction ||
jh->b_next_transaction == transaction);
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
}
if (jh->b_modified == 1) {
/* If it's in our transaction it must be in BJ_Metadata list. */
if (jh->b_transaction == transaction &&
jh->b_jlist != BJ_Metadata) {
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
if (jh->b_transaction == transaction &&
jh->b_jlist != BJ_Metadata)
pr_err("JBD2: assertion failure: h_type=%u "
@@ -1413,13 +1413,13 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
jh->b_jlist);
J_ASSERT_JH(jh, jh->b_transaction != transaction ||
jh->b_jlist == BJ_Metadata);
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
}
goto out;
}
journal = transaction->t_journal;
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
if (jh->b_modified == 0) {
/*
@@ -1505,7 +1505,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
__jbd2_journal_file_buffer(jh, transaction, BJ_Metadata);
spin_unlock(&journal->j_list_lock);
out_unlock_bh:
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
out:
JBUFFER_TRACE(jh, "exit");
return ret;
@@ -1543,18 +1543,20 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
BUFFER_TRACE(bh, "entry");
- jbd_lock_bh_state(bh);
+ jh = jbd2_journal_grab_journal_head(bh);
+ if (!jh) {
+ __bforget(bh);
+ return 0;
+ }
- if (!buffer_jbd(bh))
- goto not_jbd;
- jh = bh2jh(bh);
+ spin_lock(&jh->b_state_lock);
/* Critical error: attempting to delete a bitmap buffer, maybe?
* Don't do any jbd operations, and return an error. */
if (!J_EXPECT_JH(jh, !jh->b_committed_data,
"inconsistent data on disk")) {
err = -EIO;
- goto not_jbd;
+ goto drop;
}
/* keep track of whether or not this transaction modified us */
@@ -1602,10 +1604,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
} else {
__jbd2_journal_unfile_buffer(jh);
- if (!buffer_jbd(bh)) {
- spin_unlock(&journal->j_list_lock);
- goto not_jbd;
- }
+ jbd2_journal_put_journal_head(jh);
}
spin_unlock(&journal->j_list_lock);
} else if (jh->b_transaction) {
@@ -1647,7 +1646,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
if (!jh->b_cp_transaction) {
JBUFFER_TRACE(jh, "belongs to none transaction");
spin_unlock(&journal->j_list_lock);
- goto not_jbd;
+ goto drop;
}
/*
@@ -1657,7 +1656,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
if (!buffer_dirty(bh)) {
__jbd2_journal_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
- goto not_jbd;
+ goto drop;
}
/*
@@ -1670,20 +1669,15 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
spin_unlock(&journal->j_list_lock);
}
-
- jbd_unlock_bh_state(bh);
- __brelse(bh);
drop:
+ __brelse(bh);
+ spin_unlock(&jh->b_state_lock);
+ jbd2_journal_put_journal_head(jh);
if (drop_reserve) {
/* no need to reserve log space for this block -bzzz */
handle->h_buffer_credits++;
}
return err;
-
-not_jbd:
- jbd_unlock_bh_state(bh);
- __bforget(bh);
- goto drop;
}
/**
@@ -1882,7 +1876,7 @@ int jbd2_journal_stop(handle_t *handle)
*
* j_list_lock is held.
*
- * jbd_lock_bh_state(jh2bh(jh)) is held.
+ * jh->b_state_lock is held.
*/
static inline void
@@ -1906,7 +1900,7 @@ __blist_add_buffer(struct journal_head **list, struct journal_head *jh)
*
* Called with j_list_lock held, and the journal may not be locked.
*
- * jbd_lock_bh_state(jh2bh(jh)) is held.
+ * jh->b_state_lock is held.
*/
static inline void
@@ -1938,7 +1932,7 @@ static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh)
transaction_t *transaction;
struct buffer_head *bh = jh2bh(jh);
- J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
+ lockdep_assert_held(&jh->b_state_lock);
transaction = jh->b_transaction;
if (transaction)
assert_spin_locked(&transaction->t_journal->j_list_lock);
@@ -1975,11 +1969,10 @@ static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh)
}
/*
- * Remove buffer from all transactions.
+ * Remove buffer from all transactions. The caller is responsible for dropping
+ * the jh reference that belonged to the transaction.
*
* Called with bh_state lock and j_list_lock
- *
- * jh and bh may be already freed when this function returns.
*/
static void __jbd2_journal_unfile_buffer(struct journal_head *jh)
{
@@ -1988,7 +1981,6 @@ static void __jbd2_journal_unfile_buffer(struct journal_head *jh)
__jbd2_journal_temp_unlink_buffer(jh);
jh->b_transaction = NULL;
- jbd2_journal_put_journal_head(jh);
}
void jbd2_journal_unfile_buffer(journal_t *journal, struct journal_head *jh)
@@ -1997,18 +1989,19 @@ void jbd2_journal_unfile_buffer(journal_t *journal, struct journal_head *jh)
/* Get reference so that buffer cannot be freed before we unlock it */
get_bh(bh);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
spin_lock(&journal->j_list_lock);
__jbd2_journal_unfile_buffer(jh);
spin_unlock(&journal->j_list_lock);
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
+ jbd2_journal_put_journal_head(jh);
__brelse(bh);
}
/*
* Called from jbd2_journal_try_to_free_buffers().
*
- * Called under jbd_lock_bh_state(bh)
+ * Called under jh->b_state_lock
*/
static void
__journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
@@ -2096,10 +2089,10 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal,
if (!jh)
continue;
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
__journal_try_to_free_buffer(journal, bh);
+ spin_unlock(&jh->b_state_lock);
jbd2_journal_put_journal_head(jh);
- jbd_unlock_bh_state(bh);
if (buffer_jbd(bh))
goto busy;
@@ -2135,7 +2128,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal,
*
* Called under j_list_lock.
*
- * Called under jbd_lock_bh_state(bh).
+ * Called under jh->b_state_lock.
*/
static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
{
@@ -2156,6 +2149,7 @@ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
} else {
JBUFFER_TRACE(jh, "on running transaction");
__jbd2_journal_unfile_buffer(jh);
+ jbd2_journal_put_journal_head(jh);
}
return may_free;
}
@@ -2222,18 +2216,15 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
* holding the page lock. --sct
*/
- if (!buffer_jbd(bh))
+ jh = jbd2_journal_grab_journal_head(bh);
+ if (!jh)
goto zap_buffer_unlocked;
/* OK, we have data buffer in journaled mode */
write_lock(&journal->j_state_lock);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
spin_lock(&journal->j_list_lock);
- jh = jbd2_journal_grab_journal_head(bh);
- if (!jh)
- goto zap_buffer_no_jh;
-
/*
* We cannot remove the buffer from checkpoint lists until the
* transaction adding inode to orphan list (let's call it T)
@@ -2312,10 +2303,10 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
* for commit and try again.
*/
if (partial_page) {
- jbd2_journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
write_unlock(&journal->j_state_lock);
+ jbd2_journal_put_journal_head(jh);
return -EBUSY;
}
/*
@@ -2329,10 +2320,10 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
if (journal->j_running_transaction && buffer_jbddirty(bh))
jh->b_next_transaction = journal->j_running_transaction;
jh->b_modified = 0;
- jbd2_journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
write_unlock(&journal->j_state_lock);
+ jbd2_journal_put_journal_head(jh);
return 0;
} else {
/* Good, the buffer belongs to the running transaction.
@@ -2356,11 +2347,10 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
* here.
*/
jh->b_modified = 0;
- jbd2_journal_put_journal_head(jh);
-zap_buffer_no_jh:
spin_unlock(&journal->j_list_lock);
- jbd_unlock_bh_state(bh);
+ spin_unlock(&jh->b_state_lock);
write_unlock(&journal->j_state_lock);
+ jbd2_journal_put_journal_head(jh);
zap_buffer_unlocked:
clear_buffer_dirty(bh);
J_ASSERT_BH(bh, !buffer_jbddirty(bh));
@@ -2447,7 +2437,7 @@ void __jbd2_journal_file_buffer(struct journal_head *jh,
int was_dirty = 0;
struct buffer_head *bh = jh2bh(jh);
- J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
+ lockdep_assert_held(&jh->b_state_lock);
assert_spin_locked(&transaction->t_journal->j_list_lock);
J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
@@ -2509,11 +2499,11 @@ void __jbd2_journal_file_buffer(struct journal_head *jh,
void jbd2_journal_file_buffer(struct journal_head *jh,
transaction_t *transaction, int jlist)
{
- jbd_lock_bh_state(jh2bh(jh));
+ spin_lock(&jh->b_state_lock);
spin_lock(&transaction->t_journal->j_list_lock);
__jbd2_journal_file_buffer(jh, transaction, jlist);
spin_unlock(&transaction->t_journal->j_list_lock);
- jbd_unlock_bh_state(jh2bh(jh));
+ spin_unlock(&jh->b_state_lock);
}
/*
@@ -2523,23 +2513,25 @@ void jbd2_journal_file_buffer(struct journal_head *jh,
* buffer on that transaction's metadata list.
*
* Called under j_list_lock
- * Called under jbd_lock_bh_state(jh2bh(jh))
+ * Called under jh->b_state_lock
*
- * jh and bh may be already free when this function returns
+ * When this function returns true, there's no next transaction to refile to
+ * and the caller has to drop jh reference through
+ * jbd2_journal_put_journal_head().
*/
-void __jbd2_journal_refile_buffer(struct journal_head *jh)
+bool __jbd2_journal_refile_buffer(struct journal_head *jh)
{
int was_dirty, jlist;
struct buffer_head *bh = jh2bh(jh);
- J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
+ lockdep_assert_held(&jh->b_state_lock);
if (jh->b_transaction)
assert_spin_locked(&jh->b_transaction->t_journal->j_list_lock);
/* If the buffer is now unused, just drop it. */
if (jh->b_next_transaction == NULL) {
__jbd2_journal_unfile_buffer(jh);
- return;
+ return true;
}
/*
@@ -2574,6 +2566,7 @@ void __jbd2_journal_refile_buffer(struct journal_head *jh)
if (was_dirty)
set_buffer_jbddirty(bh);
+ return false;
}
/*
@@ -2584,16 +2577,15 @@ void __jbd2_journal_refile_buffer(struct journal_head *jh)
*/
void jbd2_journal_refile_buffer(journal_t *journal, struct journal_head *jh)
{
- struct buffer_head *bh = jh2bh(jh);
+ bool drop;
- /* Get reference so that buffer cannot be freed before we unlock it */
- get_bh(bh);
- jbd_lock_bh_state(bh);
+ spin_lock(&jh->b_state_lock);
spin_lock(&journal->j_list_lock);
- __jbd2_journal_refile_buffer(jh);
- jbd_unlock_bh_state(bh);
+ drop = __jbd2_journal_refile_buffer(jh);
+ spin_unlock(&jh->b_state_lock);
spin_unlock(&journal->j_list_lock);
- __brelse(bh);
+ if (drop)
+ jbd2_journal_put_journal_head(jh);
}
/*
diff --git a/fs/namei.c b/fs/namei.c
index 5b5759d70822..2d1dce3fa9b3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1638,7 +1638,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
{
struct dentry *dentry, *old;
struct inode *inode = dir->d_inode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
/* Don't go there if it's already dead */
if (unlikely(IS_DEADDIR(inode)))
@@ -3126,7 +3126,7 @@ static int lookup_open(struct nameidata *nd, struct path *path,
struct dentry *dentry;
int error, create_error = 0;
umode_t mode = op->mode;
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
if (unlikely(IS_DEADDIR(dir_inode)))
return -ENOENT;
diff --git a/fs/namespace.c b/fs/namespace.c
index 2adfe7b166a3..e67f6e405f36 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -14,6 +14,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -321,8 +322,11 @@ int __mnt_want_write(struct vfsmount *m)
* incremented count after it has set MNT_WRITE_HOLD.
*/
smp_mb();
- while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
- cpu_relax();
+ while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
+ preempt_enable();
+ cpu_chill();
+ preempt_disable();
+ }
/*
* After the slowpath clears MNT_WRITE_HOLD, mnt_is_readonly will
* be set to match its requirements. So we must not load that until
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index af549d70ec50..86a72d0d3817 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -162,11 +162,11 @@ static int nfs_delegation_claim_opens(struct inode *inode,
sp = state->owner;
/* Block nfs4_proc_unlck */
mutex_lock(&sp->so_delegreturn_mutex);
- seq = raw_seqcount_begin(&sp->so_reclaim_seqcount);
+ seq = read_seqbegin(&sp->so_reclaim_seqlock);
err = nfs4_open_delegation_recall(ctx, state, stateid);
if (!err)
err = nfs_delegation_claim_locks(state, stateid);
- if (!err && read_seqcount_retry(&sp->so_reclaim_seqcount, seq))
+ if (!err && read_seqretry(&sp->so_reclaim_seqlock, seq))
err = -EAGAIN;
mutex_unlock(&sp->so_delegreturn_mutex);
put_nfs_open_context(ctx);
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 188b17a3b19e..3ec873ad794b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -457,7 +457,7 @@ static
void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
{
struct qstr filename = QSTR_INIT(entry->name, entry->len);
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
struct dentry *dentry;
struct dentry *alias;
struct inode *dir = d_inode(parent);
@@ -1520,7 +1520,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned open_flags,
umode_t mode)
{
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
struct nfs_open_context *ctx;
struct dentry *res;
struct iattr attr = { .ia_valid = ATTR_OPEN };
@@ -1849,7 +1849,11 @@ int nfs_rmdir(struct inode *dir, struct dentry *dentry)
trace_nfs_rmdir_enter(dir, dentry);
if (d_really_is_positive(dentry)) {
+#ifdef CONFIG_PREEMPT_RT
+ down(&NFS_I(d_inode(dentry))->rmdir_sem);
+#else
down_write(&NFS_I(d_inode(dentry))->rmdir_sem);
+#endif
error = NFS_PROTO(dir)->rmdir(dir, &dentry->d_name);
/* Ensure the VFS deletes this inode */
switch (error) {
@@ -1859,7 +1863,11 @@ int nfs_rmdir(struct inode *dir, struct dentry *dentry)
case -ENOENT:
nfs_dentry_handle_enoent(dentry);
}
+#ifdef CONFIG_PREEMPT_RT
+ up(&NFS_I(d_inode(dentry))->rmdir_sem);
+#else
up_write(&NFS_I(d_inode(dentry))->rmdir_sem);
+#endif
} else
error = NFS_PROTO(dir)->rmdir(dir, &dentry->d_name);
trace_nfs_rmdir_exit(dir, dentry, error);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6de41f741280..015ec938a527 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2113,7 +2113,11 @@ static void init_once(void *foo)
atomic_long_set(&nfsi->nrequests, 0);
atomic_long_set(&nfsi->commit_info.ncommit, 0);
atomic_set(&nfsi->commit_info.rpcs_out, 0);
+#ifdef CONFIG_PREEMPT_RT
+ sema_init(&nfsi->rmdir_sem, 1);
+#else
init_rwsem(&nfsi->rmdir_sem);
+#endif
mutex_init(&nfsi->commit_mutex);
nfs4_init_once(nfsi);
}
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index bb322d9de313..68b8ccb53fe3 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -115,7 +115,7 @@ struct nfs4_state_owner {
unsigned long so_flags;
struct list_head so_states;
struct nfs_seqid_counter so_seqid;
- seqcount_t so_reclaim_seqcount;
+ seqlock_t so_reclaim_seqlock;
struct mutex so_delegreturn_mutex;
};
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 00435556db0c..40442546a767 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2983,7 +2983,7 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
unsigned int seq;
int ret;
- seq = raw_seqcount_begin(&sp->so_reclaim_seqcount);
+ seq = raw_seqcount_begin(&sp->so_reclaim_seqlock.seqcount);
dir_verifier = nfs_save_change_attribute(dir);
ret = _nfs4_proc_open(opendata, ctx);
@@ -3037,7 +3037,7 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
if (d_inode(dentry) == state->inode) {
nfs_inode_attach_open_context(ctx);
- if (read_seqcount_retry(&sp->so_reclaim_seqcount, seq))
+ if (read_seqretry(&sp->so_reclaim_seqlock, seq))
nfs4_schedule_stateid_recovery(server, state);
}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index ea680f619438..09143fa7e6aa 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -508,7 +508,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
nfs4_init_seqid_counter(&sp->so_seqid);
atomic_set(&sp->so_count, 1);
INIT_LIST_HEAD(&sp->so_lru);
- seqcount_init(&sp->so_reclaim_seqcount);
+ seqlock_init(&sp->so_reclaim_seqlock);
mutex_init(&sp->so_delegreturn_mutex);
return sp;
}
@@ -1616,8 +1616,12 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
* recovering after a network partition or a reboot from a
* server that doesn't support a grace period.
*/
+#ifdef CONFIG_PREEMPT_RT
+ write_seqlock(&sp->so_reclaim_seqlock);
+#else
+ write_seqcount_begin(&sp->so_reclaim_seqlock.seqcount);
+#endif
spin_lock(&sp->so_lock);
- raw_write_seqcount_begin(&sp->so_reclaim_seqcount);
restart:
list_for_each_entry(state, &sp->so_states, open_states) {
if (!test_and_clear_bit(ops->state_flag_bit, &state->flags))
@@ -1678,14 +1682,20 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
spin_lock(&sp->so_lock);
goto restart;
}
- raw_write_seqcount_end(&sp->so_reclaim_seqcount);
spin_unlock(&sp->so_lock);
+#ifdef CONFIG_PREEMPT_RT
+ write_sequnlock(&sp->so_reclaim_seqlock);
+#else
+ write_seqcount_end(&sp->so_reclaim_seqlock.seqcount);
+#endif
return 0;
out_err:
nfs4_put_open_state(state);
- spin_lock(&sp->so_lock);
- raw_write_seqcount_end(&sp->so_reclaim_seqcount);
- spin_unlock(&sp->so_lock);
+#ifdef CONFIG_PREEMPT_RT
+ write_sequnlock(&sp->so_reclaim_seqlock);
+#else
+ write_seqcount_end(&sp->so_reclaim_seqlock.seqcount);
+#endif
return status;
}
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 0effeee28352..efa5adce9b30 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -13,7 +13,7 @@
#include
#include
#include
-#include
+#include
#include
#include
@@ -53,6 +53,29 @@ static void nfs_async_unlink_done(struct rpc_task *task, void *calldata)
rpc_restart_call_prepare(task);
}
+#ifdef CONFIG_PREEMPT_RT
+static void nfs_down_anon(struct semaphore *sema)
+{
+ down(sema);
+}
+
+static void nfs_up_anon(struct semaphore *sema)
+{
+ up(sema);
+}
+
+#else
+static void nfs_down_anon(struct rw_semaphore *rwsem)
+{
+ down_read_non_owner(rwsem);
+}
+
+static void nfs_up_anon(struct rw_semaphore *rwsem)
+{
+ up_read_non_owner(rwsem);
+}
+#endif
+
/**
* nfs_async_unlink_release - Release the sillydelete data.
* @calldata: struct nfs_unlinkdata to release
@@ -66,7 +89,7 @@ static void nfs_async_unlink_release(void *calldata)
struct dentry *dentry = data->dentry;
struct super_block *sb = dentry->d_sb;
- up_read_non_owner(&NFS_I(d_inode(dentry->d_parent))->rmdir_sem);
+ nfs_up_anon(&NFS_I(d_inode(dentry->d_parent))->rmdir_sem);
d_lookup_done(dentry);
nfs_free_unlinkdata(data);
dput(dentry);
@@ -119,10 +142,10 @@ static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
struct inode *dir = d_inode(dentry->d_parent);
struct dentry *alias;
- down_read_non_owner(&NFS_I(dir)->rmdir_sem);
+ nfs_down_anon(&NFS_I(dir)->rmdir_sem);
alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq);
if (IS_ERR(alias)) {
- up_read_non_owner(&NFS_I(dir)->rmdir_sem);
+ nfs_up_anon(&NFS_I(dir)->rmdir_sem);
return 0;
}
if (!d_in_lookup(alias)) {
@@ -144,7 +167,7 @@ static int nfs_call_unlink(struct dentry *dentry, struct inode *inode, struct nf
ret = 0;
spin_unlock(&alias->d_lock);
dput(alias);
- up_read_non_owner(&NFS_I(dir)->rmdir_sem);
+ nfs_up_anon(&NFS_I(dir)->rmdir_sem);
/*
* If we'd displaced old cached devname, free it. At that
* point dentry is definitely not a root, so we won't need
@@ -180,7 +203,7 @@ nfs_async_unlink(struct dentry *dentry, const struct qstr *name)
data->cred = get_current_cred();
data->res.dir_attr = &data->dir_attr;
- init_waitqueue_head(&data->wq);
+ init_swait_queue_head(&data->wq);
status = -EBUSY;
spin_lock(&dentry->d_lock);
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 7202a1e39d70..554b744f41bf 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -92,8 +92,7 @@ static void ntfs_end_buffer_async_read(struct buffer_head *bh, int uptodate)
"0x%llx.", (unsigned long long)bh->b_blocknr);
}
first = page_buffers(page);
- local_irq_save(flags);
- bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+ spin_lock_irqsave(&first->b_uptodate_lock, flags);
clear_buffer_async_read(bh);
unlock_buffer(bh);
tmp = bh;
@@ -108,8 +107,7 @@ static void ntfs_end_buffer_async_read(struct buffer_head *bh, int uptodate)
}
tmp = tmp->b_this_page;
} while (tmp != bh);
- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
/*
* If none of the buffers had errors then we can set the page uptodate,
* but we first have to perform the post read mst fixups, if the
@@ -142,8 +140,7 @@ static void ntfs_end_buffer_async_read(struct buffer_head *bh, int uptodate)
unlock_page(page);
return;
still_busy:
- bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
return;
}
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 5e0eaea47405..92225a4df817 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1252,6 +1252,7 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh,
int nr)
{
struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data;
+ struct journal_head *jh;
int ret;
if (ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap))
@@ -1260,13 +1261,14 @@ static int ocfs2_test_bg_bit_allocatable(struct buffer_head *bg_bh,
if (!buffer_jbd(bg_bh))
return 1;
- jbd_lock_bh_state(bg_bh);
- bg = (struct ocfs2_group_desc *) bh2jh(bg_bh)->b_committed_data;
+ jh = bh2jh(bg_bh);
+ spin_lock(&jh->b_state_lock);
+ bg = (struct ocfs2_group_desc *) jh->b_committed_data;
if (bg)
ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap);
else
ret = 1;
- jbd_unlock_bh_state(bg_bh);
+ spin_unlock(&jh->b_state_lock);
return ret;
}
@@ -2387,6 +2389,7 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
int status;
unsigned int tmp;
struct ocfs2_group_desc *undo_bg = NULL;
+ struct journal_head *jh;
/* The caller got this descriptor from
* ocfs2_read_group_descriptor(). Any corruption is a code bug. */
@@ -2405,10 +2408,10 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
goto bail;
}
+ jh = bh2jh(group_bh);
if (undo_fn) {
- jbd_lock_bh_state(group_bh);
- undo_bg = (struct ocfs2_group_desc *)
- bh2jh(group_bh)->b_committed_data;
+ spin_lock(&jh->b_state_lock);
+ undo_bg = (struct ocfs2_group_desc *) jh->b_committed_data;
BUG_ON(!undo_bg);
}
@@ -2423,7 +2426,7 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
le16_add_cpu(&bg->bg_free_bits_count, num_bits);
if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) {
if (undo_fn)
- jbd_unlock_bh_state(group_bh);
+ spin_unlock(&jh->b_state_lock);
return ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit count %u but claims %u are freed. num_bits %d\n",
(unsigned long long)le64_to_cpu(bg->bg_blkno),
le16_to_cpu(bg->bg_bits),
@@ -2432,7 +2435,7 @@ static int ocfs2_block_group_clear_bits(handle_t *handle,
}
if (undo_fn)
- jbd_unlock_bh_state(group_bh);
+ spin_unlock(&jh->b_state_lock);
ocfs2_journal_dirty(handle, group_bh);
bail:
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b690074e65ff..b19cead6bbc4 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -95,6 +95,7 @@
#include
#include
#include
+#include
#include "internal.h"
#include "fd.h"
@@ -1890,7 +1891,7 @@ bool proc_fill_cache(struct file *file, struct dir_context *ctx,
child = d_hash_and_lookup(dir, &qname);
if (!child) {
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
child = d_alloc_parallel(dir, &qname, &wq);
if (IS_ERR(child))
goto end_instantiate;
diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
index 4f4a2abb225e..4e62963a87ca 100644
--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -18,8 +18,6 @@
#include
#include
-extern wait_queue_head_t log_wait;
-
static int kmsg_open(struct inode * inode, struct file * file)
{
return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
@@ -42,7 +40,7 @@ static ssize_t kmsg_read(struct file *file, char __user *buf,
static __poll_t kmsg_poll(struct file *file, poll_table *wait)
{
- poll_wait(file, &log_wait, wait);
+ poll_wait(file, printk_wait_queue(), wait);
if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
return EPOLLIN | EPOLLRDNORM;
return 0;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index d80989b6c344..b5921f996e86 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -702,7 +702,7 @@ static bool proc_sys_fill_cache(struct file *file,
child = d_lookup(dir, &qname);
if (!child) {
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
+ DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq);
child = d_alloc_parallel(dir, &qname, &wq);
if (IS_ERR(child))
return false;
diff --git a/fs/squashfs/decompressor_multi_percpu.c b/fs/squashfs/decompressor_multi_percpu.c
index 2a2a2d106440..f8dd25676328 100644
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@ -8,6 +8,7 @@
#include
#include
#include
+#include
#include "squashfs_fs.h"
#include "squashfs_fs_sb.h"
@@ -23,6 +24,8 @@ struct squashfs_stream {
void *stream;
};
+static DEFINE_LOCAL_IRQ_LOCK(stream_lock);
+
void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
void *comp_opts)
{
@@ -77,10 +80,15 @@ int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
{
struct squashfs_stream __percpu *percpu =
(struct squashfs_stream __percpu *) msblk->stream;
- struct squashfs_stream *stream = get_cpu_ptr(percpu);
- int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
- offset, length, output);
- put_cpu_ptr(stream);
+ struct squashfs_stream *stream;
+ int res;
+
+ stream = get_locked_ptr(stream_lock, percpu);
+
+ res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
+ offset, length, output);
+
+ put_locked_ptr(stream_lock, stream);
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",
diff --git a/fs/stack.c b/fs/stack.c
index 4ef2c056269d..c9830924eb12 100644
--- a/fs/stack.c
+++ b/fs/stack.c
@@ -23,7 +23,7 @@ void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
/*
* But on 32-bit, we ought to make an effort to keep the two halves of
- * i_blocks in sync despite SMP or PREEMPT - though stat's
+ * i_blocks in sync despite SMP or PREEMPTION - though stat's
* generic_fillattr() doesn't bother, and we won't be applying quotas
* (where i_blocks does become important) at the upper level.
*
@@ -38,14 +38,14 @@ void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
spin_unlock(&src->i_lock);
/*
- * If CONFIG_SMP or CONFIG_PREEMPT on 32-bit, it's vital for
+ * If CONFIG_SMP or CONFIG_PREEMPTION on 32-bit, it's vital for
* fsstack_copy_inode_size() to hold some lock around
* i_size_write(), otherwise i_size_read() may spin forever (see
* include/linux/fs.h). We don't necessarily hold i_mutex when this
* is called, so take i_lock for that case.
*
* And if on 32-bit, continue our effort to keep the two halves of
- * i_blocks in sync despite SMP or PREEMPT: use i_lock for that case
+ * i_blocks in sync despite SMP or PREEMPTION: use i_lock for that case
* too, and do both at once by combining the tests.
*
* There is none of this locking overhead in the 64-bit case.
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index d99d166fd892..2deffbc812ee 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
/* waitqueue head for events */
wait_queue_head_t event_wqh;
/* a refile sequence protected by fault_pending_wqh lock */
- struct seqcount refile_seq;
+ seqlock_t refile_seq;
/* pseudo fd refcounting */
refcount_t refcount;
/* userfaultfd syscall flags */
@@ -1063,7 +1063,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
* waitqueue could become empty if this is the
* only userfault.
*/
- write_seqcount_begin(&ctx->refile_seq);
+ write_seqlock(&ctx->refile_seq);
/*
* The fault_pending_wqh.lock prevents the uwq
@@ -1089,7 +1089,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
list_del(&uwq->wq.entry);
add_wait_queue(&ctx->fault_wqh, &uwq->wq);
- write_seqcount_end(&ctx->refile_seq);
+ write_sequnlock(&ctx->refile_seq);
/* careful to always initialize msg if ret == 0 */
*msg = uwq->msg;
@@ -1262,11 +1262,11 @@ static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
* sure we've userfaults to wake.
*/
do {
- seq = read_seqcount_begin(&ctx->refile_seq);
+ seq = read_seqbegin(&ctx->refile_seq);
need_wakeup = waitqueue_active(&ctx->fault_pending_wqh) ||
waitqueue_active(&ctx->fault_wqh);
cond_resched();
- } while (read_seqcount_retry(&ctx->refile_seq, seq));
+ } while (read_seqretry(&ctx->refile_seq, seq));
if (need_wakeup)
__wake_userfault(ctx, range);
}
@@ -1939,7 +1939,7 @@ static void init_once_userfaultfd_ctx(void *mem)
init_waitqueue_head(&ctx->fault_wqh);
init_waitqueue_head(&ctx->event_wqh);
init_waitqueue_head(&ctx->fd_wqh);
- seqcount_init(&ctx->refile_seq);
+ seqlock_init(&ctx->refile_seq);
}
SYSCALL_DEFINE1(userfaultfd, int, flags)
diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index a19519f4241d..ef2366a65ba2 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -4,6 +4,10 @@
#include
+#ifdef CONFIG_PREEMPT_RT
+extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
+#else
+
#ifdef CONFIG_TRACE_IRQFLAGS
extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt);
#else
@@ -13,6 +17,7 @@ static __always_inline void __local_bh_disable_ip(unsigned long ip, unsigned int
barrier();
}
#endif
+#endif
static inline void local_bh_disable(void)
{
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7aa0d8b5aaf0..576f008f16c0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -541,7 +541,7 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
struct bpf_prog *_prog; \
struct bpf_prog_array *_array; \
u32 _ret = 1; \
- preempt_disable(); \
+ migrate_disable(); \
rcu_read_lock(); \
_array = rcu_dereference(array); \
if (unlikely(check_non_null && !_array))\
@@ -554,7 +554,7 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
} \
_out: \
rcu_read_unlock(); \
- preempt_enable(); \
+ migrate_enable(); \
_ret; \
})
@@ -588,7 +588,7 @@ _out: \
u32 ret; \
u32 _ret = 1; \
u32 _cn = 0; \
- preempt_disable(); \
+ migrate_disable(); \
rcu_read_lock(); \
_array = rcu_dereference(array); \
_item = &_array->items[0]; \
@@ -600,7 +600,7 @@ _out: \
_item++; \
} \
rcu_read_unlock(); \
- preempt_enable(); \
+ migrate_enable(); \
if (_ret) \
_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \
else \
@@ -617,6 +617,36 @@ _out: \
#ifdef CONFIG_BPF_SYSCALL
DECLARE_PER_CPU(int, bpf_prog_active);
+/*
+ * Block execution of BPF programs attached to instrumentation (perf,
+ * kprobes, tracepoints) to prevent deadlocks on map operations as any of
+ * these events can happen inside a region which holds a map bucket lock
+ * and can deadlock on it.
+ *
+ * Use the preemption safe inc/dec variants on RT because migrate disable
+ * is preemptible on RT and preemption in the middle of the RMW operation
+ * might lead to inconsistent state. Use the raw variants for non RT
+ * kernels as migrate_disable() maps to preempt_disable() so the slightly
+ * more expensive save operation can be avoided.
+ */
+static inline void bpf_disable_instrumentation(void)
+{
+ migrate_disable();
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ this_cpu_inc(bpf_prog_active);
+ else
+ __this_cpu_inc(bpf_prog_active);
+}
+
+static inline void bpf_enable_instrumentation(void)
+{
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ this_cpu_dec(bpf_prog_active);
+ else
+ __this_cpu_dec(bpf_prog_active);
+ migrate_enable();
+}
+
extern const struct file_operations bpf_map_fops;
extern const struct file_operations bpf_prog_fops;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b56cc825f64d..15b765a181b8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -22,9 +22,6 @@ enum bh_state_bits {
BH_Dirty, /* Is dirty */
BH_Lock, /* Is locked */
BH_Req, /* Has been submitted for I/O */
- BH_Uptodate_Lock,/* Used by the first bh in a page, to serialise
- * IO completion of other buffers in the page
- */
BH_Mapped, /* Has a disk mapping */
BH_New, /* Disk mapping was newly created by get_block */
@@ -76,6 +73,9 @@ struct buffer_head {
struct address_space *b_assoc_map; /* mapping this buffer is
associated with */
atomic_t b_count; /* users using this buffer_head */
+ spinlock_t b_uptodate_lock; /* Used by the first bh in a page, to
+ * serialise IO completion of other
+ * buffers in the page */
};
/*
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1ccfa3779e18..1162ab71f919 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -144,9 +144,6 @@ struct cgroup_subsys_state {
struct list_head sibling;
struct list_head children;
- /* flush target list anchored at cgrp->rstat_css_list */
- struct list_head rstat_css_node;
-
/*
* PI: Subsys-unique ID. 0 is unused and root is always 1. The
* matching css can be looked up using css_from_id().
@@ -455,7 +452,6 @@ struct cgroup {
/* per-cpu recursive resource statistics */
struct cgroup_rstat_cpu __percpu *rstat_cpu;
- struct list_head rstat_css_list;
/* cgroup basic resource statistics */
struct cgroup_base_stat pending_bstat; /* pending from children */
@@ -633,7 +629,6 @@ struct cgroup_subsys {
void (*css_released)(struct cgroup_subsys_state *css);
void (*css_free)(struct cgroup_subsys_state *css);
void (*css_reset)(struct cgroup_subsys_state *css);
- void (*css_rstat_flush)(struct cgroup_subsys_state *css, int cpu);
int (*css_extra_stat_show)(struct seq_file *seq,
struct cgroup_subsys_state *css);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 202852383ae9..1843a1a4ac34 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -751,9 +751,6 @@ static inline void cgroup_path_from_kernfs_id(const union kernfs_node_id *id,
*/
void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
-void cgroup_rstat_flush_hold(struct cgroup *cgrp);
-void cgroup_rstat_flush_release(void);
/*
* Basic resource stats.
diff --git a/include/linux/completion.h b/include/linux/completion.h
index 519e94915d18..bf8e77001f18 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,7 +9,7 @@
* See kernel/sched/completion.c for details.
*/
-#include
+#include
/*
* struct completion - structure used to maintain state for a "completion"
@@ -25,7 +25,7 @@
*/
struct completion {
unsigned int done;
- wait_queue_head_t wait;
+ struct swait_queue_head wait;
};
#define init_completion_map(x, m) __init_completion(x)
@@ -34,7 +34,7 @@ static inline void complete_acquire(struct completion *x) {}
static inline void complete_release(struct completion *x) {}
#define COMPLETION_INITIALIZER(work) \
- { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+ { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
#define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -85,7 +85,7 @@ static inline void complete_release(struct completion *x) {}
static inline void __init_completion(struct completion *x)
{
x->done = 0;
- init_waitqueue_head(&x->wait);
+ init_swait_queue_head(&x->wait);
}
/**
diff --git a/include/linux/console.h b/include/linux/console.h
index d09951d5a94e..39d1bb43b4f2 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -145,6 +145,7 @@ static inline int con_debug_leave(void)
struct console {
char name[16];
void (*write)(struct console *, const char *, unsigned);
+ void (*write_atomic)(struct console *, const char *, unsigned);
int (*read)(struct console *, char *, unsigned);
struct tty_driver *(*device)(struct console *, int *);
void (*unblank)(void);
@@ -153,6 +154,8 @@ struct console {
short flags;
short index;
int cflag;
+ unsigned long printk_seq;
+ int wrote_history;
void *data;
struct console *next;
};
@@ -234,4 +237,7 @@ extern void console_init(void);
void dummycon_register_output_notifier(struct notifier_block *nb);
void dummycon_unregister_output_notifier(struct notifier_block *nb);
+extern void console_atomic_lock(unsigned int *flags);
+extern void console_atomic_unlock(unsigned int flags);
+
#endif /* _LINUX_CONSOLE_H */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 10090f11ab95..ac021d808c71 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -106,7 +106,7 @@ struct dentry {
union {
struct list_head d_lru; /* LRU list */
- wait_queue_head_t *d_wait; /* in-lookup ones only */
+ struct swait_queue_head *d_wait; /* in-lookup ones only */
};
struct list_head d_child; /* child of parent list */
struct list_head d_subdirs; /* our children */
@@ -236,7 +236,7 @@ extern void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op
extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_anon(struct super_block *);
extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
- wait_queue_head_t *);
+ struct swait_queue_head *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
extern struct dentry * d_exact_alias(struct dentry *, struct inode *);
diff --git a/include/linux/delay.h b/include/linux/delay.h
index 8e6828094c1e..7f2df975a53e 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -65,4 +65,10 @@ static inline void ssleep(unsigned int seconds)
msleep(seconds * 1000);
}
+#ifdef CONFIG_PREEMPT_RT
+extern void cpu_chill(void);
+#else
+# define cpu_chill() cpu_relax()
+#endif
+
#endif /* defined(_LINUX_DELAY_H) */
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..81272ca122ee 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -65,13 +65,13 @@ struct dma_resv_list {
/**
* struct dma_resv - a reservation object manages fences for a buffer
* @lock: update side lock
- * @seq: sequence count for managing RCU read-side synchronization
+ * @seq: sequence lock for managing RCU read-side synchronization
* @fence_excl: the exclusive fence, if there is one currently
* @fence: list of current shared fences
*/
struct dma_resv {
struct ww_mutex lock;
- seqcount_t seq;
+ seqlock_t seq;
struct dma_fence __rcu *fence_excl;
struct dma_resv_list __rcu *fence;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 79830bc9e45c..c466b3b8552f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -555,7 +555,7 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
#define BPF_PROG_RUN(prog, ctx) ({ \
u32 ret; \
- cant_sleep(); \
+ cant_migrate(); \
if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
struct bpf_prog_stats *stats; \
u64 start = sched_clock(); \
@@ -570,6 +570,28 @@ DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
} \
ret; })
+/*
+ * Use in preemptible and therefore migratable context to make sure that
+ * the execution of the BPF program runs on one CPU.
+ *
+ * This uses migrate_disable/enable() explicitly to document that the
+ * invocation of a BPF program does not require reentrancy protection
+ * against a BPF program which is invoked from a preempting task.
+ *
+ * For non RT enabled kernels migrate_disable/enable() maps to
+ * preempt_disable/enable(), i.e. it disables also preemption.
+ */
+static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog,
+ const void *ctx)
+{
+ u32 ret;
+
+ migrate_disable();
+ ret = BPF_PROG_RUN(prog, ctx);
+ migrate_enable();
+ return ret;
+}
+
#define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
struct bpf_skb_data_end {
@@ -647,6 +669,7 @@ static inline u8 *bpf_skb_cb(struct sk_buff *skb)
return qdisc_skb_cb(skb)->data;
}
+/* Must be invoked with migration disabled */
static inline u32 __bpf_prog_run_save_cb(const struct bpf_prog *prog,
struct sk_buff *skb)
{
@@ -672,9 +695,9 @@ static inline u32 bpf_prog_run_save_cb(const struct bpf_prog *prog,
{
u32 res;
- preempt_disable();
+ migrate_disable();
res = __bpf_prog_run_save_cb(prog, skb);
- preempt_enable();
+ migrate_enable();
return res;
}
@@ -687,9 +710,7 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
if (unlikely(prog->cb_access))
memset(cb_data, 0, BPF_SKB_CB_LEN);
- preempt_disable();
- res = BPF_PROG_RUN(prog, skb);
- preempt_enable();
+ res = bpf_prog_run_pin_on_cpu(prog, skb);
return res;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4c82683e034a..a0f3b71f040b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -717,7 +717,7 @@ struct inode {
struct block_device *i_bdev;
struct cdev *i_cdev;
char *i_link;
- unsigned i_dir_seq;
+ unsigned __i_dir_seq;
};
__u32 i_generation;
@@ -856,7 +856,7 @@ static inline loff_t i_size_read(const struct inode *inode)
i_size = inode->i_size;
} while (read_seqcount_retry(&inode->i_size_seqcount, seq));
return i_size;
-#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
loff_t i_size;
preempt_disable();
@@ -881,7 +881,7 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
inode->i_size = i_size;
write_seqcount_end(&inode->i_size_seqcount);
preempt_enable();
-#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
preempt_disable();
inode->i_size = i_size;
preempt_enable();
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index ad044c0cb1f3..164bfe4d207d 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -226,6 +226,7 @@ extern void __fscache_readpages_cancel(struct fscache_cookie *cookie,
extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool);
extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t,
bool (*)(void *), void *);
+extern void fscache_cookie_init(void);
/**
* fscache_register_netfs - Register a filesystem as desiring caching services
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index c1bf9956256f..48886ae362a9 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -717,7 +717,7 @@ static inline void hd_free_part(struct hd_struct *part)
* accessor function.
*
* Code written along the lines of i_size_read() and i_size_write().
- * CONFIG_PREEMPT case optimizes the case of UP kernel with preemption
+ * CONFIG_PREEMPTION case optimizes the case of UP kernel with preemption
* on.
*/
static inline sector_t part_nr_sects_read(struct hd_struct *part)
@@ -730,7 +730,7 @@ static inline sector_t part_nr_sects_read(struct hd_struct *part)
nr_sects = part->nr_sects;
} while (read_seqcount_retry(&part->nr_sects_seq, seq));
return nr_sects;
-#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
sector_t nr_sects;
preempt_disable();
@@ -755,7 +755,7 @@ static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
part->nr_sects = size;
write_seqcount_end(&part->nr_sects_seq);
preempt_enable();
-#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION)
preempt_disable();
part->nr_sects = size;
preempt_enable();
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index da0af631ded5..8d7bb3c0ff06 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -68,7 +68,6 @@ extern void irq_exit(void);
#define nmi_enter() \
do { \
arch_nmi_enter(); \
- printk_nmi_enter(); \
lockdep_off(); \
ftrace_nmi_enter(); \
BUG_ON(in_nmi()); \
@@ -85,7 +84,6 @@ extern void irq_exit(void);
preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
ftrace_nmi_exit(); \
lockdep_on(); \
- printk_nmi_exit(); \
arch_nmi_exit(); \
} while (0)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ea5cdbd8c2c3..4191b158c66a 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -8,6 +8,7 @@
#include
#include
#include
+#include
#include
@@ -90,7 +91,7 @@ static inline void kunmap(struct page *page)
static inline void *kmap_atomic(struct page *page)
{
- preempt_disable();
+ preempt_disable_nort();
pagefault_disable();
return page_address(page);
}
@@ -99,7 +100,7 @@ static inline void *kmap_atomic(struct page *page)
static inline void __kunmap_atomic(void *addr)
{
pagefault_enable();
- preempt_enable();
+ preempt_enable_nort();
}
#define kmap_atomic_pfn(pfn) kmap_atomic(pfn_to_page(pfn))
@@ -111,32 +112,51 @@ static inline void __kunmap_atomic(void *addr)
#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
+#ifndef CONFIG_PREEMPT_RT
DECLARE_PER_CPU(int, __kmap_atomic_idx);
+#endif
static inline int kmap_atomic_idx_push(void)
{
+#ifndef CONFIG_PREEMPT_RT
int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
-#ifdef CONFIG_DEBUG_HIGHMEM
+# ifdef CONFIG_DEBUG_HIGHMEM
WARN_ON_ONCE(in_irq() && !irqs_disabled());
BUG_ON(idx >= KM_TYPE_NR);
-#endif
+# endif
return idx;
+#else
+ current->kmap_idx++;
+ BUG_ON(current->kmap_idx > KM_TYPE_NR);
+ return current->kmap_idx - 1;
+#endif
}
static inline int kmap_atomic_idx(void)
{
+#ifndef CONFIG_PREEMPT_RT
return __this_cpu_read(__kmap_atomic_idx) - 1;
+#else
+ return current->kmap_idx - 1;
+#endif
}
static inline void kmap_atomic_idx_pop(void)
{
-#ifdef CONFIG_DEBUG_HIGHMEM
+#ifndef CONFIG_PREEMPT_RT
+# ifdef CONFIG_DEBUG_HIGHMEM
int idx = __this_cpu_dec_return(__kmap_atomic_idx);
BUG_ON(idx < 0);
-#else
+# else
__this_cpu_dec(__kmap_atomic_idx);
+# endif
+#else
+ current->kmap_idx--;
+# ifdef CONFIG_DEBUG_HIGHMEM
+ BUG_ON(current->kmap_idx < 0);
+# endif
#endif
}
diff --git a/include/linux/idr.h b/include/linux/idr.h
index ac6e946b6767..839da8f2f6f1 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -169,10 +169,7 @@ static inline bool idr_is_empty(const struct idr *idr)
* Each idr_preload() should be matched with an invocation of this
* function. See idr_preload() for details.
*/
-static inline void idr_preload_end(void)
-{
- preempt_enable();
-}
+void idr_preload_end(void);
/**
* idr_for_each_entry() - Iterate over an IDR's elements of a given type.
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 89fc59dab57d..029042f474c6 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -546,7 +546,7 @@ struct softirq_action
asmlinkage void do_softirq(void);
asmlinkage void __do_softirq(void);
-#ifdef __ARCH_HAS_DO_SOFTIRQ
+#if defined(__ARCH_HAS_DO_SOFTIRQ) && !defined(CONFIG_PREEMPT_RT)
void do_softirq_own_stack(void);
#else
static inline void do_softirq_own_stack(void)
@@ -561,6 +561,7 @@ extern void __raise_softirq_irqoff(unsigned int nr);
extern void raise_softirq_irqoff(unsigned int nr);
extern void raise_softirq(unsigned int nr);
+extern void softirq_check_pending_idle(void);
DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
@@ -625,7 +626,10 @@ static inline void tasklet_unlock(struct tasklet_struct *t)
static inline void tasklet_unlock_wait(struct tasklet_struct *t)
{
- while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
+ while (test_bit(TASKLET_STATE_RUN, &(t)->state)) {
+ local_bh_disable();
+ local_bh_enable();
+ }
}
#else
#define tasklet_trylock(t) 1
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index b11fcdfd0770..2964f4526f79 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -18,6 +18,8 @@
/* Doesn't want IPI, wait for tick: */
#define IRQ_WORK_LAZY BIT(2)
+/* Run hard IRQ context, even on RT */
+#define IRQ_WORK_HARD_IRQ BIT(3)
#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)
@@ -52,4 +54,10 @@ static inline bool irq_work_needs_cpu(void) { return false; }
static inline void irq_work_run(void) { }
#endif
+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT)
+void irq_work_tick_soft(void);
+#else
+static inline void irq_work_tick_soft(void) { }
+#endif
+
#endif /* _LINUX_IRQ_WORK_H */
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index d6e2ab538ef2..1e7fc375c36d 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -72,6 +72,7 @@ struct irq_desc {
unsigned int irqs_unhandled;
atomic_t threads_handled;
int threads_handled_last;
+ u64 random_ip;
raw_spinlock_t lock;
struct cpumask *percpu_enabled;
const struct cpumask *percpu_affinity;
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..5304de2f7838 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -43,14 +43,6 @@ do { \
do { \
current->hardirq_context--; \
} while (0)
-# define lockdep_softirq_enter() \
-do { \
- current->softirq_context++; \
-} while (0)
-# define lockdep_softirq_exit() \
-do { \
- current->softirq_context--; \
-} while (0)
#else
# define trace_hardirqs_on() do { } while (0)
# define trace_hardirqs_off() do { } while (0)
@@ -64,6 +56,21 @@ do { \
# define lockdep_softirq_exit() do { } while (0)
#endif
+#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PREEMPT_RT)
+# define lockdep_softirq_enter() \
+do { \
+ current->softirq_context++; \
+} while (0)
+# define lockdep_softirq_exit() \
+do { \
+ current->softirq_context--; \
+} while (0)
+
+#else
+# define lockdep_softirq_enter() do { } while (0)
+# define lockdep_softirq_exit() do { } while (0)
+#endif
+
#if defined(CONFIG_IRQSOFF_TRACER) || \
defined(CONFIG_PREEMPT_TRACER)
extern void stop_critical_timings(void);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b0e97e5de8ca..e8b4a1547893 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -313,7 +313,6 @@ enum jbd_state_bits {
BH_Revoked, /* Has been revoked from the log */
BH_RevokeValid, /* Revoked flag is valid */
BH_JBDDirty, /* Is dirty but journaled */
- BH_State, /* Pins most journal_head state */
BH_JournalHead, /* Pins bh->b_private and jh->b_bh */
BH_Shadow, /* IO on shadow buffer is running */
BH_Verified, /* Metadata block has been verified ok */
@@ -342,26 +341,6 @@ static inline struct journal_head *bh2jh(struct buffer_head *bh)
return bh->b_private;
}
-static inline void jbd_lock_bh_state(struct buffer_head *bh)
-{
- bit_spin_lock(BH_State, &bh->b_state);
-}
-
-static inline int jbd_trylock_bh_state(struct buffer_head *bh)
-{
- return bit_spin_trylock(BH_State, &bh->b_state);
-}
-
-static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
-{
- return bit_spin_is_locked(BH_State, &bh->b_state);
-}
-
-static inline void jbd_unlock_bh_state(struct buffer_head *bh)
-{
- bit_spin_unlock(BH_State, &bh->b_state);
-}
-
static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
{
bit_spin_lock(BH_JournalHead, &bh->b_state);
@@ -556,9 +535,9 @@ struct transaction_chp_stats_s {
* ->jbd_lock_bh_journal_head() (This is "innermost")
*
* j_state_lock
- * ->jbd_lock_bh_state()
+ * ->b_state_lock
*
- * jbd_lock_bh_state()
+ * b_state_lock
* ->j_list_lock
*
* j_state_lock
@@ -1257,7 +1236,7 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
/* Filing buffers */
extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *);
-extern void __jbd2_journal_refile_buffer(struct journal_head *);
+extern bool __jbd2_journal_refile_buffer(struct journal_head *);
extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *);
extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
extern void __journal_free_buffer(struct journal_head *bh);
diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 9fb870524314..75bc56109031 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -11,6 +11,8 @@
#ifndef JOURNAL_HEAD_H_INCLUDED
#define JOURNAL_HEAD_H_INCLUDED
+#include
+
typedef unsigned int tid_t; /* Unique transaction ID */
typedef struct transaction_s transaction_t; /* Compound transaction type */
@@ -23,6 +25,11 @@ struct journal_head {
*/
struct buffer_head *b_bh;
+ /*
+ * Protect the buffer head state
+ */
+ spinlock_t b_state_lock;
+
/*
* Reference count - see description in journal.c
* [jbd_lock_bh_journal_head()]
@@ -30,7 +37,7 @@ struct journal_head {
int b_jcount;
/*
- * Journalling list for this buffer [jbd_lock_bh_state()]
+ * Journalling list for this buffer [b_state_lock]
* NOTE: We *cannot* combine this with b_modified into a bitfield
* as gcc would then (which the C standard allows but which is
* very unuseful) make 64-bit accesses to the bitfield and clobber
@@ -41,20 +48,20 @@ struct journal_head {
/*
* This flag signals the buffer has been modified by
* the currently running transaction
- * [jbd_lock_bh_state()]
+ * [b_state_lock]
*/
unsigned b_modified;
/*
* Copy of the buffer data frozen for writing to the log.
- * [jbd_lock_bh_state()]
+ * [b_state_lock]
*/
char *b_frozen_data;
/*
* Pointer to a saved copy of the buffer containing no uncommitted
* deallocation references, so that allocations can avoid overwriting
- * uncommitted deletes. [jbd_lock_bh_state()]
+ * uncommitted deletes. [b_state_lock]
*/
char *b_committed_data;
@@ -63,7 +70,7 @@ struct journal_head {
* metadata: either the running transaction or the committing
* transaction (if there is one). Only applies to buffers on a
* transaction's data or metadata journaling list.
- * [j_list_lock] [jbd_lock_bh_state()]
+ * [j_list_lock] [b_state_lock]
* Either of these locks is enough for reading, both are needed for
* changes.
*/
@@ -73,13 +80,13 @@ struct journal_head {
* Pointer to the running compound transaction which is currently
* modifying the buffer's metadata, if there was already a transaction
* committing it when the new transaction touched it.
- * [t_list_lock] [jbd_lock_bh_state()]
+ * [t_list_lock] [b_state_lock]
*/
transaction_t *b_next_transaction;
/*
* Doubly-linked list of buffers on a transaction's data, metadata or
- * forget queue. [t_list_lock] [jbd_lock_bh_state()]
+ * forget queue. [t_list_lock] [b_state_lock]
*/
struct journal_head *b_tnext, *b_tprev;
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d83d403dac2e..f5ec1ddbfe07 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -227,6 +227,10 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
*/
# define might_sleep() \
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+
+# define might_sleep_no_state_check() \
+ do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
+
/**
* cant_sleep - annotation for functions that cannot sleep
*
@@ -258,6 +262,7 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
static inline void __might_sleep(const char *file, int line,
int preempt_offset) { }
# define might_sleep() do { might_resched(); } while (0)
+# define might_sleep_no_state_check() do { might_resched(); } while (0)
# define cant_sleep() do { } while (0)
# define sched_annotate_sleep() do { } while (0)
# define non_block_start() do { } while (0)
@@ -266,6 +271,13 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
#define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
+#ifndef CONFIG_PREEMPT_RT
+# define cant_migrate() cant_sleep()
+#else
+ /* Placeholder for now */
+# define cant_migrate() do { } while (0)
+#endif
+
/**
* abs - return absolute value of an argument
* @x: the value. If it is unsigned type, it is converted to signed type first.
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 2e7a1e032c71..ede5066663b9 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -46,10 +46,8 @@ struct kmsg_dumper {
bool registered;
/* private state of the kmsg iterator */
- u32 cur_idx;
- u32 next_idx;
- u64 cur_seq;
- u64 next_seq;
+ u64 line_seq;
+ u64 buffer_end_seq;
};
#ifdef CONFIG_PRINTK
diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index ae1b541446c9..83e0ec90ed0d 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -3,6 +3,7 @@
#define _LINUX_LIST_BL_H
#include
+#include
#include
/*
@@ -33,13 +34,24 @@
struct hlist_bl_head {
struct hlist_bl_node *first;
+#ifdef CONFIG_PREEMPT_RT
+ raw_spinlock_t lock;
+#endif
};
struct hlist_bl_node {
struct hlist_bl_node *next, **pprev;
};
-#define INIT_HLIST_BL_HEAD(ptr) \
- ((ptr)->first = NULL)
+
+#ifdef CONFIG_PREEMPT_RT
+#define INIT_HLIST_BL_HEAD(h) \
+do { \
+ (h)->first = NULL; \
+ raw_spin_lock_init(&(h)->lock); \
+} while (0)
+#else
+#define INIT_HLIST_BL_HEAD(h) (h)->first = NULL
+#endif
static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
{
@@ -145,12 +157,26 @@ static inline void hlist_bl_del_init(struct hlist_bl_node *n)
static inline void hlist_bl_lock(struct hlist_bl_head *b)
{
+#ifndef CONFIG_PREEMPT_RT
bit_spin_lock(0, (unsigned long *)b);
+#else
+ raw_spin_lock(&b->lock);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+ __set_bit(0, (unsigned long *)b);
+#endif
+#endif
}
static inline void hlist_bl_unlock(struct hlist_bl_head *b)
{
+#ifndef CONFIG_PREEMPT_RT
__bit_spin_unlock(0, (unsigned long *)b);
+#else
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+ __clear_bit(0, (unsigned long *)b);
+#endif
+ raw_spin_unlock(&b->lock);
+#endif
}
static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
diff --git a/include/linux/locallock.h b/include/linux/locallock.h
new file mode 100644
index 000000000000..9b6b4def52d4
--- /dev/null
+++ b/include/linux/locallock.h
@@ -0,0 +1,282 @@
+#ifndef _LINUX_LOCALLOCK_H
+#define _LINUX_LOCALLOCK_H
+
+#include
+#include
+#include
+
+#ifdef CONFIG_PREEMPT_RT
+
+#ifdef CONFIG_DEBUG_SPINLOCK
+# define LL_WARN(cond) WARN_ON(cond)
+#else
+# define LL_WARN(cond) do { } while (0)
+#endif
+
+/*
+ * per cpu lock based substitute for local_irq_*()
+ */
+struct local_irq_lock {
+ spinlock_t lock;
+ struct task_struct *owner;
+ int nestcnt;
+ unsigned long flags;
+};
+
+#define DEFINE_LOCAL_IRQ_LOCK(lvar) \
+ DEFINE_PER_CPU(struct local_irq_lock, lvar) = { \
+ .lock = __SPIN_LOCK_UNLOCKED((lvar).lock) }
+
+#define DECLARE_LOCAL_IRQ_LOCK(lvar) \
+ DECLARE_PER_CPU(struct local_irq_lock, lvar)
+
+#define local_irq_lock_init(lvar) \
+ do { \
+ int __cpu; \
+ for_each_possible_cpu(__cpu) \
+ spin_lock_init(&per_cpu(lvar, __cpu).lock); \
+ } while (0)
+
+static inline void __local_lock(struct local_irq_lock *lv)
+{
+ if (lv->owner != current) {
+ spin_lock(&lv->lock);
+ LL_WARN(lv->owner);
+ LL_WARN(lv->nestcnt);
+ lv->owner = current;
+ }
+ lv->nestcnt++;
+}
+
+#define local_lock(lvar) \
+ do { __local_lock(&get_local_var(lvar)); } while (0)
+
+#define local_lock_on(lvar, cpu) \
+ do { __local_lock(&per_cpu(lvar, cpu)); } while (0)
+
+static inline int __local_trylock(struct local_irq_lock *lv)
+{
+ if (lv->owner != current && spin_trylock(&lv->lock)) {
+ LL_WARN(lv->owner);
+ LL_WARN(lv->nestcnt);
+ lv->owner = current;
+ lv->nestcnt = 1;
+ return 1;
+ } else if (lv->owner == current) {
+ lv->nestcnt++;
+ return 1;
+ }
+ return 0;
+}
+
+#define local_trylock(lvar) \
+ ({ \
+ int __locked; \
+ __locked = __local_trylock(&get_local_var(lvar)); \
+ if (!__locked) \
+ put_local_var(lvar); \
+ __locked; \
+ })
+
+static inline void __local_unlock(struct local_irq_lock *lv)
+{
+ LL_WARN(lv->nestcnt == 0);
+ LL_WARN(lv->owner != current);
+ if (--lv->nestcnt)
+ return;
+
+ lv->owner = NULL;
+ spin_unlock(&lv->lock);
+}
+
+#define local_unlock(lvar) \
+ do { \
+ __local_unlock(this_cpu_ptr(&lvar)); \
+ put_local_var(lvar); \
+ } while (0)
+
+#define local_unlock_on(lvar, cpu) \
+ do { __local_unlock(&per_cpu(lvar, cpu)); } while (0)
+
+static inline void __local_lock_irq(struct local_irq_lock *lv)
+{
+ spin_lock_irqsave(&lv->lock, lv->flags);
+ LL_WARN(lv->owner);
+ LL_WARN(lv->nestcnt);
+ lv->owner = current;
+ lv->nestcnt = 1;
+}
+
+#define local_lock_irq(lvar) \
+ do { __local_lock_irq(&get_local_var(lvar)); } while (0)
+
+#define local_lock_irq_on(lvar, cpu) \
+ do { __local_lock_irq(&per_cpu(lvar, cpu)); } while (0)
+
+static inline void __local_unlock_irq(struct local_irq_lock *lv)
+{
+ LL_WARN(!lv->nestcnt);
+ LL_WARN(lv->owner != current);
+ lv->owner = NULL;
+ lv->nestcnt = 0;
+ spin_unlock_irq(&lv->lock);
+}
+
+#define local_unlock_irq(lvar) \
+ do { \
+ __local_unlock_irq(this_cpu_ptr(&lvar)); \
+ put_local_var(lvar); \
+ } while (0)
+
+#define local_unlock_irq_on(lvar, cpu) \
+ do { \
+ __local_unlock_irq(&per_cpu(lvar, cpu)); \
+ } while (0)
+
+static inline int __local_lock_irqsave(struct local_irq_lock *lv)
+{
+ if (lv->owner != current) {
+ __local_lock_irq(lv);
+ return 0;
+ } else {
+ lv->nestcnt++;
+ return 1;
+ }
+}
+
+#define local_lock_irqsave(lvar, _flags) \
+ do { \
+ if (__local_lock_irqsave(&get_local_var(lvar))) \
+ put_local_var(lvar); \
+ _flags = __this_cpu_read(lvar.flags); \
+ } while (0)
+
+#define local_lock_irqsave_on(lvar, _flags, cpu) \
+ do { \
+ __local_lock_irqsave(&per_cpu(lvar, cpu)); \
+ _flags = per_cpu(lvar, cpu).flags; \
+ } while (0)
+
+static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
+ unsigned long flags)
+{
+ LL_WARN(!lv->nestcnt);
+ LL_WARN(lv->owner != current);
+ if (--lv->nestcnt)
+ return 0;
+
+ lv->owner = NULL;
+ spin_unlock_irqrestore(&lv->lock, lv->flags);
+ return 1;
+}
+
+#define local_unlock_irqrestore(lvar, flags) \
+ do { \
+ if (__local_unlock_irqrestore(this_cpu_ptr(&lvar), flags)) \
+ put_local_var(lvar); \
+ } while (0)
+
+#define local_unlock_irqrestore_on(lvar, flags, cpu) \
+ do { \
+ __local_unlock_irqrestore(&per_cpu(lvar, cpu), flags); \
+ } while (0)
+
+#define local_spin_trylock_irq(lvar, lock) \
+ ({ \
+ int __locked; \
+ local_lock_irq(lvar); \
+ __locked = spin_trylock(lock); \
+ if (!__locked) \
+ local_unlock_irq(lvar); \
+ __locked; \
+ })
+
+#define local_spin_lock_irq(lvar, lock) \
+ do { \
+ local_lock_irq(lvar); \
+ spin_lock(lock); \
+ } while (0)
+
+#define local_spin_unlock_irq(lvar, lock) \
+ do { \
+ spin_unlock(lock); \
+ local_unlock_irq(lvar); \
+ } while (0)
+
+#define local_spin_lock_irqsave(lvar, lock, flags) \
+ do { \
+ local_lock_irqsave(lvar, flags); \
+ spin_lock(lock); \
+ } while (0)
+
+#define local_spin_unlock_irqrestore(lvar, lock, flags) \
+ do { \
+ spin_unlock(lock); \
+ local_unlock_irqrestore(lvar, flags); \
+ } while (0)
+
+#define get_locked_var(lvar, var) \
+ (*({ \
+ local_lock(lvar); \
+ this_cpu_ptr(&var); \
+ }))
+
+#define put_locked_var(lvar, var) local_unlock(lvar);
+
+#define get_locked_ptr(lvar, var) \
+ ({ \
+ local_lock(lvar); \
+ this_cpu_ptr(var); \
+ })
+
+#define put_locked_ptr(lvar, var) local_unlock(lvar);
+
+#define local_lock_cpu(lvar) \
+ ({ \
+ local_lock(lvar); \
+ smp_processor_id(); \
+ })
+
+#define local_unlock_cpu(lvar) local_unlock(lvar)
+
+#else /* PREEMPT_RT */
+
+#define DEFINE_LOCAL_IRQ_LOCK(lvar) __typeof__(const int) lvar
+#define DECLARE_LOCAL_IRQ_LOCK(lvar) extern __typeof__(const int) lvar
+
+static inline void local_irq_lock_init(int lvar) { }
+
+#define local_trylock(lvar) \
+ ({ \
+ preempt_disable(); \
+ 1; \
+ })
+
+#define local_lock(lvar) preempt_disable()
+#define local_unlock(lvar) preempt_enable()
+#define local_lock_irq(lvar) local_irq_disable()
+#define local_lock_irq_on(lvar, cpu) local_irq_disable()
+#define local_unlock_irq(lvar) local_irq_enable()
+#define local_unlock_irq_on(lvar, cpu) local_irq_enable()
+#define local_lock_irqsave(lvar, flags) local_irq_save(flags)
+#define local_unlock_irqrestore(lvar, flags) local_irq_restore(flags)
+
+#define local_spin_trylock_irq(lvar, lock) spin_trylock_irq(lock)
+#define local_spin_lock_irq(lvar, lock) spin_lock_irq(lock)
+#define local_spin_unlock_irq(lvar, lock) spin_unlock_irq(lock)
+#define local_spin_lock_irqsave(lvar, lock, flags) \
+ spin_lock_irqsave(lock, flags)
+#define local_spin_unlock_irqrestore(lvar, lock, flags) \
+ spin_unlock_irqrestore(lock, flags)
+
+#define get_locked_var(lvar, var) get_cpu_var(var)
+#define put_locked_var(lvar, var) put_cpu_var(var)
+#define get_locked_ptr(lvar, var) get_cpu_ptr(var)
+#define put_locked_ptr(lvar, var) put_cpu_ptr(var)
+
+#define local_lock_cpu(lvar) get_cpu()
+#define local_unlock_cpu(lvar) put_cpu()
+
+#endif
+
+#endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 270aa8fd2800..497440fe047c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -12,6 +12,7 @@
#include
#include
#include
+#include
#include
#include
@@ -520,6 +521,9 @@ struct mm_struct {
bool tlb_flush_batched;
#endif
struct uprobes_state uprobes_state;
+#ifdef CONFIG_PREEMPT_RT
+ struct rcu_head delayed_drop;
+#endif
#ifdef CONFIG_HUGETLB_PAGE
atomic_long_t hugetlb_usage;
#endif
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index aca8f36dfac9..2b72473ded55 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -22,6 +22,17 @@
struct ww_acquire_ctx;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
+ , .dep_map = { .name = #lockname }
+#else
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
+#endif
+
+#ifdef CONFIG_PREEMPT_RT
+# include
+#else
+
/*
* Simple, straightforward mutexes with strict semantics:
*
@@ -108,13 +119,6 @@ do { \
__mutex_init((mutex), #mutex, &__key); \
} while (0)
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
- , .dep_map = { .name = #lockname }
-#else
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
-#endif
-
#define __MUTEX_INITIALIZER(lockname) \
{ .owner = ATOMIC_LONG_INIT(0) \
, .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
@@ -210,4 +214,6 @@ enum mutex_trylock_recursive_enum {
extern /* __deprecated */ __must_check enum mutex_trylock_recursive_enum
mutex_trylock_recursive(struct mutex *lock);
+#endif /* !PREEMPT_RT */
+
#endif /* __LINUX_MUTEX_H */
diff --git a/include/linux/mutex_rt.h b/include/linux/mutex_rt.h
new file mode 100644
index 000000000000..3fcb5edb1d2b
--- /dev/null
+++ b/include/linux/mutex_rt.h
@@ -0,0 +1,130 @@
+#ifndef __LINUX_MUTEX_RT_H
+#define __LINUX_MUTEX_RT_H
+
+#ifndef __LINUX_MUTEX_H
+#error "Please include mutex.h"
+#endif
+
+#include
+
+/* FIXME: Just for __lockfunc */
+#include
+
+struct mutex {
+ struct rt_mutex lock;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};
+
+#define __MUTEX_INITIALIZER(mutexname) \
+ { \
+ .lock = __RT_MUTEX_INITIALIZER(mutexname.lock) \
+ __DEP_MAP_MUTEX_INITIALIZER(mutexname) \
+ }
+
+#define DEFINE_MUTEX(mutexname) \
+ struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
+
+extern void __mutex_do_init(struct mutex *lock, const char *name, struct lock_class_key *key);
+extern void __lockfunc _mutex_lock(struct mutex *lock);
+extern void __lockfunc _mutex_lock_io(struct mutex *lock);
+extern void __lockfunc _mutex_lock_io_nested(struct mutex *lock, int subclass);
+extern int __lockfunc _mutex_lock_interruptible(struct mutex *lock);
+extern int __lockfunc _mutex_lock_killable(struct mutex *lock);
+extern void __lockfunc _mutex_lock_nested(struct mutex *lock, int subclass);
+extern void __lockfunc _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest_lock);
+extern int __lockfunc _mutex_lock_interruptible_nested(struct mutex *lock, int subclass);
+extern int __lockfunc _mutex_lock_killable_nested(struct mutex *lock, int subclass);
+extern int __lockfunc _mutex_trylock(struct mutex *lock);
+extern void __lockfunc _mutex_unlock(struct mutex *lock);
+
+#define mutex_is_locked(l) rt_mutex_is_locked(&(l)->lock)
+#define mutex_lock(l) _mutex_lock(l)
+#define mutex_lock_interruptible(l) _mutex_lock_interruptible(l)
+#define mutex_lock_killable(l) _mutex_lock_killable(l)
+#define mutex_trylock(l) _mutex_trylock(l)
+#define mutex_unlock(l) _mutex_unlock(l)
+#define mutex_lock_io(l) _mutex_lock_io(l);
+
+#define __mutex_owner(l) ((l)->lock.owner)
+
+#ifdef CONFIG_DEBUG_MUTEXES
+#define mutex_destroy(l) rt_mutex_destroy(&(l)->lock)
+#else
+static inline void mutex_destroy(struct mutex *lock) {}
+#endif
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define mutex_lock_nested(l, s) _mutex_lock_nested(l, s)
+# define mutex_lock_interruptible_nested(l, s) \
+ _mutex_lock_interruptible_nested(l, s)
+# define mutex_lock_killable_nested(l, s) \
+ _mutex_lock_killable_nested(l, s)
+# define mutex_lock_io_nested(l, s) _mutex_lock_io_nested(l, s)
+
+# define mutex_lock_nest_lock(lock, nest_lock) \
+do { \
+ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \
+ _mutex_lock_nest_lock(lock, &(nest_lock)->dep_map); \
+} while (0)
+
+#else
+# define mutex_lock_nested(l, s) _mutex_lock(l)
+# define mutex_lock_interruptible_nested(l, s) \
+ _mutex_lock_interruptible(l)
+# define mutex_lock_killable_nested(l, s) \
+ _mutex_lock_killable(l)
+# define mutex_lock_nest_lock(lock, nest_lock) mutex_lock(lock)
+# define mutex_lock_io_nested(l, s) _mutex_lock_io(l)
+#endif
+
+# define mutex_init(mutex) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ rt_mutex_init(&(mutex)->lock); \
+ __mutex_do_init((mutex), #mutex, &__key); \
+} while (0)
+
+# define __mutex_init(mutex, name, key) \
+do { \
+ rt_mutex_init(&(mutex)->lock); \
+ __mutex_do_init((mutex), name, key); \
+} while (0)
+
+/**
+ * These values are chosen such that FAIL and SUCCESS match the
+ * values of the regular mutex_trylock().
+ */
+enum mutex_trylock_recursive_enum {
+ MUTEX_TRYLOCK_FAILED = 0,
+ MUTEX_TRYLOCK_SUCCESS = 1,
+ MUTEX_TRYLOCK_RECURSIVE,
+};
+/**
+ * mutex_trylock_recursive - trylock variant that allows recursive locking
+ * @lock: mutex to be locked
+ *
+ * This function should not be used, _ever_. It is purely for hysterical GEM
+ * raisins, and once those are gone this will be removed.
+ *
+ * Returns:
+ * MUTEX_TRYLOCK_FAILED - trylock failed,
+ * MUTEX_TRYLOCK_SUCCESS - lock acquired,
+ * MUTEX_TRYLOCK_RECURSIVE - we already owned the lock.
+ */
+int __rt_mutex_owner_current(struct rt_mutex *lock);
+
+static inline /* __deprecated */ __must_check enum mutex_trylock_recursive_enum
+mutex_trylock_recursive(struct mutex *lock)
+{
+ if (unlikely(__rt_mutex_owner_current(&lock->lock)))
+ return MUTEX_TRYLOCK_RECURSIVE;
+
+ return mutex_trylock(lock);
+}
+
+extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+
+#endif
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ec3081ab04c0..ccea330258b5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3018,6 +3018,7 @@ struct softnet_data {
unsigned int dropped;
struct sk_buff_head input_pkt_queue;
struct napi_struct backlog;
+ struct sk_buff_head tofree_queue;
};
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index ad09c0cc5464..edb356ef09ae 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -165,7 +165,11 @@ struct nfs_inode {
/* Readers: in-flight sillydelete RPC calls */
/* Writers: rmdir */
+#ifdef CONFIG_PREEMPT_RT
+ struct semaphore rmdir_sem;
+#else
struct rw_semaphore rmdir_sem;
+#endif
struct mutex commit_mutex;
#if IS_ENABLED(CONFIG_NFS_V4)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 99d327d0bccb..32ada23fd9e0 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1596,7 +1596,7 @@ struct nfs_unlinkdata {
struct nfs_removeargs args;
struct nfs_removeres res;
struct dentry *dentry;
- wait_queue_head_t wq;
+ struct swait_queue_head wq;
const struct cred *cred;
struct nfs_fattr dir_attr;
long timeout;
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 7aef0abc194a..390031e816dc 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -186,14 +186,14 @@ static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
{
unsigned long __percpu *percpu_count;
- rcu_read_lock_sched();
+ rcu_read_lock();
if (__ref_is_percpu(ref, &percpu_count))
this_cpu_add(*percpu_count, nr);
else
atomic_long_add(nr, &ref->count);
- rcu_read_unlock_sched();
+ rcu_read_unlock();
}
/**
@@ -223,7 +223,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref *ref)
unsigned long __percpu *percpu_count;
bool ret;
- rcu_read_lock_sched();
+ rcu_read_lock();
if (__ref_is_percpu(ref, &percpu_count)) {
this_cpu_inc(*percpu_count);
@@ -232,7 +232,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref *ref)
ret = atomic_long_inc_not_zero(&ref->count);
}
- rcu_read_unlock_sched();
+ rcu_read_unlock();
return ret;
}
@@ -257,7 +257,7 @@ static inline bool percpu_ref_tryget_live(struct percpu_ref *ref)
unsigned long __percpu *percpu_count;
bool ret = false;
- rcu_read_lock_sched();
+ rcu_read_lock();
if (__ref_is_percpu(ref, &percpu_count)) {
this_cpu_inc(*percpu_count);
@@ -266,7 +266,7 @@ static inline bool percpu_ref_tryget_live(struct percpu_ref *ref)
ret = atomic_long_inc_not_zero(&ref->count);
}
- rcu_read_unlock_sched();
+ rcu_read_unlock();
return ret;
}
@@ -285,14 +285,14 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
{
unsigned long __percpu *percpu_count;
- rcu_read_lock_sched();
+ rcu_read_lock();
if (__ref_is_percpu(ref, &percpu_count))
this_cpu_sub(*percpu_count, nr);
else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
ref->release(ref);
- rcu_read_unlock_sched();
+ rcu_read_unlock();
}
/**
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 3998cdf9cd14..6e8d3871450a 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -3,41 +3,52 @@
#define _LINUX_PERCPU_RWSEM_H
#include
-#include
#include
#include
+#include
#include
#include
struct percpu_rw_semaphore {
struct rcu_sync rss;
unsigned int __percpu *read_count;
- struct rw_semaphore rw_sem; /* slowpath */
- struct rcuwait writer; /* blocked writer */
- int readers_block;
+ struct rcuwait writer;
+ wait_queue_head_t waiters;
+ atomic_t block;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
};
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname },
+#else
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
+#endif
+
#define __DEFINE_PERCPU_RWSEM(name, is_static) \
static DEFINE_PER_CPU(unsigned int, __percpu_rwsem_rc_##name); \
is_static struct percpu_rw_semaphore name = { \
.rss = __RCU_SYNC_INITIALIZER(name.rss), \
.read_count = &__percpu_rwsem_rc_##name, \
- .rw_sem = __RWSEM_INITIALIZER(name.rw_sem), \
.writer = __RCUWAIT_INITIALIZER(name.writer), \
+ .waiters = __WAIT_QUEUE_HEAD_INITIALIZER(name.waiters), \
+ .block = ATOMIC_INIT(0), \
+ __PERCPU_RWSEM_DEP_MAP_INIT(name) \
}
+
#define DEFINE_PERCPU_RWSEM(name) \
__DEFINE_PERCPU_RWSEM(name, /* not static */)
#define DEFINE_STATIC_PERCPU_RWSEM(name) \
__DEFINE_PERCPU_RWSEM(name, static)
-extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
-extern void __percpu_up_read(struct percpu_rw_semaphore *);
+extern bool __percpu_down_read(struct percpu_rw_semaphore *, bool);
static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
{
might_sleep();
- rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
+ rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
preempt_disable();
/*
@@ -48,8 +59,9 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
* and that once the synchronize_rcu() is done, the writer will see
* anything we did within this RCU-sched read-size critical section.
*/
- __this_cpu_inc(*sem->read_count);
- if (unlikely(!rcu_sync_is_idle(&sem->rss)))
+ if (likely(rcu_sync_is_idle(&sem->rss)))
+ __this_cpu_inc(*sem->read_count);
+ else
__percpu_down_read(sem, false); /* Unconditional memory barrier */
/*
* The preempt_enable() prevents the compiler from
@@ -58,16 +70,17 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
preempt_enable();
}
-static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
+static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
{
- int ret = 1;
+ bool ret = true;
preempt_disable();
/*
* Same as in percpu_down_read().
*/
- __this_cpu_inc(*sem->read_count);
- if (unlikely(!rcu_sync_is_idle(&sem->rss)))
+ if (likely(rcu_sync_is_idle(&sem->rss)))
+ __this_cpu_inc(*sem->read_count);
+ else
ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */
preempt_enable();
/*
@@ -76,24 +89,36 @@ static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
*/
if (ret)
- rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_);
+ rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
return ret;
}
static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
{
+ rwsem_release(&sem->dep_map, 1, _RET_IP_);
+
preempt_disable();
/*
* Same as in percpu_down_read().
*/
- if (likely(rcu_sync_is_idle(&sem->rss)))
+ if (likely(rcu_sync_is_idle(&sem->rss))) {
__this_cpu_dec(*sem->read_count);
- else
- __percpu_up_read(sem); /* Unconditional memory barrier */
+ } else {
+ /*
+ * slowpath; reader will only ever wake a single blocked
+ * writer.
+ */
+ smp_mb(); /* B matches C */
+ /*
+ * In other words, if they see our decrement (presumably to
+ * aggregate zero, as that is the only time it matters) they
+ * will also see our critical section.
+ */
+ __this_cpu_dec(*sem->read_count);
+ rcuwait_wake_up(&sem->writer);
+ }
preempt_enable();
-
- rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_);
}
extern void percpu_down_write(struct percpu_rw_semaphore *);
@@ -110,29 +135,19 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
__percpu_init_rwsem(sem, #sem, &rwsem_key); \
})
-#define percpu_rwsem_is_held(sem) lockdep_is_held(&(sem)->rw_sem)
-
-#define percpu_rwsem_assert_held(sem) \
- lockdep_assert_held(&(sem)->rw_sem)
+#define percpu_rwsem_is_held(sem) lockdep_is_held(sem)
+#define percpu_rwsem_assert_held(sem) lockdep_assert_held(sem)
static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
bool read, unsigned long ip)
{
- lock_release(&sem->rw_sem.dep_map, 1, ip);
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
- if (!read)
- atomic_long_set(&sem->rw_sem.owner, RWSEM_OWNER_UNKNOWN);
-#endif
+ lock_release(&sem->dep_map, 1, ip);
}
static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
bool read, unsigned long ip)
{
- lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
- if (!read)
- atomic_long_set(&sem->rw_sem.owner, (long)current);
-#endif
+ lock_acquire(&sem->dep_map, 0, 1, read, 1, NULL, ip);
}
#endif
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 5e76af742c80..3d0d9873de61 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -19,6 +19,35 @@
#define PERCPU_MODULE_RESERVE 0
#endif
+#ifdef CONFIG_PREEMPT_RT
+
+#define get_local_var(var) (*({ \
+ migrate_disable(); \
+ this_cpu_ptr(&var); }))
+
+#define put_local_var(var) do { \
+ (void)&(var); \
+ migrate_enable(); \
+} while (0)
+
+# define get_local_ptr(var) ({ \
+ migrate_disable(); \
+ this_cpu_ptr(var); })
+
+# define put_local_ptr(var) do { \
+ (void)(var); \
+ migrate_enable(); \
+} while (0)
+
+#else
+
+#define get_local_var(var) get_cpu_var(var)
+#define put_local_var(var) put_cpu_var(var)
+#define get_local_ptr(var) get_cpu_ptr(var)
+#define put_local_ptr(var) put_cpu_ptr(var)
+
+#endif
+
/* minimum unit size, also is the maximum supported allocation size */
#define PCPU_MIN_UNIT_SIZE PFN_ALIGN(32 << 10)
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 9645b1194c98..916c1d0e42c2 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -3,6 +3,7 @@
#define _LINUX_PID_H
#include
+#include
#include
#include
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 3d10c84a97a9..bad09364b20a 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -72,6 +72,7 @@ struct cpu_timer {
struct task_struct *task;
struct list_head elist;
int firing;
+ int firing_cpu;
};
static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
@@ -123,6 +124,9 @@ struct posix_cputimers {
struct posix_cputimer_base bases[CPUCLOCK_MAX];
unsigned int timers_active;
unsigned int expiry_active;
+#ifdef CONFIG_PREEMPT_RT
+ struct task_struct *posix_timer_list;
+#endif
};
static inline void posix_cputimers_init(struct posix_cputimers *pct)
@@ -152,9 +156,16 @@ static inline void posix_cputimers_rt_watchdog(struct posix_cputimers *pct,
INIT_CPU_TIMERBASE(b[2]), \
}
+#ifdef CONFIG_PREEMPT_RT
+# define INIT_TIMER_LIST .posix_timer_list = NULL,
+#else
+# define INIT_TIMER_LIST
+#endif
+
#define INIT_CPU_TIMERS(s) \
.posix_cputimers = { \
.bases = INIT_CPU_TIMERBASES(s.posix_cputimers.bases), \
+ INIT_TIMER_LIST \
},
#else
struct posix_cputimers { };
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index bbb68dba37cc..adb085fe31e4 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -78,10 +78,8 @@
#include
#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
-#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
| NMI_MASK))
-
/*
* Are we doing bottom half or hardware interrupt processing?
*
@@ -96,12 +94,23 @@
* should not be used in new code.
*/
#define in_irq() (hardirq_count())
-#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
-#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
#define in_nmi() (preempt_count() & NMI_MASK)
#define in_task() (!(preempt_count() & \
(NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+#ifdef CONFIG_PREEMPT_RT
+
+#define softirq_count() ((long)current->softirq_count)
+#define in_softirq() (softirq_count())
+#define in_serving_softirq() (current->softirq_count & SOFTIRQ_OFFSET)
+
+#else
+
+#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
+#define in_softirq() (softirq_count())
+#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
+
+#endif
/*
* The preempt_count offset after preempt_disable();
@@ -115,7 +124,11 @@
/*
* The preempt_count offset after spin_lock()
*/
+#if !defined(CONFIG_PREEMPT_RT)
#define PREEMPT_LOCK_OFFSET PREEMPT_DISABLE_OFFSET
+#else
+#define PREEMPT_LOCK_OFFSET 0
+#endif
/*
* The preempt_count offset needed for things like:
@@ -164,6 +177,20 @@ extern void preempt_count_sub(int val);
#define preempt_count_inc() preempt_count_add(1)
#define preempt_count_dec() preempt_count_sub(1)
+#ifdef CONFIG_PREEMPT_LAZY
+#define add_preempt_lazy_count(val) do { preempt_lazy_count() += (val); } while (0)
+#define sub_preempt_lazy_count(val) do { preempt_lazy_count() -= (val); } while (0)
+#define inc_preempt_lazy_count() add_preempt_lazy_count(1)
+#define dec_preempt_lazy_count() sub_preempt_lazy_count(1)
+#define preempt_lazy_count() (current_thread_info()->preempt_lazy_count)
+#else
+#define add_preempt_lazy_count(val) do { } while (0)
+#define sub_preempt_lazy_count(val) do { } while (0)
+#define inc_preempt_lazy_count() do { } while (0)
+#define dec_preempt_lazy_count() do { } while (0)
+#define preempt_lazy_count() (0)
+#endif
+
#ifdef CONFIG_PREEMPT_COUNT
#define preempt_disable() \
@@ -172,16 +199,53 @@ do { \
barrier(); \
} while (0)
+#define preempt_lazy_disable() \
+do { \
+ inc_preempt_lazy_count(); \
+ barrier(); \
+} while (0)
+
#define sched_preempt_enable_no_resched() \
do { \
barrier(); \
preempt_count_dec(); \
} while (0)
-#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
+#ifdef CONFIG_PREEMPT_RT
+# define preempt_enable_no_resched() sched_preempt_enable_no_resched()
+# define preempt_check_resched_rt() preempt_check_resched()
+#else
+# define preempt_enable_no_resched() preempt_enable()
+# define preempt_check_resched_rt() barrier();
+#endif
#define preemptible() (preempt_count() == 0 && !irqs_disabled())
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+
+extern void migrate_disable(void);
+extern void migrate_enable(void);
+
+int __migrate_disabled(struct task_struct *p);
+
+#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+
+extern void migrate_disable(void);
+extern void migrate_enable(void);
+static inline int __migrate_disabled(struct task_struct *p)
+{
+ return 0;
+}
+
+#else
+#define migrate_disable() preempt_disable()
+#define migrate_enable() preempt_enable()
+static inline int __migrate_disabled(struct task_struct *p)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_PREEMPTION
#define preempt_enable() \
do { \
@@ -203,6 +267,13 @@ do { \
__preempt_schedule(); \
} while (0)
+#define preempt_lazy_enable() \
+do { \
+ dec_preempt_lazy_count(); \
+ barrier(); \
+ preempt_check_resched(); \
+} while (0)
+
#else /* !CONFIG_PREEMPTION */
#define preempt_enable() \
do { \
@@ -210,6 +281,12 @@ do { \
preempt_count_dec(); \
} while (0)
+#define preempt_lazy_enable() \
+do { \
+ dec_preempt_lazy_count(); \
+ barrier(); \
+} while (0)
+
#define preempt_enable_notrace() \
do { \
barrier(); \
@@ -248,8 +325,16 @@ do { \
#define preempt_disable_notrace() barrier()
#define preempt_enable_no_resched_notrace() barrier()
#define preempt_enable_notrace() barrier()
+#define preempt_check_resched_rt() barrier()
#define preemptible() 0
+#define migrate_disable() barrier()
+#define migrate_enable() barrier()
+
+static inline int __migrate_disabled(struct task_struct *p)
+{
+ return 0;
+}
#endif /* CONFIG_PREEMPT_COUNT */
#ifdef MODULE
@@ -268,10 +353,22 @@ do { \
} while (0)
#define preempt_fold_need_resched() \
do { \
- if (tif_need_resched()) \
+ if (tif_need_resched_now()) \
set_preempt_need_resched(); \
} while (0)
+#ifdef CONFIG_PREEMPT_RT
+# define preempt_disable_rt() preempt_disable()
+# define preempt_enable_rt() preempt_enable()
+# define preempt_disable_nort() barrier()
+# define preempt_enable_nort() barrier()
+#else
+# define preempt_disable_rt() barrier()
+# define preempt_enable_rt() barrier()
+# define preempt_disable_nort() preempt_disable()
+# define preempt_enable_nort() preempt_enable()
+#endif
+
#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 3b5cb66d8bc1..1fe8068100c2 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -58,6 +58,7 @@ static inline const char *printk_skip_headers(const char *buffer)
*/
#define CONSOLE_LOGLEVEL_DEFAULT CONFIG_CONSOLE_LOGLEVEL_DEFAULT
#define CONSOLE_LOGLEVEL_QUIET CONFIG_CONSOLE_LOGLEVEL_QUIET
+#define CONSOLE_LOGLEVEL_EMERGENCY CONFIG_CONSOLE_LOGLEVEL_EMERGENCY
extern int console_printk[];
@@ -65,6 +66,7 @@ extern int console_printk[];
#define default_message_loglevel (console_printk[1])
#define minimum_console_loglevel (console_printk[2])
#define default_console_loglevel (console_printk[3])
+#define emergency_console_loglevel (console_printk[4])
static inline void console_silent(void)
{
@@ -146,18 +148,6 @@ static inline __printf(1, 2) __cold
void early_printk(const char *s, ...) { }
#endif
-#ifdef CONFIG_PRINTK_NMI
-extern void printk_nmi_enter(void);
-extern void printk_nmi_exit(void);
-extern void printk_nmi_direct_enter(void);
-extern void printk_nmi_direct_exit(void);
-#else
-static inline void printk_nmi_enter(void) { }
-static inline void printk_nmi_exit(void) { }
-static inline void printk_nmi_direct_enter(void) { }
-static inline void printk_nmi_direct_exit(void) { }
-#endif /* PRINTK_NMI */
-
#ifdef CONFIG_PRINTK
asmlinkage __printf(5, 0)
int vprintk_emit(int facility, int level,
@@ -202,8 +192,7 @@ __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
void dump_stack_print_info(const char *log_lvl);
void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack(void) __cold;
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
+struct wait_queue_head *printk_wait_queue(void);
#else
static inline __printf(1, 0)
int vprintk(const char *s, va_list args)
@@ -267,14 +256,6 @@ static inline void show_regs_print_info(const char *log_lvl)
static inline void dump_stack(void)
{
}
-
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
-{
-}
#endif
extern int kptr_restrict;
diff --git a/include/linux/printk_ringbuffer.h b/include/linux/printk_ringbuffer.h
new file mode 100644
index 000000000000..ec3d7ceec378
--- /dev/null
+++ b/include/linux/printk_ringbuffer.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PRINTK_RINGBUFFER_H
+#define _LINUX_PRINTK_RINGBUFFER_H
+
+#include
+#include
+#include
+#include
+
+struct prb_cpulock {
+ atomic_t owner;
+ unsigned long __percpu *irqflags;
+};
+
+struct printk_ringbuffer {
+ void *buffer;
+ unsigned int size_bits;
+
+ u64 seq;
+ atomic_long_t lost;
+
+ atomic_long_t tail;
+ atomic_long_t head;
+ atomic_long_t reserve;
+
+ struct prb_cpulock *cpulock;
+ atomic_t ctx;
+
+ struct wait_queue_head *wq;
+ atomic_long_t wq_counter;
+ struct irq_work *wq_work;
+};
+
+struct prb_entry {
+ unsigned int size;
+ u64 seq;
+ char data[0];
+};
+
+struct prb_handle {
+ struct printk_ringbuffer *rb;
+ unsigned int cpu;
+ struct prb_entry *entry;
+};
+
+#define DECLARE_STATIC_PRINTKRB_CPULOCK(name) \
+static DEFINE_PER_CPU(unsigned long, _##name##_percpu_irqflags); \
+static struct prb_cpulock name = { \
+ .owner = ATOMIC_INIT(-1), \
+ .irqflags = &_##name##_percpu_irqflags, \
+}
+
+#define PRB_INIT ((unsigned long)-1)
+
+#define DECLARE_STATIC_PRINTKRB_ITER(name, rbaddr) \
+static struct prb_iterator name = { \
+ .rb = rbaddr, \
+ .lpos = PRB_INIT, \
+}
+
+struct prb_iterator {
+ struct printk_ringbuffer *rb;
+ unsigned long lpos;
+};
+
+#define DECLARE_STATIC_PRINTKRB(name, szbits, cpulockptr) \
+static char _##name##_buffer[1 << (szbits)] \
+ __aligned(__alignof__(long)); \
+static DECLARE_WAIT_QUEUE_HEAD(_##name##_wait); \
+static void _##name##_wake_work_func(struct irq_work *irq_work) \
+{ \
+ wake_up_interruptible_all(&_##name##_wait); \
+} \
+static struct irq_work _##name##_wake_work = { \
+ .func = _##name##_wake_work_func, \
+ .flags = IRQ_WORK_LAZY, \
+}; \
+static struct printk_ringbuffer name = { \
+ .buffer = &_##name##_buffer[0], \
+ .size_bits = szbits, \
+ .seq = 0, \
+ .lost = ATOMIC_LONG_INIT(0), \
+ .tail = ATOMIC_LONG_INIT(-111 * sizeof(long)), \
+ .head = ATOMIC_LONG_INIT(-111 * sizeof(long)), \
+ .reserve = ATOMIC_LONG_INIT(-111 * sizeof(long)), \
+ .cpulock = cpulockptr, \
+ .ctx = ATOMIC_INIT(0), \
+ .wq = &_##name##_wait, \
+ .wq_counter = ATOMIC_LONG_INIT(0), \
+ .wq_work = &_##name##_wake_work, \
+}
+
+/* writer interface */
+char *prb_reserve(struct prb_handle *h, struct printk_ringbuffer *rb,
+ unsigned int size);
+void prb_commit(struct prb_handle *h);
+
+/* reader interface */
+void prb_iter_init(struct prb_iterator *iter, struct printk_ringbuffer *rb,
+ u64 *seq);
+void prb_iter_copy(struct prb_iterator *dest, struct prb_iterator *src);
+int prb_iter_next(struct prb_iterator *iter, char *buf, int size, u64 *seq);
+int prb_iter_wait_next(struct prb_iterator *iter, char *buf, int size,
+ u64 *seq);
+int prb_iter_seek(struct prb_iterator *iter, u64 seq);
+int prb_iter_data(struct prb_iterator *iter, char *buf, int size, u64 *seq);
+
+/* utility functions */
+int prb_buffer_size(struct printk_ringbuffer *rb);
+void prb_inc_lost(struct printk_ringbuffer *rb);
+void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store);
+void prb_unlock(struct prb_cpulock *cpu_lock, unsigned int cpu_store);
+
+#endif /*_LINUX_PRINTK_RINGBUFFER_H */
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 63e62372443a..040b1fd0ab94 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -226,6 +226,7 @@ unsigned int radix_tree_gang_lookup(const struct radix_tree_root *,
unsigned int max_items);
int radix_tree_preload(gfp_t gfp_mask);
int radix_tree_maybe_preload(gfp_t gfp_mask);
+void radix_tree_preload_end(void);
void radix_tree_init(void);
void *radix_tree_tag_set(struct radix_tree_root *,
unsigned long index, unsigned int tag);
@@ -243,11 +244,6 @@ unsigned int radix_tree_gang_lookup_tag_slot(const struct radix_tree_root *,
unsigned int max_items, unsigned int tag);
int radix_tree_tagged(const struct radix_tree_root *, unsigned int tag);
-static inline void radix_tree_preload_end(void)
-{
- preempt_enable();
-}
-
void __rcu **idr_get_free(struct radix_tree_root *root,
struct radix_tree_iter *iter, gfp_t gfp,
unsigned long max);
diff --git a/include/linux/random.h b/include/linux/random.h
index 5b3ec7d2791f..15b9fed04dba 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -33,7 +33,7 @@ static inline void add_latent_entropy(void) {}
extern void add_input_randomness(unsigned int type, unsigned int code,
unsigned int value) __latent_entropy;
-extern void add_interrupt_randomness(int irq, int irq_flags) __latent_entropy;
+extern void add_interrupt_randomness(int irq, int irq_flags, __u64 ip) __latent_entropy;
extern void get_random_bytes(void *buf, int nbytes);
extern int wait_for_random_bytes(void);
diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
index 8ddf79e9207a..51e90cea9c45 100644
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -59,7 +59,7 @@ static inline void ratelimit_state_exit(struct ratelimit_state *rs)
return;
if (rs->missed) {
- pr_warn("%s: %d output lines suppressed due to ratelimiting\n",
+ pr_info("%s: %d output lines suppressed due to ratelimiting\n",
current->comm, rs->missed);
rs->missed = 0;
}
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 1fd61a9af45c..c667ec871abb 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -19,7 +19,7 @@
#include
#include
-#include
+#include
struct rb_node {
unsigned long __rb_parent_color;
diff --git a/include/linux/rcu_assign_pointer.h b/include/linux/rcu_assign_pointer.h
new file mode 100644
index 000000000000..78c05f606a73
--- /dev/null
+++ b/include/linux/rcu_assign_pointer.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef __LINUX_RCU_ASSIGN_POINTER_H__
+#define __LINUX_RCU_ASSIGN_POINTER_H__
+#include
+#include
+
+#ifdef __CHECKER__
+#define rcu_check_sparse(p, space) \
+ ((void)(((typeof(*p) space *)p) == p))
+#else /* #ifdef __CHECKER__ */
+#define rcu_check_sparse(p, space)
+#endif /* #else #ifdef __CHECKER__ */
+
+/**
+ * RCU_INITIALIZER() - statically initialize an RCU-protected global variable
+ * @v: The value to statically initialize with.
+ */
+#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
+
+/**
+ * rcu_assign_pointer() - assign to RCU-protected pointer
+ * @p: pointer to assign to
+ * @v: value to assign (publish)
+ *
+ * Assigns the specified value to the specified RCU-protected
+ * pointer, ensuring that any concurrent RCU readers will see
+ * any prior initialization.
+ *
+ * Inserts memory barriers on architectures that require them
+ * (which is most of them), and also prevents the compiler from
+ * reordering the code that initializes the structure after the pointer
+ * assignment. More importantly, this call documents which pointers
+ * will be dereferenced by RCU read-side code.
+ *
+ * In some special cases, you may use RCU_INIT_POINTER() instead
+ * of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due
+ * to the fact that it does not constrain either the CPU or the compiler.
+ * That said, using RCU_INIT_POINTER() when you should have used
+ * rcu_assign_pointer() is a very bad thing that results in
+ * impossible-to-diagnose memory corruption. So please be careful.
+ * See the RCU_INIT_POINTER() comment header for details.
+ *
+ * Note that rcu_assign_pointer() evaluates each of its arguments only
+ * once, appearances notwithstanding. One of the "extra" evaluations
+ * is in typeof() and the other visible only to sparse (__CHECKER__),
+ * neither of which actually execute the argument. As with most cpp
+ * macros, this execute-arguments-only-once property is important, so
+ * please be careful when making changes to rcu_assign_pointer() and the
+ * other macros that it invokes.
+ */
+#define rcu_assign_pointer(p, v) \
+do { \
+ uintptr_t _r_a_p__v = (uintptr_t)(v); \
+ rcu_check_sparse(p, __rcu); \
+ \
+ if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
+ WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
+ else \
+ smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
+} while (0)
+
+#endif
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 75a2eded7aa2..4b7c52128409 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -29,6 +29,7 @@
#include
#include
#include
+#include
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
@@ -51,6 +52,11 @@ void __rcu_read_unlock(void);
* types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
*/
#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
+#ifndef CONFIG_PREEMPT_RT
+#define sched_rcu_preempt_depth() rcu_preempt_depth()
+#else
+static inline int sched_rcu_preempt_depth(void) { return 0; }
+#endif
#else /* #ifdef CONFIG_PREEMPT_RCU */
@@ -69,6 +75,8 @@ static inline int rcu_preempt_depth(void)
return 0;
}
+#define sched_rcu_preempt_depth() rcu_preempt_depth()
+
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
/* Internal to kernel */
@@ -154,7 +162,7 @@ static inline void exit_tasks_rcu_finish(void) { }
*
* This macro resembles cond_resched(), except that it is defined to
* report potential quiescent states to RCU-tasks even if the cond_resched()
- * machinery were to be shut off, as some advocate for PREEMPT kernels.
+ * machinery were to be shut off, as some advocate for PREEMPTION kernels.
*/
#define cond_resched_tasks_rcu_qs() \
do { \
@@ -279,7 +287,8 @@ static inline void rcu_preempt_sleep_check(void) { }
#define rcu_sleep_check() \
do { \
rcu_preempt_sleep_check(); \
- RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), \
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \
+ RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), \
"Illegal context switch in RCU-bh read-side critical section"); \
RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map), \
"Illegal context switch in RCU-sched read-side critical section"); \
@@ -300,13 +309,6 @@ static inline void rcu_preempt_sleep_check(void) { }
* (e.g., __srcu), should this make sense in the future.
*/
-#ifdef __CHECKER__
-#define rcu_check_sparse(p, space) \
- ((void)(((typeof(*p) space *)p) == p))
-#else /* #ifdef __CHECKER__ */
-#define rcu_check_sparse(p, space)
-#endif /* #else #ifdef __CHECKER__ */
-
#define __rcu_access_pointer(p, space) \
({ \
typeof(*p) *_________p1 = (typeof(*p) *__force)READ_ONCE(p); \
@@ -334,54 +336,6 @@ static inline void rcu_preempt_sleep_check(void) { }
((typeof(*p) __force __kernel *)(________p1)); \
})
-/**
- * RCU_INITIALIZER() - statically initialize an RCU-protected global variable
- * @v: The value to statically initialize with.
- */
-#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
-
-/**
- * rcu_assign_pointer() - assign to RCU-protected pointer
- * @p: pointer to assign to
- * @v: value to assign (publish)
- *
- * Assigns the specified value to the specified RCU-protected
- * pointer, ensuring that any concurrent RCU readers will see
- * any prior initialization.
- *
- * Inserts memory barriers on architectures that require them
- * (which is most of them), and also prevents the compiler from
- * reordering the code that initializes the structure after the pointer
- * assignment. More importantly, this call documents which pointers
- * will be dereferenced by RCU read-side code.
- *
- * In some special cases, you may use RCU_INIT_POINTER() instead
- * of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due
- * to the fact that it does not constrain either the CPU or the compiler.
- * That said, using RCU_INIT_POINTER() when you should have used
- * rcu_assign_pointer() is a very bad thing that results in
- * impossible-to-diagnose memory corruption. So please be careful.
- * See the RCU_INIT_POINTER() comment header for details.
- *
- * Note that rcu_assign_pointer() evaluates each of its arguments only
- * once, appearances notwithstanding. One of the "extra" evaluations
- * is in typeof() and the other visible only to sparse (__CHECKER__),
- * neither of which actually execute the argument. As with most cpp
- * macros, this execute-arguments-only-once property is important, so
- * please be careful when making changes to rcu_assign_pointer() and the
- * other macros that it invokes.
- */
-#define rcu_assign_pointer(p, v) \
-do { \
- uintptr_t _r_a_p__v = (uintptr_t)(v); \
- rcu_check_sparse(p, __rcu); \
- \
- if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \
- WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \
- else \
- smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
-} while (0)
-
/**
* rcu_swap_protected() - swap an RCU and a regular pointer
* @rcu_ptr: RCU pointer
@@ -580,7 +534,7 @@ do { \
*
* You can avoid reading and understanding the next paragraph by
* following this rule: don't put anything in an rcu_read_lock() RCU
- * read-side critical section that would block in a !PREEMPT kernel.
+ * read-side critical section that would block in a !PREEMPTION kernel.
* But if you want the full story, read on!
*
* In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 6fd615a0eea9..138bd1e183e0 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -14,11 +14,15 @@
#define __LINUX_RT_MUTEX_H
#include
+#include
#include
-#include
extern int max_lock_depth; /* for sysctl */
+#ifdef CONFIG_DEBUG_MUTEXES
+#include
+#endif
+
/**
* The rt_mutex structure
*
@@ -31,8 +35,8 @@ struct rt_mutex {
raw_spinlock_t wait_lock;
struct rb_root_cached waiters;
struct task_struct *owner;
-#ifdef CONFIG_DEBUG_RT_MUTEXES
int save_state;
+#ifdef CONFIG_DEBUG_RT_MUTEXES
const char *name, *file;
int line;
void *magic;
@@ -82,16 +86,23 @@ do { \
#define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
#endif
-#define __RT_MUTEX_INITIALIZER(mutexname) \
- { .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \
+#define __RT_MUTEX_INITIALIZER_PLAIN(mutexname) \
+ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \
, .waiters = RB_ROOT_CACHED \
, .owner = NULL \
__DEBUG_RT_MUTEX_INITIALIZER(mutexname) \
- __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)}
+ __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
+
+#define __RT_MUTEX_INITIALIZER(mutexname) \
+ { __RT_MUTEX_INITIALIZER_PLAIN(mutexname) }
#define DEFINE_RT_MUTEX(mutexname) \
struct rt_mutex mutexname = __RT_MUTEX_INITIALIZER(mutexname)
+#define __RT_MUTEX_INITIALIZER_SAVE_STATE(mutexname) \
+ { __RT_MUTEX_INITIALIZER_PLAIN(mutexname) \
+ , .save_state = 1 }
+
/**
* rt_mutex_is_locked - is the mutex locked
* @lock: the mutex to be queried
@@ -115,6 +126,7 @@ extern void rt_mutex_lock(struct rt_mutex *lock);
#endif
extern int rt_mutex_lock_interruptible(struct rt_mutex *lock);
+extern int rt_mutex_lock_killable(struct rt_mutex *lock);
extern int rt_mutex_timed_lock(struct rt_mutex *lock,
struct hrtimer_sleeper *timeout);
diff --git a/include/linux/rwlock_rt.h b/include/linux/rwlock_rt.h
new file mode 100644
index 000000000000..a9c4c2ac4d1f
--- /dev/null
+++ b/include/linux/rwlock_rt.h
@@ -0,0 +1,119 @@
+#ifndef __LINUX_RWLOCK_RT_H
+#define __LINUX_RWLOCK_RT_H
+
+#ifndef __LINUX_SPINLOCK_H
+#error Do not include directly. Use spinlock.h
+#endif
+
+extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
+extern void __lockfunc rt_read_lock(rwlock_t *rwlock);
+extern int __lockfunc rt_write_trylock(rwlock_t *rwlock);
+extern int __lockfunc rt_read_trylock(rwlock_t *rwlock);
+extern void __lockfunc rt_write_unlock(rwlock_t *rwlock);
+extern void __lockfunc rt_read_unlock(rwlock_t *rwlock);
+extern int __lockfunc rt_read_can_lock(rwlock_t *rwlock);
+extern int __lockfunc rt_write_can_lock(rwlock_t *rwlock);
+extern void __rt_rwlock_init(rwlock_t *rwlock, char *name, struct lock_class_key *key);
+
+#define read_can_lock(rwlock) rt_read_can_lock(rwlock)
+#define write_can_lock(rwlock) rt_write_can_lock(rwlock)
+
+#define read_trylock(lock) __cond_lock(lock, rt_read_trylock(lock))
+#define write_trylock(lock) __cond_lock(lock, rt_write_trylock(lock))
+
+static inline int __write_trylock_rt_irqsave(rwlock_t *lock, unsigned long *flags)
+{
+ /* XXX ARCH_IRQ_ENABLED */
+ *flags = 0;
+ return rt_write_trylock(lock);
+}
+
+#define write_trylock_irqsave(lock, flags) \
+ __cond_lock(lock, __write_trylock_rt_irqsave(lock, &(flags)))
+
+#define read_lock_irqsave(lock, flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ rt_read_lock(lock); \
+ flags = 0; \
+ } while (0)
+
+#define write_lock_irqsave(lock, flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ rt_write_lock(lock); \
+ flags = 0; \
+ } while (0)
+
+#define read_lock(lock) rt_read_lock(lock)
+
+#define read_lock_bh(lock) \
+ do { \
+ local_bh_disable(); \
+ rt_read_lock(lock); \
+ } while (0)
+
+#define read_lock_irq(lock) read_lock(lock)
+
+#define write_lock(lock) rt_write_lock(lock)
+
+#define write_lock_bh(lock) \
+ do { \
+ local_bh_disable(); \
+ rt_write_lock(lock); \
+ } while (0)
+
+#define write_lock_irq(lock) write_lock(lock)
+
+#define read_unlock(lock) rt_read_unlock(lock)
+
+#define read_unlock_bh(lock) \
+ do { \
+ rt_read_unlock(lock); \
+ local_bh_enable(); \
+ } while (0)
+
+#define read_unlock_irq(lock) read_unlock(lock)
+
+#define write_unlock(lock) rt_write_unlock(lock)
+
+#define write_unlock_bh(lock) \
+ do { \
+ rt_write_unlock(lock); \
+ local_bh_enable(); \
+ } while (0)
+
+#define write_unlock_irq(lock) write_unlock(lock)
+
+#define read_unlock_irqrestore(lock, flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ (void) flags; \
+ rt_read_unlock(lock); \
+ } while (0)
+
+#define write_unlock_irqrestore(lock, flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ (void) flags; \
+ rt_write_unlock(lock); \
+ } while (0)
+
+#define rwlock_init(rwl) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __rt_rwlock_init(rwl, #rwl, &__key); \
+} while (0)
+
+/*
+ * Internal functions made global for CPU pinning
+ */
+void __read_rt_lock(struct rt_rw_lock *lock);
+int __read_rt_trylock(struct rt_rw_lock *lock);
+void __write_rt_lock(struct rt_rw_lock *lock);
+int __write_rt_trylock(struct rt_rw_lock *lock);
+void __read_rt_unlock(struct rt_rw_lock *lock);
+void __write_rt_unlock(struct rt_rw_lock *lock);
+
+#endif
diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index 857a72ceb794..c21683f3e14a 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -1,6 +1,10 @@
#ifndef __LINUX_RWLOCK_TYPES_H
#define __LINUX_RWLOCK_TYPES_H
+#if !defined(__LINUX_SPINLOCK_TYPES_H)
+# error "Do not include directly, include spinlock_types.h"
+#endif
+
/*
* include/linux/rwlock_types.h - generic rwlock type definitions
* and initializers
diff --git a/include/linux/rwlock_types_rt.h b/include/linux/rwlock_types_rt.h
new file mode 100644
index 000000000000..546a1f8f1274
--- /dev/null
+++ b/include/linux/rwlock_types_rt.h
@@ -0,0 +1,55 @@
+#ifndef __LINUX_RWLOCK_TYPES_RT_H
+#define __LINUX_RWLOCK_TYPES_RT_H
+
+#ifndef __LINUX_SPINLOCK_TYPES_H
+#error "Do not include directly. Include spinlock_types.h instead"
+#endif
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define RW_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
+#else
+# define RW_DEP_MAP_INIT(lockname)
+#endif
+
+typedef struct rt_rw_lock rwlock_t;
+
+#define __RW_LOCK_UNLOCKED(name) __RWLOCK_RT_INITIALIZER(name)
+
+#define DEFINE_RWLOCK(name) \
+ rwlock_t name = __RW_LOCK_UNLOCKED(name)
+
+/*
+ * A reader biased implementation primarily for CPU pinning.
+ *
+ * Can be selected as general replacement for the single reader RT rwlock
+ * variant
+ */
+struct rt_rw_lock {
+ struct rt_mutex rtmutex;
+ atomic_t readers;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};
+
+#define READER_BIAS (1U << 31)
+#define WRITER_BIAS (1U << 30)
+
+#define __RWLOCK_RT_INITIALIZER(name) \
+{ \
+ .readers = ATOMIC_INIT(READER_BIAS), \
+ .rtmutex = __RT_MUTEX_INITIALIZER_SAVE_STATE(name.rtmutex), \
+ RW_DEP_MAP_INIT(name) \
+}
+
+void __rwlock_biased_rt_init(struct rt_rw_lock *lock, const char *name,
+ struct lock_class_key *key);
+
+#define rwlock_biased_rt_init(rwlock) \
+ do { \
+ static struct lock_class_key __key; \
+ \
+ __rwlock_biased_rt_init((rwlock), #rwlock, &__key); \
+ } while (0)
+
+#endif
diff --git a/include/linux/rwsem-rt.h b/include/linux/rwsem-rt.h
new file mode 100644
index 000000000000..2018ff77904a
--- /dev/null
+++ b/include/linux/rwsem-rt.h
@@ -0,0 +1,68 @@
+#ifndef _LINUX_RWSEM_RT_H
+#define _LINUX_RWSEM_RT_H
+
+#ifndef _LINUX_RWSEM_H
+#error "Include rwsem.h"
+#endif
+
+#include
+#include
+
+#define READER_BIAS (1U << 31)
+#define WRITER_BIAS (1U << 30)
+
+struct rw_semaphore {
+ atomic_t readers;
+ struct rt_mutex rtmutex;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};
+
+#define __RWSEM_INITIALIZER(name) \
+{ \
+ .readers = ATOMIC_INIT(READER_BIAS), \
+ .rtmutex = __RT_MUTEX_INITIALIZER(name.rtmutex), \
+ RW_DEP_MAP_INIT(name) \
+}
+
+#define DECLARE_RWSEM(lockname) \
+ struct rw_semaphore lockname = __RWSEM_INITIALIZER(lockname)
+
+extern void __rwsem_init(struct rw_semaphore *rwsem, const char *name,
+ struct lock_class_key *key);
+
+#define __init_rwsem(sem, name, key) \
+do { \
+ rt_mutex_init(&(sem)->rtmutex); \
+ __rwsem_init((sem), (name), (key)); \
+} while (0)
+
+#define init_rwsem(sem) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __init_rwsem((sem), #sem, &__key); \
+} while (0)
+
+static inline int rwsem_is_locked(struct rw_semaphore *sem)
+{
+ return atomic_read(&sem->readers) != READER_BIAS;
+}
+
+static inline int rwsem_is_contended(struct rw_semaphore *sem)
+{
+ return atomic_read(&sem->readers) > 0;
+}
+
+extern void __down_read(struct rw_semaphore *sem);
+extern int __down_read_killable(struct rw_semaphore *sem);
+extern int __down_read_trylock(struct rw_semaphore *sem);
+extern void __down_write(struct rw_semaphore *sem);
+extern int __must_check __down_write_killable(struct rw_semaphore *sem);
+extern int __down_write_trylock(struct rw_semaphore *sem);
+extern void __up_read(struct rw_semaphore *sem);
+extern void __up_write(struct rw_semaphore *sem);
+extern void __downgrade_write(struct rw_semaphore *sem);
+
+#endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 00d6054687dd..393f4520befc 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -16,6 +16,11 @@
#include
#include
#include
+
+#ifdef CONFIG_PREEMPT_RT
+#include
+#else /* PREEMPT_RT */
+
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
#include
#endif
@@ -53,12 +58,6 @@ struct rw_semaphore {
#endif
};
-/*
- * Setting all bits of the owner field except bit 0 will indicate
- * that the rwsem is writer-owned with an unknown owner.
- */
-#define RWSEM_OWNER_UNKNOWN (-2L)
-
/* In all implementations count != 0 means locked */
static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
@@ -121,6 +120,13 @@ static inline int rwsem_is_contended(struct rw_semaphore *sem)
return !list_empty(&sem->wait_list);
}
+#endif /* !PREEMPT_RT */
+
+/*
+ * The functions below are the same for all rwsem implementations including
+ * the RT specific variant.
+ */
+
/*
* lock for reading
*/
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5710b80f8050..4a1ced4dcf16 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -31,6 +31,7 @@
#include
#include
#include
+#include
/* task_struct member predeclarations (sorted alphabetically): */
struct audit_context;
@@ -107,12 +108,8 @@ struct task_group;
__TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
TASK_PARKED)
-#define task_is_traced(task) ((task->state & __TASK_TRACED) != 0)
-
#define task_is_stopped(task) ((task->state & __TASK_STOPPED) != 0)
-#define task_is_stopped_or_traced(task) ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
-
#define task_contributes_to_load(task) ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \
(task->flags & PF_FROZEN) == 0 && \
(task->state & TASK_NOLOAD) == 0)
@@ -140,6 +137,9 @@ struct task_group;
smp_store_mb(current->state, (state_value)); \
} while (0)
+#define __set_current_state_no_track(state_value) \
+ current->state = (state_value);
+
#define set_special_state(state_value) \
do { \
unsigned long flags; /* may shadow */ \
@@ -149,6 +149,7 @@ struct task_group;
current->state = (state_value); \
raw_spin_unlock_irqrestore(¤t->pi_lock, flags); \
} while (0)
+
#else
/*
* set_current_state() includes a barrier so that the write of current->state
@@ -193,6 +194,9 @@ struct task_group;
#define set_current_state(state_value) \
smp_store_mb(current->state, (state_value))
+#define __set_current_state_no_track(state_value) \
+ __set_current_state(state_value)
+
/*
* set_special_state() should be used for those states when the blocking task
* can not use the regular condition based wait-loop. In that case we must
@@ -230,6 +234,8 @@ extern void io_schedule_finish(int token);
extern long io_schedule_timeout(long timeout);
extern void io_schedule(void);
+int cpu_nr_pinned(int cpu);
+
/**
* struct prev_cputime - snapshot of system and user cputime
* @utime: time spent in user mode
@@ -631,6 +637,8 @@ struct task_struct {
#endif
/* -1 unrunnable, 0 runnable, >0 stopped: */
volatile long state;
+ /* saved state for "spinlock sleepers" */
+ volatile long saved_state;
/*
* This begins the randomizable portion of task_struct. Only
@@ -700,6 +708,20 @@ struct task_struct {
int nr_cpus_allowed;
const cpumask_t *cpus_ptr;
cpumask_t cpus_mask;
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+ int migrate_disable;
+ bool migrate_disable_scheduled;
+# ifdef CONFIG_SCHED_DEBUG
+ int pinned_on_cpu;
+# endif
+#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+# ifdef CONFIG_SCHED_DEBUG
+ int migrate_disable;
+# endif
+#endif
+#ifdef CONFIG_PREEMPT_RT
+ int sleeping_lock;
+#endif
#ifdef CONFIG_PREEMPT_RCU
int rcu_read_lock_nesting;
@@ -913,11 +935,17 @@ struct task_struct {
/* Signal handlers: */
struct signal_struct *signal;
struct sighand_struct *sighand;
+ struct sigqueue *sigqueue_cache;
+
sigset_t blocked;
sigset_t real_blocked;
/* Restored if set_restore_sigmask() was used: */
sigset_t saved_sigmask;
struct sigpending pending;
+#ifdef CONFIG_PREEMPT_RT
+ /* TODO: move me into ->restart_block ? */
+ struct kernel_siginfo forced_info;
+#endif
unsigned long sas_ss_sp;
size_t sas_ss_size;
unsigned int sas_ss_flags;
@@ -944,6 +972,7 @@ struct task_struct {
raw_spinlock_t pi_lock;
struct wake_q_node wake_q;
+ struct wake_q_node wake_q_sleeper;
#ifdef CONFIG_RT_MUTEXES
/* PI waiters blocked on a rt_mutex held by this task: */
@@ -978,6 +1007,9 @@ struct task_struct {
int softirqs_enabled;
int softirq_context;
#endif
+#ifdef CONFIG_PREEMPT_RT
+ int softirq_count;
+#endif
#ifdef CONFIG_LOCKDEP
# define MAX_LOCK_DEPTH 48UL
@@ -1241,6 +1273,12 @@ struct task_struct {
unsigned int sequential_io;
unsigned int sequential_io_avg;
#endif
+#ifdef CONFIG_PREEMPT_RT
+# if defined CONFIG_HIGHMEM || defined CONFIG_X86_32
+ int kmap_idx;
+ pte_t kmap_pte[KM_TYPE_NR];
+# endif
+#endif
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
unsigned long task_state_change;
#endif
@@ -1672,6 +1710,7 @@ extern struct task_struct *find_get_task_by_vpid(pid_t nr);
extern int wake_up_state(struct task_struct *tsk, unsigned int state);
extern int wake_up_process(struct task_struct *tsk);
+extern int wake_up_lock_sleeper(struct task_struct *tsk);
extern void wake_up_new_task(struct task_struct *tsk);
#ifdef CONFIG_SMP
@@ -1754,6 +1793,89 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED));
}
+#ifdef CONFIG_PREEMPT_LAZY
+static inline void set_tsk_need_resched_lazy(struct task_struct *tsk)
+{
+ set_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY);
+}
+
+static inline void clear_tsk_need_resched_lazy(struct task_struct *tsk)
+{
+ clear_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY);
+}
+
+static inline int test_tsk_need_resched_lazy(struct task_struct *tsk)
+{
+ return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY));
+}
+
+static inline int need_resched_lazy(void)
+{
+ return test_thread_flag(TIF_NEED_RESCHED_LAZY);
+}
+
+static inline int need_resched_now(void)
+{
+ return test_thread_flag(TIF_NEED_RESCHED);
+}
+
+#else
+static inline void clear_tsk_need_resched_lazy(struct task_struct *tsk) { }
+static inline int need_resched_lazy(void) { return 0; }
+
+static inline int need_resched_now(void)
+{
+ return test_thread_flag(TIF_NEED_RESCHED);
+}
+
+#endif
+
+
+static inline bool __task_is_stopped_or_traced(struct task_struct *task)
+{
+ if (task->state & (__TASK_STOPPED | __TASK_TRACED))
+ return true;
+#ifdef CONFIG_PREEMPT_RT
+ if (task->saved_state & (__TASK_STOPPED | __TASK_TRACED))
+ return true;
+#endif
+ return false;
+}
+
+static inline bool task_is_stopped_or_traced(struct task_struct *task)
+{
+ bool traced_stopped;
+
+#ifdef CONFIG_PREEMPT_RT
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
+ traced_stopped = __task_is_stopped_or_traced(task);
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+#else
+ traced_stopped = __task_is_stopped_or_traced(task);
+#endif
+ return traced_stopped;
+}
+
+static inline bool task_is_traced(struct task_struct *task)
+{
+ bool traced = false;
+
+ if (task->state & __TASK_TRACED)
+ return true;
+#ifdef CONFIG_PREEMPT_RT
+ /* in case the task is sleeping on tasklist_lock */
+ raw_spin_lock_irq(&task->pi_lock);
+ if (task->state & __TASK_TRACED)
+ traced = true;
+ else if (task->saved_state & __TASK_TRACED)
+ traced = true;
+ raw_spin_unlock_irq(&task->pi_lock);
+#endif
+ return traced;
+}
+
/*
* cond_resched() and cond_resched_lock(): latency reduction via
* explicit rescheduling in places that are safe. The return
@@ -1806,6 +1928,23 @@ static __always_inline bool need_resched(void)
return unlikely(tif_need_resched());
}
+#ifdef CONFIG_PREEMPT_RT
+static inline void sleeping_lock_inc(void)
+{
+ current->sleeping_lock++;
+}
+
+static inline void sleeping_lock_dec(void)
+{
+ current->sleeping_lock--;
+}
+
+#else
+
+static inline void sleeping_lock_inc(void) { }
+static inline void sleeping_lock_dec(void) { }
+#endif
+
/*
* Wrappers for p->thread_info->cpu access. No-op on UP.
*/
@@ -1997,4 +2136,6 @@ int sched_trace_rq_cpu(struct rq *rq);
const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
+extern struct task_struct *takedown_cpu_task;
+
#endif
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index a132d875d351..af8d898cc840 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -49,6 +49,16 @@ static inline void mmdrop(struct mm_struct *mm)
__mmdrop(mm);
}
+#ifdef CONFIG_PREEMPT_RT
+extern void __mmdrop_delayed(struct rcu_head *rhp);
+static inline void mmdrop_delayed(struct mm_struct *mm)
+{
+ if (atomic_dec_and_test(&mm->mm_count))
+ call_rcu(&mm->delayed_drop, __mmdrop_delayed);
+}
+#else
+# define mmdrop_delayed(mm) mmdrop(mm)
+#endif
void mmdrop(struct mm_struct *mm);
/*
diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 26a2013ac39c..6e2dff721547 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -58,6 +58,17 @@ static inline bool wake_q_empty(struct wake_q_head *head)
extern void wake_q_add(struct wake_q_head *head, struct task_struct *task);
extern void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task);
-extern void wake_up_q(struct wake_q_head *head);
+extern void wake_q_add_sleeper(struct wake_q_head *head, struct task_struct *task);
+extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
+
+static inline void wake_up_q(struct wake_q_head *head)
+{
+ __wake_up_q(head, false);
+}
+
+static inline void wake_up_q_sleeper(struct wake_q_head *head)
+{
+ __wake_up_q(head, true);
+}
#endif /* _LINUX_SCHED_WAKE_Q_H */
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index a42a29952889..e5207897c33e 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -221,20 +221,30 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
return __read_seqcount_retry(s, start);
}
-
-
-static inline void raw_write_seqcount_begin(seqcount_t *s)
+static inline void __raw_write_seqcount_begin(seqcount_t *s)
{
s->sequence++;
smp_wmb();
}
-static inline void raw_write_seqcount_end(seqcount_t *s)
+static inline void raw_write_seqcount_begin(seqcount_t *s)
+{
+ preempt_disable_rt();
+ __raw_write_seqcount_begin(s);
+}
+
+static inline void __raw_write_seqcount_end(seqcount_t *s)
{
smp_wmb();
s->sequence++;
}
+static inline void raw_write_seqcount_end(seqcount_t *s)
+{
+ __raw_write_seqcount_end(s);
+ preempt_enable_rt();
+}
+
/**
* raw_write_seqcount_barrier - do a seq write barrier
* @s: pointer to seqcount_t
@@ -435,10 +445,33 @@ typedef struct {
/*
* Read side functions for starting and finalizing a read side section.
*/
+#ifndef CONFIG_PREEMPT_RT
static inline unsigned read_seqbegin(const seqlock_t *sl)
{
return read_seqcount_begin(&sl->seqcount);
}
+#else
+/*
+ * Starvation safe read side for RT
+ */
+static inline unsigned read_seqbegin(seqlock_t *sl)
+{
+ unsigned ret;
+
+repeat:
+ ret = READ_ONCE(sl->seqcount.sequence);
+ if (unlikely(ret & 1)) {
+ /*
+ * Take the lock and let the writer proceed (i.e. evtl
+ * boost it), otherwise we could loop here forever.
+ */
+ spin_unlock_wait(&sl->lock);
+ goto repeat;
+ }
+ smp_rmb();
+ return ret;
+}
+#endif
static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
{
@@ -453,36 +486,45 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
static inline void write_seqlock(seqlock_t *sl)
{
spin_lock(&sl->lock);
- write_seqcount_begin(&sl->seqcount);
+ __raw_write_seqcount_begin(&sl->seqcount);
+}
+
+static inline int try_write_seqlock(seqlock_t *sl)
+{
+ if (spin_trylock(&sl->lock)) {
+ __raw_write_seqcount_begin(&sl->seqcount);
+ return 1;
+ }
+ return 0;
}
static inline void write_sequnlock(seqlock_t *sl)
{
- write_seqcount_end(&sl->seqcount);
+ __raw_write_seqcount_end(&sl->seqcount);
spin_unlock(&sl->lock);
}
static inline void write_seqlock_bh(seqlock_t *sl)
{
spin_lock_bh(&sl->lock);
- write_seqcount_begin(&sl->seqcount);
+ __raw_write_seqcount_begin(&sl->seqcount);
}
static inline void write_sequnlock_bh(seqlock_t *sl)
{
- write_seqcount_end(&sl->seqcount);
+ __raw_write_seqcount_end(&sl->seqcount);
spin_unlock_bh(&sl->lock);
}
static inline void write_seqlock_irq(seqlock_t *sl)
{
spin_lock_irq(&sl->lock);
- write_seqcount_begin(&sl->seqcount);
+ __raw_write_seqcount_begin(&sl->seqcount);
}
static inline void write_sequnlock_irq(seqlock_t *sl)
{
- write_seqcount_end(&sl->seqcount);
+ __raw_write_seqcount_end(&sl->seqcount);
spin_unlock_irq(&sl->lock);
}
@@ -491,7 +533,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
unsigned long flags;
spin_lock_irqsave(&sl->lock, flags);
- write_seqcount_begin(&sl->seqcount);
+ __raw_write_seqcount_begin(&sl->seqcount);
return flags;
}
@@ -501,7 +543,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
static inline void
write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
{
- write_seqcount_end(&sl->seqcount);
+ __raw_write_seqcount_end(&sl->seqcount);
spin_unlock_irqrestore(&sl->lock, flags);
}
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index bb2bc99388ca..be74bb7cae64 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -7,6 +7,7 @@
#ifndef _LINUX_SERIAL_8250_H
#define _LINUX_SERIAL_8250_H
+#include
#include
#include
#include
@@ -123,6 +124,8 @@ struct uart_8250_port {
#define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA
unsigned char msr_saved_flags;
+ atomic_t console_printing;
+
struct uart_8250_dma *dma;
const struct uart_8250_ops *ops;
@@ -174,6 +177,8 @@ void serial8250_init_port(struct uart_8250_port *up);
void serial8250_set_defaults(struct uart_8250_port *up);
void serial8250_console_write(struct uart_8250_port *up, const char *s,
unsigned int count);
+void serial8250_console_write_atomic(struct uart_8250_port *up, const char *s,
+ unsigned int count);
int serial8250_console_setup(struct uart_port *port, char *options, bool probe);
extern void serial8250_set_isa_configurator(void (*v)
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 1a5f88316b08..1870c233799e 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -255,6 +255,7 @@ static inline void init_sigpending(struct sigpending *sig)
}
extern void flush_sigqueue(struct sigpending *queue);
+extern void flush_task_sigqueue(struct task_struct *tsk);
/* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
static inline int valid_signal(unsigned long sig)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 68139cc2f3ca..f6dcd58a7453 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -293,6 +293,7 @@ struct sk_buff_head {
__u32 qlen;
spinlock_t lock;
+ raw_spinlock_t raw_lock;
};
struct sk_buff;
@@ -1858,6 +1859,12 @@ static inline void skb_queue_head_init(struct sk_buff_head *list)
__skb_queue_head_init(list);
}
+static inline void skb_queue_head_init_raw(struct sk_buff_head *list)
+{
+ raw_spin_lock_init(&list->raw_lock);
+ __skb_queue_head_init(list);
+}
+
static inline void skb_queue_head_init_class(struct sk_buff_head *list,
struct lock_class_key *class)
{
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 6fc856c9eda5..01ff6ed61591 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -15,6 +15,7 @@
#include
typedef void (*smp_call_func_t)(void *info);
+typedef bool (*smp_cond_func_t)(int cpu, void *info);
struct __call_single_data {
struct llist_node llist;
smp_call_func_t func;
@@ -49,13 +50,11 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
* cond_func returns a positive value. This may include the local
* processor.
*/
-void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
- smp_call_func_t func, void *info, bool wait,
- gfp_t gfp_flags);
+void on_each_cpu_cond(smp_cond_func_t cond_func, smp_call_func_t func,
+ void *info, bool wait);
-void on_each_cpu_cond_mask(bool (*cond_func)(int cpu, void *info),
- smp_call_func_t func, void *info, bool wait,
- gfp_t gfp_flags, const struct cpumask *mask);
+void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
+ void *info, bool wait, const struct cpumask *mask);
int smp_call_function_single_async(int cpu, call_single_data_t *csd);
@@ -222,6 +221,9 @@ static inline int get_boot_cpu_id(void)
#define get_cpu() ({ preempt_disable(); __smp_processor_id(); })
#define put_cpu() preempt_enable()
+#define get_cpu_light() ({ migrate_disable(); __smp_processor_id(); })
+#define put_cpu_light() migrate_enable()
+
/*
* Callback to arch code if there's nosmp or maxcpus=0 on the
* boot command line:
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 031ce8617df8..826ddd48e605 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -307,7 +307,11 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
})
/* Include rwlock functions */
-#include
+#ifdef CONFIG_PREEMPT_RT
+# include
+#else
+# include
+#endif
/*
* Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
@@ -318,6 +322,10 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
# include
#endif
+#ifdef CONFIG_PREEMPT_RT
+# include
+#else /* PREEMPT_RT */
+
/*
* Map the spin_lock functions to the raw variants for PREEMPT_RT=n
*/
@@ -438,6 +446,8 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
#define assert_spin_locked(lock) assert_raw_spin_locked(&(lock)->rlock)
+#endif /* !PREEMPT_RT */
+
/*
* Pull the atomic_t declaration:
* (asm-mips/atomic.h needs above definitions)
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index b762eaba4cdf..60f593969fe1 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -187,6 +187,8 @@ static inline int __raw_spin_trylock_bh(raw_spinlock_t *lock)
return 0;
}
-#include
+#ifndef CONFIG_PREEMPT_RT
+# include
+#endif
#endif /* __LINUX_SPINLOCK_API_SMP_H */
diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h
new file mode 100644
index 000000000000..3696a77fa77d
--- /dev/null
+++ b/include/linux/spinlock_rt.h
@@ -0,0 +1,156 @@
+#ifndef __LINUX_SPINLOCK_RT_H
+#define __LINUX_SPINLOCK_RT_H
+
+#ifndef __LINUX_SPINLOCK_H
+#error Do not include directly. Use spinlock.h
+#endif
+
+#include
+
+extern void
+__rt_spin_lock_init(spinlock_t *lock, const char *name, struct lock_class_key *key);
+
+#define spin_lock_init(slock) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ rt_mutex_init(&(slock)->lock); \
+ __rt_spin_lock_init(slock, #slock, &__key); \
+} while (0)
+
+extern void __lockfunc rt_spin_lock(spinlock_t *lock);
+extern unsigned long __lockfunc rt_spin_lock_trace_flags(spinlock_t *lock);
+extern void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass);
+extern void __lockfunc rt_spin_unlock(spinlock_t *lock);
+extern void __lockfunc rt_spin_unlock_wait(spinlock_t *lock);
+extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags);
+extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock);
+extern int __lockfunc rt_spin_trylock(spinlock_t *lock);
+extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock);
+
+/*
+ * lockdep-less calls, for derived types like rwlock:
+ * (for trylock they can use rt_mutex_trylock() directly.
+ * Migrate disable handling must be done at the call site.
+ */
+extern void __lockfunc __rt_spin_lock(struct rt_mutex *lock);
+extern void __lockfunc __rt_spin_trylock(struct rt_mutex *lock);
+extern void __lockfunc __rt_spin_unlock(struct rt_mutex *lock);
+
+#define spin_lock(lock) rt_spin_lock(lock)
+
+#define spin_lock_bh(lock) \
+ do { \
+ local_bh_disable(); \
+ rt_spin_lock(lock); \
+ } while (0)
+
+#define spin_lock_irq(lock) spin_lock(lock)
+
+#define spin_do_trylock(lock) __cond_lock(lock, rt_spin_trylock(lock))
+
+#define spin_trylock(lock) \
+({ \
+ int __locked; \
+ __locked = spin_do_trylock(lock); \
+ __locked; \
+})
+
+#ifdef CONFIG_LOCKDEP
+# define spin_lock_nested(lock, subclass) \
+ do { \
+ rt_spin_lock_nested(lock, subclass); \
+ } while (0)
+
+#define spin_lock_bh_nested(lock, subclass) \
+ do { \
+ local_bh_disable(); \
+ rt_spin_lock_nested(lock, subclass); \
+ } while (0)
+
+# define spin_lock_irqsave_nested(lock, flags, subclass) \
+ do { \
+ typecheck(unsigned long, flags); \
+ flags = 0; \
+ rt_spin_lock_nested(lock, subclass); \
+ } while (0)
+#else
+# define spin_lock_nested(lock, subclass) spin_lock(lock)
+# define spin_lock_bh_nested(lock, subclass) spin_lock_bh(lock)
+
+# define spin_lock_irqsave_nested(lock, flags, subclass) \
+ do { \
+ typecheck(unsigned long, flags); \
+ flags = 0; \
+ spin_lock(lock); \
+ } while (0)
+#endif
+
+#define spin_lock_irqsave(lock, flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ flags = 0; \
+ spin_lock(lock); \
+ } while (0)
+
+static inline unsigned long spin_lock_trace_flags(spinlock_t *lock)
+{
+ unsigned long flags = 0;
+#ifdef CONFIG_TRACE_IRQFLAGS
+ flags = rt_spin_lock_trace_flags(lock);
+#else
+ spin_lock(lock); /* lock_local */
+#endif
+ return flags;
+}
+
+/* FIXME: we need rt_spin_lock_nest_lock */
+#define spin_lock_nest_lock(lock, nest_lock) spin_lock_nested(lock, 0)
+
+#define spin_unlock(lock) rt_spin_unlock(lock)
+
+#define spin_unlock_bh(lock) \
+ do { \
+ rt_spin_unlock(lock); \
+ local_bh_enable(); \
+ } while (0)
+
+#define spin_unlock_irq(lock) spin_unlock(lock)
+
+#define spin_unlock_irqrestore(lock, flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ (void) flags; \
+ spin_unlock(lock); \
+ } while (0)
+
+#define spin_trylock_bh(lock) __cond_lock(lock, rt_spin_trylock_bh(lock))
+#define spin_trylock_irq(lock) spin_trylock(lock)
+
+#define spin_trylock_irqsave(lock, flags) \
+ rt_spin_trylock_irqsave(lock, &(flags))
+
+#define spin_unlock_wait(lock) rt_spin_unlock_wait(lock)
+
+#ifdef CONFIG_GENERIC_LOCKBREAK
+# define spin_is_contended(lock) ((lock)->break_lock)
+#else
+# define spin_is_contended(lock) (((void)(lock), 0))
+#endif
+
+static inline int spin_can_lock(spinlock_t *lock)
+{
+ return !rt_mutex_is_locked(&lock->lock);
+}
+
+static inline int spin_is_locked(spinlock_t *lock)
+{
+ return rt_mutex_is_locked(&lock->lock);
+}
+
+static inline void assert_spin_locked(spinlock_t *lock)
+{
+ BUG_ON(!spin_is_locked(lock));
+}
+
+#endif
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 24b4e6f2c1a2..8d896d3e1a01 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -9,77 +9,15 @@
* Released under the General Public License (GPL).
*/
-#if defined(CONFIG_SMP)
-# include
-#else
-# include
-#endif
-
-#include
-
-typedef struct raw_spinlock {
- arch_spinlock_t raw_lock;
-#ifdef CONFIG_DEBUG_SPINLOCK
- unsigned int magic, owner_cpu;
- void *owner;
-#endif
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
- struct lockdep_map dep_map;
-#endif
-} raw_spinlock_t;
-
-#define SPINLOCK_MAGIC 0xdead4ead
-
-#define SPINLOCK_OWNER_INIT ((void *)-1L)
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
-#else
-# define SPIN_DEP_MAP_INIT(lockname)
-#endif
+#include
-#ifdef CONFIG_DEBUG_SPINLOCK
-# define SPIN_DEBUG_INIT(lockname) \
- .magic = SPINLOCK_MAGIC, \
- .owner_cpu = -1, \
- .owner = SPINLOCK_OWNER_INIT,
+#ifndef CONFIG_PREEMPT_RT
+# include
+# include
#else
-# define SPIN_DEBUG_INIT(lockname)
+# include
+# include
+# include
#endif
-#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \
- { \
- .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \
- SPIN_DEBUG_INIT(lockname) \
- SPIN_DEP_MAP_INIT(lockname) }
-
-#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \
- (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
-
-#define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
-
-typedef struct spinlock {
- union {
- struct raw_spinlock rlock;
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
- struct {
- u8 __padding[LOCK_PADSIZE];
- struct lockdep_map dep_map;
- };
-#endif
- };
-} spinlock_t;
-
-#define __SPIN_LOCK_INITIALIZER(lockname) \
- { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
-
-#define __SPIN_LOCK_UNLOCKED(lockname) \
- (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
-
-#define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
-
-#include
-
#endif /* __LINUX_SPINLOCK_TYPES_H */
diff --git a/include/linux/spinlock_types_nort.h b/include/linux/spinlock_types_nort.h
new file mode 100644
index 000000000000..f1dac1fb1d6a
--- /dev/null
+++ b/include/linux/spinlock_types_nort.h
@@ -0,0 +1,33 @@
+#ifndef __LINUX_SPINLOCK_TYPES_NORT_H
+#define __LINUX_SPINLOCK_TYPES_NORT_H
+
+#ifndef __LINUX_SPINLOCK_TYPES_H
+#error "Do not include directly. Include spinlock_types.h instead"
+#endif
+
+/*
+ * The non RT version maps spinlocks to raw_spinlocks
+ */
+typedef struct spinlock {
+ union {
+ struct raw_spinlock rlock;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
+ struct {
+ u8 __padding[LOCK_PADSIZE];
+ struct lockdep_map dep_map;
+ };
+#endif
+ };
+} spinlock_t;
+
+#define __SPIN_LOCK_INITIALIZER(lockname) \
+ { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
+
+#define __SPIN_LOCK_UNLOCKED(lockname) \
+ (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
+
+#define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
+
+#endif
diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h
new file mode 100644
index 000000000000..822bf64a61d3
--- /dev/null
+++ b/include/linux/spinlock_types_raw.h
@@ -0,0 +1,55 @@
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
+#define __LINUX_SPINLOCK_TYPES_RAW_H
+
+#include
+
+#if defined(CONFIG_SMP)
+# include
+#else
+# include
+#endif
+
+#include
+
+typedef struct raw_spinlock {
+ arch_spinlock_t raw_lock;
+#ifdef CONFIG_DEBUG_SPINLOCK
+ unsigned int magic, owner_cpu;
+ void *owner;
+#endif
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+} raw_spinlock_t;
+
+#define SPINLOCK_MAGIC 0xdead4ead
+
+#define SPINLOCK_OWNER_INIT ((void *)-1L)
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
+#else
+# define SPIN_DEP_MAP_INIT(lockname)
+#endif
+
+#ifdef CONFIG_DEBUG_SPINLOCK
+# define SPIN_DEBUG_INIT(lockname) \
+ .magic = SPINLOCK_MAGIC, \
+ .owner_cpu = -1, \
+ .owner = SPINLOCK_OWNER_INIT,
+#else
+# define SPIN_DEBUG_INIT(lockname)
+#endif
+
+#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \
+ { \
+ .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \
+ SPIN_DEBUG_INIT(lockname) \
+ SPIN_DEP_MAP_INIT(lockname) }
+
+#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \
+ (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
+
+#define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
+
+#endif
diff --git a/include/linux/spinlock_types_rt.h b/include/linux/spinlock_types_rt.h
new file mode 100644
index 000000000000..3e3d8c5f7a9a
--- /dev/null
+++ b/include/linux/spinlock_types_rt.h
@@ -0,0 +1,48 @@
+#ifndef __LINUX_SPINLOCK_TYPES_RT_H
+#define __LINUX_SPINLOCK_TYPES_RT_H
+
+#ifndef __LINUX_SPINLOCK_TYPES_H
+#error "Do not include directly. Include spinlock_types.h instead"
+#endif
+
+#include
+
+/*
+ * PREEMPT_RT: spinlocks - an RT mutex plus lock-break field:
+ */
+typedef struct spinlock {
+ struct rt_mutex lock;
+ unsigned int break_lock;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+} spinlock_t;
+
+#ifdef CONFIG_DEBUG_RT_MUTEXES
+# define __RT_SPIN_INITIALIZER(name) \
+ { \
+ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \
+ .save_state = 1, \
+ .file = __FILE__, \
+ .line = __LINE__ , \
+ }
+#else
+# define __RT_SPIN_INITIALIZER(name) \
+ { \
+ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \
+ .save_state = 1, \
+ }
+#endif
+
+/*
+.wait_list = PLIST_HEAD_INIT_RAW((name).lock.wait_list, (name).lock.wait_lock)
+*/
+
+#define __SPIN_LOCK_UNLOCKED(name) \
+ { .lock = __RT_SPIN_INITIALIZER(name.lock), \
+ SPIN_DEP_MAP_INIT(name) }
+
+#define DEFINE_SPINLOCK(name) \
+ spinlock_t name = __SPIN_LOCK_UNLOCKED(name)
+
+#endif
diff --git a/include/linux/spinlock_types_up.h b/include/linux/spinlock_types_up.h
index c09b6407ae1b..b0243ba07fb7 100644
--- a/include/linux/spinlock_types_up.h
+++ b/include/linux/spinlock_types_up.h
@@ -1,10 +1,6 @@
#ifndef __LINUX_SPINLOCK_TYPES_UP_H
#define __LINUX_SPINLOCK_TYPES_UP_H
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
/*
* include/linux/spinlock_types_up.h - spinlock type definitions for UP
*
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index f9a0c6189852..27d32887103d 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -26,6 +26,8 @@ struct cpu_stop_work {
cpu_stop_fn_t fn;
void *arg;
struct cpu_stop_done *done;
+ /* Did not run due to disabled stopper; for nowait debug checks */
+ bool disabled;
};
int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
diff --git a/include/linux/swait.h b/include/linux/swait.h
index 73e06e9986d4..f426a0661aa0 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -160,7 +160,9 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
extern void swake_up_one(struct swait_queue_head *q);
extern void swake_up_all(struct swait_queue_head *q);
extern void swake_up_locked(struct swait_queue_head *q);
+extern void swake_up_all_locked(struct swait_queue_head *q);
+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state);
extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int state);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 063c0c1e112b..1ddf6a825468 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -12,6 +12,7 @@
#include
#include
#include
+#include
#include
struct notifier_block;
@@ -328,6 +329,7 @@ extern unsigned long nr_free_pagecache_pages(void);
/* linux/mm/swap.c */
+DECLARE_LOCAL_IRQ_LOCK(swapvec_lock);
extern void lru_cache_add(struct page *);
extern void lru_cache_add_anon(struct page *page);
extern void lru_cache_add_file(struct page *page);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index e93e249a4e9b..c88b9cecc78a 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -97,7 +97,17 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
#define test_thread_flag(flag) \
test_ti_thread_flag(current_thread_info(), flag)
-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#ifdef CONFIG_PREEMPT_LAZY
+#define tif_need_resched() (test_thread_flag(TIF_NEED_RESCHED) || \
+ test_thread_flag(TIF_NEED_RESCHED_LAZY))
+#define tif_need_resched_now() (test_thread_flag(TIF_NEED_RESCHED))
+#define tif_need_resched_lazy() test_thread_flag(TIF_NEED_RESCHED_LAZY))
+
+#else
+#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#define tif_need_resched_now() test_thread_flag(TIF_NEED_RESCHED)
+#define tif_need_resched_lazy() 0
+#endif
#ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
static inline int arch_within_stack_frames(const void * const stack,
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..10bbf986bc1d 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -62,6 +62,8 @@ struct trace_entry {
unsigned char flags;
unsigned char preempt_count;
int pid;
+ unsigned char migrate_disable;
+ unsigned char preempt_lazy_count;
};
#define TRACE_EVENT_TYPE_MAX \
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 38555435a64a..e2777169c539 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -182,6 +182,7 @@ static __always_inline void pagefault_disabled_dec(void)
*/
static inline void pagefault_disable(void)
{
+ migrate_disable();
pagefault_disabled_inc();
/*
* make sure to have issued the store before a pagefault
@@ -198,6 +199,7 @@ static inline void pagefault_enable(void)
*/
barrier();
pagefault_disabled_dec();
+ migrate_enable();
}
/*
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index bdeda4b079fe..dec95902b138 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -54,7 +54,9 @@ DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
*/
static inline void __count_vm_event(enum vm_event_item item)
{
+ preempt_disable_rt();
raw_cpu_inc(vm_event_states.event[item]);
+ preempt_enable_rt();
}
static inline void count_vm_event(enum vm_event_item item)
@@ -64,7 +66,9 @@ static inline void count_vm_event(enum vm_event_item item)
static inline void __count_vm_events(enum vm_event_item item, long delta)
{
+ preempt_disable_rt();
raw_cpu_add(vm_event_states.event[item], delta);
+ preempt_enable_rt();
}
static inline void count_vm_events(enum vm_event_item item, long delta)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 3eb7cae8206c..1781c47d81f0 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -10,6 +10,7 @@
#include
#include
+#include
typedef struct wait_queue_entry wait_queue_entry_t;
@@ -20,6 +21,7 @@ int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int
#define WQ_FLAG_EXCLUSIVE 0x01
#define WQ_FLAG_WOKEN 0x02
#define WQ_FLAG_BOOKMARK 0x04
+#define WQ_FLAG_CUSTOM 0x08
/*
* A single wait-queue entry structure:
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index ca23860adbb9..599572f9dcd1 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -6,6 +6,7 @@
#include
#include
#include
+#include
struct gnet_stats_basic_cpu {
struct gnet_stats_basic_packed bstats;
@@ -36,15 +37,15 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
spinlock_t *lock, struct gnet_dump *d,
int padattr);
-int gnet_stats_copy_basic(const seqcount_t *running,
+int gnet_stats_copy_basic(net_seqlock_t *running,
struct gnet_dump *d,
struct gnet_stats_basic_cpu __percpu *cpu,
struct gnet_stats_basic_packed *b);
-void __gnet_stats_copy_basic(const seqcount_t *running,
+void __gnet_stats_copy_basic(net_seqlock_t *running,
struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu,
struct gnet_stats_basic_packed *b);
-int gnet_stats_copy_basic_hw(const seqcount_t *running,
+int gnet_stats_copy_basic_hw(net_seqlock_t *running,
struct gnet_dump *d,
struct gnet_stats_basic_cpu __percpu *cpu,
struct gnet_stats_basic_packed *b);
@@ -64,13 +65,13 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
struct net_rate_estimator __rcu **rate_est,
spinlock_t *lock,
- seqcount_t *running, struct nlattr *opt);
+ net_seqlock_t *running, struct nlattr *opt);
void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
struct gnet_stats_basic_cpu __percpu *cpu_bstats,
struct net_rate_estimator __rcu **ptr,
spinlock_t *lock,
- seqcount_t *running, struct nlattr *opt);
+ net_seqlock_t *running, struct nlattr *opt);
bool gen_estimator_active(struct net_rate_estimator __rcu **ptr);
bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
struct gnet_stats_rate_est64 *sample);
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 8ec77bfdc1a4..6cdf3a074197 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -459,7 +459,7 @@ static inline int neigh_hh_bridge(struct hh_cache *hh, struct sk_buff *skb)
}
#endif
-static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
+static inline int neigh_hh_output(struct hh_cache *hh, struct sk_buff *skb)
{
unsigned int hh_alen = 0;
unsigned int seq;
@@ -502,7 +502,7 @@ static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb
static inline int neigh_output(struct neighbour *n, struct sk_buff *skb,
bool skip_cache)
{
- const struct hh_cache *hh = &n->hh;
+ struct hh_cache *hh = &n->hh;
if ((n->nud_state & NUD_CONNECTED) && hh->hh_len && !skip_cache)
return neigh_hh_output(hh, skb);
@@ -543,7 +543,7 @@ struct neighbour_cb {
#define NEIGH_CB(skb) ((struct neighbour_cb *)(skb)->cb)
-static inline void neigh_ha_snapshot(char *dst, const struct neighbour *n,
+static inline void neigh_ha_snapshot(char *dst, struct neighbour *n,
const struct net_device *dev)
{
unsigned int seq;
diff --git a/include/net/net_seq_lock.h b/include/net/net_seq_lock.h
new file mode 100644
index 000000000000..67710bace741
--- /dev/null
+++ b/include/net/net_seq_lock.h
@@ -0,0 +1,15 @@
+#ifndef __NET_NET_SEQ_LOCK_H__
+#define __NET_NET_SEQ_LOCK_H__
+
+#ifdef CONFIG_PREEMPT_RT
+# define net_seqlock_t seqlock_t
+# define net_seq_begin(__r) read_seqbegin(__r)
+# define net_seq_retry(__r, __s) read_seqretry(__r, __s)
+
+#else
+# define net_seqlock_t seqcount_t
+# define net_seq_begin(__r) read_seqcount_begin(__r)
+# define net_seq_retry(__r, __s) read_seqcount_retry(__r, __s)
+#endif
+
+#endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3d03756e1069..e6afb4b9cede 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -10,6 +10,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -100,7 +101,7 @@ struct Qdisc {
struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
struct qdisc_skb_head q;
struct gnet_stats_basic_packed bstats;
- seqcount_t running;
+ net_seqlock_t running;
struct gnet_stats_queue qstats;
unsigned long state;
struct Qdisc *next_sched;
@@ -138,7 +139,11 @@ static inline bool qdisc_is_running(struct Qdisc *qdisc)
{
if (qdisc->flags & TCQ_F_NOLOCK)
return spin_is_locked(&qdisc->seqlock);
+#ifdef CONFIG_PREEMPT_RT
+ return spin_is_locked(&qdisc->running.lock) ? true : false;
+#else
return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
+#endif
}
static inline bool qdisc_is_percpu_stats(const struct Qdisc *q)
@@ -162,17 +167,27 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
} else if (qdisc_is_running(qdisc)) {
return false;
}
+#ifdef CONFIG_PREEMPT_RT
+ if (try_write_seqlock(&qdisc->running))
+ return true;
+ return false;
+#else
/* Variant of write_seqcount_begin() telling lockdep a trylock
* was attempted.
*/
raw_write_seqcount_begin(&qdisc->running);
seqcount_acquire(&qdisc->running.dep_map, 0, 1, _RET_IP_);
return true;
+#endif
}
static inline void qdisc_run_end(struct Qdisc *qdisc)
{
+#ifdef CONFIG_PREEMPT_RT
+ write_sequnlock(&qdisc->running);
+#else
write_seqcount_end(&qdisc->running);
+#endif
if (qdisc->flags & TCQ_F_NOLOCK)
spin_unlock(&qdisc->seqlock);
}
@@ -542,7 +557,7 @@ static inline spinlock_t *qdisc_root_sleeping_lock(const struct Qdisc *qdisc)
return qdisc_lock(root);
}
-static inline seqcount_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc)
+static inline net_seqlock_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc)
{
struct Qdisc *root = qdisc_root_sleeping(qdisc);
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index d89969aa9942..095be1d66f31 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -215,7 +215,7 @@ bool xen_running_on_version_or_later(unsigned int major, unsigned int minor);
void xen_efi_runtime_setup(void);
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
static inline void xen_preemptible_hcall_begin(void)
{
@@ -239,6 +239,6 @@ static inline void xen_preemptible_hcall_end(void)
__this_cpu_write(xen_in_preemptible_hcall, false);
}
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
#endif /* INCLUDE_XEN_OPS_H */
diff --git a/init/Kconfig b/init/Kconfig
index 6db3e310a5e4..86f3a8f467ae 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -888,6 +888,7 @@ config CFS_BANDWIDTH
config RT_GROUP_SCHED
bool "Group scheduling for SCHED_RR/FIFO"
depends on CGROUP_SCHED
+ depends on !PREEMPT_RT
default n
help
This feature lets you explicitly allocate real CPU bandwidth
@@ -1767,6 +1768,7 @@ choice
config SLAB
bool "SLAB"
+ depends on !PREEMPT_RT
select HAVE_HARDENED_USERCOPY_ALLOCATOR
help
The regular slab allocator that is established and known to work
@@ -1787,6 +1789,7 @@ config SLUB
config SLOB
depends on EXPERT
bool "SLOB (Simple Allocator)"
+ depends on !PREEMPT_RT
help
SLOB replaces the stock allocator with a drastically simpler
allocator. SLOB is generally more space efficient but
@@ -1852,7 +1855,7 @@ config SHUFFLE_PAGE_ALLOCATOR
config SLUB_CPU_PARTIAL
default y
- depends on SLUB && SMP
+ depends on SLUB && SMP && !PREEMPT_RT
bool "SLUB per cpu partial cache"
help
Per cpu partial caches accelerate objects allocation and freeing
diff --git a/init/init_task.c b/init/init_task.c
index bd403ed3e418..eba8446e3865 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -74,6 +74,10 @@ struct task_struct init_task
.cpus_ptr = &init_task.cpus_mask,
.cpus_mask = CPU_MASK_ALL,
.nr_cpus_allowed= NR_CPUS,
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) && \
+ defined(CONFIG_SCHED_DEBUG)
+ .pinned_on_cpu = -1,
+#endif
.mm = NULL,
.active_mm = &init_mm,
.restart_block = {
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index e0852dc333ac..3de8fd11873b 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -101,7 +101,7 @@ config UNINLINE_SPIN_UNLOCK
# unlock and unlock_irq functions are inlined when:
# - DEBUG_SPINLOCK=n and ARCH_INLINE_*LOCK=y
# or
-# - DEBUG_SPINLOCK=n and PREEMPT=n
+# - DEBUG_SPINLOCK=n and PREEMPTION=n
#
# unlock_bh and unlock_irqrestore functions are inlined when:
# - DEBUG_SPINLOCK=n and ARCH_INLINE_*LOCK=y
@@ -139,7 +139,7 @@ config INLINE_SPIN_UNLOCK_BH
config INLINE_SPIN_UNLOCK_IRQ
def_bool y
- depends on !PREEMPT || ARCH_INLINE_SPIN_UNLOCK_IRQ
+ depends on !PREEMPTION || ARCH_INLINE_SPIN_UNLOCK_IRQ
config INLINE_SPIN_UNLOCK_IRQRESTORE
def_bool y
@@ -168,7 +168,7 @@ config INLINE_READ_LOCK_IRQSAVE
config INLINE_READ_UNLOCK
def_bool y
- depends on !PREEMPT || ARCH_INLINE_READ_UNLOCK
+ depends on !PREEMPTION || ARCH_INLINE_READ_UNLOCK
config INLINE_READ_UNLOCK_BH
def_bool y
@@ -176,7 +176,7 @@ config INLINE_READ_UNLOCK_BH
config INLINE_READ_UNLOCK_IRQ
def_bool y
- depends on !PREEMPT || ARCH_INLINE_READ_UNLOCK_IRQ
+ depends on !PREEMPTION || ARCH_INLINE_READ_UNLOCK_IRQ
config INLINE_READ_UNLOCK_IRQRESTORE
def_bool y
@@ -205,7 +205,7 @@ config INLINE_WRITE_LOCK_IRQSAVE
config INLINE_WRITE_UNLOCK
def_bool y
- depends on !PREEMPT || ARCH_INLINE_WRITE_UNLOCK
+ depends on !PREEMPTION || ARCH_INLINE_WRITE_UNLOCK
config INLINE_WRITE_UNLOCK_BH
def_bool y
@@ -213,7 +213,7 @@ config INLINE_WRITE_UNLOCK_BH
config INLINE_WRITE_UNLOCK_IRQ
def_bool y
- depends on !PREEMPT || ARCH_INLINE_WRITE_UNLOCK_IRQ
+ depends on !PREEMPTION || ARCH_INLINE_WRITE_UNLOCK_IRQ
config INLINE_WRITE_UNLOCK_IRQRESTORE
def_bool y
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index deff97217496..7871ec3c9a02 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -1,5 +1,11 @@
# SPDX-License-Identifier: GPL-2.0-only
+config HAVE_PREEMPT_LAZY
+ bool
+
+config PREEMPT_LAZY
+ def_bool y if HAVE_PREEMPT_LAZY && PREEMPT_RT
+
choice
prompt "Preemption Model"
default PREEMPT_NONE
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 728ffec52cf3..18db26d5b77b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,9 +17,62 @@
(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
+/*
+ * The bucket lock has two protection scopes:
+ *
+ * 1) Serializing concurrent operations from BPF programs on differrent
+ * CPUs
+ *
+ * 2) Serializing concurrent operations from BPF programs and sys_bpf()
+ *
+ * BPF programs can execute in any context including perf, kprobes and
+ * tracing. As there are almost no limits where perf, kprobes and tracing
+ * can be invoked from the lock operations need to be protected against
+ * deadlocks. Deadlocks can be caused by recursion and by an invocation in
+ * the lock held section when functions which acquire this lock are invoked
+ * from sys_bpf(). BPF recursion is prevented by incrementing the per CPU
+ * variable bpf_prog_active, which prevents BPF programs attached to perf
+ * events, kprobes and tracing to be invoked before the prior invocation
+ * from one of these contexts completed. sys_bpf() uses the same mechanism
+ * by pinning the task to the current CPU and incrementing the recursion
+ * protection accross the map operation.
+ *
+ * This has subtle implications on PREEMPT_RT. PREEMPT_RT forbids certain
+ * operations like memory allocations (even with GFP_ATOMIC) from atomic
+ * contexts. This is required because even with GFP_ATOMIC the memory
+ * allocator calls into code pathes which acquire locks with long held lock
+ * sections. To ensure the deterministic behaviour these locks are regular
+ * spinlocks, which are converted to 'sleepable' spinlocks on RT. The only
+ * true atomic contexts on an RT kernel are the low level hardware
+ * handling, scheduling, low level interrupt handling, NMIs etc. None of
+ * these contexts should ever do memory allocations.
+ *
+ * As regular device interrupt handlers and soft interrupts are forced into
+ * thread context, the existing code which does
+ * spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ * just works.
+ *
+ * In theory the BPF locks could be converted to regular spinlocks as well,
+ * but the bucket locks and percpu_freelist locks can be taken from
+ * arbitrary contexts (perf, kprobes, tracepoints) which are required to be
+ * atomic contexts even on RT. These mechanisms require preallocated maps,
+ * so there is no need to invoke memory allocations within the lock held
+ * sections.
+ *
+ * BPF maps which need dynamic allocation are only used from (forced)
+ * thread context on RT and can therefore use regular spinlocks which in
+ * turn allows to invoke memory allocations from the lock held section.
+ *
+ * On a non RT kernel this distinction is neither possible nor required.
+ * spinlock maps to raw_spinlock and the extra code is optimized out by the
+ * compiler.
+ */
struct bucket {
struct hlist_nulls_head head;
- raw_spinlock_t lock;
+ union {
+ raw_spinlock_t raw_lock;
+ spinlock_t lock;
+ };
};
struct bpf_htab {
@@ -57,6 +110,51 @@ struct htab_elem {
char key[0] __aligned(8);
};
+static inline bool htab_is_prealloc(const struct bpf_htab *htab)
+{
+ return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
+}
+
+static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
+{
+ return (!IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab));
+}
+
+static void htab_init_buckets(struct bpf_htab *htab)
+{
+ unsigned i;
+
+ for (i = 0; i < htab->n_buckets; i++) {
+ INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
+ if (htab_use_raw_lock(htab))
+ raw_spin_lock_init(&htab->buckets[i].raw_lock);
+ else
+ spin_lock_init(&htab->buckets[i].lock);
+ }
+}
+
+static inline unsigned long htab_lock_bucket(const struct bpf_htab *htab,
+ struct bucket *b)
+{
+ unsigned long flags;
+
+ if (htab_use_raw_lock(htab))
+ raw_spin_lock_irqsave(&b->raw_lock, flags);
+ else
+ spin_lock_irqsave(&b->lock, flags);
+ return flags;
+}
+
+static inline void htab_unlock_bucket(const struct bpf_htab *htab,
+ struct bucket *b,
+ unsigned long flags)
+{
+ if (htab_use_raw_lock(htab))
+ raw_spin_unlock_irqrestore(&b->raw_lock, flags);
+ else
+ spin_unlock_irqrestore(&b->lock, flags);
+}
+
static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
static bool htab_is_lru(const struct bpf_htab *htab)
@@ -71,11 +169,6 @@ static bool htab_is_percpu(const struct bpf_htab *htab)
htab->map.map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
}
-static bool htab_is_prealloc(const struct bpf_htab *htab)
-{
- return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
-}
-
static inline void htab_elem_set_ptr(struct htab_elem *l, u32 key_size,
void __percpu *pptr)
{
@@ -306,8 +399,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
struct bpf_htab *htab;
- int err, i;
u64 cost;
+ int err;
htab = kzalloc(sizeof(*htab), GFP_USER);
if (!htab)
@@ -369,10 +462,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
else
htab->hashrnd = get_random_int();
- for (i = 0; i < htab->n_buckets; i++) {
- INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
- raw_spin_lock_init(&htab->buckets[i].lock);
- }
+ htab_init_buckets(htab);
if (prealloc) {
err = prealloc_init(htab);
@@ -580,7 +670,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
b = __select_bucket(htab, tgt_l->hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
if (l == tgt_l) {
@@ -588,7 +678,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
break;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return l == tgt_l;
}
@@ -860,8 +950,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
*/
}
- /* bpf_map_update_elem() can be called in_irq() */
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -902,7 +991,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
}
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return ret;
}
@@ -940,8 +1029,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
return -ENOMEM;
memcpy(l_new->key + round_up(map->key_size, 8), value, map->value_size);
- /* bpf_map_update_elem() can be called in_irq() */
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -960,7 +1048,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
if (ret)
bpf_lru_push_free(&htab->lru, &l_new->lru_node);
@@ -995,8 +1083,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
b = __select_bucket(htab, hash);
head = &b->head;
- /* bpf_map_update_elem() can be called in_irq() */
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1019,7 +1106,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
}
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return ret;
}
@@ -1059,8 +1146,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
return -ENOMEM;
}
- /* bpf_map_update_elem() can be called in_irq() */
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l_old = lookup_elem_raw(head, hash, key, key_size);
@@ -1082,7 +1168,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
}
ret = 0;
err:
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
if (l_new)
bpf_lru_push_free(&htab->lru, &l_new->lru_node);
return ret;
@@ -1120,7 +1206,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
b = __select_bucket(htab, hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l = lookup_elem_raw(head, hash, key, key_size);
@@ -1130,7 +1216,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
ret = 0;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
return ret;
}
@@ -1152,7 +1238,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
b = __select_bucket(htab, hash);
head = &b->head;
- raw_spin_lock_irqsave(&b->lock, flags);
+ flags = htab_lock_bucket(htab, b);
l = lookup_elem_raw(head, hash, key, key_size);
@@ -1161,7 +1247,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
ret = 0;
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ htab_unlock_bucket(htab, b, flags);
if (l)
bpf_lru_push_free(&htab->lru, &l->lru_node);
return ret;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 56e6c75d354d..3b3c420bc8ed 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -34,7 +34,7 @@ struct lpm_trie {
size_t n_entries;
size_t max_prefixlen;
size_t data_size;
- raw_spinlock_t lock;
+ spinlock_t lock;
};
/* This trie implements a longest prefix match algorithm that can be used to
@@ -315,7 +315,7 @@ static int trie_update_elem(struct bpf_map *map,
if (key->prefixlen > trie->max_prefixlen)
return -EINVAL;
- raw_spin_lock_irqsave(&trie->lock, irq_flags);
+ spin_lock_irqsave(&trie->lock, irq_flags);
/* Allocate and fill a new node */
@@ -422,7 +422,7 @@ static int trie_update_elem(struct bpf_map *map,
kfree(im_node);
}
- raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
+ spin_unlock_irqrestore(&trie->lock, irq_flags);
return ret;
}
@@ -442,7 +442,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
if (key->prefixlen > trie->max_prefixlen)
return -EINVAL;
- raw_spin_lock_irqsave(&trie->lock, irq_flags);
+ spin_lock_irqsave(&trie->lock, irq_flags);
/* Walk the tree looking for an exact key/length match and keeping
* track of the path we traverse. We will need to know the node
@@ -518,7 +518,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
kfree_rcu(node, rcu);
out:
- raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
+ spin_unlock_irqrestore(&trie->lock, irq_flags);
return ret;
}
@@ -575,7 +575,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
if (ret)
goto out_err;
- raw_spin_lock_init(&trie->lock);
+ spin_lock_init(&trie->lock);
return &trie->map;
out_err:
diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 6e090140b924..b367430e611c 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -25,12 +25,18 @@ void pcpu_freelist_destroy(struct pcpu_freelist *s)
free_percpu(s->freelist);
}
+static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head,
+ struct pcpu_freelist_node *node)
+{
+ node->next = head->first;
+ head->first = node;
+}
+
static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head,
struct pcpu_freelist_node *node)
{
raw_spin_lock(&head->lock);
- node->next = head->first;
- head->first = node;
+ pcpu_freelist_push_node(head, node);
raw_spin_unlock(&head->lock);
}
@@ -56,21 +62,16 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
u32 nr_elems)
{
struct pcpu_freelist_head *head;
- unsigned long flags;
int i, cpu, pcpu_entries;
pcpu_entries = nr_elems / num_possible_cpus() + 1;
i = 0;
- /* disable irq to workaround lockdep false positive
- * in bpf usage pcpu_freelist_populate() will never race
- * with pcpu_freelist_push()
- */
- local_irq_save(flags);
for_each_possible_cpu(cpu) {
again:
head = per_cpu_ptr(s->freelist, cpu);
- ___pcpu_freelist_push(head, buf);
+ /* No locking required as this is not visible yet. */
+ pcpu_freelist_push_node(head, buf);
i++;
buf += elem_size;
if (i == nr_elems)
@@ -78,7 +79,6 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
if (i % pcpu_entries)
goto again;
}
- local_irq_restore(flags);
}
struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 173e983619d7..e753900ff137 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -40,6 +40,9 @@ static void do_up_read(struct irq_work *entry)
{
struct stack_map_irq_work *work;
+ if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
+ return;
+
work = container_of(entry, struct stack_map_irq_work, irq_work);
up_read_non_owner(work->sem);
work->sem = NULL;
@@ -288,10 +291,18 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
struct stack_map_irq_work *work = NULL;
if (irqs_disabled()) {
- work = this_cpu_ptr(&up_read_work);
- if (work->irq_work.flags & IRQ_WORK_BUSY)
- /* cannot queue more up_read, fallback */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ work = this_cpu_ptr(&up_read_work);
+ if (work->irq_work.flags & IRQ_WORK_BUSY)
+ /* cannot queue more up_read, fallback */
+ irq_work_busy = true;
+ } else {
+ /*
+ * PREEMPT_RT does not allow to trylock mmap sem in
+ * interrupt disabled context. Force the fallback code.
+ */
irq_work_busy = true;
+ }
}
/*
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bf03d04a9e2f..9b91fbb93882 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -794,8 +794,7 @@ static int map_lookup_elem(union bpf_attr *attr)
goto done;
}
- preempt_disable();
- this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_copy(map, key, value);
@@ -836,8 +835,7 @@ static int map_lookup_elem(union bpf_attr *attr)
}
rcu_read_unlock();
}
- this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
done:
if (err)
@@ -934,11 +932,7 @@ static int map_update_elem(union bpf_attr *attr)
goto out;
}
- /* must increment bpf_prog_active to avoid kprobe+bpf triggering from
- * inside bpf map update or delete otherwise deadlocks are possible
- */
- preempt_disable();
- __this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
err = bpf_percpu_hash_update(map, key, value, attr->flags);
@@ -969,8 +963,7 @@ static int map_update_elem(union bpf_attr *attr)
err = map->ops->map_update_elem(map, key, value, attr->flags);
rcu_read_unlock();
}
- __this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
maybe_wait_bpf_programs(map);
out:
free_value:
@@ -1016,13 +1009,11 @@ static int map_delete_elem(union bpf_attr *attr)
goto out;
}
- preempt_disable();
- __this_cpu_inc(bpf_prog_active);
+ bpf_disable_instrumentation();
rcu_read_lock();
err = map->ops->map_delete_elem(map, key);
rcu_read_unlock();
- __this_cpu_dec(bpf_prog_active);
- preempt_enable();
+ bpf_enable_instrumentation();
maybe_wait_bpf_programs(map);
out:
kfree(key);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 507474f79195..d885b9835ec3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7953,26 +7953,48 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
}
}
+static bool is_preallocated_map(struct bpf_map *map)
+{
+ if (!check_map_prealloc(map))
+ return false;
+ if (map->inner_map_meta && !check_map_prealloc(map->inner_map_meta))
+ return false;
+ return true;
+}
+
static int check_map_prog_compatibility(struct bpf_verifier_env *env,
struct bpf_map *map,
struct bpf_prog *prog)
{
- /* Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use
- * preallocated hash maps, since doing memory allocation
- * in overflow_handler can crash depending on where nmi got
- * triggered.
+ /*
+ * Validate that trace type programs use preallocated hash maps.
+ *
+ * For programs attached to PERF events this is mandatory as the
+ * perf NMI can hit any arbitrary code sequence.
+ *
+ * All other trace types using preallocated hash maps are unsafe as
+ * well because tracepoint or kprobes can be inside locked regions
+ * of the memory allocator or at a place where a recursion into the
+ * memory allocator would see inconsistent state.
+ *
+ * On RT enabled kernels run-time allocation of all trace type
+ * programs is strictly prohibited due to lock type constraints. On
+ * !RT kernels it is allowed for backwards compatibility reasons for
+ * now, but warnings are emitted so developers are made aware of
+ * the unsafety and can fix their programs before this is enforced.
*/
- if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
- if (!check_map_prealloc(map)) {
+ if (is_tracing_prog_type(prog->type) && !is_preallocated_map(map)) {
+ if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
verbose(env, "perf_event programs can only use preallocated hash map\n");
return -EINVAL;
}
- if (map->inner_map_meta &&
- !check_map_prealloc(map->inner_map_meta)) {
- verbose(env, "perf_event programs can only use preallocated inner hash map\n");
+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ verbose(env, "trace type programs can only use preallocated hash map\n");
return -EINVAL;
}
+ WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
+ verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
}
if ((is_tracing_prog_type(prog->type) ||
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 35faf082a709..806fc9d46ddd 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1957,7 +1957,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
cgrp->dom_cgrp = cgrp;
cgrp->max_descendants = INT_MAX;
cgrp->max_depth = INT_MAX;
- INIT_LIST_HEAD(&cgrp->rstat_css_list);
prev_cputime_init(&cgrp->prev_cputime);
for_each_subsys(ss, ssid)
@@ -5027,12 +5026,6 @@ static void css_release_work_fn(struct work_struct *work)
list_del_rcu(&css->sibling);
if (ss) {
- /* css release path */
- if (!list_empty(&css->rstat_css_node)) {
- cgroup_rstat_flush(cgrp);
- list_del_rcu(&css->rstat_css_node);
- }
-
cgroup_idr_replace(&ss->css_idr, NULL, css->id);
if (ss->css_released)
ss->css_released(css);
@@ -5094,7 +5087,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css->id = -1;
INIT_LIST_HEAD(&css->sibling);
INIT_LIST_HEAD(&css->children);
- INIT_LIST_HEAD(&css->rstat_css_node);
css->serial_nr = css_serial_nr_next++;
atomic_set(&css->online_cnt, 0);
@@ -5103,9 +5095,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css_get(css->parent);
}
- if (cgroup_on_dfl(cgrp) && ss->css_rstat_flush)
- list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list);
-
BUG_ON(cgroup_css(cgrp, ss));
}
@@ -5207,7 +5196,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
err_list_del:
list_del_rcu(&css->sibling);
err_free_css:
- list_del_rcu(&css->rstat_css_node);
INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
return ERR_PTR(err);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c87ee6412b36..680034721002 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -345,7 +345,7 @@ void cpuset_read_unlock(void)
percpu_up_read(&cpuset_rwsem);
}
-static DEFINE_SPINLOCK(callback_lock);
+static DEFINE_RAW_SPINLOCK(callback_lock);
static struct workqueue_struct *cpuset_migrate_mm_wq;
@@ -1255,7 +1255,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
* Newly added CPUs will be removed from effective_cpus and
* newly deleted ones will be added back to effective_cpus.
*/
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (adding) {
cpumask_or(parent->subparts_cpus,
parent->subparts_cpus, tmp->addmask);
@@ -1274,7 +1274,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
}
parent->nr_subparts_cpus = cpumask_weight(parent->subparts_cpus);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
return cmd == partcmd_update;
}
@@ -1379,7 +1379,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
continue;
rcu_read_unlock();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cp->effective_cpus, tmp->new_cpus);
if (cp->nr_subparts_cpus &&
@@ -1410,7 +1410,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
= cpumask_weight(cp->subparts_cpus);
}
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
@@ -1528,7 +1528,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
return -EINVAL;
}
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
/*
@@ -1539,7 +1539,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
cs->cpus_allowed);
cs->nr_subparts_cpus = cpumask_weight(cs->subparts_cpus);
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
update_cpumasks_hier(cs, &tmp);
@@ -1733,9 +1733,9 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
continue;
rcu_read_unlock();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cp->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
WARN_ON(!is_in_v2_mode() &&
!nodes_equal(cp->mems_allowed, cp->effective_mems));
@@ -1803,9 +1803,9 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
if (retval < 0)
goto done;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cs->mems_allowed = trialcs->mems_allowed;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/* use trialcs->mems_allowed as a temp variable */
update_nodemasks_hier(cs, &trialcs->mems_allowed);
@@ -1896,9 +1896,9 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
|| (is_spread_page(cs) != is_spread_page(trialcs)));
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cs->flags = trialcs->flags;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
rebuild_sched_domains_locked();
@@ -2407,7 +2407,7 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
cpuset_filetype_t type = seq_cft(sf)->private;
int ret = 0;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
switch (type) {
case FILE_CPULIST:
@@ -2429,7 +2429,7 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
ret = -EINVAL;
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
return ret;
}
@@ -2742,14 +2742,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpuset_inc();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (is_in_v2_mode()) {
cpumask_copy(cs->effective_cpus, parent->effective_cpus);
cs->effective_mems = parent->effective_mems;
cs->use_parent_ecpus = true;
parent->child_ecpus_count++;
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
goto out_unlock;
@@ -2776,12 +2776,12 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
}
rcu_read_unlock();
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cs->mems_allowed = parent->mems_allowed;
cs->effective_mems = parent->mems_allowed;
cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
out_unlock:
percpu_up_write(&cpuset_rwsem);
put_online_cpus();
@@ -2837,7 +2837,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
static void cpuset_bind(struct cgroup_subsys_state *root_css)
{
percpu_down_write(&cpuset_rwsem);
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (is_in_v2_mode()) {
cpumask_copy(top_cpuset.cpus_allowed, cpu_possible_mask);
@@ -2848,7 +2848,7 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
top_cpuset.mems_allowed = top_cpuset.effective_mems;
}
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
percpu_up_write(&cpuset_rwsem);
}
@@ -2945,12 +2945,12 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
{
bool is_empty;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cs->cpus_allowed, new_cpus);
cpumask_copy(cs->effective_cpus, new_cpus);
cs->mems_allowed = *new_mems;
cs->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/*
* Don't call update_tasks_cpumask() if the cpuset becomes empty,
@@ -2987,10 +2987,10 @@ hotplug_update_tasks(struct cpuset *cs,
if (nodes_empty(*new_mems))
*new_mems = parent_cs(cs)->effective_mems;
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
cpumask_copy(cs->effective_cpus, new_cpus);
cs->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
if (cpus_updated)
update_tasks_cpumask(cs);
@@ -3145,7 +3145,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
/* synchronize cpus_allowed to cpu_active_mask */
if (cpus_updated) {
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (!on_dfl)
cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
/*
@@ -3165,17 +3165,17 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
}
}
cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
/* we don't mess with cpumasks of tasks in top_cpuset */
}
/* synchronize mems_allowed to N_MEMORY */
if (mems_updated) {
- spin_lock_irq(&callback_lock);
+ raw_spin_lock_irq(&callback_lock);
if (!on_dfl)
top_cpuset.mems_allowed = new_mems;
top_cpuset.effective_mems = new_mems;
- spin_unlock_irq(&callback_lock);
+ raw_spin_unlock_irq(&callback_lock);
update_tasks_nodemask(&top_cpuset);
}
@@ -3276,11 +3276,11 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
{
unsigned long flags;
- spin_lock_irqsave(&callback_lock, flags);
+ raw_spin_lock_irqsave(&callback_lock, flags);
rcu_read_lock();
guarantee_online_cpus(task_cs(tsk), pmask);
rcu_read_unlock();
- spin_unlock_irqrestore(&callback_lock, flags);
+ raw_spin_unlock_irqrestore(&callback_lock, flags);
}
/**
@@ -3341,11 +3341,11 @@ nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
nodemask_t mask;
unsigned long flags;
- spin_lock_irqsave(&callback_lock, flags);
+ raw_spin_lock_irqsave(&callback_lock, flags);
rcu_read_lock();
guarantee_online_mems(task_cs(tsk), &mask);
rcu_read_unlock();
- spin_unlock_irqrestore(&callback_lock, flags);
+ raw_spin_unlock_irqrestore(&callback_lock, flags);
return mask;
}
@@ -3437,14 +3437,14 @@ bool __cpuset_node_allowed(int node, gfp_t gfp_mask)
return true;
/* Not hardwall and node outside mems_allowed: scan up cpusets */
- spin_lock_irqsave(&callback_lock, flags);
+ raw_spin_lock_irqsave(&callback_lock, flags);
rcu_read_lock();
cs = nearest_hardwall_ancestor(task_cs(current));
allowed = node_isset(node, cs->mems_allowed);
rcu_read_unlock();
- spin_unlock_irqrestore(&callback_lock, flags);
+ raw_spin_unlock_irqrestore(&callback_lock, flags);
return allowed;
}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 4a942d4e9763..49315054a1bc 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -139,7 +139,7 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
}
/* see cgroup_rstat_flush() */
-static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
+static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
{
int cpu;
@@ -151,27 +151,17 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
cpu);
struct cgroup *pos = NULL;
- raw_spin_lock(cpu_lock);
- while ((pos = cgroup_rstat_cpu_pop_updated(pos, cgrp, cpu))) {
- struct cgroup_subsys_state *css;
-
+ raw_spin_lock_irq(cpu_lock);
+ while ((pos = cgroup_rstat_cpu_pop_updated(pos, cgrp, cpu)))
cgroup_base_stat_flush(pos, cpu);
- rcu_read_lock();
- list_for_each_entry_rcu(css, &pos->rstat_css_list,
- rstat_css_node)
- css->ss->css_rstat_flush(css, cpu);
- rcu_read_unlock();
- }
- raw_spin_unlock(cpu_lock);
+ raw_spin_unlock_irq(cpu_lock);
- /* if @may_sleep, play nice and yield if necessary */
- if (may_sleep && (need_resched() ||
- spin_needbreak(&cgroup_rstat_lock))) {
- spin_unlock_irq(&cgroup_rstat_lock);
+ if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
+ spin_unlock(&cgroup_rstat_lock);
if (!cond_resched())
cpu_relax();
- spin_lock_irq(&cgroup_rstat_lock);
+ spin_lock(&cgroup_rstat_lock);
}
}
}
@@ -193,24 +183,9 @@ void cgroup_rstat_flush(struct cgroup *cgrp)
{
might_sleep();
- spin_lock_irq(&cgroup_rstat_lock);
- cgroup_rstat_flush_locked(cgrp, true);
- spin_unlock_irq(&cgroup_rstat_lock);
-}
-
-/**
- * cgroup_rstat_flush_irqsafe - irqsafe version of cgroup_rstat_flush()
- * @cgrp: target cgroup
- *
- * This function can be called from any context.
- */
-void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&cgroup_rstat_lock, flags);
- cgroup_rstat_flush_locked(cgrp, false);
- spin_unlock_irqrestore(&cgroup_rstat_lock, flags);
+ spin_lock(&cgroup_rstat_lock);
+ cgroup_rstat_flush_locked(cgrp);
+ spin_unlock(&cgroup_rstat_lock);
}
/**
@@ -222,21 +197,21 @@ void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp)
*
* This function may block.
*/
-void cgroup_rstat_flush_hold(struct cgroup *cgrp)
+static void cgroup_rstat_flush_hold(struct cgroup *cgrp)
__acquires(&cgroup_rstat_lock)
{
might_sleep();
- spin_lock_irq(&cgroup_rstat_lock);
- cgroup_rstat_flush_locked(cgrp, true);
+ spin_lock(&cgroup_rstat_lock);
+ cgroup_rstat_flush_locked(cgrp);
}
/**
* cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
*/
-void cgroup_rstat_flush_release(void)
+static void cgroup_rstat_flush_release(void)
__releases(&cgroup_rstat_lock)
{
- spin_unlock_irq(&cgroup_rstat_lock);
+ spin_unlock(&cgroup_rstat_lock);
}
int cgroup_rstat_init(struct cgroup *cgrp)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 7527825ac7da..2b6aec3623c1 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -332,12 +332,12 @@ void lockdep_assert_cpus_held(void)
static void lockdep_acquire_cpus_lock(void)
{
- rwsem_acquire(&cpu_hotplug_lock.rw_sem.dep_map, 0, 0, _THIS_IP_);
+ rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_);
}
static void lockdep_release_cpus_lock(void)
{
- rwsem_release(&cpu_hotplug_lock.rw_sem.dep_map, 1, _THIS_IP_);
+ rwsem_release(&cpu_hotplug_lock.dep_map, 1, _THIS_IP_);
}
/*
@@ -864,6 +864,15 @@ static int take_cpu_down(void *_param)
int err, cpu = smp_processor_id();
int ret;
+#ifdef CONFIG_PREEMPT_RT
+ /*
+ * If any tasks disabled migration before we got here,
+ * go back and sleep again.
+ */
+ if (cpu_nr_pinned(cpu))
+ return -EAGAIN;
+#endif
+
/* Ensure this CPU doesn't handle any more interrupts. */
err = __cpu_disable();
if (err < 0)
@@ -893,6 +902,10 @@ static int take_cpu_down(void *_param)
return 0;
}
+#ifdef CONFIG_PREEMPT_RT
+struct task_struct *takedown_cpu_task;
+#endif
+
static int takedown_cpu(unsigned int cpu)
{
struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
@@ -907,11 +920,39 @@ static int takedown_cpu(unsigned int cpu)
*/
irq_lock_sparse();
+#ifdef CONFIG_PREEMPT_RT
+ WARN_ON_ONCE(takedown_cpu_task);
+ takedown_cpu_task = current;
+
+again:
+ /*
+ * If a task pins this CPU after we pass this check, take_cpu_down
+ * will return -EAGAIN.
+ */
+ for (;;) {
+ int nr_pinned;
+
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ nr_pinned = cpu_nr_pinned(cpu);
+ if (nr_pinned == 0)
+ break;
+ schedule();
+ }
+ set_current_state(TASK_RUNNING);
+#endif
+
/*
* So now all preempt/rcu users must observe !cpu_active().
*/
err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
+#ifdef CONFIG_PREEMPT_RT
+ if (err == -EAGAIN)
+ goto again;
+#endif
if (err) {
+#ifdef CONFIG_PREEMPT_RT
+ takedown_cpu_task = NULL;
+#endif
/* CPU refused to die */
irq_unlock_sparse();
/* Unpark the hotplug thread so we can rollback there */
@@ -930,6 +971,9 @@ static int takedown_cpu(unsigned int cpu)
wait_for_ap_thread(st, false);
BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
+#ifdef CONFIG_PREEMPT_RT
+ takedown_cpu_task = NULL;
+#endif
/* Interrupts are moved away from the dying cpu, reenable alloc/free */
irq_unlock_sparse();
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 09e1cc22221f..31f831012d21 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8990,7 +8990,6 @@ static void bpf_overflow_handler(struct perf_event *event,
int ret = 0;
ctx.regs = perf_arch_bpf_user_pt_regs(regs);
- preempt_disable();
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
goto out;
rcu_read_lock();
@@ -8998,7 +8997,6 @@ static void bpf_overflow_handler(struct perf_event *event,
rcu_read_unlock();
out:
__this_cpu_dec(bpf_prog_active);
- preempt_enable();
if (!ret)
return;
@@ -10288,7 +10286,7 @@ static struct pmu *perf_init_event(struct perf_event *event)
goto unlock;
}
- list_for_each_entry_rcu(pmu, &pmus, entry) {
+ list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
ret = perf_try_init_event(pmu, event);
if (!ret)
goto unlock;
diff --git a/kernel/exit.c b/kernel/exit.c
index fa46977b9c07..d981bb1b94c8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -161,7 +161,7 @@ static void __exit_signal(struct task_struct *tsk)
* Do this under ->siglock, we can race with another thread
* doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
*/
- flush_sigqueue(&tsk->pending);
+ flush_task_sigqueue(tsk);
tsk->sighand = NULL;
spin_unlock(&sighand->siglock);
@@ -258,6 +258,7 @@ void rcuwait_wake_up(struct rcuwait *w)
wake_up_process(task);
rcu_read_unlock();
}
+EXPORT_SYMBOL_GPL(rcuwait_wake_up);
/*
* Determine if a process group is "orphaned", according to the POSIX
diff --git a/kernel/fork.c b/kernel/fork.c
index e3d5963d8c6f..4d26fb72f29f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -43,6 +43,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -289,7 +290,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
return;
}
- vfree_atomic(tsk->stack);
+ vfree(tsk->stack);
return;
}
#endif
@@ -696,6 +697,19 @@ void __mmdrop(struct mm_struct *mm)
}
EXPORT_SYMBOL_GPL(__mmdrop);
+#ifdef CONFIG_PREEMPT_RT
+/*
+ * RCU callback for delayed mm drop. Not strictly rcu, but we don't
+ * want another facility to make this work.
+ */
+void __mmdrop_delayed(struct rcu_head *rhp)
+{
+ struct mm_struct *mm = container_of(rhp, struct mm_struct, delayed_drop);
+
+ __mmdrop(mm);
+}
+#endif
+
static void mmdrop_async_fn(struct work_struct *work)
{
struct mm_struct *mm;
@@ -737,6 +751,15 @@ void __put_task_struct(struct task_struct *tsk)
WARN_ON(refcount_read(&tsk->usage));
WARN_ON(tsk == current);
+ /*
+ * Remove function-return probe instances associated with this
+ * task and put them back on the free list.
+ */
+ kprobe_flush_task(tsk);
+
+ /* Task is done with its stack. */
+ put_task_stack(tsk);
+
cgroup_free(tsk);
task_numa_free(tsk, true);
security_task_free(tsk);
@@ -927,6 +950,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->splice_pipe = NULL;
tsk->task_frag.page = NULL;
tsk->wake_q.next = NULL;
+ tsk->wake_q_sleeper.next = NULL;
account_kernel_stack(tsk, 1);
@@ -1920,6 +1944,7 @@ static __latent_entropy struct task_struct *copy_process(
spin_lock_init(&p->alloc_lock);
init_sigpending(&p->pending);
+ p->sigqueue_cache = NULL;
p->utime = p->stime = p->gtime = 0;
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
diff --git a/kernel/futex.c b/kernel/futex.c
index 5660c02b01b0..9be686d4b379 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -962,7 +962,9 @@ static void exit_pi_state_list(struct task_struct *curr)
if (head->next != next) {
/* retain curr->pi_lock for the loop invariant */
raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
+ raw_spin_unlock_irq(&curr->pi_lock);
spin_unlock(&hb->lock);
+ raw_spin_lock_irq(&curr->pi_lock);
put_pi_state(pi_state);
continue;
}
@@ -1571,6 +1573,7 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
struct task_struct *new_owner;
bool postunlock = false;
DEFINE_WAKE_Q(wake_q);
+ DEFINE_WAKE_Q(wake_sleeper_q);
int ret = 0;
new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
@@ -1630,13 +1633,13 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
pi_state->owner = new_owner;
raw_spin_unlock(&new_owner->pi_lock);
- postunlock = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
-
+ postunlock = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q,
+ &wake_sleeper_q);
out_unlock:
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
if (postunlock)
- rt_mutex_postunlock(&wake_q);
+ rt_mutex_postunlock(&wake_q, &wake_sleeper_q);
return ret;
}
@@ -2260,6 +2263,16 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
requeue_pi_wake_futex(this, &key2, hb2);
drop_count++;
continue;
+ } else if (ret == -EAGAIN) {
+ /*
+ * Waiter was woken by timeout or
+ * signal and has set pi_blocked_on to
+ * PI_WAKEUP_INPROGRESS before we
+ * tried to enqueue it on the rtmutex.
+ */
+ this->pi_state = NULL;
+ put_pi_state(pi_state);
+ continue;
} else if (ret) {
/*
* rt_mutex_start_proxy_lock() detected a
@@ -2968,7 +2981,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
goto no_block;
}
- rt_mutex_init_waiter(&rt_waiter);
+ rt_mutex_init_waiter(&rt_waiter, false);
/*
* On PREEMPT_RT_FULL, when hb->lock becomes an rt_mutex, we must not
@@ -2984,6 +2997,14 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
* before __rt_mutex_start_proxy_lock() is done.
*/
raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
+ /*
+ * the migrate_disable() here disables migration in the in_atomic() fast
+ * path which is enabled again in the following spin_unlock(). We have
+ * one migrate_disable() pending in the slow-path which is reversed
+ * after the raw_spin_unlock_irq() where we leave the atomic context.
+ */
+ migrate_disable();
+
spin_unlock(q.lock_ptr);
/*
* __rt_mutex_start_proxy_lock() unconditionally enqueues the @rt_waiter
@@ -2992,6 +3013,7 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
*/
ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
+ migrate_enable();
if (ret) {
if (ret == 1)
@@ -3140,10 +3162,19 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
* rt_waiter. Also see the WARN in wake_futex_pi().
*/
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+ /*
+ * Magic trickery for now to make the RT migrate disable
+ * logic happy. The following spin_unlock() happens with
+ * interrupts disabled so the internal migrate_enable()
+ * won't undo the migrate_disable() which was issued when
+ * locking hb->lock.
+ */
+ migrate_disable();
spin_unlock(&hb->lock);
/* drops pi_state->pi_mutex.wait_lock */
ret = wake_futex_pi(uaddr, uval, pi_state);
+ migrate_enable();
put_pi_state(pi_state);
@@ -3315,7 +3346,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
struct hrtimer_sleeper timeout, *to;
struct futex_pi_state *pi_state = NULL;
struct rt_mutex_waiter rt_waiter;
- struct futex_hash_bucket *hb;
+ struct futex_hash_bucket *hb, *hb2;
union futex_key key2 = FUTEX_KEY_INIT;
struct futex_q q = futex_q_init;
int res, ret;
@@ -3336,7 +3367,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
* The waiter is allocated on our stack, manipulated by the requeue
* code while we sleep on uaddr.
*/
- rt_mutex_init_waiter(&rt_waiter);
+ rt_mutex_init_waiter(&rt_waiter, false);
ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE);
if (unlikely(ret != 0))
@@ -3367,20 +3398,55 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
/* Queue the futex_q, drop the hb lock, wait for wakeup. */
futex_wait_queue_me(hb, &q, to);
- spin_lock(&hb->lock);
- ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
- spin_unlock(&hb->lock);
- if (ret)
- goto out_put_keys;
+ /*
+ * On RT we must avoid races with requeue and trying to block
+ * on two mutexes (hb->lock and uaddr2's rtmutex) by
+ * serializing access to pi_blocked_on with pi_lock.
+ */
+ raw_spin_lock_irq(¤t->pi_lock);
+ if (current->pi_blocked_on) {
+ /*
+ * We have been requeued or are in the process of
+ * being requeued.
+ */
+ raw_spin_unlock_irq(¤t->pi_lock);
+ } else {
+ /*
+ * Setting pi_blocked_on to PI_WAKEUP_INPROGRESS
+ * prevents a concurrent requeue from moving us to the
+ * uaddr2 rtmutex. After that we can safely acquire
+ * (and possibly block on) hb->lock.
+ */
+ current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
+ raw_spin_unlock_irq(¤t->pi_lock);
+
+ spin_lock(&hb->lock);
+
+ /*
+ * Clean up pi_blocked_on. We might leak it otherwise
+ * when we succeeded with the hb->lock in the fast
+ * path.
+ */
+ raw_spin_lock_irq(¤t->pi_lock);
+ current->pi_blocked_on = NULL;
+ raw_spin_unlock_irq(¤t->pi_lock);
+
+ ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
+ spin_unlock(&hb->lock);
+ if (ret)
+ goto out_put_keys;
+ }
/*
- * In order for us to be here, we know our q.key == key2, and since
- * we took the hb->lock above, we also know that futex_requeue() has
- * completed and we no longer have to concern ourselves with a wakeup
- * race with the atomic proxy lock acquisition by the requeue code. The
- * futex_requeue dropped our key1 reference and incremented our key2
- * reference count.
+ * In order to be here, we have either been requeued, are in
+ * the process of being requeued, or requeue successfully
+ * acquired uaddr2 on our behalf. If pi_blocked_on was
+ * non-null above, we may be racing with a requeue. Do not
+ * rely on q->lock_ptr to be hb2->lock until after blocking on
+ * hb->lock or hb2->lock. The futex_requeue dropped our key1
+ * reference and incremented our key2 reference count.
*/
+ hb2 = hash_futex(&key2);
/* Check if the requeue code acquired the second futex for us. */
if (!q.rt_waiter) {
@@ -3389,7 +3455,8 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
* did a lock-steal - fix up the PI-state in that case.
*/
if (q.pi_state && (q.pi_state->owner != current)) {
- spin_lock(q.lock_ptr);
+ spin_lock(&hb2->lock);
+ BUG_ON(&hb2->lock != q.lock_ptr);
ret = fixup_pi_state_owner(uaddr2, &q, current);
if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) {
pi_state = q.pi_state;
@@ -3400,7 +3467,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
* the requeue_pi() code acquired for us.
*/
put_pi_state(q.pi_state);
- spin_unlock(q.lock_ptr);
+ spin_unlock(&hb2->lock);
}
} else {
struct rt_mutex *pi_mutex;
@@ -3414,7 +3481,8 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
pi_mutex = &q.pi_state->pi_mutex;
ret = rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter);
- spin_lock(q.lock_ptr);
+ spin_lock(&hb2->lock);
+ BUG_ON(&hb2->lock != q.lock_ptr);
if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
ret = 0;
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a4ace611f47f..3306fabed221 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -185,10 +185,16 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
{
irqreturn_t retval;
unsigned int flags = 0;
+ struct pt_regs *regs = get_irq_regs();
+ u64 ip = regs ? instruction_pointer(regs) : 0;
retval = __handle_irq_event_percpu(desc, &flags);
- add_interrupt_randomness(desc->irq_data.irq, flags);
+#ifdef CONFIG_PREEMPT_RT
+ desc->random_ip = ip;
+#else
+ add_interrupt_randomness(desc->irq_data.irq, flags, ip);
+#endif
if (!noirqdebug)
note_interrupt(desc, retval);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 3b1d0a4725a4..aea20f2675f0 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1129,6 +1129,12 @@ static int irq_thread(void *data)
if (action_ret == IRQ_WAKE_THREAD)
irq_wake_secondary(desc, action);
+#ifdef CONFIG_PREEMPT_RT
+ migrate_disable();
+ add_interrupt_randomness(action->irq, 0,
+ desc->random_ip ^ (unsigned long) action);
+ migrate_enable();
+#endif
wake_threads_waitq(desc);
}
@@ -2711,7 +2717,7 @@ EXPORT_SYMBOL_GPL(irq_get_irqchip_state);
* This call sets the internal irqchip state of an interrupt,
* depending on the value of @which.
*
- * This function should be called with preemption disabled if the
+ * This function should be called with migration disabled if the
* interrupt controller has per-cpu registers.
*/
int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index 2ed97a7c9b2a..9e97124946a6 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -442,6 +442,10 @@ MODULE_PARM_DESC(noirqdebug, "Disable irq lockup detection when true");
static int __init irqfixup_setup(char *str)
{
+#ifdef CONFIG_PREEMPT_RT
+ pr_warn("irqfixup boot option not supported w/ CONFIG_PREEMPT_RT\n");
+ return 1;
+#endif
irqfixup = 1;
printk(KERN_WARNING "Misrouted IRQ fixup support enabled.\n");
printk(KERN_WARNING "This may impact system performance.\n");
@@ -454,6 +458,10 @@ module_param(irqfixup, int, 0644);
static int __init irqpoll_setup(char *str)
{
+#ifdef CONFIG_PREEMPT_RT
+ pr_warn("irqpoll boot option not supported w/ CONFIG_PREEMPT_RT\n");
+ return 1;
+#endif
irqfixup = 2;
printk(KERN_WARNING "Misrouted IRQ fixup and polling support "
"enabled\n");
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index d42acaf81886..aab68fdbcb8d 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -18,6 +18,7 @@
#include
#include
#include
+#include
#include
@@ -60,13 +61,19 @@ void __weak arch_irq_work_raise(void)
/* Enqueue on current CPU, work must already be claimed and preempt disabled */
static void __irq_work_queue_local(struct irq_work *work)
{
+ struct llist_head *list;
+ bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT);
+
+ lazy_work = work->flags & IRQ_WORK_LAZY;
+
/* If the work is "lazy", handle it from next tick if any */
- if (work->flags & IRQ_WORK_LAZY) {
- if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
- tick_nohz_tick_stopped())
- arch_irq_work_raise();
- } else {
- if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+ if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ)))
+ list = this_cpu_ptr(&lazy_list);
+ else
+ list = this_cpu_ptr(&raised_list);
+
+ if (llist_add(&work->llnode, list)) {
+ if (!lazy_work || tick_nohz_tick_stopped())
arch_irq_work_raise();
}
}
@@ -108,9 +115,16 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
preempt_disable();
if (cpu != smp_processor_id()) {
+ struct llist_head *list;
+
/* Arch remote IPI send/receive backend aren't NMI safe */
WARN_ON_ONCE(in_nmi());
- if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(work->flags & IRQ_WORK_HARD_IRQ))
+ list = &per_cpu(lazy_list, cpu);
+ else
+ list = &per_cpu(raised_list, cpu);
+
+ if (llist_add(&work->llnode, list))
arch_send_call_function_single_ipi(cpu);
} else {
__irq_work_queue_local(work);
@@ -129,9 +143,8 @@ bool irq_work_needs_cpu(void)
raised = this_cpu_ptr(&raised_list);
lazy = this_cpu_ptr(&lazy_list);
- if (llist_empty(raised) || arch_irq_work_has_interrupt())
- if (llist_empty(lazy))
- return false;
+ if (llist_empty(raised) && llist_empty(lazy))
+ return false;
/* All work should have been flushed before going offline */
WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -145,8 +158,12 @@ static void irq_work_run_list(struct llist_head *list)
struct llist_node *llnode;
unsigned long flags;
+#ifndef CONFIG_PREEMPT_RT
+ /*
+ * nort: On RT IRQ-work may run in SOFTIRQ context.
+ */
BUG_ON(!irqs_disabled());
-
+#endif
if (llist_empty(list))
return;
@@ -178,7 +195,16 @@ static void irq_work_run_list(struct llist_head *list)
void irq_work_run(void)
{
irq_work_run_list(this_cpu_ptr(&raised_list));
- irq_work_run_list(this_cpu_ptr(&lazy_list));
+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ /*
+ * NOTE: we raise softirq via IPI for safety,
+ * and execute in irq_work_tick() to move the
+ * overhead from hard to soft irq context.
+ */
+ if (!llist_empty(this_cpu_ptr(&lazy_list)))
+ raise_softirq(TIMER_SOFTIRQ);
+ } else
+ irq_work_run_list(this_cpu_ptr(&lazy_list));
}
EXPORT_SYMBOL_GPL(irq_work_run);
@@ -188,8 +214,17 @@ void irq_work_tick(void)
if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
irq_work_run_list(raised);
+
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ irq_work_run_list(this_cpu_ptr(&lazy_list));
+}
+
+#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT)
+void irq_work_tick_soft(void)
+{
irq_work_run_list(this_cpu_ptr(&lazy_list));
}
+#endif
/*
* Synchronize against the irq_work @entry, ensures the entry is not
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a90b50d..e7cef4e2e01f 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -972,7 +972,6 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
- printk_safe_flush_on_panic();
__crash_kexec(regs);
/*
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 35859da8bd4f..dfff31ed644a 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -138,6 +138,15 @@ KERNEL_ATTR_RO(vmcoreinfo);
#endif /* CONFIG_CRASH_CORE */
+#if defined(CONFIG_PREEMPT_RT)
+static ssize_t realtime_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", 1);
+}
+KERNEL_ATTR_RO(realtime);
+#endif
+
/* whether file capabilities are enabled */
static ssize_t fscaps_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
@@ -228,6 +237,9 @@ static struct attribute * kernel_attrs[] = {
#ifndef CONFIG_TINY_RCU
&rcu_expedited_attr.attr,
&rcu_normal_attr.attr,
+#endif
+#ifdef CONFIG_PREEMPT_RT
+ &realtime_attr.attr,
#endif
NULL
};
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 45452facff3b..fa253e0db751 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -3,7 +3,7 @@
# and is generally not a function of system call inputs.
KCOV_INSTRUMENT := n
-obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
+obj-y += semaphore.o rwsem.o percpu-rwsem.o
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
@@ -12,19 +12,23 @@ CFLAGS_REMOVE_mutex-debug.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_rtmutex-debug.o = $(CC_FLAGS_FTRACE)
endif
-obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
obj-$(CONFIG_LOCKDEP) += lockdep.o
ifeq ($(CONFIG_PROC_FS),y)
obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_SMP) += spinlock.o
-obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
+ifneq ($(CONFIG_PREEMPT_RT),y)
+obj-y += mutex.o
+obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
+obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
+endif
+obj-$(CONFIG_PREEMPT_RT) += mutex-rt.o rwsem-rt.o rwlock-rt.o
obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o
obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o
obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bca0f7f71cde..7dd5214f8d18 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4416,6 +4416,7 @@ static void check_flags(unsigned long flags)
}
}
+#ifndef CONFIG_PREEMPT_RT
/*
* We dont accurately track softirq state in e.g.
* hardirq contexts (such as on 4KSTACKS), so only
@@ -4430,6 +4431,7 @@ static void check_flags(unsigned long flags)
DEBUG_LOCKS_WARN_ON(!current->softirqs_enabled);
}
}
+#endif
if (!debug_locks)
print_irqtrace_events(current);
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index e09562818bb7..7ad06e6df176 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -16,7 +16,6 @@
#include
#include
#include
-#include
#include
#include
#include
diff --git a/kernel/locking/mutex-rt.c b/kernel/locking/mutex-rt.c
new file mode 100644
index 000000000000..4f81595c0f52
--- /dev/null
+++ b/kernel/locking/mutex-rt.c
@@ -0,0 +1,223 @@
+/*
+ * kernel/rt.c
+ *
+ * Real-Time Preemption Support
+ *
+ * started by Ingo Molnar:
+ *
+ * Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar
+ * Copyright (C) 2006, Timesys Corp., Thomas Gleixner
+ *
+ * historic credit for proving that Linux spinlocks can be implemented via
+ * RT-aware mutexes goes to many people: The Pmutex project (Dirk Grambow
+ * and others) who prototyped it on 2.4 and did lots of comparative
+ * research and analysis; TimeSys, for proving that you can implement a
+ * fully preemptible kernel via the use of IRQ threading and mutexes;
+ * Bill Huey for persuasively arguing on lkml that the mutex model is the
+ * right one; and to MontaVista, who ported pmutexes to 2.6.
+ *
+ * This code is a from-scratch implementation and is not based on pmutexes,
+ * but the idea of converting spinlocks to mutexes is used here too.
+ *
+ * lock debugging, locking tree, deadlock detection:
+ *
+ * Copyright (C) 2004, LynuxWorks, Inc., Igor Manyilov, Bill Huey
+ * Released under the General Public License (GPL).
+ *
+ * Includes portions of the generic R/W semaphore implementation from:
+ *
+ * Copyright (c) 2001 David Howells (dhowells@redhat.com).
+ * - Derived partially from idea by Andrea Arcangeli
+ * - Derived also from comments by Linus
+ *
+ * Pending ownership of locks and ownership stealing:
+ *
+ * Copyright (C) 2005, Kihon Technologies Inc., Steven Rostedt
+ *
+ * (also by Steven Rostedt)
+ * - Converted single pi_lock to individual task locks.
+ *
+ * By Esben Nielsen:
+ * Doing priority inheritance with help of the scheduler.
+ *
+ * Copyright (C) 2006, Timesys Corp., Thomas Gleixner
+ * - major rework based on Esben Nielsens initial patch
+ * - replaced thread_info references by task_struct refs
+ * - removed task->pending_owner dependency
+ * - BKL drop/reacquire for semaphore style locks to avoid deadlocks
+ * in the scheduler return path as discussed with Steven Rostedt
+ *
+ * Copyright (C) 2006, Kihon Technologies Inc.
+ * Steven Rostedt
+ * - debugged and patched Thomas Gleixner's rework.
+ * - added back the cmpxchg to the rework.
+ * - turned atomic require back on for SMP.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "rtmutex_common.h"
+
+/*
+ * struct mutex functions
+ */
+void __mutex_do_init(struct mutex *mutex, const char *name,
+ struct lock_class_key *key)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ /*
+ * Make sure we are not reinitializing a held lock:
+ */
+ debug_check_no_locks_freed((void *)mutex, sizeof(*mutex));
+ lockdep_init_map(&mutex->dep_map, name, key, 0);
+#endif
+ mutex->lock.save_state = 0;
+}
+EXPORT_SYMBOL(__mutex_do_init);
+
+void __lockfunc _mutex_lock(struct mutex *lock)
+{
+ mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(_mutex_lock);
+
+void __lockfunc _mutex_lock_io(struct mutex *lock)
+{
+ int token;
+
+ token = io_schedule_prepare();
+ _mutex_lock(lock);
+ io_schedule_finish(token);
+}
+EXPORT_SYMBOL_GPL(_mutex_lock_io);
+
+int __lockfunc _mutex_lock_interruptible(struct mutex *lock)
+{
+ int ret;
+
+ mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ ret = __rt_mutex_lock_state(&lock->lock, TASK_INTERRUPTIBLE);
+ if (ret)
+ mutex_release(&lock->dep_map, 1, _RET_IP_);
+ return ret;
+}
+EXPORT_SYMBOL(_mutex_lock_interruptible);
+
+int __lockfunc _mutex_lock_killable(struct mutex *lock)
+{
+ int ret;
+
+ mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ ret = __rt_mutex_lock_state(&lock->lock, TASK_KILLABLE);
+ if (ret)
+ mutex_release(&lock->dep_map, 1, _RET_IP_);
+ return ret;
+}
+EXPORT_SYMBOL(_mutex_lock_killable);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+void __lockfunc _mutex_lock_nested(struct mutex *lock, int subclass)
+{
+ mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_);
+ __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(_mutex_lock_nested);
+
+void __lockfunc _mutex_lock_io_nested(struct mutex *lock, int subclass)
+{
+ int token;
+
+ token = io_schedule_prepare();
+
+ mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_);
+ __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE);
+
+ io_schedule_finish(token);
+}
+EXPORT_SYMBOL_GPL(_mutex_lock_io_nested);
+
+void __lockfunc _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest)
+{
+ mutex_acquire_nest(&lock->dep_map, 0, 0, nest, _RET_IP_);
+ __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(_mutex_lock_nest_lock);
+
+int __lockfunc _mutex_lock_interruptible_nested(struct mutex *lock, int subclass)
+{
+ int ret;
+
+ mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_);
+ ret = __rt_mutex_lock_state(&lock->lock, TASK_INTERRUPTIBLE);
+ if (ret)
+ mutex_release(&lock->dep_map, 1, _RET_IP_);
+ return ret;
+}
+EXPORT_SYMBOL(_mutex_lock_interruptible_nested);
+
+int __lockfunc _mutex_lock_killable_nested(struct mutex *lock, int subclass)
+{
+ int ret;
+
+ mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+ ret = __rt_mutex_lock_state(&lock->lock, TASK_KILLABLE);
+ if (ret)
+ mutex_release(&lock->dep_map, 1, _RET_IP_);
+ return ret;
+}
+EXPORT_SYMBOL(_mutex_lock_killable_nested);
+#endif
+
+int __lockfunc _mutex_trylock(struct mutex *lock)
+{
+ int ret = __rt_mutex_trylock(&lock->lock);
+
+ if (ret)
+ mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+
+ return ret;
+}
+EXPORT_SYMBOL(_mutex_trylock);
+
+void __lockfunc _mutex_unlock(struct mutex *lock)
+{
+ mutex_release(&lock->dep_map, 1, _RET_IP_);
+ __rt_mutex_unlock(&lock->lock);
+}
+EXPORT_SYMBOL(_mutex_unlock);
+
+/**
+ * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0
+ * @cnt: the atomic which we are to dec
+ * @lock: the mutex to return holding if we dec to 0
+ *
+ * return true and hold lock if we dec to 0, return false otherwise
+ */
+int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
+{
+ /* dec if we can't possibly hit 0 */
+ if (atomic_add_unless(cnt, -1, 1))
+ return 0;
+ /* we might hit 0, so take the lock */
+ mutex_lock(lock);
+ if (!atomic_dec_and_test(cnt)) {
+ /* when we actually did the dec, we didn't hit 0 */
+ mutex_unlock(lock);
+ return 0;
+ }
+ /* we hit 0, and we hold the lock */
+ return 1;
+}
+EXPORT_SYMBOL(atomic_dec_and_mutex_lock);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 364d38a0c444..e4a4a56037bb 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -1,27 +1,29 @@
// SPDX-License-Identifier: GPL-2.0-only
#include
-#include
#include
+#include
#include
#include
#include