Subject: fs/dcache: disable preemption on i_dir_seq's write side
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Fri Oct 20 11:29:53 2017 +0200

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

i_dir_seq is a sequence counter with a lock which is represented by the lowest
bit. The writer atomically updates the counter which ensures that it can be
modified by only one writer at a time.
The commit introducing this change claims that the lock has been integrated
into the counter for space reasons within the inode struct. The i_dir_seq
member is within a union which shares also a pointer. That means by using
seqlock_t we would have a sequence counter and a lock without increasing the
size of the data structure on 64bit and 32bit would grow by 4 bytes. With
lockdep enabled the size would grow and on PREEMPT_RT the spinlock_t is also
larger.

In order to keep this construct working on PREEMPT_RT, the writer needs to
disable preemption while obtaining the lock on the sequence counter / starting
the write critical section. The writer acquires an otherwise unrelated
spinlock_t which serves the same purpose on !PREEMPT_RT. With enabled
preemption a high priority reader could preempt the writer and live lock the
system while waiting for the locked bit to disappear.

Another solution would be to have global spinlock_t which is always acquired
by the writer. The reader would then acquire the lock if the sequence count is
odd and by doing so force the writer out of the critical section. The global
spinlock_t could be replaced by a hashed lock based on the address of the inode
to lower the lock contention.

For now, manually disable preemption on PREEMPT_RT to avoid live locks.

Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/dcache.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
---
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2563,7 +2563,13 @@ EXPORT_SYMBOL(d_rehash);
 
 static inline unsigned start_dir_add(struct inode *dir)
 {
-
+	/*
+	 * The caller has a spinlock_t (dentry::d_lock) acquired which disables
+	 * preemption on !PREEMPT_RT. On PREEMPT_RT the lock does not disable
+	 * preemption and it has be done explicitly.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	for (;;) {
 		unsigned n = dir->i_dir_seq;
 		if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
@@ -2575,6 +2581,8 @@ static inline unsigned start_dir_add(str
 static inline void end_dir_add(struct inode *dir, unsigned n)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static void d_wait_lookup(struct dentry *dentry)