On Fri, 11 Oct 2024 at 06:06, Chris Mason wrote: > > - Linus's starvation observation. It doesn't feel like there's enough > load to cause this, especially given us sitting in truncate, where it > should be pretty unlikely to have multiple procs banging on the page in > question. Yeah, I think the starvation can only possibly happen in fdatasync-like paths where it's waiting for existing writeback without holding the page lock. And while Christian has had those backtraces too, the truncate path is not one of them. That said, just because I wanted to see how nasty it is, I looked into changing the rules for folio_wake_bit(). Christian, just to clarify, this is not for you to test - this is very experimental - but maybe Willy has comments on it. Because it *might* be possible to do something like the attached, where we do the page flags changes atomically but without any locks if there are no waiters, but if there is a waiter on the page, we always clear the page flag bit atomically under the waitqueue lock as we wake up the waiter. I changed the name (and the return value) of the folio_xor_flags_has_waiters() function to just not have any possibility of semantic mixup, but basically instead of doing the xor atomically and unconditionally (and returning whether we had waiters), it now does it conditionally only if we do *not* have waiters, and returns true if successful. And if there were waiters, it moves the flag clearing into the wakeup function. That in turn means that the "while whiteback" loop can go back to be just a non-looping "if writeback", and folio_wait_writeback() can't get into any starvation with new writebacks always showing up. The reason I say it *might* be possible to do something like this is that it changes __folio_end_writeback() to no longer necessarily clear the writeback bit under the XA lock. If there are waiters, we'll clear it later (after releasing the lock) in the caller. Willy? What do you think? Clearly this now makes PG_writeback not synchronized with the PAGECACHE_TAG_WRITEBACK tag, but the reason I think it might be ok is that the code that *sets* the PG_writeback bit in __folio_start_writeback() only ever starts with a page that isn't under writeback, and has a VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio); at the top of the function even outside the XA lock. So I don't think these *need* to be synchronized under the XA lock, and I think the folio flag wakeup atomicity might be more important than the XA writeback tag vs folio writeback bit. But I'm not going to really argue for this patch at all - I wanted to look at how bad it was, I wrote it, I'm actually running it on my machine now and it didn't *immediately* blow up in my face, so it *may* work just fine. The patch is fairly simple, and apart from the XA tagging issue is seems very straightforward. I'm just not sure it's worth synchronizing one part just to at the same time de-synchronize another.. Linus