* [PATCH RFC] mm, writeback: flush plugged IO in wakeup_flusher_threads()
@ 2016-08-04 18:36 Konstantin Khlebnikov
2016-08-04 19:21 ` Jens Axboe
0 siblings, 1 reply; 3+ messages in thread
From: Konstantin Khlebnikov @ 2016-08-04 18:36 UTC (permalink / raw)
To: linux-mm
Cc: Jens Axboe, Michal Hocko, Mel Gorman, linux-raid, Dave Chinner,
Tejun Heo, Andrew Morton, Shaohua Li
I've found a funny live-lock between raid10 barriers during resync and memory
controller hard limits. Inside mpage_readpages() the task holds on to its plugged
bio, which blocks the barrier in raid10. Its memory cgroup has no free memory, so
the task goes into the reclaimer, but all reclaimable pages are dirty and cannot
be written because raid10 is rebuilding and stuck on the barrier.
The common flush of such IO in schedule() never happens because the machine
where this happened has a lot of free CPUs and the task never goes to sleep.
The lock is 'live' because changing the memory limit or killing the task which
holds the stuck bio unblocks the whole thing.
This happened on 3.18.x, but I see no difference in the upstream logic.
Theoretically this might happen even without a memory cgroup.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
fs/fs-writeback.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 56c8fda436c0..ed58863cdb5d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1948,6 +1948,12 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason)
{
struct backing_dev_info *bdi;
+ /*
+ * If we are expecting writeback progress we must submit plugged IO.
+ */
+ if (blk_needs_flush_plug(current))
+ blk_schedule_flush_plug(current);
+
if (!nr_pages)
nr_pages = get_nr_dirty_pages();
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
* Re: [PATCH RFC] mm, writeback: flush plugged IO in wakeup_flusher_threads()
2016-08-04 18:36 [PATCH RFC] mm, writeback: flush plugged IO in wakeup_flusher_threads() Konstantin Khlebnikov
@ 2016-08-04 19:21 ` Jens Axboe
2016-08-05 5:44 ` Konstantin Khlebnikov
0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2016-08-04 19:21 UTC (permalink / raw)
To: Konstantin Khlebnikov, linux-mm
Cc: Michal Hocko, Mel Gorman, linux-raid, Dave Chinner, Tejun Heo,
Andrew Morton, Shaohua Li
On 08/04/2016 12:36 PM, Konstantin Khlebnikov wrote:
> I've found a funny live-lock between raid10 barriers during resync and memory
> controller hard limits. Inside mpage_readpages() the task holds on to its plugged
> bio, which blocks the barrier in raid10. Its memory cgroup has no free memory, so
> the task goes into the reclaimer, but all reclaimable pages are dirty and cannot
> be written because raid10 is rebuilding and stuck on the barrier.
>
> The common flush of such IO in schedule() never happens because the machine
> where this happened has a lot of free CPUs and the task never goes to sleep.
>
> The lock is 'live' because changing the memory limit or killing the task which
> holds the stuck bio unblocks the whole thing.
>
> This happened on 3.18.x, but I see no difference in the upstream logic.
> Theoretically this might happen even without a memory cgroup.
So the issue is that the caller of wakeup_flusher_threads() ends up
never going to sleep, hence the plug is never auto-flushed. I didn't
quite understand your reasoning for why it never sleeps above, but that
must be the gist of it.
I don't see anything inherently wrong with the fix.
--
Jens Axboe
* Re: [PATCH RFC] mm, writeback: flush plugged IO in wakeup_flusher_threads()
2016-08-04 19:21 ` Jens Axboe
@ 2016-08-05 5:44 ` Konstantin Khlebnikov
0 siblings, 0 replies; 3+ messages in thread
From: Konstantin Khlebnikov @ 2016-08-05 5:44 UTC (permalink / raw)
To: Jens Axboe
Cc: Konstantin Khlebnikov, linux-mm, Michal Hocko, Mel Gorman,
linux-raid, Dave Chinner, Tejun Heo, Andrew Morton, Shaohua Li
On Thu, Aug 4, 2016 at 10:21 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 08/04/2016 12:36 PM, Konstantin Khlebnikov wrote:
>>
>> I've found a funny live-lock between raid10 barriers during resync and memory
>> controller hard limits. Inside mpage_readpages() the task holds on to its plugged
>> bio, which blocks the barrier in raid10. Its memory cgroup has no free memory, so
>> the task goes into the reclaimer, but all reclaimable pages are dirty and cannot
>> be written because raid10 is rebuilding and stuck on the barrier.
>>
>> The common flush of such IO in schedule() never happens because the machine
>> where this happened has a lot of free CPUs and the task never goes to sleep.
>>
>> The lock is 'live' because changing the memory limit or killing the task which
>> holds the stuck bio unblocks the whole thing.
>>
>> This happened on 3.18.x, but I see no difference in the upstream logic.
>> Theoretically this might happen even without a memory cgroup.
>
>
> So the issue is that the caller of wakeup_flusher_threads() ends up never
> going to sleep, hence the plug is never auto-flushed. I didn't quite
> understand your reasoning for why it never sleeps above, but that must be
> the gist of it.
Ah, right: a simple context switch doesn't flush the plug, so the number of
CPUs is irrelevant.
>
> I don't see anything inherently wrong with the fix.
>
> --
> Jens Axboe
>
>