* [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Vladimir Davydov @ 2015-11-25 15:36 UTC
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Vlastimil Babka, Mel Gorman, linux-mm, linux-kernel

Block device drivers often hand off I/O request processing to kernel
threads (example: device mapper). If such a thread calls kmalloc, it can
dive into the direct reclaim path and end up waiting for too_many_isolated
to return false, blocking writeback. This can lead to a deadlock if the
pages were isolated by processes performing memcg reclaim, because they
call wait_on_page_writeback upon encountering a page under writeback,
which will never finish if bio_endio is to be called by the kernel thread
stuck in the reclaimer, waiting for the isolated pages to be put back.

I've never encountered such a deadlock on a vanilla kernel, nor have I
tried to reproduce it. However, I did hit it with an out-of-tree block
device driver, which uses a kernel thread for completing bios: the kernel
thread got stuck busy-checking too_many_isolated on the DMA zone, which
had only 3 inactive and 68 isolated file pages (2163 pages were free);
the pages had been isolated by memcg processes waiting for writeback to
finish. I don't see anything that would prevent this in the case of e.g.
device mapper.

Let's fix this problem by making too_many_isolated always return false
for kernel threads. Apart from fixing the possible deadlock in the case
of the legacy cgroup hierarchy, this makes sense even if the unified
hierarchy is used, where processes performing memcg reclaim never call
wait_on_page_writeback, because kernel threads might be responsible for
cleaning pages necessary for reclaim - BTW, throttle_direct_reclaim never
throttles kernel threads for the same reason.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9d553b07bb86..0f1318a52b23 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1457,7 +1457,7 @@ static int too_many_isolated(struct zone *zone, int file,
 {
 	unsigned long inactive, isolated;
 
-	if (current_is_kswapd())
+	if (current->flags & PF_KTHREAD)
 		return 0;
 
 	if (!sane_reclaim(sc))
-- 
2.1.4
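For context, the too_many_isolated() heuristic being patched here looks
roughly like this in kernels of that era (a simplified sketch from memory,
not the exact tree state):

static int too_many_isolated(struct zone *zone, int file,
		struct scan_control *sc)
{
	unsigned long inactive, isolated;

	/* kswapd must never be throttled here. */
	if (current_is_kswapd())
		return 0;

	/* Legacy memcg reclaim is exempt as well. */
	if (!sane_reclaim(sc))
		return 0;

	if (file) {
		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
	} else {
		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
	}

	/*
	 * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages,
	 * so that they do not get blocked behind normal direct
	 * reclaimers and form a circular deadlock.
	 */
	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
		inactive >>= 3;

	return isolated > inactive;
}

Direct reclaimers that see isolated > inactive spin in
shrink_inactive_list(), sleeping in congestion_wait() until the ratio
drops; the patch above makes every kernel thread, not just kswapd, skip
that wait.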
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Vlastimil Babka @ 2015-11-25 15:45 UTC
To: Vladimir Davydov, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Mel Gorman, linux-mm, linux-kernel

On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> Block device drivers often hand off I/O request processing to kernel
> threads (example: device mapper). If such a thread calls kmalloc, it can
> dive into the direct reclaim path and end up waiting for too_many_isolated
> to return false, blocking writeback. This can lead to a deadlock if the

Shouldn't such an allocation lack __GFP_IO to prevent this and other
kinds of deadlocks? And/or use mempools?

PF_KTHREAD looks like a big hammer to me that will solve only one
potential problem...
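For reference, one way a driver thread could drop __GFP_IO for a whole
section without touching every call site is the NOIO allocation scope
that already exists at this point (a minimal fragment-level sketch, not
from the thread; buf and len are placeholders):

	unsigned int noio_flags = memalloc_noio_save();

	/* Allocations here behave as GFP_NOIO even if GFP_KERNEL is passed:
	 * both __GFP_IO and __GFP_FS are stripped for this scope. */
	buf = kmalloc(len, GFP_KERNEL);

	memalloc_noio_restore(noio_flags);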
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Vladimir Davydov @ 2015-11-25 16:27 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Mel Gorman, linux-mm, linux-kernel

On Wed, Nov 25, 2015 at 04:45:13PM +0100, Vlastimil Babka wrote:
> On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> > Block device drivers often hand off I/O request processing to kernel
> > threads (example: device mapper). If such a thread calls kmalloc, it can
> > dive into the direct reclaim path and end up waiting for too_many_isolated
> > to return false, blocking writeback. This can lead to a deadlock if the
>
> Shouldn't such an allocation lack __GFP_IO to prevent this and other
> kinds of deadlocks? And/or use mempools?

Not necessarily. Loopback is an example: it can call
grab_cache_page_write_begin -> add_to_page_cache_lru with GFP_KERNEL.

> PF_KTHREAD looks like a big hammer to me that will solve only one
> potential problem...

This problem can result in processes hanging forever. Any ideas how it
could be fixed in a better way?

Thanks,
Vladimir
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Vladimir Davydov @ 2015-11-26 8:16 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Mel Gorman, linux-mm, linux-kernel

On Wed, Nov 25, 2015 at 07:27:57PM +0300, Vladimir Davydov wrote:
> On Wed, Nov 25, 2015 at 04:45:13PM +0100, Vlastimil Babka wrote:
> > On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> > > Block device drivers often hand off I/O request processing to kernel
> > > threads (example: device mapper). If such a thread calls kmalloc, it can
> > > dive into the direct reclaim path and end up waiting for too_many_isolated
> > > to return false, blocking writeback. This can lead to a deadlock if the
> >
> > Shouldn't such an allocation lack __GFP_IO to prevent this and other
> > kinds of deadlocks? And/or use mempools?
>
> Not necessarily. Loopback is an example: it can call
> grab_cache_page_write_begin -> add_to_page_cache_lru with GFP_KERNEL.

Anyway, kthreads that use GFP_NOIO and/or mempools aren't safe either,
because this isn't an allocation context problem: the reclaimer locks up
not because it tries to take an fs/io lock the caller holds, but because
it waits for isolated pages to be put back, which will never happen,
since the processes that isolated them depend on the kthread making
progress. This is purely a reclaimer heuristic, which kmalloc users are
not aware of.

My point is that, in contrast to userspace processes, it is dangerous to
throttle kthreads in the reclaimer, because they might be responsible
for reclaimer progress (e.g. performing writeback).

Regarding side effects of this patch: there aren't many kthreads out
there, so I don't believe it can put the system at risk of thrashing by
isolating too many reclaimable pages.

Thanks,
Vladimir

> > PF_KTHREAD looks like a big hammer to me that will solve only one
> > potential problem...
>
> This problem can result in processes hanging forever. Any ideas how it
> could be fixed in a better way?
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Michal Hocko @ 2015-11-27 12:50 UTC
To: Vladimir Davydov
Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Mel Gorman, linux-mm, linux-kernel

On Thu 26-11-15 11:16:24, Vladimir Davydov wrote:
> On Wed, Nov 25, 2015 at 07:27:57PM +0300, Vladimir Davydov wrote:
> > On Wed, Nov 25, 2015 at 04:45:13PM +0100, Vlastimil Babka wrote:
> > > On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> > > > Block device drivers often hand off I/O request processing to kernel
> > > > threads (example: device mapper). If such a thread calls kmalloc, it can
> > > > dive into the direct reclaim path and end up waiting for too_many_isolated
> > > > to return false, blocking writeback. This can lead to a deadlock if the
> > >
> > > Shouldn't such an allocation lack __GFP_IO to prevent this and other
> > > kinds of deadlocks? And/or use mempools?
> >
> > Not necessarily. Loopback is an example: it can call
> > grab_cache_page_write_begin -> add_to_page_cache_lru with GFP_KERNEL.

AFAIR the loop driver reduces the gfp_mask via the inode mapping.

> Anyway, kthreads that use GFP_NOIO and/or mempools aren't safe either,
> because this isn't an allocation context problem: the reclaimer locks up
> not because it tries to take an fs/io lock the caller holds, but because
> it waits for isolated pages to be put back, which will never happen,
> since the processes that isolated them depend on the kthread making
> progress. This is purely a reclaimer heuristic, which kmalloc users are
> not aware of.
>
> My point is that, in contrast to userspace processes, it is dangerous to
> throttle kthreads in the reclaimer, because they might be responsible
> for reclaimer progress (e.g. performing writeback).

Wouldn't it be better if your writeback kthread used
PF_MEMALLOC/__GFP_MEMALLOC instead? It is in fact a reclaimer, so it
shouldn't even get into reclaim. There are way too many allocations done
from kernel thread context for them all to go unthrottled (just look at
worker threads).

-- 
Michal Hocko
SUSE Labs
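A minimal sketch of what that suggestion could look like in a driver
(hypothetical code, not from the thread; process_pending_requests() is a
placeholder):

#include <linux/kthread.h>
#include <linux/sched.h>

static int my_writeback_thread(void *data)
{
	/*
	 * Mark the thread as a reclaimer: its allocations may dip into
	 * reserves and will not recurse into direct reclaim.
	 */
	current->flags |= PF_MEMALLOC;

	while (!kthread_should_stop())
		process_pending_requests(data);	/* hypothetical helper */

	current->flags &= ~PF_MEMALLOC;
	return 0;
}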
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Vladimir Davydov @ 2015-11-27 13:40 UTC
To: Michal Hocko
Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Mel Gorman, linux-mm, linux-kernel

On Fri, Nov 27, 2015 at 01:50:05PM +0100, Michal Hocko wrote:
> On Thu 26-11-15 11:16:24, Vladimir Davydov wrote:
> > On Wed, Nov 25, 2015 at 07:27:57PM +0300, Vladimir Davydov wrote:
> > > On Wed, Nov 25, 2015 at 04:45:13PM +0100, Vlastimil Babka wrote:
> > > > On 11/25/2015 04:36 PM, Vladimir Davydov wrote:
> > > > > Block device drivers often hand off I/O request processing to kernel
> > > > > threads (example: device mapper). If such a thread calls kmalloc, it can
> > > > > dive into the direct reclaim path and end up waiting for too_many_isolated
> > > > > to return false, blocking writeback. This can lead to a deadlock if the
> > > >
> > > > Shouldn't such an allocation lack __GFP_IO to prevent this and other
> > > > kinds of deadlocks? And/or use mempools?
> > >
> > > Not necessarily. Loopback is an example: it can call
> > > grab_cache_page_write_begin -> add_to_page_cache_lru with GFP_KERNEL.
>
> AFAIR the loop driver reduces the gfp_mask via the inode mapping.

Yeah, it does - I missed that, thanks for pointing it out. But it doesn't
make much difference, because such a thread can still get stuck in
too_many_isolated, although it does reduce the chance of that happening.
When I hit it, the DMA zone had only 3 inactive file pages and 68
isolated file pages, as mentioned in the patch description, so even the
relaxed limit that GFP_NOIO/GFP_NOFS callers get (no inactive >> 3
reduction) wouldn't have saved us.

> > Anyway, kthreads that use GFP_NOIO and/or mempools aren't safe either,
> > because this isn't an allocation context problem: the reclaimer locks up
> > not because it tries to take an fs/io lock the caller holds, but because
> > it waits for isolated pages to be put back, which will never happen,
> > since the processes that isolated them depend on the kthread making
> > progress. This is purely a reclaimer heuristic, which kmalloc users are
> > not aware of.
> >
> > My point is that, in contrast to userspace processes, it is dangerous to
> > throttle kthreads in the reclaimer, because they might be responsible
> > for reclaimer progress (e.g. performing writeback).
>
> Wouldn't it be better if your writeback kthread used
> PF_MEMALLOC/__GFP_MEMALLOC instead? It is in fact a reclaimer, so it
> shouldn't even get into reclaim.

The driver we use is similar to loop. It works as a proxy to the fs it
sits on top of. Allowing it to access emergency reserves would deplete
them quickly, just like in the case of plain loop.

The problem is not really about our driver, in fact. I'm pretty sure one
can hit it when using memcg along with loop or dm-crypt, for instance.

> There are way too many allocations done from kernel thread context for
> them all to go unthrottled (just look at worker threads).

What about throttling them only once then?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 97ba9e1cde09..9253f4531b9c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1578,6 +1578,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		/* We are about to die and free our memory. Return now. */
 		if (fatal_signal_pending(current))
 			return SWAP_CLUSTER_MAX;
+
+		if (current->flags & PF_KTHREAD)
+			break;
 	}
 
 	lru_add_drain();
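Applied to the surrounding code, the throttling loop in
shrink_inactive_list() would then look roughly like this (a sketch of the
~4.4-era code, not the exact tree):

	while (unlikely(too_many_isolated(zone, file, sc))) {
		/* Back off and re-check, as the existing code does. */
		congestion_wait(BLK_RW_ASYNC, HZ/10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;

		/* New: a kthread waits at most one round instead of forever. */
		if (current->flags & PF_KTHREAD)
			break;
	}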
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Michal Hocko @ 2015-11-27 15:01 UTC
To: Vladimir Davydov
Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Mel Gorman, linux-mm, linux-kernel

On Fri 27-11-15 16:40:03, Vladimir Davydov wrote:
> On Fri, Nov 27, 2015 at 01:50:05PM +0100, Michal Hocko wrote:
> > On Thu 26-11-15 11:16:24, Vladimir Davydov wrote:
[...]
> > > Anyway, kthreads that use GFP_NOIO and/or mempools aren't safe either,
> > > because this isn't an allocation context problem: the reclaimer locks up
> > > not because it tries to take an fs/io lock the caller holds, but because
> > > it waits for isolated pages to be put back, which will never happen,
> > > since the processes that isolated them depend on the kthread making
> > > progress. This is purely a reclaimer heuristic, which kmalloc users are
> > > not aware of.
> > >
> > > My point is that, in contrast to userspace processes, it is dangerous to
> > > throttle kthreads in the reclaimer, because they might be responsible
> > > for reclaimer progress (e.g. performing writeback).
> >
> > Wouldn't it be better if your writeback kthread used
> > PF_MEMALLOC/__GFP_MEMALLOC instead? It is in fact a reclaimer, so it
> > shouldn't even get into reclaim.
>
> The driver we use is similar to loop. It works as a proxy to the fs it
> sits on top of. Allowing it to access emergency reserves would deplete
> them quickly, just like in the case of plain loop.

OK, I see. I thought it would be using only a very limited amount of
memory for the writeback.

> The problem is not really about our driver, in fact. I'm pretty sure one
> can hit it when using memcg along with loop or dm-crypt, for instance.

I am not very familiar with either, but from a quick look the loop driver
doesn't use mempools either; it simply relays the data to the underlying
file and relies on the underlying fs to write all the pages, only
preventing recursion by clearing GFP_FS and GFP_IO. Then I am not really
sure how we can guarantee forward progress. The GFP_NOFS allocation might
loop inside the allocator endlessly, and so the writeback wouldn't make
any progress. This doesn't seem to be specific to memcg, either; the
global case would just replace the deadlock with a livelock. I certainly
must be missing something here.

> > There are way too many allocations done from kernel thread context for
> > them all to go unthrottled (just look at worker threads).
>
> What about throttling them only once then?

This still sounds way too broad to me, and I am not even sure it solves
the problem. If anything, I think we really should make it specific only
to those callers that are required to make forward progress. What about
PF_LESS_THROTTLE? NFS is already using this flag for a similar purpose,
and we indeed do not throttle at a few places during reclaim. So I would
expect a current_may_throttle() check there, although I must confess I
have no idea about the whole condition right now.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 97ba9e1cde09..9253f4531b9c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1578,6 +1578,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  		/* We are about to die and free our memory. Return now. */
>  		if (fatal_signal_pending(current))
>  			return SWAP_CLUSTER_MAX;
> +
> +		if (current->flags & PF_KTHREAD)
> +			break;
>  	}
>  
>  	lru_add_drain();

-- 
Michal Hocko
SUSE Labs
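For reference, mm/vmscan.c consumes PF_LESS_THROTTLE roughly like the
sketch below around this time (treat the exact bodies as an assumption,
not a quote); nfsd sets the flag around page-cache writes so that
loop-back NFS mounts are only throttled when their own backing device is
congested:

/* Approximation of the helper used in the direct reclaim paths. */
static int current_may_throttle(void)
{
	return !(current->flags & PF_LESS_THROTTLE) ||
		current->backing_dev_info == NULL ||
		bdi_write_congested(current->backing_dev_info);
}

/* Hypothetical kthread that cleans pages for a backing device: */
static int my_flusher_thread(void *data)
{
	current->flags |= PF_LESS_THROTTLE;	/* like nfsd for loopback mounts */

	while (!kthread_should_stop())
		write_pending_pages(data);	/* placeholder helper */

	current->flags &= ~PF_LESS_THROTTLE;
	return 0;
}

A too_many_isolated() check gated on the same flag would confine the
exemption to threads that explicitly declare themselves page cleaners,
rather than extending it to every kthread.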
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Vladimir Davydov @ 2015-12-01 11:25 UTC
To: Michal Hocko
Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Mel Gorman, linux-mm, linux-kernel

On Fri, Nov 27, 2015 at 04:01:33PM +0100, Michal Hocko wrote:
> On Fri 27-11-15 16:40:03, Vladimir Davydov wrote:
> > On Fri, Nov 27, 2015 at 01:50:05PM +0100, Michal Hocko wrote:
> > > On Thu 26-11-15 11:16:24, Vladimir Davydov wrote:
> [...]
> > > > Anyway, kthreads that use GFP_NOIO and/or mempools aren't safe either,
> > > > because this isn't an allocation context problem: the reclaimer locks up
> > > > not because it tries to take an fs/io lock the caller holds, but because
> > > > it waits for isolated pages to be put back, which will never happen,
> > > > since the processes that isolated them depend on the kthread making
> > > > progress. This is purely a reclaimer heuristic, which kmalloc users are
> > > > not aware of.
> > > >
> > > > My point is that, in contrast to userspace processes, it is dangerous to
> > > > throttle kthreads in the reclaimer, because they might be responsible
> > > > for reclaimer progress (e.g. performing writeback).
> > >
> > > Wouldn't it be better if your writeback kthread used
> > > PF_MEMALLOC/__GFP_MEMALLOC instead? It is in fact a reclaimer, so it
> > > shouldn't even get into reclaim.
> >
> > The driver we use is similar to loop. It works as a proxy to the fs it
> > sits on top of. Allowing it to access emergency reserves would deplete
> > them quickly, just like in the case of plain loop.
>
> OK, I see. I thought it would be using only a very limited amount of
> memory for the writeback.

Even if it did, there would still be a chance of a deadlock: even if the
kthread uses mempools + NOIO, it can go into direct reclaim and start
looping in too_many_isolated AFAIU. The probability of this happening is
much lower, though.

> > The problem is not really about our driver, in fact. I'm pretty sure one
> > can hit it when using memcg along with loop or dm-crypt, for instance.
>
> I am not very familiar with either, but from a quick look the loop driver
> doesn't use mempools either; it simply relays the data to the underlying
> file and relies on the underlying fs to write all the pages, only
> preventing recursion by clearing GFP_FS and GFP_IO. Then I am not really
> sure how we can guarantee forward progress. The GFP_NOFS allocation might
> loop inside the allocator endlessly, and so the writeback wouldn't make
> any progress. This doesn't seem to be specific to memcg, either; the
> global case would just replace the deadlock with a livelock. I certainly
> must be missing something here.

Yeah, I think you're right. If the loop kthread gets stuck in the
reclaimer, it might happen that other processes keep isolating,
reclaiming, and then dirtying the reclaimed pages, preventing the kthread
from running and cleaning memory, so that we might end up with all memory
under writeback and no reclaimable memory left for the kthread to run and
clean it. Due to the dirty limit, this is unlikely to happen, though I'm
not sure.

OTOH, with legacy memcg there is no dirty limit, and we can isolate a lot
of pages (SWAP_CLUSTER_MAX = 512 now) per process and wait on page
writeback to complete before releasing them, which sounds bad. And we
can't just remove this wait_on_page_writeback from shrink_page_list,
otherwise OOM might be triggered prematurely. Maybe we should put back
rotated pages and release all reclaimed pages before initiating the wait?

> > > There are way too many allocations done from kernel thread context for
> > > them all to go unthrottled (just look at worker threads).
> >
> > What about throttling them only once then?
>
> This still sounds way too broad to me, and I am not even sure it solves
> the problem. If anything, I think we really should make it specific only
> to those callers that are required to make forward progress. What about
> PF_LESS_THROTTLE? NFS is already using this flag for a similar purpose,
> and we indeed do not throttle at a few places during reclaim. So I would
> expect a current_may_throttle() check there, although I must confess I
> have no idea about the whole condition right now.

Yeah, thanks for the tip. I'll take a look.

Thanks,
Vladimir
* Re: [PATCH] vmscan: do not throttle kthreads due to too_many_isolated
From: Michal Hocko @ 2015-12-02 13:52 UTC
To: Vladimir Davydov
Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Mel Gorman, linux-mm, linux-kernel

On Tue 01-12-15 14:25:45, Vladimir Davydov wrote:
> On Fri, Nov 27, 2015 at 04:01:33PM +0100, Michal Hocko wrote:
> > On Fri 27-11-15 16:40:03, Vladimir Davydov wrote:
[...]
> > > The problem is not really about our driver, in fact. I'm pretty sure one
> > > can hit it when using memcg along with loop or dm-crypt, for instance.
> >
> > I am not very familiar with either, but from a quick look the loop driver
> > doesn't use mempools either; it simply relays the data to the underlying
> > file and relies on the underlying fs to write all the pages, only
> > preventing recursion by clearing GFP_FS and GFP_IO. Then I am not really
> > sure how we can guarantee forward progress. The GFP_NOFS allocation might
> > loop inside the allocator endlessly, and so the writeback wouldn't make
> > any progress. This doesn't seem to be specific to memcg, either; the
> > global case would just replace the deadlock with a livelock. I certainly
> > must be missing something here.
>
> Yeah, I think you're right. If the loop kthread gets stuck in the
> reclaimer, it might happen that other processes keep isolating,
> reclaiming, and then dirtying the reclaimed pages, preventing the kthread
> from running and cleaning memory, so that we might end up with all memory
> under writeback and no reclaimable memory left for the kthread to run and
> clean it. Due to the dirty limit, this is unlikely to happen, though I'm
> not sure.
>
> OTOH, with legacy memcg there is no dirty limit, and we can isolate a lot
> of pages (SWAP_CLUSTER_MAX = 512 now) per process and wait on page
> writeback to complete before releasing them, which sounds bad. And we
> can't just remove this wait_on_page_writeback from shrink_page_list,
> otherwise OOM might be triggered prematurely. Maybe we should put back
> rotated pages and release all reclaimed pages before initiating the wait?

So you want to wait for the page while it is on the LRU? I think it would
be better simply not to throttle the loopback kthread endlessly in the
first place. I was tempted to add an exception and not wait for writeback
on loopback (and similar) devices, but this is a dirty hack, and I think
it is reasonable to rely on the global case working properly and
guaranteeing forward progress. And this seems broken currently:

FS1 (marks page Writeback) -> request -> BLK -> LOOP -> vfs_iter_write -> FS2

So the writeback bit will stay set until vfs_iter_write finishes, and
that doesn't require the data to be written down to the FS2 backing store
AFAICS. But what happens if vfs_iter_write gets throttled on the dirty
limit or FS2 performs a GFP_NOFS allocation? We will get stuck there
forever without any progress. If the loopback FS load is dominant, then
we are screwed. So I think the global case needs a fix. If it works
properly, our waiting in the memcg reclaim will be finite as well.

On the other hand, if all the reclaimable memory is marked writeback or
dirty, then we should trigger the OOM killer sooner or later, so I am not
sure this is such a big deal. I have to play with this, though.

Anyway, I guess we at least want to wait for the writeback bit killable.
I will cook up a patch.

-- 
Michal Hocko
SUSE Labs
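A rough illustration of the killable wait being proposed (a sketch only,
assuming wait_on_page_bit_killable() can be used for PG_writeback here;
this is not the patch Michal later posted):

	/* In shrink_page_list(), instead of an uninterruptible wait: */
	if (PageWriteback(page)) {
		/* Returns non-zero if a fatal signal interrupted the wait. */
		if (wait_on_page_bit_killable(page, PG_writeback))
			goto keep_locked;	/* give the page back and bail out */
	}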