* [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc()
@ 2011-03-11 20:35 Prasad Joshi
2011-03-11 21:01 ` David Rientjes
0 siblings, 1 reply; 6+ messages in thread
From: Prasad Joshi @ 2011-03-11 20:35 UTC
To: linux-mm, Andrew Morton, Anand Mitra
A filesystem might run into a problem while calling
__vmalloc(GFP_NOFS) inside a lock.
It is expected that __vmalloc(), when called with GFP_NOFS, should not
call back into filesystem code even under increased memory
pressure. But the problem is that even if we pass this flag, __vmalloc()
itself allocates memory with GFP_KERNEL.
GFP_KERNEL allocations may enter the memory reclaim path and
try to free memory by calling the filesystem's clear_inode/evict_inode
function, which might lead to a deadlock.
For further details
https://bugzilla.kernel.org/show_bug.cgi?id=30702
http://marc.info/?l=linux-mm&m=128942194520631&w=4
The patch passes the gfp allocation flag all the way down to those
allocating functions.
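To make the difference concrete, here is a minimal userspace sketch of the idea. It is not kernel code: the MODEL_* flag values and the model_* function names are made up for illustration, and only the relationship "GFP_NOFS is GFP_KERNEL without the FS bit" is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for gfp bits; the numeric values are arbitrary. */
typedef unsigned int gfp_model_t;

#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/* True when an allocation with this mask may recurse into filesystem code. */
bool model_may_enter_fs(gfp_model_t gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}

/*
 * Behaviour described in the report: the page-table level ignores the
 * caller's mask and always uses GFP_KERNEL.
 */
gfp_model_t model_pte_gfp_hardwired(gfp_model_t caller_gfp)
{
	(void)caller_gfp;
	return MODEL_GFP_KERNEL;
}

/* Behaviour the patch aims for: the caller's mask is passed straight down. */
gfp_model_t model_pte_gfp_propagated(gfp_model_t caller_gfp)
{
	/* Sanity check for the model: a mask must at least allow waiting. */
	assert(caller_gfp & MODEL_GFP_WAIT);
	return caller_gfp;
}
```

With the hardwired variant a GFP_NOFS caller still ends up with a mask that allows filesystem reclaim; with the propagated variant it does not.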
arch/alpha/include/asm/pgalloc.h | 18 ++-------
arch/arm/include/asm/pgalloc.h | 12 +-----
arch/avr32/include/asm/pgalloc.h | 8 +----
arch/cris/include/asm/pgalloc.h | 10 +----
arch/frv/include/asm/pgalloc.h | 3 --
arch/frv/include/asm/pgtable.h | 1 -
arch/frv/mm/pgalloc.c | 9 +----
arch/ia64/include/asm/pgalloc.h | 24 ++-----------
arch/m32r/include/asm/pgalloc.h | 11 ++----
arch/m68k/include/asm/motorola_pgalloc.h | 19 ++--------
arch/m68k/include/asm/sun3_pgalloc.h | 14 ++------
arch/m68k/mm/memory.c | 9 +----
arch/microblaze/include/asm/pgalloc.h | 3 --
arch/microblaze/mm/pgtable.c | 12 ++-----
arch/mips/include/asm/pgalloc.h | 22 ++++--------
arch/mn10300/include/asm/pgalloc.h | 2 -
arch/mn10300/mm/pgtable.c | 10 +----
arch/parisc/include/asm/pgalloc.h | 20 ++--------
arch/powerpc/include/asm/pgalloc-32.h | 2 -
arch/powerpc/include/asm/pgalloc-64.h | 29 +++------------
arch/powerpc/mm/pgtable_32.c | 10 +----
arch/s390/include/asm/pgalloc.h | 28 +++------------
arch/s390/mm/pgtable.c | 22 +++---------
arch/score/include/asm/pgalloc.h | 14 +++----
arch/sh/include/asm/pgalloc.h | 8 +----
arch/sh/mm/pgtable.c | 8 +----
arch/sparc/include/asm/pgalloc_32.h | 5 ---
arch/sparc/include/asm/pgalloc_64.h | 17 +--------
arch/tile/include/asm/pgalloc.h | 11 +-----
arch/tile/mm/pgtable.c | 10 +----
arch/um/include/asm/pgalloc.h | 1 -
arch/um/kernel/mem.c | 21 +++--------
arch/x86/include/asm/pgalloc.h | 17 +--------
arch/x86/mm/pgtable.c | 9 +----
arch/xtensa/include/asm/pgalloc.h | 9 +----
arch/xtensa/mm/pgtable.c | 10 +----
include/asm-generic/4level-fixup.h | 8 +---
include/asm-generic/pgtable-nopmd.h | 3 +-
include/asm-generic/pgtable-nopud.h | 1 -
include/linux/mm.h | 40 +++++---------------
mm/memory.c | 14 +++----
mm/vmalloc.c | 58 ++++++++++--------------------
42 files changed, 121 insertions(+), 441 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc()
2011-03-11 20:35 [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc() Prasad Joshi
@ 2011-03-11 21:01 ` David Rientjes
2011-03-11 21:13 ` Prasad Joshi
2011-03-12 2:05 ` Anand Mitra
0 siblings, 2 replies; 6+ messages in thread
From: David Rientjes @ 2011-03-11 21:01 UTC
To: Prasad Joshi; +Cc: linux-mm, Andrew Morton, Anand Mitra
On Fri, 11 Mar 2011, Prasad Joshi wrote:
> A filesystem might run into a problem while calling
> __vmalloc(GFP_NOFS) inside a lock.
>
> It is expected that __vmalloc(), when called with GFP_NOFS, should not
> call back into filesystem code even under increased memory
> pressure. But the problem is that even if we pass this flag, __vmalloc()
> itself allocates memory with GFP_KERNEL.
>
> GFP_KERNEL allocations may enter the memory reclaim path and
> try to free memory by calling the filesystem's clear_inode/evict_inode
> function, which might lead to a deadlock.
>
> For further details
> https://bugzilla.kernel.org/show_bug.cgi?id=30702
> http://marc.info/?l=linux-mm&m=128942194520631&w=4
>
> The patch passes the gfp allocation flag all the way down to those
> allocating functions.
>
You're going to run into trouble by hard-wiring __GFP_REPEAT into all of
the pte allocations because if GFP_NOFS is used then direct reclaim will
usually fail (see the comment for do_try_to_free_pages(): If the caller is
!__GFP_FS then the probability of a failure is reasonably high) and, if
it does so continuously, then the page allocator will loop forever. This
bit should probably be moved a level higher in your architecture changes
to the caller passing GFP_KERNEL.
* Re: [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc()
2011-03-11 21:01 ` David Rientjes
@ 2011-03-11 21:13 ` Prasad Joshi
2011-03-11 23:08 ` David Rientjes
2011-03-12 2:05 ` Anand Mitra
1 sibling, 1 reply; 6+ messages in thread
From: Prasad Joshi @ 2011-03-11 21:13 UTC
To: David Rientjes; +Cc: linux-mm, Andrew Morton, Anand Mitra
On Fri, Mar 11, 2011 at 9:01 PM, David Rientjes <rientjes@google.com> wrote:
> On Fri, 11 Mar 2011, Prasad Joshi wrote:
>
>> A filesystem might run into a problem while calling
>> __vmalloc(GFP_NOFS) inside a lock.
>>
>> It is expected that __vmalloc(), when called with GFP_NOFS, should not
>> call back into filesystem code even under increased memory
>> pressure. But the problem is that even if we pass this flag, __vmalloc()
>> itself allocates memory with GFP_KERNEL.
>>
>> GFP_KERNEL allocations may enter the memory reclaim path and
>> try to free memory by calling the filesystem's clear_inode/evict_inode
>> function, which might lead to a deadlock.
>>
>> For further details
>> https://bugzilla.kernel.org/show_bug.cgi?id=30702
>> http://marc.info/?l=linux-mm&m=128942194520631&w=4
>>
>> The patch passes the gfp allocation flag all the way down to those
>> allocating functions.
>>
>
> You're going to run into trouble by hard-wiring __GFP_REPEAT into all of
> the pte allocations because if GFP_NOFS is used then direct reclaim will
> usually fail (see the comment for do_try_to_free_pages(): If the caller is
> !__GFP_FS then the probability of a failure is reasonably high) and, if
> it does so continuously, then the page allocator will loop forever. This
> bit should probably be moved a level higher in your architecture changes
> to the caller passing GFP_KERNEL.
Thanks a lot for your reply. I should have seen your mail before
sending 23 mails :(
I will make the changes suggested by you and will resend all of the
patches again.
Thanks and Regards,
Prasad
* Re: [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc()
2011-03-11 21:13 ` Prasad Joshi
@ 2011-03-11 23:08 ` David Rientjes
0 siblings, 0 replies; 6+ messages in thread
From: David Rientjes @ 2011-03-11 23:08 UTC
To: Prasad Joshi; +Cc: linux-mm, Andrew Morton, Anand Mitra
On Fri, 11 Mar 2011, Prasad Joshi wrote:
> Thanks a lot for your reply. I should have seen your mail before
> sending 23 mails :(
> I will make the changes suggested by you and will resend all of the
> patches again.
>
Thanks for taking this effort on. A couple other points:
- each patch should have a different subject prefixed with the subsystem
that it touches (for example: "x86: add gfp flags variant of
pte_alloc_one") and the maintainers should be cc'd. Check
scripts/get_maintainer.pl or the MAINTAINERS file. Also, for changes
that touch all arch code you'll want to cc linux-arch@vger.kernel.org
as well.
- each change needs to have a proper changelog prior to your
signed-off-by line to explain why the change is being made, namely in
preparation for supporting non-GFP_KERNEL allocations from __vmalloc().
* Re: [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc()
2011-03-11 21:01 ` David Rientjes
2011-03-11 21:13 ` Prasad Joshi
@ 2011-03-12 2:05 ` Anand Mitra
2011-03-13 1:22 ` David Rientjes
1 sibling, 1 reply; 6+ messages in thread
From: Anand Mitra @ 2011-03-12 2:05 UTC
To: David Rientjes; +Cc: Prasad Joshi, linux-mm, Andrew Morton
On Fri, Mar 11, 2011 at 1:01 PM, David Rientjes <rientjes@google.com> wrote:
>
>
> You're going to run into trouble by hard-wiring __GFP_REPEAT into all of
> the pte allocations because if GFP_NOFS is used then direct reclaim will
> usually fail (see the comment for do_try_to_free_pages(): If the caller is
> !__GFP_FS then the probability of a failure is reasonably high) and, if
> it does so continuously, then the page allocator will loop forever. This
> bit should probably be moved a level higher in your architecture changes
> to the caller passing GFP_KERNEL.
>
I'll repeat my understanding of the scenario you have pointed out to
make sure we have understood you correctly.
On a broad level the changes will cause GFP_NOFS to be
used for pte allocations, where it was previously absent. The impact of
this is serious when __GFP_REPEAT is set together with GFP_NOFS because
1) GFP_NOFS will result in very few pages being reclaimed (can't go
to the filesystems)
2) __GFP_REPEAT will cause both the reclaim and allocation to retry
more aggressively, if not indefinitely, based on the influence of the
flag in the functions should_alloc_retry() and should_continue_reclaim()
Effectively we need memory for use by the filesystem but we can't go
back to the filesystem to claim it. Without the suggested patch we
would actually try to claim space from the filesystem, which would work
most of the time but would deadlock occasionally. With the suggested
patch, as you have pointed out, we can possibly get into a low memory
hang. I am not sure there is a way out of this. Should this be
considered a genuinely low memory condition from which the system
might or might not recover?
regards
--
Anand Mitra
* Re: [RFC][PATCH 00/25]: Propagating GFP_NOFS inside __vmalloc()
2011-03-12 2:05 ` Anand Mitra
@ 2011-03-13 1:22 ` David Rientjes
0 siblings, 0 replies; 6+ messages in thread
From: David Rientjes @ 2011-03-13 1:22 UTC
To: Anand Mitra; +Cc: Prasad Joshi, linux-mm, Andrew Morton
On Fri, 11 Mar 2011, Anand Mitra wrote:
> I'll repeat my understanding of the scenario you have pointed out to
> make sure we have understood you correctly.
>
> On a broad level the changes will cause GFP_NOFS to be
> used for pte allocations, where it was previously absent. The impact of
> this is serious when __GFP_REPEAT is set together with GFP_NOFS because
>
> 1) GFP_NOFS will result in very few pages being reclaimed (can't go
> to the filesystems)
> 2) __GFP_REPEAT will cause both the reclaim and allocation to retry
> more aggressively, if not indefinitely, based on the influence of the
> flag in the functions should_alloc_retry() and should_continue_reclaim()
>
Yes, __GFP_REPEAT will loop in the page allocator forever if no pages can
be reclaimed, probably as the result of being !__GFP_FS -- the oom killer
also won't kill any processes to free memory because it requires __GFP_FS
(to ensure we don't kill something unnecessarily just because this
allocation is !__GFP_FS and direct reclaim has a high likelihood of
failure).
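The reasoning above can be sketched as a small userspace model. This is only an illustration under a worst-case assumption (the only reclaimable memory belongs to the filesystem); the MODEL_* flags and model_slowpath() are hypothetical names, and the loop is a simplification, not the page allocator's actual retry logic.

```c
#include <assert.h>

/* Illustrative stand-ins for gfp bits; the numeric values are arbitrary. */
typedef unsigned int gfp_model_t;

#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_REPEAT 0x8u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

enum model_outcome {
	MODEL_ALLOC_OK,      /* reclaim or the oom killer freed memory */
	MODEL_ALLOC_FAILED,  /* allocator gave up, caller sees NULL */
	MODEL_ALLOC_STUCK    /* still retrying when the model's budget ran out */
};

/*
 * Simplified slow path under memory pressure. max_attempts exists only so
 * that the model terminates; the concern in this thread is that the real
 * loop has no such bound.
 */
enum model_outcome model_slowpath(gfp_model_t gfp, unsigned int max_attempts)
{
	unsigned int attempt;

	assert(max_attempts > 0);

	for (attempt = 0; attempt < max_attempts; attempt++) {
		/* With the FS bit, reclaim (or failing that, the oom killer)
		 * can make progress in this model. */
		if (gfp & MODEL_GFP_FS)
			return MODEL_ALLOC_OK;

		/* No FS bit and no REPEAT: fail the allocation. */
		if (!(gfp & MODEL_GFP_REPEAT))
			return MODEL_ALLOC_FAILED;

		/* REPEAT without FS: no reclaim progress and no oom kill,
		 * so the only thing left is to try again. */
	}
	return MODEL_ALLOC_STUCK;
}
```

In this model, GFP_NOFS | __GFP_REPEAT is the only combination that exhausts its budget without either succeeding or failing.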
> Effectively we need memory for use by the filesystem but we can't go
> back to the filesystem to claim it. Without the suggested patch we
> would actually try to claim space from the filesystem which would work
> most of the times but would deadlock occasionally. With the suggested
> patch as you have pointed out we can possibly get into a low memory
> hang. I am not sure there is a way out of this, should this be
> considered as genuinely low memory condition out of which the system
> might or might not crawl out of ?
>
As suggested in my email, I think you should pass "GFP_KERNEL |
__GFP_REPEAT" into the lower level functions in this patchset instead of
just GFP_KERNEL and not hard-wire __GFP_REPEAT into the lower level
functions. GFP_NOFS | __GFP_REPEAT is a very risky combination that
shouldn't be used anywhere in the kernel because it risks infinitely
looping in the page allocator when memory is low. The callers passing
only GFP_NOFS should handle the possibility of the allocation returning
NULL appropriately.
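A rough userspace sketch of that layering follows. All names here (model_pte_alloc_gfp(), model_pte_alloc(), model_fs_map_buffer()) and the MODEL_* flag values are hypothetical, malloc() stands in for the page allocator, and the simulate_pressure parameter exists only to force the failure path in the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for gfp bits; the numeric values are arbitrary. */
typedef unsigned int gfp_model_t;

#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_REPEAT 0x8u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/*
 * Lower-level allocator: uses the mask exactly as given and adds nothing.
 * In this model an allocation under pressure only succeeds with REPEAT.
 */
void *model_pte_alloc_gfp(gfp_model_t gfp, int simulate_pressure)
{
	/* The risky combination (REPEAT without FS) must never reach here. */
	assert(!((gfp & MODEL_GFP_REPEAT) && !(gfp & MODEL_GFP_FS)));

	if (simulate_pressure && !(gfp & MODEL_GFP_REPEAT))
		return NULL;
	return malloc(64);
}

/* GFP_KERNEL path: the caller, not the lower level, adds REPEAT. */
void *model_pte_alloc(int simulate_pressure)
{
	return model_pte_alloc_gfp(MODEL_GFP_KERNEL | MODEL_GFP_REPEAT,
				   simulate_pressure);
}

/*
 * GFP_NOFS caller: passes the plain mask and handles NULL itself.
 * Returns 0 on success and -1 (standing in for -ENOMEM) on failure.
 */
int model_fs_map_buffer(int simulate_pressure)
{
	void *pte = model_pte_alloc_gfp(MODEL_GFP_NOFS, simulate_pressure);

	if (pte == NULL)
		return -1;
	free(pte);
	return 0;
}
```

The point of the split is that existing GFP_KERNEL users keep their current retry behaviour, while a GFP_NOFS user gets a prompt failure it can unwind from instead of a loop.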