linux-mm.kvack.org archive mirror
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>, Neil Brown <neilb@suse.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mike Christie <michaelc@cs.wisc.edu>,
	Eric B Munson <emunson@mgebm.net>,
	Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Subject: Re: [PATCH 08/17] net: Do not coalesce skbs belonging to PFMEMALLOC sockets
Date: Wed, 20 Jun 2012 14:04:52 +0200	[thread overview]
Message-ID: <1340193892.4604.865.camel@edumazet-glaptop> (raw)
In-Reply-To: <1340192652-31658-9-git-send-email-mgorman@suse.de>

On Wed, 2012-06-20 at 12:44 +0100, Mel Gorman wrote:
> Commit [bad43ca8: net: introduce skb_try_coalesce()] introduced an
> optimisation to coalesce skbs to reduce memory usage and cache line
> misses. In the case where the socket is used for swapping this can result
> in a warning like the following.
> 
> [  110.476565] nbd0: page allocation failure: order:0, mode:0x20
> [  110.476568] Pid: 2714, comm: nbd0 Not tainted 3.5.0-rc2-swapnbd-v12r2-slab #3
> [  110.476569] Call Trace:
> [  110.476573]  [<ffffffff811042d3>] warn_alloc_failed+0xf3/0x160
> [  110.476578]  [<ffffffff81107c92>] __alloc_pages_nodemask+0x6e2/0x930
> [  110.476582]  [<ffffffff81107c92>] ?  __alloc_pages_nodemask+0x6e2/0x930
> [  110.476588]  [<ffffffff81149f09>] kmem_getpages+0x59/0x1a0
> [  110.476593]  [<ffffffff8114ae5b>] fallback_alloc+0x17b/0x260
> [  110.476597]  [<ffffffff8114ac26>] ____cache_alloc_node+0x96/0x150
> [  110.476602]  [<ffffffff8114a458>] kmem_cache_alloc_node+0x78/0x1b0
> [  110.476607]  [<ffffffff8136c127>] __alloc_skb+0x57/0x1e0
> [  110.476612]  [<ffffffff813b9f81>] sk_stream_alloc_skb+0x41/0x120
> [  110.476617]  [<ffffffff813c8c72>] tcp_fragment+0x62/0x370
> [  110.476622]  [<ffffffff813c8fb9>] tso_fragment+0x39/0x180
> [  110.476628]  [<ffffffff813ca2a9>] tcp_write_xmit+0x1a9/0x3f0
> [  110.476634]  [<ffffffff813ca556>] __tcp_push_pending_frames+0x26/0xd0
> [  110.476639]  [<ffffffff813c61f5>] tcp_rcv_established+0x385/0x760
> [  110.476644]  [<ffffffff813ce671>] tcp_v4_do_rcv+0x111/0x1f0
> [  110.476648]  [<ffffffff81367259>] release_sock+0x99/0x140
> [  110.476652]  [<ffffffff813ba82b>] tcp_sendmsg+0x7cb/0xe80
> [  110.476657]  [<ffffffff813df9b4>] inet_sendmsg+0x64/0xb0
> [  110.476661]  [<ffffffff811f0a00>] ? security_socket_sendmsg+0x10/0x20
> [  110.476666]  [<ffffffff81361dd8>] sock_sendmsg+0xf8/0x130
> [  110.476672]  [<ffffffff8124ba4c>] ? cpumask_next_and+0x3c/0x50
> [  110.476677]  [<ffffffff8107b053>] ? update_sd_lb_stats+0x123/0x620
> [  110.476683]  [<ffffffff8105164f>] ? recalc_sigpending+0x1f/0x70
> [  110.476688]  [<ffffffff81051e17>] ? __set_task_blocked+0x37/0x80
> [  110.476693]  [<ffffffff81361e51>] kernel_sendmsg+0x41/0x60
> [  110.476698]  [<ffffffffa048d417>] sock_xmit+0xb7/0x300 [nbd]
> [  110.476703]  [<ffffffff8107bad7>] ? load_balance+0xd7/0x490
> [  110.476710]  [<ffffffffa048d7ac>] nbd_send_req+0x14c/0x270 [nbd]
> [  110.476716]  [<ffffffffa048e21e>] nbd_handle_req+0x9e/0x180 [nbd]
> [  110.476721]  [<ffffffffa048e4f2>] nbd_thread+0xb2/0x150 [nbd]
> [  110.476725]  [<ffffffff81062580>] ? wake_up_bit+0x40/0x40
> [  110.476730]  [<ffffffffa048e440>] ? do_nbd_request+0x140/0x140 [nbd]
> [  110.476733]  [<ffffffff81061d7e>] kthread+0x9e/0xb0
> [  110.476739]  [<ffffffff81439d64>] kernel_thread_helper+0x4/0x10
> [  110.476743]  [<ffffffff81061ce0>] ? flush_kthread_worker+0xc0/0xc0
> [  110.476748]  [<ffffffff81439d60>] ? gs_change+0x13/0x13
> 
> There were two ways this could be addressed. The first would be to
> teach __tcp_push_pending_frames() to use __GFP_MEMALLOC if the socket
> has SOCK_MEMALLOC set. This potentially defers the time of allocation
> to a point where we are applying greater pressure on PFMEMALLOC reserves
> which is undesirable.  The second approach is to disable skb coalescing
> for SOCK_MEMALLOC sockets and process them immediately. This patch takes
> the second approach.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  net/core/skbuff.c    |    7 +++++++
>  net/ipv4/tcp_input.c |    8 ++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d78671e..1d6ecc8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3370,6 +3370,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  
>  	*fragstolen = false;
>  
> +	/*
> +	 * Avoid coalescing for SOCK_MEMALLOC socks as we do not want to defer
> +	 * RX/TX to a time when pfmemalloc reserves are under greater pressure
> +	 */
> +	if (sk_memalloc_socks() && sock_flag(to->sk, SOCK_MEMALLOC))
> +		return false;
> +
>  	if (skb_cloned(to))
>  		return false;
>  
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index b224eb8..448f130 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4553,6 +4553,14 @@ static bool tcp_try_coalesce(struct sock *sk,
>  
>  	*fragstolen = false;
>  
> +	/*
> +	 * Do not attempt merging if the socket is used by the VM for swapping.
> +	 * Deferring can result in allocation failures during RX when an
> +	 * attempt is made to push pending frames
> +	 */
> +	if (sk_memalloc_socks() && sock_flag(sk, SOCK_MEMALLOC))
> +		return false;
> +
>  	if (tcp_hdr(from)->fin)
>  		return false;
>  


This makes absolutely no sense to me.

This patch changes the input path, while your stack trace is about the
output path and a packet being fragmented.

If you really want to avoid this kind of thing, you'll need to disable
TSO/GSO on the socket.




Thread overview: 33+ messages
2012-06-20 11:43 [PATCH 00/17] Swap-over-NBD without deadlocking V12 (resend) Mel Gorman
2012-06-20 11:43 ` [PATCH 01/17] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages Mel Gorman
2012-06-21 21:30   ` Rik van Riel
2012-06-22  9:34     ` Mel Gorman
2012-06-20 11:43 ` [PATCH 02/17] mm: slub: Optimise the SLUB fast path to avoid pfmemalloc checks Mel Gorman
2012-06-20 11:43 ` [PATCH 03/17] mm: Introduce __GFP_MEMALLOC to allow access to emergency reserves Mel Gorman
2012-06-20 11:43 ` [PATCH 04/17] mm: allow PF_MEMALLOC from softirq context Mel Gorman
2012-06-20 11:44 ` [PATCH 05/17] mm: Only set page->pfmemalloc when ALLOC_NO_WATERMARKS was used Mel Gorman
2012-06-20 11:44 ` [PATCH 06/17] mm: Ignore mempolicies when using ALLOC_NO_WATERMARK Mel Gorman
2012-06-20 11:44 ` [PATCH 07/17] net: Introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket Mel Gorman
2012-06-20 12:06   ` Eric Dumazet
2012-06-20 14:20     ` Mel Gorman
2012-06-20 11:44 ` [PATCH 08/17] net: Do not coalesce skbs belonging to PFMEMALLOC sockets Mel Gorman
2012-06-20 12:04   ` Eric Dumazet [this message]
2012-06-20 13:36     ` Mel Gorman
2012-06-20 13:51       ` Eric Dumazet
2012-06-20 14:18         ` Mel Gorman
2012-06-20 11:44 ` [PATCH 09/17] netvm: Allow the use of __GFP_MEMALLOC by specific sockets Mel Gorman
2012-06-20 11:44 ` [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves Mel Gorman
2012-06-21 16:09   ` Sebastian Andrzej Siewior
2012-06-22 10:00     ` Mel Gorman
2012-06-21 16:30   ` Sebastian Andrzej Siewior
2012-06-21 16:38     ` Eric Dumazet
2012-06-21 16:43       ` Sebastian Andrzej Siewior
2012-06-22 10:54     ` Mel Gorman
2012-06-22 11:27       ` Sebastian Andrzej Siewior
2012-06-20 11:44 ` [PATCH 11/17] netvm: Propagate page->pfmemalloc to skb Mel Gorman
2012-06-20 11:44 ` [PATCH 12/17] netvm: Propagate page->pfmemalloc from skb_alloc_page " Mel Gorman
2012-06-20 11:44 ` [PATCH 13/17] netvm: Set PF_MEMALLOC as appropriate during SKB processing Mel Gorman
2012-06-20 11:44 ` [PATCH 14/17] mm: Micro-optimise slab to avoid a function call Mel Gorman
2012-06-20 11:44 ` [PATCH 15/17] nbd: Set SOCK_MEMALLOC for access to PFMEMALLOC reserves Mel Gorman
2012-06-20 11:44 ` [PATCH 16/17] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage Mel Gorman
2012-06-20 11:44 ` [PATCH 17/17] mm: Account for the number of times direct reclaimers get throttled Mel Gorman
