linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Gražvydas Ignotas" <notasas@gmail.com>
Cc: Wei Wang <weiwan@google.com>, Shakeel Butt <shakeelb@google.com>,
	Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: UDP rx packet loss in a cgroup with a memory limit
Date: Wed, 17 Aug 2022 13:13:56 -0400
Message-ID: <Yv0h1PFxmK7rVWpy@cmpxchg.org>
In-Reply-To: <CANOLnOPeOi0gxYwd5+ybdv5w=RZEh5JakJPE9xgrSL1cecZHbw@mail.gmail.com>

On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> > Basically, when there is git activity in the container with a memory
> > limit, other processes in the same container start to suffer (very)
> > occasional network issues (mostly DNS lookup failures).
> 
> ok I've traced this and it's failing in try_charge_memcg(), which
> doesn't seem to be trying too hard because it's called from irq
> context.
> 
> Here is the backtrace:
>  <IRQ>
>  ? fib_validate_source+0xb4/0x100
>  ? ip_route_input_slow+0xa11/0xb70
>  mem_cgroup_charge_skmem+0x4b/0xf0
>  __sk_mem_raise_allocated+0x17f/0x3e0
>  __udp_enqueue_schedule_skb+0x220/0x270
>  udp_queue_rcv_one_skb+0x330/0x5e0
>  udp_unicast_rcv_skb+0x75/0x90
>  __udp4_lib_rcv+0x1ba/0xca0
>  ? ip_rcv_finish_core.constprop.0+0x63/0x490
>  ip_protocol_deliver_rcu+0xd6/0x230
>  ip_local_deliver_finish+0x73/0xa0
>  __netif_receive_skb_one_core+0x8b/0xa0
>  process_backlog+0x8e/0x120
>  __napi_poll+0x2c/0x160
>  net_rx_action+0x2a2/0x360
>  ? rebalance_domains+0xeb/0x3b0
>  __do_softirq+0xeb/0x2eb
>  __irq_exit_rcu+0xb9/0x110
>  sysvec_apic_timer_interrupt+0xa2/0xd0
>  </IRQ>
> 
> Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
> 
> memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> swap: usage 0kB, limit 0kB, failcnt 0
> Memory cgroup stats for
> /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> anon 348016640
> file 7502163968
> kernel 146997248
> kernel_stack 327680
> pagetables 2224128
> percpu 0
> sock 4096
> vmalloc 0
> shmem 0
> zswap 0
> zswapped 0
> file_mapped 112041984
> file_dirty 1181028352
> file_writeback 2686976
> swapcached 0
> anon_thp 44040192
> file_thp 0
> shmem_thp 0
> inactive_anon 350756864
> active_anon 36864
> inactive_file 3614003200
> active_file 3888070656
> unevictable 0
> slab_reclaimable 143692600
> slab_unreclaimable 545120
> slab 144237720
> workingset_refault_anon 0
> workingset_refault_file 2318
> workingset_activate_anon 0
> workingset_activate_file 2318
> workingset_restore_anon 0
> workingset_restore_file 0
> workingset_nodereclaim 0
> pgfault 334152
> pgmajfault 1238
> pgrefill 3400
> pgscan 819608
> pgsteal 791005
> pgactivate 949122
> pgdeactivate 1694
> pglazyfree 0
> pglazyfreed 0
> zswpin 0
> zswpout 0
> thp_fault_alloc 709
> thp_collapse_alloc 0
> 
> So it basically renders UDP inoperable because of disk cache. I hope
> this is not the intended behavior. Naturally booting with
> cgroup.memory=nosocket solves this issue.
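
For scale, the figures in the dump above can be checked with a few lines
of arithmetic. This is only a sketch: the helper names are made up for
illustration, and the 4 kB page size is an assumption that the report
does not state.

```c
#include <stdint.h>

#define TOY_PAGE_KB 4 /* assumption: 4 kB pages, not stated in the report */

/* Remaining room below the limit, in kB (0 if already at or over it). */
static uint64_t headroom_kb(uint64_t usage_kb, uint64_t limit_kb)
{
	return limit_kb > usage_kb ? limit_kb - usage_kb : 0;
}

/* The per-cgroup stats ("file", "anon", ...) are printed in bytes. */
static uint64_t bytes_to_kb(uint64_t bytes)
{
	return bytes / 1024;
}

/* Share of the limit in permille, integer math only (rounds down). */
static uint64_t share_permille(uint64_t part_kb, uint64_t whole_kb)
{
	return whole_kb ? part_kb * 1000 / whole_kb : 0;
}
```

With usage 7812476kB and limit 7812500kB, headroom_kb() gives 24 kB, which
is six pages under the 4 kB assumption. The "file" line (7502163968 bytes)
converts to 7326332 kB, a bit under 94% of the limit, which fits the
observation that page cache is what fills the cgroup.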

This is most likely a regression caused by this patch:

commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
Author: Wei Wang <weiwan@google.com>
Date:   Tue Aug 17 12:40:03 2021 -0700

    net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
    
    Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
    to give more control to the networking stack and enable it to change
    memcg charging behavior. In the future, the networking stack may decide
    to avoid oom-kills when fallbacks are more appropriate.
    
    One behavior change in mem_cgroup_charge_skmem() by this patch is to
    avoid force charging by default and let the caller decide when and if
    force charging is needed through the presence or absence of
    __GFP_NOFAIL.
    
    Signed-off-by: Wei Wang <weiwan@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

We never used to fail these allocations. Cgroups don't have a
kswapd-style watermark reclaimer, so the network relied on
force-charging and leaving reclaim to allocations that can block.
Now it seems network packets could just fail indefinitely.
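
To illustrate the difference, here is a toy user-space model. It is not
kernel code: struct toy_memcg, toy_charge() and toy_rx_drops() are
hypothetical names, the "force" flag only stands in for the effect of
__GFP_NOFAIL, and the caller is assumed to be in atomic context, so no
reclaim is ever attempted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a cgroup page counter with a hard limit. */
struct toy_memcg {
	size_t usage;   /* pages currently charged */
	size_t limit;   /* hard limit, in pages */
	size_t failcnt; /* charges that ran into the limit */
};

/*
 * Charge nr_pages without reclaiming (the caller cannot block).
 * With force set, a charge that hits the limit still succeeds and
 * overshoots; the excess is left for a later allocation that can
 * block and reclaim. Without it, the charge simply fails.
 */
static bool toy_charge(struct toy_memcg *cg, size_t nr_pages, bool force)
{
	if (cg->usage <= cg->limit && nr_pages <= cg->limit - cg->usage) {
		cg->usage += nr_pages;
		return true;
	}

	cg->failcnt++;
	if (force) {
		cg->usage += nr_pages; /* temporary overshoot */
		return true;
	}
	return false;
}

/* Charge one page per packet and count the packets that get dropped. */
static size_t toy_rx_drops(struct toy_memcg *cg, size_t packets, bool force)
{
	size_t drops = 0;

	for (size_t i = 0; i < packets; i++)
		if (!toy_charge(cg, 1, force))
			drops++;
	return drops;
}
```

In this model, a cgroup that sits at its limit because of page cache
drops every packet when force is false, and nothing on the receive path
ever brings usage back down. With force set, the same packets are
accepted and usage briefly exceeds the limit.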

The changelog is a bit terse given how drastic the behavior change
is. Wei, Shakeel, can you fill in why this was changed? Can we revert
this for the time being?

