From: Johannes Weiner <hannes@cmpxchg.org>
To: "Gražvydas Ignotas" <notasas@gmail.com>
Cc: Wei Wang <weiwan@google.com>, Shakeel Butt <shakeelb@google.com>,
    Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
    linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: UDP rx packet loss in a cgroup with a memory limit
Date: Wed, 17 Aug 2022 13:13:56 -0400
Message-ID: <Yv0h1PFxmK7rVWpy@cmpxchg.org>
In-Reply-To: <CANOLnOPeOi0gxYwd5+ybdv5w=RZEh5JakJPE9xgrSL1cecZHbw@mail.gmail.com>

On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas <notasas@gmail.com> wrote:
> > Basically, when there is git activity in the container with a memory
> > limit, other processes in the same container start to suffer (very)
> > occasional network issues (mostly DNS lookup failures).
>
> ok I've traced this and it's failing in try_charge_memcg(), which
> doesn't seem to be trying too hard because it's called from irq
> context.
>
> Here is the backtrace:
> <IRQ>
> ? fib_validate_source+0xb4/0x100
> ? ip_route_input_slow+0xa11/0xb70
> mem_cgroup_charge_skmem+0x4b/0xf0
> __sk_mem_raise_allocated+0x17f/0x3e0
> __udp_enqueue_schedule_skb+0x220/0x270
> udp_queue_rcv_one_skb+0x330/0x5e0
> udp_unicast_rcv_skb+0x75/0x90
> __udp4_lib_rcv+0x1ba/0xca0
> ? ip_rcv_finish_core.constprop.0+0x63/0x490
> ip_protocol_deliver_rcu+0xd6/0x230
> ip_local_deliver_finish+0x73/0xa0
> __netif_receive_skb_one_core+0x8b/0xa0
> process_backlog+0x8e/0x120
> __napi_poll+0x2c/0x160
> net_rx_action+0x2a2/0x360
> ? rebalance_domains+0xeb/0x3b0
> __do_softirq+0xeb/0x2eb
> __irq_exit_rcu+0xb9/0x110
> sysvec_apic_timer_interrupt+0xa2/0xd0
> </IRQ>
>
> Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
>
> memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> swap: usage 0kB, limit 0kB, failcnt 0
> Memory cgroup stats for
> /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> anon 348016640
> file 7502163968
> kernel 146997248
> kernel_stack 327680
> pagetables 2224128
> percpu 0
> sock 4096
> vmalloc 0
> shmem 0
> zswap 0
> zswapped 0
> file_mapped 112041984
> file_dirty 1181028352
> file_writeback 2686976
> swapcached 0
> anon_thp 44040192
> file_thp 0
> shmem_thp 0
> inactive_anon 350756864
> active_anon 36864
> inactive_file 3614003200
> active_file 3888070656
> unevictable 0
> slab_reclaimable 143692600
> slab_unreclaimable 545120
> slab 144237720
> workingset_refault_anon 0
> workingset_refault_file 2318
> workingset_activate_anon 0
> workingset_activate_file 2318
> workingset_restore_anon 0
> workingset_restore_file 0
> workingset_nodereclaim 0
> pgfault 334152
> pgmajfault 1238
> pgrefill 3400
> pgscan 819608
> pgsteal 791005
> pgactivate 949122
> pgdeactivate 1694
> pglazyfree 0
> pglazyfreed 0
> zswpin 0
> zswpout 0
> thp_fault_alloc 709
> thp_collapse_alloc 0
>
> So it basically renders UDP inoperable because of disk cache. I hope
> this is not the intended behavior. Naturally booting with
> cgroup.memory=nosocket solves this issue.

This is most likely a regression caused by this patch:

commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
Author: Wei Wang <weiwan@google.com>
Date:   Tue Aug 17 12:40:03 2021 -0700

    net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()

    Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
    to give more control to the networking stack and enable it to change
    memcg charging behavior. In the future, the networking stack may decide
    to avoid oom-kills when fallbacks are more appropriate.

    One behavior change in mem_cgroup_charge_skmem() by this patch is to
    avoid force charging by default and let the caller decide when and if
    force charging is needed through the presence or absence of
    __GFP_NOFAIL.

    Signed-off-by: Wei Wang <weiwan@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

We never used to fail these allocations. Cgroups don't have a
kswapd-style watermark reclaimer, so the network relied on
force-charging and leaving reclaim to allocations that can block.

Now it seems network packets could just fail indefinitely.

The changelog is a bit terse given how drastic the behavior change
is. Wei, Shakeel, can you fill in why this was changed? Can we revert
this for the time being?
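
To make the difference between force-charging and failing the charge concrete, here is a minimal userspace sketch. It is not kernel code: struct model_memcg, model_charge() and the force flag are hypothetical names invented for illustration, and the force flag only loosely stands in for the presence of __GFP_NOFAIL described in the changelog above. It models nothing about reclaim or about what the socket layer does afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory cgroup's page counter. */
struct model_memcg {
	long usage;	/* pages currently charged */
	long limit;	/* configured limit, in pages */
};

/*
 * Account nr_pages against the group.
 *
 * force == true:  the charge is always accounted, even if it pushes
 *                 usage above the limit (the overshoot is left for a
 *                 later, blocking allocation to reclaim).
 * force == false: the charge is refused once it would exceed the
 *                 limit, and usage is left untouched.
 *
 * Returns true if the pages were accounted.
 */
static bool model_charge(struct model_memcg *cg, long nr_pages, bool force)
{
	assert(nr_pages > 0);

	if (force || cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return true;
	}
	return false;
}
```

Plugging in the figures from the report (usage 7812476kB, limit 7812500kB) and assuming 4kB pages gives 1953119 pages used out of 1953125, so only 6 pages of headroom. In this model, any charge larger than that is refused for as long as the page cache keeps the group at its limit, unless the caller asks for force-charging.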