From: Wei Wang
Date: Wed, 17 Aug 2022 11:16:18 -0700
Subject: Re: UDP rx packet loss in a cgroup with a memory limit
To: Shakeel Butt
Cc: Johannes Weiner, Eric Dumazet, netdev, Gražvydas Ignotas, Michal Hocko, Roman Gushchin, Linux MM, Cgroups

On Wed, Aug 17, 2022 at 10:37 AM Shakeel Butt wrote:
>
> + Eric and netdev
>
> On Wed, Aug 17, 2022 at 10:13 AM Johannes Weiner wrote:
> >
> > On Wed, Aug 17, 2022 at 07:50:13PM +0300, Gražvydas Ignotas wrote:
> > > On Tue, Aug 16, 2022 at 9:52 PM Gražvydas Ignotas wrote:
> > > > Basically, when there is git activity in the container with a memory
> > > > limit, other processes in the same container start to suffer (very)
> > > > occasional network issues (mostly DNS lookup failures).
> > >
> > > ok I've traced this and it's failing in try_charge_memcg(), which
> > > doesn't seem to be trying too hard because it's called from irq
> > > context.
> > >
> > > Here is the backtrace:
> > >
> > >  ? fib_validate_source+0xb4/0x100
> > >  ? ip_route_input_slow+0xa11/0xb70
> > >  mem_cgroup_charge_skmem+0x4b/0xf0
> > >  __sk_mem_raise_allocated+0x17f/0x3e0
> > >  __udp_enqueue_schedule_skb+0x220/0x270
> > >  udp_queue_rcv_one_skb+0x330/0x5e0
> > >  udp_unicast_rcv_skb+0x75/0x90
> > >  __udp4_lib_rcv+0x1ba/0xca0
> > >  ? ip_rcv_finish_core.constprop.0+0x63/0x490
> > >  ip_protocol_deliver_rcu+0xd6/0x230
> > >  ip_local_deliver_finish+0x73/0xa0
> > >  __netif_receive_skb_one_core+0x8b/0xa0
> > >  process_backlog+0x8e/0x120
> > >  __napi_poll+0x2c/0x160
> > >  net_rx_action+0x2a2/0x360
> > >  ? rebalance_domains+0xeb/0x3b0
> > >  __do_softirq+0xeb/0x2eb
> > >  __irq_exit_rcu+0xb9/0x110
> > >  sysvec_apic_timer_interrupt+0xa2/0xd0
> > >
> > >
> > > Calling mem_cgroup_print_oom_meminfo() in such a case reveals:
> > >
> > > memory: usage 7812476kB, limit 7812500kB, failcnt 775198
> > > swap: usage 0kB, limit 0kB, failcnt 0
> > > Memory cgroup stats for
> > > /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb8f4f0e9_fb95_4f2d_8443_e6a78f235c9a.slice/docker-9e7cad93b2e0774d49148474989b41fe6d67a5985d059d08d9d64495f1539a81.scope:
> > > anon 348016640
> > > file 7502163968
> > > kernel 146997248
> > > kernel_stack 327680
> > > pagetables 2224128
> > > percpu 0
> > > sock 4096
> > > vmalloc 0
> > > shmem 0
> > > zswap 0
> > > zswapped 0
> > > file_mapped 112041984
> > > file_dirty 1181028352
> > > file_writeback 2686976
> > > swapcached 0
> > > anon_thp 44040192
> > > file_thp 0
> > > shmem_thp 0
> > > inactive_anon 350756864
> > > active_anon 36864
> > > inactive_file 3614003200
> > > active_file 3888070656
> > > unevictable 0
> > > slab_reclaimable 143692600
> > > slab_unreclaimable 545120
> > > slab 144237720
> > > workingset_refault_anon 0
> > > workingset_refault_file 2318
> > > workingset_activate_anon 0
> > > workingset_activate_file 2318
> > > workingset_restore_anon 0
> > > workingset_restore_file 0
> > > workingset_nodereclaim 0
> > > pgfault 334152
> > > pgmajfault 1238
> > > pgrefill 3400
> > > pgscan 819608
> > > pgsteal 791005
> > > pgactivate 949122
> > > pgdeactivate 1694
> > > pglazyfree 0
> > > pglazyfreed 0
> > > zswpin 0
> > > zswpout 0
> > > thp_fault_alloc 709
> > > thp_collapse_alloc 0
> > >
> > > So it basically renders UDP inoperable because of disk cache. I hope
> > > this is not the intended behavior. Naturally booting with
> > > cgroup.memory=nosocket solves this issue.
> >
> > This is most likely a regression caused by this patch:
> >
> > commit 4b1327be9fe57443295ae86fe0fcf24a18469e9f
> > Author: Wei Wang
> > Date:   Tue Aug 17 12:40:03 2021 -0700
> >
> >     net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()
> >
> >     Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(),
> >     to give more control to the networking stack and enable it to change
> >     memcg charging behavior. In the future, the networking stack may decide
> >     to avoid oom-kills when fallbacks are more appropriate.
> >
> >     One behavior change in mem_cgroup_charge_skmem() by this patch is to
> >     avoid force charging by default and let the caller decide when and if
> >     force charging is needed through the presence or absence of
> >     __GFP_NOFAIL.
> >
> >     Signed-off-by: Wei Wang
> >     Reviewed-by: Shakeel Butt
> >     Signed-off-by: David S. Miller
> >
> > We never used to fail these allocations. Cgroups don't have a
> > kswapd-style watermark reclaimer, so the network relied on
> > force-charging and leaving reclaim to allocations that can block.
> > Now it seems network packets could just fail indefinitely.
> >
> > The changelog is a bit terse given how drastic the behavior change
> > is. Wei, Shakeel, can you fill in why this was changed? Can we revert
> > this for the time being?
>
> Does reverting the patch fix the issue? However I don't think it will.
>
> Please note that we still have the force charging as before this
> patch.
> Previously when mem_cgroup_charge_skmem() force charges, it
> returns false and __sk_mem_raise_allocated takes suppress_allocation
> code path. Based on some heuristics, it may allow it or it may
> uncharge and return failure.

The force charging logic in __sk_mem_raise_allocated() is only
considered on the tx path for STREAM sockets, so it probably does not
take effect on the UDP path. And that logic is NOT altered by the
above patch.

So, specifically for the UDP receive path, this is what happens in
__sk_mem_raise_allocated() BEFORE the above patch:
- mem_cgroup_charge_skmem() gets called:
    - try_charge() with GFP_NOWAIT gets called and fails
    - try_charge() with __GFP_NOFAIL gets called (force charge)
    - return false
- goto suppress_allocation:
    - mem_cgroup_uncharge_skmem() gets called
- return 0 (which means failure)

AFTER the above patch, this is what happens in __sk_mem_raise_allocated():
- mem_cgroup_charge_skmem() gets called:
    - try_charge() with GFP_NOWAIT gets called and fails
    - return false
- goto suppress_allocation:
    - mem_cgroup_uncharge_skmem() is no longer called (nothing was charged)
- return 0 (failure)

So I agree with Shakeel that this change should not alter the outcome
of the above call path in this situation. But do let us know if
reverting this change has any effect on your test.

>
> The given patch has not changed any heuristic. It has only changed
> when forced charging happens. After the patch, the initial call
> mem_cgroup_charge_skmem() can fail and we take suppress_allocation
> code path and if heuristics allow, we force charge with __GFP_NOFAIL.
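The BEFORE/AFTER call paths listed above can be modeled with a small toy program. This is a hypothetical sketch, not kernel code: `struct toy_memcg` and every `toy_*` function are made-up names, the memcg page counter is reduced to a single usage/limit pair, and the socket-level heuristics in __sk_mem_raise_allocated() are left out (per the reply, the forced-charge heuristic applies to the STREAM tx path, not to UDP rx). Under those assumptions it shows that both versions report failure and end with the same usage; only the transient forced charge differs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for a memcg page counter (not kernel code). */
struct toy_memcg {
	long usage; /* pages currently charged */
	long limit; /* hard limit in pages */
};

/* Models try_charge() with GFP_NOWAIT here: no reclaim, so an
 * over-limit request simply fails and nothing is charged. */
static bool toy_try_charge_nowait(struct toy_memcg *m, long nr)
{
	if (m->usage + nr > m->limit)
		return false;
	m->usage += nr;
	return true;
}

/* Models try_charge() with __GFP_NOFAIL: always charges, even past the limit. */
static void toy_force_charge(struct toy_memcg *m, long nr)
{
	m->usage += nr;
}

/* Models mem_cgroup_uncharge_skmem(): gives back previously charged pages. */
static void toy_uncharge(struct toy_memcg *m, long nr)
{
	assert(m->usage >= nr); /* may only undo a charge that was made */
	m->usage -= nr;
}

/* mem_cgroup_charge_skmem() BEFORE the patch: on failure it force
 * charges anyway and returns false. */
static bool toy_charge_skmem_before(struct toy_memcg *m, long nr)
{
	if (toy_try_charge_nowait(m, nr))
		return true;
	toy_force_charge(m, nr);
	return false;
}

/* mem_cgroup_charge_skmem() AFTER the patch, called without
 * __GFP_NOFAIL: a failed charge is just reported. */
static bool toy_charge_skmem_after(struct toy_memcg *m, long nr)
{
	return toy_try_charge_nowait(m, nr);
}

/* UDP rx path of __sk_mem_raise_allocated() as listed in the reply,
 * heuristics omitted. Returns 1 on success, 0 on failure. */
static int toy_udp_rx_raise_before(struct toy_memcg *m, long nr)
{
	if (toy_charge_skmem_before(m, nr))
		return 1;
	/* suppress_allocation: undo the forced charge */
	toy_uncharge(m, nr);
	return 0;
}

static int toy_udp_rx_raise_after(struct toy_memcg *m, long nr)
{
	if (toy_charge_skmem_after(m, nr))
		return 1;
	/* suppress_allocation: nothing was charged, so nothing to undo */
	return 0;
}
```

For a cgroup already at its limit (for example usage == limit == 100 pages, an illustrative number), both toy_udp_rx_raise_before() and toy_udp_rx_raise_after() return 0 and leave usage at 100. The BEFORE variant briefly goes over the limit and then uncharges; the AFTER variant never charges at all.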