From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f71.google.com (mail-it0-f71.google.com [209.85.214.71]) by kanga.kvack.org (Postfix) with ESMTP id 6D4946B0003 for ; Tue, 7 Aug 2018 07:02:11 -0400 (EDT) Received: by mail-it0-f71.google.com with SMTP id g187-v6so5095602ita.7 for ; Tue, 07 Aug 2018 04:02:11 -0700 (PDT) Received: from us.icdsoft.com (us.icdsoft.com. [192.252.146.184]) by mx.google.com with ESMTPS id j129-v6si707261iof.202.2018.08.07.04.02.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 07 Aug 2018 04:02:09 -0700 (PDT) Subject: Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure From: Georgi Nikolov References: <20180730135744.GT24267@dhcp22.suse.cz> <89ea4f56-6253-4f51-0fb7-33d7d4b60cfa@icdsoft.com> <20180730183820.GA24267@dhcp22.suse.cz> <56597af4-73c6-b549-c5d5-b3a2e6441b8e@icdsoft.com> <6838c342-2d07-3047-e723-2b641bc6bf79@suse.cz> <8105b7b3-20d3-5931-9f3c-2858021a4e12@icdsoft.com> <20180731140520.kpotpihqsmiwhh7l@breakpoint.cc> <20180801083349.GF16767@dhcp22.suse.cz> <20180802085043.GC10808@dhcp22.suse.cz> <85c86f17-6f96-6f01-2a3c-e2bad0ccb317@icdsoft.com> Message-ID: <5b5e872e-5785-2cfd-7d53-e19e017e5636@icdsoft.com> Date: Tue, 7 Aug 2018 14:02:00 +0300 MIME-Version: 1.0 In-Reply-To: <85c86f17-6f96-6f01-2a3c-e2bad0ccb317@icdsoft.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Language: en-US Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Vlastimil Babka , Florian Westphal , Andrew Morton , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, netfilter-devel@vger.kernel.org On 08/06/2018 11:42 AM, Georgi Nikolov wrote: > On 08/02/2018 11:50 AM, Michal Hocko wrote: >> In other words, why don't we simply do the following? Note that this i= s >> not tested. I have also no idea what is the lifetime of this allocatio= n. >> Is it bound to any specific process or is it a namespace bound? If the= >> later then the memcg OOM killer might wipe the whole memcg down withou= t >> making any progress. This would make the whole namespace unsuable unti= l >> somebody intervenes. Is this acceptable? >> --- >> From 4dec96eb64954a7e58264ed551afadf62ca4c5f7 Mon Sep 17 00:00:00 2001= >> From: Michal Hocko >> Date: Thu, 2 Aug 2018 10:38:57 +0200 >> Subject: [PATCH] netfilter/x_tables: do not fail xt_alloc_table_info t= oo >> easilly >> >> eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() >> in xt_alloc_table_info()") has unintentionally fortified >> xt_alloc_table_info allocation when __GFP_RETRY has been dropped from >> the vmalloc fallback. Later on there was a syzbot report that this >> can lead to OOM killer invocations when tables are too large and >> 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive") >> has been merged to restore the original behavior. Georgi Nikolov howev= er >> noticed that he is not able to install his iptables anymore so this ca= n >> be seen as a regression. >> >> The primary argument for 0537250fdc6c was that this allocation path >> shouldn't really trigger the OOM killer and kill innocent tasks. On th= e >> other hand the interface requires root and as such should allow what t= he >> admin asks for. Root inside a namespaces makes this more complicated >> because those might be not trusted in general. If they are not then su= ch >> namespaces should be restricted anyway. Therefore drop the __GFP_NORET= RY >> and replace it by __GFP_ACCOUNT to enfore memcg constrains on it. >> >> Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggres= sive") >> Reported-by: Georgi Nikolov >> Suggested-by: Vlastimil Babka >> Signed-off-by: Michal Hocko >> --- >> net/netfilter/x_tables.c | 7 +------ >> 1 file changed, 1 insertion(+), 6 deletions(-) >> >> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c >> index d0d8397c9588..b769408e04ab 100644 >> --- a/net/netfilter/x_tables.c >> +++ b/net/netfilter/x_tables.c >> @@ -1178,12 +1178,7 @@ struct xt_table_info *xt_alloc_table_info(unsig= ned int size) >> if (sz < sizeof(*info) || sz >=3D XT_MAX_TABLE_SIZE) >> return NULL; >> =20 >> - /* __GFP_NORETRY is not fully supported by kvmalloc but it should >> - * work reasonably well if sz is too large and bail out rather >> - * than shoot all processes down before realizing there is nothing >> - * more to reclaim. >> - */ >> - info =3D kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); >> + info =3D kvmalloc(sz, GFP_KERNEL | __GFP_ACCOUNT); >> if (!info) >> return NULL; >> =20 > I will check if this change fixes the problem. > > Regards, > > -- > Georgi Nikolov I can't reproduce it anymore. If i understand correctly this way memory allocated will be accounted to kmem of this cgroup (if inside cgroup).