From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69])
	by kanga.kvack.org (Postfix) with ESMTP id 89BA56B0005
	for ; Tue, 31 Jul 2018 09:55:40 -0400 (EDT)
Received: by mail-it0-f69.google.com with SMTP id e5-v6so2700929itf.3
	for ; Tue, 31 Jul 2018 06:55:40 -0700 (PDT)
Received: from us.icdsoft.com (us.icdsoft.com. [192.252.146.184])
	by mx.google.com with ESMTPS id 123-v6si10754669iox.169.2018.07.31.06.55.38
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Tue, 31 Jul 2018 06:55:38 -0700 (PDT)
Subject: Re: [Bug 200651] New: cgroups iptables-restor: vmalloc: allocation failure
References: <67d5e4ef-c040-6852-ad93-6f2528df0982@suse.cz>
	<20180726074219.GU28386@dhcp22.suse.cz>
	<36043c6b-4960-8001-4039-99525dcc3e05@suse.cz>
	<20180726080301.GW28386@dhcp22.suse.cz>
	<98788618-94dc-5837-d627-8bbfa1ddea57@icdsoft.com>
	<20180730135744.GT24267@dhcp22.suse.cz>
	<89ea4f56-6253-4f51-0fb7-33d7d4b60cfa@icdsoft.com>
	<20180730183820.GA24267@dhcp22.suse.cz>
	<56597af4-73c6-b549-c5d5-b3a2e6441b8e@icdsoft.com>
	<6838c342-2d07-3047-e723-2b641bc6bf79@suse.cz>
From: Georgi Nikolov
Message-ID: <8105b7b3-20d3-5931-9f3c-2858021a4e12@icdsoft.com>
Date: Tue, 31 Jul 2018 16:55:26 +0300
MIME-Version: 1.0
In-Reply-To: <6838c342-2d07-3047-e723-2b641bc6bf79@suse.cz>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Content-Language: en-US
Sender: owner-linux-mm@kvack.org
List-ID:
To: Vlastimil Babka , Michal Hocko
Cc: Andrew Morton , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org,
	netfilter-devel@vger.kernel.org, fw@strlen.de

On 07/31/2018 09:38 AM, Vlastimil Babka wrote:
> On 07/30/2018 08:51 PM, Georgi Nikolov wrote:
>> On 07/30/2018 09:38 PM, Michal Hocko wrote:
>>> On Mon 30-07-18 18:54:24, Georgi Nikolov wrote:
>>> [...]
>>>> No i was wrong. The regression starts actually with 0537250fdc6c8.
>>>> - old code, which opencodes kvmalloc, is masking error but error is there
>>>> - kvmalloc without GFP_NORETRY works fine, but probably can consume a
>>>> lot of memory - commit: eacd86ca3b036
>>>> - kvmalloc with GFP_NORETRY shows error - commit: 0537250fdc6c8
>>> OK.
>>>
>>>>>> What is correct way to fix it.
>>>>>> - inside xt_alloc_table_info remove GFP_NORETRY from kvmalloc or add
>>>>>> this flag only for sizes bigger than some threshold
>>>>> This would reintroduce issue fixed by 0537250fdc6c8. Note that
>>>>> kvmalloc(GFP_KERNEL | __GFP_NORETRY) is more or less equivalent to the
>>>>> original code (well, except for __GFP_NOWARN).
>>>> So probably we should pass GFP_NORETRY only for large requests (above
>>>> some threshold).
>>> What would be the threshold? This is not really my area so I just wanted
>>> to keep the original code semantic.
>>>
>>>>>> - inside kvmalloc_node remove GFP_NORETRY from
>>>>>> __vmalloc_node_flags_caller (i don't know if it honors this flag, or
>>>>>> the problem is elsewhere)
>>>>> No, not really. This is basically equivalent to kvmalloc(GFP_KERNEL).
>>>>>
>>>>> I strongly suspect that this is not a regression in this code but rather
>>>>> a side effect of larger memory fragmentation caused by something else.
>>>>> In any case do you see this failure also without artificial test case
>>>>> with a standard workload?
>>>> Yes i can see failures with standard workload, in fact it was hard to
>>>> reproduce it.
>>>> Here is the error from production servers where allocation is smaller:
>>>> iptables: vmalloc: allocation failure, allocated 131072 of 225280 bytes,
>>>> mode:0x14010c0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null)
>>>>
>>>> I didn't understand if vmalloc honors GFP_NORETRY.
>>> 0537250fdc6c8 changelog tries to explain. kvmalloc doesn't really
>>> support the GFP_NORETRY semantic because that would imply the request
>>> wouldn't trigger the oom killer but in rare cases this might happen
>>> (e.g. when page tables are allocated because those are hardcoded GFP_KERNEL).
>>>
>>> That being said, I have no objection to use GFP_KERNEL if it helps real
>>> workloads but we probably need some cap...
>> Probably Vlastimil Babka can propose some limit:
> No, I think that's rather for the netfilter folks to decide. However, it
> seems there has been the debate already [1] and it was not found. The
> conclusion was that __GFP_NORETRY worked fine before, so it should work
> again after it's added back. But now we know that it doesn't...
>
> [1] https://lore.kernel.org/lkml/20180130140104.GE21609@dhcp22.suse.cz/T/#u

Yes i see. I will add Florian Westphal to CC list.
netfilter-devel is already in this list so probably have to wait for
their opinion.

>> On Thu 26-07-18 09:18:57, Vlastimil Babka wrote:
>> This is likely the kvmalloc() in xt_alloc_table_info(). Between 4.13 and
>> 4.17 it shouldn't use __GFP_NORETRY, but looks like commit 0537250fdc6c
>> ("netfilter: x_tables: make allocation less aggressive") was backported
>> to 4.14. Removing __GFP_NORETRY might help here, but bring back other
>> issues. Less than 4MB is not that much though, maybe find some "sane"
>> limit and use __GFP_NORETRY only above that?
>>
>>
>> Regards,
>>
>> --
>> Georgi Nikolov
>>
>>

Regards,

--
Georgi Nikolov
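
For illustration only, the idea raised in the thread (keep plain
GFP_KERNEL for ordinary requests and add __GFP_NORETRY only above some
size limit) could be modelled roughly as below. This is a userspace
sketch, not kernel code and not a proposed patch: the flag bits, the
function name model_table_alloc_flags() and the 4 MiB cutoff are all
hypothetical. No limit has been agreed in this thread; the value is
borrowed from the "less than 4MB" remark purely as a placeholder.

```c
#include <assert.h>   /* for the checks shown in the usage note */
#include <stddef.h>

/* Model flag bits for illustration only; NOT the kernel's real GFP values. */
#define MODEL_GFP_KERNEL   0x1u
#define MODEL_GFP_NORETRY  0x2u

/* Hypothetical cutoff; the thread has not settled on any value. */
#define MODEL_NORETRY_THRESHOLD ((size_t)4 * 1024 * 1024)

/*
 * Pick allocation flags for a table of the given size:
 * requests up to the cutoff use the plain "kernel" flags,
 * larger ones additionally get the "no retry" bit.
 */
unsigned int model_table_alloc_flags(size_t size)
{
    unsigned int flags = MODEL_GFP_KERNEL;

    if (size > MODEL_NORETRY_THRESHOLD)
        flags |= MODEL_GFP_NORETRY;

    return flags;
}
```

Under these assumptions the 225280-byte request from the production log
would be attempted without the no-retry bit, while a request larger than
the cutoff would still carry it. Whether such a cap is acceptable, and
at what value, is for the netfilter maintainers to decide.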