From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F1CAC33CB1 for ; Fri, 17 Jan 2020 07:58:04 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 663572073A for ; Fri, 17 Jan 2020 07:58:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 663572073A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 01B726B0329; Fri, 17 Jan 2020 02:58:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id F0E8E6B032A; Fri, 17 Jan 2020 02:58:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E4A526B032B; Fri, 17 Jan 2020 02:58:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0056.hostedemail.com [216.40.44.56]) by kanga.kvack.org (Postfix) with ESMTP id CF5806B0329 for ; Fri, 17 Jan 2020 02:58:03 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 829202479 for ; Fri, 17 Jan 2020 07:58:03 +0000 (UTC) X-FDA: 76386372846.03.tank55_8081fb5fbf428 X-HE-Tag: tank55_8081fb5fbf428 X-Filterd-Recvd-Size: 5715 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Fri, 17 Jan 2020 07:58:02 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 77C6DADDF; Fri, 17 Jan 2020 07:58:01 +0000 (UTC) Date: Fri, 17 Jan 2020 08:57:59 +0100 From: Michal Hocko To: Li Xinhai Cc: Mike Kravetz , "linux-mm@kvack.org" , akpm , "yang.shi" , n-horiguchi Subject: Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone Message-ID: <20200117075759.GH19428@dhcp22.suse.cz> References: <1578993378-10860-1-git-send-email-lixinhai.lxh@gmail.com> <1578993378-10860-2-git-send-email-lixinhai.lxh@gmail.com> <2020011422092314671410@gmail.com> <20200116075933.GN19428@dhcp22.suse.cz> <20200117103819968585195@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20200117103819968585195@gmail.com> User-Agent: Mutt/1.12.2 (2019-09-21) Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri 17-01-20 10:38:21, Li Xinhai wrote: > On 2020-01-17=A0at 03:22=A0Mike Kravetz=A0wrote: > >On 1/15/20 11:59 PM, Michal Hocko wrote: > >> On Wed 15-01-20 13:07:17, Mike Kravetz wrote: > >>> What should we do? > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >>> 1) Nothing more than optimizations by Li Xinhai.=A0 Behavior that c= ould be > >>>=A0=A0=A0 seen as conflicting with man page has existed since v3.12 = and I am > >>>=A0=A0=A0 not aware of any complaints. > >>> 2) In addition to optimizations by Li Xinhai, modify code to truly = ignore > >>>=A0=A0=A0 MPOL_MF_STRICT for huge page mappings.=A0 This would be fa= irly easy to do > >>>=A0=A0=A0 after a failure of migrate_pages().=A0 We could simply tra= verse the list > >>>=A0=A0=A0 of pages that were not migrated looking for any non-hugetl= b page. > >>> 3) Remove the statement "MPOL_MF_STRICT is ignored on huge page map= pings." > >>>=A0=A0=A0 and modify code accordingly. > >>> > >>> My suggestion would be for 1 or 2.=A0 Thoughts? > >> > >> And why do we exactly need to do anything at all? There is an > >> inconsistency that has been there for years without anybody noticing= . > >> NUMA API is a mess on its own and unfixable at this stage, there wil= l > >> always be some corner cases. If there is no real workload hitting th= is > >> incosistency and suffering, I would rather not touch this at all. > >> Unless the change would clean up the code or make it more maintainab= le. > > > >That is a very valid point.=A0 Sometimes we as developers get focused = on the > >actual code changes and fail to ask the question "does this really nee= d to > >be changed?" or "what value do the code changes provide?". > > > >Li Xinhai came up with two optimizations in how the mbind code deals w= ith > >hugetlb pages.=A0 This 'sub-optimal' code has existed for more than 6 = years. > >Unless I am mistaken, nobody has actually complained or noticed this b= ehavior. > >I believe Li Xinhai noticed this inefficient code via code inspection.= =A0 Of > >course, based on what we know today one could write a test program to = show > >the inefficient behavior.=A0 However, no real users have noticed this = during > >the past 6 years. > > > >The proposed code changes are fairly simple.=A0 However, I would not s= ay that > >they clean up the code or make it more maintainable.=A0 They essential= ly add > >or modify two checks to bail out early for hugetlb vma's if the flag w= hich > >is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specif= ied. > >If one is trying to follow the entire mbind code path for hugetlb page= s, > >these patches will make that easier follow/understand.=A0 That is simp= ly > >because one can ignore downstream code/functionality. > > > >Based on Michal's criteria above, I now believe the code changes shoul= d not > >be made.=A0 Yes, they are fairly simple.=A0 However, even simple chang= es have > >the potential to break something (build breakage with v1 of patch).=A0= We should > >leave this code as is unless issues are reported by users.=20 >=20 > Acctually I am the user of this API, and when using STRICT alone to kno= w if > pages are misplaced and take action later, it seems=A0don't work consis= tantly > on hugepage. Initially,=A0I assumed that I didn't use it correctly, tha= t flag must > use with MOVE*? After check the code, found that flag didn't been handl= ed, > and finally noticed that there is a NOTE about it in MAN page. This is the first time you are mentioning an actual use case. This should have been expressed from the very begining. Including an explanation of what the use case is and how it is affected by the current behavior. > I'd like the STRICT been handled, but at present since this is not goin= g to > be implemented, I have to handle differently on hugetlb mapping. >=20 > At meantime, I thought that this patch can be useful and posted it, bec= ause > for scenarios=A0where hugetlb mapping are handled with other mappings, = less > cost for the mbind call.=A0(although it=A0didn't help my current case) >=20 > I think if we could provid better performance, why not do that only bec= ause > that code has exists for 6 years? Do you have any numbers to prove performance improvements? I believe arguments against the patch have been already presented. --=20 Michal Hocko SUSE Labs