From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8ECBFCA9EC0 for ; Wed, 30 Oct 2019 04:33:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5FE412083E for ; Wed, 30 Oct 2019 04:33:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5FE412083E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F13CB6B0007; Wed, 30 Oct 2019 00:33:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EC2C86B0008; Wed, 30 Oct 2019 00:33:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DD7E46B000A; Wed, 30 Oct 2019 00:33:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0177.hostedemail.com [216.40.44.177]) by kanga.kvack.org (Postfix) with ESMTP id B4CB96B0007 for ; Wed, 30 Oct 2019 00:33:05 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with SMTP id 5A9FE8249980 for ; Wed, 30 Oct 2019 04:33:05 +0000 (UTC) X-FDA: 76099181130.06.eggs99_8f3ecf5dbd909 X-HE-Tag: eggs99_8f3ecf5dbd909 X-Filterd-Recvd-Size: 5599 Received: from out4436.biz.mail.alibaba.com (out4436.biz.mail.alibaba.com [47.88.44.36]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Wed, 30 Oct 2019 04:33:04 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04446;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---0Tgg3fnq_1572409976; Received: from US-143344MP.local(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0Tgg3fnq_1572409976) by smtp.aliyun-inc.com(127.0.0.1); Wed, 30 Oct 2019 12:32:59 +0800 Subject: Re: mbind() breaks its API definition since v5.2 by commit d883544515aa (mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified) To: Li Xinhai , "linux-mm@kvack.org" , akpm , torvalds Cc: Vlastimil Babka , Linux API , Michal Hocko , Hugh Dickins , "linux-kernel@vger.kernel.org" , lixinhai_lxh References: <2019103010274679257634@gmail.com> <2019103011122763779044@gmail.com> From: Yang Shi Message-ID: Date: Tue, 29 Oct 2019 21:32:54 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <2019103011122763779044@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10/29/19 8:12 PM, Li Xinhai wrote: > On 2019-10-30=C2=A0at 10:50=C2=A0Yang Shi=C2=A0wrote: >> >> On 10/29/19 7:27 PM, Li Xinhai wrote: >>> One change in do_mbind() of this commit has suspicious usage of retur= n value of >>> queue_pages_range(), excerpt as below: >>> >>> --- >>> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, uns= igned long len, >>> =C2=A0 =C2=A0 if (err) >>> =C2=A0 =C2=A0 goto mpol_out; >>> =20 >>> - err =3D queue_pages_range(mm, start, end, nmask, >>> + ret =3D queue_pages_range(mm, start, end, nmask, >>> =C2=A0 =C2=A0 =C2=A0flags | MPOL_MF_INVERT, &pagelist); >>> - if (!err) >>> - err =3D mbind_range(mm, start, end, new); >>> + >>> + if (ret < 0) { =C2=A0 =C2=A0 =C2=A0/////// convert to all possible = 'ret' to '-EIO' <<<< >>> + err =3D -EIO; >>> + goto up_out; >>> + } >>> + >>> + err =3D mbind_range(mm, start, end, new); >>> =20 >>> =C2=A0 =C2=A0 if (!err) { >>> =C2=A0 =C2=A0 int nr_failed =3D 0; >>> --- >>> >>> Note that inside=C2=A0queue_pages_range(), the call to=C2=A0walk_page= _range() may return >>> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, t= hose error >>> codes are no longer reported to user space application. >>> >>> =C2=A0 From user space, the mbind() call need to reported error, wit= h EFAULT, as example: >>> EFAULT >>> Part or all of the memory range specified by nodemask and maxnode poi= nts >>> outside your accessible address space. Or, there was an unmapped hole= in the >>> specified memory range specified by addr and len. >> Thanks for catching this. That commit was aimed to correct the return >> values for some corner cases in mbind(), but it should not alter the >> errno for other failure cases, i.e. -EFAULT. >> >> Could you please try the below patch (build test only)? >> >> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> index 4ae967b..99df43a 100644 >> --- a/mm/mempolicy.c >> +++ b/mm/mempolicy.c >> @@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsign= ed >> long len, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 flags | MPOL_MF_INVERT, &pagelist); >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (ret < 0) { >> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 err =3D -EIO; >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 err =3D ret; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 goto up_out; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >> >> > This seems do not work, because the 'pagelist' would have some pages qu= eued > into it, need to put back those pages instead of return quickly. > > So, we need to remove this page leak as well. <<<<<< > > In my understanding, revert the changes as I quoted above may solve it,= but not sure > the details about=C2=A0changes at end of do_mbind(), should keep them a= t there without > further change? Thanks for pointing this out. We don't have to revert this commit to=20 handle the non-empty pagelist correctly. The simplest way is to just put=20 those pages back and I'm supposed this is also the preferred way since=20 mbind_range() is not called to really apply the policy so those pages=20 should not be migrated. The below patch should solve this: diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4ae967b..d80025c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1286,7 +1286,10 @@ static long do_mbind(unsigned long start,=20 unsigned long len, =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 flags | MPOL_MF_INVERT, &pagelist); =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (ret < 0) { -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 err =3D -EIO; +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 if (!list_empty(&pagelist)) +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 putback_mova= ble_pages(&pagelist); + +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 err =3D ret; =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 goto up_out; =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > > - Xinhai > >>> Please correct me if this is the intended change(and will have update= d API >>> definition), or something was misunderstood. >>> >>> -Xinhai > >