Subject: Re: [PATCH] mm: migrate: Correct the hugetlb migration stats
From: Baolin Wang
To: Zi Yan
Cc: akpm@linux-foundation.org, shy828301@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 3 Nov 2021 19:09:45 +0800
Message-ID: <5060297c-0b93-e132-f464-2c1b8d8888e3@linux.alibaba.com>
In-Reply-To: <98eccb43-a8d1-fde1-65ab-6f81fed1faa3@linux.alibaba.com>
References: <677EF981-F33E-4002-AA38-DD669C319284@nvidia.com> <29aa9c6e-7191-71bb-d8a3-e2695b18fa3e@linux.alibaba.com> <7f45b2c8-fd2c-345a-ec6c-43b8b1c06de1@linux.alibaba.com> <7E44019D-2A5D-4BA7-B4D5-00D4712F1687@nvidia.com> <98eccb43-a8d1-fde1-65ab-6f81fed1faa3@linux.alibaba.com>

> On 2021/11/3 3:30, Zi Yan wrote:
>> On 2 Nov 2021, at 2:08, Baolin Wang wrote:
>>
>>> On 2021/11/1 23:12, Zi Yan wrote:
>>>> On 1 Nov 2021, at 2:54, Baolin Wang wrote:
>>>>
>>>>> On 2021/10/29 23:43, Zi Yan wrote:
>>>>>> On 29 Oct 2021, at 3:42, Baolin Wang wrote:
>>>>>>
>>>>>>> Now hugetlb migration is also available for some scenarios, such as
>>>>>>> soft offlining or memory compaction.
>>>>>>> So we should correct the migration
>>>>>>
>>>>>> hugetlb migration is available at the time if (PageHuge(page)) branch
>>>>>> is added. I am not sure what is new here.
>>>>>
>>>>> No new things actually, sorry for the confusion; I will update the
>>>>> commit message in the next version.
>>>>>
>>>>>>
>>>>>>> stats for hugetlb with using compound_nr() instead of thp_nr_pages()
>>>>>>> to get the number of pages.
>>>>>>
>>>>>> nr_failed records the number of pages, not subpages. It is returned to
>>>>>
>>>>> I also think nr_failed should record the number of pages, not the
>>>>> number of hugetlb, if I understand you correctly.
>>>>>
>>>>>> user space when move_pages() syscall is used. After your change,
>>>>>> if users try to migrate a list of pages including THPs and/or hugetlb
>>>>>> pages and some of THPs and/or hugetlb fail to migrate, move_pages()
>>>>>> will return a number larger than the number of pages the users tried
>>>>>
>>>>> OK, thanks for pointing out the issue.
>>>>>
>>>>> But before my patch, we've already returned the number of pages
>>>>> succeeded or failed for THP migration, instead of the number of
>>>>> THP. That means if we just move only 1 page by
>>>>
>>>> Ah, you are right.
>>>>
>>>>> move_pages() and if this page is 2M THP, so move_pages() will
>>>>> return 512 if failed to migrate, which is larger than the page
>>>>> count specified from user.
>>>>>
>>>>> if (err > 0)
>>>>>     err += nr_pages - i - 1;
>>>>
>>>> I am not sure this is right for user-space.
>>>>
>>>>>
>>>>> On the other hand, the stats of PGMIGRATE_SUCCESS/PGMIGRATE_FAIL
>>>>> should stand for the number of pages, instead of the number of
>>>>> hugetlb.
>>>>> Also for hugetlb migration during memory compaction, we've
>>>>> already counted the number of pages for a hugetlb into
>>>>> cc->nr_migratepages, if the hugetlb migration failed, the trace
>>>>> stat of compaction will be confusing if we return the number of
>>>>> hugetlb.
>>>>>
>>>>> trace_mm_compaction_migratepages(cc->nr_migratepages, err,
>>>>>                                  &cc->migratepages);
>>>>>
>>>>> So I think the stats of hugetlb migration should be consistent with
>>>>> THP.
>>>>
>>>> It makes sense to me.
>>>>
>>>>>
>>>>>> to migrate. I am not sure this is the change we want. Or at least,
>>>>>> the comment of migrate_pages() and the manpage of move_pages() need
>>>>>> to be changed and linux-api mailing list should be cc'd.
>>>>>
>>>>> I don't think we should update the comments of migrate_pages(),
>>>>> "Returns the number of pages that were not migrated" makes sense to
>>>>> me if I understand correctly.
>>>>>
>>>>> For the manpage of move_pages(), as you said, the returned
>>>>> non-migrated page numbers can be larger than the numbers specified
>>>>> from user if failed to migrate a THP or a hugetlb. I am not sure if
>>>>> we should change the manpage, since the THP already did, but I can
>>>>> send a patch to update the manpage if you think this is still
>>>>> necessary. Thanks.
>>>>
>>>> I am not sure changing manpage would help the users of move_pages()
>>>> after thinking about it again, since users might not know all the THP
>>>> and/or hugetlb information when they call move_pages() and they just
>>>> pass a list of N pages.
>>>>
>>>> I just wonder if we could fix the rc value of migrate_pages to return
>>>> the number of {base page, THP, hugetlb} instead, so that move_pages()
>>>> can get its return value right.
>>>
>>> IMO it will break the usage in other places if we change the rc value
>>> of migrate_pages(), for example, the page migration when doing memory
>>> compaction as I said before, which will expect the number of normal
>>> pages. Meanwhile the THP page can be split into normal pages during
>>> migration, so it will not be consistent if we return the number of THP.
>>
>> You mean the above trace_mm_compaction_migratepages()? I checked all
>> migrate_pages()
>
> Right.
>
>> callers and none of them cares about the actual number of non-migrated
>> pages, except do_move_pages_to_node() and
>> trace_mm_compaction_migratepages(). The former expects the number of
>> before-split-and-not-subpage pages, whereas the latter expects, like
>> you said, the number of base pages.
>
> Yes.
>
>>>
>>> Changing the return value of migrate_pages() will make things more
>>> complicated, and I am not sure whether it is worth doing. Any
>>> suggestion? Thanks.
>>
>> How about 1) fixing migrate_pages() to return the number of
>> before-split-and-not-subpage pages, and 2) replace err with
>> nr_succeeded (you can get it via *ret_succeeded in migrate_pages()) in
>> trace_mm_compaction_migratepages()? As a result, user-space
>> move_pages() will be fixed and trace_mm_compaction_migratepages() gives
>> a different but correct number as you want (you can still get
>> nr_nonmigrated = nr_migratepages - nr_succeeded).
>
> OK, sounds reasonable to me, I can try it. Thanks for your input.

After more thinking, I am still struggling to handle the THP split
case.
Suppose move_pages() tries to move 1 page, which is a 2M THP. If the 2M
THP is split into 512 normal pages during migration, and 256 normal
pages are migrated successfully while the others fail, should
migrate_pages() return 1 or 256 non-migrated pages?

Anyway, I posted a new RFC patchset [1] according to your suggestion; we
can discuss it there.

[1] https://lore.kernel.org/linux-mm/cover.1635936218.git.baolin.wang@linux.alibaba.com/
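
To make the two candidate answers in the THP split example concrete, here
is a minimal user-space sketch of both counting conventions. It is not
kernel code: struct unit_result, failed_base_pages() and failed_units()
are hypothetical names made up for illustration, and treating a partially
migrated THP as one failed unit is an assumption, since that is exactly
the open question.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one entry handed to migration (illustration
 * only, not a kernel structure).
 */
struct unit_result {
	unsigned int nr_subpages;	/* 1 for a base page, 512 for a 2M THP of 4K pages */
	unsigned int nr_migrated;	/* base pages that were migrated successfully */
};

/* Convention A: count every base page that was not migrated. */
unsigned int failed_base_pages(const struct unit_result *r, size_t n)
{
	unsigned int failed = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].nr_migrated <= r[i].nr_subpages);
		failed += r[i].nr_subpages - r[i].nr_migrated;
	}
	return failed;
}

/*
 * Convention B: count each entry the caller passed at most once.
 * Assumption: an entry counts as failed if any of its base pages
 * was left behind, even after a split.
 */
unsigned int failed_units(const struct unit_result *r, size_t n)
{
	unsigned int failed = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].nr_migrated <= r[i].nr_subpages);
		if (r[i].nr_migrated < r[i].nr_subpages)
			failed++;
	}
	return failed;
}
```

For the single 2M THP above with 256 of 512 base pages migrated,
convention A gives 256 and convention B gives 1. Only B can never exceed
the number of pages the move_pages() caller passed in, while A matches
the base-page counts that the compaction trace expects.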