From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCHv2 5/8] khugepaged: Allow to callapse a page shared across fork
To: "Kirill A. Shutemov"
Cc: akpm@linux-foundation.org, Andrea Arcangeli, Zi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
References: <20200403112928.19742-1-kirill.shutemov@linux.intel.com>
 <20200403112928.19742-6-kirill.shutemov@linux.intel.com>
 <20200408131044.xzlheacvslrbwrja@box>
From: Yang Shi
Message-ID: <107630f5-bbde-3f78-23e9-6f6b3113d709@linux.alibaba.com>
Date: Wed, 8 Apr 2020 11:51:22 -0700
In-Reply-To: <20200408131044.xzlheacvslrbwrja@box>
Sender: owner-linux-mm@kvack.org

On 4/8/20 6:10 AM, Kirill A. Shutemov wrote:
> On Mon, Apr 06, 2020 at 01:50:56PM -0700, Yang Shi wrote:
>>
>> On 4/3/20 4:29 AM, Kirill A. Shutemov wrote:
>>> The page can be included into collapse as long as it doesn't have extra
>>> pins (from GUP or otherwise).
>>>
>>> Signed-off-by: Kirill A. Shutemov
>>> ---
>>>   mm/khugepaged.c | 25 ++++++++++++++-----------
>>>   1 file changed, 14 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index 57ff287caf6b..1e7e6543ebca 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -581,11 +581,18 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>>>                   }
>>>                   /*
>>> -                 * cannot use mapcount: can't collapse if there's a gup pin.
>>> -                 * The page must only be referenced by the scanned process
>>> -                 * and page swap cache.
>>> +                 * Check if the page has any GUP (or other external) pins.
>>> +                 *
>>> +                 * The page table that maps the page has been already unlinked
>>> +                 * from the page table tree and this process cannot get
>>> +                 * additional pin on the page.
>>> +                 *
>>> +                 * New pins can come later if the page is shared across fork,
>>> +                 * but not for this process. It is fine. The other process
>>> +                 * cannot write to the page, only trigger CoW.
>>>                    */
>>> -                if (page_count(page) != 1 + PageSwapCache(page)) {
>>> +                if (total_mapcount(page) + PageSwapCache(page) !=
>>> +                                page_count(page)) {
>> This check looks fine for a base page, but what if the page is a
>> PTE-mapped THP? The following patch made this possible.
>>
>> If it is a PTE-mapped THP and the page is in swap cache, the refcount
>> would be 512 + the number of PTE-mapped pages.
>>
>> Shall we do the below change in the following patch?
>>
>> extra_pins = PageSwapCache(page) ? compound_nr(page) - 1 : 0;
>> if (total_mapcount(page) + PageSwapCache(page) != page_count(page) -
>>     extra_pins) {
>> ...
> Looks like you're right.
>
> It would be nice to have a test case to demonstrate the issue.
>
> Is there any way to trigger moving the page to swap cache? I don't see it
> immediately.

It doesn't sound easy to trigger since it depends entirely on timing. I
wonder whether we have to use MADV_PAGEOUT. Something like the sequence
below, off the top of my head, might trigger it:

CPU A                         CPU B                         CPU C
In parent:
MADV_HUGEPAGE
page fault to fill with THP
fork
                              In child:
                              MADV_NOHUGEPAGE
                              MADV_DONTNEED (split pmd)
                              MADV_PAGEOUT
                                -> add_to_swap
                                                            khugepaged scans parent
                                                            and tries to collapse
                                                            the PTE-mapped THP
                                -> try_to_unmap

When doing MADV_DONTNEED we need to make sure the head page is unmapped,
since MADV_PAGEOUT calls page_mapcount(page) to skip pages with shared
mappings.

>
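
To make the refcount arithmetic behind the extra_pins suggestion quoted above concrete, here is a minimal userspace model in C. It is only a sketch, not kernel code: has_extra_pins() and its plain integer parameters are hypothetical stand-ins for page_count(), total_mapcount(), PageSwapCache() and the number of subpages in the compound page, and it assumes, as discussed above, that the swap cache holds one reference per subpage.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only (hypothetical helper, not a kernel function).
 * nr_subpages is 1 for a base page and 512 for a 2MB THP made of 4KB
 * base pages.
 */
static bool has_extra_pins(int page_count, int total_mapcount,
                           bool in_swap_cache, int nr_subpages)
{
    int extra_pins;

    assert(nr_subpages >= 1);

    /*
     * Assumption from the discussion: the swap cache holds one
     * reference per subpage, which is nr_subpages - 1 more than the
     * single reference that PageSwapCache() accounts for.
     */
    extra_pins = in_swap_cache ? nr_subpages - 1 : 0;

    return total_mapcount + (in_swap_cache ? 1 : 0) !=
           page_count - extra_pins;
}
```

For a THP mapped by 3 PTEs and sitting in swap cache, the refcount would be 512 + 3 = 515, so has_extra_pins(515, 3, true, 512) reports no pin, whereas the unadjusted comparison (3 + 1 != 515) would refuse to collapse. One additional GUP pin (refcount 516) is still detected.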