Subject: Re: [PATCH v1] mm/ksm: update stale comment in write_protect_page()
Date: Thu, 1 Sep 2022 08:58:47 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Yang Shi
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
 Jason Gunthorpe, John Hubbard, Andrea Arcangeli, Hugh Dickins, Peter Xu,
 "Kirill A. Shutemov"
References: <20220831083024.37138-1-david@redhat.com>
 <4845ae71-b7dd-1707-ebc3-2eb3521e7fa0@redhat.com>
 <94c3217d-df73-2b6b-21f0-95baf117c584@redhat.com>
On 01.09.22 00:18, Yang Shi wrote:
> On Wed, Aug 31, 2022 at 12:43 PM Yang Shi wrote:
>>
>> On Wed, Aug 31, 2022 at 12:36 PM David Hildenbrand wrote:
>>>
>>> On 31.08.22 21:34, Yang Shi wrote:
>>>> On Wed, Aug 31, 2022 at 12:15 PM David Hildenbrand wrote:
>>>>>
>>>>> On 31.08.22 21:08, Yang Shi wrote:
>>>>>> On Wed, Aug 31, 2022 at 11:29 AM David Hildenbrand wrote:
>>>>>>>
>>>>>>> On 31.08.22 19:55, Yang Shi wrote:
>>>>>>>> On Wed, Aug 31, 2022 at 1:30 AM David Hildenbrand wrote:
>>>>>>>>>
>>>>>>>>> The comment is stale: a TLB flush is no longer sufficient (nor
>>>>>>>>> required) to synchronize against concurrent GUP-fast. This used
>>>>>>>>> to be true in the past, when a TLB flush implied an IPI on
>>>>>>>>> architectures that support GUP-fast, so a GUP-fast walker, which
>>>>>>>>> disables local interrupts, had to complete before the flush
>>>>>>>>> could finish.
>>>>>>>>
>>>>>>>> Hmm... it seems there might be a problem for THP collapse, IIUC.
>>>>>>>> THP collapse clears and flushes the pmd before doing anything on
>>>>>>>> the ptes, and relies on fast GUP's interrupt disabling to
>>>>>>>> serialize against fast GUP.
>>>>>>>> But if the TLB flush is no longer sufficient, then we may run
>>>>>>>> into the below race, IIUC:
>>>>>>>>
>>>>>>>>   CPU A (THP collapse)               CPU B (fast GUP)
>>>>>>>>
>>>>>>>>                                      gup_pmd_range() <-- sees valid pmd
>>>>>>>>                                      gup_pte_range() <-- works on pte
>>>>>>>>   clear pmd and flush TLB
>>>>>>>>   __collapse_huge_page_isolate()
>>>>>>>>     isolate page <-- before GUP bumps refcount
>>>>>>>>                                      pin the page
>>>>>>>>   __collapse_huge_page_copy()
>>>>>>>>     copy data to huge page
>>>>>>>>   clear pte (don't flush TLB)
>>>>>>>>   install huge pmd for the huge page
>>>>>>>>                                      return the obsolete page
>>>>>>>
>>>>>>> Hm, the is_refcount_suitable() check runs while the PTE hasn't
>>>>>>> been cleared yet. And we don't check if the PMD changed once we're
>>>>>>> in gup_pte_range().
>>>>>>
>>>>>> Yes
>>>>>>
>>>>>>>
>>>>>>> The comment most certainly should be stale as well -- unless there
>>>>>>> is some kind of an implicit IPI broadcast being done.
>>>>>>>
>>>>>>> 2667f50e8b81 mentions: "The RCU page table free logic coupled with
>>>>>>> an IPI broadcast on THP split (which is a rare event), allows one
>>>>>>> to protect a page table walker by merely disabling the interrupts
>>>>>>> during the walk."
>>>>>>>
>>>>>>> I'm not able to quickly locate that IPI broadcast -- maybe there
>>>>>>> is one being done here (in collapse) as well?
>>>>>>
>>>>>> The TLB flush may involve an IPI. I suppose it is arch dependent,
>>>>>> right? Some do use an IPI, some may not.
>>>>>
>>>>> Right, and the whole idea of the RCU GUP-fast was to support
>>>>> architectures that don't do it. x86-64 does it. IIRC, powerpc
>>>>> doesn't -- but maybe it does so for PMDs?
>>>>
>>>> It looks like powerpc does issue an IPI for the pmd flush. But arm64
>>>> doesn't, IIRC.
>>>>
>>>> So maybe we should implement pmdp_collapse_flush() for those arches
>>>> to issue an IPI.
>>>
>>> ... or find another way to detect and handle this in GUP-fast?
>>>
>>> Not sure if, for handling PMDs, it could be sufficient to propagate
>>> the pmdp pointer + value and double-check that the values didn't
>>> change.
>>
>> Should work too, right before pinning the page.
> 
> I actually mean the same place where the pte is checked. So, something
> like:
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..2b0703403902 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2392,7 +2392,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  			goto pte_unmap;
>  		}
> 
> -		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> +		if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
> +		    unlikely(pte_val(pte) != pte_val(*ptep))) {
>  			gup_put_folio(folio, 1, flags);
>  			goto pte_unmap;
>  		}
> 
> It doesn't build, just shows the idea.

Exactly what I had in mind. We should add a comment spelling out that
this is for handling huge PMD collapse.

-- 
Thanks,
David / dhildenb