From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A50CBC001B5 for ; Fri, 2 Sep 2022 15:24:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C2D6A8D001B; Fri, 2 Sep 2022 11:24:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BDCE88D0014; Fri, 2 Sep 2022 11:24:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A7D1D8D001B; Fri, 2 Sep 2022 11:24:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 943378D0014 for ; Fri, 2 Sep 2022 11:24:08 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 6158941142 for ; Fri, 2 Sep 2022 15:24:08 +0000 (UTC) X-FDA: 79867516176.17.57ECCFD Received: from mail-pf1-f173.google.com (mail-pf1-f173.google.com [209.85.210.173]) by imf26.hostedemail.com (Postfix) with ESMTP id 174AD140074 for ; Fri, 2 Sep 2022 15:24:07 +0000 (UTC) Received: by mail-pf1-f173.google.com with SMTP id y127so2210216pfy.5 for ; Fri, 02 Sep 2022 08:24:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=G7HynGLQc1lxwSQPRc+SEtUVh5DnkPFwTvXscgNVKWk=; b=MIlwnhk5iohQXB4yWWluNoV6PH4wvqbewJceO2QyTOWiyuMFfxZw7qWmRdahwYFcqN oJ1aqy/j7x2ywrMdz/G9DHxFQsVPz+oe9bmmVpKTKu+pAFG3CqvKgq1D0dnT5ZpaTlhy UnEwJKlWjAP8lzZWMVPMS2cCAkHe05TcsKio4irTr4s0Qc/KIxNeyKzGpV8Rf0cdG8yo Giu43fLMEyeF/XJujQia3KWdCNc0ZesT+BA4HrG9Ukts3l+3QnpXs+IfpXCcBpMdlM7G 7qPhcTv2fVUq/hSYFFOcYicWALAIX3xcuWLcT/gy84LMjfGi+fwQ732XPrI//kDkvgKV AHGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=G7HynGLQc1lxwSQPRc+SEtUVh5DnkPFwTvXscgNVKWk=; b=BGHkB30AZjA5c4vUft0F7AYxx3MRcLOASp1SBDHXp0q6y6c9YKpSZiRt+I4q3RdaNo 7PmphM22TmQFLCpJ1kJArsIenCJ5pWh9anZGSnDA6s4/b7G0QbKU8w54H37gP2rxhlVz vJ7z5zMZTiOWPmDeN8lrJMvPijOt0jW7rA1AvPbVa2BOedxOkYCV5NGc1d25evRtH4tZ 1kv6mUR4IcCe58EzPEf1DCRkD257HGEvwYW7etrYt1JApU4ud6/Mb9VezDkpvYOyiAbu raCI5BC5rlgMy4Ld7t3KBi6UlcOh7CA2OyGbVZ4KnipUSsNJoccfbNm/eOL/jjP4VeZz B6ow== X-Gm-Message-State: ACgBeo17zeoZRp0j+naTGVRaj6QHJzFnWNHZk8EsfnZaSTP86uC8LjC7 vLohC8bcSLyqIA+H4mJDeWxxG5kqDJ/HcuJRIPU= X-Google-Smtp-Source: AA6agR4YmgwoquE39yMPajOiIFDzgAcGehCGWd2q/bUhcymtso6xZjcjiHr0UAhIbMOJ2E42FiPO4L050S/2+xKXJD4= X-Received: by 2002:a65:6a05:0:b0:42c:87a0:ea77 with SMTP id m5-20020a656a05000000b0042c87a0ea77mr16909957pgu.75.1662132246849; Fri, 02 Sep 2022 08:24:06 -0700 (PDT) MIME-Version: 1.0 References: <20220901222707.477402-1-shy828301@gmail.com> <1688bcf0-01b2-0f89-68db-d9d66e207bc6@redhat.com> In-Reply-To: <1688bcf0-01b2-0f89-68db-d9d66e207bc6@redhat.com> From: Yang Shi Date: Fri, 2 Sep 2022 08:23:54 -0700 Message-ID: Subject: Re: [PATCH] mm: gup: fix the fast GUP race against THP collapse To: David Hildenbrand Cc: Peter Xu , kirill.shutemov@linux.intel.com, jhubbard@nvidia.com, jgg@nvidia.com, hughd@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1662132248; a=rsa-sha256; cv=none; b=csc1MZWoCrrXb/7bS4XifMvuEd/DRp7KcjGH64Io8rbO8lpZzln+vZc78nTMVmU1SWXRNr nuzt/nEds9cyGdl3QvXeXkquP8pFwGGbybxPgMEvss8oAp7iOiU/7r5JVT5mdal7GGXwXS xBM4Z2H5cZfs+QqBS/XYwSqMcRXBuoU= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=MIlwnhk5; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf26.hostedemail.com: domain of shy828301@gmail.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=shy828301@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1662132248; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=G7HynGLQc1lxwSQPRc+SEtUVh5DnkPFwTvXscgNVKWk=; b=WOBtVLGYWZoXz6iJs7GgCtrnYgLlOKuvUsD7QW0PIJ7uHpUZM2WxUfCfZ8wjhTriDIANCh kHSvmY4XkPeT5UCsQxryw7fw9PNEXQtOaVFJNi1qwwsNdxLldyeMo6yio/MPoebTxaAjWF EZ0aim7eEXclFSVD27cXL1ZstWiJuVY= X-Stat-Signature: 55mxxs3gz47xbb9jijz5ix3x3i1dqpam X-Rspamd-Queue-Id: 174AD140074 Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=MIlwnhk5; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf26.hostedemail.com: domain of shy828301@gmail.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=shy828301@gmail.com X-Rspam-User: X-Rspamd-Server: rspam02 X-HE-Tag: 1662132247-632061 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Sep 1, 2022 at 11:39 PM David Hildenbrand wrote: > > On 02.09.22 01:50, Yang Shi wrote: > > On Thu, Sep 1, 2022 at 4:26 PM Peter Xu wrote: > >> > >> Hi, Yang, > >> > >> On Thu, Sep 01, 2022 at 03:27:07PM -0700, Yang Shi wrote: > >>> Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm: > >>> introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer > >>> sufficient to handle concurrent GUP-fast in all cases, it only handles > >>> traditional IPI-based GUP-fast correctly. > >> > >> If TLB flush (or, IPI broadcasts) used to work to protect against gup-fast, > >> I'm kind of confused why it's not sufficient even if with RCU gup? Isn't > >> that'll keep working as long as interrupt disabled (which current fast-gup > >> will still do)? > > > > Actually the wording was copied from David's commit log for his > > PageAnonExclusive fix. My understanding is the IPI broadcast still > > works, but it may not be supported by all architectures and not > > preferred anymore. So we should avoid depending on IPI broadcast IIUC. > > Right. Not all architectures perform an IPI broadcast on TLB flush. > > IPI broadcasts will continue working until we use RCU instead of > disabling local interrupts in GUP-fast. > > > >>> CPU A CPU B > >>> THP collapse fast GUP > >>> gup_pmd_range() <-- see valid pmd > >>> gup_pte_range() <-- work on pte > >>> pmdp_collapse_flush() <-- clear pmd and flush > >>> __collapse_huge_page_isolate() > >>> check page pinned <-- before GUP bump refcount > >>> pin the page > >>> check PTE <-- no change > >>> __collapse_huge_page_copy() > >>> copy data to huge page > >>> ptep_clear() > >>> install huge pmd for the huge page > >>> return the stale page > >>> discard the stale page > >>> > >>> The race could be fixed by checking whether PMD is changed or not after > >>> taking the page pin in fast GUP, just like what it does for PTE. If the > >>> PMD is changed it means there may be parallel THP collapse, so GUP > >>> should back off. > >> > >> Could the race also be fixed by impl pmdp_collapse_flush() correctly for > >> the archs that are missing? Do you know which arch(s) is broken with it? > > > > Yes, and this was suggested by me in the first place, but per the > > suggestion from John and David, this is not the preferred way. I think > > it is because: > > > > Firstly, using IPI to serialize against fast GUP is not recommended > > anymore since fast GUP does check PTE then back off so we should avoid > > it. > > Secondly, if checking PMD then backing off could solve the problem, > > why do we still need broadcast IPI? It doesn't sound performant. > > I'd say, using an IPI is the old-styled way of doing things. Sure, using > an IPI broadcast will work (and IMHO it's a lot easier to > not-get-wrong). But it somewhat contradicts to the new way of doing things. > > >> > >> It's just not clear to me whether this patch is an optimization or a fix, > >> if it's a fix whether the IPI broadcast in ppc pmdp_collapse_flush() would > >> still be needed. > > > > It is a fix and the fix will make IPI broadcast not useful anymore. > > I'd wonder how "easy" adding the IPI broadcast would be -- IOW, if the > IPI fix has a real advantage. Not sure either, but I guess calling a dummy function via IPI broadcast should just work. Powepc does so. > > > -- > Thanks, > > David / dhildenb >