From: Linus Torvalds
Date: Mon, 24 Oct 2022 13:19:22 -0700
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
To: Jann Horn
Cc: Peter Zijlstra, John Hubbard, x86@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 aarcange@redhat.com, kirill.shutemov@linux.intel.com, jroedel@suse.de,
 ubizjak@gmail.com
References: <20221022111403.531902164@infradead.org> <20221022114424.515572025@infradead.org> <2c800ed1-d17a-def4-39e1-09281ee78d05@nvidia.com>

On Mon, Oct 24, 2022 at 12:58 PM Jann Horn wrote:
>
> Unless I'm completely misunderstanding what's going on here, the whole
> "remove_table" thing
> only happens when you "remove a table", meaning
> you free an entire *pagetable*. Just zapping PTEs doesn't trigger that
> logic.

I do have to admit that I'd be happier if this code - and the GUP code
that also relies on "interrupts off" behavior - would just use a
sequence counter instead.

Relying on blocking IPI's is clever, but also clearly very subtle and
somewhat dangerous.

I think our GUP code is a *lot* more important than some "legacy
x86-32 has problems in case you have an incredibly unlikely race that
re-populates the page table with a different page that just happens to
be exactly the same MOD-4GB", so honestly, I don't think the
load-tearing is even worth worrying about - if you have hardware that
is good enough at virtualizing things, it's almost certainly already
64-bit, and running 32-bit virtual machines with PAE you really only
have yourself to blame.

So I can't find it in myself to care about the 32-bit tearing thing,
but this discussion makes me worried about Fast GUP.

Note that even with proper atomic

        pte_t pte = ptep_get_lockless(ptep);

in gup_pte_range(), and even if the page tables are RCU-free'd, that
just means that the 'ptep' access itself is safe.

But then you have the whole "the lookup of the page pointer is not
atomic" wrt that. And right now that GUP code does rely on the "block
IPI" to make it basically valid.

I don't think it matters if GUP races with munmap or madvise() or
something like that - if you get the old page, that's still a valid
page, and the user only has himself to blame.

But if we have memory pressure that causes vmscan to push out a page,
and it gets replaced with a new page, and GUP gets the old page with
no serialization, that sounds like a possible source of data
inconsistency.

I don't know if this can happen, but the whole "interrupts disabled
doesn't actually block IPI's and synchronize with TLB flushes" really
sounds like it would affect GUP too. And be much more serious there
than on some x86-32 platform that nobody should be using anyway.

                 Linus
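
To make the "sequence counter" idea above concrete, here is a minimal
user-space sketch in C11. It is only an illustration under stated
assumptions: it is not the kernel's seqcount_t API and not code proposed
in this thread. struct demo_entry, demo_entry_write() and
demo_entry_read() are hypothetical names, and the two 32-bit halves
merely stand in for an entry that cannot be loaded in one atomic access.
The reader retries whenever the counter was odd or changed across its
loads, so it does not return a torn value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical two-word entry: stands in for a 64-bit value that a
 * 32-bit CPU has to load as two halves. Not a kernel structure.
 */
struct demo_entry {
	atomic_uint seq;	/* even: stable, odd: update in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writer side. Assumes writers are already serialized by some lock. */
static void demo_entry_write(struct demo_entry *e, uint64_t val)
{
	unsigned int s = atomic_load_explicit(&e->seq, memory_order_relaxed);

	/* Make the counter odd before touching the data. */
	atomic_store_explicit(&e->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&e->lo, (uint32_t)val, memory_order_relaxed);
	atomic_store_explicit(&e->hi, (uint32_t)(val >> 32),
			      memory_order_relaxed);

	/* Make the counter even again; publishes the new halves. */
	atomic_store_explicit(&e->seq, s + 2, memory_order_release);
}

/* Reader side: lockless, retries if it raced with a writer. */
static uint64_t demo_entry_read(struct demo_entry *e)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&e->seq, memory_order_acquire);
		lo = atomic_load_explicit(&e->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&e->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&e->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```

The sketch only addresses the torn-load case. It does not by itself say
anything about whether the page a reader ends up with is still the right
one, which is the Fast GUP concern raised above.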