From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37F1BC001DE for ; Fri, 28 Jul 2023 19:41:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BC8CE6B0071; Fri, 28 Jul 2023 15:41:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B51BD6B0074; Fri, 28 Jul 2023 15:41:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9CAF88D0001; Fri, 28 Jul 2023 15:41:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 8CE7B6B0071 for ; Fri, 28 Jul 2023 15:41:02 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 4CE45B29DD for ; Fri, 28 Jul 2023 19:41:02 +0000 (UTC) X-FDA: 81062038764.14.C562877 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf18.hostedemail.com (Postfix) with ESMTP id 079A31C0017 for ; Fri, 28 Jul 2023 19:40:59 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=iQrcmN7c; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf18.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690573260; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=6CFqQJrBMK5OCT43NGOkfBvaqBjcX1+gjelAIZOy930=; b=MAVGIsZhF2a59D2J9UuPU2o/qnsfkyEz7r6EcPDWSl2jfvK6pQdcVMkNz7Y6erDPJHLIuK Vpl/9nh71yTMFZbMAfgolLUxTdIDgR52OhpNCANVGEWw6t5xkZha+LR4yFfTyIVA6W9UAw VMM7qLTlkbmw0zlUoTifwIx3dsQ7QcA= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=iQrcmN7c; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf18.hostedemail.com: domain of david@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690573260; a=rsa-sha256; cv=none; b=vMA2OD0T/VXBsLHv/lIZ346oeR69gDL9F/9OKcIYrVvqiiAAQQop431uQFJcS1XVW1zHJS SQcmwag0XdvQqzvx8Er+I0l8XTQYVuBsUGtq9meSd36rBxZYijn84kuwY+oCvGHNk8WZx9 h5l60LzysynzSz7xjN/LD0yerzIT7ls= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1690573259; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6CFqQJrBMK5OCT43NGOkfBvaqBjcX1+gjelAIZOy930=; b=iQrcmN7cQwyE7+wjn6oZJ2d+1S5Lv/mUx5Zlw4+dOxX5HbVvH0e0nY7X2sZnZSOml/bLD3 LPV6Z3pMlvoDgpfLXk1Q5FRidYO50sT4VaRLxsvsKUztg9N6IDuN0Dk3snspr7915dhpfw J38BAtTUOKUYb3is7UfTKdj58Ki1lEU= Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-649-UA46zXPnO9OOWUwbGJmJOQ-1; Fri, 28 Jul 2023 15:40:58 -0400 X-MC-Unique: UA46zXPnO9OOWUwbGJmJOQ-1 Received: by mail-wr1-f71.google.com with SMTP id ffacd0b85a97d-3176549261aso1273434f8f.3 for ; Fri, 28 Jul 2023 12:40:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690573257; x=1691178057; h=content-transfer-encoding:in-reply-to:subject:organization :content-language:references:cc:to:from:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=6CFqQJrBMK5OCT43NGOkfBvaqBjcX1+gjelAIZOy930=; b=LKecdmfwZZ0wgSFWMwKMFVtGPo1CBIbcpB9wkVQBu9/0FbKoY12oWb7P5VmIKik6qt 0B0134o1nWSXFoPwQ7qfbgOL7ds+h/+MZv7Q19W4vdeg8efBdfv1Gu5BTNWEcHYoS0pL 5fiqm/sBjdK1mlE+1DnVTzuo3k1z2MhvhjgwHLTdb3fH5lKMKx1FmdcK+hxjkDmobG+A 0SHlk5QSC8MYYqa10cZDrrOy/EjIBxQhnhPFdIwQvvSS5SWagfYaedzlIqnnRRKNOZZ6 l2QbPY+p8PuTqT8fhl1bpnDFXrZRghOTR4D3f9YwSS3q9An5yekQkhJrYLKWB4mFw+OZ RxHg== X-Gm-Message-State: ABy/qLYLkhJt2iE35yzMNAGi2oMDG1iqfMpXy24U67Xv962Lzw67t62R MbRN1EWDFxfCxJjNJoPEyMr75TMepT0MleTRnFtKdKeBWjH6PCkSfLjsfIKHeT9MrCi2sUrYC9e jimYyxnycaws= X-Received: by 2002:a5d:6606:0:b0:313:f9a0:c530 with SMTP id n6-20020a5d6606000000b00313f9a0c530mr2306623wru.52.1690573256904; Fri, 28 Jul 2023 12:40:56 -0700 (PDT) X-Google-Smtp-Source: APBJJlFePgWAB+PbJAsSFIJBhYiJefyMuHGcUdsjOuoT93P9W3fJx6mX384ZMq38qfMLqnXTTYgaQA== X-Received: by 2002:a5d:6606:0:b0:313:f9a0:c530 with SMTP id n6-20020a5d6606000000b00313f9a0c530mr2306606wru.52.1690573256521; Fri, 28 Jul 2023 12:40:56 -0700 (PDT) Received: from ?IPV6:2003:cb:c706:6b00:bf49:f14b:380d:f871? (p200300cbc7066b00bf49f14b380df871.dip0.t-ipconnect.de. [2003:cb:c706:6b00:bf49:f14b:380d:f871]) by smtp.gmail.com with ESMTPSA id w17-20020a05600c015100b003fbfef555d2sm7614164wmm.23.2023.07.28.12.40.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 28 Jul 2023 12:40:56 -0700 (PDT) Message-ID: Date: Fri, 28 Jul 2023 21:40:54 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 From: David Hildenbrand To: Linus Torvalds Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Andrew Morton , liubo , Peter Xu , Matthew Wilcox , Hugh Dickins , Jason Gunthorpe , John Hubbard , Mel Gorman References: <20230727212845.135673-1-david@redhat.com> Organization: Red Hat Subject: Re: [PATCH v1 0/4] smaps / mm/gup: fix gup_can_follow_protnone fallout In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 079A31C0017 X-Stat-Signature: 1nmdfhzcz5x4ge785hd1wf968rwkitr7 X-Rspam-User: X-HE-Tag: 1690573259-126302 X-HE-Meta: U2FsdGVkX1/2AHmrqcH2TBhDhOyb++iE321rlZu8b3EAgL7yZKz+vtIEVSbK5QVSXq5cI14+J2NAHFhewXkdTc8JFY8OQlPEc0e4Rx1XCp1XL9F7vtabq9so7RqjUs+xBWILETgccJDfAf7257da0RdV+ivoGAwg9JerScNwdnUidlpSdUbaXy1qpSII6LQrol8xkU+Dl9+OVsxohaN3+9pBlQZa1HRiQltM2V+AYV8OA7nA/ULHT8ATFfnoOgSvvX9rF6uY4ydvRg+QO63KrFtqqeWOc50KsebGqkskOUC2LHKeYd8zm+1KGCEzcbJ9Ua524UfsK5mDDegvK/0t1jZPvvBFmVDT3E6T7Du0yP/YvIhSJopBysWoJAiTA9Gtz/N8x1ZNxP7HG9Xr0cObizto/+LMixKdutIL+H28WtUdLpp9woVzqNxgJhjwc2RmqYutJ/5wyffLmpaAKykkodYkL6YOjEjuKMjtn1e4a18s9nrnGsHXGw9dwSKzAcjM6gNk+C6MLF823wnNF3PjfDUQXCfJxCGD3iTkxiN9U9r2A3ZFpqAHXB3WuPMDTUfMIDYcoIfIRSPu1L1FVAmelUhUq7ZPmkrz0+TOglclPsCuALlJrMD1U7AQvL7fy8jAWtB1PgWIrxhvTQ5OVQkHbDrY4N0EQw3+nz0MSwc8TolJxEP+fzPQbRmeI1cni9EHTxOpXHdILozkGXQJdK/IK5eLXOwOBXXOV8mIdQn6hu1avJ5firVNmvyUeJKP3IHXQ2y3qdxuNJmgZAOPq509/8JXVMfGPk24SQc00auawlKnMeeH9PbmP9UaeG63hqoTZ2cJ3yuCZMa5s9AbVLdkbyaqgFCtPjxR1U1blP/5wHFjAT4IPLtmDHntUJcTYsr2IITXIUPHEBd997o4hbGOZnDh/hYTabdQDKh+e05jluD3AUzzq5Ik9F5DLICzllo/00c8uFJa3FPayg8Jal7 RnoJTggA t6fJVnki0/dGBjMSGs7V263n6KEbbC/JRNoRhTXWK75oe/aXJ01ZwZ0sxp4k0GTAPqNnYmSN9pR1taCirSdFdsMhuf5LoYg034touzRwe/rIsi5xbckv/j2c7+GM1YWSFWe4yYklWCCirlOrygUCZoj9bO84ra/t9gVGO1bWk0i94MFSzBtpbvH+cnYX8iFqRUQPjo60KkJXnKXR/0n096LUrmG7PS0TqRYH0yHPVyl/vyQ7MJpNR5Do2OMk5J72DoinmodnqNU9eE2yKjaNT7Qtp6SpokEPsgaSp/UfP6jgtlZGTRKO97zJUh8hLAqv+VeqUpMzJX8go3D03kbcbaSPDOPFEi28i/ImTxXr10TwdTZOVzP8YKt1/eFgHL0h6WrHNqmInjtRpfXu4MpLogOxdy7bWAT+rLMFHhLL7WagWjO6/LcTBOnb9AUz6I6A4rWGMQ+ULfxuhLE0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 28.07.23 19:30, David Hildenbrand wrote: > On 28.07.23 18:18, Linus Torvalds wrote: >> On Thu, 27 Jul 2023 at 14:28, David Hildenbrand wrote: >>> >>> This is my proposal on how to handle the fallout of 474098edac26 >>> ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") where I >>> accidentially missed that follow_page() and smaps implicitly kept the >>> FOLL_NUMA flag clear by *not* setting it if FOLL_FORCE is absent, to >>> not trigger faults on PROT_NONE-mapped PTEs. >> >> Ugh. > > I was hoping for that reaction, with the assumption that we would get > something cleaner :) > >> >> I hate how it uses FOLL_FORCE that is inherently scary. > > I hate FOLL_FORCE, but I hate FOLL_NUMA even more, because to me it > is FOLL_FORCE in disguise (currently and before 474098edac26, if > FOLL_FORCE is set, FOLL_NUMA won't be set and the other way around). > >> >> Why do we have that "gup_can_follow_protnone()" logic AT ALL? > > That's what I was hoping for. > >> >> Couldn't we just get rid of that disgusting thing, and just say that >> GUP (and follow_page()) always just ignores NUMA hinting, and always >> just follows protnone? >> >> We literally used to have this: >> >> if (!(gup_flags & FOLL_FORCE)) >> gup_flags |= FOLL_NUMA; >> >> ie we *always* set FOLL_NUMA for any sane situation. FOLL_FORCE should >> be the rare crazy case. > > Yes, but my point would be that we now spell that "rare crazy case" > out for follow_page(). > > If you're talking about patch #1, I agree, therefore patch #3 to > avoid all that nasty FOLL_FORCE handling in GUP callers. > > But yeah, if we can avoid all that, great. > >> >> The original reason for not setting FOLL_NUMA all the time is >> documented in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting >> page faults from gup/gup_fast") from way back in 2012: >> >> * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault >> * would be called on PROT_NONE ranges. We must never invoke >> * handle_mm_fault on PROT_NONE ranges or the NUMA hinting >> * page faults would unprotect the PROT_NONE ranges if >> * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd >> * bitflag. So to avoid that, don't set FOLL_NUMA if >> * FOLL_FORCE is set. > > > In handle_mm_fault(), we never call do_numa_page() if > !vma_is_accessible(). Same for do_huge_pmd_numa_page(). > > So, if we would ever end up triggering a page fault on > mprotect(PROT_NONE) ranges (i.e., via FOLL_FORCE), we > would simply do nothing. > > At least that's the hope, I'll take a closer look just to make > sure we're good on all call paths. > >> >> but I don't think the original reason for this is *true* any more. >> >> Because then two years later in 2014, in commit c46a7c817e66 ("x86: >> define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") >> Mel made the code able to distinguish between PROT_NONE and NUMA >> pages, and he changed the comment above too. > > CCing Mel. > > I remember that pte_protnone() can only distinguished between > NUMA vs. actual mprotect(PROT_NONE) by looking at the VMA -- vma_is_accessible(). > > Indeed, include/linux/pgtable.h: > > /* > * Technically a PTE can be PROTNONE even when not doing NUMA balancing but > * the only case the kernel cares is for NUMA balancing and is only ever set > * when the VMA is accessible. For PROT_NONE VMAs, the PTEs are not marked > * _PAGE_PROTNONE so by default, implement the helper as "always no". It > * is the responsibility of the caller to distinguish between PROT_NONE > * protections and NUMA hinting fault protections. > */ > >> >> But I get the very very strong feeling that instead of changing the >> comment, he should have actually removed the comment and the code. >> >> So I get the strong feeling that all these FOLL_NUMA games should just >> go away. You removed the FOLL_NUMA bit, but you replaced it with using >> FOLL_FORCE. >> >> So rather than make this all even more opaque and make it even harder >> to figure out why we have that code in the first place, I think it >> should all just be removed. > > At least to me, spelling FOLL_FORCE in follow_page() now out is much > less opaque then getting that implied by lack of FOLL_NUMA. > >> >> The original reason for FOLL_NUMA simply does not exist any more. We >> know exactly when a page is marked for NUMA faulting, and we should >> simply *ignore* it for GUP and follow_page(). >> >> I think we should treat a NUMA-faulting page as just being present >> (and not NUMA-fault it). >> >> Am I missing something? > > There was the case for "FOLL_PIN represents application behavior and > should trigger NUMA faults", but I guess that can be ignored. Re-reading commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from gup/gup_fast"), it actually does spell out an important case that we should handle: "KVM secondary MMU page faults will trigger the NUMA hinting page faults through gup_fast -> get_user_pages -> follow_page -> handle_mm_fault." That is still the case today (and important). Not triggering NUMA hinting faults would degrade KVM. Hmm. So three alternatives I see: 1) Use FOLL_FORCE in follow_page() to unconditionally disable protnone checks. Alternatively, have an internal FOLL_NO_PROTNONE flag if we don't like that. 2) Revert the commit and reintroduce unconditional FOLL_NUMA without FOLL_FORCE. 3) Have a FOLL_NUMA that callers like KVM can pass. Thoughts? -- Cheers, David / dhildenb