From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6487C001DE for ; Fri, 28 Jul 2023 17:30:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1129B6B0071; Fri, 28 Jul 2023 13:30:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 09DAA8D0002; Fri, 28 Jul 2023 13:30:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E586B8D0001; Fri, 28 Jul 2023 13:30:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id D01636B0071 for ; Fri, 28 Jul 2023 13:30:42 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 7957EA06D9 for ; Fri, 28 Jul 2023 17:30:42 +0000 (UTC) X-FDA: 81061710324.29.7C6C50E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf23.hostedemail.com (Postfix) with ESMTP id E302314000E for ; Fri, 28 Jul 2023 17:30:39 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=IqArF8nL; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf23.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690565440; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cl36FnvktkYbvD1zJaMZyNiWy19DEs6ZP3kw5lklrZI=; b=VzIAwk7IllhPfnnNVQou7GbEkVJ5Wc2LZNGzEO+8rzCEpjjDPS6+LAz0i0B5bSsQNNmxie DU1fVjg+Oy+ln2yAoUjQWZbHX0kVF1cXLBir9c0Pbf8rtHSOflKiyyw/S9DuLnEfABXDcE krHLIMMJdmI5aPW6+arfbZOnvkoNunU= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=IqArF8nL; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf23.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690565440; a=rsa-sha256; cv=none; b=fUFrYgioGkdlJ59xCveLkfoCwNHraDpc/rSOtgf5HYmSN6p9N8yaaLPhNZL5OPDlTyeFuc TxN4vDWFl8YYHMaSDboEMUH04UG59S83Vy89S9QmI09MYtUj+2b++2IRGg85e02JhmKIpH BO5o3nhJuqsOIDOS87iVR/DNK3OFRlQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1690565439; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cl36FnvktkYbvD1zJaMZyNiWy19DEs6ZP3kw5lklrZI=; b=IqArF8nL5RlIjLZMnFgZckWn3WQ/QD2s2wb+XgVv+V4YwaxcdhENAippXr8dL7Vwh8dAR8 6nlZ5QNvcpcHciXsbwkd0xPTQXT0G6D85qbsPR733cuRK0ZH6rzkMcj/rJZY99Bf/Z6ZoG 6usw4MO4sPwm1mF4VJOXwjBieRvS23Y= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-214-R2vDX49rPPidtBrlZ39Cng-1; Fri, 28 Jul 2023 13:30:37 -0400 X-MC-Unique: R2vDX49rPPidtBrlZ39Cng-1 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-3fbb34f7224so16210745e9.2 for ; Fri, 28 Jul 2023 10:30:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690565436; x=1691170236; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=cl36FnvktkYbvD1zJaMZyNiWy19DEs6ZP3kw5lklrZI=; b=VHUEEVd+9wz8C/H9KHsU2ulh5QL3kliz8qNzuwF1qUikFDOtJsLlLLtRmpa2EJ3mGi HOaUBAGvT2VAaSopFl3g5RydpMBog1hWlNA4/JkX31MSWBrmmOcH5MxETB8l0z/zpwI2 7LZ4Y0TYRky+Dpy3BgLyhSGEu/5HeKCAhr1BxwDW5O4EWXql6bEzbR/tW99botMDq1N6 PmDDH+bSOJKXBSqmwFCOehWU4XOCHA3ZPVm+DeBE+27btVpAFcuyzhiYbx4mNVXrNHWW GY1bDmicHzOJq/GjLTSE2vTQUcFj6w/67Sy01va8E6ilqeZFM3a2ti8eMgVPUfXKnUjE HgqQ== X-Gm-Message-State: ABy/qLbkx4SaVbxdrEE2eRSk71RXDZ5TD4hKjq6Ple9avOCBHmlcv2he 95CvKoAk9ZY7LYgthhFt8MLLjxO12hA9BxPVsIUhv7pJ3PhUT8OZo3lXZVf3khECtMD62MAhM41 ucyi8Stzdgko= X-Received: by 2002:a05:600c:20c4:b0:3fb:e643:1225 with SMTP id y4-20020a05600c20c400b003fbe6431225mr2509513wmm.13.1690565436053; Fri, 28 Jul 2023 10:30:36 -0700 (PDT) X-Google-Smtp-Source: APBJJlGYuCUIu0HPfQ49F3gjLsVAJ43VYP/e6HVfw6FA/RBySDmmJqoUBeIQbEqCC2TFy39iizz9aA== X-Received: by 2002:a05:600c:20c4:b0:3fb:e643:1225 with SMTP id y4-20020a05600c20c400b003fbe6431225mr2509491wmm.13.1690565435629; Fri, 28 Jul 2023 10:30:35 -0700 (PDT) Received: from ?IPV6:2003:cb:c706:6b00:bf49:f14b:380d:f871? (p200300cbc7066b00bf49f14b380df871.dip0.t-ipconnect.de. [2003:cb:c706:6b00:bf49:f14b:380d:f871]) by smtp.gmail.com with ESMTPSA id m10-20020a7bca4a000000b003fbc0a49b57sm4712526wml.6.2023.07.28.10.30.34 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 28 Jul 2023 10:30:35 -0700 (PDT) Message-ID: Date: Fri, 28 Jul 2023 19:30:33 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 To: Linus Torvalds Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Andrew Morton , liubo , Peter Xu , Matthew Wilcox , Hugh Dickins , Jason Gunthorpe , John Hubbard , Mel Gorman References: <20230727212845.135673-1-david@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v1 0/4] smaps / mm/gup: fix gup_can_follow_protnone fallout In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: E302314000E X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: x4krz384s8uu7pjkktigk7my8ykkp13a X-HE-Tag: 1690565439-320007 X-HE-Meta: U2FsdGVkX18VjAVYWgrqsBqkuXb2LPaD1zFgrvfda22c1biGibH+0Klo7lSZnHcNf/fkqvzY+8zRtarGyj/vzhOGIfowqBLpeAZL4M7EagCQQlFb3aIHOjPn4D90NJ12xhKa5Jd+ataPQeqn7I8K6LvvI/JMTzIP9+W5f8hl+Oe3oYBo0ov260ZJQc5eHjzayc6IhE2nXUHYBOmlvMU90i4R7GiFO7O/m2d5I9/lN4ykd6sVMWhPwSQc3A7S8+Xl3OsnSjblI3gUUuY2Ji/7EUXqPWhHCTSitEuvi3CTKg0T1uPeoPtytl+O5R5P/OEngPeaGrzMJJq3NKmt1qLDHNMj3OmzUAZZnUDop1jXqXdJDFg93w+KNmJ+c5qiIu1H6EXPnchoFAIrKXZgJYn6Os0YUGUpEaAHeCZ4FVsoq/8tU9eWjgH6XSXYzDfdBYQ0IEXW+5h1k3Mk5DZ+5bF/0RT1VDGFldI2rGOw+Q3gU8LiA4hZf8vmC8Dtjks43/6E6usDEk/TPcj4mOeOULobwXAV5ubagvz6sAjK8mFdSbrBaTKa1j0+lzUp2aeZNzr0yFytMQTGX3km2UHi8kKOQ1BG/HKA76/+aRsP/+wBYNMeC41a8KeBQSvKqIroZSkoJ0JIhETnuhekvws2iwLkpxH1/hxBeyrUMl8HXLKYpe8z+2c2NwPaM8Xz4lF/p8rCMK0kj9z7N8Q3x8IMFd1xpmFXi5v2X44rQyL/TaSVS1avl9BZHZl0MFqYpWgqjdmczG0gRQE2QSXducAxTl2h1gV1382uqv3cqDTaRqHr0FCIZTn/vAfLZFkCQCzRGO/Xhm4Be7bLrq15Nul++B/CKpUl9fWJLj/FmQxdGOp/PZoUWngxBWVDzFJDxErcW30IdanqEKZkTKPgCKQEzbbX5oIz2t4jQR0/J3gFgzJeaie281U7kAX0sMmJ9uhYKEVZXj74s0ceFr+TEhG+hSr XME8xSji q3WcrKhcGcfK6YpOV5pluUW6Gxaw1isCSCTZ2IK+QNDEpCPWJpmNJfemnNOIBy3A2+P4MIylCN4IOmpEdaO3+HNfRjs+U9De2Yy8R5CAOYvxBWZE8RR7TREfrQJIevUCVkPD/fMYcvjnAjQNllBgM70xVc/4xTrF7U9m6fR2++0X3Rp9qdUecclL8Tbl7B+r2QD2OfXPKe0dla9alNrig/ECD9Gz24gBMHF5Fjt673NAmd1lCBVyMlEn68SQJDpFOLbO3Rygb03o6M02hTnheZHNheGgEgXxNe6GW5gZvAGfRNNRQ169wy72nVoBhq5+5AqgG4RddaFb5L8ls/f5XbwjWl34nd2TaITqD9n/EMG0rMxCv2A1MWdUtrOgUd94uIEFuxX1B9+sWGOzoooUL1+VQ9blDO+5HReTKKgmcqD78fTtc5vSNB0mY+W+kelfPzjGLE6YqAsR6eJA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 28.07.23 18:18, Linus Torvalds wrote: > On Thu, 27 Jul 2023 at 14:28, David Hildenbrand wrote: >> >> This is my proposal on how to handle the fallout of 474098edac26 >> ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") where I >> accidentially missed that follow_page() and smaps implicitly kept the >> FOLL_NUMA flag clear by *not* setting it if FOLL_FORCE is absent, to >> not trigger faults on PROT_NONE-mapped PTEs. > > Ugh. I was hoping for that reaction, with the assumption that we would get something cleaner :) > > I hate how it uses FOLL_FORCE that is inherently scary. I hate FOLL_FORCE, but I hate FOLL_NUMA even more, because to me it is FOLL_FORCE in disguise (currently and before 474098edac26, if FOLL_FORCE is set, FOLL_NUMA won't be set and the other way around). > > Why do we have that "gup_can_follow_protnone()" logic AT ALL? That's what I was hoping for. > > Couldn't we just get rid of that disgusting thing, and just say that > GUP (and follow_page()) always just ignores NUMA hinting, and always > just follows protnone? > > We literally used to have this: > > if (!(gup_flags & FOLL_FORCE)) > gup_flags |= FOLL_NUMA; > > ie we *always* set FOLL_NUMA for any sane situation. FOLL_FORCE should > be the rare crazy case. Yes, but my point would be that we now spell that "rare crazy case" out for follow_page(). If you're talking about patch #1, I agree, therefore patch #3 to avoid all that nasty FOLL_FORCE handling in GUP callers. But yeah, if we can avoid all that, great. > > The original reason for not setting FOLL_NUMA all the time is > documented in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting > page faults from gup/gup_fast") from way back in 2012: > > * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault > * would be called on PROT_NONE ranges. We must never invoke > * handle_mm_fault on PROT_NONE ranges or the NUMA hinting > * page faults would unprotect the PROT_NONE ranges if > * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd > * bitflag. So to avoid that, don't set FOLL_NUMA if > * FOLL_FORCE is set. In handle_mm_fault(), we never call do_numa_page() if !vma_is_accessible(). Same for do_huge_pmd_numa_page(). So, if we would ever end up triggering a page fault on mprotect(PROT_NONE) ranges (i.e., via FOLL_FORCE), we would simply do nothing. At least that's the hope, I'll take a closer look just to make sure we're good on all call paths. > > but I don't think the original reason for this is *true* any more. > > Because then two years later in 2014, in commit c46a7c817e66 ("x86: > define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") > Mel made the code able to distinguish between PROT_NONE and NUMA > pages, and he changed the comment above too. CCing Mel. I remember that pte_protnone() can only distinguished between NUMA vs. actual mprotect(PROT_NONE) by looking at the VMA -- vma_is_accessible(). Indeed, include/linux/pgtable.h: /* * Technically a PTE can be PROTNONE even when not doing NUMA balancing but * the only case the kernel cares is for NUMA balancing and is only ever set * when the VMA is accessible. For PROT_NONE VMAs, the PTEs are not marked * _PAGE_PROTNONE so by default, implement the helper as "always no". It * is the responsibility of the caller to distinguish between PROT_NONE * protections and NUMA hinting fault protections. */ > > But I get the very very strong feeling that instead of changing the > comment, he should have actually removed the comment and the code. > > So I get the strong feeling that all these FOLL_NUMA games should just > go away. You removed the FOLL_NUMA bit, but you replaced it with using > FOLL_FORCE. > > So rather than make this all even more opaque and make it even harder > to figure out why we have that code in the first place, I think it > should all just be removed. At least to me, spelling FOLL_FORCE in follow_page() now out is much less opaque then getting that implied by lack of FOLL_NUMA. > > The original reason for FOLL_NUMA simply does not exist any more. We > know exactly when a page is marked for NUMA faulting, and we should > simply *ignore* it for GUP and follow_page(). > > I think we should treat a NUMA-faulting page as just being present > (and not NUMA-fault it). > > Am I missing something? There was the case for "FOLL_PIN represents application behavior and should trigger NUMA faults", but I guess that can be ignored. But it would be much better to just remove all that if we can. Let me look into some details. Thanks Linus! -- Cheers, David / dhildenb