Message-ID: <13b14aa6-302e-63cc-2a99-f5c22b9931fc@redhat.com>
Date: Fri, 28 Jul 2023 12:12:57 +0200
From: David Hildenbrand
To: John Hubbard, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, stable@vger.kernel.org
References: <20230727212845.135673-1-david@redhat.com>
 <20230727212845.135673-3-david@redhat.com>
 <55c92738-e402-4657-3d46-162ad2c09d68@nvidia.com>
 <9de80e22-e89f-2760-34f4-61be5f8fd39c@redhat.com>
Organization: Red Hat
Subject: Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs
In-Reply-To: <9de80e22-e89f-2760-34f4-61be5f8fd39c@redhat.com>

On 28.07.23 11:08, David Hildenbrand wrote:
> On 28.07.23 04:30, John Hubbard wrote:
>> On 7/27/23 14:28, David Hildenbrand wrote:
>>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>>> follow_page() like we do for get_user_pages() and friends.
>>> That was undesired, because follow_page() is usually only used to lookup a
>>> currently mapped page, not to actually access it. Further, follow_page()
>>> does not actually trigger fault handling, but instead simply fails.
>>
>> I see that follow_page() is also completely undocumented. And that
>> reduces us to deducing how it should be used...these things that
>> change follow_page()'s behavior maybe should have a go at documenting
>> it too, perhaps.
>
> I can certainly be motivated to do that. :)
>
>>
>>>
>>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>>> no longer fail on PROT_NONE mapped PTEs/PMDS.
>>>
>>> Handling this internally doesn't require us to add any new FOLL_FORCE
>>> usage outside of GUP code.
>>>
>>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>>> permission checks like in check_vma_flags(), so especially
>>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>>
>>> This issue was identified by code inspection. We'll add some
>>> documentation regarding FOLL_FORCE next.
>>>
>>> Reported-by: Peter Xu
>>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>>> Cc:
>>> Signed-off-by: David Hildenbrand
>>> ---
>>>  mm/gup.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 2493ffa10f4b..da9a5cc096ac 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>>  	if (vma_is_secretmem(vma))
>>>  		return NULL;
>>>
>>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>>  		return NULL;
>>
>> This is not a super happy situation: follow_page() is now prohibited
>> (see above: we should document that interface) from passing in
>> FOLL_FORCE...
>
> I guess you saw my patch #4.
>
> If you take a look at the existing callers (that are fortunately very
> limited), you'll see that nobody cares.
>
> Most of the FOLL flags don't make any sense for follow_page(), and
> limiting further (ab)use is at least to me very appealing.
>
>>
>>>
>>> +	/*
>>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>>> +	 */
>>> +	if (!(foll_flags & FOLL_WRITE))
>>> +		foll_flags |= FOLL_FORCE;
>>> +
>>
>> ...but then we set it anyway, for special cases. It's awkward because
>> FOLL_FORCE is not an "internal to gup" flag (yet?).
>>
>> I don't yet have suggestions, other than:
>>
>> 1) Yes, the FOLL_NUMA made things bad.
>>
>> 2) And they are still very confusing, especially the new use of
>>     FOLL_FORCE.
>>
>> ...I'll try to let this soak in and maybe recommend something
>> in a more productive way. :)
>
> What I can offer that might be very appealing is the following:
>
> Get rid of the flags parameter for follow_page() *completely*. Yes, then
> we can even rename FOLL_ to something reasonable in the context where it
> is nowadays used ;)
>
>
> Internally, we'll then set
>
> FOLL_GET | FOLL_DUMP | FOLL_FORCE
>
> and document exactly what this functions does. Any user that needs
> something different should just look into using get_user_pages() instead.
>
> I can prototype that on top of this work easily.

The end result looks something like:

/**
 * follow_page - look up and reference a page descriptor from a user-virtual
 *		 address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 *
 * follow_page() will look up the page mapped at the given address and
 * take a reference on the page. The returned page has to be released using
 * put_page().
 *
 * follow_page() will not return special (like zero) pages and does not check
 * PTE protection: the returned page might be mapped PROT_NONE, R/O or R/W.
 * Consequently, follow_page() will not trigger NUMA hinting faults.
 *
 * follow_page() does not trigger page faults. If no page is mapped, or
 * a special (like zero) page is mapped, it returns %NULL or an error pointer.
 *
 * Note: new users with different requirements are probably better off using
 * one of the get_user_pages() variants or one of the walk_page_range()
 * variants.
 *
 * Return: the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()) or the zero page.
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address)
{
	struct follow_page_context ctx = { NULL };
	unsigned long gup_flags;
	struct page *page;

	if (vma_is_secretmem(vma))
		return NULL;

	/*
	 * FOLL_GET: We always want a reference on the returned page.
	 * FOLL_DUMP: Ignore special (like zero) pages.
	 * FOLL_FORCE: Succeed on PROT_NONE-mapped pages.
	 */
	gup_flags = FOLL_GET | FOLL_DUMP | FOLL_FORCE;

	page = follow_page_mask(vma, address, gup_flags, &ctx);
	if (ctx.pgmap)
		put_dev_pagemap(ctx.pgmap);
	return page;
}

-- 
Cheers,

David / dhildenb
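
For anyone following along, here is a rough userspace model of the flag handling in the quoted patch 2/4 hunk: caller-supplied FOLL_PIN or FOLL_FORCE is rejected, and FOLL_FORCE is then set internally whenever FOLL_WRITE is absent. This is only a sketch and not kernel code. The DEMO_* names, their values, and demo_follow_page_flags() are made up for illustration, and the WARN_ON_ONCE() is reduced to a plain failure return.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real FOLL_* values live in the kernel headers. */
enum {
	DEMO_FOLL_WRITE = 1 << 0,
	DEMO_FOLL_FORCE = 1 << 1,
	DEMO_FOLL_PIN   = 1 << 2,
};

/*
 * Models only the flag handling of the patched follow_page():
 * returns false where the kernel code would WARN and return NULL,
 * otherwise stores the flags that would be used for the lookup.
 */
static bool demo_follow_page_flags(unsigned int foll_flags,
				   unsigned int *effective)
{
	/* Callers must not pass FOLL_PIN or FOLL_FORCE themselves. */
	if (foll_flags & (DEMO_FOLL_PIN | DEMO_FOLL_FORCE))
		return false;

	/* Read-only lookups get FOLL_FORCE so PROT_NONE mappings succeed. */
	if (!(foll_flags & DEMO_FOLL_WRITE))
		foll_flags |= DEMO_FOLL_FORCE;

	*effective = foll_flags;
	return true;
}
```

With this model, a lookup with no flags ends up with DEMO_FOLL_FORCE set, a DEMO_FOLL_WRITE lookup is passed through unchanged, and any caller passing DEMO_FOLL_FORCE or DEMO_FOLL_PIN is refused. The flag-less follow_page() prototype above would make the first case the only one.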