From: David Hildenbrand
Organization: Red Hat
Date: Wed, 13 Apr 2022 12:28:15 +0200
Subject: Re: [PATCH v3 11/16] mm/page-flags: reuse PG_mappedtodisk as
 PG_anon_exclusive for PageAnon() pages
To: Vlastimil Babka, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
 Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
 Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox,
 Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel, Roman Gushchin,
 Andrea Arcangeli, Peter Xu, Donald Dutile, Christoph Hellwig,
 Oleg Nesterov, Jan Kara, Liang Zhang, Pedro Gomes, Oded Gabbay,
 linux-mm@kvack.org
References: <20220329160440.193848-1-david@redhat.com>
 <20220329160440.193848-12-david@redhat.com>
 <84c0bcbb-5c8f-d3b2-2a8c-d68462d0bc04@suse.cz>
In-Reply-To: <84c0bcbb-5c8f-d3b2-2a8c-d68462d0bc04@suse.cz>

On 13.04.22 10:25, Vlastimil Babka wrote:
> On 3/29/22 18:04, David Hildenbrand wrote:
>> The basic question we would like to have a reliable and efficient answer
>> to is: is this anonymous page exclusive to a single process or might it
>> be shared? We need that information for ordinary/single pages, hugetlb
>> pages, and possibly each subpage of a THP.
>>
>> Introduce a way to mark an anonymous page as exclusive, with the
>> ultimate goal of teaching our COW logic to not do "wrong COWs", whereby
>> GUP pins lose consistency with the pages mapped into the page table,
>> resulting in reported memory corruptions.
>>
>> Most pageflags already have semantics for anonymous pages, however,
>> PG_mappedtodisk should never apply to pages in the swapcache, so let's
>> reuse that flag.
>>
>> As PG_has_hwpoisoned also uses that flag on the second tail page of a
>> compound page, convert it to PG_error instead, which is marked as
>> PF_NO_TAIL, so never used for tail pages.
>>
>> Use custom page flag modification functions such that we can do
>> additional sanity checks. The semantics we'll put into some kernel doc
>> in the future are:
>>
>> "
>>  PG_anon_exclusive is *usually* only expressive in combination with a
>>  page table entry. Depending on the page table entry type it might
>>  store the following information:
>>
>>       Is what's mapped via this page table entry exclusive to the
>>       single process and can be mapped writable without further
>>       checks? If not, it might be shared and we might have to COW.
>>
>>  For now, we only expect PTE-mapped THPs to make use of
>>  PG_anon_exclusive in subpages. For other anonymous compound
>>  folios (i.e., hugetlb), only the head page is logically mapped and
>>  holds this information.
>>
>>  For example, an exclusive, PMD-mapped THP only has PG_anon_exclusive
>>  set on the head page.
>>  When replacing the PMD by a page table full
>>  of PTEs, PG_anon_exclusive, if set on the head page, will be set on
>>  all tail pages accordingly. Note that converting from a PTE-mapping
>>  to a PMD mapping using the same compound page is currently not
>>  possible and consequently doesn't require care.
>>
>>  If GUP wants to take a reliable pin (FOLL_PIN) on an anonymous page,
>>  it should only pin if the relevant PG_anon_bit is set. In that case,
>
> ^ PG_anon_exclusive bit ?
>
>>  the pin will be fully reliable and stay consistent with the pages
>>  mapped into the page table, as the bit cannot get cleared (e.g., by
>>  fork(), KSM) while the page is pinned. For anonymous pages that
>>  are mapped R/W, PG_anon_exclusive can be assumed to always be set
>>  because such pages cannot possibly be shared.
>>
>>  The page table lock protecting the page table entry is the primary
>>  synchronization mechanism for PG_anon_exclusive; GUP-fast that does
>>  not take the PT lock needs special care when trying to clear the
>>  flag.
>>
>>  Page table entry types and PG_anon_exclusive:
>>  * Present: PG_anon_exclusive applies.
>>  * Swap: the information is lost. PG_anon_exclusive was cleared.
>>  * Migration: the entry holds this information instead.
>>               PG_anon_exclusive was cleared.
>>  * Device private: PG_anon_exclusive applies.
>>  * Device exclusive: PG_anon_exclusive applies.
>>  * HW Poison: PG_anon_exclusive is stale and not changed.
>>
>>  If the page may be pinned (FOLL_PIN), clearing PG_anon_exclusive is
>>  not allowed and the flag will stick around until the page is freed
>>  and folio->mapping is cleared.
>
> Or also if it's unpinned?

I'm afraid I didn't get your question. Once the page is no longer
pinned, we can succeed in clearing PG_anon_exclusive (just like pinning
never happened).

Does that answer your question?
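
To make the pinned vs. unpinned behaviour concrete, here is a toy
userspace model of the rule described above. It is only a sketch and
not kernel code: struct toy_page and all toy_* helpers are made-up
names, and a plain counter stands in for "the page may be pinned".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
	bool anon_exclusive;	/* models PG_anon_exclusive */
	int pins;		/* models outstanding FOLL_PIN references */
};

/* Take a "reliable" pin only if the page is marked exclusive. */
static bool toy_try_pin(struct toy_page *p)
{
	if (!p->anon_exclusive)
		return false;	/* caller would have to unshare first */
	p->pins++;
	return true;
}

/* Drop one pin taken via toy_try_pin(). */
static void toy_unpin(struct toy_page *p)
{
	assert(p->pins > 0);
	p->pins--;
}

/*
 * Try to clear the flag, as a sharer such as fork() or KSM would want.
 * Fails while the page may be pinned; succeeds once it is unpinned.
 */
static bool toy_try_clear_exclusive(struct toy_page *p)
{
	if (p->pins > 0)
		return false;
	p->anon_exclusive = false;
	return true;
}
```

In this model, clearing fails between toy_try_pin() and toy_unpin() and
succeeds afterwards, just like pinning never happened.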
>
>> "
>>
>> We won't be clearing PG_anon_exclusive on destructive unmapping (i.e.,
>> zapping) of page table entries, page freeing code will handle that when
>> also invalidate page->mapping to not indicate PageAnon() anymore.
>> Letting information about exclusivity stick around will be an important
>> property when adding sanity checks to unpinning code.
>>
>> Note that we properly clear the flag in free_pages_prepare() via
>> PAGE_FLAGS_CHECK_AT_PREP for each individual subpage of a compound page,
>> so there is no need to manually clear the flag.
>>
>> Signed-off-by: David Hildenbrand
>
> Acked-by: Vlastimil Babka

Thanks!

>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3663,6 +3663,17 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>  		goto out_nomap;
>>  	}
>>  
>> +	/*
>> +	 * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A swap pte
>> +	 * must never point at an anonymous page in the swapcache that is
>> +	 * PG_anon_exclusive. Sanity check that this holds and especially, that
>> +	 * no filesystem set PG_mappedtodisk on a page in the swapcache. Sanity
>> +	 * check after taking the PT lock and making sure that nobody
>> +	 * concurrently faulted in this page and set PG_anon_exclusive.
>> +	 */
>> +	BUG_ON(!PageAnon(page) && PageMappedToDisk(page));
>> +	BUG_ON(PageAnon(page) && PageAnonExclusive(page));
>> +
>
> Hmm, dunno why not VM_BUG_ON?

Getting PageAnonExclusive accidentally set by a file system would result
in an extremely unpleasant security issue. I most surely want to catch
something like that in any case, especially in the foreseeable future.

-- 
Thanks,

David / dhildenb
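
A toy illustration of why the two checks in the hunk above look at the
same bit, and why a stray PG_mappedtodisk would be dangerous. This is a
userspace sketch under stated assumptions, not kernel code: the toy_*
names and the bit value are made up, and a boolean return stands in for
BUG_ON(). VM_BUG_ON() only performs its check when CONFIG_DEBUG_VM is
enabled, whereas BUG_ON() always does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit value; the real definitions live in page-flags.h. */
enum {
	TOY_PG_MAPPEDTODISK   = 1 << 0,
	/* Same bit, reinterpreted when the page is anonymous. */
	TOY_PG_ANON_EXCLUSIVE = TOY_PG_MAPPEDTODISK,
};

/* Hypothetical stand-in for a page found in the swapcache. */
struct toy_swap_page {
	bool anon;		/* models PageAnon() */
	unsigned int flags;
};

/*
 * Returns true if neither of the two sanity checks would trigger.
 * Both conditions test the same bit; only its meaning differs.
 */
static bool toy_swapin_flags_sane(const struct toy_swap_page *p)
{
	bool bit = (p->flags & TOY_PG_MAPPEDTODISK) != 0;

	if (!p->anon && bit)
		return false;	/* mappedtodisk set on a swapcache page */
	if (p->anon && bit)
		return false;	/* already exclusive behind a swap pte */
	return true;
}
```

If the first case went unnoticed, the page would look exclusive as soon
as it became anonymous, which is the security problem mentioned above.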