From mboxrd@z Thu Jan  1 00:00:00 1970
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Alistair Popple, Matthew Wilcox, peterx@redhat.com,
    David Hildenbrand, Andrea Arcangeli,
    Hugh Dickins, Yang Shi, Vlastimil Babka, John Hubbard,
    Andrew Morton, "Kirill A . Shutemov"
Subject: [PATCH v5 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Thu, 17 Feb 2022 14:07:43 +0800
Message-Id: <20220217060746.71256-2-peterx@redhat.com>
In-Reply-To: <20220217060746.71256-1-peterx@redhat.com>
References: <20220217060746.71256-1-peterx@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"

The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.

For example, when the callers specified details->zap_mapping==NULL, it
means the user wants to zap all the pages (including COWed pages), then
we need to look into swap entries because there can be private COWed
pages that were swapped out.  Skipping some swap entries when details is
non-NULL may lead to wrongly leaving some of the swap entries while we
should have zapped them.
A reproducer of the problem:

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

void main(void)
{
        int ret;
        char val;

        page_size = getpagesize();
        shmem_fd = memfd_create("test", 0);
        assert(shmem_fd >= 0);

        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, shmem_fd, 0);
        assert(buffer != MAP_FAILED);

        /* Write private page, swap it out */
        buffer[page_size] = 1;
        madvise(buffer, page_size * 2, MADV_PAGEOUT);

        /* This should drop private buffer[page_size] already */
        ret = ftruncate(shmem_fd, page_size);
        assert(ret == 0);
        /* Recover the size */
        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        /* Re-read the data, it should be all zero */
        val = buffer[page_size];
        if (val == 0)
                printf("Good\n");
        else
                printf("BUG\n");
}
===8<===

We don't need to touch up the pmd path, because pmd never had an issue
with swap entries.  For example, shmem pmd migration will always be split
into pte level, and the same is true for swapping on anonymous memory.

Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.

This patch drops that trick, so we handle swap ptes coherently.  Meanwhile
we should do the same check upon migration entry, hwpoison entry and
genuine swap entries too.

To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.

The issue seems to exist starting from the initial commit of git.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 mm/memory.c | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..533da5d6c32c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+	/* By default, zap all pages */
+	if (!details)
+		return true;
+
+	/* Or, we zap COWed pages only if the caller wants to */
+	return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool zap_skip_check_mapping(struct zap_details *details,
 					  struct page *page)
 {
-	if (!details || !page)
+	/* If we can make a decision without *page.. */
+	if (should_zap_cows(details))
+		return false;
+
+	/* E.g. the caller passes NULL for the case of a zero page */
+	if (!page)
 		return false;
 
-	return details->zap_mapping &&
-	    (details->zap_mapping != page_rmapping(page));
+	return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,24 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
+		if (!non_swap_entry(entry)) {
+			/* Genuine swap entry, hence a private anon page */
+			if (!should_zap_cows(details))
+				continue;
 			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
+		} else if (is_migration_entry(entry)) {
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
+			if (zap_skip_check_mapping(details, page))
+				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_hwpoison_entry(entry)) {
+			if (!should_zap_cows(details))
+				continue;
+		} else {
+			/* We should have covered all the swap entry types */
+			WARN_ON_ONCE(1);
+		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
-- 
2.32.0