Date: Tue, 4 Apr 2023 17:21:31 -0400
From: Peter Xu
To: David Stevens
Cc: linux-mm@kvack.org, Hugh Dickins, Andrew Morton, Matthew Wilcox, "Kirill A . Shutemov", Yang Shi, David Hildenbrand, Jiaqi Yan, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 4/4] mm/khugepaged: maintain page cache uptodate flag
References: <20230404120117.2562166-1-stevensd@google.com> <20230404120117.2562166-5-stevensd@google.com>
MIME-Version: 1.0
In-Reply-To: <20230404120117.2562166-5-stevensd@google.com>
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On Tue, Apr 04, 2023 at 09:01:17PM +0900, David Stevens wrote:
> From: David Stevens
>
> Make sure that collapse_file doesn't interfere with checking the
> uptodate flag in the page cache by only inserting hpage into the page
> cache after it has been updated and marked uptodate. This is achieved by
> simply not replacing present pages with hpage when iterating over the
> target range.
>
> The present pages are already locked, so replacing them with the locked
> hpage before the collapse is finalized is unnecessary. However, it is
> necessary to stop freezing the present pages after validating them,
> since leaving long-term frozen pages in the page cache can lead to
> deadlocks. Simply checking the reference count is sufficient to ensure
> that there are no long-term references hanging around that would the
> collapse would break. Similar to hpage, there is no reason that the
> present pages actually need to be frozen in addition to being locked.
>
> This fixes a race where folio_seek_hole_data would mistake hpage for
> an fallocated but unwritten page. This race is visible to userspace via
> data temporarily disappearing from SEEK_DATA/SEEK_HOLE. This also fixes
> a similar race where pages could temporarily disappear from mincore.
>
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: David Stevens
> ---
>  mm/khugepaged.c | 79 ++++++++++++++++++-------------------------------
>  1 file changed, 29 insertions(+), 50 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7679551e9540..a19aa140fd52 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1855,17 +1855,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
>   *
>   * Basic scheme is simple, details are more complex:
>   *  - allocate and lock a new huge page;
> - *  - scan page cache replacing old pages with the new one
> + *  - scan page cache, locking old pages
>   *    + swap/gup in pages if necessary;
> - *    + keep old pages around in case rollback is required;
> + *  - copy data to new page
> + *  - handle shmem holes
> + *    + re-validate that holes weren't filled by someone else
> + *    + check for userfaultfd

PS: some of the changes here may belong to the previous patch, but it's not
necessary to repost only for this; just in case there'll be a new one.

>   *  - finalize updates to the page cache;
>   *  - if replacing succeeds:
> - *    + copy data over;
> - *    + free old pages;
>   *    + unlock huge page;
> + *    + free old pages;
>   *  - if replacing failed;
> - *    + put all pages back and unfreeze them;
> - *    + restore gaps in the page cache;
> + *    + unlock old pages
>   *    + unlock and free huge page;
>   */
>  static int collapse_file(struct mm_struct *mm, unsigned long addr,
> @@ -1913,12 +1914,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  		}
>  	} while (1);
>
> -	/*
> -	 * At this point the hpage is locked and not up-to-date.
> -	 * It's safe to insert it into the page cache, because nobody would
> -	 * be able to map it or use it in another way until we unlock it.
> -	 */
> -
>  	xas_set(&xas, start);
>  	for (index = start; index < end; index++) {
>  		page = xas_next(&xas);
> @@ -2076,12 +2071,16 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
>
>  		/*
> -		 * The page is expected to have page_count() == 3:
> +		 * We control three references to the page:
>  		 *  - we hold a pin on it;
>  		 *  - one reference from page cache;
>  		 *  - one from isolate_lru_page;
> +		 * If those are the only references, then any new usage of the
> +		 * page will have to fetch it from the page cache. That requires
> +		 * locking the page to handle truncate, so any new usage will be
> +		 * blocked until we unlock page after collapse/during rollback.
>  		 */
> -		if (!page_ref_freeze(page, 3)) {
> +		if (page_count(page) != 3) {
>  			result = SCAN_PAGE_COUNT;
>  			xas_unlock_irq(&xas);
>  			putback_lru_page(page);

Personally I don't see anything wrong with this change to resolve the
deadlock.  E.g. a fast-gup race right before unmapping the pgtables seems
fine, since we'll just bail out with >3 refcounts (or fast-gup bails out by
checking pte changes).  Either way looks fine here.

So far it looks good to me, but that may not mean much given my history of
overlooking things.  It'll always be good to hear from Hugh and others.

-- 
Peter Xu
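
For readers following the refcount discussion above, here is a rough
userspace sketch of the difference between freezing a refcount and merely
checking it.  This is not kernel code: struct toy_page and all toy_*
helpers are hypothetical names made up for illustration.  It assumes that
freezing behaves like a compare-and-swap of the expected count to zero, and
that a speculative lookup takes a reference only if the count is non-zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct toy_page {
	atomic_int refcount;
};

/* Old scheme: succeed only if the count equals 'expected', and set it to 0. */
bool toy_ref_freeze(struct toy_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Undo a successful freeze by restoring the original count. */
void toy_ref_unfreeze(struct toy_page *p, int count)
{
	assert(atomic_load(&p->refcount) == 0);
	atomic_store(&p->refcount, count);
}

/* Speculative reference: refuses to take a reference on a frozen (0) count. */
bool toy_try_get(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* New scheme: only compare the count; the value is left untouched. */
bool toy_count_matches(struct toy_page *p, int expected)
{
	return atomic_load(&p->refcount) == expected;
}
```

In this model a frozen page makes every toy_try_get() fail until it is
unfrozen, which is the window the patch wants to avoid holding open for a
long time.  With the plain comparison, an extra reference taken before the
check simply makes toy_count_matches(p, 3) return false, matching the "bail
out with >3 refcounts" case; anything arriving after the check has to be
excluded by other means (the page lock, per the patch comment), which this
sketch does not model.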