From: David Stevens
Date: Tue, 7 Feb 2023 10:37:06 +0900
Subject: Re: [PATCH v2] mm/khugepaged: skip shmem with userfaultfd
To: Matthew Wilcox
Cc: Peter Xu, linux-mm@kvack.org, Andrew Morton, "Kirill A . Shutemov",
 Yang Shi, David Hildenbrand, Hugh Dickins, linux-kernel@vger.kernel.org
References: <20230206112856.1802547-1-stevensd@google.com>

On Tue, Feb 7, 2023 at 6:50 AM Matthew Wilcox wrote:
>
> On Mon, Feb 06, 2023 at 03:52:19PM -0500, Peter Xu wrote:
> > On Mon, Feb 06, 2023 at 07:01:39PM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 06, 2023 at 08:28:56PM +0900, David Stevens wrote:
> > > > This change first makes sure that the intermediate page cache state
> > > > during collapse is not visible by moving when gaps are filled to after
> > > > the page cache lock is acquired for the final time. This is necessary
> > > > because the synchronization provided by locking hpage is insufficient
> > > > for functions which operate on the page cache without actually locking
> > > > individual pages to examine their content (e.g. shmem_mfill_atomic_pte).
> > >
> > > I've been a little scared of touching khugepaged because, well, look at
> > > that function. But if we are going to touch it, how about this patch
> > > first? It does _part_ of what you need by not filling in the holes,
> > > but obviously not the part that looks at uffd.
> > >
> > > It leaves the old pages in-place and frozen. I think this should be
> > > safe, but I haven't booted it (not entirely sure what test I'd run
> > > to prove that it's not broken)
> >
> > That logic existed since Kirill's original commit to add shmem thp support
> > on khugepaged, so Kirill should be the best to tell.. but so far it seems
> > reasonable to me to have that extra operation.
> >
> > The problem is khugepaged will release pgtable lock during collapsing, so
> > AFAICT there can be a race where some other thread tries to insert pages
> > into page cache in parallel with khugepaged right after khugepaged released
> > the page cache lock.
> >
> > For example, it seems to me new page cache can be inserted when khugepaged
> > is copying small page content to the new hpage.

This particular race can't happen with either patch, since the missing
page cache entries are filled when we create the multi-index entry for
hpage.

> Mmm, yes, we need to have _something_ in the page cache to block new
> pages from being added. It can be either the new or the old pages,
> but it can't be NULL. It could even be a RETRY entry, since that'll
> have the same effect as a frozen page.
>
> But both David's patch and mine are wrong. Not sure what to do for
> David's problem -- maybe it's OK to have the holes temporarily filled
> with frozen / RETRY entries until we get to the point where we check
> for an uffd marker?

My patch re-counts the holes after acquiring the page cache lock for
the final time, right before creating the final hpage multi-index
entry. Since we lock present pages while iterating over the target
range, they can't have been truncated before our re-validation of
nr_none. So if the number of missing pages is still equal to nr_none,
then we know that nothing has come along and filled in a missing page.
Compared to adding some sort of marker for missing pages, this does
add another failure path for collapse, but I don't think there is any
race.

-David
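
For illustration only, the re-validation described above can be modeled
in userspace roughly as follows. This is a toy sketch, not the kernel
code: every toy_* name is hypothetical, a NULL slot stands in for a
missing page cache entry, and the page cache lock is only imagined. It
relies on the argument above that present pages are locked, so the hole
count can only shrink and an unchanged count means no hole was filled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a NULL slot stands for a missing page cache entry. */
#define TOY_NR_SLOTS 8

struct toy_range {
	const void *slot[TOY_NR_SLOTS];
};

/* Count the holes (NULL slots) in the target range. */
static size_t toy_count_holes(const struct toy_range *r)
{
	size_t holes = 0;

	for (size_t i = 0; i < TOY_NR_SLOTS; i++)
		if (r->slot[i] == NULL)
			holes++;
	return holes;
}

/*
 * Re-validation step, imagined to run with the page cache lock held for
 * the final time: the collapse may proceed only if the hole count still
 * equals the nr_none value recorded during the earlier scan.
 */
static bool toy_can_collapse(const struct toy_range *r, size_t nr_none)
{
	return toy_count_holes(r) == nr_none;
}
```

If another thread fills one of the holes between the scan and the
re-validation, toy_can_collapse() returns false, which corresponds to
the extra failure path for collapse mentioned above.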