From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C23E2C05027 for ; Mon, 6 Feb 2023 19:01:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 06F5D6B0072; Mon, 6 Feb 2023 14:01:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 01F1D6B0073; Mon, 6 Feb 2023 14:01:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E290C6B0074; Mon, 6 Feb 2023 14:01:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id D04226B0072 for ; Mon, 6 Feb 2023 14:01:48 -0500 (EST) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 4180EA0A65 for ; Mon, 6 Feb 2023 19:01:48 +0000 (UTC) X-FDA: 80437786296.08.67CEC96 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf06.hostedemail.com (Postfix) with ESMTP id D0B3018001E for ; Mon, 6 Feb 2023 19:01:44 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=qu2dKo+W; spf=none (imf06.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675710105; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=oaeAqB8bAOsBTw4haHM9OiDiD9bOnihJ7tlzKd8np0k=; b=KvOJbobCk1TN2Hx24l/oKOKX99ID+EBQdO2YRpY5BH2Pfl5c3TVPz5VpK7ezytXYkUIa67 /IHEZH1dhjeIT1gwf2GyKNuSYbHgRUD2CRGsowqR9o286jaDa3uZsJErCplkDUnoTqm4xb rQvoYB8dxcmL0TesITQ8wk714w2xgyk= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=qu2dKo+W; spf=none (imf06.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675710105; a=rsa-sha256; cv=none; b=jWVmoWBqMa5cs1zYql5uR7BusRPZHkYbM4AQUW12QIgVjhZqbQ11k+w4qylBQt3q4fuAks yGTd8Wn5toYHcuVoCPBa34L6J404yYyaUTVy7jEOcV3dFhm93kppI7uMEh+xZOKX0IXHGD jTXdEq5z4IdCbUw7jIwFO1JBc0JKfw0= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=oaeAqB8bAOsBTw4haHM9OiDiD9bOnihJ7tlzKd8np0k=; b=qu2dKo+WuYPM/VeCSrvfqsGk2G L7iUFottcUkJkEPlds5LQxVic7gyxsaqcBdQX8io+a9cylFcsAawCeqTv8HnffTZK0yS76Fqnha47 Myh9E6FxPxlbBMotZmRoy2Z10URkmSQ67riBYuHogySAOCusWfUlnZhB4gbb0IhPk1CTJq1OknU7B XVJ1XZWx1Dg54+DwgKi12VV8oRE/LybiuH8ZL/DV+/MmwzkHxFCs6ovzNIYOw7bFi2tMxNINdjzXc eCfPysCANlkkRVcCmplgpYPRweIAhCF+VtiFRUiXj4gLSRV3UeqET4KSoVcFbh3xGfg4zAQjYBCl9 IY29578g==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1pP6kF-00H0bj-Pw; Mon, 06 Feb 2023 19:01:39 +0000 Date: Mon, 6 Feb 2023 19:01:39 +0000 From: Matthew Wilcox To: David Stevens Cc: linux-mm@kvack.org, Peter Xu , Andrew Morton , "Kirill A . Shutemov" , Yang Shi , David Hildenbrand , Hugh Dickins , linux-kernel@vger.kernel.org Subject: Re: [PATCH v2] mm/khugepaged: skip shmem with userfaultfd Message-ID: References: <20230206112856.1802547-1-stevensd@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230206112856.1802547-1-stevensd@google.com> X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: naxa17nj4tja6pjw17rqtyfhmn6tk48d X-Rspamd-Queue-Id: D0B3018001E X-HE-Tag: 1675710104-570167 X-HE-Meta: U2FsdGVkX1+m6Th376jNzTvnXVjL3jtf3j99q/mYw7o3AMDjSDzhKI+rQi9S4LBPVv6hCAOOPKjAtERfWQmFlEQgCxvhWcHhl+V4autKlPSKHlRJXKz6kj6i/Svxzj9ihLNFVN6eZ2JBLuglmUK6vMCLu0QC0f3vQSa1eBKApOGLbt5bCnRV4lhDo84zUNBW+Eo4Yb13q/TP4h9tJfQIM4oAWyrtNUVJfkNq11Vy91hpxfrYrhfyoKIg1VIg1ifYTUP42o1gikjqu8kxcoFd4aEqePJHk7HsEsPlktlOP6G3sFJA8uJwjh63WAhOn5EFZOWvSW7kl4X9Zch7p/bA04FbGZggTYMocWdYNOPMNzC+yyiJ+tWe792Debc4VkmIRlIFGqORyhyCmHA5Ofo8Z0pc5anIkO9Aumer4dsXAXZADC/TMXdwwLfHuVwcsMNWnqU0MqAZOk+K5QsVoShcEyRVWCEMo/BXjleP7yv4Xsr0ck8+breCzJeuhpmMH47Z9GzVYAXqwfGGH34L9JjAQLbjW9HP7ua4hnxaunA5Dkc/Qa9/9VdELMLohNNXPFFJB1O31S/hN9bOVV2az/rLqnoich1XjCKVRhy8al8NqYy3mmpmtKtkDT1oo0Du17ZK3r2BTLkyScoFmPEcGo/OtlgaWdvn0lBpM5IvRDyzCBTIzEE2AvcZvUbl4PRFSh38nFY6DmUdxZ66cUOedgNeTqsahVOYtxFhnNDNuH4ZxEqmQcNJYWIe/TZ/fD8r+YPHH60ZfVexnsKK6YPHA+jp8f/scLN6W1dFt+W7yxVlUoSv5G9mDRVBWueCG7KKHZeWRUa0v/6J8prva0WQayMeyBryZU8dK3KCsxPUWvpulbEFuAqTbvAwh2L2PoyQ4k+5Jyec0WiLvDn/kWtaQBGCBGdr9ZQSp7PNZJvt13hTSj1RhRK2tOgtqRFd6qX1+kjKsvhIRZSqPLf7YtEoPgW cNB2cByU fWDt0gZ/9HXBKuBttkAwYgtle9qYVYfWZCQb6Pv5wLshbwzwBb+rdwTpDxjU7DwR2r9mwagSFQPvcEJHKN8b1dWJ/MfMWO23nC8dmNGh1b4blZhfcA0eyUNDLU79wUrwSL9jSevpd02jctNbf3FU8BhSJQA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Feb 06, 2023 at 08:28:56PM +0900, David Stevens wrote: > This change first makes sure that the intermediate page cache state > during collapse is not visible by moving when gaps are filled to after > the page cache lock is acquired for the final time. This is necessary > because the synchronization provided by locking hpage is insufficient > for functions which operate on the page cache without actually locking > individual pages to examine their content (e.g. shmem_mfill_atomic_pte). I've been a little scared of touching khugepaged because, well, look at that function. But if we are going to touch it, how about this patch first? It does _part_ of what you need by not filling in the holes, but obviously not the part that looks at uffd. It leaves the old pages in-place and frozen. I think this should be safe, but I haven't booted it (not entirely sure what test I'd run to prove that it's not broken) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index eb38bd1b1b2f..cfd33dff7253 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1845,15 +1845,14 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, * - allocate and lock a new huge page; * - scan page cache replacing old pages with the new one * + swap/gup in pages if necessary; - * + fill in gaps; + * + freeze the old pages * + keep old pages around in case rollback is required; * - if replacing succeeds: * + copy data over; * + free old pages; * + unlock huge page; * - if replacing failed; - * + put all pages back and unfreeze them; - * + restore gaps in the page cache; + * + unfreeze old pages; * + unlock and free huge page; */ static int collapse_file(struct mm_struct *mm, unsigned long addr, @@ -1930,7 +1929,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, result = SCAN_FAIL; goto xa_locked; } - xas_store(&xas, hpage); nr_none++; continue; } @@ -2081,8 +2079,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, */ list_add_tail(&page->lru, &pagelist); - /* Finally, replace with the new page. */ - xas_store(&xas, hpage); continue; out_unlock: unlock_page(page); @@ -2195,32 +2191,17 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, shmem_uncharge(mapping->host, nr_none); } - xas_set(&xas, start); - xas_for_each(&xas, page, end - 1) { + list_for_each_entry_safe(page, tmp, &pagelist, lru) { + list_del(&page->lru); page = list_first_entry_or_null(&pagelist, struct page, lru); - if (!page || xas.xa_index < page->index) { - if (!nr_none) - break; - nr_none--; - /* Put holes back where they were */ - xas_store(&xas, NULL); - continue; - } - - VM_BUG_ON_PAGE(page->index != xas.xa_index, page); /* Unfreeze the page. */ list_del(&page->lru); page_ref_unfreeze(page, 2); - xas_store(&xas, page); - xas_pause(&xas); - xas_unlock_irq(&xas); unlock_page(page); putback_lru_page(page); - xas_lock_irq(&xas); } - VM_BUG_ON(nr_none); /* * Undo the updates of filemap_nr_thps_inc for non-SHMEM file only. * This undo is not needed unless failure is due to SCAN_COPY_MC.