From: Yang Shi <shy828301@gmail.com>
Date: Wed, 1 Feb 2023 09:36:37 -0800
Subject: Re: [PATCH] mm/khugepaged: skip shmem with armed userfaultfd
To: David Stevens, Peter Xu, David Hildenbrand, "Kirill A. Shutemov"
Cc: linux-mm@kvack.org, Andrew Morton, linux-kernel@vger.kernel.org
In-Reply-To: <20230201034137.2463113-1-stevensd@google.com>
On Tue, Jan 31, 2023 at 7:42 PM David Stevens wrote:
>
> From: David Stevens
>
> Collapsing memory in a vma that has an armed userfaultfd results in
> zero-filling any missing pages, which breaks user-space paging for those
> filled pages.
> Avoid khugepaged bypassing userfaultfd by not collapsing
> pages in shmem reached via scanning a vma with an armed userfaultfd if
> doing so would zero-fill any pages.
>
> Signed-off-by: David Stevens
> ---
>  mm/khugepaged.c | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 79be13133322..48e944fb8972 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1736,8 +1736,8 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
>   * + restore gaps in the page cache;
>   * + unlock and free huge page;
>   */
> -static int collapse_file(struct mm_struct *mm, unsigned long addr,
> -                        struct file *file, pgoff_t start,
> +static int collapse_file(struct mm_struct *mm, struct vm_area_struct *vma,
> +                        unsigned long addr, struct file *file, pgoff_t start,
>                          struct collapse_control *cc)
>  {
>         struct address_space *mapping = file->f_mapping;
> @@ -1784,6 +1784,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>          * be able to map it or use it in another way until we unlock it.
>          */
>
> +       if (is_shmem)
> +               mmap_read_lock(mm);

If you release mmap_lock and then reacquire it here, the vma is not
trusted anymore; it is not safe to use it. Since you already read
uffd_was_armed before releasing mmap_lock, you could pass it directly to
collapse_file w/o dereferencing the vma again. The worst case is a false
positive (the vma is not userfaultfd-armed anymore), but that should be
fine: khugepaged could collapse this area in the next round.

Also +userfaultfd folks.
> +
>         xas_set(&xas, start);
>         for (index = start; index < end; index++) {
>                 struct page *page = xas_next(&xas);
>
>                 VM_BUG_ON(index != xas.xa_index);
>                 if (is_shmem) {
>                         if (!page) {
> +                               if (userfaultfd_armed(vma)) {
> +                                       result = SCAN_EXCEED_NONE_PTE;
> +                                       goto xa_locked;
> +                               }
>                                 /*
>                                  * Stop if extent has been truncated or
>                                  * hole-punched, and is now completely
> @@ -2095,6 +2102,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 hpage->mapping = NULL;
>         }
>
> +       if (is_shmem)
> +               mmap_read_unlock(mm);
>         if (hpage)
>                 unlock_page(hpage);
> out:
> @@ -2108,8 +2117,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>         return result;
>  }
>
> -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
> -                                   struct file *file, pgoff_t start,
> +static int hpage_collapse_scan_file(struct mm_struct *mm, struct vm_area_struct *vma,
> +                                   unsigned long addr, struct file *file, pgoff_t start,
>                                     struct collapse_control *cc)
>  {
>         struct page *page = NULL;
> @@ -2118,6 +2127,9 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>         int present, swap;
>         int node = NUMA_NO_NODE;
>         int result = SCAN_SUCCEED;
> +       bool uffd_was_armed = userfaultfd_armed(vma);
> +
> +       mmap_read_unlock(mm);
>
>         present = 0;
>         swap = 0;
> @@ -2193,13 +2205,16 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>         }
>         rcu_read_unlock();
>
> +       if (uffd_was_armed && present < HPAGE_PMD_NR)
> +               result = SCAN_EXCEED_SWAP_PTE;
> +
>         if (result == SCAN_SUCCEED) {
>                 if (cc->is_khugepaged &&
>                     present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
>                         result = SCAN_EXCEED_NONE_PTE;
>                         count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>                 } else {
> -                       result = collapse_file(mm, addr, file, start, cc);
> +                       result = collapse_file(mm, vma, addr, file, start, cc);
>                 }
>         }
>
> @@ -2207,8 +2222,8 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>         return result;
>  }
>  #else
> -static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
> -                                   struct file *file, pgoff_t start,
> +static int hpage_collapse_scan_file(struct mm_struct *mm, struct vm_area_struct *vma,
> +                                   unsigned long addr, struct file *file, pgoff_t start,
>                                     struct collapse_control *cc)
>  {
>         BUILD_BUG();
> @@ -2304,8 +2319,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>                         pgoff_t pgoff = linear_page_index(vma,
>                                         khugepaged_scan.address);
>
> -                       mmap_read_unlock(mm);
> -                       *result = hpage_collapse_scan_file(mm,
> +                       *result = hpage_collapse_scan_file(mm, vma,
>                                         khugepaged_scan.address,
>                                         file, pgoff, cc);
>                         mmap_locked = false;
> @@ -2656,9 +2670,8 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
>                 struct file *file = get_file(vma->vm_file);
>                 pgoff_t pgoff = linear_page_index(vma, addr);
>
> -               mmap_read_unlock(mm);
>                 mmap_locked = false;
> -               result = hpage_collapse_scan_file(mm, vma, addr, file, pgoff,
> +               result = hpage_collapse_scan_file(mm, vma, addr, file, pgoff,
>                                                   cc);
>                 fput(file);
>         } else {
> --
> 2.39.1.456.gfc5497dd1b-goog