From: James Houghton
Date: Wed, 2 Apr 2025 12:04:48 -0700
Subject: Re: [PATCH v2 1/5] mm: userfaultfd: generic continue for non hugetlbfs
To: Nikita Kalyazin
Cc: akpm@linux-foundation.org, pbonzini@redhat.com, shuah@kernel.org,
    kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    lorenzo.stoakes@oracle.com, david@redhat.com, ryan.roberts@arm.com,
    quic_eberman@quicinc.com, peterx@redhat.com, graf@amazon.de,
    jgowans@amazon.com, roypat@amazon.co.uk, derekmn@amazon.com,
    nsaenz@amazon.es, xmarcalx@amazon.com
References: <20250402160721.97596-1-kalyazin@amazon.com>
    <20250402160721.97596-2-kalyazin@amazon.com>
In-Reply-To: <20250402160721.97596-2-kalyazin@amazon.com>

On Wed, Apr 2, 2025 at 9:07 AM Nikita Kalyazin wrote:
>
> Remove shmem-specific code from UFFDIO_CONTINUE implementation for
> non-huge pages by calling vm_ops->fault().  A new VMF flag,
> FAULT_FLAG_NO_USERFAULT_MINOR, is introduced to avoid recursive call to
> handle_userfault().
>
> Signed-off-by: Nikita Kalyazin
> ---
>  include/linux/mm_types.h |  3 +++
>  mm/hugetlb.c             |  2 +-
>  mm/shmem.c               |  3 ++-
>  mm/userfaultfd.c         | 25 ++++++++++++++++++-------
>  4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 0234f14f2aa6..91a00f2cd565 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1429,6 +1429,8 @@ enum tlb_flush_reason {
>   * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
>   *                        We should only access orig_pte if this flag set.
>   * @FAULT_FLAG_VMA_LOCK: The fault is handled under VMA lock.
> + * @FAULT_FLAG_NO_USERFAULT_MINOR: The fault handler must not call userfaultfd
> + *                                 minor handler.

Perhaps instead of a flag that says to avoid the userfaultfd minor fault
handler, we should have a flag to indicate that vm_ops->fault() has been
called by UFFDIO_CONTINUE. See below.

>   *
>   * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
>   * whether we would allow page faults to retry by specifying these two
> @@ -1467,6 +1469,7 @@ enum fault_flag {
>         FAULT_FLAG_UNSHARE =            1 << 10,
>         FAULT_FLAG_ORIG_PTE_VALID =     1 << 11,
>         FAULT_FLAG_VMA_LOCK =           1 << 12,
> +       FAULT_FLAG_NO_USERFAULT_MINOR = 1 << 13,
>  };
>
>  typedef unsigned int __bitwise zap_flags_t;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 97930d44d460..ba90d48144fc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6228,7 +6228,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>                 }
>
>                 /* Check for page in userfault range. */
> -               if (userfaultfd_minor(vma)) {
> +               if (userfaultfd_minor(vma) && !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
>                         folio_unlock(folio);
>                         folio_put(folio);
>                         /* See comment in userfaultfd_missing() block above */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1ede0800e846..5e1911e39dec 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2467,7 +2467,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>         fault_mm = vma ? vma->vm_mm : NULL;
>
>         folio = filemap_get_entry(inode->i_mapping, index);
> -       if (folio && vma && userfaultfd_minor(vma)) {
> +       if (folio && vma && userfaultfd_minor(vma) &&
> +           !(vmf->flags & FAULT_FLAG_NO_USERFAULT_MINOR)) {
>                 if (!xa_is_value(folio))
>                         folio_put(folio);
>                 *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index d06453fa8aba..68a995216789 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -386,24 +386,35 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
>                                      unsigned long dst_addr,
>                                      uffd_flags_t flags)
>  {
> -       struct inode *inode = file_inode(dst_vma->vm_file);
> -       pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
>         struct folio *folio;
>         struct page *page;
>         int ret;
> +       struct vm_fault vmf = {
> +               .vma = dst_vma,
> +               .address = dst_addr,
> +               .flags = FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE |
> +                   FAULT_FLAG_NO_USERFAULT_MINOR,
> +               .pte = NULL,
> +               .page = NULL,
> +               .pgoff = linear_page_index(dst_vma, dst_addr),
> +       };
> +
> +       if (!dst_vma->vm_ops || !dst_vma->vm_ops->fault)
> +               return -EINVAL;
>
> -       ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
> -       /* Our caller expects us to return -EFAULT if we failed to find folio */
> -       if (ret == -ENOENT)
> +       ret = dst_vma->vm_ops->fault(&vmf);

shmem_get_folio() was being called with SGP_NOALLOC, and now it is being
called with SGP_CACHE (by shmem_fault()).
This will result in a UAPI change: UFFDIO_CONTINUE for a VA without a
page in the page cache should result in EFAULT, but now the page will be
allocated. SGP_NOALLOC was carefully chosen[1], so I think a better way
to do this will be to:

1. Have a FAULT_FLAG_USERFAULT_CONTINUE (or something)
2. In shmem_fault(), if FAULT_FLAG_USERFAULT_CONTINUE, use SGP_NOALLOC
   instead of SGP_CACHE (and make sure not to drop into
   handle_userfault(), of course)

[1]: https://lore.kernel.org/linux-mm/20220610173812.1768919-1-axelrasmussen@google.com/

> +       if (ret & VM_FAULT_ERROR) {
>                 ret = -EFAULT;
> -       if (ret)
>                 goto out;
> +       }
> +
> +       page = vmf.page;
> +       folio = page_folio(page);
>         if (!folio) {

What if ret == VM_FAULT_RETRY? I think we should retry instead of
returning -EFAULT.

And I'm not sure how VM_FAULT_NOPAGE should be handled, like if we
need special logic for it or not.

>                 ret = -EFAULT;
>                 goto out;
>         }
>
> -       page = folio_file_page(folio, pgoff);
>         if (PageHWPoison(page)) {
>                 ret = -EIO;
>                 goto out_release;
> --
> 2.47.1
>
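
To make the two review comments above a bit more concrete, here is a
rough userspace model in C. It is not kernel code and not a proposed
patch. Everything prefixed model_ or MODEL_ is hypothetical and only
mimics the semantics under discussion: a continue-style flag selects a
NOALLOC-like lookup so that a missing page still yields -EFAULT, while
an ordinary fault allocates. Looping on a retry result is just one
possible handling and is an assumption here; VM_FAULT_NOPAGE is left out
because its handling is still an open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 4

/* Toy stand-ins for SGP_CACHE / SGP_NOALLOC; not the kernel's definitions. */
enum model_sgp { MODEL_SGP_CACHE, MODEL_SGP_NOALLOC };

/* Hypothetical flag playing the role of FAULT_FLAG_USERFAULT_CONTINUE. */
#define MODEL_FLAG_USERFAULT_CONTINUE (1u << 0)

/* Toy fault results; the bit values are illustrative only. */
#define MODEL_VM_FAULT_SIGBUS (1u << 0)	/* stands in for the error bits */
#define MODEL_VM_FAULT_RETRY  (1u << 1)

/* Toy page cache: one "present" bit per index, plus a retry budget. */
struct model_cache {
	bool present[MODEL_NPAGES];
	int retries_left;	/* number of faults that will report RETRY */
};

/* Lookup that allocates on a miss only in CACHE mode. */
static int model_get_folio(struct model_cache *c, size_t idx,
			   enum model_sgp sgp)
{
	assert(c != NULL);
	if (idx >= MODEL_NPAGES)
		return -EINVAL;
	if (c->present[idx])
		return 0;
	if (sgp == MODEL_SGP_NOALLOC)
		return -ENOENT;		/* miss, and we must not allocate */
	c->present[idx] = true;		/* CACHE mode: allocate on miss */
	return 0;
}

/* Fault handler model: the continue flag switches the lookup mode. */
static unsigned int model_fault(struct model_cache *c, size_t idx,
				unsigned int flags)
{
	enum model_sgp sgp = (flags & MODEL_FLAG_USERFAULT_CONTINUE) ?
			     MODEL_SGP_NOALLOC : MODEL_SGP_CACHE;

	assert(c != NULL);
	if (c->retries_left > 0) {
		c->retries_left--;
		return MODEL_VM_FAULT_RETRY;
	}
	return model_get_folio(c, idx, sgp) ? MODEL_VM_FAULT_SIGBUS : 0;
}

/* Continue path model: retry on RETRY, map errors to -EFAULT. */
static int model_continue(struct model_cache *c, size_t idx)
{
	unsigned int ret;

	do {
		ret = model_fault(c, idx, MODEL_FLAG_USERFAULT_CONTINUE);
	} while (ret & MODEL_VM_FAULT_RETRY);

	return (ret & MODEL_VM_FAULT_SIGBUS) ? -EFAULT : 0;
}
```

In this model, calling model_continue() on an index whose page is
missing returns -EFAULT and leaves the cache untouched, whereas
model_fault() with no flags on the same index populates it, which is the
distinction the SGP_NOALLOC comment is about.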