From: Barry Song <21cnbao@gmail.com>
Date: Tue, 27 May 2025 16:17:12 +1200
Subject: Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs concurrently with swap-out
To: David Hildenbrand
Cc: Peter Xu, Suren Baghdasaryan, Lokesh Gidra, Andrea Arcangeli, Andrew Morton, Linux-MM, Kairui Song, LKML
References: <5abe8b0c-2354-4107-9004-ccf86cf90d25@redhat.com>
In-Reply-To: <5abe8b0c-2354-4107-9004-ccf86cf90d25@redhat.com>

On Tue, May 27, 2025 at 12:39 AM David Hildenbrand wrote:
>
> On 23.05.25 01:23, Barry Song wrote:
> > Hi All,
>
> Hi!
>
> >
> > I'm encountering another bug that can be easily reproduced using the small
> > program below[1], which performs swap-out and swap-in in parallel.
> >
> > The issue occurs when a folio is being swapped out while it is accessed
> > concurrently. In this case, do_swap_page() handles the access. However,
> > because the folio is under writeback, do_swap_page() completely removes
> > its exclusive attribute.
> >
> > do_swap_page:
> >                } else if (exclusive && folio_test_writeback(folio) &&
> >                           data_race(si->flags & SWP_STABLE_WRITES)) {
> >                        ...
> >                        exclusive = false;
> >
> > As a result, userfaultfd_move() will return -EBUSY, even though the
> > folio is not shared and is in fact exclusively owned.
> >
> >                         folio = vm_normal_folio(src_vma, src_addr, orig_src_pte);
> >                         if (!folio || !PageAnonExclusive(&folio->page)) {
> >                                 spin_unlock(src_ptl);
> > +                               pr_err("%s %d folio:%lx exclusive:%d swapcache:%d\n",
> > +                                       __func__, __LINE__, folio, PageAnonExclusive(&folio->page),
> > +                                       folio_test_swapcache(folio));
> >                                 err = -EBUSY;
> >                                 goto out;
> >                         }
> >
> > I understand that shared folios should not be moved. However, in this
> > case, the folio is not shared, yet its exclusive flag is not set.
> >
> > Therefore, I believe PageAnonExclusive is not a reliable indicator of
> > whether a folio is truly exclusive to a process.
>
> It is. The flag *not* being set is not a reliable indicator whether it
> is really shared. ;)
>
> The reason why we have this PAE workaround (dropping the flag) in place
> is because the page must not be written to (SWP_STABLE_WRITES). CoW
> reuse is not possible.
>
> uffd moving that page -- and in that same process setting it writable,
> see move_present_pte()->pte_mkwrite() -- would be very bad.

An alternative approach is to make the folio writable only when we are
reasonably certain it is exclusive; otherwise, it remains read-only. If the
destination is later written to and the folio has become exclusive, it can
be reused directly. If not, a copy-on-write will occur on the destination
address, transparently to userspace. This avoids Lokesh's userspace-based
strategy, which requires forcing a write to the source address.
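
To make the alternative above concrete, here is a toy userspace model of the
policy. It is only a sketch, not kernel code: struct toy_folio, struct toy_pte,
the single_mapper field and both toy_* functions are made-up names for
illustration, and the real write-fault path checks considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT kernel structures. */
struct toy_folio {
	bool anon_exclusive;  /* models the PageAnonExclusive flag */
	bool under_writeback; /* models writeback with SWP_STABLE_WRITES */
	bool single_mapper;   /* hypothetical: only one process references it */
};

struct toy_pte {
	bool present;
	bool writable;
};

enum toy_fault_action { TOY_REUSE, TOY_COPY };

/*
 * Move step: always install the destination PTE, but make it writable
 * only when the exclusive flag is already set on the folio.
 */
static struct toy_pte toy_move_present_pte(const struct toy_folio *folio)
{
	struct toy_pte dst = { .present = true, .writable = false };

	assert(folio != NULL);
	if (folio->anon_exclusive)
		dst.writable = true;
	return dst;
}

/*
 * Later write to the destination: reuse the folio if it is (or has
 * become) exclusive and may be written, otherwise fall back to a copy.
 */
static enum toy_fault_action toy_write_fault(struct toy_folio *folio,
					     struct toy_pte *dst)
{
	assert(folio != NULL && dst != NULL && dst->present);

	if (dst->writable)
		return TOY_REUSE; /* already writable, nothing to resolve */

	if (folio->anon_exclusive ||
	    (folio->single_mapper && !folio->under_writeback)) {
		folio->anon_exclusive = true;
		dst->writable = true;
		return TOY_REUSE;
	}

	/* Still under writeback or possibly shared: copy-on-write. */
	dst->writable = true; /* dst now maps a private copy */
	return TOY_COPY;
}
```

In this model a move during writeback succeeds with a read-only destination,
and the cost of resolving exclusivity is deferred to the first write, where it
ends in either a reuse or a copy without userspace noticing.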

>
> >
> > The kernel log output is shown below:
> > [ 23.009516] move_pages_pte 1285 folio:fffffdffc01bba40 exclusive:0 swapcache:1
> >
> > I'm still struggling to find a real fix; it seems quite challenging.
>
> PAE tells you that you can immediately write to that page without going
> through CoW. However, here, CoW is required.
>
> > Please let me know if you have any ideas. In any case it seems
> > userspace should fall back to userfaultfd_copy.
>
> We could try detecting whether the page is now exclusive, to reset PAE.
> That will only be possible after writeback completed, so it adds
> complexity without being able to move the page in all cases (during
> writeback).
>
> Letting userspace deal with that in these rare scenarios is
> significantly easier.

Right, this appears to introduce the least change (essentially none) to the
kernel, while shifting more noise to userspace :-)

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
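
P.S. For the userspace fallback discussed above, the decision could look
roughly like the sketch below. It assumes the caller has already issued the
UFFDIO_MOVE ioctl and converted its outcome to 0 or a negative errno; the enum
and the function name are hypothetical, and only the -EBUSY case from this
thread is treated as a reason to retry with UFFDIO_COPY.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum move_next_step {
	STEP_DONE,             /* the move succeeded, nothing left to do */
	STEP_FALLBACK_TO_COPY, /* retry the same range with UFFDIO_COPY */
	STEP_REPORT_ERROR,     /* any other failure goes back to the caller */
};

/*
 * Decide what to do after a UFFDIO_MOVE attempt.
 * move_err is 0 on success or a negative errno value on failure.
 */
static enum move_next_step next_step_after_move(int move_err)
{
	assert(move_err <= 0);

	if (move_err == 0)
		return STEP_DONE;

	/*
	 * -EBUSY is what the move returns when the source folio does not
	 * have its exclusive flag set, for example while it is under
	 * writeback. Copying instead of moving sidesteps that check.
	 */
	if (move_err == -EBUSY)
		return STEP_FALLBACK_TO_COPY;

	return STEP_REPORT_ERROR;
}
```

Note that a copy leaves the source range mapped, unlike a move, so a caller
relying on move semantics would still have to drop the source pages itself.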