From: Lokesh Gidra
Date: Thu, 22 May 2025 16:43:50 -0700
Subject: Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs concurrently with swap-out
To: Barry Song <21cnbao@gmail.com>
Cc: Peter Xu, David Hildenbrand, Suren Baghdasaryan, Andrea Arcangeli, Andrew Morton, Linux-MM, Kairui Song, LKML

Thanks Barry for stress testing MOVE ioctl. It's really helpful :)

On Thu, May 22, 2025 at 4:23 PM Barry Song <21cnbao@gmail.com> wrote:
>
> Hi All,
>
> I'm encountering another bug that can be easily reproduced using the small
> program below[1], which performs swap-out and swap-in in parallel.
>
> The issue occurs when a folio is being swapped out while it is accessed
> concurrently. In this case, do_swap_page() handles the access.
> However, because the folio is under writeback, do_swap_page() completely
> removes its exclusive attribute.
>
> do_swap_page:
>         } else if (exclusive && folio_test_writeback(folio) &&
>                    data_race(si->flags & SWP_STABLE_WRITES)) {
>                 ...
>                 exclusive = false;
>
> As a result, userfaultfd_move() will return -EBUSY, even though the
> folio is not shared and is in fact exclusively owned.
>
>         folio = vm_normal_folio(src_vma, src_addr, orig_src_pte);
>         if (!folio || !PageAnonExclusive(&folio->page)) {
>                 spin_unlock(src_ptl);
> +               pr_err("%s %d folio:%lx exclusive:%d swapcache:%d\n",
> +                      __func__, __LINE__, folio,
> +                      PageAnonExclusive(&folio->page),
> +                      folio_test_swapcache(folio));
>                 err = -EBUSY;
>                 goto out;
>         }
>
> I understand that shared folios should not be moved. However, in this
> case, the folio is not shared, yet its exclusive flag is not set.
>
> Therefore, I believe PageAnonExclusive is not a reliable indicator of
> whether a folio is truly exclusive to a process.
>
> The kernel log output is shown below:
> [ 23.009516] move_pages_pte 1285 folio:fffffdffc01bba40 exclusive:0 swapcache:1
>
> I'm still struggling to find a real fix; it seems quite challenging.
> Please let me know if you have any ideas. In any case, it seems
> userspace should fall back to userfaultfd_copy.
>

I'm not sure this is really a bug. A page under write-back is, in a
way, 'busy', isn't it? I am not an expert on anon-exclusive, but it
seems to me that an exclusively mapped anonymous page would have the
flag set. So isn't it expected that a page under write-back will not
have it set, since the page isn't mapped?

I have observed this in my testing as well, and there are a couple of
ways to deal with it in userspace. As you suggested, falling back to
userfaultfd_copy on receiving -EBUSY is one option. In my case, making
a fake store on the src page and then retrying has been working fine.
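For illustration, the two userspace workarounds mentioned above could be
combined roughly as follows. This is only a sketch under stated
assumptions, not tested against the reproducer: move_page_with_fallback()
and MOVE_RETRIES are hypothetical names, the retry bound is arbitrary,
the fake store assumes no other thread is writing that byte at the same
time, and the UFFDIO_MOVE fallback definition is copied from the program
quoted below for headers that predate the ioctl.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Fallback for older headers, same layout as in the quoted reproducer. */
#ifndef UFFDIO_MOVE
struct uffdio_move {
    __u64 dst;
    __u64 src;
    __u64 len;
    __u64 mode;
    __s64 move;
};
#define _UFFDIO_MOVE (0x05)
#define UFFDIO_MOVE _IOWR(UFFDIO, _UFFDIO_MOVE, struct uffdio_move)
#endif

/* Hypothetical bound on MOVE attempts before giving up on moving. */
#define MOVE_RETRIES 3

/*
 * Try to move one page from src to dst. On -EBUSY, make a fake store
 * to the source page and retry; after MOVE_RETRIES attempts, fall back
 * to UFFDIO_COPY. Returns 0 on success, -1 with errno set otherwise.
 * Note that after a COPY fallback the source page stays populated.
 */
int move_page_with_fallback(int uffd, void *dst, void *src, size_t page_size)
{
    assert(page_size > 0);

    for (int attempt = 0; attempt < MOVE_RETRIES; attempt++) {
        struct uffdio_move move = {
            .dst = (unsigned long)dst,
            .src = (unsigned long)src,
            .len = page_size,
            .mode = 0,
        };

        if (ioctl(uffd, UFFDIO_MOVE, &move) == 0)
            return 0;
        if (errno != EBUSY)
            return -1;  /* other errors are left to the caller */

        /* Fake store: rewrite the first byte with its own value. */
        volatile char *p = src;
        *p = *p;
    }

    struct uffdio_copy copy = {
        .dst = (unsigned long)dst,
        .src = (unsigned long)src,
        .len = page_size,
        .mode = 0,
    };

    return ioctl(uffd, UFFDIO_COPY, &copy) == 0 ? 0 : -1;
}
```

In the fault handler of the reproducer, the call would replace the bare
UFFDIO_MOVE ioctl, e.g. move_page_with_fallback(uffd, (void *)move.dst,
(void *)move.src, PAGE_SIZE). Whether the COPY fallback is acceptable
depends on whether the caller relies on the source page being emptied.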
>
>
> [1] The small program:
>
> //Just in a couple of seconds, we are running into
> //"UFFDIO_MOVE: Device or resource busy"
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <linux/userfaultfd.h>
> #include <poll.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #define PAGE_SIZE 4096
> #define REGION_SIZE (512 * 1024)
>
> #ifndef UFFDIO_MOVE
> struct uffdio_move {
>     __u64 dst;
>     __u64 src;
>     __u64 len;
>     #define UFFDIO_MOVE_MODE_DONTWAKE        ((__u64)1<<0)
>     #define UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES ((__u64)1<<1)
>     __u64 mode;
>     __s64 move;
> };
>
> #define _UFFDIO_MOVE (0x05)
> #define UFFDIO_MOVE _IOWR(UFFDIO, _UFFDIO_MOVE, struct uffdio_move)
> #endif
>
>
> void *src, *dst;
> int uffd;
>
> void *madvise_thread(void *arg) {
>     for (size_t i = 0; i < REGION_SIZE; i += PAGE_SIZE) {
>         madvise(src + i, PAGE_SIZE, MADV_PAGEOUT);
>         usleep(100);
>     }
>     return NULL;
> }
>
> void *swapin_thread(void *arg) {
>     volatile char dummy;
>     for (size_t i = 0; i < REGION_SIZE; i += PAGE_SIZE) {
>         dummy = ((char *)src)[i];
>         usleep(100);
>     }
>     return NULL;
> }
>
>
> void *fault_handler_thread(void *arg) {
>
>     struct uffd_msg msg;
>     struct uffdio_move move;
>     struct pollfd pollfd = { .fd = uffd, .events = POLLIN };
>     pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
>     pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
>
>     while (1) {
>         if (poll(&pollfd, 1, -1) == -1) {
>             perror("poll");
>             exit(EXIT_FAILURE);
>         }
>
>         if (read(uffd, &msg, sizeof(msg)) <= 0) {
>             perror("read");
>             exit(EXIT_FAILURE);
>         }
>
>
>         if (msg.event != UFFD_EVENT_PAGEFAULT) {
>             fprintf(stderr, "Unexpected event\n");
>             exit(EXIT_FAILURE);
>         }
>
>         move.src = (unsigned long)src +
>                    (msg.arg.pagefault.address - (unsigned long)dst);
>         move.dst = msg.arg.pagefault.address & ~(PAGE_SIZE - 1);
>         move.len = PAGE_SIZE;
>         move.mode = 0;
>
>         if (ioctl(uffd, UFFDIO_MOVE, &move) == -1) {
>             perror("UFFDIO_MOVE");
>             exit(EXIT_FAILURE);
>         }
>     }
>     return NULL;
> }
>
> int main() {
> again:
>     pthread_t thr, madv_thr, swapin_thr;
>     struct uffdio_api uffdio_api = { .api = UFFD_API, .features = 0 };
>     struct uffdio_register uffdio_register;
>
>     src = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
>                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
>     if (src == MAP_FAILED) {
>         perror("mmap src");
>         exit(EXIT_FAILURE);
>     }
>
>     memset(src, 1, REGION_SIZE);
>
>     dst = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
>                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
>     if (dst == MAP_FAILED) {
>         perror("mmap dst");
>         exit(EXIT_FAILURE);
>     }
>
>
>     uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>     if (uffd == -1) {
>         perror("userfaultfd");
>         exit(EXIT_FAILURE);
>     }
>
>
>     if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
>         perror("UFFDIO_API");
>         exit(EXIT_FAILURE);
>     }
>
>     uffdio_register.range.start = (unsigned long)dst;
>     uffdio_register.range.len = REGION_SIZE;
>     uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
>
>     if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
>         perror("UFFDIO_REGISTER");
>         exit(EXIT_FAILURE);
>
>     }
>
>     if (pthread_create(&madv_thr, NULL, madvise_thread, NULL) != 0) {
>         perror("pthread_create madvise_thread");
>         exit(EXIT_FAILURE);
>     }
>
>     if (pthread_create(&swapin_thr, NULL, swapin_thread, NULL) != 0) {
>         perror("pthread_create swapin_thread");
>         exit(EXIT_FAILURE);
>     }
>
>     if (pthread_create(&thr, NULL, fault_handler_thread, NULL) != 0) {
>         perror("pthread_create fault_handler_thread");
>         exit(EXIT_FAILURE);
>     }
>
>     for (size_t i = 0; i < REGION_SIZE; i += PAGE_SIZE) {
>         char val = ((char *)dst)[i];
>         printf("Accessing dst at offset %zu, value: %d\n", i, val);
>     }
>
>     pthread_join(madv_thr, NULL);
>     pthread_join(swapin_thr, NULL);
>     pthread_cancel(thr);
>     pthread_join(thr, NULL);
>     munmap(src, REGION_SIZE);
>     munmap(dst, REGION_SIZE);
>     close(uffd);
>     goto again;
>
>     return 0;
> }
>
> Thanks
> Barry