From: David Hildenbrand
Date: Tue, 27 May 2025 11:00:03 +0200
Subject: Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs concurrently with swap-out
To: Barry Song <21cnbao@gmail.com>
Cc: aarcange@redhat.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lokeshgidra@google.com, peterx@redhat.com, ryncsn@gmail.com, surenb@google.com
In-Reply-To: <20250527083722.27309-1-21cnbao@gmail.com>
Organization: Red Hat

On 27.05.25 10:37, Barry Song wrote:
> On Tue, May 27, 2025 at 4:17 PM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Tue, May 27, 2025 at 12:39 AM David Hildenbrand wrote:
>>>
>>> On 23.05.25 01:23, Barry Song wrote:
>>>> Hi All,
>>>
>>> Hi!
>>>
>>>>
>>>> I'm encountering another bug that can be easily reproduced using the small
>>>> program below[1], which performs swap-out and swap-in in parallel.
>>>>
>>>> The issue occurs when a folio is being swapped out while it is accessed
>>>> concurrently. In this case, do_swap_page() handles the access. However,
>>>> because the folio is under writeback, do_swap_page() completely removes
>>>> its exclusive attribute.
>>>>
>>>> do_swap_page:
>>>>                 } else if (exclusive && folio_test_writeback(folio) &&
>>>>                            data_race(si->flags & SWP_STABLE_WRITES)) {
>>>>                         ...
>>>>                         exclusive = false;
>>>>
>>>> As a result, userfaultfd_move() will return -EBUSY, even though the
>>>> folio is not shared and is in fact exclusively owned.
>>>>
>>>>                         folio = vm_normal_folio(src_vma, src_addr, orig_src_pte);
>>>>                         if (!folio || !PageAnonExclusive(&folio->page)) {
>>>>                                 spin_unlock(src_ptl);
>>>> +                               pr_err("%s %d folio:%lx exclusive:%d swapcache:%d\n",
>>>> +                                       __func__, __LINE__, folio, PageAnonExclusive(&folio->page),
>>>> +                                       folio_test_swapcache(folio));
>>>>                                 err = -EBUSY;
>>>>                                 goto out;
>>>>                         }
>>>>
>>>> I understand that shared folios should not be moved. However, in this
>>>> case, the folio is not shared, yet its exclusive flag is not set.
>>>>
>>>> Therefore, I believe PageAnonExclusive is not a reliable indicator of
>>>> whether a folio is truly exclusive to a process.
>>>
>>> It is. The flag *not* being set is not a reliable indicator whether it
>>> is really shared. ;)
>>>
>>> The reason why we have this PAE workaround (dropping the flag) in place
>>> is because the page must not be written to (SWP_STABLE_WRITES). CoW
>>> reuse is not possible.
>>>
>>> uffd moving that page -- and in that same process setting it writable,
>>> see move_present_pte()->pte_mkwrite() -- would be very bad.
>>
>> An alternative approach is to make the folio writable only when we are
>> reasonably certain it is exclusive; otherwise, it remains read-only. If the
>> destination is later written to and the folio has become exclusive, it can
>> be reused directly. If not, a copy-on-write will occur on the destination
>> address, transparently to userspace. This avoids Lokesh’s userspace-based
>> strategy, which requires forcing a write to the source address.
>
> Conceptually, I mean something like this:
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index bc473ad21202..70eaabf4f1a3 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1047,7 +1047,8 @@ static int move_present_pte(struct mm_struct *mm,
>  	}
>  	if (folio_test_large(src_folio) ||
>  	    folio_maybe_dma_pinned(src_folio) ||
> -	    !PageAnonExclusive(&src_folio->page)) {
> +	    (!PageAnonExclusive(&src_folio->page) &&
> +	     folio_mapcount(src_folio) != 1)) {
>  		err = -EBUSY;
>  		goto out;
>  	}
> @@ -1070,7 +1071,8 @@ static int move_present_pte(struct mm_struct *mm,
>  #endif
>  	if (pte_dirty(orig_src_pte))
>  		orig_dst_pte = pte_mkdirty(orig_dst_pte);
> -	orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
> +	if (PageAnonExclusive(&src_folio->page))
> +		orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
>
>  	set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
>  out:
> @@ -1268,7 +1270,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>  	}
>
>  	folio = vm_normal_folio(src_vma, src_addr, orig_src_pte);
> -	if (!folio || !PageAnonExclusive(&folio->page)) {
> +	if (!folio || (!PageAnonExclusive(&folio->page) &&
> +	    folio_mapcount(folio) != 1)) {
>  		spin_unlock(src_ptl);
>  		err = -EBUSY;
>  		goto out;
>
> I'm not trying to push this approach, unless Lokesh clearly sees that it
> could reduce userspace noise. I'm mainly just curious how we might make
> the fixup transparent to userspace. :-)

And that reveals the exact problem: it's all *very* complicated. :)

... and dangerous when we use the mapcount without a complete
understanding of how it all works.

What we would have to do for a small folio is:

1) Take the folio lock.

2) Make sure there is only this present page table mapping:
   folio_mapcount(folio) == 1

   or better

   !folio_maybe_mapped_shared(folio);

3) Make sure that there are no swap references.

   If in the swapcache, check the actual swapcount.

4) Make sure it is not a KSM folio.

THPs are way, way, way more complicated to get right that way.
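The small-folio checklist above can be written as a single predicate. The following is a rough C sketch of that, not kernel code. struct folio_state and small_folio_movable() are hypothetical names made up for illustration. Each field only stands in for what the real check would have to query: folio_mapcount(), the swapcount of a folio in the swapcache, and whether it is a KSM folio. Holding the folio lock is modeled as a precondition flag rather than actually taken. Large folios are refused outright, in line with the remark that THPs are far harder to get right.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the state the checklist inspects. In the kernel
 * these values would be read from the folio itself, under the folio lock.
 */
struct folio_state {
	bool large;          /* THP / large folio: out of scope for this sketch */
	bool locked;         /* 1) caller holds the folio lock */
	int mapcount;        /* 2) stand-in for folio_mapcount() */
	bool in_swapcache;   /* 3) folio is still in the swapcache */
	int swapcount;       /* 3) remaining swap references */
	bool ksm;            /* 4) KSM folio */
};

/*
 * Returns true only if every step of the checklist passes, i.e. the small
 * folio may be treated as exclusive even though its PageAnonExclusive
 * marker might have been dropped.
 */
static bool small_folio_movable(const struct folio_state *f)
{
	if (f->large)
		return false;   /* THPs need far more care; just refuse */
	if (!f->locked)
		return false;   /* 1) all checks must run under the folio lock */
	if (f->mapcount != 1)
		return false;   /* 2) only this one present page table mapping */
	if (f->in_swapcache && f->swapcount != 0)
		return false;   /* 3) no swap references left */
	if (f->ksm)
		return false;   /* 4) KSM folios are shared by design */
	return true;
}
```

The mapcount field could just as well model !folio_maybe_mapped_shared(), which the list names as the better check.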
Likely, the scenario described above cannot happen with a PMD-mapped
THP, at least for now (we don't have PMD swap entries).

Of course, we'd then also have to handle the case where we have a swap
pte on which the marker is not set (e.g., because of a swapout after the
described swapin, where we dropped the marker).

What could be easier is triggering a FAULT_FLAG_UNSHARE fault. It's
arguably less optimal in case the core decides to swap in / CoW, but
it leaves the magic of getting all this right to the core -- and mimics
the approach Lokesh uses.

But then, maybe letting userspace just do a UFFDIO_COPY would be even
faster (only a single TLB shootdown?).

I am also skeptical of calling this a BUG. It is documented to behave
exactly like that [1]:

        EBUSY
               The pages in the source virtual memory range are either
               pinned or not exclusive to the process. The kernel might
               only perform lightweight checks for detecting whether the
               pages are exclusive. To make the operation more likely to
               succeed, KSM should be disabled, fork() should be avoided
               or MADV_DONTFORK should be configured for the source
               virtual memory area before fork().

Note the "lightweight" and "more likely to succeed".

[1] https://lore.kernel.org/lkml/20231206103702.3873743-3-surenb@google.com/

-- 
Cheers,

David / dhildenb
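The reported behavior can be reproduced in miniature with the rough C model below, which is simplified and hypothetical. struct swapin_state, swapin_keeps_exclusive() and move_lightweight_check() are invented names. They only mirror two conditions quoted in this thread:

- the do_swap_page() branch that clears the exclusive marker for a folio under writeback to a SWP_STABLE_WRITES device;
- the large / maybe-DMA-pinned / PageAnonExclusive test in move_present_pte().

The model shows why a folio that is in fact exclusive can still yield EBUSY: the lightweight check only ever sees the marker.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs to the swap-in decision quoted from do_swap_page(). */
struct swapin_state {
	bool exclusive;      /* the swap pte carried the exclusive marker */
	bool writeback;      /* folio_test_writeback(): swapout still in flight */
	bool stable_writes;  /* swap device has SWP_STABLE_WRITES set */
};

/*
 * Models the quoted branch: while the folio is under writeback to a device
 * that needs stable pages, the page must not be written to, so the
 * exclusive marker is dropped (CoW reuse is not possible).
 */
static bool swapin_keeps_exclusive(const struct swapin_state *s)
{
	bool exclusive = s->exclusive;

	if (exclusive && s->writeback && s->stable_writes)
		exclusive = false;
	return exclusive;
}

/*
 * Models the lightweight check in move_present_pte(): it consults only the
 * marker, not mapcounts or swapcounts, so "not marked exclusive" is
 * treated exactly like "shared".
 */
static int move_lightweight_check(bool large, bool maybe_dma_pinned,
				  bool anon_exclusive)
{
	if (large || maybe_dma_pinned || !anon_exclusive)
		return -EBUSY;
	return 0;
}
```

Consider an exclusive swap entry that is swapped in while writeback is still in flight (exclusive, writeback and stable_writes all true). It ends up without the marker, so move_lightweight_check(false, false, swapin_keeps_exclusive(&s)) returns -EBUSY. The same entry swapped in after writeback completed returns 0.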