Message-ID: <018c0663-dffb-49d0-895c-63bc9e5f9aec@redhat.com>
Date: Tue, 17 Jun 2025 14:07:38 +0200
Subject: Re: [PATCH 01/11] mm/mremap: introduce more mergeable mremap via MREMAP_RELOCATE_ANON
To: Lorenzo Stoakes
Cc: Andrew Morton, Vlastimil Babka, Jann Horn, "Liam R . Howlett",
 Suren Baghdasaryan, Matthew Wilcox, Pedro Falcato, Rik van Riel,
 Harry Yoo, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
 Jakub Matena, Wei Yang, Barry Song, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <22a80f22ba2082b28ee0b0a925eb3dbb37c2a786.1749473726.git.lorenzo.stoakes@oracle.com>
From: David Hildenbrand
Organization: Red Hat

>
>>
>>> +	/* The above check should imply these. */
>>> +	VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
>>> +	VM_WARN_ON_ONCE(!PageAnonExclusive(folio_page(folio, 0)));
>>
>> This can trigger in one nasty case, where we can lose the PAE bit during
>> swapin (refault from the swapcache while the folio is under writeback, and
>> the device does not allow for modifying the data while under writeback).
>
> Ugh god wasn't aware of that. So maybe drop this second one?

Yes.

>
>>
>>> +
>>> +	/*
>>> +	 * A pinned folio implies that it will be used for a duration longer
>>> +	 * than that over which the mmap_lock is held, meaning that another part
>>> +	 * of the kernel may be making use of this folio.
>>> +	 *
>>> +	 * Since we are about to manipulate index & mapping fields, we cannot
>>> +	 * safely proceed because whatever has pinned this folio may then
>>> +	 * incorrectly assume these do not change.
>>> +	 */
>>> +	if (folio_maybe_dma_pinned(folio))
>>> +		goto out;
>>
>> As discussed, this can race with GUP-fast. SO *maybe* we can just allow for
>> moving these.
>
> I'm guessing you mean as discussed below? :P Or in the cover letter I've not
> read yet? :P

The latter .. IIRC :P It was late ...

>
> Yeah, to be honest you shouldn't be fiddling with index, mapping anyway except
> via rmap logic.
>
> I will audit access of these fields just to be safe.
>

[...]

>>> +
>>> +	state.ptep = ptep_start;
>>> +	for (; !pte_done(&state); pte_next(&state, nr_pages)) {
>>> +		pte_t pte = ptep_get(state.ptep);
>>> +
>>> +		if (pte_none(pte) || !pte_present(pte)) {
>>> +			nr_pages = 1;
>>
>> What if we have
>>
>> (a) A migration entry (possibly we might fail migration and simply remap the
>> original folio)
>>
>> (b) A swap entry with a folio in the swapcache that we can refault.
>>
>> I don't think we can simply skip these ...
>
> Good point... will investigate these cases.

Migration entries are really nasty ... we probably have to wait for the
migration entry to become a present PTE again.

Swap entries ... we could look up any folio in the swapcache and adjust that.

>
>>
>>> +			continue;
>>> +		}
>>> +
>>> +		nr_pages = relocate_anon_pte(pmc, &state, undo);
>>> +		if (!nr_pages) {
>>> +			ret = false;
>>> +			goto out;
>>> +		}
>>> +	}
>>> +
>>> +	ret = true;
>>> +out:
>>> +	pte_unmap_unlock(ptep_start, state.ptl);
>>> +	return ret;
>>> +}
>>> +
>>> +static bool __relocate_anon_folios(struct pagetable_move_control *pmc, bool undo)
>>> +{
>>> +	pud_t *pudp;
>>> +	pmd_t *pmdp;
>>> +	unsigned long extent;
>>> +	struct mm_struct *mm = current->mm;
>>> +
>>> +	if (!pmc->len_in)
>>> +		return true;
>>> +
>>> +	for (; !pmc_done(pmc); pmc_next(pmc, extent)) {
>>> +		pmd_t pmd;
>>> +		pud_t pud;
>>> +
>>> +		extent = get_extent(NORMAL_PUD, pmc);
>>> +
>>> +		pudp = get_old_pud(mm, pmc->old_addr);
>>> +		if (!pudp)
>>> +			continue;
>>> +		pud = pudp_get(pudp);
>>> +
>>> +		if (pud_trans_huge(pud) || pud_devmap(pud))
>>> +			return false;
>>
>> We don't support PUD-size THP, why do we have to fail here?
>
> This is just to be in line with other 'magical future where we have PUD THP'
> stuff in mremap.c.
>
> A later commit that permits huge folio support actually lets us support these...
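
To make the PTE cases from the walk above concrete, here is a rough userspace
sketch of how entries could be classified instead of skipping every non-present
one. This is not kernel code: every name in it (pte_kind, walk_action,
classify, count_unskippable) is made up for illustration, and treating a swap
entry without a swapcache folio as skippable is an assumption on the basis that
there is no folio to adjust in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum pte_kind {
	PTE_KIND_NONE,		/* nothing mapped */
	PTE_KIND_PRESENT,	/* present PTE mapping an anon folio */
	PTE_KIND_MIGRATION,	/* migration entry, folio may be remapped */
	PTE_KIND_SWAP,		/* swap entry, no folio in the swapcache */
	PTE_KIND_SWAP_CACHED	/* swap entry, folio still in the swapcache */
};

enum walk_action {
	ACT_SKIP,		/* assumed: no folio to adjust here */
	ACT_RELOCATE,		/* adjust the mapped folio */
	ACT_WAIT,		/* wait for a present PTE again */
	ACT_FIXUP_SWAPCACHE	/* look up the swapcache folio, adjust it */
};

/* Map each entry kind to the action discussed in the thread. */
static enum walk_action classify(enum pte_kind kind)
{
	switch (kind) {
	case PTE_KIND_PRESENT:
		return ACT_RELOCATE;
	case PTE_KIND_MIGRATION:
		return ACT_WAIT;
	case PTE_KIND_SWAP_CACHED:
		return ACT_FIXUP_SWAPCACHE;
	case PTE_KIND_NONE:
	case PTE_KIND_SWAP:
	default:
		return ACT_SKIP;
	}
}

/* Count entries in a range that the walk must not silently skip. */
static size_t count_unskippable(const enum pte_kind *ptes, size_t n)
{
	size_t i, count = 0;

	assert(ptes != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (classify(ptes[i]) != ACT_SKIP)
			count++;
	return count;
}
```

The point of the sketch is only that "non-present" is not one case: two of the
three non-present kinds still have a folio behind them that would need handling.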
>
>>
>>> +
>>> +		extent = get_extent(NORMAL_PMD, pmc);
>>> +		pmdp = get_old_pmd(mm, pmc->old_addr);
>>> +		if (!pmdp)
>>> +			continue;
>>> +		pmd = pmdp_get(pmdp);
>>> +
>>> +		if (is_swap_pmd(pmd) || pmd_trans_huge(pmd) ||
>>> +		    pmd_devmap(pmd))
>>> +			return false;
>>
>> Okay, this case could likely be handled later (present anon folio or
>> migration entry; everything else, we can skip).
>
> Hmm, but how? The PMD cannot be traversed in this case?
>
> 'Present' migration entry? Migration entries are non-present right? :) Or is it
> different at PMD?

"present anon folio" or "migration entry" :)

So the latter meant a PMD migration entry (that is non-present)

[...]

>>>  	pmc.new = new_vma;
>>> +	if (relocate_anon) {
>>> +		lock_new_anon_vma(new_vma);
>>> +		pmc.relocate_locked = new_vma;
>>> +
>>> +		if (!relocate_anon_folios(&pmc, /* undo= */false)) {
>>> +			unsigned long start = new_vma->vm_start;
>>> +			unsigned long size = new_vma->vm_end - start;
>>> +
>>> +			/* Undo if fails. */
>>> +			relocate_anon_folios(&pmc, /* undo= */true);
>>
>> You'd assume this cannot fail, but I think it can: imagine concurrent
>> GUP-fast ...
>
> Well if we change the racy code to ignore DMA pinned we should be ok right?

We completely block migration/swapout, or could they happen concurrently?

I assume you'd block them already using the rmap locks in write mode.

>
>>
>> I really wish we can find a way to not require the fallback.
>
> Yeah the fallback is horrible but we really do need it. See the page table move
> fallback code for nightmares also :)
>
> We could also alternatively:
>
> - Have some kind of anon_vma fragmentation where some folios in range reference
>   a different anon_vma that we link to the original VMA (quite possibly very
>   broken though).
>
> - Keep track of folios somehow and separate them from the page table walk (but
>   then we risk races)
>
> - Have some way of telling the kernel that such a situation exists with a new
>   object that can be pointed to by folio->mapping, that the rmap code recognises,
>   like essentially an 'anon_vma migration entry' which can fail.
>
> I already considered combining this operation with the page table move
> operation, but the locking gets horrible and the undo is categorically much
> worse and I'm not sure it's actually workable.

Yeah, I have to further think about that. :(

-- 
Cheers,

David / dhildenb