Date: Wed, 16 Apr 2025 10:56:03 +0200
Subject: Re: [PATCH v3] mempolicy: Optimize queue_folios_pte_range by PTE batching
From: David Hildenbrand
To: Baolin Wang, Dev Jain, akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, hughd@google.com, vishal.moola@gmail.com, yang@os.amperecomputing.com, ziy@nvidia.com
References: <20250416053048.96479-1-dev.jain@arm.com> <7f96283b-11b3-49ee-9d2d-5ad977325cb0@linux.alibaba.com> <019d1c4a-ffd0-4602-b2ba-cf07379dab17@redhat.com> <7392a21b-10bf-4ce9-a6fd-807ed954c138@linux.alibaba.com> <8b387a53-40e0-40d1-8bfa-b7524657a7dd@redhat.com>
In-Reply-To: <8b387a53-40e0-40d1-8bfa-b7524657a7dd@redhat.com>
Organization: Red Hat

On 16.04.25 10:51, David Hildenbrand wrote:
> On 16.04.25 10:41, Baolin Wang wrote:
>>
>>
>> On 2025/4/16 16:22, David Hildenbrand wrote:
>>> On 16.04.25 08:32, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2025/4/16 13:30, Dev Jain wrote:
>>>>> After the check for queue_folio_required(), the code only cares about the
>>>>> folio in the for loop, i.e the PTEs are redundant. Therefore, optimize
>>>>> this loop by skipping over a PTE batch mapping the same folio.
>>>>>
>>>>> With a test program migrating pages of the calling process, which includes
>>>>> a mapped VMA of size 4GB with pte-mapped large folios of order-9, and
>>>>> migrating once back and forth node-0 and node-1, the average execution
>>>>> time reduces from 7.5 to 4 seconds, giving an approx 47% speedup.
>>>>>
>>>>> v2->v3:
>>>>>    - Don't use assignment in if condition
>>>>>
>>>>> v1->v2:
>>>>>    - Follow reverse xmas tree declarations
>>>>>    - Don't initialize nr
>>>>>    - Move folio_pte_batch() immediately after retrieving a normal folio
>>>>>    - increment nr_failed in one shot
>>>>>
>>>>> Acked-by: David Hildenbrand
>>>>> Signed-off-by: Dev Jain
>>>>> ---
>>>>>    mm/mempolicy.c | 12 ++++++++++--
>>>>>    1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>>>> index b28a1e6ae096..4d2dc8b63965 100644
>>>>> --- a/mm/mempolicy.c
>>>>> +++ b/mm/mempolicy.c
>>>>> @@ -566,6 +566,7 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk)
>>>>>    static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
>>>>>                unsigned long end, struct mm_walk *walk)
>>>>>    {
>>>>> +    const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>>>>        struct vm_area_struct *vma = walk->vma;
>>>>>        struct folio *folio;
>>>>>        struct queue_pages *qp = walk->private;
>>>>> @@ -573,6 +574,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
>>>>>        pte_t *pte, *mapped_pte;
>>>>>        pte_t ptent;
>>>>>        spinlock_t *ptl;
>>>>> +    int max_nr, nr;
>>>>>        ptl = pmd_trans_huge_lock(pmd, vma);
>>>>>        if (ptl) {
>>>>> @@ -586,7 +588,9 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
>>>>>            walk->action = ACTION_AGAIN;
>>>>>            return 0;
>>>>>        }
>>>>> -    for (; addr != end; pte++, addr += PAGE_SIZE) {
>>>>> +    for (; addr != end; pte += nr, addr += nr * PAGE_SIZE) {
>>>>> +        max_nr = (end - addr) >> PAGE_SHIFT;
>>>>> +        nr = 1;
>>>>>            ptent = ptep_get(pte);
>>>>>            if (pte_none(ptent))
>>>>>                continue;
>>>>> @@ -598,6 +602,10 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
>>>>>            folio = vm_normal_folio(vma, addr, ptent);
>>>>>            if (!folio || folio_is_zone_device(folio))
>>>>>                continue;
>>>>> +        if (folio_test_large(folio) && max_nr != 1)
>>>>> +            nr = folio_pte_batch(folio, addr, pte, ptent,
>>>>> +                         max_nr, fpb_flags,
>>>>> +                         NULL, NULL, NULL);
>>>>>            /*
>>>>>             * vm_normal_folio() filters out zero pages, but there might
>>>>>             * still be reserved folios to skip, perhaps in a VDSO.
>>>>> @@ -630,7 +638,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
>>>>>            if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
>>>>>                !vma_migratable(vma) ||
>>>>>                !migrate_folio_add(folio, qp->pagelist, flags)) {
>>>>> -            qp->nr_failed++;
>>>>> +            qp->nr_failed += nr;
>>>>
>>>> Sorry for chiming in late, but I am not convinced that 'qp->nr_failed'
>>>> should add 'nr' when isolation fails.
>>>
>>> This patch does not change the existing behavior. But I stumbled over
>>> that as well ... and scratched my head.
>>>
>>>>
>>>>   From the comments of queue_pages_range():
>>>> "
>>>>    * >0 - this number of misplaced folios could not be queued for moving
>>>>    *      (a hugetlbfs page or a transparent huge page being counted as 1).
>>>> "
>>>>
>>>> That means if a large folio is failed to isolate, we should only add '1'
>>>> for qp->nr_failed instead of the number of pages in this large folio.
>>>> Right?
>>>
>>> I think what the doc really meant is "PMD-mapped THP". PTE-mapped THPs
>>> always had the same behavior: per PTE of the THP we would increment
>>> nr_failed by 1.
>>
>> No? For pte-mapped THPs, it only adds 1 for the large folio, since we
>> have below check in queue_folios_pte_range().
>>
>> if (folio == qp->large)
>>     continue;
>>
>> Or I missed anything else?
>
> Ah, I got confused by that and thought it would only be for LRU
> isolation purposes.
>
> Yeah, it will kind-of work for now and I think you are correct that we
> would only increment nr_failed by 1.
>
> I still think that counting nr_failed that way is dubious. We should be
> counting pages, which is something that user space from migrate_pages()
> could understand. Having it count arbitrary THPs/large folio sizes is
> really questionable.
>
> But that is indeed a separate issue to resolve.

Digging into it:

commit 1cb5d11a370f661c5d0d888bb0cfc2cdc5791382
Author: Hugh Dickins
Date:   Tue Oct 3 02:17:43 2023 -0700

    mempolicy: fix migrate_pages(2) syscall return nr_failed

    "man 2 migrate_pages" says "On success migrate_pages() returns the number
    of pages that could not be moved".  Although 5.3 and 5.4 commits fixed
    mbind(MPOL_MF_STRICT|MPOL_MF_MOVE*) to fail with EIO when not all pages
    could be moved (because some could not be isolated for migration),
    migrate_pages(2) was left still reporting only those pages failing at the
    migration stage, forgetting those failing at the earlier isolation stage.

    Fix that by accumulating a long nr_failed count in struct queue_pages,
    returned by queue_pages_range() when it's not returning an error, for
    adding on to the nr_failed count from migrate_pages() in mm/migrate.c.  A
    count of pages?  It's more a count of folios, but changing it to pages
    would entail more work (also in mm/migrate.c): does not seem justified.

Yeah, we should be counting pages, but likely nobody really cares,
because we only care if everything was migrated or something was not
migrated.

-- 
Cheers,

David / dhildenb
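
For illustration only, here is a small user-space sketch of the two counting
conventions discussed in this thread. It is a toy model, not the kernel code:
struct toy_pte, toy_pte_batch() and toy_count_failed() are hypothetical names
invented for this example, a "PTE" is reduced to the id of the folio it maps
(0 meaning none), and every mapped folio is assumed to fail isolation. The
batch helper only loosely mirrors what the caller gets back from
folio_pte_batch(), and last_large plays the role of the qp->large check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: just the id of the mapped folio, 0 = none. */
struct toy_pte {
	int folio_id;
};

/* Result of one walk: failures counted once per folio vs. once per PTE. */
struct toy_counts {
	long per_folio;
	long per_pte;
};

/*
 * Count how many consecutive entries, starting at ptes[0], map the same
 * folio, capped at max_nr. Toy analogue of a PTE batch.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr)
{
	size_t nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr && ptes[nr].folio_id == ptes[0].folio_id)
		nr++;
	return nr;
}

/* Walk n entries, skipping a whole batch per iteration, as if every folio failed. */
static struct toy_counts toy_count_failed(const struct toy_pte *ptes, size_t n)
{
	struct toy_counts c = { 0, 0 };
	int last_large = 0;	/* toy analogue of qp->large */
	size_t i, nr;

	for (i = 0; i < n; i += nr) {
		nr = 1;
		if (ptes[i].folio_id == 0)
			continue;	/* nothing mapped here */

		nr = toy_pte_batch(&ptes[i], n - i);

		/* "nr_failed += nr": one failure per PTE (page) in the batch. */
		c.per_pte += (long)nr;

		/* "count a large folio as 1": skip a folio already accounted. */
		if (ptes[i].folio_id == last_large)
			continue;
		last_large = ptes[i].folio_id;
		c.per_folio += 1;
	}
	return c;
}
```

With the entries {1, 1, 1, 1, 0, 2, 3, 3}, the per-folio convention yields 3
and the per-PTE convention yields 7, which is the gap between "a count of
folios" and "a count of pages" that the commit message above alludes to.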