From: David Hildenbrand <david@redhat.com>
To: Yang Shi, Zi Yan
Cc: linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Ryan Roberts,
 Barry Song <21cnbao@gmail.com>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to
 deferred split list
Date: Mon, 15 Apr 2024 17:40:43 +0200
Message-ID: <60049ec1-df14-4c3f-b3dd-5d771c2ceac4@redhat.com>
References: <20240411153232.169560-1-zi.yan@sent.com>
 <2C698A64-268C-4E43-9EDE-6238B656A391@nvidia.com>

On 13.04.24 00:29, Yang Shi wrote:
> On Fri, Apr 12, 2024 at 2:06 PM Zi Yan wrote:
>>
>> On 12 Apr 2024, at 15:32, David Hildenbrand wrote:
>>
>>> On 12.04.24 16:35, Zi Yan wrote:
>>>> On 11 Apr 2024, at 11:46, David Hildenbrand wrote:
>>>>
>>>>> On 11.04.24 17:32, Zi Yan wrote:
>>>>>> From: Zi Yan
>>>>>>
>>>>>> In __folio_remove_rmap(), a large folio is added to the deferred
>>>>>> split list if any page in the folio loses its final mapping. It is
>>>>>> possible that the folio is unmapped fully, in which case it is
>>>>>> unnecessary to add the folio to the deferred split list at all.
>>>>>> Fix it by checking the folio mapcount before adding the folio to
>>>>>> the deferred split list.
>>>>>>
>>>>>> Signed-off-by: Zi Yan
>>>>>> ---
>>>>>>  mm/rmap.c | 9 ++++++---
>>>>>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>> index 2608c40dffad..d599a772e282 100644
>>>>>> --- a/mm/rmap.c
>>>>>> +++ b/mm/rmap.c
>>>>>> @@ -1494,7 +1494,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>>>>  		enum rmap_level level)
>>>>>>  {
>>>>>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
>>>>>> -	int last, nr = 0, nr_pmdmapped = 0;
>>>>>> +	int last, nr = 0, nr_pmdmapped = 0, mapcount = 0;
>>>>>>  	enum node_stat_item idx;
>>>>>>
>>>>>>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
>>>>>> @@ -1506,7 +1506,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>>>>  			break;
>>>>>>  	}
>>>>>>
>>>>>> -	atomic_sub(nr_pages, &folio->_large_mapcount);
>>>>>> +	mapcount = atomic_sub_return(nr_pages,
>>>>>> +			&folio->_large_mapcount) + 1;
>>>>>
>>>>> That becomes a new memory barrier on some archs. Rather just re-read
>>>>> it below. Re-reading should be fine here.
>>>>
>>>> Would atomic_sub_return_relaxed() work? Originally I was using
>>>> atomic_read(mapped) below, but to save an atomic op, I chose to read
>>>> the mapcount here.
>>>
>>> Some points:
>>>
>>> (1) I suggest reading about atomic get/set vs. atomic RMW vs. atomic
>>> RMW that return a value -- and how they interact with memory barriers.
>>> Further, how relaxed variants are only optimized on some architectures.
>>>
>>> atomic_read() is usually READ_ONCE(), which is just an "ordinary"
>>> memory access that should not be refetched. Usually cheaper than most
>>> other stuff that involves atomics.
>>
>> I should have checked the actual implementation instead of being fooled
>> by the name. Will read about it. Thanks.
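
In code, the semantics are roughly the following (an illustrative
summary only, based on Documentation/atomic_t.txt, not part of any
patch here):

static void ordering_summary(atomic_t *v)
{
	atomic_read(v);			/* READ_ONCE(): plain load, no barrier */
	atomic_sub(1, v);		/* RMW without return value: unordered */
	atomic_sub_return(1, v);	/* RMW returning a value: implies full
					   memory barriers around the op */
	atomic_sub_return_relaxed(1, v);/* unordered again, and only cheaper
					   than the fully ordered variant on
					   some architectures */
}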
>>>
>>> (2) We can either use folio_large_mapcount() == 0 or
>>> !atomic_read(mapped) to figure out if the folio is now completely
>>> unmapped.
>>>
>>> (3) There is one fundamental issue: if we are not batch-unmapping the
>>> whole thing, we will still add the folios to the deferred split queue.
>>> Migration would still do that, or if there are multiple VMAs covering
>>> a folio.
>>>
>>> (4) We should really avoid making common operations slower only to
>>> make some unreliable stats less unreliable.
>>>
>>> We should likely do something like the following, which might even be
>>> a bit faster in some cases, because we avoid a function call in case
>>> we unmap individual PTEs by checking _deferred_list ahead of time:
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 2608c40dffad..356598b3dc3c 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1553,9 +1553,11 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>  	 * page of the folio is unmapped and at least one page
>>>  	 * is still mapped.
>>>  	 */
>>> -	if (folio_test_large(folio) && folio_test_anon(folio))
>>> -		if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
>>> -			deferred_split_folio(folio);
>>> +	if (folio_test_large(folio) && folio_test_anon(folio) &&
>>> +	    (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) &&
>>> +	    atomic_read(mapped) &&
>>> +	    data_race(list_empty(&folio->_deferred_list)))
>>
>> data_race() might not be needed, as Ryan pointed out [1]
>>
>>> +		deferred_split_folio(folio);
>>>  }
>>>
>>> I also thought about handling the scenario where we unmap the whole
>>> thing in smaller chunks. We could detect "!atomic_read(mapped)",
>>> detect that the folio is on the deferred split list, and simply remove
>>> it from that list, incrementing a THP_UNDO_DEFERRED_SPLIT_PAGE event.
>>>
>>> But that would be racy with concurrent remapping of the folio (might
>>> happen with anon folios in corner cases I guess).
>>>
>>> What we can do is the following, though:
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index dc30139590e6..f05cba1807f2 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3133,6 +3133,8 @@ void folio_undo_large_rmappable(struct folio *folio)
>>>  	ds_queue = get_deferred_split_queue(folio);
>>>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>  	if (!list_empty(&folio->_deferred_list)) {
>>> +		if (folio_test_pmd_mappable(folio))
>>> +			count_vm_event(THP_UNDO_DEFERRED_SPLIT_PAGE);
>>>  		ds_queue->split_queue_len--;
>>>  		list_del_init(&folio->_deferred_list);
>>>  	}
>>>
>>> Adding the right event of course.
>>>
>>> Then it's easy to filter out these "temporarily added to the list, but
>>> never split before the folio was freed" cases.
>>
>> So instead of making THP_DEFERRED_SPLIT_PAGE precise, use
>> THP_DEFERRED_SPLIT_PAGE - THP_UNDO_DEFERRED_SPLIT_PAGE instead? That
>> should work.
>
> It is definitely possible that the THPs on the deferred split queue are
> freed instead of split. For example, 1M is unmapped for a 2M THP, then
> later the remaining 1M is unmapped, or the process exits before memory
> pressure happens. So how can we tell it was only "temporarily added to
> the list"? Then THP_DEFERRED_SPLIT_PAGE - THP_UNDO_DEFERRED_SPLIT_PAGE
> actually just counts how many pages are still on the deferred split
> queue.

Not quite, I think. I don't think we have a counter for how many large
folios on the deferred list were actually split; I think we only have
THP_SPLIT_PAGE.

We could have

* THP_DEFERRED_SPLIT_PAGE
* THP_UNDO_DEFERRED_SPLIT_PAGE
* THP_PERFORM_DEFERRED_SPLIT_PAGE

Maybe that would catch more cases (not sure if all, though). Then, you
could tell how many are still on that list:

THP_DEFERRED_SPLIT_PAGE - THP_UNDO_DEFERRED_SPLIT_PAGE -
THP_PERFORM_DEFERRED_SPLIT_PAGE
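
User space could then estimate that number as sketched below. This
assumes the counters would show up in /proc/vmstat under names like the
existing thp_deferred_split_page; the undo/perform counters are
hypothetical at this point and simply read as 0 on current kernels:

#include <stdio.h>
#include <string.h>

/* Return a counter from /proc/vmstat, or 0 if it does not exist. */
static long vmstat_read(const char *key)
{
	char name[128];
	long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 0;
	while (fscanf(f, "%127s %ld", name, &val) == 2) {
		if (!strcmp(name, key)) {
			fclose(f);
			return val;
		}
	}
	fclose(f);
	return 0;
}

int main(void)
{
	long queued = vmstat_read("thp_deferred_split_page")
		      - vmstat_read("thp_undo_deferred_split_page")
		      - vmstat_read("thp_perform_deferred_split_page");

	printf("large folios still on the deferred split list: ~%ld\n",
	       queued);
	return 0;
}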
That could give one a clearer picture of how deferred splitting
interacts with actual splitting (possibly under memory pressure), which
is the whole reason deferred splitting was added in the first place.

> It may be useful. However, the counter is typically used to estimate
> how many THPs are partially unmapped during a period of time.

I'd say that is a bit of an abuse of the counter; well, or interpreting
something into the counter that it never reliably represented. I can
easily write a program that keeps sending your counter to infinity,
simply by triggering that behavior in a loop (see the sketch at the end
of this mail), so it's all a bit shaky.

Something like Ryan's script makes more sense: there you get a clearer
picture of what is mapped where and how. That information can be much
more valuable than just knowing whether a folio is mapped fully or
partially (again, relevant for dealing with memory waste).
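
Such a loop might look like the following rough sketch. It is
hypothetical and untested here; whether the memset() actually gets a
PMD-sized THP depends on the system's THP configuration:

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define SZ_2M	(2UL << 20)

int main(void)
{
	for (;;) {
		char *map = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		char *thp;

		if (map == MAP_FAILED)
			return 1;
		/* Align to 2M so the range can be backed by one PMD THP. */
		thp = (char *)(((uintptr_t)map + SZ_2M - 1) & ~(SZ_2M - 1));
		madvise(thp, SZ_2M, MADV_HUGEPAGE);
		memset(thp, 1, SZ_2M);	/* populate, ideally as one THP */
		/*
		 * Partial unmap: this queues the folio for deferred
		 * splitting and bumps THP_DEFERRED_SPLIT_PAGE once per
		 * iteration ...
		 */
		madvise(thp, SZ_2M / 2, MADV_DONTNEED);
		/* ... and the rest goes away without a split happening. */
		munmap(map, 2 * SZ_2M);
	}
}

-- 
Cheers,

David / dhildenb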