Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list
From: David Hildenbrand <david@redhat.com>
To: Zi Yan
Cc: linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Yang Shi, Ryan Roberts, Barry Song <21cnbao@gmail.com>, linux-kernel@vger.kernel.org
Date: Fri, 12 Apr 2024 21:32:58 +0200
In-Reply-To: <2C698A64-268C-4E43-9EDE-6238B656A391@nvidia.com>
References: <20240411153232.169560-1-zi.yan@sent.com> <2C698A64-268C-4E43-9EDE-6238B656A391@nvidia.com>

On 12.04.24 16:35, Zi Yan wrote:
> On 11 Apr 2024, at 11:46, David Hildenbrand wrote:
>
>> On 11.04.24 17:32, Zi Yan wrote:
>>> From: Zi Yan
>>>
>>> In __folio_remove_rmap(), a large folio is added to deferred split list
>>> if any page in a folio loses its final mapping. It is possible that
>>> the folio is unmapped fully, but it is unnecessary to add the folio
>>> to deferred split list at all. Fix it by checking folio mapcount before
>>> adding a folio to deferred split list.
>>>
>>> Signed-off-by: Zi Yan
>>> ---
>>>   mm/rmap.c | 9 ++++++---
>>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 2608c40dffad..d599a772e282 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1494,7 +1494,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>   		enum rmap_level level)
>>>   {
>>>   	atomic_t *mapped = &folio->_nr_pages_mapped;
>>> -	int last, nr = 0, nr_pmdmapped = 0;
>>> +	int last, nr = 0, nr_pmdmapped = 0, mapcount = 0;
>>>   	enum node_stat_item idx;
>>>
>>>   	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
>>> @@ -1506,7 +1506,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>   			break;
>>>   		}
>>>
>>> -		atomic_sub(nr_pages, &folio->_large_mapcount);
>>> +		mapcount = atomic_sub_return(nr_pages,
>>> +				&folio->_large_mapcount) + 1;
>>
>> That becomes a new memory barrier on some archs. Rather just re-read it
>> below. Re-reading should be fine here.
>
> Would atomic_sub_return_relaxed() work? Originally I was using
> atomic_read(mapped) below, but to save an atomic op, I chose to read
> mapcount here.

Some points:

(1) I suggest reading about atomic get/set vs. atomic RMW vs. atomic RMW
that return a value -- and how they interact with memory barriers.
Further, how relaxed variants are only optimized on some architectures.

atomic_read() is usually READ_ONCE(), which is just an "ordinary" memory
access that should not be refetched. Usually cheaper than most other
stuff that involves atomics.

(2) We can either use folio_large_mapcount() == 0 or !atomic_read(mapped)
to figure out if the folio is now completely unmapped.

(3) There is one fundamental issue: if we are not batch-unmapping the
whole thing, we will still add the folios to the deferred split queue.
Migration would still do that, as would unmapping a folio that is
covered by multiple VMAs.

(4) We should really avoid making common operations slower only to make
some unreliable stats less unreliable.

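Concretely, (1) and (2) together amount to something like the following
sketch -- keep the cheap non-value-returning RMW, and decide "fully
unmapped" with an ordinary load at the check site (illustrative only,
not a final patch):

	/* No value returned, so no implied memory barrier. */
	atomic_sub(nr_pages, &folio->_large_mapcount);
	...
	/* atomic_read() is just READ_ONCE(), an ordinary load. */
	if (!atomic_read(mapped)) {
		/* Equivalently: folio_large_mapcount(folio) == 0. */
		/* The folio is now completely unmapped. */
	}
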
We should likely do something like the following, which might even be a
bit faster in some cases, because by checking _deferred_list ahead of
time we avoid a function call when we unmap individual PTEs:

diff --git a/mm/rmap.c b/mm/rmap.c
index 2608c40dffad..356598b3dc3c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1553,9 +1553,11 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 		 * page of the folio is unmapped and at least one page
 		 * is still mapped.
 		 */
-		if (folio_test_large(folio) && folio_test_anon(folio))
-			if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
-				deferred_split_folio(folio);
+		if (folio_test_large(folio) && folio_test_anon(folio) &&
+		    (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) &&
+		    atomic_read(mapped) &&
+		    data_race(list_empty(&folio->_deferred_list)))
+			deferred_split_folio(folio);
 	}

I also thought about handling the scenario where we unmap the whole
thing in smaller chunks. We could detect "!atomic_read(mapped)", detect
that the folio is on the deferred split list, and simply remove it from
that list, incrementing a THP_UNDO_DEFERRED_SPLIT_PAGE event.

But it would be racy with concurrent remapping of the folio (which might
happen with anon folios in corner cases, I guess).

What we can do is the following, though:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index dc30139590e6..f05cba1807f2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3133,6 +3133,8 @@ void folio_undo_large_rmappable(struct folio *folio)
 	ds_queue = get_deferred_split_queue(folio);
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(&folio->_deferred_list)) {
+		if (folio_test_pmd_mappable(folio))
+			count_vm_event(THP_UNDO_DEFERRED_SPLIT_PAGE);
 		ds_queue->split_queue_len--;
 		list_del_init(&folio->_deferred_list);
 	}

Adding the right event of course. Then it's easy to filter out these
"temporarily added to the list, but never split before the folio was
freed" cases.

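For reference, the plumbing that "adding the right event" implies would
be roughly the usual two additions (a sketch only; the exact placement
in the enum and the string table is illustrative):

/* include/linux/vm_event_item.h: extend enum vm_event_item */
	THP_DEFERRED_SPLIT_PAGE,
	THP_UNDO_DEFERRED_SPLIT_PAGE,	/* new */

/* mm/vmstat.c: matching name in vmstat_text[] */
	"thp_deferred_split_page",
	"thp_undo_deferred_split_page",	/* new */

-- 
Cheers,

David / dhildenb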