Date: Mon, 3 Jul 2023 23:09:10 +0200
From: David Hildenbrand
Organization: Red Hat
To: Zi Yan
Cc: "Yin, Fengwei", Matthew Wilcox, linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel
Subject: Re: Folio mapcount
References: <7DCA075B-1E43-47B1-9402-66C54513D52E@nvidia.com> <310c4d8a-e14c-742b-5c6c-018c01ed897e@intel.com> <957ea888-a96b-89cc-29e2-973bb9e36f40@intel.com> <6cec6f68-248e-63b4-5615-9e0f3f819a0a@redhat.com>

On 02.07.23 21:51, Zi Yan wrote:
> On 2 Jul 2023, at 7:45, David Hildenbrand wrote:
>
>> On 02.07.23 11:50, Yin, Fengwei wrote:
>>>
>>>
>>> On 7/1/2023 9:17 AM, Zi Yan wrote:
>>>> In kernel, almost all code only cares: 1) if a page/folio has extra pins
>>>> by checking if mapcount is equal to refcount + extra, and 2)
>>>> if a page/folio is mapped multiple times.
>>>> A single mapcount can meet these two needs.
>>> For 2, how can we know whether a page/folio is mapped multiple times from
>>> single mapcount? My understanding is we need two counts as folio could be
>>> partial mapped.
>>
>> Yes, a single mapcount is most probably insufficient. I started analyzing
>> all existing users and use cases, trying to avoid walking page tables.
>
> From my understanding, a single mapcount is sufficient for kernel users, which
> calls page_mapcount(). Because they either check mapcount against refcount to
> see if a page has extra pin or check mapcount to see if a page is mapped more
> than once.
>

There are cases where we want to know "do we have PTE mappings", but I
have yet to write it all up.

>>
>> If we want to get rid of all of (most) sub-page mapcounts, we'd probably want:
>>
>> (1) Total mapcount (compound + any sub-page): page_mapped(), pagecount
>>     vs. refcount games, ...
>
> a single mapcount is sufficient in this case.

Well, that's what I describe here: (1) covers exactly these cases.

>
>>
>> (2) Compound mapcount (for PMD/PUD-mappable THP only): (2) - (1) tells
>>     you if it's only PMD mapped or also PTE-mapped. For example, for
>>     statistics but also swapout code.
>
> For statistics, it is for NR_{ANON,FILE}_MAPPED and NR_ANON_THP. I wonder
> if we can use the number of anonymous/file pages and THPs instead, without
> caring about if it is mapped or not.
>
> For swapout, folio_entire_mapcount() is used to estimate if a THP is fully
> mapped or not. I wonder if we can get away with another estimation like
> total_mapcount() > folio_nr_pages().

What do we gain by that? Again, I don't see a reason to degrade the
current state just to get down to a single mapcount when it barely
matters whether we have 2 or 3 instead. Right now we have 513, and with
any larger folio significantly more ... than 2 or 3.

>
>>
>> (3) Mapcount of first (or any other) subpage (compound+subpage): for
>>     folio_estimated_sharers().
>
> This is another estimation. I wonder if we can use a different estimation
> like total_mapcount() > folio_nr_pages() instead.

At least not for PMD-mapped THP. Maybe we could do with (2). But I
recall some cases where it got ugly; I'll try to remember them.

>
>>
>> For anon pages, I'm thinking about remembering an additional
>>
>> (1) Page/folio creator (MM pointer/identification)
>> (2) Page/folio creator mapcount
>>
>> When optimizing a PTE-mapped THP (especially not-pmd-mappable) for the
>> fork()+exec() case, we'd have to walk page tables to see if all folio
>> references come from this MM. The page/folio creator exactly avoids that
>> completely. We might need a mechanism to synchronize against
>> mapping/unmapping of this folio from the creator concurrently (relevant
>> when mapped into multiple page tables).
>
> creator_mapcount < total_mapcount means multiple MMs map this folio? And this is for
> page exclusive check? Sorry I have not checked the code in detail yet. The sync

Right now we essentially do

if !PageAnonExclusive:
	if (page_count() != 1)
		copy
	reuse

to see if we really hold the only reference to that folio. If we could
stabilize the creator's mapcount, it would be something like

if (f->creator != mm || page_count(f) != f->creators_mapcount)
	copy
reuse

So we wouldn't have to scan page tables to identify if we're responsible
for all of the page references via our page tables. But that's so far
only an idea I had when thinking about how to avoid page table scans for
the simple fork+exec() case; it has not matured yet.

> of creator_mapcount with total_mapcount might have some extra cost. I wonder if
> this can be solved by checked num_active_vmas in anon_vma of a folio.

As we nowadays match the actual references (i.e., page_count() != 1),
that's most probably insufficient and, from what I recall, easily less
precise.

-- 
Cheers,

David / dhildenb
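
For illustration only, the two reuse checks discussed in this mail can be
modelled in a few lines of user-space C. This is a rough sketch, not kernel
code: struct toy_folio, struct toy_mm and both helper functions are made-up
names, "refcount" stands in for page_count(), and the creator / creators_mapcount
fields are the hypothetical additions floated above. It also assumes, as a
simplification, that every page table mapping holds exactly one reference and
that nothing else holds a reference to the folio.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space stand-ins; none of these are real kernel types or fields. */
struct toy_mm {
	int id;
};

struct toy_folio {
	int refcount;                 /* stands in for page_count() */
	bool anon_exclusive;          /* stands in for PageAnonExclusive */
	const struct toy_mm *creator; /* hypothetical: MM that created the folio */
	int creators_mapcount;        /* hypothetical: mappings held by the creator */
};

/* Scheme described as "right now": reuse if exclusive, or if we hold the only reference. */
bool toy_can_reuse_today(const struct toy_folio *f)
{
	if (f->anon_exclusive)
		return true;
	return f->refcount == 1;
}

/* Creator idea: reuse only if mm is the creator and all references are its own mappings. */
bool toy_can_reuse_creator(const struct toy_folio *f, const struct toy_mm *mm)
{
	if (f->creator != mm || f->refcount != f->creators_mapcount)
		return false; /* copy */
	return true;          /* reuse */
}
```

With a folio that is PTE-mapped 16 times by its creator and by nobody else
(refcount 16, creators_mapcount 16), the first check would say "copy" because
the refcount is not 1, while the creator check would allow reuse without
walking page tables. Once another MM adds mappings, the refcount exceeds
creators_mapcount and the creator check falls back to "copy" as well.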