From: David Hildenbrand <david@redhat.com>
Date: Wed, 2 Aug 2023 13:52:39 +0200
Subject: Re: [PATCH 0/2] don't use mapcount() to check large folio sharing
To: Ryan Roberts, Yin Fengwei, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org,
 vishal.moola@gmail.com, wangkefeng.wang@huawei.com, minchan@kernel.org,
 yuzhao@google.com, shy828301@gmail.com
Organization: Red Hat
In-Reply-To: <2d64ca09-06fe-a32f-16f9-c277b7033b57@arm.com>
References: <20230728161356.1784568-1-fengwei.yin@intel.com>
 <3bbfde16-ced1-dca8-6a3f-da893e045bc5@arm.com>
 <31093c49-5baa-caed-9871-9503cb89454b@redhat.com>
 <20419779-b5f5-7240-3f90-fe5c4b590e4d@arm.com>
 <2722c9ad-370a-70ff-c374-90a94eca742a@redhat.com>
 <2d64ca09-06fe-a32f-16f9-c277b7033b57@arm.com>

On 02.08.23 13:51, Ryan Roberts wrote:
> On 02/08/2023 12:36, David Hildenbrand wrote:
>> On 02.08.23 13:20, Ryan Roberts wrote:
>>> On 02/08/2023 11:48, David Hildenbrand wrote:
>>>> On 02.08.23 12:27, Ryan Roberts wrote:
>>>>> On 28/07/2023 17:13, Yin Fengwei wrote:
>>>>>> In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
>>>>>> folio_mapcount() is used to check whether the folio is shared. But
>>>>>> that is not correct, as folio_mapcount() returns the total mapcount
>>>>>> of the large folio.
>>>>>>
>>>>>> Use folio_estimated_sharers() here, as the estimated number is enough.
>>>>>>
>>>>>> Yin Fengwei (2):
>>>>>>   madvise: don't use mapcount() against large folio for sharing check
>>>>>>   madvise: don't use mapcount() against large folio for sharing check
>>>>>>
>>>>>>  mm/huge_memory.c | 2 +-
>>>>>>  mm/madvise.c     | 6 +++---
>>>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> As a set of fixes, I agree this is definitely an improvement, so:
>>>>>
>>>>> Reviewed-By: Ryan Roberts
>>>>>
>>>>> But I have a couple of comments around further improvements:
>>>>>
>>>>> Once we have the scheme that David is working on to provide precise
>>>>> exclusive vs shared info, we will probably want to move to that.
>>>>> Although that scheme will need access to the mm_struct of a process
>>>>> known to be mapping the
>>>>
>>>> There are probably ways to work around the lack of an mm_struct, but
>>>> it would not be completely free. Passing the mm_struct should be an
>>>> easy refactoring, though.
>>>>
>>>>> folio. We have that info, but it's not passed to
>>>>> folio_estimated_sharers(), so we can't just reimplement
>>>>> folio_estimated_sharers() - we will need to rework these call sites
>>>>> again.
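
For reference, today's folio_estimated_sharers() is only a first-subpage
heuristic; roughly (from include/linux/mm.h):

static inline int folio_estimated_sharers(struct folio *folio)
{
	/*
	 * Only the mapcount of subpage 0 is consulted: cheap, but a folio
	 * that is shared only through subpages 1..N-1 is misreported as
	 * exclusive -- one source of the false negatives discussed below.
	 */
	return page_mapcount(folio_page(folio, 0));
}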
>>>> We should probably just have a
>>>>
>>>> folio_maybe_mapped_shared()
>>>>
>>>> with proper documentation. Nobody should care about the exact number.
>>>>
>>>> If my scheme for anon pages makes it in, that would be precise for
>>>> anon pages, and we could document that. Once we can handle pagecache
>>>> pages as well to get a precise answer, we could change it to
>>>> folio_mapped_shared() and adjust the documentation.
>>>
>>> Makes sense to me. I'm assuming your change would allow us to get rid
>>> of PG_anon_exclusive too? In which case we would also want a precise
>>> API specifically for anon folios for the CoW case, without waiting
>>> for pagecache page support.
>>
>> Not necessarily, and I'm currently not planning that.
>>
>> On the COW path, I'm planning on using it only when PG_anon_exclusive
>> is clear for a compound page, combined with a check that there are no
>> other page references besides the mappings: all mappings from me and
>> #refs == #mappings -> reuse (set PG_anon_exclusive). That keeps the
>> default (no fork) case as fast and simple as possible.
>>
>>>> I just saw
>>>>
>>>> https://lkml.kernel.org/r/20230802095346.87449-1-wangkefeng.wang@huawei.com
>>>>
>>>> that converts a lot of code to folio_estimated_sharers().
>>>>
>>>> That patchset, for example, also does
>>>>
>>>> total_mapcount(page) > 1 -> folio_estimated_sharers(folio) > 1
>>>>
>>>> I'm not 100% sure what to think about that at this point. We would
>>>> eventually add false negatives (actually shared, but we fail to
>>>> detect it) all over the place, instead of having false positives
>>>> (actually exclusive, but we fail to detect it).
>>>>
>>>> And that patchset doesn't even spell that out.
>>>>
>>>> Maybe it's as good as we will get, especially if my scheme doesn't
>>>> make it in.
>>>
>>> I've been working on the assumption that your scheme is plan A, and
>>> I'm waiting for it to unblock forward progress on large anon folios.
>>> Is this the right approach, or do you think your scheme is
>>> sufficiently risky and/or far out that I should aim not to depend on
>>> it?
>>
>> It is plan A. IMHO, it does not feel too risky and/or far out at this
>> point -- and the implementation should not end up too complicated. But
>> as always, I cannot promise anything before it's been implemented and
>> discussed upstream.
>
> OK, good we are on the same folio... (stolen from Hugh; if a joke is
> worth telling once, it's worth telling 1000 times ;-)

Heard it the first time :))

-- 
Cheers,

David / dhildenb
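
A minimal sketch of the COW reuse rule described above ("all mappings
from me and #refs == #mappings -> reuse"), assuming a hypothetical
folio_mapped_exclusively() helper provided by the precise-tracking
scheme; the function name and structure are illustrative only, not the
planned implementation, and locking/races are ignored for brevity:

#include <linux/mm.h>
#include <linux/page-flags.h>

static bool sketch_can_reuse_anon_folio(struct folio *folio,
					struct mm_struct *mm)
{
	/* Fast path (no fork): a prior fault already marked it exclusive. */
	if (PageAnonExclusive(&folio->page))
		return true;

	/*
	 * "#refs == #mappings": any reference beyond the page table
	 * mappings (GUP pin, swapcache, pagevec, ...) forbids reuse.
	 */
	if (folio_ref_count(folio) != folio_mapcount(folio))
		return false;

	/* "all mappings from me": what the precise scheme would answer. */
	if (!folio_mapped_exclusively(folio, mm))	/* hypothetical */
		return false;

	/* Exclusive to this MM: mark it so future faults stay fast. */
	SetPageAnonExclusive(&folio->page);
	return true;
}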