Message-ID: <50ba595b-d992-cadf-9064-6b99347fa348@redhat.com>
Date: Fri, 11 Aug 2023 17:17:02 +0200
To: Peter Xu
Cc: Matthew Wilcox, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, Andrew Morton, Jonathan Corbet, Mike Kravetz, Hugh Dickins, Ryan Roberts, Yin Fengwei, Yang Shi, Zi Yan
References: <20230809083256.699513-1-david@redhat.com> <7e31254d-8889-7e79-50e1-2630bd493d59@redhat.com> <73d6d29f-9947-9b50-3b94-77f1ee547387@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios

On 11.08.23 17:03, Peter Xu wrote:
> On Thu, Aug 10, 2023 at 11:59:25PM +0200, David Hildenbrand wrote:
>> On 10.08.23 23:54, Matthew Wilcox wrote:
>>> On Thu, Aug 10, 2023 at 05:48:19PM -0400, Peter Xu wrote:
>>>>> Yes, that comment from Hugh primarily discusses how we could possibly
>>>>> optimize the loop, and if relying on folio_nr_pages_mapped() to reduce the
>>>>> iterations would be racy.
>>>>> As far as I can see, there are cases where "it
>>>>> would be certainly a bad idea" :)
>>>>
>>>> Is the race described about mapcount being changed right after it's read?
>>>> Are you aware of anything specific that will be broken, and will be fixed
>>>> with this patch?
>>>
>>> The problem is that people check the mapcount while holding no locks;
>>> not the PTL, not the page lock. So it's an unfixable race.
>>>
>>>> Having a total mapcount does sound helpful if partial folio is common
>>>> indeed.
>>>>
>>>> I'm curious whether that'll be so common after the large anon folio work -
>>>> isn't it be sad if partial folio will be a norm? It sounds to me that's
>>>> the case when small page sizes should be used.. and it's prone to waste?
>>>
>>> The problem is that entire_mapcount isn't really entire_mapcount.
>>> It's pmd_mapcount. I have had thoughts about using it as entire_mapcount,
>>> but it gets gnarly when people do partial unmaps. So the _usual_ case
>>> ends up touching every struct page. Which sucks. Also it's one of the
>>> things which stands in the way of shrinking struct page.
>>
>> Right, so one current idea is to have a single total_mapcount and look into
>> removing the subpage mapcounts (which will require first removing
>> _nr_pages_mapped, because that's still one of the important users).
>>
>> Until we get there, also rmap code has to do eventually "more tracking" and
>> might, unfortunately, end up slower.
>>
>>>
>>> But it's kind of annoying to explain all of this to you individually.
>>> There have been hundreds of emails about it over the last months on
>>> this mailing list. It would be nice if you could catch up instead of
>>> jumping in.
>>
>> To be fair, a lot of the details are not readily available and in the heads
>> of selected people :)
>>
>> Peter, if you're interested, we can discuss the current plans, issues and
>> ideas offline!
>
> Thanks for offering help, David.
>
> Personally I still am unclear yet on why entire_mapcount cannot be used as
> full-folio mapcounts, and why "partial unmap" can happen a lot (I don't
> expect), but yeah I can try to catch up to educate myself first.

Using fork() is the easiest way. mremap(), MADV_DONTNEED, munmap, ...

You might end up having to scan page tables and/or the rmap to figure
out which mapcount to adjust, which we should absolutely avoid.

> The only issue regarding an offline sync-up is that even if David will help
> Peter on catching up the bits, it'll not scale when another Peter2 had the
> same question.. So David, rather than I waste your time on helping one
> person, let me try to catch up with the public threads - I'm not sure how
> far I can go myself;

Sure. But note that it's a moving target, and some discussions have been
going on for a long time. I recall there were various discussions,
including LSF/MM, mm biweekly meeting, and more. So even if you scan
through all that, you might either get outdated or incomplete information.

-- 
Cheers,

David / dhildenb
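
To make the fork() + partial munmap() case above a bit more concrete, here
is a toy userspace model in C. It is only a sketch: struct toy_folio, its
field names, and the toy_* helpers are all hypothetical and are not the
kernel's struct folio or rmap API. It also assumes that the total counts
one per PMD-style mapping plus one per PTE-mapped page, which is a reading
of this thread rather than something the mail spells out.

```c
#include <assert.h>

#define TOY_NR_PAGES 8 /* toy folio size, chosen only for illustration */

/* Toy model only: hypothetical fields, not the kernel's struct folio. */
struct toy_folio {
	int entire_mapcount;             /* PMD-style mappings of the whole folio */
	int page_mapcount[TOY_NR_PAGES]; /* PTE-style mappings, one counter per page */
	int total_mapcount;              /* running total, updated on every change */
};

/* Map or unmap the whole folio with a single PMD-style entry. */
static void toy_map_entire(struct toy_folio *f)
{
	f->entire_mapcount++;
	f->total_mapcount++;
}

static void toy_unmap_entire(struct toy_folio *f)
{
	f->entire_mapcount--;
	f->total_mapcount--;
}

/* Map pages [start, start + nr) with PTE-style entries. */
static void toy_map_range(struct toy_folio *f, int start, int nr)
{
	for (int i = start; i < start + nr; i++) {
		f->page_mapcount[i]++;
		f->total_mapcount++;
	}
}

/* Unmap pages [start, start + nr); this is the "partial unmap" case. */
static void toy_unmap_range(struct toy_folio *f, int start, int nr)
{
	for (int i = start; i < start + nr; i++) {
		f->page_mapcount[i]--;
		f->total_mapcount--;
	}
}

/* Derive the total the slow way: visit every per-page counter. */
static int toy_total_by_scan(const struct toy_folio *f)
{
	int total = f->entire_mapcount;

	for (int i = 0; i < TOY_NR_PAGES; i++)
		total += f->page_mapcount[i];
	return total;
}

/* fork() shares every page, then the child munmap()s the first half. */
static int toy_demo(void)
{
	struct toy_folio f = {0};

	toy_map_range(&f, 0, TOY_NR_PAGES);       /* parent: all pages via PTEs */
	toy_map_range(&f, 0, TOY_NR_PAGES);       /* fork(): child maps them too */
	toy_unmap_range(&f, 0, TOY_NR_PAGES / 2); /* child: partial munmap() */

	/* The running total matches the scan without touching every page. */
	assert(f.total_mapcount == toy_total_by_scan(&f));
	return f.total_mapcount;
}
```

In this model a reader that wants the folio-wide count either walks all
TOY_NR_PAGES counters (toy_total_by_scan()) or reads total_mapcount in one
step. After the partial unmap no single "entire" counter describes the
folio any more, which is the situation the reply above is pointing at.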