From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <43736fdb-1a9c-4ab4-bf9c-6e2052c6dfea@redhat.com>
Date: Wed, 30 Aug 2023 18:20:36 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
To: Ryan Roberts, "Yin, Fengwei", Zi Yan, Matthew Wilcox, Yu Zhao, David Rientjes
Cc: Linux-MM
References: <7f66344b-bf63-41e0-ae79-0a0a1d4f2afd@arm.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: Prerequisites for Large Anon Folios
In-Reply-To: <7f66344b-bf63-41e0-ae79-0a0a1d4f2afd@arm.com>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 30.08.23 12:44, Ryan Roberts wrote:
> Hi All,
>

Hi Ryan,

I'll be back from vacation next Wednesday.

Note that I asked David R. to have large anon folios as topic for the next
bi-weekly mm meeting.

There, we should discuss things like
* naming
* accounting (/proc/meminfo)
* required toggles (especially ways to disable it, as we want to keep
  toggles minimal)

David R. raised that there are certainly workloads where the additional
memory overhead is usually not acceptable.
So it will be valuable to get input from others.

>
> I want to get serious about getting large anon folios merged. To do that, there
> are a number of outstanding prerequisites. I'm hoping the respective owners may
> be able to provide an update on progress?

I shared some details in the last meeting when you were on vacation :)

High level update below.

[...]

>>
>> - item:
>>     shared vs exclusive mappings
>>
>>   priority:
>>     prerequisite
>>
>>   description: >-
>>     New mechanism to allow us to easily determine precisely whether a given
>>     folio is mapped exclusively or shared between multiple processes. Required
>>     for (from David H):
>>
>>     (1) Detecting shared folios, to not mess with them while they are shared.
>>     MADV_PAGEOUT, user-triggered page migration, NUMA hinting, khugepaged ...
>>     replace cases where folio_estimated_sharers() == 1 would currently be the
>>     best we can do (and in some cases, page_mapcount() == 1).
>>
>>     (2) COW improvements for PTE-mapped large anon folios after fork(). Before
>>     fork(), PageAnonExclusive would have been reliable, after fork() it's not.
>>
>>     For (1), "MADV_PAGEOUT" maps to the "madvise" item captured in this list. I
>>     *think* "NUMA hinting" maps to "numa balancing" (but need confirmation!).
>>     "user-triggered page migration" and "khugepaged" not yet captured (would
>>     appreciate someone fleshing it out). I previously understood migration to be
>>     working for large folios - is "user-triggered page migration" some specific
>>     aspect that does not work?
>>
>>     For (2), this relates to Large Anon Folio enhancements which I plan to
>>     tackle after we get the basic series merged.
>>
>>   links:
>>     - 'email thread: Mapcount games: "exclusive mapped" vs. "mapped shared"'
>>
>>   location:
>>     - shrink_folio_list()
>>
>>   assignee:
>>     David Hildenbrand
>
> Any comment on this David? I think the last comment I saw was that you were
> planning to start an implementation a couple of weeks back? Did that get anywhere?

The math should be solid at this point and I had a simple prototype
running -- including fairly clean COW reuse handling.

I started cleaning it all up before my vacation. I'll first need the
total mapcount (which I sent), and might have to implement rmap patching
during THP split (easy), but I first have to do more measurements.

Willy's patches to free up space in the first tail page will be required.
In addition, my patches to free up ->private in tail pages for THP_SWAP.
Both things are on their way upstream.

Based on that, I need a bit spinlock to protect the total
mapcount+tracking data. There are things to measure (contention) and
optimize (why even care about tracking shared vs. exclusive if it's
pretty guaranteed to always be shared -- for example, shared libraries).

So it looks reasonable at this point, but I'll have to look into
possible contentions and optimizations once I have the basics
implemented cleanly.

It's a shame we cannot get the subpage mapcount out of the way
immediately, then it wouldn't be "additional tracking" but "different
tracking" :)

Once back from vacation, I'm planning on prioritizing this. Shouldn't
take ages to get it cleaned up. Measurements and optimizations might
take a bit longer.

[...]

>>
>>   assignee:
>>     Yin, Fengwei
>
> As I understand it: initial solution based on folio_estimated_sharers() has gone
> into v6.5. Have a dependency on David's precise shared vs exclusive work for an

shared vs. exclusive in place would replace folio_estimated_sharers()
users and most sub-page mapcount users.

> improved solution. And I think you mentioned you are planning to do a change
> that avoids splitting a large folio if it is entirely covered by the range?

[..]

>>
>> - item:
>>     numa balancing
>>
>>   priority:
>>     prerequisite
>>
>>   description: >-
>>     Large, pte-mapped folios are ignored by numa-balancing code. Commit comment
>>     (e81c480): "We're going to have THP mapped with PTEs. It will confuse
>>     numabalancing. Let's skip them for now."
>>     Likely depends on "shared vs
>>     exclusive mappings".
>>
>>   links: []
>>
>>   location:
>>     - do_numa_page()
>>
>>   assignee:
>>
>>
>
> Vaguely sounded like David might be planning to tackle this as part of his work
> on "shared vs exclusive mappings" ("NUMA hinting"??). David?

It should be easy to handle it based on that. Similarly, khugepaged IIRC.

-- 
Cheers,

David / dhildenb