From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 9 Jan 2023 18:33:09 +0100
To: Yin Fengwei, linux-mm@kvack.org, akpm@linux-foundation.org,
 jack@suse.cz, hughd@google.com, kirill.shutemov@linux.intel.com,
 mhocko@suse.com, ak@linux.intel.com, aarcange@redhat.com,
 npiggin@gmail.com, mgorman@techsingularity.net, willy@infradead.org,
 rppt@kernel.org, dave.hansen@intel.com, ying.huang@intel.com,
 tim.c.chen@intel.com
References: <20230109072232.2398464-1-fengwei.yin@intel.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [RFC PATCH 0/4] Multiple consecutive page for anonymous mapping
In-Reply-To: <20230109072232.2398464-1-fengwei.yin@intel.com>

On 09.01.23 08:22, Yin Fengwei wrote:
> In a nutshell: 4k is too small and 2M is too big. We started
> asking ourselves whether there was something in the middle that
> we could do. This series shows what that middle ground might
> look like. It provides some of the benefits of THP while
> eliminating some of the downsides.
>
> This series uses "multiple consecutive pages" (mcpages) of
> between 8K and 2M of base pages for anonymous user space mappings.
> This will lead to less internal fragmentation versus 2M mappings
> and thus less memory consumption and wasted CPU time zeroing
> memory which will never be used.

Hi,

what I understand is that this is some form of faultaround for anonymous
memory, with the special case that we try to allocate the pages
consecutively.

Some thoughts:

(1) Faultaround might be unexpected for some workloads and increase
    memory consumption unnecessarily.

Yes, something like that can happen with THP BUT

(a) THP can be disabled or is frequently only enabled for madvised
    regions -- for example, exactly for this reason.
(b) Some workloads (especially memory ballooning) rely on memory not
    suddenly re-appearing after MADV_DONTNEED. This works even with THP,
    because the 4k MADV_DONTNEED will first PTE-map the THP. Because
    there is a PTE page table, we won't suddenly get a THP populated
    again (unless khugepaged is configured to fill holes).

I strongly assume we will need something similar to force-disable,
selectively-enable etc.

(2) This steals consecutive pages to immediately split them up

I know, everybody thinks it might be valuable for their use case to grab
all higher-order pages :) It will be "fun" once all these cases start
competing. TBH, splitting them up immediately again smells like being
the lowest priority among all higher-order users.

(3) All effort will be lost once page compaction gets active, compacts,
    and simply migrates to random 4k pages. This is most probably the
    biggest "issue" of the whole approach AFAIKS: it's only temporary
    because there is no notion of these pages belonging together
    anymore.

>
> In the implementation, we allocate high order page with order of
> mcpage (e.g., order 2 for 16KB mcpage). This makes sure the
> physical contiguous memory is used and benefit sequential memory
> access latency.
>
> Then split the high order page. By doing this, the sub-page of
> mcpage is just 4K normal page. The current kernel page
> management is applied to "mc" pages without any changes. Batching
> page faults is allowed with mcpage and reduce page faults number.
>
> There are costs with mcpage.
> Besides no TLB benefit THP brings, it
> increases memory consumption and latency of allocation page
> comparing to 4K base page.
>
> This series is the first step of mcpage. The future work can be
> enable mcpage for more components like page cache, swapping etc.
> Finally, most pages in system will be allocated/free/reclaimed
> with mcpage order.

I think avoiding new, hard-to-get terminology ("mcpage") might be
better. I know, everybody wants to give their child a name, but the name
is not really future proof: "multiple consecutive pages" might at one
point be maybe just a folio.

I'd summarize the ideas as "faultaround" whereby we try optimizing for
locality.

Note that a similar (but different) concept already exists (hidden) for
hugetlb e.g., on arm64. The feature is called "cont-pte" -- a sequence
of PTEs that logically map a hugetlb page.

-- 
Thanks,

David / dhildenb
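
For reference, the userspace-visible MADV_DONTNEED behaviour that point
(1)(b) relies on can be sketched as below. This is only an illustration,
assuming Linux, Python 3.8+ and a private anonymous mapping; the helper
name dontneed_demo() and its parameters are made up for this sketch and
are not part of the series. It shows that a discarded 4k page reads back
as zeroes while its neighbour keeps its contents. It does not exercise
faultaround; the concern is that populating several pages around a fault
could bring such a discarded page back without the workload touching it.

```python
import mmap

PAGE = mmap.PAGESIZE  # base page size, typically 4096 on x86-64


def dontneed_demo(npages=4, victim=1):
    """Fill an anonymous private mapping, discard one page, report bytes.

    Returns (victim byte before discard, victim byte after discard,
    neighbouring byte after discard). Hypothetical helper for illustration.
    """
    # fileno -1 requests anonymous memory. MAP_PRIVATE is set explicitly
    # because Python defaults to MAP_SHARED, where MADV_DONTNEED does not
    # drop the data.
    buf = mmap.mmap(-1, npages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:] = b"\xaa" * (npages * PAGE)    # touch (populate) every page
        before = buf[victim * PAGE]
        # Discard exactly one base page; offset and length are page aligned.
        buf.madvise(mmap.MADV_DONTNEED, victim * PAGE, PAGE)
        after = buf[victim * PAGE]            # refault yields a zeroed page
        neighbour = buf[(victim + 1) * PAGE]  # untouched page keeps its data
        return before, after, neighbour
    finally:
        buf.close()


if __name__ == "__main__":
    print(dontneed_demo())
```

A balloon-style workload depends on the discarded page staying
unpopulated until it is explicitly touched again, which is why some way
to disable or selectively enable the new behaviour would likely be
needed.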