Date: Sun, 19 Feb 2023 08:07:59 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Mike Rapoport
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Aaron Lu, "Kirill A. Shutemov"
Subject: Re: [LSF/MM/BPF TOPIC] reducing direct map fragmentation

On Wed, Feb 01, 2023 at 08:06:37PM +0200, Mike Rapoport wrote:
> Hi all,

Hi Mike,

I'm interested in this topic and hope to discuss this with you at LSF/MM/BPF.

> There are use-cases that need to remove pages from the direct map or at least
> map them at PTE level.
> These use-cases include vfree, module loading, ftrace,
> kprobe, BPF, secretmem and generally any caller of set_memory/set_direct_map
> APIs.
>
> Remapping pages at PTE level causes split of the PUD and PMD sized mappings
> in the direct map which leads to performance degradation.
>
> To reduce the performance hit caused by the fragmentation of the direct
> map, it makes sense to group and/or cache the base pages removed from the
> direct map so that the most of base pages created during a split of a large
> page will be consumed by users requiring PTE level mappings.

How much of a performance difference did you see in your tests when the
direct map was fragmented? Is there a way to measure this difference?

> Last year the proposal to use a new migrate type for such cache received
> strong pushback and the suggested alternative was to try to use slab
> instead.
>
> I've been thinking about it (yeah, it took me a while) and I believe slab
> is not appropriate because use cases require at least page size allocations
> and some would really benefit from higher order allocations, and in the
> most cases the code that allocates memory excluded from the direct map
> needs the struct page/folio.
>
> For example, caching allocations of text in 2M pages would benefit from
> reduced iTLB pressure and doing kmalloc() from vmalloc() will be way more
> intrusive than using some variant of __alloc_pages().
>
> Secretmem and potentially PKS protected page tables also need struct
> page/folio.
>
> My current proposal is to have a cache of 2M pages close to the page
> allocator and use a GFP flag to make allocation request use that cache. On
> the free() path, the pages that are mapped at PTE level will be put into
> that cache.

I would like to discuss not only having a cache layer of pages but also how
the direct map could be merged correctly and efficiently.

I vaguely recall that Aaron Lu sent an RFC series about this and Kirill A.
Shutemov's feedback was to batch merge operations.
[1] Also, a CPA API called by the cache layer that could merge fragmented
mappings would work for merging 4K pages to 2M [2], but won't work for
merging 2M mappings to 1G mappings.

At that time I didn't follow the later discussions (e.g. execmem_alloc()),
so maybe I'm missing some points.

[1] https://lore.kernel.org/linux-mm/20220809100408.rm6ofiewtty6rvcl@box
[2] https://lore.kernel.org/linux-mm/YvfLxuflw2ctHFWF@kernel.org

> The cache is internally implemented as a buddy allocator so it can satisfy
> high order allocations, and there will be a shrinker to release free pages
> from that cache to the page allocator.
>
> I hope to have a first prototype posted Really Soon.

Looking forward to that! I wonder how it will be shaped.

>
> --
> Sincerely yours,
> Mike.
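
P.S. To make the 4K-to-2M merging point above a bit more concrete, here is a
minimal userspace sketch (not kernel code, and not taken from any posted
series) of the precondition a merge would have to check: the 512 PTE-level
entries must start at a 2M-aligned frame, be physically contiguous, and carry
identical protection bits. struct toy_pte, fill_uniform() and
can_merge_to_pmd() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_PMD 512	/* 2M / 4K, as on x86-64 */

/* Hypothetical, simplified PTE: a page frame number plus protection bits. */
struct toy_pte {
	uint64_t pfn;
	uint64_t prot;
};

/* Fill one PMD worth of entries with contiguous frames and uniform prot. */
static void fill_uniform(struct toy_pte *ptes, uint64_t first_pfn, uint64_t prot)
{
	assert(ptes != NULL);
	for (size_t i = 0; i < PTES_PER_PMD; i++) {
		ptes[i].pfn = first_pfn + i;
		ptes[i].prot = prot;
	}
}

/*
 * Return true if the 512 entries could be replaced by a single 2M mapping:
 * the first frame is 2M-aligned, the frames are physically contiguous and
 * every entry has the same protection bits.
 */
static bool can_merge_to_pmd(const struct toy_pte *ptes)
{
	if (ptes[0].pfn % PTES_PER_PMD != 0)
		return false;
	for (size_t i = 1; i < PTES_PER_PMD; i++) {
		if (ptes[i].pfn != ptes[0].pfn + i)
			return false;
		if (ptes[i].prot != ptes[0].prot)
			return false;
	}
	return true;
}
```

In this toy model a single 4K page whose protection differs (for example,
one page changed through a set_memory-style call) is enough to keep the whole
2M range split, which is why grouping such pages in a cache looks attractive.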