From: Pedro Falcato
Date: Fri, 6 Dec 2024 18:48:52 +0000
Subject: Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
To: Xu Lu
Cc: David Hildenbrand, Zi Yan, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, ardb@kernel.org, anup@brainfault.org,
 atishp@atishpatra.org, xieyongji@bytedance.com, lihangjing@bytedance.com,
 punit.agrawal@bytedance.com, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, Linux MM
References: <20241205103729.14798-1-luxu.kernel@bytedance.com>
 <315752c5-6129-4c8b-bf8c-0cc26f0ad5c5@redhat.com>

On Fri, Dec 6, 2024 at 1:42 PM Xu Lu wrote:
>
> Hi David,
>
> On Fri, Dec 6, 2024 at 6:13 PM David Hildenbrand wrote:
> >
> > On 06.12.24 03:00, Zi Yan wrote:
> > > On 5 Dec 2024, at 5:37, Xu Lu wrote:
> > >
> > >> This patch series attempts to break through the
> > >> limitation of MMU and
> > >> supports larger base page on RISC-V, which only supports 4K page size
> > >> now. The key idea is to always manage and allocate memory at a
> > >> granularity of 64K and use SVNAPOT to accelerate address translation.
> > >> This is the second version and the detailed introduction can be found
> > >> in [1].
> > >>
> > >> Changes from v1:
> > >> - Rebase on v6.12.
> > >>
> > >> - Adjust the page table entry shift to reduce page table memory usage.
> > >>   For example, in SV39, the traditional va behaves as:
> > >>
> > >>   ----------------------------------------------
> > >>   | pgd index | pmd index | pte index | offset |
> > >>   ----------------------------------------------
> > >>   | 38     30 | 29     21 | 20     12 | 11   0 |
> > >>   ----------------------------------------------
> > >>
> > >>   When we choose 64K as basic software page, va now behaves as:
> > >>
> > >>   ----------------------------------------------
> > >>   | pgd index | pmd index | pte index | offset |
> > >>   ----------------------------------------------
> > >>   | 38     34 | 33     25 | 24     16 | 15   0 |
> > >>   ----------------------------------------------
> > >>
> > >> - Fix some bugs in v1.
> > >>
> > >> Thanks in advance for comments.
> > >>
> > >> [1] https://lwn.net/Articles/952722/
> > >
> > > This looks very interesting. Can you cc me and linux-mm@kvack.org
> > > in the future? Thanks.
> > >
> > > Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> > > should have similar effect of RISC-V's SVNAPOT, right?
> >
> > What is the real benefit over 4k + large folios/mTHP?
> >
> > 64K comes with the problem of internal fragmentation: for example, a
> > page table that only occupies 4k of memory suddenly consumes 64K; quite
> > a downside.
>
> The original idea comes from the performance benefits we achieved on
> the ARM 64K kernel.
> We run several real world applications on the ARM
> Ampere Altra platform and found these apps' performance based on the
> 64K page kernel is significantly higher than that on the 4K page
> kernel:
> For Redis, the throughput has increased by 250% and latency has
> decreased by 70%.
> For Mysql, the throughput has increased by 16.9% and latency has
> decreased by 14.5%.
> For our own newsql database, throughput has increased by 16.5% and
> latency has decreased by 13.8%.
>
> Also, we have compared the performance between 64K and 4k + large
> folios/mTHP on ARM Neoverse-N2. The result shows considerable
> performance improvement on 64K kernel for both speccpu and lmbench,
> even when 4K kernel enables THP and ARM64_CONTPTE:
> For speccpu benchmark, 64K kernel without any huge pages optimization
> can still achieve 4.17% higher score than 4K kernel with transparent
> huge pages as well as CONTPTE optimization.
> For lmbench, 64K kernel achieves 75.98% lower memory mapping
> latency(16MB) than 4K kernel with transparent huge pages and CONTPTE
> optimization, 84.34% higher map read open2close bandwidth(16MB), and
> 10.71% lower random load latency(16MB).
> Interestingly, sometimes kernel with transparent pages support have
> poorer performance for both 4K and 64K (for example, mmap read
> bandwidth bench). We assume this is due to the overhead of huge pages'
> combination and collapse.
> Also, if you check the full result, you will find that usually the
> larger the memory size used for testing is, the better the performance
> of 64k kernel is (compared to 4K kernel). Unless the memory size lies
> in a range where 4K kernel can apply 2MB huge pages while 64K kernel
> can't.
> In summary, for performance sensitive applications which require
> higher bandwidth and lower latency, sometimes 4K pages with huge pages
> may not be the best choice and 64k page can achieve better results.
> The test environment and result is attached.
>
> As RISC-V has no native 64K MMU support, we introduce a software
> implementation and accelerate it via Svnapot. Of course, there will be
> some extra overhead compared with native 64K MMU. Thus, we are also
> trying to persuade the RISC-V community to support the extension of
> native 64K MMU [1]. Please join us if you are interested.
>

Ok, so you... didn't test this on riscv? And you're basing this
patchset off of a native 64KiB page size kernel being faster than 4KiB
+ CONTPTE? I don't see how that makes sense?

/me is confused

How many of these PAGE_SIZE wins are related to e.g. userspace basing
its buffer sizes (or whatever) off of the system page size? Where
exactly are you gaining time versus the CONTPTE stuff?

I think MM in general would be better off if we were more transparent
with regard to CONTPTE and page sizes instead of hand waving with
"hardware page size != software page size", which is such a *checks
notes* 4.4BSD idea... :)

At the very least, this patchset seems to go against all the work on
better supporting large folios and CONTPTE.

-- 
Pedro
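
For anyone wanting to sanity-check the two SV39 layouts from the quoted cover letter, here is a rough sketch in Python. It only reproduces the bit ranges drawn in the diagrams (4K: 9/9/9 index bits over a 12-bit offset; 64K: 5/9/9 index bits over a 16-bit offset). The names VaLayout, SV39_4K, SV39_64K and split_va are made up for illustration and are not kernel interfaces, and the sketch says nothing about how the patchset itself walks the tables.

```python
# Illustrative only: field widths are read off the quoted SV39 diagrams.
# All names here are hypothetical helpers, not kernel API.
from typing import NamedTuple, Tuple


class VaLayout(NamedTuple):
    page_shift: int  # number of page-offset bits
    pte_bits: int    # width of the pte index field
    pmd_bits: int    # width of the pmd index field
    pgd_bits: int    # width of the pgd index field


# 4K base page: offset 11..0, pte 20..12, pmd 29..21, pgd 38..30
SV39_4K = VaLayout(page_shift=12, pte_bits=9, pmd_bits=9, pgd_bits=9)
# 64K software base page: offset 15..0, pte 24..16, pmd 33..25, pgd 38..34
SV39_64K = VaLayout(page_shift=16, pte_bits=9, pmd_bits=9, pgd_bits=5)


def split_va(va: int, layout: VaLayout) -> Tuple[int, int, int, int]:
    """Return (pgd_index, pmd_index, pte_index, offset) for a 39-bit VA."""
    offset = va & ((1 << layout.page_shift) - 1)
    shift = layout.page_shift
    pte = (va >> shift) & ((1 << layout.pte_bits) - 1)
    shift += layout.pte_bits
    pmd = (va >> shift) & ((1 << layout.pmd_bits) - 1)
    shift += layout.pmd_bits
    pgd = (va >> shift) & ((1 << layout.pgd_bits) - 1)
    return pgd, pmd, pte, offset


if __name__ == "__main__":
    va = (5 << 30) | (6 << 21) | (7 << 12) | 0x123
    for name, layout in (("4K", SV39_4K), ("64K", SV39_64K)):
        print(name, split_va(va, layout))
```

Both layouts still add up to 39 bits. With the quoted 64K layout the pgd index is only 5 bits wide, so the top level would hold 32 entries rather than 512, while the pmd and pte levels keep 9-bit indexes.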