From: Nico Pache
Date: Tue, 30 Jul 2024 16:37:05 -0600
Subject: Re: [RFC 0/2] mm: introduce THP deferred setting
To: Zi Yan
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, Andrew Morton, David Hildenbrand, Matthew Wilcox,
 Barry Song, Ryan Roberts, Baolin Wang, Lance Yang, Peter Xu,
 Rafael Aquini, Andrea Arcangeli, Jonathan Corbet, "Kirill A. Shutemov"
In-Reply-To: <72320F9D-9B6A-4ABA-9B18-E59B8382A262@nvidia.com>
References: <20240729222727.64319-1-npache@redhat.com>
 <72320F9D-9B6A-4ABA-9B18-E59B8382A262@nvidia.com>

Hi Zi Yan,

On Mon, Jul 29, 2024 at 7:26 PM Zi Yan wrote:
>
> +Kirill
>
> On 29 Jul 2024, at 18:27, Nico Pache wrote:
>
> > We've seen cases where customers switching from RHEL7 to RHEL8 see a
> > significant increase in the memory footprint for the same workloads.
> >
> > Through our investigations we found that a large contributing factor to
> > the increase in RSS was an increase in THP usage.
>
> Any knob is changed from RHEL7 to RHEL8 to cause more THP usage?

IIRC, most of the system tuning is the same. We attributed the
increase in THP usage to a combination of improvements in the kernel
and improvements in the libraries (better alignment), which allowed
THP allocations to succeed at a higher rate. I can go back and confirm
this tomorrow, though.

>
> >
> > For workloads like MySQL, or when using allocators like jemalloc, it is
> > often recommended to set /transparent_hugepages/enabled=never. This is
> > in part due to performance degradations and increased memory waste.
> >
> > This series introduces enabled=defer, this setting acts as a middle
> > ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
> > page fault handler will act normally, making a hugepage if possible. If
> > the allocation is not MADV_HUGEPAGE, then the page fault handler will
> > default to the base size allocation. The caveat is that khugepaged can
> > still operate on pages that are not MADV_HUGEPAGE.
>
> Why? If user does not explicitly want huge page, why bother providing huge
> pages? Wouldn't it increase memory footprint?

So we have "always", which will always try to allocate a THP when it
can. This setting gives good performance in a lot of conditions, but
tends to waste memory. Additionally, applications DON'T need to be
modified to take advantage of THPs.

We have "madvise", which will only satisfy allocations that are
MADV_HUGEPAGE. This gives you granular control, and a lot of the time
these madvise calls come from libraries. Unlike "always", you DO need
to modify your application if you want to use THPs.

Then we have "never", which of course never allocates THPs.

OK, back to your question. Like "madvise", "defer" gives you the
benefits of THPs when you specifically know you want them
(MADV_HUGEPAGE), but it also benefits applications that don't
specifically ask for them (or can't be modified to ask for them), like
"always" does. The applications that don't ask for THPs must wait for
khugepaged to get them (avoiding insertions at PF time). This curbs a
lot of memory waste and gives increased tunability over "always".
Another added benefit is that khugepaged will most likely not operate
on short-lived allocations, meaning that only long-standing memory
will be collapsed to THPs.

The memory waste can be tuned with max_ptes_none. Let's say you want
~90% of your PMD to be full before collapsing into a huge page: simply
set max_ptes_none=64, which requires at least 448 of the 512 PTEs
(87.5%) to be present. Or, for no waste, set max_ptes_none=0, which
requires all 512 pages to be present before being collapsed.

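To make the mode comparison and the max_ptes_none arithmetic above
concrete, here is a small Python model. It is only a sketch, not kernel
code and not taken from the patches: the function names are made up
for illustration, it assumes 512 base pages per PMD-sized THP (4 KiB
pages, 2 MiB PMD), and it ignores everything else that affects
eligibility (MADV_NOHUGEPAGE, VMA size and alignment, defrag settings).

```python
import math

# Base pages per PMD-sized THP. Assumption: 4 KiB base pages and 2 MiB PMDs,
# which matches the "512 pages" figure in the text.
HPAGE_PMD_NR = 512

MODES = ("always", "madvise", "defer", "never")


def _check_mode(mode):
    if mode not in MODES:
        raise ValueError(f"unknown THP mode: {mode!r}")


def fault_allocates_thp(mode, madv_hugepage):
    """Hypothetical model: would the page fault handler attempt a THP?"""
    _check_mode(mode)
    if mode == "always":
        return True
    if mode in ("madvise", "defer"):
        # Both modes only insert a THP at fault time for MADV_HUGEPAGE mappings.
        return madv_hugepage
    return False  # "never"


def khugepaged_may_collapse(mode, madv_hugepage):
    """Hypothetical model: may khugepaged later collapse this mapping?"""
    _check_mode(mode)
    if mode in ("always", "defer"):
        # "defer" differs from "madvise" here: non-madvised memory is
        # still eligible for collapse by khugepaged.
        return True
    if mode == "madvise":
        return madv_hugepage
    return False  # "never"


def max_ptes_none_for_fill(min_fill):
    """Largest max_ptes_none that still requires min_fill of the PTEs present."""
    if not 0.0 <= min_fill <= 1.0:
        raise ValueError("min_fill must be between 0.0 and 1.0")
    required_present = math.ceil(min_fill * HPAGE_PMD_NR)
    # Cap at HPAGE_PMD_NR - 1 so at least one PTE must be present.
    return min(HPAGE_PMD_NR - required_present, HPAGE_PMD_NR - 1)


def min_fill_for(max_ptes_none):
    """Fraction of PTEs that must be present for a given max_ptes_none."""
    if not 0 <= max_ptes_none <= HPAGE_PMD_NR - 1:
        raise ValueError("max_ptes_none out of range")
    return (HPAGE_PMD_NR - max_ptes_none) / HPAGE_PMD_NR
```

Under these assumptions, min_fill_for(64) is 0.875, while a strict 90%
target, max_ptes_none_for_fill(0.9), comes out at 51.
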
>
> >
> > This allows for two things... one, applications specifically designed to
> > use hugepages will get them, and two, applications that don't use
> > hugepages can still benefit from them without aggressively inserting
> > THPs at every possible chance. This curbs the memory waste, and defers
> > the use of hugepages to khugepaged. Khugepaged can then scan the memory
> > for eligible collapsing.
>
> khugepaged would replace application memory with huge pages without specific
> goal. Why not use a user space agent with process_madvise() to collapse
> huge pages? Admin might have more knobs to tweak than khugepaged.

The benefits of "always" are that no userspace agent is needed, and
applications don't have to be modified to use madvise(MADV_HUGEPAGE)
to benefit from THPs. This setting hopes to gain some of the same
benefits without the significant waste of memory, and with increased
tunability.

Future changes I have in the works aim to make khugepaged "smarter":
moving it away from the round-robin fashion it currently operates in,
so that it instead makes informed decisions about what memory to
collapse (and potentially split).

Hopefully that helped explain the motivation for this new setting!

Cheers!
-- Nico

>
> >
> > Admins may want to lower max_ptes_none, if not, khugepaged may
> > aggressively collapse single allocations into hugepages.
> >
> > RFC note
> > ==========
> > I'm not sure if I'm missing anything related to the mTHP
> > changes. I think now that we have hugepage_pmd_enabled in
> > commit 00f58104202c ("mm: fix khugepaged activation policy") everything
> > should work as expected.
> >
> > Nico Pache (2):
> >   mm: defer THP insertion to khugepaged
> >   mm: document transparent_hugepage=defer usage
> >
> >  Documentation/admin-guide/mm/transhuge.rst | 18 ++++++++++---
> >  include/linux/huge_mm.h                    | 15 +++++++++--
> >  mm/huge_memory.c                           | 31 +++++++++++++++++++---
> >  3 files changed, 55 insertions(+), 9 deletions(-)
> >
> > Cc: Andrew Morton
> > Cc: David Hildenbrand
> > Cc: Matthew Wilcox
> > Cc: Barry Song
> > Cc: Ryan Roberts
> > Cc: Baolin Wang
> > Cc: Lance Yang
> > Cc: Peter Xu
> > Cc: Zi Yan
> > Cc: Rafael Aquini
> > Cc: Andrea Arcangeli
> > Cc: Jonathan Corbet
> > --
> > 2.45.2
>
> --
> Best Regards,
> Yan, Zi