From: Yafang Shao <laoar.shao@gmail.com>
Date: Tue, 20 May 2025 15:25:07 +0800
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
To: Nico Pache
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org
References: <20250520060504.20251-1-laoar.shao@gmail.com>
On Tue, May 20, 2025 at 2:52 PM Nico Pache wrote:
>
> On Tue, May 20, 2025 at 12:06 AM Yafang Shao wrote:
> >
> > Background
> > ----------
> >
> > At my current employer, PDD, we have consistently configured THP to "never"
> > on our production servers due to past incidents caused by its behavior:
> >
> > - Increased memory consumption
> >   THP significantly raises overall memory usage.
> >
> > - Latency spikes
> >   Random latency spikes occur due to the more frequent memory-compaction
> >   activity triggered by THP.
> >
> > These issues have made our sysadmins hesitant to switch to the "madvise"
> > or "always" modes.
> >
> > New Motivation
> > --------------
> >
> > We have now identified that certain AI workloads achieve substantial
> > performance gains with THP enabled. However, we've also verified that
> > some workloads see little to no benefit, or are even negatively
> > impacted, by THP.
> >
> > In our Kubernetes environment, we deploy mixed workloads on a single
> > server to maximize resource utilization. Our goal is to selectively
> > enable THP for services that benefit from it while keeping it disabled
> > for others. This approach allows us to incrementally enable THP for
> > additional services and assess how to make it more viable in production.
> >
> > Proposed Solution
> > -----------------
> >
> > For this use case, Johannes suggested introducing a dedicated mode [0].
> > In this new mode, we could implement BPF-based THP adjustment for
> > fine-grained control over tasks or cgroups. If no BPF program is
> > attached, THP remains in "never" mode. This solution elegantly meets
> > our needs while avoiding the complexity of managing BPF alongside the
> > other THP modes.
> >
> > A selftest example demonstrates how to enable THP for the current task
> > while keeping it disabled for all others.
> >
> > Alternative Proposals
> > ---------------------
> >
> > - Gutierrez's cgroup-based approach [1]:
> >   - Proposed adding a new cgroup file to control THP policy.
> >   - However, as Johannes noted, cgroups are designed for hierarchical
> >     resource allocation, not arbitrary policy settings [2].
> >
> > - Usama's per-task THP proposal based on prctl() [3]:
> >   - Enables THP per task via prctl().
> >   - As David pointed out, neither madvise() nor prctl() works in
> >     "never" mode [4], making this solution insufficient for our needs.
>
> Hi Yafang Shao,
>
> I believe you would have to invert your logic: disable THP for the
> processes you don't want using it, and run with THP="madvise"|"always".
> I have yet to look over Usama's solution in detail, but I believe this
> is possible based on his cover letter.
>
> I also have an alternative solution proposed here:
> https://lore.kernel.org/lkml/20250515033857.132535-1-npache@redhat.com/
>
> It's different in the sense that it doesn't give you granular control
> per process or cgroup, or BPF programmability, but it "may" suit your
> needs by taming THP waste and removing the latency spikes caused by
> page-fault-time THP compactions/allocations.

Thank you for developing this feature. I'll review it carefully.

The challenge we face is that our system administration team doesn't
permit enabling THP globally in production by setting it to "madvise"
or "always". As a result, we can only experiment with your feature on
our test servers at this stage.

Therefore, our immediate priority isn't THP optimization, but rather
finding a way to safely enable THP in production at all. The kernel
team needs a solution that addresses this fundamental deployment
hurdle before we can consider performance improvements.

-- 
Regards
Yafang