From: Shakeel Butt
Date: Thu, 25 Nov 2021 19:31:08 -0800
Subject: Re: [PATCH v2] mm: split thp synchronously on MADV_DONTNEED and munmap
To: David Hildenbrand
Cc: "Kirill A. Shutemov", Yang Shi, Peter Xu, Zi Yan, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20211125024523.2468946-1-shakeelb@google.com>

On Thu, Nov 25, 2021 at 12:39 AM David Hildenbrand wrote:
>
> On 25.11.21 03:45, Shakeel Butt wrote:
> > Many applications do sophisticated management of their heap memory for
> > better performance but with low cost. We have a bunch of such
> > applications running on our production and examples include caching and
> > data storage services. These applications keep their hot data on the
> > THPs for better performance and release the cold data through
> > MADV_DONTNEED to keep the memory cost low.
> >
> > The kernel defers the split and release of THPs until there is memory
> > pressure. This complicates the memory management of these sophisticated
> > applications which then needs to look into low level kernel handling of
> > THPs to better gauge their headroom for expansion.
> >
> > More specifically these applications monitor their cgroup usage to decide
> > if they can expand the memory footprint or release some (unneeded/cold)
> > buffer. They uses madvise(MADV_DONTNEED) to release the memory which
> > basically puts the THP into defer list. These deferred THPs are still
> > charged to the cgroup which leads to bloated usage read by the application
> > and making wrong decisions.
> > In addition these applications are very
> > latency sensitive and would prefer to not face memory reclaim due to
> > non-deterministic nature of reclaim.
> >
> > Internally we added a cgroup interface to trigger the split of deferred
> > THPs for that cgroup but this is hacky and exposing kernel internals to
> > users. This patch solves this problem in a more general way for the users
> > by splitting the THPS synchronously on MADV_DONTNEED. This patch does
> > the same for munmap() too.
> >
>
> I'll have to defer diving into the code.
>
> Just a comment: It might be good to add that there are still cases where
> splitting the compound page can fail -- for example, if the page is
> still pinned/referenced.
>
> So if you have a THP and intended to only pin/reference e.g., the first
> 4k of it (e.g., O_DIRECT, io_uring fixed buffers), MADV_DONTNEED/unmap
> e.g., the last 4k of it will not split synchronously.
>
> In addition to explicit user action on a compound page; I remember there
> might be other kernel-internal temporary references that could
> theoretically block splitting, but maybe most of them are at least for
> now limited to !compound pages.
>

Hi David,

Thanks for your time (and apologies) but I have to rescind this patch
for now due to mistaken performance impact. Let's move the discussion
to the other thread and decide next steps there.

thanks,
Shakeel