From: Shakeel Butt
Date: Mon, 22 Nov 2021 10:40:54 -0800
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
To: David Hildenbrand
Cc: "Kirill A . Shutemov", Yang Shi, Zi Yan, Matthew Wilcox, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20211120201230.920082-1-shakeelb@google.com> <25b36a5c-5bbd-5423-0c67-05cd6c1432a7@redhat.com>
In-Reply-To: <25b36a5c-5bbd-5423-0c67-05cd6c1432a7@redhat.com>

On Mon, Nov 22, 2021 at 12:32 AM David Hildenbrand wrote:
>
> On 20.11.21 21:12, Shakeel Butt wrote:
> > Many applications do sophisticated management of their heap memory for
> > better performance but with low cost. We have a bunch of such
> > applications running on our production and examples include caching and
> > data storage services. These applications keep their hot data on the
> > THPs for better performance and release the cold data through
> > MADV_DONTNEED to keep the memory cost low.
> >
> > The kernel defers the split and release of THPs until there is memory
> > pressure. This complicates the memory management of these
> > sophisticated applications which then need to look into low level
> > kernel handling of THPs to better gauge their headroom for expansion.
>
> Can you elaborate a bit on that point? What exactly does such an
> application do? I would have assumed that it's mostly transparent for
> applications.
>

The application monitors its cgroup usage to decide if it can expand the
memory footprint or release some (unneeded/cold) buffer. It calls
madvise(MADV_DONTNEED) to release the memory, which basically puts the
THP on the deferred list.
These deferred THPs are still charged to the cgroup, so the application
reads a bloated usage and makes wrong decisions. Internally we added a
cgroup interface to trigger the split of deferred THPs for that cgroup,
but this is hacky and exposes kernel internals to users. I want to solve
this problem in a more general way for the users.

> > In
> > addition these applications are very latency sensitive and would prefer
> > to not face memory reclaim due to non-deterministic nature of reclaim.
>
> That makes sense.
>
> >
> > This patch lets such applications not worry about the low level handling
> > of THPs in the kernel and splits the THPs synchronously on
> > MADV_DONTNEED.
>
> The main user I'm concerned about is virtio-balloon, which ends up
> discarding VM memory via MADV_DONTNEED when inflating the balloon in the
> guest in 4k granularity, but also during "free page reporting"
> continuously when e.g., a 2MiB page becomes free in the guest. We want
> both activities to be fast, and especially during "free page reporting",
> to defer any heavy work.

Thanks for the info. What source does virtio-balloon use for free pages?

>
> Do we have a performance evaluation how much overhead is added e.g., for
> a single 4k MADV_DONTNEED call on a THP or on a MADV_DONTNEED call that
> covers the whole THP?

I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on x86
for both settings you suggested. I don't see any statistically
significant difference with and without the patch. Let me know if you
want me to try something else.

Thanks for the review.
Shakeel
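
For readers who want to reproduce the two cases discussed above (a single
4k MADV_DONTNEED inside a THP, and a MADV_DONTNEED covering the whole
THP), here is a minimal userspace sketch. It is not the benchmark used in
this thread. The helper names map_thp_region() and release_cold() and the
2 MiB THP_SIZE constant are assumptions made for illustration, and the
kernel is free to back the region with 4k pages if THP is unavailable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: PMD-sized THP of 2 MiB, as on x86-64. */
#define THP_SIZE (2UL << 20)

/*
 * Hypothetical helper: map an anonymous region containing one
 * THP-aligned 2 MiB window, hint that it should use huge pages, and
 * fault it in. Returns the aligned window, or NULL on failure.
 * *raw_out receives the raw mapping (2 * THP_SIZE bytes) for munmap().
 */
static void *map_thp_region(void **raw_out)
{
    /* Over-allocate so an aligned 2 MiB window always fits. */
    size_t raw_len = 2 * THP_SIZE;
    void *raw = mmap(NULL, raw_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t aligned = ((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1);

    /* Advisory only; failure (e.g. THP disabled) is ignored here. */
    madvise((void *)aligned, THP_SIZE, MADV_HUGEPAGE);

    /* Touch every byte so the memory is actually populated. */
    memset((void *)aligned, 0xAB, THP_SIZE);

    *raw_out = raw;
    return (void *)aligned;
}

/*
 * Hypothetical helper: release a cold range back to the kernel.
 * Returns 0 on success, -1 on error (errno set by madvise).
 */
static int release_cold(void *addr, size_t len)
{
    assert(len > 0);
    return madvise(addr, len, MADV_DONTNEED);
}
```

Calling release_cold(buf, 4096) exercises the partial case and
release_cold(buf, THP_SIZE) the whole-THP case. For a private anonymous
mapping the released range reads back as zeroes afterwards, while whether
the underlying huge page is freed immediately or deferred is the kernel
behaviour under discussion in this thread.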