From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3FDB6C433F5
	for <linux-mm@archiver.kernel.org>; Tue, 22 Mar 2022 03:12:09 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 8B59E6B0072; Mon, 21 Mar 2022 23:12:08 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 83EAF6B0073; Mon, 21 Mar 2022 23:12:08 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 6B9356B0074; Mon, 21 Mar 2022 23:12:08 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27])
	by kanga.kvack.org (Postfix) with ESMTP id 5AC8B6B0072
	for <linux-mm@kvack.org>; Mon, 21 Mar 2022 23:12:08 -0400 (EDT)
Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay09.hostedemail.com (Postfix) with ESMTP id 23E332377D
	for <linux-mm@kvack.org>; Tue, 22 Mar 2022 03:12:08 +0000 (UTC)
X-FDA: 79270548336.12.84212EA
Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174])
	by imf19.hostedemail.com (Postfix) with ESMTP id 93DA81A001E
	for <linux-mm@kvack.org>; Tue, 22 Mar 2022 03:12:07 +0000 (UTC)
Received: by mail-qt1-f174.google.com with SMTP id j21so13584518qta.0
        for <linux-mm@kvack.org>; Mon, 21 Mar 2022 20:12:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=message-id:date:from:to:cc:subject:references:mime-version
         :content-disposition:in-reply-to;
        bh=wG5fSZfV4Ui132L8PxZtgVxXrjvZE1zrn5D7t7om3CQ=;
        b=j+MhUKpC99YlXHJmofUFmvZ7JMm8f1lD1dEv5viTB7yo1w6WsEn9LRkwRmi1udZlBo
         lbw1PxzP/UzoGGZbdcu+DXrMkyzobI34uZwgD/t6h0mWYPTldfk3E8qx6HEovODviD0B
         DgNFp/S6SO3uQePK5Xea2muNAMpjhqXZSzvPmPey7PfiVqkzpOwbG7MNUotqcJx28qg9
         hRbiykT5yVt4nHsmCISYmgHePdMh+hEvl8kn7L6yUHHomu9VFDnJTG/Ezy02wpZw9GFk
         b+UBkQlsZ3pq+F/DoQAHvXiB4TmJovm7+ta6stuu1yoJD/owNDh7p6WAG2O+ZhsW2Cao
         d1Sw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:message-id:date:from:to:cc:subject:references
         :mime-version:content-disposition:in-reply-to;
        bh=wG5fSZfV4Ui132L8PxZtgVxXrjvZE1zrn5D7t7om3CQ=;
        b=cryQouPNfL5x7S+LGosQgFrwlzh3W+rxpv1T6EkHVEQ//VqAHIFLCmdqfhZdUMDO85
         E6PxevqkrC9j31qr84mI9H/rIC2n1vaxdEmeUF89kMFmdtWhwqxrO+kILsqkdfDZ7TRF
         YCDVWn6zmKPPuDkBg7lRo7w9TCtFNnXkhJsvmAfCw52QDbQ/+Nhme2HSaWup69yMauBT
         LMab3T155+phFpZFJg2CtOYzZfY5RZm479yRQBuitcf/vq+h/YwYKXY2cNvxRmCoask3
         jgxIG82AUzCI6MimTmpb/jvSBeM+NOoI/y9Q4XGPSmc8/O7J4/DyLJTRTzK+i4lMgU8q
         QL0Q==
X-Gm-Message-State: AOAM5321YqsVHUk/eROZMIUyoWkjKwcXV0tEw21PeHPpnYd6Ap6AMDT0
	Il9i7BYMZK7cG9Wg9aIIWxo=
X-Google-Smtp-Source: ABdhPJwzZKQggTLLNuTOg+kQxkzYF5H19omxmoi4DnXhZEUanJJDTiU+sV1zxIxuk1GtpC+shRknig==
X-Received: by 2002:a05:622a:40a:b0:2e0:7235:f7a9 with SMTP id n10-20020a05622a040a00b002e07235f7a9mr18677883qtx.500.1647918726766;
        Mon, 21 Mar 2022 20:12:06 -0700 (PDT)
Received: from localhost ([193.203.214.57])
        by smtp.gmail.com with ESMTPSA id s19-20020a05622a179300b002e1ceeb21d0sm12911910qtk.97.2022.03.21.20.12.05
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 21 Mar 2022 20:12:06 -0700 (PDT)
Message-ID: <62393e86.1c69fb81.bb254.3d1a@mx.google.com>
X-Google-Original-Message-ID: <20220322031203.GB2326136@cgel.zte@gmail.com>
Date: Tue, 22 Mar 2022 03:12:03 +0000
From: CGEL <cgel.zte@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: bsingharora@gmail.com, akpm@linux-foundation.org,
	yang.yang29@zte.com.cn, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] delayacct: track delays from ksm cow
References: <20220316133420.2131707-1-yang.yang29@zte.com.cn>
 <412dc01c-8829-eac2-52c7-3f704dbb5a98@redhat.com>
 <6232970f.1c69fb81.4e365.c9f2@mx.google.com>
 <4e76476b-1da0-09c5-7dc4-0b2db796a549@redhat.com>
 <62330402.1c69fb81.d2ba6.0538@mx.google.com>
 <987bd014-c5ab-52cb-627e-2085560cb327@redhat.com>
 <6233e342.1c69fb81.692f.6286@mx.google.com>
 <2bb1c357-5335-9d96-d862-bd51c1014193@redhat.com>
 <6236c600.1c69fb81.7cd4.a900@mx.google.com>
 <0414c610-7f56-2dd2-0d83-ac3a5194eb60@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <0414c610-7f56-2dd2-0d83-ac3a5194eb60@redhat.com>
X-Rspam-User: 
X-Rspamd-Queue-Id: 93DA81A001E
X-Stat-Signature: bdb6p77ixdg5nu1858xwtq93y7gott3i
Authentication-Results: imf19.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=j+MhUKpC;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (imf19.hostedemail.com: domain of cgel.zte@gmail.com designates 209.85.160.174 as permitted sender) smtp.mailfrom=cgel.zte@gmail.com
X-Rspamd-Server: rspam01
X-HE-Tag: 1647918727-848500
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Mon, Mar 21, 2022 at 04:45:40PM +0100, David Hildenbrand wrote:
> On 20.03.22 07:13, CGEL wrote:
> > On Fri, Mar 18, 2022 at 09:24:44AM +0100, David Hildenbrand wrote:
> >> On 18.03.22 02:41, CGEL wrote:
> >>> On Thu, Mar 17, 2022 at 11:05:22AM +0100, David Hildenbrand wrote:
> >>>> On 17.03.22 10:48, CGEL wrote:
> >>>>> On Thu, Mar 17, 2022 at 09:17:13AM +0100, David Hildenbrand wrote:
> >>>>>> On 17.03.22 03:03, CGEL wrote:
> >>>>>>> On Wed, Mar 16, 2022 at 03:56:23PM +0100, David Hildenbrand wrote:
> >>>>>>>> On 16.03.22 14:34, cgel.zte@gmail.com wrote:
> >>>>>>>>> From: Yang Yang <yang.yang29@zte.com.cn>
> >>>>>>>>>
> >>>>>>>>> Delay accounting does not track the delay of ksm cow.  When tasks
> >>>>>>>>> have many ksm pages, they may spend a considerable amount of time
> >>>>>>>>> waiting for ksm cow.
> >>>>>>>>>
> >>>>>>>>> To get the impact of ksm cow on tasks, measure the delay when ksm
> >>>>>>>>> cow happens. This could help users decide whether to use ksm
> >>>>>>>>> or not.
> >>>>>>>>>
> >>>>>>>>> Also update tools/accounting/getdelays.c:
> >>>>>>>>>
> >>>>>>>>>     / # ./getdelays -dl -p 231
> >>>>>>>>>     print delayacct stats ON
> >>>>>>>>>     listen forever
> >>>>>>>>>     PID     231
> >>>>>>>>>
> >>>>>>>>>     CPU             count     real total  virtual total    delay total  delay average
> >>>>>>>>>                      6247     1859000000     2154070021     1674255063          0.268ms
> >>>>>>>>>     IO              count    delay total  delay average
> >>>>>>>>>                         0              0              0ms
> >>>>>>>>>     SWAP            count    delay total  delay average
> >>>>>>>>>                         0              0              0ms
> >>>>>>>>>     RECLAIM         count    delay total  delay average
> >>>>>>>>>                         0              0              0ms
> >>>>>>>>>     THRASHING       count    delay total  delay average
> >>>>>>>>>                         0              0              0ms
> >>>>>>>>>     KSM             count    delay total  delay average
> >>>>>>>>>                      3635      271567604              0ms
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> TBH I'm not sure how particularly helpful this is and if we want this.
> >>>>>>>>
> >>>>>>> Thanks for replying.
> >>>>>>>
> >>>>>>> Users may enable ksm by calling madvise(, , MADV_MERGEABLE) when they
> >>>>>>> want to save memory; the tradeoff is the delay suffered on ksm cow.
> >>>>>>> Users can find out how much memory ksm saved by reading
> >>>>>>> /sys/kernel/mm/ksm/pages_sharing, but they don't know the cost of the
> >>>>>>> ksm cow delay, which matters for delay-sensitive tasks. If users know
> >>>>>>> both the saved memory and the ksm cow delay, they can make better use
> >>>>>>> of madvise(, , MADV_MERGEABLE).
> >>>>>>
> >>>>>> But that happens after the effects, no?
> >>>>>>
> >>>>>> IOW a user already called madvise(, , MADV_MERGEABLE) and then gets the
> >>>>>> results.
> >>>>>>
> >>>>> Imagine users are developing or porting their applications on a test
> >>>>> machine; they could take these measurements as feedback to decide
> >>>>> whether to use madvise(, , MADV_MERGEABLE), or to adjust its range.
> >>>>
> >>>> And why can't they run it with and without and observe performance using
> >>>> existing metrics (or even application-specific metrics?)?
> >>>>
> >>>>
> >>> I think we need this patch for the same reason we need the swap,
> >>> reclaim and thrashing getdelays information. When a system is complex,
> >>> it's hard to tell precisely which kernel activity impacts the observed
> >>> performance or application-specific metrics: preemption? cgroup
> >>> throttling? swap? reclaim? IO?
> >>>
> >>> So if we can get precise impact data for a factor, tuning that factor
> >>> (for this patch, KSM) becomes more efficient.
> >>>
> >>
> >> I'm not convinced that we want to make our write-fault handler more
> >> complicated for such a corner case with an unclear, eventual use case.
> > 
> > IIRC, KSM was designed for VMs. But recently we found that KSM also works
> > well for systems with many containers (saving about 10%~20% of total
> > memory), and container technology is more popular today, so KSM may be
> > used more.
> > 
> > To reduce the impact on the write-fault handler, could we write a new
> > function, with ifdef CONFIG_KSM inside, to do this job?
> 
> Maybe we just want to catch the impact of the write-fault handler when
> copying more generally?
>
The kernel has several kinds of COW, and some are transparent to the user.
For example, COW caused by a child process is an internal kernel mechanism;
the user should not need to care about its performance impact and can
hardly do anything about it. KSM is different: the user can tune the policy
from userspace. If we measure every kind of COW, won't the result be noisy?
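
To make "policy tuning in userspace" concrete, here is a minimal sketch of
what an application does today to opt a region in to KSM and to read back a
KSM sysfs value. The helper names map_mergeable() and read_long() are made
up for illustration; only mmap(), madvise() and MADV_MERGEABLE are real
interfaces. It assumes a Linux system; merging only happens if ksmd is
running (/sys/kernel/mm/ksm/run set to 1).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, fill every page
 * with the same byte so the pages are merge candidates, and ask KSM to
 * consider the range. Returns the mapping, or NULL if mmap() failed.
 * *merge_ok is set to 1 if madvise(MADV_MERGEABLE) succeeded and to 0
 * otherwise (madvise() fails with EINVAL on kernels without CONFIG_KSM).
 */
static void *map_mergeable(size_t len, int *merge_ok)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	memset(p, 0x5a, len);
	*merge_ok = (madvise(p, len, MADV_MERGEABLE) == 0);
	return p;
}

/*
 * Hypothetical helper: read a single integer from a file such as
 * /sys/kernel/mm/ksm/pages_sharing. Returns -1 if it cannot be read.
 */
static long read_long(const char *path)
{
	long v = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}
```

A caller would use map_mergeable() on the range it wants merged and later
poll read_long("/sys/kernel/mm/ksm/pages_sharing") to see the saving. Every
later write to a merged page in that range takes the ksm cow path, and that
cost is the part that is not visible to the user today.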
> > 
> >> IIRC, whenever using KSM you're already agreeing to eventually pay a
> >> performance price, and the price heavily depends on other factors in the
> >> system. Simply looking at the number of write-faults might already give
> >> an indication what changed with KSM being enabled.
> >>
> > As for "you're already agreeing to pay a performance price": I think this
> > is exactly the shortcoming of KSM that keeps it from being used more widely.
> > It's not easy for a user/app to decide how to use madvise(, , MADV_MERGEABLE).
> 
> ... and my point is that the metric you're introducing might absolutely
> not be expressive for such users playing with MADV_MERGEABLE. IMHO
> people will look at actual application performance to figure out what
> "harm" will be done, no?
> 
> But I do see value in capturing how many COW we have in general --
> either via a counter or via a delay as proposed by you.
> 
Thanks for the confirmation. As described above, I would prefer to keep this
KSM-specific. Or should we add a vm counter instead: KSM_COW?
> > 
> > Is there an easier way to use KSM, enjoying the memory savings while
> > minimizing the performance price for containers? We think it's possible, and
> > are working on a new patch: provide a cgroup knob to enable/disable KSM for
> > all tasks in that cgroup. If your container is delay sensitive, just leave
> > it off; if not, you can easily enable KSM without modifying app code.
> > 
> > Before using the new knob, users might want to know the precise impact of KSM.
> > I think the number of write-faults is only an indirect indicator. If indirect
> > indicators were good enough, why would we need taskstats and PSI? By the
> > way, getdelays supports container statistics.
> 
> Would anything speak against making this more generic and capturing the
> delay for any COW, not just for KSM?
I think we should export to userspace only data that is meaningful to the
user. Users may not need data about internal kernel mechanisms.
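
As an illustration of what the user can read from such data, take the KSM
line of the getdelays output quoted from the original patch: 3635 events
with a delay total of 271567604. The sketch below works out the per-event
average. It assumes the totals are in nanoseconds, which the CPU line
suggests (1674255063 / 6247 comes out to the 0.268ms printed there). The
function names are made up for this example and are not taken from
getdelays.c.

```c
#include <stdio.h>

/* Average delay per event in milliseconds, computed in floating point. */
static double avg_delay_ms(unsigned long long total_ns,
			   unsigned long long count)
{
	if (count == 0)
		return 0.0;
	return (double)total_ns / 1e6 / (double)count;
}

/* The same average, truncated to whole milliseconds by integer division. */
static unsigned long long avg_delay_ms_trunc(unsigned long long total_ns,
					     unsigned long long count)
{
	return total_ns / 1000000ULL / (count ? count : 1);
}

/* Print both forms for one row of statistics. */
static void print_avg(const char *name, unsigned long long total_ns,
		      unsigned long long count)
{
	printf("%-8s %.3fms (whole ms: %llu)\n", name,
	       avg_delay_ms(total_ns, count),
	       avg_delay_ms_trunc(total_ns, count));
}
```

Under that assumption, print_avg("KSM", 271567604ULL, 3635ULL) gives roughly
0.075ms per ksm cow, while the whole-millisecond form gives 0, which matches
the "0ms" in the KSM line. So for small per-event delays the count and the
delay total carry the information, not the rounded average.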

Thanks.
> 
> -- 
> Thanks,
> 
> David / dhildenb