Message-ID:
Date: Tue, 22 Mar 2022 08:55:15 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.6.2
To: CGEL
Cc: bsingharora@gmail.com, akpm@linux-foundation.org, yang.yang29@zte.com.cn, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20220316133420.2131707-1-yang.yang29@zte.com.cn>
 <412dc01c-8829-eac2-52c7-3f704dbb5a98@redhat.com>
 <6232970f.1c69fb81.4e365.c9f2@mx.google.com>
 <4e76476b-1da0-09c5-7dc4-0b2db796a549@redhat.com>
 <62330402.1c69fb81.d2ba6.0538@mx.google.com>
 <987bd014-c5ab-52cb-627e-2085560cb327@redhat.com>
 <6233e342.1c69fb81.692f.6286@mx.google.com>
 <2bb1c357-5335-9d96-d862-bd51c1014193@redhat.com>
 <6236c600.1c69fb81.7cd4.a900@mx.google.com>
 <0414c610-7f56-2dd2-0d83-ac3a5194eb60@redhat.com>
 <62393e86.1c69fb81.bb254.3d1a@mx.google.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH] delayacct: track delays from ksm cow
In-Reply-To: <62393e86.1c69fb81.bb254.3d1a@mx.google.com>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 22.03.22 04:12, CGEL wrote:
> On Mon, Mar 21, 2022 at 04:45:40PM +0100, David Hildenbrand wrote:
>> On 20.03.22 07:13, CGEL wrote:
>>> On Fri, Mar 18, 2022 at 09:24:44AM +0100, David Hildenbrand wrote:
>>>> On 18.03.22 02:41, CGEL wrote:
>>>>> On Thu, Mar 17, 2022 at 11:05:22AM +0100, David Hildenbrand wrote:
>>>>>> On 17.03.22 10:48, CGEL wrote:
>>>>>>> On Thu, Mar 17, 2022 at 09:17:13AM +0100, David Hildenbrand wrote:
>>>>>>>> On 17.03.22 03:03, CGEL wrote:
>>>>>>>>> On Wed, Mar 16, 2022 at 03:56:23PM +0100, David Hildenbrand wrote:
>>>>>>>>>> On 16.03.22 14:34, cgel.zte@gmail.com wrote:
>>>>>>>>>>> From: Yang Yang
>>>>>>>>>>>
>>>>>>>>>>> Delay accounting does not track the delay of ksm cow. When tasks
>>>>>>>>>>> have many ksm pages, it may spend a amount of time waiting for ksm
>>>>>>>>>>> cow.
>>>>>>>>>>>
>>>>>>>>>>> To get the impact of tasks in ksm cow, measure the delay when ksm
>>>>>>>>>>> cow happens. This could help users to decide whether to user ksm
>>>>>>>>>>> or not.
>>>>>>>>>>>
>>>>>>>>>>> Also update tools/accounting/getdelays.c:
>>>>>>>>>>>
>>>>>>>>>>> / # ./getdelays -dl -p 231
>>>>>>>>>>> print delayacct stats ON
>>>>>>>>>>> listen forever
>>>>>>>>>>> PID 231
>>>>>>>>>>>
>>>>>>>>>>> CPU          count     real total  virtual total    delay total  delay average
>>>>>>>>>>>               6247     1859000000     2154070021     1674255063        0.268ms
>>>>>>>>>>> IO           count    delay total  delay average
>>>>>>>>>>>                  0              0            0ms
>>>>>>>>>>> SWAP         count    delay total  delay average
>>>>>>>>>>>                  0              0            0ms
>>>>>>>>>>> RECLAIM      count    delay total  delay average
>>>>>>>>>>>                  0              0            0ms
>>>>>>>>>>> THRASHING    count    delay total  delay average
>>>>>>>>>>>                  0              0            0ms
>>>>>>>>>>> KSM          count    delay total  delay average
>>>>>>>>>>>               3635      271567604            0ms
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> TBH I'm not sure how particularly helpful this is and if we want this.
>>>>>>>>>>
>>>>>>>>> Thanks for replying.
>>>>>>>>>
>>>>>>>>> Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want
>>>>>>>>> save memory, it's a tradeoff by suffering delay on ksm cow. Users can
>>>>>>>>> get to know how much memory ksm saved by reading
>>>>>>>>> /sys/kernel/mm/ksm/pages_sharing, but they don't know what the costs of
>>>>>>>>> ksm cow delay, and this is important of some delay sensitive tasks. If
>>>>>>>>> users know both saved memory and ksm cow delay, they could better use
>>>>>>>>> madvise(, , MADV_MERGEABLE).
>>>>>>>>
>>>>>>>> But that happens after the effects, no?
>>>>>>>>
>>>>>>>> IOW a user already called madvise(, , MADV_MERGEABLE) and then gets the
>>>>>>>> results.
>>>>>>>>
>>>>>>> Image user are developing or porting their applications on experiment
>>>>>>> machine, they could takes those benchmark as feedback to adjust whether
>>>>>>> to use madvise(, , MADV_MERGEABLE) or it's range.
>>>>>>
>>>>>> And why can't they run it with and without and observe performance using
>>>>>> existing metrics (or even application-specific metrics?)?
>>>>>>
>>>>>>
>>>>> I think the reason why we need this patch, is just like why we need
>>>>> swap,reclaim,thrashing getdelay information. When system is complex,
>>>>> it's hard to precise tell which kernel activity impact the observe
>>>>> performance or application-specific metrics, preempt? cgroup throttle?
>>>>> swap? reclaim? IO?
>>>>>
>>>>> So if we could get the factor's precise impact data, when we are tunning
>>>>> the factor(for this patch it's ksm), it's more efficient.
>>>>>
>>>>
>>>> I'm not convinced that we want to make or write-fault handler more
>>>> complicated for such a corner case with an unclear, eventual use case.
>>>
>>> IIRC, KSM is designed for VM. But recently we found KSM works well for
>>> system with many containers(save about 10%~20% of total memroy), and
>>> container technology is more popular today, so KSM may be used more.
>>>
>>> To reduce the impact for write-fault handler, we may write a new function
>>> with ifdef CONFIG_KSM inside to do this job?
>>
>> Maybe we just want to catch the impact of the write-fault handler when
>> copying more generally?
>>
> We know kernel has different kind of COW, some are transparent for user.
> For example child process may cause COW, and user should not care this
> performance impact, because it's kernel inside mechanism, user is hard
> to do something. But KSM is different, user can do the policy tuning in
> userspace. If we metric all the COW, it may be noise, doesn't it?

Only to some degree I think. The other delays (e.g., SWAP, RECLAIM) are
also not completely transparent to the user, no? I mean, user space
might affect them to some degree with some tunables, but it's not
completely transparent for the user either.

IIRC, we have these sources of COW that result in a r/w anon page
(-> MAP_PRIVATE):
(1) R/O-mapped, (possibly) shared anonymous page (fork() or KSM)
(2) R/O-mapped, shared zeropage (e.g., KSM, read-only access to
    unpopulated page in MAP_ANON)
(3) R/O-mapped, shared file/device/... page that requires a private copy
    on modifications (e.g., MAP_PRIVATE !MAP_ANON)

Note that your current patch won't catch the case where KSM placed the
shared zeropage (use_zero_pages).

Tracking the overall overhead might be of value I think, and it would
still allow for determining how much KSM is involved by measuring with
and without KSM enabled.

>>>
>>>> IIRC, whenever using KSM you're already agreeing to eventually pay a
>>>> performance price, and the price heavily depends on other factors in the
>>>> system. Simply looking at the number of write-faults might already give
>>>> an indication what changed with KSM being enabled.
>>>>
>>> While saying "you're already agreeing to pay a performance price", I think
>>> this is the shortcoming of KSM that putting off it being used more widely.
>>> It's not easy for user/app to decide how to use madvise(, ,MADV_MERGEABLE).
>>
>> ... and my point is that the metric you're introducing might absolutely
>> not be expressive for such users playing with MADV_MERGEABLE. IMHO
>> people will look at actual application performance to figure out what
>> "harm" will be done, no?
>>
>> But I do see value in capturing how many COW we have in general --
>> either via a counter or via a delay as proposed by you.
>>
> Thanks for your affirmative. As describe above, or we add a vm counter:
> KSM_COW?

As I'm messing with the COW logic lately (e.g., [1]) I'd welcome vm
counters for all the different kinds of COW-related events, especially

(1) COW of an anon, !KSM page
(2) COW of a KSM page
(3) COW of the shared zeropage
(4) Reuse instead of COW

I used some VM counters myself to debug/test some of my latest changes.

>>>
>>> Is there a more easy way to use KSM, enjoying memory saving while minimum
>>> the performance price for container? We think it's possible, and are working
>>> for a new patch: provide a knob for cgroup to enable/disable KSM for all tasks
>>> in this cgroup, so if your container is delay sensitive just leave it, and if
>>> not you can easy to enable KSM without modify app code.
>>>
>>> Before using the new knob, user might want to know the precise impact of KSM.
>>> I think write-faults is indirection. If indirection is good enough, why we need
>>> taskstats and PSI? By the way, getdelays support container statistics.
>>
>> Would anything speak against making this more generic and capturing the
>> delay for any COW, not just for KSM?
> I think we'd better to export data to userspace that is meaning for user.
> User may no need kernel inside mechanism'data.

Reading Documentation/accounting/delay-accounting.rst I wonder what we
best put in there.

"Tasks encounter delays in execution when they wait for some kernel
resource to become available."

I mean, in any COW event we are waiting for the kernel to create a copy.

This could be of value even if we add separate VM counters.

[1] https://lore.kernel.org/linux-mm/20220315104741.63071-2-david@redhat.com/T/

-- 
Thanks,

David / dhildenb
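
As a side note on reading the getdelays sample quoted at the top of the thread: the KSM row reports an average of 0ms, yet dividing the printed delay total by the printed count gives a small non-zero value. The sketch below is only a back-of-the-envelope check, not part of the patch. It assumes the totals are in nanoseconds (which matches the CPU row, where the printed 0.268ms agrees with total/count); the helper name average_delay_ms is made up for illustration. The 0ms is presumably the average being rounded down to whole milliseconds.

```python
def average_delay_ms(delay_total_ns: int, count: int) -> float:
    """Average delay per event in milliseconds (hypothetical helper).

    Assumes delay_total_ns is in nanoseconds; returns 0.0 when no
    events were recorded, to avoid dividing by zero.
    """
    if count == 0:
        return 0.0
    return delay_total_ns / count / 1_000_000


# Values copied from the KSM and CPU rows of the quoted getdelays output.
ksm_avg = average_delay_ms(271567604, 3635)
cpu_avg = average_delay_ms(1674255063, 6247)

print(f"KSM {ksm_avg:.3f}ms, CPU {cpu_avg:.3f}ms")
# prints "KSM 0.075ms, CPU 0.268ms"
```

So in that sample each KSM COW cost roughly 75 microseconds on average, which is the kind of per-event number that either a delay metric or a separate VM counter would need to put into context.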