From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FA00C19F32 for ; Sun, 2 Mar 2025 21:41:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 774F26B007B; Sun, 2 Mar 2025 16:41:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 755DA6B0083; Sun, 2 Mar 2025 16:41:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 613876B0085; Sun, 2 Mar 2025 16:41:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 43E126B007B for ; Sun, 2 Mar 2025 16:41:52 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id BECFD4C4EC for ; Sun, 2 Mar 2025 21:41:51 +0000 (UTC) X-FDA: 83177933622.29.E00298E Received: from tor.source.kernel.org (tor.source.kernel.org [172.105.4.254]) by imf06.hostedemail.com (Postfix) with ESMTP id 1C56B180011 for ; Sun, 2 Mar 2025 21:41:49 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=TttXo9gk; spf=pass (imf06.hostedemail.com: domain of sj@kernel.org designates 172.105.4.254 as permitted sender) smtp.mailfrom=sj@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740951710; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qsCGwQ85kXFFCBdpOtwBJeTXiu5RXmz7Avdwr4y2FMo=; b=5KQCd4fZ0djzHozGlBsr2FhjuLj71T1tcLgNmzXWgAlWA/l1WsoiNp39zrxZDkvQMvRXPW WYchTJwh/MxXr+mlNwXX5F4D3qIYOLypik4pcfhcmDwmuFITjEu4IB6hogtivJTnnUsEhW xvfZc3mwln8VzwnuAGpZLYaHqfbOp3s= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=TttXo9gk; spf=pass (imf06.hostedemail.com: domain of sj@kernel.org designates 172.105.4.254 as permitted sender) smtp.mailfrom=sj@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740951710; a=rsa-sha256; cv=none; b=ES67U2fKZiRVW6C5+40YELLh0LA3wavb9HI2IS1QljvH7areJHwZ5cXDbtOVKeeCJKzYwX V/bmQwP9fNXx3Tx0y6lRzl4hbaMxhfj85fEx0EPBteghNEx9d1YjGJ1BrZAPgdBjv3NtR0 Hc+k0M8Qw2+Q4FXKCihqBSJFUHceTu4= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by tor.source.kernel.org (Postfix) with ESMTP id B89016114C; Sun, 2 Mar 2025 21:41:38 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 96E3EC4CED6; Sun, 2 Mar 2025 21:41:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1740951708; bh=TS5/fPL3sXyZQnxMiq4LqF/VK8bO7BSw0sx1YR3h7Vs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TttXo9gkQNCoAOvkUFlZhP6Czz37aVz8zjMAUTRK8TEeCU/x5JUlzBuUhw9BtwsNb s+w8ycZVIfcFf/cA+81UVBkbTM62j9KWhDFTlJuJDB7gM/6JGWfdLRxgFJyyTbUSrl LlixIYEL1VVYx7LGzqy1kA+D0y1LtqONKyE/ZB3gEAHZKcmkFOozko1ZYDnPNa2UWy YmZ9spENNpYD13fTU+ewQ/k9WX1dGW8AiFHsxn4TiVZ92UTGmuSiHGoLzVOZMmWGLT 5djAus+nQk3KzNyzdRTzAP55HFHsExeWHQn5QRxX8KCU/kJ6uaoEgfaMutlXIAiBo6 Rihq6ATzTBfbA== From: SeongJae Park To: SeongJae Park Cc: Andrew Morton , damon@lists.linux.dev, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC PATCH 03/13] mm/damon/core: make damon_set_attrs() be safe to be called from damon_call() Date: Sun, 2 Mar 2025 13:41:45 -0800 Message-Id: <20250302214145.356806-1-sj@kernel.org> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250226063651.513178-4-sj@kernel.org> References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Stat-Signature: tgwbx93ouyauinf54phmz7rfcaoxn3k7 X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 1C56B180011 X-Rspam-User: X-HE-Tag: 1740951709-982269 X-HE-Meta: U2FsdGVkX1/Oq+xuLIW6fK7k9YBKjpI6UqtXUyFFi/UOry9QsRsZd7oXWenDwsqXOjbc75yW8+SuEY8iHFFv4cejQbfwRsYHoZPOKn17M9BiRKn79tbeKYZUXHsiWPoJ9Fa9airJ5dmjWtyCH73cmWKjuYneyJQjyB38jSIvj6hfK6stfYJYJl7Ee3QmFBEZHU2AaBLWvZHbNFxRYPFsSfOUF1j5fU6RcID1O1DMkxObhK0e6rG8SuiqF+GT18s/N8uiusQ5jqQ0V2+kRTtz/r0NnjhdSWLHNG6c+G1o5cXINp9j3Zn2W93yYO3rEDJFqVBFVAYVpUHrNJpy2i2y8T7bYTA6CqYRgJ/CWWc2UftZeshxjJY0IiQgDF+RdbqlGOP9fvgfSiWSPrB8Isl97Ea9VYyteiIPA/P5g3SCu/rleKRjY0FlXgu1zeNxMSXuKjkrz2yhGfOyYTv2pF0QJW05sJ6PuQ3V7NVsprSaj0Zk6aGvxWkikOtKAWlt5Wf7DDbUVhO9TeWA1BEzgteyppiKtu7BzyTzNRTgSYSdWpZ07aaGswlqh6pxefKERLw0621xBaJRMqDMPW8p7aJcjReOJ8BSmkEc96K2O7ClrteNPr8evHvXVmNfDwWbhPWXSXDvr0/zEdNB8pqBQ9UEsUIxRsSitrbCamA1hOf1k7x3S5r4HfRewa7VpoMQotIrXHa6MXIRi7eM+yT6ohbAW5ZH1mnL+Hoxo4N2VYLosdAUo0jHxeVmmH8kCYmr/IpoOFKALtsObzGjjqoxM4vvmbS9wIAwDlLjvySBjzuGFsciRfVYj9X0mgKBjbK9xq27L/pOgqgw912XoLMZJRLp4+PMQS9g+rFNDhqaSJvZaB5JfyUAYqOgk0kZAiFLCzjJ5oe30Bw9YCvJy6NpFdFRoNotoTTGznKvSxdX+rKaLlDb+/vMKqym3339gwyJL+S3KetmaDqgoagy0wF3zZo 21qa1Bqh qiQo9gDPHefFW5K/Dgh/bF2/g2FaOKl+BVx1C7K2nJ067Vx1vb8QEPGOwjXy3qsQJljC5T71HUJkzi0o9BauXPEOZmBFKgVIQAx7cB6plo+fTwNtK1St+PD0jwVWaJYSbkmbTXcI9PD7cQ27n1jiwZFHQjl9zHQLOfwu61th40ZOUxqMLa4wXs5GfsQIFp9ve+Az7DulIxxpTMdy9VNHEHlr16tdDEujcUbXH2W4SeFn9A7il2WX3YoJWRnCaHOQ+awxQmfTzhBdEJEgn7aAsdYBjkw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, 25 Feb 2025 22:36:41 -0800 SeongJae Park wrote: > Currently all DAMON kernel API callers do online DAMON parameters commit > from damon_callback->after_aggregation because only those are safe place > to call the DAMON monitoring attributes update function, namely > damon_set_attrs(). > > Because damon_callback hooks provides no synchronization, the callers > works in asynchronous ways or implement their own inefficient and > complicated synchronization mechanisms. It also means online DAMON > parameters commit can take up to one aggregation interval. On large > systems having long aggregation intervals, that can be too slow. The > synchronization can be done in more efficient and simple way while > removing the latency constraint if it can be done using damon_call(). > > The fact that damon_call() can be executed in the middle of the > aggregation makes damon_set_attrs() unsafe to be called from it, though. > Two real problems can occur in the case. First, converting the not yet > completely aggregated nr_accesses for new user-set intervals can > arguably degrade the accuracy or at least make the logic complicated. > Second, kdamond_reset_aggregated() will not be called after the > monitoring results update, so next aggregation starts from unclean > state. This can result in inconsistent and unexpected nr_accesses. > > Make it safe as follows. Catch the middle-of-the-aggregation case from > damon_set_attrs() and pass the information to nr_accesses conversion > logic. The logic works as before if this is not the case (called after > the current aggregation is completed). If not, it drops the nr_accesses > information that so far aggregated, and make the status same to the > beginning of this aggregation, but as if the last aggregation was ran > with the updated sampling/aggregation intervals. This itself has no problem. But this can cause a problem when this is applied on top of DAMON monitoring intervals auto-tune patch[1], which makes damon_set_attrs() can also be called from if-caluse for aggregated information sharing and reset. The problematic case happens when damon_set_attrs() from kdamond_call() and kdamond_tune_intervals() in same iteration. It means it is the last iteration of the current aggregation. damon_set_attrs() that called from kdamond_call() understands the aggregation is finished and updates aggregation information correctly. But, it also updates damon_ctx->next_aggregation_sis. When damon_set_attrs() is called again from kdamond_tune_intervals(), it shows the prior damon_set_attrs() invocation updated ->next_aggregation_sis and think this is in the middle of the aggregation. Hence it further resets the aggregated information. Now, kdamond_reset_aggregated() is called after the second invocation of damon_set_attrs() in the same kdamond main loop iteration, and corrupts the aggregated information of regions. Particularly, this can mad the pseudo-moving-sum access frequency information broken. Simplest fix would be resetting the prior damon_set_attrs() updated ->next_intervals_tune_sis, like below diff. --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -2538,6 +2538,24 @@ static int kdamond_fn(void *data) if (ctx->attrs.intervals_goal.aggrs && ctx->passed_sample_intervals >= ctx->next_intervals_tune_sis) { + /* + * ctx->next_aggregation_sis might be updated + * from kdamond_call(). In the case, + * damon_set_attrs() which will be called from + * kdamond_tune_interval() may wrongly think + * this is in the middle of the current + * aggregation, and make aggregation + * information reset for all regions. Then, + * following kdamond_reset_aggregated() call + * will make the region information invalid, + * particularly for ->nr_accesses_bp. + * + * Reset ->next_aggregation_sis to avoid that. + * It will anyway correctly updated after this + * if caluse. + */ + ctx->next_aggregation_sis = + next_aggregation_sis; ctx->next_intervals_tune_sis += ctx->attrs.aggr_samples * ctx->attrs.intervals_goal.aggrs; I will add the change to this patch or the autotune patch, depending on their submission order. [1] https://lore.kernel.org/20250228220328.49438-3-sj@kernel.org Thanks, SJ > > Signed-off-by: SeongJae Park > --- > mm/damon/core.c | 38 ++++++++++++++++++++++++++----------- > mm/damon/tests/core-kunit.h | 6 +++--- > 2 files changed, 30 insertions(+), 14 deletions(-) > > diff --git a/mm/damon/core.c b/mm/damon/core.c > index 0578e89dff13..5b807caaec95 100644 > --- a/mm/damon/core.c > +++ b/mm/damon/core.c > @@ -602,11 +602,25 @@ static unsigned int damon_nr_accesses_for_new_attrs(unsigned int nr_accesses, > } > > static void damon_update_monitoring_result(struct damon_region *r, > - struct damon_attrs *old_attrs, struct damon_attrs *new_attrs) > + struct damon_attrs *old_attrs, struct damon_attrs *new_attrs, > + bool aggregating) > { > - r->nr_accesses = damon_nr_accesses_for_new_attrs(r->nr_accesses, > - old_attrs, new_attrs); > - r->nr_accesses_bp = r->nr_accesses * 10000; > + if (!aggregating) { > + r->nr_accesses = damon_nr_accesses_for_new_attrs( > + r->nr_accesses, old_attrs, new_attrs); > + r->nr_accesses_bp = r->nr_accesses * 10000; > + } else { > + /* > + * if this is called in the middle of the aggregation, reset > + * the aggregations we made so far for this aggregation > + * interval. In other words, make the status like > + * kdamond_reset_aggregated() is called. > + */ > + r->last_nr_accesses = damon_nr_accesses_for_new_attrs( > + r->last_nr_accesses, old_attrs, new_attrs); > + r->nr_accesses_bp = r->last_nr_accesses * 10000; > + r->nr_accesses = 0; > + } > r->age = damon_age_for_new_attrs(r->age, old_attrs, new_attrs); > } > > @@ -619,7 +633,7 @@ static void damon_update_monitoring_result(struct damon_region *r, > * ->nr_accesses and ->age of given damon_ctx's regions for new damon_attrs. > */ > static void damon_update_monitoring_results(struct damon_ctx *ctx, > - struct damon_attrs *new_attrs) > + struct damon_attrs *new_attrs, bool aggregating) > { > struct damon_attrs *old_attrs = &ctx->attrs; > struct damon_target *t; > @@ -634,7 +648,7 @@ static void damon_update_monitoring_results(struct damon_ctx *ctx, > damon_for_each_target(t, ctx) > damon_for_each_region(r, t) > damon_update_monitoring_result( > - r, old_attrs, new_attrs); > + r, old_attrs, new_attrs, aggregating); > } > > /* > @@ -661,10 +675,10 @@ static bool damon_valid_intervals_goal(struct damon_attrs *attrs) > * @ctx: monitoring context > * @attrs: monitoring attributes > * > - * This function should be called while the kdamond is not running, or an > - * access check results aggregation is not ongoing (e.g., from > - * &struct damon_callback->after_aggregation or > - * &struct damon_callback->after_wmarks_check callbacks). > + * This function should be called while the kdamond is not running, an access > + * check results aggregation is not ongoing (e.g., from &struct > + * damon_callback->after_aggregation or &struct > + * damon_callback->after_wmarks_check callbacks), or from damon_call(). > * > * Every time interval is in micro-seconds. > * > @@ -675,6 +689,8 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs) > unsigned long sample_interval = attrs->sample_interval ? > attrs->sample_interval : 1; > struct damos *s; > + bool aggregating = ctx->passed_sample_intervals < > + ctx->next_aggregation_sis; > > if (!damon_valid_intervals_goal(attrs)) > return -EINVAL; > @@ -695,7 +711,7 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs) > ctx->next_ops_update_sis = ctx->passed_sample_intervals + > attrs->ops_update_interval / sample_interval; > > - damon_update_monitoring_results(ctx, attrs); > + damon_update_monitoring_results(ctx, attrs, aggregating); > ctx->attrs = *attrs; > > damon_for_each_scheme(s, ctx) > diff --git a/mm/damon/tests/core-kunit.h b/mm/damon/tests/core-kunit.h > index 532c6a6f21f9..be0fea9ee5fc 100644 > --- a/mm/damon/tests/core-kunit.h > +++ b/mm/damon/tests/core-kunit.h > @@ -348,19 +348,19 @@ static void damon_test_update_monitoring_result(struct kunit *test) > > new_attrs = (struct damon_attrs){ > .sample_interval = 100, .aggr_interval = 10000,}; > - damon_update_monitoring_result(r, &old_attrs, &new_attrs); > + damon_update_monitoring_result(r, &old_attrs, &new_attrs, false); > KUNIT_EXPECT_EQ(test, r->nr_accesses, 15); > KUNIT_EXPECT_EQ(test, r->age, 2); > > new_attrs = (struct damon_attrs){ > .sample_interval = 1, .aggr_interval = 1000}; > - damon_update_monitoring_result(r, &old_attrs, &new_attrs); > + damon_update_monitoring_result(r, &old_attrs, &new_attrs, false); > KUNIT_EXPECT_EQ(test, r->nr_accesses, 150); > KUNIT_EXPECT_EQ(test, r->age, 2); > > new_attrs = (struct damon_attrs){ > .sample_interval = 1, .aggr_interval = 100}; > - damon_update_monitoring_result(r, &old_attrs, &new_attrs); > + damon_update_monitoring_result(r, &old_attrs, &new_attrs, false); > KUNIT_EXPECT_EQ(test, r->nr_accesses, 150); > KUNIT_EXPECT_EQ(test, r->age, 20); > > -- > 2.39.5