RE: [LSF/MM/BPF TOPIC] Generalized data temperature estimation framework


From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "slava@dubeyko.com" <slava@dubeyko.com>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Greg Farnum <gfarnum@ibm.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"javier.gonz@samsung.com" <javier.gonz@samsung.com>
Subject: RE: [LSF/MM/BPF TOPIC] Generalized data temperature estimation framework
Date: Mon, 27 Jan 2025 23:42:42 +0000
Message-ID: <1a33cb72ace2f427aa5006980b0b4f253d98ce6f.camel@ibm.com>
In-Reply-To: <833b054b-f179-4bc8-912b-dad057d193cd@acm.org>

On Mon, 2025-01-27 at 12:37 -0800, Bart Van Assche wrote:
> On 1/24/25 1:11 PM, Viacheslav Dubeyko wrote:
> > On Fri, 2025-01-24 at 12:44 -0800, Bart Van Assche wrote:
> > > On 1/23/25 12:33 PM, Viacheslav Dubeyko wrote:
> > > > I would like to discuss a generalized data "temperature"
> > > > estimation framework.
> > > 
> > > Is data available that shows the effectiveness of this approach and
> > > that compares this approach with existing approaches?
> > 
> > Yes, I did the benchmarking. I can see the quantitative estimation of
> > files' temperature.
> 
> What has been measured in these benchmarks?
> 

How the temperature can be used depends on the file system. So, the goal of my
benchmarking was to observe the temperature values while files are being
updated. I integrated the temperature estimation framework into the SSDFS file
system and stored the temperature values in the system log to check that the
math works. The temperature is only a quantitative estimate, and a file system
can use it in any way it likes.

If we would like to compare benchmarking results, then we would really be
comparing the techniques of different file systems. Potentially, we can
integrate the temperature estimation framework into any file system, but we
need to work out how a particular file system can benefit from it.

So, as far as I can see, benchmarking is a slightly tricky point here.

> > Which existing approaches would you like to compare?
> 
> F2FS has a built-in algorithm for assigning data temperatures.
> 

Maybe it is time to generalize this approach too? The generalized framework
could contain several algorithms.

If I understood correctly, the F2FS approach is based on statically assigning
different temperatures to different file extensions. So, when we process a file
with a particular extension, we assume that this file is hot or cold. Am I
correct here?
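
To make the static approach as I understand it concrete, here is a minimal
user-space sketch. It is not F2FS code: the extension lists, the "warm" default
and the function name static_temperature() are assumptions made up for
illustration only.

```python
import os

# Hypothetical extension lists for illustration only; not taken from F2FS.
COLD_EXTENSIONS = {"mp4", "jpg", "zip"}
HOT_EXTENSIONS = {"db", "log"}


def static_temperature(filename):
    """Classify a file as "hot", "cold" or "warm" purely by its extension."""
    # splitext() returns ("name", ".ext"); drop the dot and ignore case.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in HOT_EXTENSIONS:
        return "hot"
    if ext in COLD_EXTENSIONS:
        return "cold"
    # Anything not listed gets the assumed default class.
    return "warm"
```

The point of the sketch is that the result never changes with the actual
update pattern of the file: a file with a "cold" extension stays cold even if
it is rewritten all the time.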

If I am correct, then the goal of the suggested approach is to move away from
static assumptions about the nature of the data and to estimate it on a
quantitative basis, in order to classify data more fairly. But it doesn't mean
that the F2FS way and the suggested approach should compete. Technically
speaking, the two approaches could be complementary.

> > And what could we imply by effectiveness of the approach? Do you have
> > a vision how we can estimate the effectiveness? :)
> 
> Isn't the goal of providing data temperature information to the device
> to reduce write amplification (W.A.)? I think that W.A. data would be
> useful but I'm not sure whether such data is easy to extract from a
> storage device.
> 

Yes, we can consider it as one of the goals. Other goals could be improving
performance, decreasing the GC burden, and collaborating effectively with the
storage device. Reducing write amplification is an important goal, and it is
possible to try to estimate it without extracting data from the storage device
(but how accurate would such an estimate be?). But, again, the problem here is
that we can estimate the efficiency of file system(s) but not of the
temperature estimation framework itself. Maybe we can consider integrating the
suggested framework into F2FS? Then we could finally compare apples with
apples. What do you think?
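
As a rough illustration of a host-side estimate, the sketch below computes a
file-system-level write amplification factor as all bytes the file system
submits divided by the bytes the user actually wrote. The three counters and
the function name fs_write_amplification() are hypothetical; a real file system
would have to collect such statistics itself. The estimate does not include
the device-internal GC, which is exactly why its accuracy is in question.

```python
def fs_write_amplification(user_bytes, metadata_bytes, gc_bytes):
    """Return (user + metadata + GC bytes written) / user bytes written.

    All three counters are assumed to be collected by the file system itself;
    writes done internally by the storage device are not visible here.
    """
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return (user_bytes + metadata_bytes + gc_bytes) / user_bytes
```

For example, 100 bytes of user data plus 20 bytes of metadata and 30 bytes of
GC migration give a factor of 1.5.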

Thanks,
Slava.

Thread overview: 7+ messages
2025-01-23 20:33 Viacheslav Dubeyko
2025-01-24 20:44 ` Bart Van Assche
2025-01-24 21:11   ` Viacheslav Dubeyko
2025-01-27 20:37     ` Bart Van Assche
2025-01-27 23:42       ` Viacheslav Dubeyko [this message]
2025-01-28 22:41         ` Bart Van Assche
2025-01-28 22:57           ` Viacheslav Dubeyko
