linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Byungchul Park <byungchul.park@lge.com>
Cc: Jan Kara <jack@suse.cz>,
	torvalds@linux-foundation.org, damien.lemoal@opensource.wdc.com,
	linux-ide@vger.kernel.org, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	will@kernel.org, tglx@linutronix.de, rostedt@goodmis.org,
	joel@joelfernandes.org, sashal@kernel.org,
	daniel.vetter@ffwll.ch, chris@chris-wilson.co.uk,
	duyuyang@gmail.com, johannes.berg@intel.com, tj@kernel.org,
	willy@infradead.org, david@fromorbit.com, amir73il@gmail.com,
	bfields@fieldses.org, gregkh@linuxfoundation.org,
	kernel-team@lge.com, linux-mm@kvack.org,
	akpm@linux-foundation.org, mhocko@kernel.org, minchan@kernel.org,
	hannes@cmpxchg.org, vdavydov.dev@gmail.com, sj@kernel.org,
	jglisse@redhat.com, dennis@kernel.org, cl@linux.com,
	penberg@kernel.org, rientjes@google.com, vbabka@suse.cz,
	ngupta@vflare.org, linux-block@vger.kernel.org, axboe@kernel.dk,
	paolo.valente@linaro.org, josef@toxicpanda.com,
	linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	jack@suse.com, jlayton@kernel.org, dan.j.williams@intel.com,
	hch@infradead.org, djwong@kernel.org,
	dri-devel@lists.freedesktop.org, airlied@linux.ie,
	rodrigosiqueiramelo@gmail.com, melissa.srw@gmail.com,
	hamohammed.sa@gmail.com
Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1
Date: Wed, 2 Mar 2022 21:32:35 -0500	[thread overview]
Message-ID: <YiAow5gi21zwUT54@mit.edu> (raw)
In-Reply-To: <20220303010033.GB20752@X58A-UD3R>

On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> 
> Unfortunately, it's neither perfect nor safe without another wakeup
> source - rescue wakeup source.
> 
>    consumer			producer
> 
> 				lock L
> 				(too much work queued == true)
> 				unlock L
> 				--- preempted
>    lock L
>    unlock L
>    do work
>    lock L
>    unlock L
>    do work
>    ...
>    (no work == true)
>    sleep
> 				--- scheduled in
> 				sleep

That's not how things work in ext4.  It's **way** more complicated
than that.  We have multiple wait channels, one wake up the consumer
(read: the commit thread), and one which wakes up any processes
waiting for commit thread to have made forward progress.  We also have
two spin-lock protected sequence number, one which indicates the
current commited transaction #, and one indicating the transaction #
that needs to be committed.

On the commit thread, it will sleep on j_wait_commit, and when it is
woken up, it will check to see if there is work to be done
(j_commit_sequence != j_commit_request), and if so, do the work, and
then wake up processes waiting on the wait_queue j_wait_done_commit.
(Again, all of this uses the pattern, "prepare to wait", then check to
see if we should sleep, if we do need to sleep, unlock j_state_lock,
then sleep.   So this prevents any races leading to lost wakeups.

On the start_this_handle() thread, if we current transaction is too
full, we set j_commit_request to its transaction id to indicate that
we want the current transaction to be committed, and then we wake up
the j_wait_commit wait queue and then we enter a loop where do a
prepare_to_wait in j_wait_done_commit, check to see if
j_commit_sequence == the transaction id that we want to be completed,
and if it's not done yet, we unlock the j_state_lock spinlock, and go
to sleep.  Again, because of the prepare_to_wait, there is no chance
of a lost wakeup.

So there really is no "consumer" and "producer" here.  If you really
insist on using this model, which really doesn't apply, for one
thread, it's the consumer with respect to one wait queue, and the
producer with respect to the *other* wait queue.  For the other
thread, the consumer and producer roles are reversed.

And of course, this is a highly simplified model, since we also have a
wait queue used by the commit thread to wait for the number of active
handles on a particular transaction to go to zero, and
stop_this_handle() will wake up commit thread via this wait queue when
the last active handle on a particular transaction is retired.  (And
yes, that parameter is also protected by a different spin lock which
is per-transaction).

So it seems to me that a fundamental flaw in DEPT's model is assuming
that the only waiting paradigm that can be used is consumer/producer,
and that's simply not true.  The fact that you use the term "lock" is
also going to lead a misleading line of reasoning, because properly
speaking, they aren't really locks.  We are simply using wait channels
to wake up processes as necessary, and then they will check other
variables to decide whether or not they need to sleep or not, and we
have an invariant that when these variables change indicating forward
progress, the associated wait channel will be woken up.

Cheers,

						- Ted


P.S.  This model is also highly simplified since there are other
reasons why the commit thread can be woken up, some which might be via
a timeout, and some which is via the j_wait_commit wait channel but
not because j_commit_request has been changed, but because file system
is being unmounted, or the file system is being frozen in preparation
of a snapshot, etc.  These are *not* necessary to prevent a deadlock,
because under normal circumstances the two wake channels are
sufficient of themselves.  So please don't think of them as "rescue
wakeup sources"; again, that's highly misleading and the wrong way to
think of them.

And to make things even more complicated, we have more than 2 wait
channel --- we have *five*:

	/**
	 * @j_wait_transaction_locked:
	 *
	 * Wait queue for waiting for a locked transaction to start committing,
	 * or for a barrier lock to be released.
	 */
	wait_queue_head_t	j_wait_transaction_locked;

	/**
	 * @j_wait_done_commit: Wait queue for waiting for commit to complete.
	 */
	wait_queue_head_t	j_wait_done_commit;

	/**
	 * @j_wait_commit: Wait queue to trigger commit.
	 */
	wait_queue_head_t	j_wait_commit;

	/**
	 * @j_wait_updates: Wait queue to wait for updates to complete.
	 */
	wait_queue_head_t	j_wait_updates;

	/**
	 * @j_wait_reserved:
	 *
	 * Wait queue to wait for reserved buffer credits to drop.
	 */
	wait_queue_head_t	j_wait_reserved;

	/**
	 * @j_fc_wait:
	 *
	 * Wait queue to wait for completion of async fast commits.
	 */
	wait_queue_head_t	j_fc_wait;


"There are more things in heaven and Earth, Horatio,
 Than are dreamt of in your philosophy."
      	  	    - William Shakespeare, Hamlet


  reply	other threads:[~2022-03-03  2:33 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17 10:57 [PATCH 00/16] DEPT(Dependency Tracker) Byungchul Park
2022-02-17 10:57 ` [PATCH 01/16] llist: Move llist_{head,node} definition to types.h Byungchul Park
2022-02-17 10:57 ` [PATCH 02/16] dept: Implement Dept(Dependency Tracker) Byungchul Park
2022-02-17 15:54   ` Steven Rostedt
2022-02-17 17:36   ` Steven Rostedt
2022-02-18  6:09     ` Byungchul Park
2022-02-17 19:46   ` kernel test robot
2022-02-17 19:46   ` kernel test robot
2022-02-17 10:57 ` [PATCH 03/16] dept: Embed Dept data in Lockdep Byungchul Park
2022-02-17 10:57 ` [PATCH 04/16] dept: Apply Dept to spinlock Byungchul Park
2022-02-17 10:57 ` [PATCH 05/16] dept: Apply Dept to mutex families Byungchul Park
2022-02-17 10:57 ` [PATCH 06/16] dept: Apply Dept to rwlock Byungchul Park
2022-02-17 10:57 ` [PATCH 07/16] dept: Apply Dept to wait_for_completion()/complete() Byungchul Park
2022-02-17 19:46   ` kernel test robot
2022-02-17 10:57 ` [PATCH 08/16] dept: Apply Dept to seqlock Byungchul Park
2022-02-17 10:57 ` [PATCH 09/16] dept: Apply Dept to rwsem Byungchul Park
2022-02-17 10:57 ` [PATCH 10/16] dept: Add proc knobs to show stats and dependency graph Byungchul Park
2022-02-17 15:55   ` Steven Rostedt
2022-02-17 10:57 ` [PATCH 11/16] dept: Introduce split map concept and new APIs for them Byungchul Park
2022-02-17 10:57 ` [PATCH 12/16] dept: Apply Dept to wait/event of PG_{locked,writeback} Byungchul Park
2022-02-17 10:57 ` [PATCH 13/16] dept: Apply SDT to swait Byungchul Park
2022-02-17 10:57 ` [PATCH 14/16] dept: Apply SDT to wait(waitqueue) Byungchul Park
2022-02-17 10:57 ` [PATCH 15/16] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread Byungchul Park
2022-02-17 10:57 ` [PATCH 16/16] dept: Distinguish each syscall context from another Byungchul Park
2022-02-17 11:10 ` Report 1 in ext4 and journal based on v5.17-rc1 Byungchul Park
2022-02-17 11:10   ` Report 2 " Byungchul Park
2022-02-21 19:02     ` Jan Kara
2022-02-23  0:35       ` Byungchul Park
2022-02-23 14:48         ` Jan Kara
2022-02-24  1:11           ` Byungchul Park
2022-02-24 10:22             ` Jan Kara
2022-02-28  9:28               ` Byungchul Park
2022-02-28 10:14                 ` Jan Kara
2022-02-28 21:25                   ` Theodore Ts'o
2022-03-03  1:36                     ` Byungchul Park
2022-03-03  1:00                   ` Byungchul Park
2022-03-03  2:32                     ` Theodore Ts'o [this message]
2022-03-03  5:23                       ` Byungchul Park
2022-03-03 14:36                         ` Theodore Ts'o
2022-03-04  0:42                           ` Byungchul Park
2022-03-05  3:26                             ` Theodore Ts'o
2022-03-05 14:15                               ` Byungchul Park
2022-03-05 15:05                                 ` Joel Fernandes
2022-03-07  2:43                                   ` Byungchul Park
2022-03-04  3:20                           ` Byungchul Park
2022-03-05  3:40                             ` Theodore Ts'o
2022-03-05 14:55                               ` Byungchul Park
2022-03-05 15:12                                 ` Reimar Döffinger
2022-03-06  3:30                                 ` Theodore Ts'o
2022-03-06 10:51                                   ` Byungchul Park
2022-03-06 14:19                                     ` Theodore Ts'o
2022-03-10  1:45                                       ` Byungchul Park
2022-03-03  9:54                     ` Jan Kara
2022-03-04  1:56                       ` Byungchul Park
2022-02-17 13:27   ` Report 1 " Matthew Wilcox
2022-02-18  0:41     ` Byungchul Park
2022-02-22  8:27   ` Jan Kara
2022-02-23  1:40     ` Byungchul Park
2022-02-23  3:30     ` Byungchul Park
2022-02-17 15:51 ` [PATCH 00/16] DEPT(Dependency Tracker) Theodore Ts'o
2022-02-17 17:00   ` Steven Rostedt
2022-02-17 17:06     ` Matthew Wilcox
2022-02-19 10:05       ` Byungchul Park
2022-02-18  4:19     ` Theodore Ts'o
2022-02-19 10:34       ` Byungchul Park
2022-02-19 10:18     ` Byungchul Park
2022-02-19  9:54   ` Byungchul Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YiAow5gi21zwUT54@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bfields@fieldses.org \
    --cc=byungchul.park@lge.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=cl@linux.com \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=david@fromorbit.com \
    --cc=dennis@kernel.org \
    --cc=djwong@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=duyuyang@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hamohammed.sa@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=jlayton@kernel.org \
    --cc=joel@joelfernandes.org \
    --cc=johannes.berg@intel.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@lge.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=melissa.srw@gmail.com \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=mingo@redhat.com \
    --cc=ngupta@vflare.org \
    --cc=paolo.valente@linaro.org \
    --cc=penberg@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=rodrigosiqueiramelo@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=sashal@kernel.org \
    --cc=sj@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox