linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Byungchul Park <byungchul.park@lge.com>
Cc: peterz@infradead.org, axboe@kernel.dk, johan@kernel.org,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, tj@kernel.org, johannes.berg@intel.com,
	oleg@redhat.com, amir73il@gmail.com, david@fromorbit.com,
	darrick.wong@oracle.com, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	hch@infradead.org, idryomov@gmail.com, kernel-team@lge.com
Subject: Re: [PATCH v4 7/7] block: Assign a lock_class per gendisk used for wait_for_completion()
Date: Wed, 25 Oct 2017 08:01:24 +0200	[thread overview]
Message-ID: <20171025060123.6mugpdpje6hx32nx@gmail.com> (raw)
In-Reply-To: <1508908272-15757-8-git-send-email-byungchul.park@lge.com>


* Byungchul Park <byungchul.park@lge.com> wrote:

> Darrick posted the following warning and Dave Chinner analyzed it:
> 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.14.0-rc1-fixes #1 Tainted: G        W
> > ------------------------------------------------------
> > loop0/31693 is trying to acquire lock:
> >  (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs]
> >
> > but now in release context of a crosslock acquired at the following:
> >  ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 ((complete)&ret.event){+.+.}:
> >        lock_acquire+0xab/0x200
> >        wait_for_completion_io+0x4e/0x1a0
> >        submit_bio_wait+0x7f/0xb0
> >        blkdev_issue_zeroout+0x71/0xa0
> >        xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
> >        xfs_bmapi_write+0x374/0x11f0 [xfs]
> >        xfs_iomap_write_direct+0x2ac/0x430 [xfs]
> >        xfs_file_iomap_begin+0x20d/0xd50 [xfs]
> >        iomap_apply+0x43/0xe0
> >        dax_iomap_rw+0x89/0xf0
> >        xfs_file_dax_write+0xcc/0x220 [xfs]
> >        xfs_file_write_iter+0xf0/0x130 [xfs]
> >        __vfs_write+0xd9/0x150
> >        vfs_write+0xc8/0x1c0
> >        SyS_write+0x45/0xa0
> >        entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > -> #1 (&xfs_nondir_ilock_class){++++}:
> >        lock_acquire+0xab/0x200
> >        down_write_nested+0x4a/0xb0
> >        xfs_ilock+0x263/0x330 [xfs]
> >        xfs_setattr_size+0x152/0x370 [xfs]
> >        xfs_vn_setattr+0x6b/0x90 [xfs]
> >        notify_change+0x27d/0x3f0
> >        do_truncate+0x5b/0x90
> >        path_openat+0x237/0xa90
> >        do_filp_open+0x8a/0xf0
> >        do_sys_open+0x11c/0x1f0
> >        entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
> >        up_write+0x1c/0x40
> >        xfs_iunlock+0x1d0/0x310 [xfs]
> >        xfs_file_fallocate+0x8a/0x310 [xfs]
> >        loop_queue_work+0xb7/0x8d0
> >        kthread_worker_fn+0xb9/0x1f0
> >
> > Chain exists of:
> >   &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
> >
> >  Possible unsafe locking scenario by crosslock:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&xfs_nondir_ilock_class);
> >   lock((complete)&ret.event);
> >                                lock(&(&ip->i_mmaplock)->mr_lock);
> >                                unlock((complete)&ret.event);
> >
> >                *** DEADLOCK ***
> 
> The warning is a false positive, caused by the fact that all
> wait_for_completion()s in submit_bio_wait() are waiting with the same
> lock class.
> 
> However, some bios have nothing to do with others, for example, the case
> might happen while using loop devices, between bios of an upper device
> and a lower device(=loop device).
> 
> The safest way to assign different lock classes to different devices is
> to do it for each gendisk. In other words, this patch assigns a
> lockdep_map per gendisk and uses it when initializing completion in
> submit_bio_wait().
> 
> Of course, it might be too conservative. But, making it safest for now
> and extended by block layer experts later is good, at the moment.
> 
> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> Analyzed-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  block/bio.c           |  2 +-
>  block/genhd.c         | 10 ++--------
>  include/linux/genhd.h | 24 ++++++++++++++++++++++--
>  3 files changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 99d0ca5..a3cb1d1 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -935,7 +935,7 @@ static void submit_bio_wait_endio(struct bio *bio)
>   */
>  int submit_bio_wait(struct bio *bio)
>  {
> -	DECLARE_COMPLETION_ONSTACK(done);
> +	DECLARE_COMPLETION_ONSTACK_MAP(done, bio->bi_disk->lockdep_map);
>  
>  	bio->bi_private = &done;
>  	bio->bi_end_io = submit_bio_wait_endio;
> diff --git a/block/genhd.c b/block/genhd.c
> index dd305c6..630c0da 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1354,13 +1354,7 @@ dev_t blk_lookup_devt(const char *name, int partno)
>  }
>  EXPORT_SYMBOL(blk_lookup_devt);
>  
> -struct gendisk *alloc_disk(int minors)
> -{
> -	return alloc_disk_node(minors, NUMA_NO_NODE);
> -}
> -EXPORT_SYMBOL(alloc_disk);
> -
> -struct gendisk *alloc_disk_node(int minors, int node_id)
> +struct gendisk *__alloc_disk_node(int minors, int node_id)
>  {
>  	struct gendisk *disk;
>  	struct disk_part_tbl *ptbl;
> @@ -1411,7 +1405,7 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
>  	}
>  	return disk;
>  }
> -EXPORT_SYMBOL(alloc_disk_node);
> +EXPORT_SYMBOL(__alloc_disk_node);
>  
>  struct kobject *get_disk(struct gendisk *disk)
>  {
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 6d85a75..f6ec6a2 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -206,6 +206,9 @@ struct gendisk {
>  #endif	/* CONFIG_BLK_DEV_INTEGRITY */
>  	int node_id;
>  	struct badblocks *bb;
> +#ifdef CONFIG_LOCKDEP
> +	struct lockdep_map lockdep_map;
> +#endif
>  };
>  
>  static inline struct gendisk *part_to_disk(struct hd_struct *part)
> @@ -590,8 +593,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
>  extern void delete_partition(struct gendisk *, int);
>  extern void printk_all_partitions(void);
>  
> -extern struct gendisk *alloc_disk_node(int minors, int node_id);
> -extern struct gendisk *alloc_disk(int minors);
> +extern struct gendisk *__alloc_disk_node(int minors, int node_id);
>  extern struct kobject *get_disk(struct gendisk *disk);
>  extern void put_disk(struct gendisk *disk);
>  extern void blk_register_region(dev_t devt, unsigned long range,
> @@ -615,6 +617,24 @@ extern ssize_t part_fail_store(struct device *dev,
>  			       const char *buf, size_t count);
>  #endif /* CONFIG_FAIL_MAKE_REQUEST */
>  
> +#define alloc_disk_node(m, id) \
> +({									\
> +	static struct lock_class_key __key;				\
> +	const char *__name;						\
> +	struct gendisk *ret;						\
> +									\
> +	__name = "(complete)"#m"("#id")";				\
> +									\
> +	ret = __alloc_disk_node(m, id);					\
> +									\
> +	if (ret)							\
> +		lockdep_init_map(&ret->lockdep_map, __name, &__key, 0); \
> +									\
> +	ret;								\
> +})

Beyond the #ifdef reduction I mentioned in the other thread, there's four other 
things I noticed that need to be fixed in this patch:

 - Please write out 'minor' instead of the 'm' abbreviation that is meaningless. 
   'm' is only used for trivial wrappers, but this wrapper is not trivial - so 
   proper canonical variable names should be used.

 - Since __key and __name is already double underscores that is customary for
   macros to avoid variable name shadowing, why is 'ret' not such a name?

 - But, 'ret' is the typical name used for integer returns, not for pointers! 
   Please check the gendisk code for what the typical name for gendisk pointers
   is and use that instead of making up new, weird patterns ...

 - The "(complete)"#minor"("#id")" generated name is pretty bad. Firstly 
   "complete" is a verb (or adjective), while lock(dep) symbol names should be 
   nouns! But even "completion" is pretty opaque, how about "gendisk_completion"?

More careful patches please!

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-10-25  6:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-25  5:11 [PATCH v4 0/7] cross-release: Enhence performance and fix false positives Byungchul Park
2017-10-25  5:11 ` [PATCH v4 1/7] block: use DECLARE_COMPLETION_ONSTACK in submit_bio_wait Byungchul Park
2017-10-25  5:11 ` [PATCH v4 2/7] locking/lockdep: Add a boot parameter allowing unwind in cross-release and disable it by default Byungchul Park
2017-10-25  5:11 ` [PATCH v4 3/7] locking/lockdep: Remove the BROKEN flag from CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS Byungchul Park
2017-10-25  5:11 ` [PATCH v4 4/7] locking/lockdep: Introduce CONFIG_BOOTPARAM_LOCKDEP_CROSSRELEASE_FULLSTACK Byungchul Park
2017-10-25  5:11 ` [PATCH v4 5/7] completion: Add support for initializing completion with lockdep_map Byungchul Park
2017-10-25  5:11 ` [PATCH v4 6/7] workqueue: Remove unnecessary acquisitions wrt workqueue flush Byungchul Park
2017-10-25  6:35   ` Ingo Molnar
2017-10-25  5:11 ` [PATCH v4 7/7] block: Assign a lock_class per gendisk used for wait_for_completion() Byungchul Park
2017-10-25  6:01   ` Ingo Molnar [this message]
2017-10-25  6:34     ` Byungchul Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171025060123.6mugpdpje6hx32nx@gmail.com \
    --to=mingo@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=byungchul.park@lge.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=idryomov@gmail.com \
    --cc=johan@kernel.org \
    --cc=johannes.berg@intel.com \
    --cc=kernel-team@lge.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox