Date: Fri, 6 Mar 2020 18:40:33 +0000
From: Anchal Agarwal
To: Roger Pau Monné
Subject: Re: [RFC PATCH v3 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
Message-ID: <20200306184033.GA25358@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
References: <890c404c585d7790514527f0c021056a7be6e748.1581721799.git.anchalag@amazon.com> <20200221142445.GZ4679@Air-de-Roger>
In-Reply-To: <20200221142445.GZ4679@Air-de-Roger>

On Fri, Feb 21, 2020 at 03:24:45PM +0100, Roger Pau Monné wrote:
> On Fri, Feb 14, 2020 at 11:25:34PM +0000, Anchal Agarwal wrote:
> > From: Munehisa Kamata
> >
> > Add freeze, thaw and restore callbacks for PM suspend and hibernation
> > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
> > events need to implement these xenbus_driver callbacks.
> > The freeze handler stops the block-layer queue and disconnects the
> > frontend from the backend while freeing ring_info and associated resources.
> > The restore handler re-allocates ring_info and re-connects to the
> > backend, so the rest of the kernel can continue to use the block device
> > transparently. The handlers are also used for both PM suspend and
> > hibernation, so that we can keep the existing suspend/resume callbacks
> > for Xen suspend without modification. Before disconnecting from the
> > backend, we need to prevent any new IO from being queued and wait for
> > existing IO to complete. Freeze/unfreeze of the queues will guarantee
> > that there are no requests in use on the shared ring.
> >
> > Note: for older backends, if a backend doesn't have commit '12ea729645ace'
> > ("xen/blkback: unmap all persistent grants when frontend gets disconnected"),
> > the frontend may see a massive amount of grant table warnings when freeing
> > resources:
> > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> > [   36.855089] xen:grant_table: WARNING: g.e. 0x112 still in use!
> >
> > In this case, persistent grants would need to be disabled.
> >
> > [Anchal Changelog: Removed timeout/request during blkfront freeze.
> > Fixed major part of the code to work with blk-mq]
> > Signed-off-by: Anchal Agarwal
> > Signed-off-by: Munehisa Kamata
> > ---
> >  drivers/block/xen-blkfront.c | 119 ++++++++++++++++++++++++++++++++---
> >  1 file changed, 112 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 478120233750..d715ed3cb69a 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -47,6 +47,8 @@
> >  #include
> >  #include
> >  #include
> > +#include
> > +#include
> >
> >  #include
> >  #include
> > @@ -79,6 +81,8 @@ enum blkif_state {
> >  	BLKIF_STATE_DISCONNECTED,
> >  	BLKIF_STATE_CONNECTED,
> >  	BLKIF_STATE_SUSPENDED,
> > +	BLKIF_STATE_FREEZING,
> > +	BLKIF_STATE_FROZEN
> >  };
> >
> >  struct grant {
> > @@ -220,6 +224,7 @@ struct blkfront_info
> >  	struct list_head requests;
> >  	struct bio_list bio_list;
> >  	struct list_head info_list;
> > +	struct completion wait_backend_disconnected;
> >  };
> >
> >  static unsigned int nr_minors;
> > @@ -261,6 +266,7 @@ static DEFINE_SPINLOCK(minor_lock);
> >  static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
> >  static void blkfront_gather_backend_features(struct blkfront_info *info);
> >  static int negotiate_mq(struct blkfront_info *info);
> > +static void __blkif_free(struct blkfront_info *info);
>
> I'm not particularly fond of adding underscore prefixes to functions,
> I would rather use a more descriptive name if possible.
> blkif_free_{queues/rings} maybe?
>
Apologies for the delayed response as I was OOTO. Appreciate your feedback.
Will fix.
> >
> >  static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
> >  {
> > @@ -995,6 +1001,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
> >  	info->sector_size = sector_size;
> >  	info->physical_sector_size = physical_sector_size;
> >  	blkif_set_queue_limits(info);
> > +	init_completion(&info->wait_backend_disconnected);
> >
> >  	return 0;
> >  }
> > @@ -1218,6 +1225,8 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
> >  /* Already hold rinfo->ring_lock. */
> >  static inline void kick_pending_request_queues_locked(struct blkfront_ring_info *rinfo)
> >  {
> > +	if (unlikely(rinfo->dev_info->connected == BLKIF_STATE_FREEZING))
> > +		return;
>
> Do you really need this check here?
>
> The queue will be frozen and quiesced in blkfront_freeze when the state
> is set to BLKIF_STATE_FREEZING, and then the call to
> blk_mq_start_stopped_hw_queues is just a noop as long as the queue is
> quiesced (see blk_mq_run_hw_queue).
>
You are right. Will fix it. May have skipped this part of the patch when
fixing blkfront_freeze.
> >  	if (!RING_FULL(&rinfo->ring))
> >  		blk_mq_start_stopped_hw_queues(rinfo->dev_info->rq, true);
> >  }
> > @@ -1341,8 +1350,6 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
> >
> >  static void blkif_free(struct blkfront_info *info, int suspend)
> >  {
> > -	unsigned int i;
> > -
> >  	/* Prevent new requests being issued until we fix things up. */
> >  	info->connected = suspend ?
> >  		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
> > @@ -1350,6 +1357,13 @@ static void blkif_free(struct blkfront_info *info, int suspend)
> >  	if (info->rq)
> >  		blk_mq_stop_hw_queues(info->rq);
> >
> > +	__blkif_free(info);
> > +}
> > +
> > +static void __blkif_free(struct blkfront_info *info)
> > +{
> > +	unsigned int i;
> > +
> >  	for (i = 0; i < info->nr_rings; i++)
> >  		blkif_free_ring(&info->rinfo[i]);
> >
> > @@ -1553,8 +1567,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> >  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
> >  	struct blkfront_info *info = rinfo->dev_info;
> >
> > -	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
> > -		return IRQ_HANDLED;
> > +	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
> > +		if (info->connected != BLKIF_STATE_FREEZING)
>
> Please fold this into the previous if condition:
>
> if (unlikely(info->connected != BLKIF_STATE_CONNECTED &&
>              info->connected != BLKIF_STATE_FREEZING))
>     return IRQ_HANDLED;
>
ACK
> > +	}
> >
> >  	spin_lock_irqsave(&rinfo->ring_lock, flags);
> >  again:
> > @@ -2020,6 +2036,7 @@ static int blkif_recover(struct blkfront_info *info)
> >  	struct bio *bio;
> >  	unsigned int segs;
> >
> > +	bool frozen = info->connected == BLKIF_STATE_FROZEN;
>
> Please place this together with the rest of the local variable
> declarations.
>
ACK
> >  	blkfront_gather_backend_features(info);
> >  	/* Reset limits changed by blk_mq_update_nr_hw_queues(). */
> >  	blkif_set_queue_limits(info);
> > @@ -2046,6 +2063,9 @@ static int blkif_recover(struct blkfront_info *info)
> >  		kick_pending_request_queues(rinfo);
> >  	}
> >
> > +	if (frozen)
> > +		return 0;
>
> I have to admit my memory is fuzzy here, but don't you need to
> re-queue requests in case the backend has different limits of indirect
> descriptors per request for example?
>
> Or do we expect that the frontend is always going to be resumed on the
> same backend, and thus features won't change?
>
So, to understand your question better here: AFAIU the maximum number of
indirect grefs is fixed by the backend, but the frontend can issue requests
with any number of indirect segments as long as it's less than the number
provided by the backend. So by your question, do you mean this max number,
MAX_INDIRECT_SEGMENTS (256) on the backend, can change?
> > +
> >  	list_for_each_entry_safe(req, n, &info->requests, queuelist) {
> >  		/* Requeue pending requests (flush or discard) */
> >  		list_del_init(&req->queuelist);
> > @@ -2359,6 +2379,7 @@ static void blkfront_connect(struct blkfront_info *info)
> >
> >  		return;
> >  	case BLKIF_STATE_SUSPENDED:
> > +	case BLKIF_STATE_FROZEN:
> >  		/*
> >  		 * If we are recovering from suspension, we need to wait
> >  		 * for the backend to announce it's features before
> > @@ -2476,12 +2497,37 @@ static void blkback_changed(struct xenbus_device *dev,
> >  		break;
> >
> >  	case XenbusStateClosed:
> > -		if (dev->state == XenbusStateClosed)
> > +		if (dev->state == XenbusStateClosed) {
> > +			if (info->connected == BLKIF_STATE_FREEZING) {
> > +				__blkif_free(info);
> > +				info->connected = BLKIF_STATE_FROZEN;
> > +				complete(&info->wait_backend_disconnected);
> > +				break;
> > +			}
> > +
> >  			break;
> > +		}
> > +
> > +		/*
> > +		 * We may somehow receive backend's Closed again while thawing
> > +		 * or restoring and it causes thawing or restoring to fail.
> > +		 * Ignore such unexpected state anyway.
> > +		 */
> > +		if (info->connected == BLKIF_STATE_FROZEN &&
> > +		    dev->state == XenbusStateInitialised) {
>
> I'm not sure you need the extra dev->state == XenbusStateInitialised.
> If the frontend is in state BLKIF_STATE_FROZEN you can likely ignore
> the notification of the backend switched to closed state, regardless
> of the frontend state?
>
I see. Sounds plausible; I will do my set of testing and figure out if it
does not break anything.
> > +			dev_dbg(&dev->dev,
> > +				"ignore the backend's Closed state: %s",
> > +				dev->nodename);
> > +			break;
> > +		}
> >  		/* fall through */
> >  	case XenbusStateClosing:
> > -		if (info)
> > -			blkfront_closing(info);
> > +		if (info) {
> > +			if (info->connected == BLKIF_STATE_FREEZING)
> > +				xenbus_frontend_closed(dev);
> > +			else
> > +				blkfront_closing(info);
> > +		}
> >  		break;
> >  	}
> >  }
> > @@ -2625,6 +2671,62 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
> >  	mutex_unlock(&blkfront_mutex);
> >  }
> >
> > +static int blkfront_freeze(struct xenbus_device *dev)
> > +{
> > +	unsigned int i;
> > +	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> > +	struct blkfront_ring_info *rinfo;
> > +	/* This would be a reasonable timeout as used in xenbus_dev_shutdown() */
> > +	unsigned int timeout = 5 * HZ;
> > +	int err = 0;
> > +
> > +	info->connected = BLKIF_STATE_FREEZING;
> > +
> > +	blk_mq_freeze_queue(info->rq);
> > +	blk_mq_quiesce_queue(info->rq);
>
> Don't you need to also drain the queue and make sure it's empty?
>
blk_mq_freeze_queue and blk_mq_quiesce_queue should take care of running the
HW queues synchronously and making sure all the ongoing dispatches have
finished. Did I understand your question right?
> > +
> > +	for (i = 0; i < info->nr_rings; i++) {
> > +		rinfo = &info->rinfo[i];
> > +
> > +		gnttab_cancel_free_callback(&rinfo->callback);
> > +		flush_work(&rinfo->work);
> > +	}
> > +
> > +	/* Kick the backend to disconnect */
> > +	xenbus_switch_state(dev, XenbusStateClosing);
> > +
> > +	/*
> > +	 * We don't want to move forward before the frontend is disconnected
> > +	 * from the backend cleanly.
> > +	 */
> > +	timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
> > +					      timeout);
> > +	if (!timeout) {
> > +		err = -EBUSY;
> > +		xenbus_dev_error(dev, err, "Freezing timed out;"
> > +				 "the device may become inconsistent state");
> > +	}
> > +
> > +	return err;
> > +}
> > +
> > +static int blkfront_restore(struct xenbus_device *dev)
> > +{
> > +	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> > +	int err = 0;
> > +
> > +	err = talk_to_blkback(dev, info);
> > +	blk_mq_unquiesce_queue(info->rq);
> > +	blk_mq_unfreeze_queue(info->rq);
> > +
> > +	if (err)
> > +		goto out;
>
> There's no need for an out label here, just return err, or even
> simpler:
>
OK.
> if (!err)
>     blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
>
> return err;
>
> Thanks, Roger.
>
Thanks,
Anchal