Date: Tue, 16 Jun 2020 22:30:03 +0000
From: Anchal Agarwal <anchalag@amazon.com>
To: Roger Pau Monné <roger.pau@citrix.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>, "tglx@linutronix.de"
	<tglx@linutronix.de>, "mingo@redhat.com" <mingo@redhat.com>, "bp@alien8.de"
	<bp@alien8.de>, "hpa@zytor.com" <hpa@zytor.com>, "x86@kernel.org"
	<x86@kernel.org>, "jgross@suse.com" <jgross@suse.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>, "linux-mm@kvack.org"
	<linux-mm@kvack.org>, "Kamata, Munehisa" <kamatam@amazon.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>, "konrad.wilk@oracle.com"
	<konrad.wilk@oracle.com>, "axboe@kernel.dk" <axboe@kernel.dk>,
	"davem@davemloft.net" <davem@davemloft.net>, "rjw@rjwysocki.net"
	<rjw@rjwysocki.net>, "len.brown@intel.com" <len.brown@intel.com>,
	"pavel@ucw.cz" <pavel@ucw.cz>, "peterz@infradead.org" <peterz@infradead.org>,
	"Valentin, Eduardo" <eduval@amazon.com>, "Singh, Balbir" <sblbir@amazon.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>, "netdev@vger.kernel.org"
	<netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org"
	<linux-kernel@vger.kernel.org>, "Woodhouse, David" <dwmw@amazon.co.uk>,
	"benh@kernel.crashing.org" <benh@kernel.crashing.org>
Subject: Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and
 hibernation]
Message-ID: <20200616223003.GA28769@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
References: <7FD7505E-79AA-43F6-8D5F-7A2567F333AB@amazon.com>
 <20200604070548.GH1195@Air-de-Roger>
 <20200616214925.GA21684@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
In-Reply-To: <20200616214925.GA21684@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> >
> > Hello,
> >
> > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > >
> > >     On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> > >     > From: Munehisa Kamata <kamatam@amazon.com>
> > >     >
> > >     > S4 power transition states are much different than xen
> > >     > suspend/resume. The former is visible to the guest, and frontend drivers
> > >     > should be aware of the state transitions and be able to take appropriate
> > >     > action when needed. In the transition to S4 we need to make sure that at
> > >     > least all the in-flight blkif requests get completed, since they probably
> > >     > contain bits of the guest's memory image and that's not going to get
> > >     > saved any other way. Hence, re-issuing of in-flight requests as in the
> > >     > case of xen resume will not work here. This is in contrast to
> > >     > xen-suspend, where we need to freeze with as little processing as
> > >     > possible to avoid dirtying RAM late in the migration cycle and we know
> > >     > that in-flight data can wait.
> > >     >
> > >     > Add freeze, thaw and restore callbacks for PM suspend and hibernation
> > >     > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
> > >     > events need to implement these xenbus_driver callbacks. The freeze
> > >     > handler stops the block-layer queue and disconnects the frontend from the
> > >     > backend while freeing ring_info and associated resources. Before
> > >     > disconnecting from the backend, we need to prevent any new IO from being
> > >     > queued and wait for existing IO to complete. Freezing/unfreezing the
> > >     > queues guarantees that there are no requests in use on the shared ring.
> > >     > However, for sanity we should check the state of the ring before
> > >     > disconnecting to make sure that there are no outstanding requests to be
> > >     > processed on the ring. The restore handler re-allocates ring_info,
> > >     > unquiesces and unfreezes the queue and reconnects to the backend, so
> > >     > that the rest of the kernel can continue to use the block device
> > >     > transparently.
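A minimal userspace sketch of the freeze/restore ordering described above, assuming the shared ring can be reduced to a count of in-flight requests. `struct pm_model_dev`, `pm_model_freeze()` and `pm_model_restore()` are hypothetical names for illustration only; they are not blk-mq or xenbus APIs, and the real handlers also deal with grants, queue quiescing and the xenbus handshake.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the frontend state that freeze/restore touch. */
struct pm_model_dev {
	bool queue_frozen;   /* no new requests may be submitted */
	int inflight;        /* requests still outstanding on the shared ring */
	bool ring_allocated; /* stands in for ring_info and its grants */
	bool connected;      /* frontend connected to the backend */
};

/* Stands in for the backend completing every outstanding request. */
static void pm_model_complete_all(struct pm_model_dev *d)
{
	while (d->inflight > 0)
		d->inflight--;
}

/*
 * Freeze: block new I/O, let existing I/O finish, sanity-check that the
 * ring is empty, then disconnect and free ring resources.
 */
static int pm_model_freeze(struct pm_model_dev *d)
{
	d->queue_frozen = true;
	pm_model_complete_all(d);
	if (d->inflight != 0)
		return -EBUSY; /* never disconnect with requests still on the ring */
	d->connected = false;
	d->ring_allocated = false;
	return 0;
}

/* Restore: re-allocate the ring, reconnect, and only then accept I/O again. */
static int pm_model_restore(struct pm_model_dev *d)
{
	d->ring_allocated = true;
	d->connected = true;
	d->queue_frozen = false;
	return 0;
}
```

The model only captures ordering: nothing is disconnected or freed while requests are still outstanding, and I/O is accepted again only after the ring exists and the frontend is reconnected.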
> > >     >
> > >     > Note: for older backends, if a backend doesn't have commit '12ea729645ace'
> > >     > ("xen/blkback: unmap all persistent grants when frontend gets
> > >     > disconnected"), the frontend may see a massive amount of grant table
> > >     > warnings when freeing resources:
> > >     > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> > >     > [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
> > >     >
> > >     > In this case, persistent grants would need to be disabled.
> > >     >
> > >     > [Anchal Changelog: Removed timeout/request during blkfront freeze.
> > >     > Reworked the whole patch to work with blk-mq and incorporate upstream's
> > >     > comments]
> > >
> > >     Please tag versions using vX and it would be helpful if you could list
> > >     the specific changes that you performed between versions. There were
> > >     3 RFC versions IIRC, and there's no log of the changes between them.
> > >
> > > I will elaborate on "upstream's comments" in my changelog in my next round of patches.
> >
> > Sorry for being picky, but can you please make sure your email client
> > properly quotes previous emails on reply. Note the lack of '>' added
> > to the quoted parts of your reply.
> That was probably just my Outlook client. Noted.
> >
> > >     > +                     }
> > >     > +
> > >     >                       break;
> > >     > +             }
> > >     > +
> > >     > +             /*
> > >     > +              * We may somehow receive backend's Closed again while thawing
> > >     > +              * or restoring and it causes thawing or restoring to fail.
> > >     > +              * Ignore such unexpected state regardless of the backend state.
> > >     > +              */
> > >     > +             if (info->connected == BLKIF_STATE_FROZEN) {
> > >
> > >     I think you can join this with the previous dev->state == XenbusStateClosed?
> > >
> > >     Also, won't the device be in the Closed state already if it's in state
> > >     frozen?
> > > Yes, but I think this is mostly for a hypothetical case where the backend switches to the Closed state during thawing.
> > > I am not entirely sure that can happen. I could use some expertise here.
> >
> > I think the frontend seeing the backend in the closed state during
> > restore would be a bug that should prevent the frontend from
> > resuming.
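One way to read Roger's point, sketched as a small userspace C model: instead of silently ignoring a backend Closed notification while the frontend is still frozen, treat it as a resume failure. The enums and `closed_model_action()` are hypothetical; in the driver this decision would live in the backend-changed handler, and the non-frozen branch is simplified.

```c
#include <assert.h>

/* Hypothetical frontend connection states, loosely mirroring BLKIF_STATE_*. */
enum closed_model_conn {
	CLOSED_MODEL_DISCONNECTED,
	CLOSED_MODEL_CONNECTED,
	CLOSED_MODEL_FROZEN,
};

/* What the frontend does when it observes the backend in the Closed state. */
enum closed_model_action {
	CLOSED_MODEL_TEARDOWN,    /* ordinary close path */
	CLOSED_MODEL_FAIL_RESUME, /* unexpected during thaw/restore: report failure */
};

static enum closed_model_action
closed_model_action(enum closed_model_conn conn)
{
	if (conn == CLOSED_MODEL_FROZEN)
		return CLOSED_MODEL_FAIL_RESUME; /* treated as a bug, not ignored */
	return CLOSED_MODEL_TEARDOWN;
}
```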
> >
> > >     > +     /* Kick the backend to disconnect */
> > >     > +     xenbus_switch_state(dev, XenbusStateClosing);
> > >     > +
> > >     > +     /*
> > >     > +      * We don't want to move forward before the frontend is disconnected
> > >     > +      * from the backend cleanly.
> > >     > +      */
> > >     > +     timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
> > >     > +                                           timeout);
> > >     > +     if (!timeout) {
> > >     > +             err = -EBUSY;
> > >
> > >     Note err is only used here, and I think could just be dropped.
> > >
> > > This err is what's being returned from the function. Am I missing anything?
> >
> > Just 'return -EBUSY;' directly, and remove the top level variable. You
> > can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
> > that important.
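Roger's simplification, sketched as userspace C so it compiles on its own. `ebusy_model_wait()` is a hypothetical stand-in that follows the documented convention of `wait_for_completion_timeout()` (0 on timeout, otherwise the remaining time), and `fprintf()` stands in for `xenbus_dev_error()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for wait_for_completion_timeout(): returns 0 if
 * the completion never fired, otherwise the time left. */
static unsigned long ebusy_model_wait(bool disconnected, unsigned long timeout)
{
	return disconnected ? timeout : 0;
}

static int ebusy_model_freeze(bool backend_disconnects, unsigned long timeout)
{
	/* Kicking the backend and waiting for it are both modelled here. */
	if (!ebusy_model_wait(backend_disconnects, timeout)) {
		/* No local err variable: use -EBUSY directly and return it. */
		fprintf(stderr, "error %d: Freezing timed out; "
			"the device may be left in an inconsistent state\n", -EBUSY);
		return -EBUSY;
	}
	return 0;
}
```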
> >
> > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > >     > +                              "the device may become inconsistent state");
> > >
> > >     Leaving the device in this state is quite bad, as it's in a closed
> > >     state and with the queues frozen. You should make an attempt to
> > >     restore things to a working state.
> > >
> > > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > > leave it in this state; however, I am still trying to find out if there is a good way to know whether the backend is still connected after the timeout.
> > > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so that's why
> > > I may be looking for an alternate perspective here. Intentionally thawing everything back is one option I could think of.
> >
> > You can manually force this state, and then check that it will behave
> > correctly. I would expect that on a failure to disconnect from the
> > backend you should switch the frontend to the 'Init' state in order to
> > try to reconnect to the backend when possible.
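A sketch of the recovery path Roger describes, as a userspace C model: if the backend never confirms the disconnect, the frontend fails the freeze and moves itself back to Initialising so a later handshake can reconnect. The state names echo xenbus state names, but the enum, struct and functions are hypothetical, and keeping the queue frozen until the reconnect completes is an assumption, not something settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of xenbus states used by the model. */
enum init_model_state {
	INIT_MODEL_INITIALISING,
	INIT_MODEL_CONNECTED,
	INIT_MODEL_CLOSING,
	INIT_MODEL_CLOSED,
};

struct init_model_front {
	enum init_model_state state;
	bool queue_frozen;
};

static int init_model_freeze(struct init_model_front *f, bool backend_disconnected)
{
	f->queue_frozen = true;
	f->state = INIT_MODEL_CLOSING; /* kick the backend to disconnect */
	if (!backend_disconnected) {
		/* Timed out: fail the freeze and restart the handshake so the
		 * frontend can reconnect once the backend answers again. */
		f->state = INIT_MODEL_INITIALISING;
		return -EBUSY;
	}
	f->state = INIT_MODEL_CLOSED;
	return 0;
}

/* Called once the handshake completes again: resume I/O. */
static void init_model_reconnected(struct init_model_front *f)
{
	f->state = INIT_MODEL_CONNECTED;
	f->queue_frozen = false;
}
```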
> >
> From what I understand, forcing it manually means failing the freeze without
> disconnecting and trying to revive the connection by unfreezing the
> queues and reconnecting to the backend [which never got disconnected]. Maybe even
> tearing things down manually, because I am not sure what state the frontend will
> see if the backend fails to disconnect at any point in time. I assumed Connected.
> Then again, if it's "CONNECTED" I may not need to tear down everything and start
> from the Initialising state, because that may not work.
>
> So I am not so sure about the backend's state. Let's say xen_blkif_disconnect fails;
> I don't see that getting handled in the backend, so what will the backend's state be?
> Will it still switch its xenbus state to 'Closed'? If not, what will the frontend see
> if it tries to read the backend's state through xenbus_read_driver_state()?
>
> So the flow would be like:
> Frontend marks XenbusStateClosing
> Backend marks its state as XenbusStateClosing
>     Frontend marks XenbusStateClosed
>     Backend disconnects: calls xen_blkif_disconnect
>        Backend fails to disconnect; the above function returns EBUSY
>        What will the state of the backend be here?
>        The frontend does not tear down the rings if the backend does not switch
>        its state to 'Closed' in case of failure.
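The flow above can be written down as a replayable trace, which makes the open question explicit: the backend's state after a failed `xen_blkif_disconnect()` is left as an input rather than guessed. Everything here is a hypothetical userspace model; it only encodes the rule stated above that the frontend tears down its rings once it reads Closed from the backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of xenbus states used in the flow above. */
enum flow_state { FLOW_CONNECTED, FLOW_CLOSING, FLOW_CLOSED };

struct flow_step {
	bool frontend;         /* true: frontend acts, false: backend acts */
	enum flow_state state; /* state that side switches to */
};

/* The steps listed above, up to the backend's call to xen_blkif_disconnect(). */
static const struct flow_step flow_steps[] = {
	{ true,  FLOW_CLOSING }, /* frontend marks Closing */
	{ false, FLOW_CLOSING }, /* backend marks Closing */
	{ true,  FLOW_CLOSED },  /* frontend marks Closed */
};

/*
 * Replays the trace, then applies the state the backend ends up in after
 * xen_blkif_disconnect() (the open question, so it is a parameter here).
 * Returns true if the frontend would tear down its rings.
 */
static bool flow_replay(enum flow_state backend_final)
{
	enum flow_state front = FLOW_CONNECTED, back = FLOW_CONNECTED;
	size_t i;

	for (i = 0; i < sizeof(flow_steps) / sizeof(flow_steps[0]); i++) {
		if (flow_steps[i].frontend)
			front = flow_steps[i].state;
		else
			back = flow_steps[i].state;
		printf("step %zu: frontend=%d backend=%d\n", i, (int)front, (int)back);
	}
	back = backend_final;
	printf("after disconnect: frontend=%d backend=%d\n", (int)front, (int)back);
	/* Rings are torn down only once the backend reports Closed. */
	return back == FLOW_CLOSED;
}
```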
>
> If the backend stays in the CONNECTED state, then even if we mark the frontend Initialised,
> the backend won't call connect() (from reading the code in frontend_changed).
> In my understanding, Initialising will fail since the backend's dev->state != XenbusStateClosed, plus
> we did not tear anything down, so calling talk_to_blkback may not be needed.
>
> Does that sound correct?
Sent that too quickly. I also meant to add that the XenbusStateInitialised state should be OK
only if we expect the backend to stay in the "Connected" state. I also experimented
with that notion, but I am a little worried about the correctness here.
Can the backend end up in an Unknown state somehow?
> > >     > +     }
> > >     > +
> > >     > +     return err;
> > >     > +}
> > >     > +
> > >     > +static int blkfront_restore(struct xenbus_device *dev)
> > >     > +{
> > >     > +     struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> > >     > +     int err = 0;
> > >     > +
> > >     > +     err = talk_to_blkback(dev, info);
> > >     > +     blk_mq_unquiesce_queue(info->rq);
> > >     > +     blk_mq_unfreeze_queue(info->rq);
> > >     > +     if (!err)
> > >     > +         blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
> > >
> > >     Bad indentation. Also, shouldn't you first update the queues and then
> > >     unfreeze them?
> > > Please correct me if I am wrong, but blk_mq_update_nr_hw_queues freezes the queue,
> > > so I don't think the order could be reversed.
> >
> > Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
> > correct to unfreeze the queues without having updated them. Also the
> > freezing/unfreezing uses a refcount, so I think it's perfectly fine to
> > call blk_mq_update_nr_hw_queues first and then unfreeze the queues.
> >=20
> > Also note that talk_to_blkback returning an error should likely
> > prevent any unfreezing, as the queues won't be updated to match the
> > parameters of the backend.
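A userspace C model of the restore ordering Roger suggests, assuming (as he says) that queue freezing is reference-counted. The struct, the functions and the use of `-ENODEV` as the failure code are hypothetical illustrations, not the blk-mq API; the point is only the ordering: reconnect, update the queue count while still frozen, unfreeze last, and leave the queue frozen if reconnecting fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a request queue whose freeze is reference-counted. */
struct restore_model_queue {
	int freeze_depth; /* > 0 means frozen; nested freezes just add depth */
	bool quiesced;
	int nr_hw_queues;
};

static void restore_model_freeze(struct restore_model_queue *q)
{
	q->freeze_depth++;
}

static void restore_model_unfreeze(struct restore_model_queue *q)
{
	assert(q->freeze_depth > 0);
	q->freeze_depth--;
}

/* Freezes internally, as described for blk_mq_update_nr_hw_queues() above.
 * On an already-frozen queue the depth returns to the outer level, so the
 * caller's freeze is still in effect afterwards. */
static void restore_model_update_nr_hw_queues(struct restore_model_queue *q, int nr)
{
	restore_model_freeze(q);
	q->nr_hw_queues = nr;
	restore_model_unfreeze(q);
}

/* Hypothetical stand-in for talk_to_blkback(): 0 or a negative errno. */
static int restore_model_talk_to_backend(bool backend_ok)
{
	return backend_ok ? 0 : -ENODEV;
}

/* Reconnect, update the hardware queue count while still frozen, then
 * unquiesce and unfreeze. On error the queue stays frozen because it
 * does not match the backend's parameters. */
static int restore_model_restore(struct restore_model_queue *q, bool backend_ok,
				 int nr_rings)
{
	int err = restore_model_talk_to_backend(backend_ok);

	if (err)
		return err;
	restore_model_update_nr_hw_queues(q, nr_rings);
	q->quiesced = false;
	restore_model_unfreeze(q);
	return 0;
}
```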
> >
> I think you are right here. I will send out fixes in v2.
> > Thanks, Roger.
> >
> Thanks,
> Anchal
>