Date: Tue, 30 Jun 2020 10:30:06 +0200
From: Roger Pau Monné
To: Anchal Agarwal
CC: Boris Ostrovsky, "tglx@linutronix.de", "mingo@redhat.com",
    "bp@alien8.de", "hpa@zytor.com", "x86@kernel.org", "jgross@suse.com",
    "linux-pm@vger.kernel.org", "linux-mm@kvack.org", "Kamata, Munehisa",
    "sstabellini@kernel.org", "konrad.wilk@oracle.com", "axboe@kernel.dk",
    "davem@davemloft.net", "rjw@rjwysocki.net", "len.brown@intel.com",
    "pavel@ucw.cz", "peterz@infradead.org", "Valentin, Eduardo",
    "Singh, Balbir", "xen-devel@lists.xenproject.org", "vkuznets@redhat.com",
    "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org",
    "Woodhouse, David", "benh@kernel.crashing.org"
Subject: Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
Message-ID: <20200630083006.GJ735@Air-de-Roger>
References: <20200604070548.GH1195@Air-de-Roger>
    <20200616214925.GA21684@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
    <20200617083528.GW735@Air-de-Roger>
    <20200619234312.GA24846@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
    <20200622083846.GF735@Air-de-Roger>
    <20200623004314.GA28586@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
    <20200623081903.GP735@Air-de-Roger>
    <20200625183659.GA26586@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
    <20200626091239.GA735@Air-de-Roger>
    <20200629192035.GA13195@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
In-Reply-To: <20200629192035.GA13195@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>

On Mon, Jun 29, 2020 at 07:20:35PM +0000, Anchal Agarwal wrote:
> On Fri, Jun 26, 2020 at 11:12:39AM +0200, Roger Pau Monné wrote:
> > So the frontend should do:
> >
> >  - Switch to Closed state (and cleanup everything required).
> >  - Wait for backend to switch to Closed state (must be done
> >    asynchronously, handled in blkback_changed).
> >  - Switch frontend to XenbusStateInitialising, that will in turn force
> >    the backend to switch to XenbusStateInitWait.
> >  - After that it should just follow the normal connection procedure.
> >
> > I think the part that's missing is the frontend doing the state change
> > to XenbusStateInitialising when the backend switches to the Closed
> > state.
> >
> > > I was of the view we may just want to mark frontend closed which should do
> > > the job of freeing resources and then following the same flow as
> > > blkfront_restore. That does not seems to work correctly 100% of the time.
> >
> > I think the missing part is that you must wait for the backend to
> > switch to the Closed state, or else the switch to
> > XenbusStateInitialising won't be picked up correctly by the backend
> > (because it's still doing it's cleanup).
> >
> > Using blkfront_restore might be an option, but you need to assert the
> > backend is in the initial state before using that path.
> >
> Yes, I agree and I make sure that XenbusStateInitialising only triggers
> on frontend once backend is disconnected. msleep in a loop not that graceful but
> works.
> Frontend only switches to XenbusStateInitialising once it sees backend
> as Closed. The issue here is and may require more debugging is:
> 1. Hibernate instance->Closing failed, artificially created situation by not
>    marking frontend Closed in the first place during freezing.
> 2. System comes back up fine restored to 'backend connected'.

I'm not sure I follow what is happening here. What should happen, IMO,
is that the backend eventually reaches the Closed state, i.e. the
frontend has initiated the disconnection from the backend by setting
the Closing state, and the backend then has to reach the Closed state
at some point.

At that point the frontend can initiate a reconnection by switching to
the Initialising state.

> 3. Re-run (1) again without reboot
> 4. (4) fails to recover basically freezing does not fail at all which is weird
>    because it should timeout as it passes through same path. It hits a BUG in
>    talk_to_blkback() and instance crashes.

It's hard to tell exactly. I guess you would have to figure out what
makes the frontend not get stuck at the same place as in the first
attempt.

Roger.
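
For illustration only, the reconnect sequence discussed in this thread
(frontend to Closed, wait for the backend to reach Closed, frontend to
XenbusStateInitialising, backend to XenbusStateInitWait) can be modelled
as a toy state machine. This is a sketch in Python, not kernel code and
not the blkfront implementation: ToyFrontend, start_reconnect,
backend_changed and run_handshake are hypothetical names, and the
backend is simulated by hand-fed state changes. The enum values are
assumed to match the XenbusState numbering in Xen's public headers.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Subset of the xenbus states named in the thread."""
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6


class ToyFrontend:
    """Hypothetical model of the frontend side; not the blkfront driver."""

    def __init__(self):
        self.state = XenbusState.Connected
        self.reconnect_pending = False
        self.history = []

    def switch_state(self, new_state):
        self.state = new_state
        self.history.append(new_state.name)

    def start_reconnect(self):
        # Step 1: go to Closed (resource cleanup would happen here).
        self.reconnect_pending = True
        self.switch_state(XenbusState.Closed)

    def backend_changed(self, backend_state):
        # Step 2: invoked on each backend state change, so the frontend
        # reacts to events instead of sleeping in a loop.
        if backend_state == XenbusState.Closed and self.reconnect_pending:
            # Step 3: the backend finished its cleanup, re-initialise now.
            self.reconnect_pending = False
            self.switch_state(XenbusState.Initialising)
        elif (backend_state == XenbusState.InitWait
              and self.state == XenbusState.Initialising):
            # Step 4: the normal connection procedure would start here.
            self.switch_state(XenbusState.Initialised)


def run_handshake(cleanup_events=2):
    """Drive the frontend with a hand-simulated backend; return its history."""
    frontend = ToyFrontend()
    frontend.start_reconnect()
    # Backend is still cleaning up: the frontend must not move yet.
    for _ in range(cleanup_events):
        frontend.backend_changed(XenbusState.Closing)
    frontend.backend_changed(XenbusState.Closed)
    # Simulated backend reaction to the frontend entering Initialising.
    frontend.backend_changed(XenbusState.InitWait)
    return frontend.history


print(run_handshake())
```

In this model the frontend never leaves Closed while the backend still
reports Closing, which is the ordering the thread argues is required.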