From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pl0-f72.google.com (mail-pl0-f72.google.com [209.85.160.72])
	by kanga.kvack.org (Postfix) with ESMTP id BDF176B0005
	for ; Sat, 21 Apr 2018 02:17:25 -0400 (EDT)
Received: by mail-pl0-f72.google.com with SMTP id i1-v6so1672630pld.11
	for ; Fri, 20 Apr 2018 23:17:25 -0700 (PDT)
Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15])
	by mx.google.com with ESMTPS id g14si1139572pgu.363.2018.04.20.23.17.23
	for (version=TLS1 cipher=AES128-SHA bits=128/128);
	Fri, 20 Apr 2018 23:17:24 -0700 (PDT)
Subject: Re: [Xen-devel] [Bug 198497] handle_mm_fault / xen_pmd_val / radix_tree_lookup_slot Null pointer
References: <20180420133951.GC10788@bombadil.infradead.org>
	<76a4ee3b-e00a-5032-df90-07d8e207f707@citrix.com>
	<5ADA0A6D02000078001BD177@prv1-mh.provo.novell.com>
	<5ADA0F1502000078001BD1D2@prv1-mh.provo.novell.com>
	<547c3c73-5eb2-05de-aa2a-54690883bd52@oracle.com>
From: Juergen Gross
Message-ID: <0f55b773-3fcd-0300-cd03-3774b9d05ae3@suse.com>
Date: Sat, 21 Apr 2018 08:17:18 +0200
MIME-Version: 1.0
In-Reply-To: <547c3c73-5eb2-05de-aa2a-54690883bd52@oracle.com>
Content-Type: text/plain; charset=utf-8
Content-Language: de-DE
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Boris Ostrovsky , Jan Beulich , Jason Andryuk
Cc: bugzilla-daemon@bugzilla.kernel.org, Andrew Cooper , Matthew Wilcox , linux-mm@kvack.org, akpm@linux-foundation.org, xen-devel@lists.xen.org, labbott@redhat.com

On 20/04/18 21:20, Boris Ostrovsky wrote:
> On 04/20/2018 12:02 PM, Jan Beulich wrote:
>>>>> On 20.04.18 at 17:52, wrote:
>>> On Fri, Apr 20, 2018 at 11:42 AM, Jan Beulich wrote:
>>>>>>> On 20.04.18 at 17:25, wrote:
>>>>> On 20/04/18 16:20, Jason Andryuk wrote:
>>>>>> Adding xen-devel and the Linux Xen maintainers.
>>>>>>
>>>>>> Summary: Some Xen users (and maybe others) are hitting a BUG in
>>>>>> __radix_tree_lookup() under do_swap_page() - example backtrace is
>>>>>> provided at the end.
>>>>>> Matthew Wilcox provided a band-aid patch that
>>>>>> prints errors like the following instead of triggering the bug.
>>>>>>
>>>>>> Skylake 32bit PAE Dom0:
>>>>>> Bad swp_entry: 80000000
>>>>>> mm/swap_state.c:683: bad pte d3a39f1c(8000000400000000)
>>>>>>
>>>>>> Ivy Bridge 32bit PAE Dom0:
>>>>>> Bad swp_entry: 40000000
>>>>>> mm/swap_state.c:683: bad pte d3a05f1c(8000000200000000)
>>>>>>
>>>>>> Other 32bit DomU:
>>>>>> Bad swp_entry: 4000000
>>>>>> mm/swap_state.c:683: bad pte e2187f30(8000000200000000)
>>>>>>
>>>>>> Other 32bit:
>>>>>> Bad swp_entry: 2000000
>>>>>> mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)
>>>>>>
>>>>>> The Linux bugzilla has more info
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198497
>>>>>>
>>>>>> This may not be exclusive to Xen Linux, but most of the reports are on
>>>>>> Xen. Matthew wonders if Xen might be stepping on the upper bits of a
>>>>>> pte.
>>>>> Yes - Xen does use the upper bits of a PTE, but only 1 in release
>>>>> builds, and a second in debug builds. I don't understand where you're
>>>>> getting the 3rd bit in there.
>>>> The former supposedly is _PAGE_GUEST_KERNEL, which we use for 64-bit
>>>> guests only. Above talk is of 32-bit guests only.
>>>>
>>>> In addition both this and _PAGE_GNTTAB are used on present PTEs only,
>>>> while above talk is about swap entries.
>>> This hits a BUG going through do_swap_page, but it seems like users
>>> don't think they are actually using swap at the time. One reporter
>>> didn't have any swap configured. Some of this information was further
>>> down in my original message.
>>>
>>> I'm wondering if somehow we have a PTE that should be empty and should
>>> be lazily filled. For some reason, the entry has some bits set and is
>>> causing the trouble. Would Xen mess with the PTEs in that case?
>> As said in my previous reply - both of the bits Andrew has mentioned can
>> only ever be set when the present bit is also set (which doesn't appear to
>> be the case here).
>> The set bits above are actually in the range of bits
>> designated to the address, which Xen wouldn't ever play with.
>
>
> The bug description starts with: "On a Xen VM running as pvh"
>
> So is this a PV or a PVH guest?

The stack backtrace suggests PV.

Juergen
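
The bit arithmetic argued over in the quoted replies can be checked with
a short sketch. It is only an illustration and assumes the standard x86
PAE/64-bit PTE layout (bit 0 = present, bits 12-51 = frame address,
bit 63 = NX). The name decode_pte is hypothetical; the four values are
the "bad pte" contents from the reports quoted above.

```python
# Assumed layout (standard x86 PAE / 64-bit PTE), not taken from the thread:
PRESENT_BIT = 0          # bit 0: entry is present
NX_BIT = 63              # bit 63: no-execute
ADDR_LOW, ADDR_HIGH = 12, 51  # architectural frame-address field


def decode_pte(pte):
    """Split a 64-bit PTE value into the pieces discussed in the thread."""
    set_bits = [b for b in range(64) if (pte >> b) & 1]
    return {
        "present": PRESENT_BIT in set_bits,
        "nx": NX_BIT in set_bits,
        # bits that fall inside the address field
        "addr_bits": [b for b in set_bits if ADDR_LOW <= b <= ADDR_HIGH],
        # anything else (low flag bits, software bits above the address)
        "other_bits": [
            b for b in set_bits
            if b not in (PRESENT_BIT, NX_BIT)
            and not ADDR_LOW <= b <= ADDR_HIGH
        ],
    }


# PTE contents from the four reports quoted in the thread
REPORTED = [
    0x8000000400000000,  # Skylake 32bit PAE Dom0
    0x8000000200000000,  # Ivy Bridge 32bit PAE Dom0
    0x8000000200000000,  # Other 32bit DomU
    0x8000000100000000,  # Other 32bit
]

if __name__ == "__main__":
    for pte in REPORTED:
        print(hex(pte), decode_pte(pte))
```

Under that assumed layout, every reported value has the present bit
clear, NX set, and exactly one further bit set inside the address field
(bit 32, 33 or 34), with nothing set in the bits above the address
field. That matches the observation that the stray bits lie in the
address range rather than in the upper software bits.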